Two goals have always been kept in mind during the development of Incanter 1) to provide an R-like, interactive statistical and graphics environment for performing data analysis, and 2) to provide a collection of libraries that can be embedded in larger data analysis systems, taking advantage of both the power of the Clojure language and the rich set of libraries available on the JVM for accessing and processing data.
Incanter has provided much of what you would expect from R by combining the power of the Clojure language with Java libraries, like Parallel Colt, jFreeChart, and Processing. The Clojure language itself is well suited for data processing due to its data structures, the sequence abstraction, the powerful sequence processing functions, destructuring, data structure literals, and myriad other niceties.
Now, Incanter’s goal of being embed-able in larger data analysis systems has taken a giant step forward with the merger of Flightcaster‘s code. Flightcaster’s service for predicting airline arrival times is a Clojure- and Hadoop-based distributed statistical-learning system that now has Incanter at the core. Their setup demonstrates how Incanter can scale up to larger data sets by leveraging JVM-based projects like Hadoop, AKKA, or other Clojure-based distributed computing systems. And with the Flightcaster code, Incanter has new functionality in classification, information theory, more probability, more io capabilities, a large range of dependence and similarity measures, and a variety of data transformation functions.
And this marks just the beginning of Flightcaster’s participation in the development of Incanter. Together we are working on a number of algorithms for learning models, structure, ensembles, and more. Future work will also ensure that Incanter can work seemlessly with FlightCaster’s recently open-source Crane framework for managing distributed processes on Hadoop and Amazon’s EC2.