
Dynamic charts with Incanter

I had the opportunity to attend last week’s PragmaticStudio Clojure workshop taught by Rich Hickey and Stuart Halloway (I highly recommend it, and there are still seats open for the May class). During the three days I talked with Rich about features he’d like to see in Incanter, and the first thing he asked about was adding dynamic charts, like those available in Mathematica through the Manipulate function. So I ended up spending much of my lab time working on this feature, the first draft of which is now available.

Incanter has three new macros: sliders, dynamic-xy-plot, and dynamic-scatter-plot. The sliders macro binds multiple named sequences to an equal number of JSlider widgets. When a slider is manipulated, a user-defined expression is evaluated. For instance, the following code will display two slider widgets bound to two sequences, x and y.

(sliders [x (range -3 3 0.01)
          y (range 0.01 10 0.1)]
  (foo x y))
  

Each time one of the sliders is manipulated the expression (foo x y) will be evaluated with the new value of either x or y. Presumably, foo has side effects, like changing the value of a ref or manipulating a GUI widget, since it is running in the separate thread used by the slider widget.
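As a minimal sketch of the kind of side-effecting function foo could be (the post does not define it, so the atom named latest-values is an assumption made for illustration, and an atom stands in for the ref mentioned above), the callback below records the most recent slider values so that other code can read them:

```clojure
;; Hypothetical state holder; not part of Incanter.
(def latest-values (atom nil))

(defn foo
  "Records the most recent slider values and returns the new state."
  [x y]
  (reset! latest-values {:x x :y y}))

;; Simulate two slider movements without opening any widgets.
(foo -3 0.01)
(foo 1.5 0.01)

;; The atom now holds the values from the second call.
(println @latest-values)
```

Because atoms are safe to update from any thread, a callback like this can run on the slider's thread while the REPL thread reads the current value with @latest-values.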

I then combined this macro with the incanter.charts/set-data function to create dynamic versions of xy-plot and scatter-plot, named dynamic-xy-plot and dynamic-scatter-plot respectively.

The following example creates an xy-plot of a sequence of values named x versus the normal PDF of x, and displays two sliders bound to the mean and standard deviation of the PDF.

(let [x (range -3 3 0.1)]
  (view (dynamic-xy-plot [mean (range -3 3 0.1)
                          std-dev (range 0.1 10 0.1)]
          [x (pdf-normal x :mean mean :sd std-dev)])))

The expression provided to dynamic-xy-plot must produce a sequence containing either two sequences with N values, or N sequences with two values each. In other words, a 2-by-N matrix or an N-by-2 matrix, where N is the number of points plotted. The expression above,

[x (pdf-normal x :mean mean :sd std-dev)]

produces a vector containing the sequence x and the sequence produced by calling pdf-normal on x (two sequences of N values each, equivalent to a 2-by-N matrix).
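The two accepted layouts can be sketched in plain Clojure, without Incanter. The names xs, ys, and transpose are illustrative assumptions, not part of the library:

```clojure
;; Two sequences of N values each (the 2-by-N layout).
(def xs [1 2 3])
(def ys [10 20 30])
(def two-by-n [xs ys])

;; The same points as N pairs (the N-by-2 layout).
(def n-by-two (map vector xs ys))

;; Transposing converts one layout into the other.
(defn transpose [rows]
  (apply map vector rows))

(println n-by-two)             ; ([1 10] [2 20] [3 30])
(println (transpose n-by-two)) ; ([1 2 3] [10 20 30])
```

Either layout describes the same N points, which is presumably why the macro can accept both.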

Manipulating the sliders will change the shape and position of the curve.

The dynamic-scatter-plot macro works the same way as dynamic-xy-plot. All three macros are available in the version of Incanter on Github and in repo.incanter.org.

Incanter blog post roundup

There have been several cool blog posts over the last few weeks featuring Incanter that I would like to highlight here.

Incanter code repository

It’s been a long time coming, but Incanter is available in a public code repository (http://repo.incanter.org) once again.

The version available in the Clojars repository has grown increasingly out of date, but couldn’t be updated due to Incanter’s new modular build structure; a structure that lets developers include only the subset of Incanter functionality that they need.

For instance, if you only need the functionality found in incanter.core and incanter.stats, then include the incanter-core dependency in your project.clj file. If you need charts, but not Processing visualizations, include incanter-charts. If you want access to the incanter.chrono library, but nothing else, use the incanter-chrono dependency.
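A project.clj along these lines might look like the sketch below. The project name, the "incanter" group id, the repository entry, and the "VERSION" string are all assumptions or placeholders; check the repository's homepage for the actual coordinates before using it.

```clojure
;; project.clj: a sketch only; replace the placeholders before use.
(defproject my-analysis "0.1.0-SNAPSHOT"
  :description "Hypothetical project that only needs incanter.core and incanter.stats"
  ;; Assumed repository entry pointing at the Incanter repo mentioned above.
  :repositories {"incanter" "http://repo.incanter.org"}
  ;; "VERSION" is a placeholder, not a real release number.
  ;; The incanter/ group id is also an assumption.
  :dependencies [[incanter/incanter-core "VERSION"]])
```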

Instructions for building Leiningen-based projects using the repository are available on the Incanter repo’s homepage, http://repo.incanter.org.

Data Sorcery with Clojure & Incanter: Introduction to Datasets & Charts

I put together my slides (pdf) for next week’s National Capital Area Clojure Users Group February Meetup. Being snow-bound this week, I’ve been able to make more slides than I’ll have time to cover during next week’s session, so I’ll be skimming over some of the examples.

Russ Olsen will start the session with an introduction to Clojure, so if you’re in the D.C. area next Thursday (February 18), sign up for the meetup.

The code used in this presentation is available here, and a more printer-friendly version of the presentation itself, with a white background, is available here.

Dark theme for Incanter charts

JFreeChart has been a fantastic library, and I’ve been able to include useful charting functionality in Incanter very quickly because of it, but I’m not a big fan of its default visual theme. Eventually I’d like to create some new themes, or better yet include themes created by others. In the meantime I have created the set-theme function, which accepts a chart and either a keyword indicating a built-in theme or a JFreeChart ChartTheme object, and applies the theme to the chart.

At the moment, the only built-in themes are :default and :dark, but hopefully that will change in the future.

Here’s an example of using set-theme. First I’ll create a chart with the default theme,

(use '(incanter core charts datasets))

(with-data (get-dataset :iris)
  (view (scatter-plot :Sepal.Length :Sepal.Width :group-by :Species)))

and here’s the same scatter-plot with the dark theme.

(with-data (get-dataset :iris)
  (doto (scatter-plot :Sepal.Length :Sepal.Width :group-by :Species)
    (set-theme :dark)
    view))

The set-theme function is available in the latest version of Incanter on Github.

I have also added the incanter-pdf module discussed in the previous blog post, but it isn’t installed by default. To install it in your local Maven repository, run ‘mvn install’ from the incanter/modules/incanter-pdf directory.

Saving Incanter charts as PDF documents

Incanter charts can be saved as PNG files using the save function, but I had a request earlier today to add the ability to save them as PDF documents.

So I’ve created a new function called save-pdf in a new package called incanter.pdf.

Here’s a basic example.

(use '(incanter core charts pdf))
(save-pdf (function-plot sin -4 4) "./pdf-chart.pdf")

Which outputs the following PDF file.

Working with R from Clojure and Incanter

Joel Boehland has introduced Rincanter, which lets you use R from Clojure and Incanter. This is fantastically cool, as it opens up the vast number of R libraries to Clojure/Incanter, translating between R and Clojure data types, including Incanter datasets.

Check out Joel’s latest blog post, All your datasets R belong to us (I love that name), where he introduces Rincanter and demonstrates its use.

Working with data sets in Clojure with Incanter and MongoDB

This post will cover some new dataset functionality I’ve recently added to Incanter, including the use of MongoDB as a back-end data store for datasets.

Basic dataset functionality

First, load the basic Incanter libraries

(use '(incanter core stats charts io))

Next load some CSV data using the incanter.io/read-dataset function, which takes a string representing either a filename or a URL to the data.

(def data
  (read-dataset
    "http://github.com/liebke/incanter/raw/master/data/cars.csv"
     :header true))

The default delimiter is a comma (\,), but a different one can be specified with the :delim option (e.g. \tab). The cars.csv file is a small sample data set that is included in the Incanter distribution, and therefore could have been loaded using get-dataset,

(incanter.datasets/get-dataset :cars)

See the documentation for get-dataset for more information on the included sample data.

We can get some information on the dataset, like the number of rows and columns, using either the dim function or the nrow and ncol functions, and we can view the column names with the col-names function.

user> (dim data)
[50 2]
user> (col-names data)
["speed" "dist"]

We can see that there are just 50 rows and two columns and that the column names are “speed” and “dist”. The data are 50 observations, from the 1920s, of automobile braking distances observed at different speeds.

I will use Incanter’s new with-data macro and $ column-selector function to access the dataset’s columns. Within the body of a with-data expression, columns of the bound dataset can be accessed by name or index, using the $ function, for instance ($ :colname) or ($ 0).

For example, the following code will create a scatter plot of the data (speed vs. dist), and then add a regression line using the fitted values returned from the incanter.stats/linear-model function.

(with-data data
  (def lm (linear-model ($ :dist) ($ :speed)))
  (doto (scatter-plot ($ :speed) ($ :dist))
    (add-lines ($ :speed) (:fitted lm))
    view))

Within the with-data expression, the dataset itself is bound to $data, which can be useful if you want to perform operations on it. For instance, the following code uses the conj-cols function to prepend an integer ID column to the dataset, and then displays it in a window.

(with-data (get-dataset :cars)
  (view (conj-cols (range (nrow $data)) $data)))

The conj-cols function returns a dataset by conjoining sequences together as the columns of the dataset, or by prepending/appending columns to an existing dataset, and the related conj-rows function conjoins rows.

We can create a new dataset that adds the fitted (or predicted values) to the original data using the conj-cols function.

(def results (conj-cols data (:fitted lm)))

You’ll notice that the column names are changed to generic ones (i.e. col-0, col-1, col-2); this is done to prevent naming conflicts when merging datasets. We can add more meaningful names with the col-names function.

(def results (col-names results [:speed :dist :predicted-dist]))

We could have used the -> (thread) macro to perform both steps, as well as add the residuals from the output of linear-model to the dataset:

(def results (-> (conj-cols data (:fitted lm) (:residuals lm))
                 (col-names [:speed :dist :predicted :residuals])))
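For readers new to ->, it inserts each result as the first argument of the next form. A small sketch with plain arithmetic, unrelated to Incanter (the names threaded and nested are only for illustration):

```clojure
;; (-> 5 (+ 3) (* 2)) is rewritten to (* (+ 5 3) 2).
(def threaded (-> 5 (+ 3) (* 2)))
(def nested   (* (+ 5 3) 2))

(println threaded)            ; 16
(println (= threaded nested)) ; true
```

In the same way, the threaded expression above passes the dataset returned by conj-cols as the first argument to col-names.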

Querying data sets with the $where function

Another new function, $where, lets you query an Incanter dataset using a syntax based on MongoDB and Somnium’s Congomongo Clojure library.

To perform a query, pass a query-map to the $where function. For instance, to get the rows from the results data set where the value of speed is 10, use

($where {:speed 10} results)

For the rows where the speed is between 10 and 20, use

($where {:speed {:$gt 10 :$lt 20}} results)

For rows where the speed is in the set #{4 7 24 25}, use

($where {:speed {:$in #{4 7 24 25}}} results)

Or not in that set,

($where {:speed {:$nin #{4 7 24 25}}} results)

Like the $ function, $where can be used within with-data, where the dataset is passed implicitly. For example, to get the mean speed of the observations that have residuals between -10 and 10 from the results dataset,

(with-data results
  (mean ($ :speed ($where {:residuals {:$gt -10 :$lt 10}}))))

which returns 14.32.

Query-maps don’t support ‘or’ directly, but we can use conj-rows to construct a dataset where speed is either less than 10 or greater than 20 as follows:

(with-data results
  (conj-rows ($where {:speed {:$lt 10}})
             ($where {:speed {:$gt 20}})))

An alternative to conjoining query results is to pass $where a predicate function that accepts a map containing the key/value pairs of a row and returns a boolean indicating whether the row should be included. For example, to perform the above query we could have done this,

(with-data results
  ($where (fn [row] (or (< (:speed row) 10) (> (:speed row) 20)))))
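To make the query-map semantics concrete, here is a rough sketch in plain Clojure of how such maps could be evaluated against rows represented as maps. This is an illustration only, not Incanter’s implementation of $where; the names ops, matches?, and where are assumptions, and only the operators shown in this post are covered.

```clojure
;; Illustration only: NOT Incanter's implementation of $where.
(def ops
  {:$gt  (fn [v arg] (> v arg))
   :$lt  (fn [v arg] (< v arg))
   :$in  (fn [v arg] (contains? arg v))
   :$nin (fn [v arg] (not (contains? arg v)))})

(defn matches?
  "True when row satisfies every condition in query-map."
  [query-map row]
  (every? (fn [[k condition]]
            (let [v (get row k)]
              (if (map? condition)
                ;; {:$gt 10 :$lt 20} means all operators must hold.
                (every? (fn [[op arg]] ((ops op) v arg)) condition)
                ;; A bare value means equality.
                (= v condition))))
          query-map))

(defn where
  "Returns the rows (maps) that satisfy query-map."
  [query-map rows]
  (filter (partial matches? query-map) rows))

;; Made-up rows for demonstration, not the cars data.
(def rows [{:speed 4 :dist 2} {:speed 12 :dist 20} {:speed 24 :dist 93}])

(println (where {:speed {:$gt 10 :$lt 20}} rows)) ; only the :speed 12 row
(println (where {:speed {:$in #{4 24}}} rows))    ; the :speed 4 and 24 rows
(println (where {:speed 12} rows))                ; equality on a bare value
```

All conditions in a query-map are combined with ‘and’, which is why an ‘or’ needs either conj-rows or a predicate function as described above.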

Storing and Retrieving Incanter datasets in MongoDB

The new incanter.mongodb library can be used with Somnium’s Congomongo to store and retrieve datasets in a MongoDB database.

MongoDB is a schema-less, document-oriented database that is well suited as a data store for Clojure data structures. Getting started with MongoDB is easy: just download and unpack it, and run the following commands (on Linux or Mac OS X),

$ mkdir -p /data/db
$ ./mongodb/bin/mongod &

For more information, see the MongoDB quick start guide.

Once the database server is running, load Incanter’s MongoDB library and Congomongo,

(use 'somnium.congomongo)
(use 'incanter.mongodb)

and use Congomongo’s mongo! function to connect to the “mydb” database on the server running on localhost on the default port.

(mongo! :db "mydb")

If mydb doesn’t exist, it will be created. Now we can insert the results dataset into the database with the incanter.mongodb/insert-dataset function.

(insert-dataset :breaking-dists results)

The first argument, :breaking-dists, is the name the collection will have in the database. We can now retrieve the dataset with the incanter.mongodb/fetch-dataset function.

(def breaking-dists (fetch-dataset :breaking-dists))

Take a look at the column names of the retrieved dataset and you’ll notice that MongoDB added a couple, :_ns and :_id, in order to uniquely identify each row.

user> (col-names breaking-dists)
[:speed :_ns :_id :predicted :residuals :dist]

The fetch-dataset function (and the congomongo.fetch function that it’s based on) supports queries with the :where option. The following example retrieves only the rows from the :breaking-dists collection in the database where the :speed is between 10 and 20 mph, and then calculates the average braking distance of the resulting observations.

(with-data (fetch-dataset :breaking-dists
                          :where {:speed {:$gt 10 :$lt 20}})
  (mean ($ :dist)))

The syntax for Congomongo’s query-maps is nearly the same as that for the $where function, although :$in and :$nin take a Clojure vector instead of a Clojure set.
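If you want to reuse a $where query-map with fetch-dataset, one option is to convert any sets in it to vectors first. The helper below is a hypothetical sketch (sets->vectors is not part of Incanter or Congomongo) built on clojure.walk:

```clojure
(require '[clojure.walk :as walk])

(defn sets->vectors
  "Replaces every set nested in query-map with a vector of its elements."
  [query-map]
  (walk/postwalk #(if (set? %) (vec %) %) query-map))

(def incanter-query {:speed {:$in #{4 7 24 25}}})
(def congomongo-query (sets->vectors incanter-query))

;; The order of elements inside the resulting vector is not guaranteed.
(println congomongo-query)
```

The converted map could then be passed as the :where option, for example (fetch-dataset :breaking-dists :where congomongo-query).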

For more information on the available functionality in Somnium’s Congomongo, visit its Github repository or read the documentation for incanter.mongodb:

(doc incanter.mongodb)

The complete code for this post can be found here.

Setting up Clojure, Incanter, Emacs, Slime, Swank, and Paredit

Emacs is the favored development environment for the majority of Clojure developers, and there are good reasons for that, but personally, I don’t think it should be the first choice of developers new to Clojure, unless they have used it previously; it’s just too much to learn at once.

I recommend people use an editor they’re comfortable with, combined with a command-line REPL. There is no reason to tackle the complexities of configuring and using Emacs, Slime, and Swank until you’ve got your head around the basics of Clojure and functional programming. Once you’ve got the basics down though, it’s worth venturing into the arcane world of Emacs. You may decide it’s not for you, and luckily there are alternatives, from your favorite editor combined with a REPL to plugins for popular IDEs like Netbeans (Enclojure), IntelliJ (La Clojure), and Eclipse (Counter-Clockwise).

But you’ll never know if it’s for you unless you give it a try. So, I’ll be demonstrating how to build and install Incanter (which includes Clojure and Clojure-contrib), and then set up a development environment with Emacs, Slime, Swank, and Paredit.

Setting up Clojure and Incanter

Incanter is available on Clojars, so you can include it in your projects by adding it as a dependency in your project.clj file (see below). Alternatively, you can clone the full distribution from Github, which includes REPL and Swank startup scripts. Note: The repl scripts are necessary, since Leiningen’s repl task does not work correctly with Clojure 1.2.

The following examples will assume you cloned Incanter’s github repository (see below for instructions) and have the repl and swank scripts included with the distribution.

You’ll need Git and Leiningen to grab and build Incanter. First, clone Incanter from its Github repository:

$ git clone git://github.com/liebke/incanter.git

This will create an incanter subdirectory

$ cd incanter

Use lein deps to download the necessary dependencies:

$ lein deps

Once this process is complete, you can start a Clojure REPL with all of Incanter’s dependencies pre-configured on the CLASSPATH, either by using the repl script included in the script/ directory,

$ script/repl

or on Windows,

$ script/repl.bat

or you can start it directly with the java command:

$ java -cp 'lib/*' clojure.main

This will present you with the user=> prompt. As a simple example of using Incanter from the REPL, we’ll generate a line plot of the sine function over the range -4 to 4. First load the necessary Incanter libraries:

user=> (use '(incanter core charts))

and then use the function-plot function:

user=> (view (function-plot sin -4 4))

Now that we know Incanter and Clojure are installed correctly, let’s set up an Emacs development environment.

Setting up and using Emacs, Swank, Slime, and Paredit

I’m a long time vi/vim user and I typically use MacVim, but I have recently gone back to Emacs (the editor I used when I first learned Lisp) in order to take advantage of Slime, Swank, and Paredit. Doing most of my development on a Macbook, I like Aquamacs, which blends standard OS X and Emacs behaviors. Another nice option on the Mac is Carbon Emacs.

The procedure I’m going to use to set up the Emacs development environment is based on the instructions provided by Phil Hagelberg (a.k.a. Technomancy) in this blog post and in the README for his fork of swank-clojure.

The best way to install the necessary packages (clojure-mode, slime, slime-repl, swank-clojure, paredit) is by using the Emacs Lisp Package Archive, or ELPA.

To access ELPA, use the following command:

M-x package-list-packages

The meta-key on the Mac for most flavors of Emacs is the command key, but with Aquamacs it’s the alt/option key.

If the ‘package-list-packages’ command cannot be found, you’ll need to paste the following snippet of elisp in your *scratch* buffer and then evaluate it (go here for more detailed instructions).

(let ((buffer (url-retrieve-synchronously
               "http://tromey.com/elpa/package-install.el")))
  (save-excursion
    (set-buffer buffer)
    (goto-char (point-min))
    (re-search-forward "^$" nil 'move)
    (eval-region (point) (point-max))
    (kill-buffer (current-buffer))))

In Aquamacs, you’ll evaluate it by placing your cursor right after the last parenthesis and entering:

C-x C-e

On most other versions of Emacs, including Carbon Emacs, you’ll enter

C-j

Once this has been done, you should be able to access ELPA with:

M-x package-list-packages

You’ll see a list of packages. Either scroll down to find, or search for (using C-s), the following packages:

  • clojure-mode
  • slime
  • slime-repl
  • swank-clojure
  • paredit

When your cursor is on the appropriate package, hit the i key to select it. Once all the packages are selected, hit x to begin their installation. When it’s complete, you might see some warnings, but don’t worry about them.

Slime is an Emacs-mode for editing Lisp/Clojure code and Swank is a back-end service that Slime connects to, letting you evaluate Clojure code from within Emacs. Paredit provides some additional Clojure/Lisp editing functionality, although, like Emacs, it requires some getting used to (see mudphone’s introduction to Paredit presentation and the Paredit cheat sheet).

Now it’s time to start up a Swank server that will let us run Clojure code from Emacs. We can start one that is pre-configured with all of Incanter’s dependencies with the script/swank script, or by running the following Leiningen command from the Incanter directory:

$ lein swank

This will generate some messages, ending with

Connection opened on local port  4005
#<ServerSocket ServerSocket[addr=0.0.0.0/0.0.0.0,port=0,localport=4005]>

Now we need to connect to the server from Emacs with the following command:

M-x slime-connect

It will prompt you for the IP address and port of the server; just use the defaults it offers. It may then show the following prompt:

Versions differ: nil (slime) vs. 2009-09-14 (swank). Continue? (y or n)

Just answer ‘y’. You will then get a message confirming you’re connected, and a window will open with a Clojure REPL and a ‘user>’ prompt. A cool feature of slime-connect is that you can connect to a swank server on a remote system: just provide the system’s IP address or host name, instead of the default 127.0.0.1, when prompted.

Now open or create a Clojure file, using ‘C-x C-f’ (or using ‘command-o’ or ‘command-n’ in Aquamacs). If you’re creating a new file, give it a *.clj suffix and Emacs will start clojure-mode automatically.

Now start up Paredit,

M-x paredit-mode

You’re now ready to edit Clojure code. Start by loading a few Incanter libraries with the following line:

(use '(incanter core stats charts))

You’ll notice that closing parens are automatically created when you create an opening paren; this is due to Paredit. You can evaluate this block of code by placing your cursor right after the last paren, and entering ‘C-x C-e’. You should see the return value, nil, in the Emacs message pane.

Now let’s generate a plot of the PDF of the Normal distribution, over the range -3 to 3, by entering and evaluating the following line:

(view (function-plot pdf-normal -3 3))

That’s it, you’re all set up. Have fun!


Funding Open Source Projects

Rich Hickey, the creator of the Clojure language, made an interesting request yesterday: he needs help funding Clojure’s development. For the last few years, he has essentially been the sole financial backer for the project, and now there is a need for additional financial support, so he can continue developing Clojure full-time.

The request spurred an outpouring of support for Rich and Clojure, raising more than half of the target amount within a day, with more than 187 individuals and five companies providing support so far.

The Clojure language is one of the primary reasons Incanter has been such a joy to develop and use, and I have been a past and present financial contributor to Clojure, as well as to the other projects that provide the foundation that I built Incanter on, like Parallel Colt and JFreeChart. I hope you’ll join me in helping fund great open-source projects like these.

To help fund Clojure, visit its funding page. To help fund Piotr Wendykier’s Parallel Colt project, visit his donation page, and to help fund JFreeChart, purchase their developer’s guide.

Many other projects lie at the core of Incanter, and although they don’t all require funding, you can help in other ways by contributing your talent. To learn more about contributing to Processing, visit its contribute page, and to learn more about helping out with Incanter itself, visit its Google group.

David