This example will demonstrate how to fit a non-linear least-squares model in Incanter using the non-linear-model function.
This example will use a data set from NIST, which are the result of a NIST study involving ultrasonic calibration. The response variable is ultrasonic response, and the predictor variable is metal distance.
You will need the incanter.core, incanter.optimize, incanter.charts, and incanter.datasets libraries.
Load the necessary Incanter libraries.
(use '(incanter core optimize charts datasets))
For more information on using these packages, see the matrices, datasets, and sample plots pages on the Incanter wiki.
Now load the NIST data set called Chwirut, (you can find more information on this data set at http://www.itl.nist.gov/div898/strd/nls/data/chwirut1.shtml)
(def data (to-matrix (get-dataset :chwirut))) (def x (sel data :cols 1)) (def y (sel data :cols 0))
Now view the data
(def plot (scatter-plot x y :legend true)) (view plot)
Clearly, x and y do not have a linear relationship, so we will try the exponential model used by NIST,
y = exp(-b1*x)/(b2+b3*x) + e
Here’s a Clojure function that implements this model
(defn f [theta x] (let [[b1 b2 b3] theta] (div (exp (mult (minus b1) x)) (plus b2 (mult b3 x)))))
This function will be passed to the non-linear-model function, and will need to take a vector of parameters, theta, and a value (or vector of values) of x as arguments, and return one or more values of y.
The non-linear-model function uses either the Gauss-Newton (default) or Newton Raphson (available using the :method option) algorithms to estimate the values of theta that minimizes the residual sum of squares. Both algorithms require starting values for the parameters. Finding these values is a bit of a black art, and may require plotting different values. In this case we will start by plotting some start values, provided by NIST, on the scatter plot of the data.
(def start1 [0.1 0.01 0.02]) (add-lines plot x (f start1 x))
These values, provide a generally correct shape for the model function, so we will use them as the start values in the non-linear-model function, using the default Gauss-Newton algorithm.
(def nlm1 (non-linear-model f y x start1))
And then plot the fitted values on the scatter-plot
(add-lines plot x (:fitted nlm1))
The fit looks pretty good. Interestingly, these start values failed to converge to the correct solution when the optional Newton-Raphson algorithm was used. The Newton-Raphson algorithm performed correctly with a second set of start values that were closer to the final estimates.
The complete code for this example can be found here.
- Newton’s method (Wikipedia)
- Gauss-Newton algorithm (Wikipedia)
this is all pretty sweet… but are you benchmarking/validating your processes against established stats packages? i do assume that you’re checking against R but what about the other elephants in the room (i.e., SAS, SPSS)?
Hi Matthew, I am indeed validating against R, which at this point makes sense since most of Incanter’s functions map, more or less, directly to R functions and not really to SAS or SPSS functions (yet). As I start developing higher level functions (e.g. EFA/CFA/SEM) I’ll start using other packages for validation (e.g. LISREL).
Another method for validation I use is to compare my results with certified results. NIST has data sets for which they have computed and certified estimates of model parameters and their associated statistics. So far I have validated Incanter’s results on linear and non-linear regression against these benchmarks.
The data sets and estimated parameters and statistics can be found here:
While it is nice to have the pretty picture that we see in the graph, this page is about fitting data. For the discussion to be complete I think one last command should be included. To obtain the parameters that fit the data we use the command:
First of all I’m really fascinated by the possibility to do statistical work in Clojure/Lisp.
As I understand – maybe I am mistaken -, there is currently no function in Incanter for NLP optimiziation with (possibly) non-linear constraints. Is addition of such an algorithm planned ?
Wait a second, the code doesn’t work when the x’s are vectors of values. Is this a dated feature? Or is there some bug in the system?