Curve fitting on the web


We once collected the following data from a certain chemical process:

[Figure: optical density measurements from the chemical process]

The curve looks smooth and could be governed by some meaningful physical law. But what would be a good model? There are probably quite a few physical laws that would fit the observed data very well. Wouldn't it be nice if a piece of software examined a large number of known physical laws and checked each of them against this data? ZunZun.com is such a piece of software, and it runs directly in the browser. After I plugged in my data, ZunZun gave me a ranked list of functions that fit it, and the top-ranked one was the "Gaussian peak with offset" (y = a * exp(-0.5 * (x-b)^2 / c^2) + d):

[Figure: ZunZun plot of the dependent vs. independent data with the fitted Gaussian peak]

Number two was "Sigmoid with offset" (y = a / (1.0 + exp(-(x-b)/c)) + d).
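The idea behind ZunZun's ranked list can be imitated offline for a couple of candidate models. Below is a minimal sketch, assuming SciPy's `curve_fit` as the least-squares fitter and ranking by sum of squared errors (the post doesn't say which criterion ZunZun itself uses). The data are synthetic, generated from a Gaussian peak, and are not the measurements plotted above; `fit_and_score` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_peak_with_offset(x, a, b, c, d):
    # y = a * exp(-0.5 * (x - b)^2 / c^2) + d
    return a * np.exp(-0.5 * (x - b) ** 2 / c ** 2) + d


def sigmoid_with_offset(x, a, b, c, d):
    # y = a / (1 + exp(-(x - b) / c)) + d
    return a / (1.0 + np.exp(-(x - b) / c)) + d


def fit_and_score(model, x, y, p0):
    """Least-squares fit; returns the parameters and the sum of squared errors."""
    params, _ = curve_fit(model, x, y, p0=p0, maxfev=10000)
    sse = float(np.sum((y - model(x, *params)) ** 2))
    return params, sse


# Synthetic stand-in data (NOT the post's measurements): a noisy Gaussian peak.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = gaussian_peak_with_offset(x, 2.0, 6.0, 1.5, 0.3) + rng.normal(0.0, 0.02, x.size)

candidates = [
    ("Gaussian peak with offset", gaussian_peak_with_offset,
     [y.max() - y.min(), x[np.argmax(y)], 1.0, y.min()]),
    ("Sigmoid with offset", sigmoid_with_offset,
     [y.max() - y.min(), x.mean(), 1.0, y.min()]),
]

ranking = []
for name, model, p0 in candidates:
    try:
        params, sse = fit_and_score(model, x, y, p0)
    except RuntimeError:  # curve_fit raises this when it fails to converge
        params, sse = None, float("inf")
    if not np.isfinite(sse):  # keep the sort well-defined
        sse = float("inf")
    ranking.append((sse, name, params))

ranking.sort(key=lambda item: item[0])  # smallest SSE first
for sse, name, params in ranking:
    print(f"{name}: SSE = {sse:.4g}, params = {params}")
```

Both candidates here have four parameters, so a raw SSE ranking is fair; once candidates differ in parameter count, SSE alone always favors the more flexible model, which is exactly the concern raised in the comments.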

In all, ZunZun may help you find a good nonlinear model when all you have is data.

8 Comments

James R. Phillips: I'm glad you found my hobby web site useful. I'm currently working on adding fit statistics for the fitted parameter estimates. Do you have any suggestions or requests for the site?

James

Pascal PERNOT: Nice fit, but from a chemical point of view this will not be very helpful, since the fitting function is completely unrelated to chemical laws...

bob: Do they return AIC, BIC, DIC, or CV error to help decide which models fit well and which are overfitting? Is an N-th order polynomial (where N = number of points) an option?
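For least-squares fits with Gaussian errors, AIC and BIC can be computed from the residual sum of squares alone. A minimal sketch, assuming NumPy polynomial fits and synthetic data (a noisy quadratic with only 12 points); `information_criteria` is a hypothetical helper, not part of ZunZun or SciPy, and constants shared by all models are dropped.

```python
import numpy as np


def information_criteria(y, y_hat, k):
    """AIC and BIC for a least-squares fit with Gaussian errors.

    Additive constants that are identical for every model on the same data
    are dropped, so only differences between models are meaningful.
    k is the number of fitted parameters.
    """
    n = y.size
    sse = float(np.sum((y - y_hat) ** 2))
    log_term = n * np.log(sse / n)
    return log_term + 2 * k, log_term + k * np.log(n)


# Synthetic data (not from the post): a quadratic plus noise, 12 points.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 12)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(0.0, 0.1, x.size)

scores = {}
for degree in range(1, 8):  # stop well short of interpolation (degree 11)
    coeffs = np.polyfit(x, y, degree)
    aic, bic = information_criteria(y, np.polyval(coeffs, x), degree + 1)
    scores[degree] = (aic, bic)
    print(f"degree {degree}: AIC = {aic:7.2f}  BIC = {bic:7.2f}")

best_aic = min(scores, key=lambda d: scores[d][0])
best_bic = min(scores, key=lambda d: scores[d][1])
print("lowest AIC at degree", best_aic, "| lowest BIC at degree", best_bic)
```

Lower is better for both. A polynomial that passes through every point leaves SSE = 0, so n * ln(SSE/n) diverges to minus infinity: these criteria cannot rescue an interpolating fit, which is why the loop stops at degree 7.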

Aleks: James, excellent job! The single best suggestion is actually Bob's: AIC (fast) or cross-validation (slower) would be a very useful addition, as I've had problems when complex polynomials were fitted to small data sets. AIC penalizes the number of parameters explicitly, and CV penalizes overfitting implicitly through the error on held-out data, so a more complex model needs more data to justify its complexity. If you do cross-validation, I recommend repeating it over several random splits to make the estimate more stable.
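Repeated k-fold cross-validation, as suggested above, fits in a few lines. This is an illustration under stated assumptions: polynomial models, 4 folds, 20 random repetitions, and synthetic quadratic data; `repeated_kfold_mse` is a hypothetical helper.

```python
import numpy as np


def repeated_kfold_mse(x, y, degree, k=4, repeats=20, seed=0):
    """Mean held-out squared error of a polynomial fit, averaged over
    `repeats` random k-fold partitions (repetition reduces the variance
    that comes from any single random split)."""
    rng = np.random.default_rng(seed)
    n = x.size
    fold_errors = []
    for _ in range(repeats):
        order = rng.permutation(n)
        for test_idx in np.array_split(order, k):
            train_mask = np.ones(n, dtype=bool)
            train_mask[test_idx] = False
            coeffs = np.polyfit(x[train_mask], y[train_mask], degree)
            residual = y[test_idx] - np.polyval(coeffs, x[test_idx])
            fold_errors.append(np.mean(residual ** 2))
    return float(np.mean(fold_errors))


# Synthetic data (not from the post): a noisy quadratic, 12 points.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 12)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + rng.normal(0.0, 0.1, x.size)

cv_scores = {d: repeated_kfold_mse(x, y, d) for d in range(1, 7)}
for d, mse in cv_scores.items():
    print(f"degree {d}: repeated 4-fold CV MSE = {mse:.4f}")
```

The degree with the smallest held-out error is the one to prefer; unlike the training SSE, this error rises again once the polynomial starts fitting noise.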

Pascal, well, fitting a simple function might help us identify the underlying laws more easily than a mass of noisy data points would. But I agree that a functional dependency is not yet a law.

OneEyedMan: Do you consider it a problem that with a big enough library of functions, you can describe anything, essentially overfitting your data?

If I recall correctly, fully describing a function without knowing it explicitly requires estimating infinitely many derivatives, just as doing so for a probability distribution requires infinitely many moments.

Is this just another moment in model making where you have to manage the reasonableness of the complexity of the model against the quality of the fit?
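The worry can be seen with the most flexible "library" of all: a polynomial with as many coefficients as there are points. A minimal sketch, assuming synthetic data sampled from sin(x) plus noise (`underlying_curve` is a hypothetical stand-in for the unknown process). The degree-7 polynomial reproduces all 8 training points essentially exactly, yet nothing forces it to track the underlying curve between them.

```python
import numpy as np


def underlying_curve(x):
    # Stand-in for the unknown process (an assumption for illustration).
    return np.sin(x)


rng = np.random.default_rng(2)
x_train = np.linspace(0.0, 3.0, 8)
y_train = underlying_curve(x_train) + rng.normal(0.0, 0.05, x_train.size)
x_new = np.linspace(0.0, 3.0, 200)  # dense grid for checking between points

results = {}
for degree in (3, 7):  # degree 7 = 8 coefficients for 8 points: exact interpolation
    coeffs = np.polyfit(x_train, y_train, degree)
    train_sse = float(np.sum((y_train - np.polyval(coeffs, x_train)) ** 2))
    curve_mse = float(np.mean((underlying_curve(x_new) - np.polyval(coeffs, x_new)) ** 2))
    results[degree] = (train_sse, curve_mse)
    print(f"degree {degree}: training SSE = {train_sse:.2e}, "
          f"error vs. true curve = {curve_mse:.2e}")
```

A perfect training fit says nothing about whether the model is right, which is why the penalized or held-out criteria discussed above are needed.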

Aleks: OneEyedMan, a Bayesian would assign priors to different functions. Some functions would be a priori more likely than others. AIC, for example, is analogous to a prior in the sense that it favors models with fewer parameters.
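The analogy can be made explicit with a line of algebra. This is a loose correspondence rather than a derivation of AIC: it treats the maximized likelihood L̂ as if it were the evidence p(D | M) and reads the parameter penalty as a prior over models with k parameters.

```latex
\mathrm{AIC}_M = 2k_M - 2\ln\hat{L}_M
\;\Longrightarrow\;
\exp\!\left(-\tfrac{1}{2}\mathrm{AIC}_M\right) = \hat{L}_M \, e^{-k_M}
\qquad\text{compare}\qquad
p(M \mid D) \propto p(D \mid M)\, p(M)
```

Ranking models by smallest AIC is therefore the same as ranking them by L̂ * e^(-k), which has the form "likelihood times prior" with p(M) proportional to e^(-k): each extra parameter divides a model's a priori weight by e.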

Sometimes, however, you need 'metadata': knowledge about the nature of your process beyond the measurements themselves.

James R. Phillips: Currently adding the fit statistics from http://scipy.org/Cookbook/OLS, which include AIC.

As soon as I finish the covariance matrix for nonlinear functions I'll add this work to the site (http://zunzun.com) and the BSD-licensed source code download (http://sf.net/projects/pythonequations).
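For a nonlinear least-squares fit, the usual approach (and the one SciPy's `curve_fit` takes with its default `absolute_sigma=False`) is to linearize the model at the optimum: Cov ≈ s^2 (J^T J)^-1, where J is the Jacobian of the model with respect to the parameters and s^2 = SSE / (n - p). A minimal sketch for the Gaussian-peak model, assuming synthetic data and an analytic Jacobian; `gaussian_jacobian` and `nonlinear_covariance` are hypothetical helpers, not ZunZun's or pythonequations' code.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_peak_with_offset(x, a, b, c, d):
    return a * np.exp(-0.5 * (x - b) ** 2 / c ** 2) + d


def gaussian_jacobian(x, a, b, c, d):
    """Analytic partial derivatives of the model w.r.t. (a, b, c, d)."""
    g = np.exp(-0.5 * (x - b) ** 2 / c ** 2)
    return np.column_stack([
        g,                              # d/da
        a * g * (x - b) / c ** 2,       # d/db
        a * g * (x - b) ** 2 / c ** 3,  # d/dc
        np.ones_like(x),                # d/dd
    ])


def nonlinear_covariance(x, y, params):
    """Linearized covariance s^2 (J^T J)^-1 at the least-squares optimum,
    with s^2 = SSE / (n - p) estimated from the residuals."""
    residuals = y - gaussian_peak_with_offset(x, *params)
    n, p = x.size, len(params)
    s_squared = np.sum(residuals ** 2) / (n - p)
    J = gaussian_jacobian(x, *params)
    return s_squared * np.linalg.inv(J.T @ J)


# Synthetic data (not from the post).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = gaussian_peak_with_offset(x, 2.0, 6.0, 1.5, 0.3) + rng.normal(0.0, 0.02, x.size)

p0 = [y.max() - y.min(), x[np.argmax(y)], 1.0, y.min()]
params, pcov_scipy = curve_fit(gaussian_peak_with_offset, x, y, p0=p0)
pcov_manual = nonlinear_covariance(x, y, params)
std_errors = np.sqrt(np.diag(pcov_manual))
for name, value, se in zip("abcd", params, std_errors):
    print(f"{name} = {value:.4f} +/- {se:.4f}")
```

The square roots of the diagonal are the standard errors of a, b, c and d; they are only as trustworthy as the linear approximation of the model near the optimum.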

I expect to be done this week.

James

James Phillips: Wrapping up fit statistics now.

Recent Comments

  • James Phillips: Wrapping up fit statistics now.
  • James R. Phillips: Currently adding the fit statistics from http://scipy.org/Cookbook/OLS which include AIC.
  • Aleks: OneEyedMan, a Bayesian would assign priors to different functions. Some...
  • OneEyedMan: Do you consider it a problem that with a big...
  • Aleks: James, excellent job! The single best suggestion is actually by...
  • bob: do they return AIC, BIC, DIC, or CV error to...
  • Pascal PERNOT: Nice fit, but from a chemical point of view, this...
  • James R. Phillips: I'm glad you found my hobby web site useful. I'm...