Nonlinear Least Squares Curve Fitting

Summary

There is no general analytic solution for the minimization of chi-squared when the model is nonlinear, so the best estimates of the parameters must be found numerically. A number of methods have been developed that can often find the best estimates and their uncertainties to arbitrary precision. Iterative methods are usually used to minimize a nonlinear chi-squared. Another brute-force method, called the grid method, tests every set of parameters in a region of parameter space at a chosen resolution. This becomes impractical when the parameter space or the number of parameters is large. Commercial curve fitting software usually implements an iterative method called the Levenberg-Marquardt method. Nonlinear curve fitting software can also be applied to linear models, but there is no guarantee it will reproduce the analytic solution. In this lesson, we discuss how to implement the grid method for simple problems. We give an example of Newton’s method as a basic iterative method. We explore the gradient descent method, the Gauss-Newton method, and the Levenberg-Marquardt method both theoretically and in MATLAB. Users of curve fitting software should understand the underlying algorithms and at least be able to check whether the software is performing the calculation correctly.

The minimization of chi-squared for nonlinear models

We minimize the \chi^2 error function in least squares theory.

    \begin{align*} \chi^2 = \sum_{i=1}^N \cfrac{ (y_i - f(x_i; \mathbf{a}))^2 }{\sigma_i^2} \end{align*}

We showed in a previous lesson that if f is a linear model, that is,

    \begin{align*} f(x_i; \mathbf{a}) = a_1 X_1(x_i) + a_2 X_2 (x_i) + \dots + a_M X_M(x_i) \end{align*}

then \chi^2 has a unique global minimum and we can analytically derive expressions for the best estimates of the model parameters and their uncertainties, \mathbf{a} and {\sigma}_{a_j}.

When f is not of the above form then it is nonlinear. An example is

    \[f(x_i;\mathbf{a}) = a_1 \sin(a_2 x_i) + a_3 e^{a_4 x_i}\]

When you take the derivative of \chi^2 for a nonlinear model, you get nonlinear equations that in general have no analytical solution in terms of linear algebra. One then has to resort to iterative methods that require an initial guess for \mathbf{a} and an update rule that moves the parameters towards a minimum of \chi^2. One repeats the update until the algorithm converges to some minimum of \chi^2. This may be a local minimum rather than the actual global minimum. There is no general test to show that a local minimum is actually the global minimum. It is always a good idea to start with different initial guesses and check that the same minimum is reached. This is no guarantee, but it is usually accepted in practice.
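
As a quick illustration of the multiple-initial-guess advice, here is a minimal sketch using MATLAB’s built-in fminsearch. The chi2fun handle and the data vectors xi, yi, sigi (like the critical damping data generated later in this lesson) are assumptions for illustration, not part of the methods developed below.

%Sketch: restart a general-purpose minimizer from several initial guesses
%and keep the lowest chi-squared found. Assumes xi, yi, sigi are in the workspace.
chi2fun = @(a) sum( (yi - (a(1)+a(2).*xi).*exp(-a(3).*xi)).^2 ./ sigi.^2 );
best = inf;
for trial = 1:10
a0 = randn(1,3); %random initial guess
[afit, chi2min] = fminsearch(chi2fun, a0);
if chi2min < best
best = chi2min;
abest = afit; %keep the parameters with the lowest chi-squared so far
end
end
abest
best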

We want some nonlinear data to work on in this lesson, so we take a simple nonlinear model: a mass oscillating on a spring with damping.

    \[m \ddot{y} + 2b \dot{y} + ky = 0\]

For a certain choice of constants, b^2 - km = 0, the motion is called critically damped and obeys the following solution.

    \[y = (A + Bt)\exp(-Ct)\]

We can add some random noise to that function and take the average of five samples to generate our data set. This way we know how closely the different approaches recover the actual parameters. Now we generate a dataset in MATLAB and save it as a .mat file.

Example Generate some sample data that represent critical damping.
We take five simulated measurements and average them.

clear;figure(1);clf;
A = 2;
B = -5;
C = 1;
xi = 0:0.15:5;
Ni = 5;
ymodel = (A+B.*xi).*exp(-C.*xi);
yi = [];
sigi = [];

for i = 1:length(xi)
temp = [];
for j = 1:Ni
temp(j) = (A + B.*xi(i)).*exp(-C.*xi(i)) + normrnd(0,0.5);
end
yi(i) = mean(temp);
sigi(i) = std(temp)/sqrt(Ni);
end

subplot(2,1,1);
plot(xi,ymodel,'.-k','markersize',20);
axis([0 5 -2 3]);
hold on;
xlabel('Time [s]','fontsize',20);
ylabel('Position [m]','fontsize',20);
title('True values for critical damping','fontsize',20);
subplot(2,1,2)
errorbar(xi,yi,sigi,'.k','markersize',15);
xlabel('Time [s]','fontsize',20);
ylabel('Position [m]','fontsize',20);
title('Simulated values for critical damping','fontsize',20);
axis([0 5 -2 3]);
hold on;
plot(xi,ymodel,'-k','markersize',20)
save criticaldampingdata xi yi sigi

Brute force grid method

If you have the computational power, there is a brute force approach to curve fitting called the grid method. Suppose from the graph of some linear data you can tell that A is between 3 and 4 and B is between 0.5 and 1.5. Then you can calculate \chi^2 for every pair of values in those ranges at a resolution of 0.01 or 0.001 and find the smallest one by direct search in MATLAB. That might take 10,000 or 100,000 calculations, but you will get essentially the same answer as the analytic result. You can do a coarse grid to find local minima and then a fine grid to locate the global minimum. One advantage of the grid method is that you don’t need to know much theory and it is obvious how it works. The drawback is that if you cannot narrow the parameter space to a region containing the global minimum, the search may require too many calculations to be practical.

We will now study some iterative methods that can be used to minimize chi-squared. The grid method is not really an iterative method like the rest. You cast a net over the parameter surface, see which loop of the net produces the minimum, then cast a finer net over that loop and rapidly converge towards a minimum. This method is good because you get out of it what you expect. When you have many parameters, however, it may not be possible to cast a fine enough net because there are too many combinations.

Example. Implement the grid method in MATLAB to fit a straight line. Compare the results to the analytic least squares solution.
We can implement the grid method to fit a straight line model; it works just the same with nonlinear models (see the sketch after the code below).

%Grid Method

clear; clf;

A = 10;
B = 3;
Ni = 15;
xi = 1:2:15

for i = 1:length(xi)
tx = [];
for j = 1:Ni
tx(j) = A + B*xi(i) + normrnd(0,3);
end

ei(i) = std(tx)/sqrt(Ni);
yi(i) = sum(tx)/Ni;
end

errorbar(xi,yi,ei,'.k', 'markersize',10);
xlabel('trial')
ylabel('measurement')
hold on;
[A sigA B sigB] = lineABx(xi,yi,ei)

A
sigA
B
sigB

a = 9.5:0.0001:10.5;
b = 2.9:0.0001:3.1;

for j = 1:length(a)
for k = 1:length(b)
chisquared(j,k) = sum( 1./ei.^2.*(yi-b(k).*xi - a(j)).^2 );
end
end

g = min(min(chisquared));
[I J] = find(chisquared == g);
AA = a(I);
BB = b(J);

A
AA
B
BB
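
The same brute-force idea carries over directly to a nonlinear model. Here is a hedged sketch of a coarse grid search over the three critical damping parameters; the grid ranges and step sizes are assumptions, and the data vectors xi, yi, sigi are assumed to be in the workspace.

%Grid method for the nonlinear critical damping model y = (A + B*t)*exp(-C*t)
%Assumes xi, yi, sigi are in the workspace, e.g. load criticaldampingdata
a = 1.5:0.05:2.5; %trial values for A
b = -6:0.05:-4; %trial values for B
c = 0.5:0.05:1.5; %trial values for C
chisq = zeros(length(a),length(b),length(c));
for j = 1:length(a)
for k = 1:length(b)
for l = 1:length(c)
f = (a(j) + b(k).*xi).*exp(-c(l).*xi);
chisq(j,k,l) = sum( (yi - f).^2./sigi.^2 );
end
end
end
[g, idx] = min(chisq(:)); %smallest chi-squared on the grid
[J, K, L] = ind2sub(size(chisq), idx); %convert linear index to subscripts
bestA = a(J)
bestB = b(K)
bestC = c(L)
%A finer grid centered on (bestA,bestB,bestC) would refine the estimates.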

Important matrix math

We can represent \chi^2 in terms of vectors. Let

    \begin{align*} \chi^2 = \sum_{i=1}^N \cfrac{ (y_i - f(x_i;\mathbf{a}))^2}{\sigma_i^2} \end{align*}

We define the residual vector, whose components are

    \begin{align*} r_i = y_i - f_i \end{align*}

We define the weight matrix which is diagonal.

    \begin{align*} W_{ij} = \cfrac{1}{{\sigma}_i^2}\delta_{ij} \end{align*}

So we can write \chi^2 as

    \begin{align*} \chi^2(\mathbf{a}) = \mathbf{r^TWr} \end{align*}

We go back and forth between column and row vectors using the transpose property. Some properties of transpose are

    \begin{align*} \mathbf{(A^T)^T} &= \mathbf{A}\\ \mathbf{A^T=A} &\to \mathbf{A} \quad symmetric\\ \mathbf{(A+ B)}^T &= \mathbf{A}^T + \mathbf{B}^T \\ \mathbf{(AB)^T} &= \mathbf{B^TA^T} \end{align*}

When you multiply an NxM matrix with an MxL matrix you get an NxL matrix. You can use the size(A) command in MATLAB to see how your matrices are arranged. You can use the transpose operator A' to convert a column vector to a row vector and vice versa. If you haven’t kept track of your transposes, you may have to go back and check your expression.
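
As a quick sanity check on the matrix notation, the following sketch uses small made-up vectors (yi, fi, and sigi here are assumptions, not data from this lesson) to confirm that r'*W*r reproduces the summed form of chi-squared.

%Check that the matrix form of chi-squared matches the summed form
yi = [1.2 2.1 2.9]; %hypothetical measurements
fi = [1.0 2.0 3.0]; %hypothetical model values
sigi = [0.1 0.2 0.1]; %hypothetical uncertainties
r = (yi - fi)'; %residual column vector
W = diag(1./sigi.^2); %diagonal weight matrix
chi2sum = sum( (yi - fi).^2./sigi.^2 )
chi2matrix = r'*W*r %should equal chi2sum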

We will be taking partial derivatives of the model and of the residuals with respect to the parameters, which defines the Jacobian matrix.

    \begin{align*} \cfrac{\partial \mathbf{f}}{\partial \mathbf{a}}& = \mathbf{J} \\ \cfrac{\partial \mathbf{r}}{\partial \mathbf{a}}& = - \mathbf{J} \end{align*}

We also have the Hessian matrix which is defined as

    \begin{align*} H_{jk} = \cfrac{\partial^2 \chi^2}{\partial a_j \partial a_k} \end{align*}

Here is an important result from Matrix calculus for a symmetric matrix \mathbf{W}.

    \begin{align*} \cfrac{\partial (\mathbf{h^TWh})}{\partial \mathbf{x}}= 2\mathbf{h^TW}\cfrac{\partial \mathbf{h}}{\partial \mathbf{x}} \end{align*}

Newton’s method

We will explore Newton’s method because it is a simple example of an iterative method which we will be using to minimize \chi^2. Consider the following diagram.


Illustration of Newton’s method

We have

    \[f'(x_n) \approx \cfrac{f(x_n)}{x_n - x_{n+1}}\]

We can rearrange the expression as

    \begin{align*} x_{n+1} &= x_n - \cfrac{f(x_n)}{f'(x_n)} \end{align*}

So if we know x_1 the initial guess, we can find x_2. Then we can plug back in x_2 in the next iteration to find x_3 and so on.
We get closer and closer to the point where f(x^*) = 0 with each iteration.

Example Find the square root of two to high precision.
We can try out Newton’s method to calculate the square root of 2. If we have the equation,

    \begin{align*} x^2 = 2 \end{align*}

Then we can write f(x) = x^2 -2. This function goes to zero at \sqrt{2} so we can apply Newton’s method.

    \begin{align*} x_{n+1} = x_n - \cfrac{x_n^2-2}{2x_n} \end{align*}

clear; format long;
N = 10;
x(1) = 2;
for i = 1:N
x(i+1) = x(i) - (x(i)^2 -2)/(2*x(i));
end

If we look at the contents of x we find,

n   x_n
1   2
2   1.500000000000000
3   1.416666666666667
4   1.414215686274510
5   1.414213562374690
6   1.414213562373095
7   1.414213562373095

The update just produces fractions (rational numbers) if you choose a rational initial guess like 2. In the old days before calculators, one could compute the square root of a number to sufficient accuracy by finding such a fraction and then doing the long division in the last step. It is quite a good method: 15 digits are correct after the 6th iteration.

There are a number of things that can go wrong with Newton’s method. The successive updates may oscillate, or they may converge to a different root than the one you are looking for. The best way to find a root is to plot f(x) near the root and choose your initial guess so that the iterates descend towards the root without hitting any bumps or starting too far away. Also, your function may have a restricted domain or singularities where division by zero occurs; these will obviously crash Newton’s method.
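
A minimal sketch of the oscillation problem, using the cubic f(x) = x^3 - 2x + 2 as an assumed example: starting from x_1 = 0 the iterates bounce between 0 and 1 and never approach the real root near -1.77.

%Newton's method stuck in a cycle for f(x) = x^3 - 2x + 2
clear;
x(1) = 0; %this initial guess triggers a two-cycle
for i = 1:10
x(i+1) = x(i) - (x(i)^3 - 2*x(i) + 2)/(3*x(i)^2 - 2);
end
x %alternates 0, 1, 0, 1, ... instead of approaching the real root near -1.77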

Gauss-Newton method

The Gauss-Newton algorithm is another iterative curve fitting method. As before, the residuals are

    \begin{align*} r_i = y_i - f(x_i;\mathbf{a}) \end{align*}

One can arrive at the update formula by considering a Taylor expansion of f(x_i;\mathbf{a}) with respect to an update \bm{\delta}.

    \begin{align*} f(x_i; \mathbf{a} + \bm{\delta}) & \approx f(x_i;\mathbf{a}) + \cfrac{\partial f(x_i; \mathbf{a})}{\partial \mathbf{a}} \bm{\delta} \\ \mathbf{f}(\mathbf{a} + \bm{\delta}) & \approx \mathbf{f} + \mathbf{J}\bm{\delta} \end{align*}

At the minimum of \chi^2 with respect to the update,

    \begin{align*} \cfrac{\partial \chi^2}{\partial \bm{\delta}}=0 \end{align*}

Substituting the Taylor expansion into \chi^2 gives

    \begin{align*} \chi^2(\mathbf{a} + \bm{\delta}) & \approx (\mathbf{y} - \mathbf{f}(x_i;\mathbf{a}) - \mathbf{J} \bm{\delta})^T\mathbf{W}(\mathbf{y} - \mathbf{f}(x_i;\mathbf{a}) - \mathbf{J} \bm{\delta})\\ \chi^2(\mathbf{a} + \bm{\delta}) &\approx (\mathbf{r} - \mathbf{J}\bm{\delta})^T\mathbf{W}(\mathbf{r} - \mathbf{J}\bm{\delta}) \end{align*}

We want to take the derivative of \chi^2 with respect to \bm{\delta} and set it equal to zero. Remember the important derivative.

    \begin{align*} 0 &= ( \mathbf{r - J\bm{\delta}})^T \mathbf{W} (\mathbf{-J}) \end{align*}

We can take the transpose of that expression to get

    \begin{align*} 0 & = \mathbf{J}^T\mathbf{W}( \mathbf{r-J\bm{\delta}}) \end{align*}

Remember that \mathbf{W} is symmetric. Solving for \bm{\delta} gives the Gauss Newton update rule.

    \begin{align*} \bm{ \delta} &= (\mathbf{J}^T\mathbf{W}\mathbf{J})^{-1}\mathbf{J}^T \mathbf{W}\mathbf{r}\\ \mathbf{a}_{s+1} &= \mathbf{a}_s + \bm{\delta} \end{align*}

To implement the Gauss-Newton method we only need to code for the update rule. Gauss-Newton seems to be more picky about the initial guess than other methods.

Example. Implement the Gauss-Newton method to fit the simulated critical damping data.
The code is nearly identical to the gradient descent code in the next section; essentially only the initial guess and the line that computes the update \bm{\delta} change.

clear;clf;figure(1);
load criticaldampingdata

errorbar(xi, yi, sigi,'.k');
hold on;
xlabel('time [s]', 'fontsize',20);
ylabel('distance [m]', 'fontsize',20);

N = length(xi);
W = zeros(N,N);
for i = 1:N
W(i,i) = 1/sigi(i)^2;
end
%some initial random guess
A(1) = 1.4
B(1) = -4.3
C(1) = 1.3

S = 10000; %Number of steps

for s = 1:S
for i = 1:N
J(i,1) = exp(-C(s)*xi(i));
J(i,2) = xi(i)*exp(-C(s)*xi(i));
J(i,3) = (A(s) + B(s)*xi(i) )*(-xi(i))*exp(-C(s)*xi(i));
ri(i) = (yi(i) - (A(s)+ B(s)*xi(i))*exp(-C(s)*xi(i)));
end
% we really only have to change one line of code for delta
delta = inv(J'*W*J)*J'*W*ri';
A(s+1)=A(s)+delta(1);
B(s+1)=B(s)+delta(2);
C(s+1)=C(s)+delta(3);
end;

ymodel = (A(end)+B(end).*xi).*exp(-C(end).*xi);
plot(xi, ymodel,'r');
title('Gauss Newton fit','fontsize',20);


Gauss-Newton works

Gradient descent method

Imagine \chi^2 as a multidimensional hyper-surface over the space of possible fit parameters, \mathbf{a} = (a_1, a_2,\ldots,a_M). A choice of \mathbf{a} corresponds to a location on the hypersurface. We use the result from vector calculus that -\nabla \chi^2 points in the direction of steepest descent, towards lower \chi^2. We can step in that direction and repeat the process. We take an arbitrary step size of \gamma = 0.00001. The gradient of chi-squared is calculated as

    \begin{align*} \nabla \chi^2 = \cfrac{\partial (\mathbf{r^TWr})}{\partial \mathbf{a}}=2\mathbf{r}^T \mathbf{W}\cfrac{\partial \mathbf{r}}{\partial \mathbf{a}} = -2 \mathbf{r}^T \mathbf{W} \mathbf{J} \end{align*}

The gradient points up the slope of the function, so our update rule should step in the opposite direction. The size of the step is set by \gamma, which we choose arbitrarily. We can make it 10^{-5} and see if the calculation converges.

    \begin{align*} \mathbf{a}_{s+1}= \mathbf{a}_s + 2\gamma\, \mathbf{r^TWJ} \end{align*}

Example Implement gradient descent with the model data described earlier in this chapter.
We load the data and do the calculation

%Implementing gradient descent
clear;clf;figure(1);
load criticaldampingdata

errorbar(xi, yi, sigi,'.k');
hold on;
xlabel('time [s]', 'fontsize',20);
ylabel('distance [m]', 'fontsize',20);

N = length(xi);
W = zeros(N,N);
for i = 1:N
W(i,i) = 1/sigi(i)^2;
end
%some initial random guess
A(1) = .2
B(1) = -.2
C(1) = .3

gamma = 0.00001; %Arbitrary step size
S = 10000; %Number of steps

for s = 1:S
for i = 1:N
J(i,1) = exp(-C(s)*xi(i));
J(i,2) = xi(i)*exp(-C(s)*xi(i));
J(i,3) = (A(s) + B(s)*xi(i) )*(-xi(i))*exp(-C(s)*xi(i));
ri(i) = (yi(i) - (A(s)+ B(s)*xi(i))*exp(-C(s)*xi(i)));
end
delta = 2*gamma*ri*W*J;
A(s+1)=A(s)+delta(1);
B(s+1)=B(s)+delta(2);
C(s+1)=C(s)+delta(3);
end;

ymodel = (A(end)+B(end).*xi).*exp(-C(end).*xi);
plot(xi, ymodel,'r');
title('Gradient descent fit','fontsize',20);


Gradient descent works

Levenberg-Marquardt method

The Levenberg-Marquardt algorithm is one of the most powerful curve fitting methods available. Most commercial software packages use it in one form or another. It is robust, meaning that it will converge to a local minimum for a wide range of initial guesses. It does not always find the global minimum. The Levenberg-Marquardt algorithm can be thought of as a combination of the Gauss-Newton method and the gradient descent method. To derive the Levenberg-Marquardt algorithm one starts along the same lines as the Gauss-Newton method and we arrive at

    \begin{align*} (\mathbf{J^TWJ})\bm{\delta} = \mathbf{J^TW}\mathbf{r} \end{align*}

The Levenberg-Marquardt method slightly modifies this equation to

    \begin{align*} (\mathbf{J^TWJ} + \lambda\, \mathrm{diag} (\mathbf{J^TWJ}))\bm{\delta} = \mathbf{J^TW}\mathbf{r} \end{align*}

The diag symbol means we keep only the diagonal components of \mathbf{J^TWJ} and set the rest to zero.
The update \bm{\delta} can then be solved for with a matrix inverse.
The Levenberg-Marquardt update rule is given by

    \begin{align*} \bm{\delta} &= [\mathbf{J^TWJ} + \lambda \mathrm{diag} \mathbf{J^TWJ}]^{-1} \mathbf{J^TWr}\\ \mathbf{a}_{s+1} &= \mathbf{a}_s + \bm{\delta} \end{align*}

Example Implement the Levenberg Marquardt algorithm in Matlab
Here is the code

clear; clf; figure(1);
load criticaldampingdata

errorbar(xi, yi, sigi,'.k');
hold on;
xlabel('time [s]', 'fontsize',20);
ylabel('distance [m]', 'fontsize',20);

N = length(xi);
W = zeros(N,N);
for i = 1:N
W(i,i) = 1/sigi(i)^2;
end
%some initial random guess
A(1) = 1;
B(1) = -4;
C(1) = 1.3;

S = 10000;
lambda = 0.01;

for s = 1:S
for i = 1:N
J(i,1) = exp(-C(s)*xi(i));
J(i,2) = xi(i)*exp(-C(s)*xi(i));
J(i,3) = (A(s) + B(s)*xi(i) )*(-xi(i))*exp(-C(s)*xi(i));
ri(i) = (yi(i) - (A(s)+ B(s)*xi(i))*exp(-C(s)*xi(i)));
end
Q = J'*W*J;
QQ = zeros(3,3);
QQ(1,1) = Q(1,1)*lambda;
QQ(2,2) = Q(2,2)*lambda;
QQ(3,3) = Q(3,3)*lambda;
delta = inv(Q+QQ)*J'*W*ri'; %(J'WJ + lambda*diag(J'WJ)) delta = J'W r
A(s+1)=A(s)+delta(1);
B(s+1)=B(s)+delta(2);
C(s+1)=C(s)+delta(3);
end;

ymodel = (A(end)+B(end).*xi).*exp(-C(end).*xi);
plot(xi, ymodel,'-.');
title('Levenberg Marquardt fit','fontsize',20);

Nonlinear least squares uncertainties

If you have already found the optimal parameters by one of the methods discussed above, then it is straightforward to calculate the estimated parameter uncertainties. Recall from the generalized linear theory the matrix D_{jk}. A second way to calculate this matrix is

    \[D_{jk} = \cfrac{1}{2} \cfrac{ \partial^2 \chi^2}{\partial a_j \partial a_k}\]

Now we can find C_{jk} = (D^{-1})_{jk}, and that leads to the parameter uncertainties \sigma_{a_j} = \sqrt{C_{jj}}. One should verify that this method works by testing it on the y = A + Bx model to generate D_{jk}.

For nonlinear models we can write out the matrix elements by calculating the derivatives.

    \begin{align*} D_{jk} = \sum_{i=1}^N \cfrac{1}{\sigma_i^2}\left[\left(\cfrac{\partial f}{\partial a_j} \right) \left( \cfrac{\partial f}{\partial a_k}\right)+ (f-y_i) \cfrac{\partial^2 f}{\partial a_j \partial a_k}\right] \end{align*}

Example. Find the parameter estimate for B and its uncertainty in the straight line model through the origin by calculating D_{jk}. The formula for \chi^2 when we fit the line through the origin was

    \begin{align*} \chi^2 & = \sum_{i=1}^N \cfrac{(y_i - Bx_i)^2}{\sigma_i^2}\\ \chi^2 &= S_{wyy} - 2BS_{wxy} + B^2 S_{wxx} \end{align*}

We can easily find \mathbf{D}; it is a 1 by 1 matrix.

    \begin{align*} \mathbf{D} = (1/2) \cfrac{d^2}{dB^2} \chi^2 = S_{wxx} \end{align*}

Since \mathbf{C} is the inverse of \mathbf{D} we have

    \begin{align*} \mathbf{C} &= \cfrac{1}{S_{wxx}} \end{align*}

Therefore

    \begin{align*} {\sigma}_B &= \cfrac{1}{ \sqrt{ S_{wxx} } } \end{align*}

This is exactly what we got before with this model using different methods.

Example Calculate D_{jk} for the critical damping problem and then find the uncertainties in the estimates.
For this task, I definitely want to use Mathematica. I notice that D_{jk} is symmetric so we only have to calculate six quantities. To find the derivatives in Mathematica I type the following.

D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, A], A]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, B], B]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, C], C]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, A], B]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, A], C]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, B], C]
The results are implemented in MATLAB

wi = sigi.^(-2);
A=A(end);
B=B(end);
C=C(end);

E1 = exp(-C.*xi);
E2 = exp(-2.*C.*xi);
P1 = (A + B.*xi);
X2 = xi.*xi;
D(1,1) = sum( wi.*E2);
D(2,2) = sum( wi.*X2.*E2);
D(3,3) = sum( wi.*(E2.*X2.*P1.^2 - E1.*X2.*P1.*(-E1.*P1 + yi)) );
D(1,2) = sum( wi.*E2.*xi);
D(1,3) = sum( -wi.*E2.*xi.*P1 + wi.*E1.*xi.*(-E1.*P1 + yi));
D(2,3) = sum( -wi.*E2.*X2.*P1 + wi.*E1.*X2.*(-E1.*P1 +yi));

D
M = inv(D)
A
sqrt(M(1,1))
B
sqrt(M(2,2))
C
sqrt(M(3,3))
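
As a cross-check on the analytic second derivatives, D_{jk} can also be estimated numerically by central finite differences of chi-squared around the fitted parameters. This is a hedged sketch; the step size h and the anonymous chi2fun handle are assumptions, not part of the original code.

%Numerical check of D = (1/2) d^2(chi^2)/da_j da_k by central differences
chi2fun = @(p) sum( (yi - (p(1)+p(2).*xi).*exp(-p(3).*xi)).^2./sigi.^2 );
p0 = [A B C]; %fitted parameters from above
h = 1e-4; %finite difference step (assumed)
Dnum = zeros(3,3);
for j = 1:3
for k = 1:3
ej = zeros(1,3); ej(j) = h;
ek = zeros(1,3); ek(k) = h;
Dnum(j,k) = ( chi2fun(p0+ej+ek) - chi2fun(p0+ej-ek) ...
- chi2fun(p0-ej+ek) + chi2fun(p0-ej-ek) )/(4*h^2)/2;
end
end
Dnum %should be close to D above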

With five samples the fitted values are usually pretty close to the true parameters. If you increase the number of samples Ni you get even closer to the true values. I was expecting the error bars to be larger and enclose the true values, but this is not always the case. The curve fit (dotted-dashed line) is visually close to the true model, yet small differences remain. One can experiment with the code by trying different starting models and initial guesses. This exercise was meant to show the Levenberg-Marquardt method and how to implement it in MATLAB. You can easily adapt the code to your own fitting functions.


Levenberg Marquardt works. For this plot, the agreement with the true values is reasonable. A = 2.077(93), B = -5.17(12), and C = 1.026(16)

We will need further discussions of statistics to determine what it means to have a good fit. This depends on the value of chi-squared and the number of degrees of freedom. In the meantime, we are stuck with chi-by-eye.

NonlinearLeastSquaresCurveFitting.pdf

Visit www.sciencemathmastery.com for more lessons on error analysis, probability, and statistics.

Generalized Linear Least Squares Curve Fitting

Summary

Least squares curve fitting can be generalized to work with any linear model to determine unique parameter estimates and uncertainties in terms of matrix equations. Linear models follow a general form; all other models are classified as nonlinear. While nonlinear models do not have a general analytic solution, some nonlinear models can be transformed into linear ones. We give two examples: an exponential decay model and the enzymatic rate equation, both transformed to straight line models with the help of propagation of error. Polynomials form a class of linear models which we implement in MATLAB.

Linear and nonlinear models

In least squares curve fitting, there is a step where we take derivatives to find the parameter estimates. If those equations are too complicated, we will have a hard time solving them. It is straightforward to solve these equations for so-called linear models. Linear models are of the form

    \begin{align*} f(x_i; \mathbf{a})&= a_1 X_1(x_i) + a_2 X_2(x_i) + \ldots + a_M X_M(x_i) \end{align*}

One example of a linear model with M parameters is a polynomial.

    \begin{align*} f(x_i;\mathbf{a}) &= a_1 + a_2x_i + \ldots + a_{M}x_i^{M-1} \end{align*}

Other linear models might have the x_i replaced with general functions X_j(x_i). This gives a lot of different possible functions, but some are unnatural and do not occur in usual data sets.

    \begin{align*} f(x_i;\mathbf{a}) &= a_1 \sin(x_i) + a_2 e^{x_i} + \ldots + a_{M}\ln(x_i) \end{align*}

Linear means the parameter is to the first power out in front. The following model is not linear.

    \begin{align*} f(x_i;\mathbf{a}) &= a_1 \sin(a_2 x_i) + a_3 e^{a_4 x_i} \end{align*}

For models of the nonlinear form, the optimization problem is usually quite difficult to solve. We will consider what to do with nonlinear models later.

To perform a curve fit on a model with M parameters you need at least M different points. A good practice is to have many more measured points than parameters. When a model is bloated with a lot of parameters, it can usually fit a wide variety of functions. Keep the model as simple as possible, but not simpler: use only as many parameters as necessary.

The generalized least squares problem

Now, with our linear model of M parameters, we would like to derive the best parameter estimates and their uncertainties in the weighted least squares framework. The model is

    \[f(x_i;a_1,a_2,...,a_M) = f(x_i; \mathbf{a}) = \sum_{k=1}^M a_k X_k(x_i)\]

where the X_k(x_i) are some general functions of x. The chi-squared to minimize is

    \begin{align*} \chi^2 &= \sum_{i=1}^N \cfrac{({y}_i- \sum_{k=1}^M a_kX_k(x_i))^2}{{\sigma}_i^2} \end{align*}

To minimize \chi^2, we set each derivative equal to zero.

    \[\cfrac{\partial}{ \partial a_k} \chi^2=0\]

Setting each of the k = 1,2,\ldots,M derivatives to zero gives a system of M linear equations. These “normal equations” determine the optimal fit parameters \mathbf{a}, which we can write as a vector.

    \[\sum_{i=1}^N \left ( y_i - \sum_{j=1}^M a_j X_j(x_i) \right) \cfrac{X_k(x_i)}{\sigma_i^2} =0\]

Separating terms on each side of the equal sign gives

    \begin{align*} \sum_{i=1}^N \cfrac{y_i X_k(x_i)}{\sigma_i^2}=\sum_{i=1}^N \sum_{j=1}^M {a}_j X_j(x_i)\cfrac{X_k(x_i)}{{\sigma}_i^2} \end{align*}

It is useful to do some linear algebra to solve these equations for the parameters and their uncertainties. The “design matrix,” A_{ij}, is defined as the N row and M column matrix

    \begin{align*} A_{ij} = \cfrac{X_j(x_i)}{{\sigma}_i} \end{align*}

Second we define the column vector \mathbf{B} of M rows.

    \begin{align*} B_k = \sum_{i=1}^N \cfrac{{y}_iX_k(x_i)}{{\sigma}_i^2} \end{align*}

A third M\times M matrix, D_{jk} has inverse C_{jk}.

    \begin{align*} {D}_{jk} &= \sum_{i=1}^N \cfrac{X_j(x_i)X_k(x_i)}{{\sigma}_i^2} \to \mathbf{D} = \mathbf{A}^T \mathbf{A}\\ \mathbf{C}&= \mathbf{D}^{-1} \end{align*}

Now the normal equations can be rewritten as

    \begin{align*} \mathbf{D} \mathbf{{a}} = \mathbf{B} \end{align*}

To solve for the fit parameters \mathbf{{a}} it is easy to numerically compute the inverse in MATLAB.

    \begin{align*} \mathbf{{a}}& = \mathbf{D}^{-1} \mathbf{B} = \mathbf{C}\mathbf{B} \\ {a}_j &= \sum_{k=1}^M C_{jk} \left[ \sum_{i=1}^N \cfrac{y_iX_k(x_i)}{{\sigma}_i^2}\right] \end{align*}

General parameter uncertainties

Now, we can find the uncertainty in the parameters by the propagation of error formula.

    \begin{align*} {\sigma}_{a_j}^2 &= \sum_{i=1}^N \left( \cfrac{\partial a_j}{\partial {y}_i}\right)^2{\sigma}_i^2\\ \cfrac{\partial a_j}{\partial {y}_i}&= \sum_{k=1}^M \cfrac{C_{jk} X_{k}(x_i)}{{\sigma}_i^2}\\ {\sigma}^2_{a_j}&= \sum_{k=1}^M \sum_{l=1}^M C_{jk} C_{jl} \left(\sum_{i=1}^N \cfrac{X_k(x_i)X_l(x_i)}{{\sigma}_i^2} \right)\\ {\sigma}^2_{a_j}&=\sum_{k=1}^M \sum_{l=1}^M C_{jk} C_{jl} D_{lk} \end{align*}

But \displaystyle \sum_{l=1}^M C_{jl}D_{lk} = \delta_{jk}, the Kronecker delta, because \mathbf{C} is the inverse of \mathbf{D}.

    \begin{align*} \delta_{ij} = \begin{cases} 1, & \text{if } i=j,\\ 0, & \text{if } i\neq j. \end{cases} \end{align*}

so the sum reduces to

    \begin{align*} {\sigma}^2_{{a}_j} = C_{jj} \end{align*}

These are the uncertainties in the fitting parameters.
The off diagonal terms of \mathbf{C} also give the covariances. If two different groups use the least squares theory on a linear model they should arrive at the same results for the fitting parameters and their uncertainties which are derived here. To summarize the results for linear least squares models

    \begin{align*} {a}_j &= \sum_{k=1}^M C_{jk} \left[ \sum_{i=1}^N \cfrac{{y}_iX_k(x_i)}{{\sigma}_i^2}\right] \qquad {\sigma}_{{a}_j} = \sqrt{C_{jj}} \end{align*}
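
The matrix recipe above translates almost line for line into MATLAB. Here is a hedged sketch of a general linear fitting function; the name genlinfit and the cell array of basis function handles Xfun are assumptions for illustration, not a function used elsewhere in these lessons.

function [a, siga, C] = genlinfit(Xfun, xi, yi, sigi)
%[a, siga, C] = genlinfit(Xfun, xi, yi, sigi)
%General linear least squares fit, f = sum_k a_k X_k(x).
%Xfun is a cell array of basis function handles, for example
%Xfun = { @(x) ones(size(x)), @(x) sin(x), @(x) exp(x) };
M = numel(Xfun);
N = length(xi);
A = zeros(N,M);
for j = 1:M
A(:,j) = Xfun{j}(xi(:))./sigi(:); %design matrix A_ij = X_j(x_i)/sigma_i
end
b = yi(:)./sigi(:);
D = A'*A; %D = A^T A
C = inv(D); %covariance matrix
a = C*(A'*b); %best fit parameters
siga = sqrt(diag(C)); %parameter uncertainties
end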

Linearizing nonlinear models

Sometimes data following a nonlinear relationship can be transformed to a linear model. One can then estimate the transformed parameters exactly, using propagation of error to carry the uncertainties through the transformation. Here is a well known example from biochemistry, the velocity of an enzymatic reaction, illustrating linearization.

    \begin{align*} V = \cfrac{V_{max}[S]}{K_m + [S]} \end{align*}

This equation can be rewritten as

    \begin{align*} \cfrac{1}{V} = \cfrac{K_m}{V_{max}}\cfrac{1}{[S]} + \cfrac{1}{V_{max}} \end{align*}

We start with ([S]_i, V_i, \sigma_{V_i}) as the set of information. \sigma_{V_i} is the standard deviation of the mean of the repeated measurement of V_i. The transformed variables are

    \begin{align*} x_i &= 1/[S]_i \quad y_i = 1/V_i \quad \sigma_i = \cfrac{\sigma_{V_i}}{V_i^2}\\ a_2 &= \cfrac{K_m}{V_{max}} \qquad a_1 = \cfrac{1}{V_{max}} \end{align*}

Propagation of error for a single variable has been used to find \sigma_i. One can then do a weighted fit to a line to find a_1, a_2, \sigma_{a_1}, and \sigma_{a_2}. K_m and V_{max} can then be found, also by propagation of error. This transformation produces a linear plot called the Lineweaver-Burk plot.
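
Here is a hedged sketch of that workflow in MATLAB; the data vectors Si, Vi, sigVi are assumed to exist, the fit uses the lineABx function from the straight line lesson, and the error propagation for K_m neglects the covariance between the fitted parameters.

%Lineweaver-Burk linearization of V = Vmax*[S]/(Km + [S])
%Assumes measured data Si, Vi and their uncertainties sigVi are in the workspace
xi = 1./Si; %transformed abscissa 1/[S]
yi = 1./Vi; %transformed ordinate 1/V
sigi = sigVi./Vi.^2; %propagated uncertainty of 1/V
[a1, siga1, a2, siga2] = lineABx(xi, yi, sigi); %a1 = 1/Vmax, a2 = Km/Vmax
Vmax = 1/a1;
sigVmax = siga1/a1^2; %propagation of error for 1/a1
Km = a2/a1;
sigKm = Km*sqrt( (siga1/a1)^2 + (siga2/a2)^2 ); %neglects the covariance of a1 and a2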

Another straightforward example of linearization involves exponential decay.

    \begin{align*} N(t) = N_0e^{-kt} \end{align*}

One has the set of information (t_i,N_i,\sigma_{N_i}) where \sigma_{N_i} are the standard deviations of the mean.
Now take the logarithm of both sides.

    \begin{align*} \log(N_i) = \log(N_0) - kt_i \end{align*}

Now we transform

    \begin{align*} x_i &= t_i \quad y_i = \log(N_i) \quad \sigma_i = \sigma_{N_i}/N_i\\ a_1 &= \log(N_0) \quad a_2= -k \end{align*}

If the error bars were originally similar for most points on the linear plot they will be stretched out at longer times after transformation. After finding a_1 and \sigma_{a_1}, you can use error propagation to find N_0 and its uncertainty.

    \[\sigma_{N_0}= e^{a_1}\sigma_{a_1}\]

Example. Fit some simulated exponentially decaying data with linear least squares by linearizing the model.
We generate some model data which are the average of five points of y = 100e^{-t} + normrnd(0,4).

clear;
ti = 0.0:0.1:2.5
N0 = 100;
L = 1;
Ni = 5;
for i = 1:length(ti)
temp = [];
for j = 1:Ni
temp(j) = N0*exp(-L*ti(i))+normrnd(0,4);
end
yi(i) = mean(temp);
ei(i) = std(temp)/sqrt(Ni);
end

figure(1);clf;
errorbar(ti,yi,ei,'.k');
axis([-0.5 3 0 125]);
xlabel('t');
ylabel('N');

figure(2);clf;
zi = log(yi);
si = ei./yi;
errorbar(ti,zi,si,'.k');
axis([-0.5 3 0 5])
xlabel('t');
ylabel('z');
hold on;

[A sigA B sigB chi2] = lineABx(ti, zi, si);
XX = 0:0.1:3
YY = A + XX.*B;
plot(XX,YY,'k');

estN = exp(A)
estsigN = exp(A)*sigA
estL = -B
estsigL = sigB

figure(1);
hold on;
XX = 0:0.01:3
YY = estN.*exp(-estL.*XX);
plot(XX,YY,'k');


Exponential decay with best fit


Linearization and weighted fit

Least squares polynomial

Example. Implement a polynomial least squares fitting function in MATLAB.
Here is the MATLAB function we will use.

function [ A sigA ] = genpolyfit( M, xi, yi, sigi )
% [A sigA ] = genpolyfit( M, xi, yi, sigi)
% Least squares polynomial fit of degree M-1
%

for k = 1:M
X(k,:) = xi.^(k-1);
end

for j = 1:M
for k = 1:M
D(j,k) = sum(sigi.^(-2).*X(j,:).*X(k,:));
end
end

for k = 1:M
B(k) = sum(X(k,:).*yi.*sigi.^(-2));
end

A = inv(D)*B’;
C = inv(D);
for k = 1:M
sigA(k) = sqrt(C(k,k));
end

Example. Simulate some data that can be fit by a parabola.
We can use the least square polynomial function.

clear;figure(1);clf;
A = 1;
B = 2;
C = 3;
Ni = 5;
xi = -10:1:10;
for i = 1:length(xi)
temp = [];
for j = 1:Ni
temp(j) = A + B*xi(i) + C*xi(i).^2 + normrnd(0,10);
end
yi(i) = mean(temp);
si(i) = std(temp)/sqrt(Ni);
end
figure(1);clf;
errorbar(xi,yi,si,'.k', 'markersize',10);
xlabel('x')
ylabel('y')
hold on;
[A sigA] = genpolyfit(3,xi,yi,si);
x1 = -11:0.1:11;
y1 = A(3).*x1.^2 + A(2).*x1 + A(1);
plot(x1,y1, 'r');
axis( [ -11 11 -50 1.1*max(y1)]);


Curve fit to a parabola

GeneralizedLeastSquaresCurveFitting.pdf

Linear Least Squares Curve Fitting of Straight Line Models

Summary

Curve fitting is a useful procedure for testing theoretical models and estimating model parameters and their uncertainties. Curve fitting is carried out by choosing the set of estimated parameters that minimize an error function. The conventional error function for analytic results with calculus is the sum of the squares of residuals. The residuals are the difference between the model prediction and the experimental observation. Weighted fitting takes into account the size of the error bars where measurements with smaller error bars count more heavily towards the minimization. Weighted fits are more robust to the effects of outlying points which generally have larger error bars. The weighted mean is a least squares procedure, fit to y = A, which is used to combine different experimental results of the same quantity. This is how CODATA and NIST calculate the accepted fundamental constants of nature from various experiments. Curve fitting is a tedious calculation and is best done with a software package like MATLAB. We illustrate how to fit the various linear models that are discussed with simulated data in MATLAB. One can either put the total data set into the curve fit or average repeated measurements first and fit those.

The least squares error formula for curve fitting

Suppose we have a set of measurements (x_i, y_i) and we assume the x_i span some range on the x axis, but there may be repetitions. Repetition is in fact recommended: always repeat measurements so you can calculate their uncertainties directly from the data. We will not yet factor in how best to use the uncertainties in curve fitting; we consider that later. If we have a model f = A + Bx, we define a residual as the difference between the measurement and the model prediction

    \[r_i = y_i - A - Bx_i\]

or in general

    \[r_i = y_i - f(x_i;A,B,\ldots)\]

Now we can start defining some trial error functions. We could choose to minimize the residuals directly.

    \[E = \sum_{i=1}^N (y_i - A - Bx_i)\]

The problem with this error function is that some of the errors will be positive and some will be negative. They basically cancel each other out and you can’t find the best A and B. If we wanted to make the signs the same then we could try absolute values.

    \[E = \sum_{i=1}^N |y_i - A - Bx_i|\]

The problem with this error function is that taking derivatives of absolute value functions is nasty. You can’t get many nice results with calculus using this error function. The next simplest modification of the error function is to take

    \[E = \sum_{i=1}^N (y_i - A - Bx_i)^2\]

Then we look for the least squares of the residuals. This is the convention that most people use, but it is not totally arbitrary. We will discuss in a more advanced lesson some justification for the least squares method.

Fitting y = A + Bx

Expanding E for f(x_i; A,B) = A + Bx gives

    \[E = \sum_{i=1}^N (A^2 + 2AB x_i + B^2 x_i^2 - 2A y_i -2B x_i y_i + y_i^2)\]

Using the “S” notation we can write E as

    \[E = A^2S_1 + 2ABS_x + B^2 S_{xx} -2A S_y -2B S_{xy} + S_{yy}\]

where S_1 = N and S_{xy} = \sum_{i=1}^N x_iy_i etc.
The way we find the minimum of E is to take the partial derivatives with respect to A and B and set them equal to zero.

    \begin{align*} \cfrac{\partial E}{\partial A} = 0 \qquad \cfrac{\partial E}{\partial B} = 0 \end{align*}

This gives two equations which can be solved for A and B.

    \begin{align*} {A} S_1 + {B} S_{x}  &=  S_{y} \\ {A} S_{x} + {B}S_{xx} & =  S_{xy}  \end{align*}

This equation can be written as a matrix equation.

    \[\left( \begin{array}{cc} S_1 & S_x \\ S_x & S_{xx} \end{array}\right)  \left( \begin{array}{c} A \\ B\end{array} \right)=  \left( \begin{array}{c} S_y \\ S_{xy}\end{array} \right)\]

As long as the determinant of the first matrix on the left is not zero we can solve these linear equations with Cramer’s rule. The condition is basically that two of the x_i be different. You can’t make any sense of the fit if all the measurements are just at one point. You need two different x_i to define a linear function.

    \begin{align*} {A} &= \cfrac{   \left| \begin{array}{cc} S_{y} & S_{x}  \\ S_{xy} & S_{xx}   \end{array} \right| }{   \left| \begin{array}{cc} S_1 & S_{x}  \\ S_{x} & S_{xx}   \end{array} \right|} =  \cfrac{S_{y}S_{xx} - S_{x}S_{xy}}{S_1 S_{xx} - S_{x}^2} \\ \\ {B} &= \cfrac{   \left| \begin{array}{cc} S_{1} & S_{y}  \\ S_{x} & S_{xy}   \end{array} \right| }{   \left| \begin{array}{cc} S_1 & S_{x}  \\ S_{x} & S_{xx}   \end{array} \right|} =  \cfrac{S_{1}S_{xy} - S_{x}S_{y}}{S_1 S_{xx} - S_{x}^2} \end{align*}

A common expression is \Delta = NS_{xx} - S_x^2.

Now we find the uncertainties in A and B by using the propagation of error formula.

    \begin{align*} {\sigma}_A^2 &= \sum_{i=1}^N \left( \cfrac{\partial A}{\partial {y}_i}\right)^2 {\sigma}_i^2 \quad  {\sigma}_B^2 = \sum_{i=1}^N \left( \cfrac{\partial B}{\partial {y}_i}\right)^2 {\sigma}_i^2  \end{align*}

If we have not repeated the measurements at each x_i several times, then we don’t know what the \sigma_i are. This is a good justification for repeating the measurements. If all the x_i are single measurements, we can assume that \sigma_i=\sigma_y is the same for all the points. Using an argument from statistics we can say

    \[\cfrac{\displaystyle \sum_{i=1}^N( y_i - A - Bx_i)^2}{\sigma_y^2} \approx N-2\]

There are N points, but two degrees of freedom less because we have two fit parameters. The expression in each parenthesis is roughly equal to \sigma_y^2. So we can write

    \[\sigma_y^2 = \cfrac{1}{N-2}  \sum_{i=1}^N( y_i - A - Bx_i)^2\]

Now we can do the calculation of the uncertainty. We have

    \[\sum_{i=1}^N y_i = y_1 + y_2 + \ldots + y_n\]

I define the symbol 1_i = \cfrac{\p y_i}{\p y_i} = 1. It has the properties that 1_i=1 for each i, \sum_{i=1}^N 1_i = N, (1_i)^2 = 1_i, and \sum_{i=1}^N (1_i) x_i = \sum_{i=1}^N x_i.

    \[\left( \cfrac{ \p A}{ \p y_i} \right) = (1_iS_{xx} - S_x x_i)/\Delta\]

    \[\left( \cfrac{ \p A}{ \p y_i} \right)^2 = ((1_i)^2S_{xx}^2 +S_x^2 x_i^2 - 2(1_i)S_{xx} S_x x_i)/\Delta^2\]

    \[\sum_{i=1}^N \left( \cfrac{ \p A}{ \p y_i} \right)^2 = (NS_{xx}^2 + S_x^2 S_{xx} -2S_{xx} S_x^2)/\Delta^2\]

    \[{\sigma}_A^2 = (S_{xx}/\Delta) \sigma_y^2\]

    \[\sigma_A = \sigma_y\sqrt{\cfrac{S_{xx}}{\Delta}}\]

A similar calculation for \sigma_B gives

    \[\sigma_B =  \sigma_y \sqrt{\cfrac{N}{\Delta}}\]

If we know each \sigma_i then it makes sense to just do the weighted calculation which we will describe later in this lesson.

Fitting y = Bx

For a model like Ohm’s law, V=IR, we want to fit a straight line forced to pass through the origin, y=Bx.
The calculus is similar.

    \[E =  \sum_{i=1}^N  (y_i - Bx_i)^2  = S_{yy} - 2BS_{xy} + B^2 S_{xx}\]

Minimizing with respect to B using \cfrac{d E}{dB} = 0 we find

    \[-2S_{xy} + 2{B} S_{xx} = 0\]

    \[{B} = \cfrac{S_{xy}}{S_{xx}}\]

The propagation of error formula gives the uncertainty of B.

    \[\sigma_B^2 = \sum_{i=1}^N \left( \cfrac{\p B}{\p y_i}\right)^2 \sigma_i^2\]

Again if we don’t know the uncertainties we can use

    \[\sigma_y^2 = \cfrac{1}{N-1} \sum_{i=1}^N (y_i - Bx_i)^2\]

    \[\left( \cfrac{\p B}{\p y_i}\right) = x_i/S_{xx}\]

    \[\left( \cfrac{\p B}{\p y_i}\right)^2 = x_i^2/S_{xx}^2\]

    \[\sigma_B^2 = \sum_{i=1}^N  \left( \cfrac{\p B}{\p y_i}\right)^2 \sigma_y^2 = \cfrac{\sigma_y^2}{S_{xx}}\]

    \[\sigma_B = \cfrac{\sigma_y}{\sqrt{S_{xx}}}\]

Fitting y = A

The last model we will consider is a horizontal line.

    \[E = \sum_{i=1}^N (y_i - A)^2 = S_{yy} - 2A S_y + A^2 S_1\]

So we take the first derivative E'(A) = 0 and find

    \[0 = -2S_y + 2A S_1\]

or

    \[A = S_y/S_1 = S_y/N = \cfrac{ y_1 + y_2 +\ldots + y_N}{N} = \bar{y}\]

That is just the arithmetic mean for A. We can find the standard deviation of the mean by the same procedure we used before.

    \[\left( \cfrac{\p A }{\p y_i}\right) = (1_i)/N\]

    \[\sigma_A^2 = \sum_{i=1}^N \left( \cfrac{\p A }{\p y_i}\right)^2\sigma_y^2 = \sigma_y^2 \sum_{i=1}^N (1_i)^2/N^2 = \sigma_y^2 /N\]

For \sigma_y we use, with A = \bar{y},

    \[\sigma_y^2 = \cfrac{1}{N-1} \sum_{i=1}^N ( y_i - \bar{y})^2\]

So the standard deviation of the mean is given by

    \[\bar{\sigma} = \sigma_y/\sqrt{N}\]


Weighted fits and using chi squared

It is more useful to do curve fitting when you have repeated the measurements, because then you know the different standard deviations. If the standard deviation of a point is large, you don’t want to count it as much as the points with small standard deviations. The weighted least squares approach weights each squared residual by w_i = 1/\sigma_i^2. The resulting sum is called chi-squared, \chi^2.

    \[\chi^2 = \sum_{i=1}^N \cfrac{(y_i - f(x_i;A,B))^2 }{\sigma_i^2}\]

This does not change the math so much that we cannot solve for A and B. You just have to replace each S_? formula with S_{w?}. For example, S_1 \to S_w, S_x \to S_{wx}, and so on.

There are slight differences in how you interpret chi squared as a statistic if you use each pair of data points in the equation or first average the points and calculate their standard deviation of the mean. This boils down to using (x_i,y_i,\sigma_i) or (x_i,y_i = \bar{y}_i, \sigma_i = \bar{\sigma}_i). Often you don’t have access to the individual data points if you look at someone else’s data, so you have to use the average values and the standard deviations of the mean instead of (y_i, \sigma_i). We can then summarize the results from before with the new weighting scheme.

For y = A + Bx.

    \begin{align*}  {A} &= \cfrac{S_{wy}S_{wxx} - S_{wx}S_{wxy}}{S_w S_{wxx} - S_{wx}^2} \qquad &&{\sigma}_A = \sqrt{\cfrac{S_{wxx}}{S_w S_{wxx} - S_{wx}^2} }\\  {B} &= \cfrac{S_{w}S_{wxy} - S_{wx}S_{wy}}{S_w S_{wxx} - S_{wx}^2} \qquad &&{\sigma}_B = \sqrt{\cfrac{S_w}{S_w S_{wxx} - S_{wx}^2} } \end{align*}

For y = Bx. We have B = \cfrac{S_{wxy}}{S_{wxx}}. So we can calculate the uncertainty.

    \[\left( \cfrac{ \p B}{ \p y_i} \right) = w_ix_i/S_{wxx}\]

    \[\left( \cfrac{ \p B}{ \p y_i} \right)^2 w_i^{-1} = w_i x_i^2/ S_{wxx}^2\]

    \[\sigma_B^2 = \sum_{i=1}^N \left( \cfrac{ \p B}{ \p y_i} \right)^2 w_i^{-1} = \cfrac{1}{S_{wxx}}\]

    \[\sigma_B = \cfrac{1}{\sqrt{S_{wxx}}}\]

Probably the most useful weighted formula is for the weighted mean when y = A= y_w. This allows you to average together measurements.

    \[\chi^2 = \sum_{i=1}^N \cfrac{ (y_i - y_w)^2}{\sigma_i^2} = \sum_{i=1}^N w_i(y_i - y_w)^2\]

    \[\chi^2 = S_{wyy} - 2S_{wy}y_w + S_w y_w^2\]

Taking \cfrac{d\chi^2}{d y_w} = 0 we find

    \[y_w = \cfrac{ S_{wy}}{S_w} = \cfrac{ w_1 y_1 + w_2 y_2 + \ldots w_N y_N}{w_1 + w_2 + \ldots + w_N}\]

This is referred to as the weighted mean. We can find the uncertainty in the weighted mean

    \[\sigma_{y_w} ^2= \sum_{i=1}^N ( w_i/S_w)^2w_i^{-1} = 1/S_w\]

    \[\sigma_{y_w} = \cfrac{1}{\sqrt{ w_1 + w_2 + \ldots + w_N}}\]

In the limit, that all w_i are the same we recover the arithmetic mean and the standard deviation of the mean for the uncertainty.

This formula is quite useful. If you have a series of accurate measurements some of which are precise and others not so much you can combine them to get a more accurate and precise average based on the weighting. Most people are probably already familiar with this in terms of calculating their grades. Say 25 percent for homework, 25 percent for the midterm and 50 percent for the final. Then an average score might look like

    \[\cfrac{ 87 \times 25 + 95 \times 25 + 88 \times 50}{25 + 25 + 50} = 89.5\]

Pretty close, let’s give them an A minus anyways.

MATLAB functions for straight line curve fitting

y = A + Bx

Example. Write a Matlab function that fits a straight line y = A + Bx through a dataset.
We can easily compute all the sums in Matlab. Computing the sums by hand would be quite tedious.

function [ A sigA B sigB chi2] = lineABx( xi, yi, sigi )
%[A sigA B sigB chi2] = lineABx(xi, yi, sigi)
% Weighted least squares fit to a line y = A + Bx
wi = sigi.^(-2);
Sw = sum(wi);
Swx = sum(wi.*xi);
Swxx = sum(wi.*xi.*xi);
Swy = sum(wi.*yi);
Swxy = sum(wi.*xi.*yi);
Delta = Sw*Swxx - Swx^2;
A = (Swy*Swxx - Swx*Swxy)/Delta;
B = (Sw*Swxy - Swx*Swy)/Delta;
sigA = sqrt(Swxx/Delta);
sigB = sqrt(Sw/Delta);
chi2 = sum( wi.*(yi - A - B.*xi).^2);
end

Example. Write a Matlab function that fits an unweighted straight line y = A + Bx through a dataset.
We have

function [ A sigA B sigB] = lineABxU( xi, yi )
%[A sigA B sigB] = lineABxU(xi, yi)
%Unweighted least squares fit to a line y = A + Bx
S = length(xi);
Sx = sum(xi);
Sxx = sum(xi.*xi);
Sy = sum(yi);
Sxy = sum(xi.*yi);
Delta = S*Sxx - Sx^2;
A = (Sy*Sxx - Sx*Sxy)/Delta;
B = (S*Sxy - Sx*Sy)/Delta;
sigma = sqrt(1/(length(xi)-2)*sum((yi - A - B.*xi).^2));
sigA = sigma*sqrt(Sxx/Delta);
sigB = sigma*sqrt(S/Delta);
end

y = Bx

Example. Write a Matlab function to fit a straight line through the origin
We can evaluate the sums easily in Matlab.

function [ B sigB chi2 ] = lineBx( xi, yi, sigi )
%[B sigB chi2] = lineBx(xi, yi, sigi)
%Weighted least squares fit to a line y = Bx
wi = sigi.^(-2);
Swxy = sum(wi.*xi.*yi);
Swxx = sum(wi.*xi.*xi);
B = Swxy/Swxx;
sigB = 1/sqrt(Swxx);
chi2 = sum( wi.*(yi - B.*xi).^2);
end

Example. Write a Matlab function to fit an unweighted straight line through the origin
The code for the unweighted fit is as follows

function [ BU sigBU ] = lineBxU( xi, yi )
%[B sigB] = lineBxU(xi, yi )
%Unweighted least squares fit to a line y = Bx
Sxy = sum(xi.*yi);
Sxx = sum(xi.*xi);
BU = Sxy/Sxx;
variance = 1/(length(yi)-1)* sum( (yi - BU.*xi).^2 );
sigma = sqrt(variance);
sigBU = sigma/sqrt(Sxx);
end

y = A

Example. Write a Matlab function to calculate the weighted mean and its uncertainty.
Implementing the theory we have

function [ A sigA chi2] = lineA( xi, yi, sigi )
%[A sigA chi2] = lineA(xi, yi, sigi)
%Least squares fit to a constant y = A (weighted mean)
wi = sigi.^(-2);
Sw = sum(wi);
Swy = sum(wi.*yi);
A = Swy/Sw;
sigA = 1/sqrt(Sw);
chi2 = sum( wi.*(yi - A).^2 );
end
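
A short usage example of lineA for combining two measurements of the same quantity; the numbers are made up for illustration.

%Combine two hypothetical measurements of the same quantity
yi = [ 9.81 9.90 ]; %measured values (made up for illustration)
sigi = [ 0.02 0.08 ]; %their uncertainties (made up)
xi = [ 1 2 ]; %dummy abscissa, lineA does not use it
[yw, sigyw, chi2] = lineA(xi, yi, sigi)
%the more precise measurement dominates, so yw is close to 9.81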

MATLAB curve fitting with simulated data

Example. Illustrate a weighted linear fit to y = A + Bx.
For the purpose of simulating data, we assume a linear model plus random Gaussian noise. The Gaussian noise can be generated with the command \mathrm{normrnd}(\mu, \sigma)=\mathrm{normrnd}(0,3). We take N_i = 5 measurements at each point x = 1,2,3,\ldots, 10 with the model y = 10 + 3x. The goal of this simulation is to compare fitting all the measurements with their standard deviations against fitting the means at each x with their standard deviations of the mean. It turns out that fitting either way gives identical estimates.

clear; figure(1); clf;
format long;

A = 10;
B = 3;
mu = 0;
sig = 3;
Ni = 5;

x = 1:1:10
allx = [];
ally = [];
allysig = [];
avgy = [];
for i = 1:length(x)
xtemp = [];
temp = [];
tempsig = [];
for j = 1:Ni
err = normrnd(mu,sig);
temp(j) = A + B*x(i) + err;
xtemp = [xtemp x(i)];
end
sigy1 = std(temp);
allx = [allx xtemp];
ally = [ally temp];
allysig = [allysig ones(1,Ni)*sigy1 ];
avgy(i) = [mean(temp)];
avgysig(i) = [sigy1/sqrt(Ni)];
end

[A1 sigA1 B1 sigB1 chi1] = lineABx(allx,ally,allysig);
[A2 sigA2 B2 sigB2 chi2] = lineABx(x, avgy, avgysig);

figure(1);
plot(allx, ally,'.k','markersize',20);
hold on;
plot(x, avgy, 'xr','markersize',20);
axis( [ 0 11 0 1.1*max(ally)]);
xlabel('x','fontsize',20);
ylabel('y','fontsize',20);
x1 = [0 11]
y1 = [ A1 (A1 +B1*11)];
plot(x1,y1);

You can check the fit parameters A1, A2, B1, and B2 to see that they are numerically the same. Using the individual measurements with their standard deviations is equivalent to using the means with the standard deviations of the mean. Whichever you choose to display is up to you. In physics one usually shows the standard error of the mean, because the samples are assumed to be identical and there is often a surplus of points. In biology it might make more sense to show the actual variation of the samples, where they can differ and the number of measurements is smaller. You can modify the code by increasing N_i to show that the parameter estimates become more accurate and precise as N_i increases.


Model y = 10 + 3x + normrnd(0,3), with five samples using standard deviation and measurements

Model y = 10 + 3x + normrnd(0,3), with five samples using means and standard deviation of the means. We obtain the same fit as before

Example. Show the effect of an outlying point when a linear fit is weighted or unweighted.
We will use simulated data like before but where the last point is an outlier. We will show the effect in an unweighted fit and a weighted fit which is more robust and fits the data better.

clear;
format long;
x = 1:10
y = [ 13.69 14.9 17.1 24.4 25.4 26.6 29.48 32.0 37.36 48.0 ]
sigy = [ 0.58 1.0 1.0 1.2 1.6 1.4 0.86 1.8 0.84 8.1 ]

figure(1);clf;
errorbar(x,y,sigy,'.k','markersize',10);
hold on;
axis( [ 0 11 0 1.3*max(y)]);
xlabel('x','fontsize',20);
ylabel('y','fontsize',20);
title('Effect of an outlying point on unweighted fit','fontsize',18);
[A1 sigA1 B1 sigB1 chi1] = lineABx(x,y, sigy);
[A2 sigA2 B2 sigB2 ] = lineABxU(x, y);

x1 = [0 11]
y1 = [ A1 (A1 +B1*11)];
plot(x1,y1,'k-');
x2 = [0 11]
y2 = [ A2 (A2 +B2*11)];
plot(x2,y2,'k-.');


Model y = 10 + 3x + normrnd(0,3), with five samples, standard deviation of the mean used, and an outlying point. The outlying point shifts the slope of the unweighted fit, but does not affect the weighted fit much.

LinearLeastSquaresCurveFittingOfStraightLineModels.pdf