Monthly Archives: February 2017

Nonlinear Least Squares Curve Fitting

Summary

There is no general solution for the minimization of chi-squared to find the best estimates of the parameters when the model is nonlinear. A number of methods have been developed that can often find the best estimates and their uncertainties to essentially arbitrary precision. Iterative methods are usually used to minimize a nonlinear chi-squared. Another brute force method, called the grid method, tests every possible set of parameters at a certain resolution. This is usually not possible when the parameter space or the number of parameters is large. Commercial curve fitting software usually implements an iterative method called the Levenberg-Marquardt method. If you have nonlinear curve fitting software you can also apply it to linear models, but there is no guarantee you get the analytic solution. In this lesson, we discuss how to implement the grid method for simple problems. We give an example of Newton’s method as a basic iterative method. We explore the gradient descent method, the Gauss-Newton method, and the Levenberg-Marquardt method both theoretically and in MATLAB. Users of curve fitting software should understand the underlying algorithms and at least be able to check whether the software is performing the calculation correctly.

Minimizing chi-squared for nonlinear models

We minimize the \chi^2 error function in least squares theory.

    \begin{align*} \chi^2 = \sum_{i=1}^N \cfrac{ (y_i - f(x_i; \mathbf{a}))^2 }{\sigma_i^2} \end{align*}

We showed in a previous lesson that if f is a linear model, that is,

    \begin{align*} f(x_i; \mathbf{a}) = a_1 X_1(x_i) + a_2 X_2 (x_i) + \dots + a_M X_M(x_i) \end{align*}

then \chi^2 has a unique global minimum and we can analytically derive expressions for the best estimates of the model parameters and their uncertainties, \mathbf{a} and {\sigma}_{a_j}.

When f is not of the above form then it is nonlinear. An example is

    \[f(x_i;\mathbf{a}) = a_1 \sin(a_2 x_i) + a_3 e^{a_4 x_i}\]

When you take the derivative of \chi^2 for a nonlinear model, you get nonlinear equations which generally have no analytical solution in terms of linear algebra. One then has to resort to other methods, typically iterative methods that require an initial guess for \mathbf{a} and an update rule that moves the parameters towards a minimum of \chi^2. One repeats the update a number of times until the algorithm converges to some minimum of \chi^2. This may be a local minimum or the actual global minimum. There is no general test to show that a local minimum is actually the global minimum. It is always a good idea to start with different initial guesses and see if the same minimum is reached. This is no guarantee, but it is a reasonable check.

We want to work with some nonlinear data in this lesson, so we take a simple nonlinear model: a mass oscillating on a spring with damping.

    \[m \ddot{y} + 2b \dot{y} + ky = 0\]

For a certain choice of constants, b^2 - km = 0, the motion is called critically damped and obeys the following solution.

    \[y = (A + Bt)\exp(-Ct)\]

We can add some random noise to that function and take the average of five samples to generate our data set. This way we know how closely the different approaches recover the actual parameters. We generate the dataset in MATLAB and save it as a .mat file.

Example. Generate some sample data that represent critical damping.
We take five simulated measurements and average them.

clear;figure(1);clf;
A = 2;
B = -5;
C = 1;
xi = 0:0.15:5;
Ni = 5;
ymodel = (A+B.*xi).*exp(-C.*xi);
yi = [];
sigi = [];

for i = 1:length(xi)
temp = [];
for j = 1:Ni
temp(j) = (A + B.*xi(i)).*exp(-C.*xi(i)) + normrnd(0,0.5);
end
yi(i) = mean(temp);
sigi(i) = std(temp)/sqrt(Ni);
end

subplot(2,1,1);
plot(xi,ymodel,'.-k','markersize',20);
axis([0 5 -2 3]);
hold on;
xlabel('Time [s]','fontsize',20);
ylabel('Position [m]','fontsize',20);
title('True values for critical damping','fontsize',20);
subplot(2,1,2)
errorbar(xi,yi, sigi,'.k','markersize',15);
xlabel('Time [s]','fontsize',20);
ylabel('Position [m]','fontsize',20);
title('Simulated values for critical damping','fontsize',20);
axis([0 5 -2 3]);
hold on;
plot(xi,ymodel,'-k','markersize',20)
save criticaldampingdata xi yi sigi

Brute force grid method

If you have the computational power then there is a brute force approach to curve fitting called the grid method. Suppose from the graph of some linear data you can tell that A is between 3 and 4 and B is between 0.5 and 1.5. Then you can calculate \chi^2 for every pair of values in that range at 0.01 or 0.001 resolution and find the smallest one by direct search in MATLAB. That might take 10,000 or 100,000 calculations, but you will get essentially the same answer as the analytical result. You can do a coarse grid to find local minima and then a fine grid to find the global minimum. One advantage of the grid method is that you don’t need to know much theory and it is obvious how it works. The drawback is that if you cannot narrow the parameter space to a region containing the global minimum, the optimization may require too many calculations to perform.

We will now study some iterative methods that can be used to minimize chi-squared. The grid method is not really an iterative method like the rest. You cast a net over the parameter surface, see which loop of the net produces the minimum, then cast a finer net over that loop and converge towards a minimum. This method is good because you get out of it what you expect. When you have many parameters it might not be possible to cast a fine enough net to make progress because there are too many combinations.

Example. Implement the grid method in MATLAB to fit a straight line and compare the results to the analytic solution.
We can implement the grid method to fit a straight line model. It works just the same with nonlinear models.

%Grid Method

clear; clf;

A = 10;
B = 3;
Ni = 15;
xi = 1:2:15

for i = 1:length(xi)
tx = [];
for j = 1:Ni
tx(j) = A + B*xi(i) + normrnd(0,3);
end

ei(i) = std(tx)/sqrt(Ni);
yi(i) = sum(tx)/Ni;
end

errorbar(xi,yi,ei,'.k', 'markersize',10);
xlabel('trial')
ylabel('measurement')
hold on;
[A sigA B sigB] = lineABx(xi,yi,ei)

A
sigA
B
sigB

a = 9.5:0.0001:10.5;
b = 2.9:0.0001:3.1;

for j = 1:length(a)
for k = 1:length(b)
chisquared(j,k) = sum( 1./ei.^2.*(yi-b(k).*xi - a(j)).^2 );
end
end

g = min(min(chisquared));
[I J] = find(chisquared == g);
AA = a(I);
BB = b(J);

A
AA
B
BB

Important matrix math

We can represent \chi^2 in terms of vectors. Let

    \begin{align*} \chi^2 = \sum_{i=1}^N \cfrac{ (y_i - f(x_i;\mathbf{a}))^2}{\sigma_i^2} \end{align*}

We define the residual vector.

    \begin{align*} r_i = y_i - f_i = y_i - f(x_i;\mathbf{a}) \end{align*}

We define the weight matrix which is diagonal.

    \begin{align*} W_{ij} = \cfrac{1}{{\sigma}_i^2}\delta_{ij} \end{align*}

So we can write \chi^2 as

    \begin{align*} \chi^2(\mathbf{a}) = \mathbf{r^TWr} \end{align*}

We go back and forth between column and row vectors using the transpose property. Some properties of transpose are

    \begin{align*} \mathbf{(A^T)^T} &= \mathbf{A}\\ \mathbf{A^T=A} &\to \mathbf{A} \quad symmetric\\ \mathbf{(A+ B)}^T &= \mathbf{A}^T + \mathbf{B}^T \\ \mathbf{(AB)^T} &= \mathbf{B^TA^T} \end{align*}

When you multiply an NxM matrix with an MxL matrix you get an NxL matrix. You can use the size(A) command in Matlab to see how your matrices are arranged. You can use the transpose operator A’ to convert a column vector to a row vector and vice versa. If an expression fails, check whether you have kept track of your transposes.

We will be taking partial derivatives of the model values \mathbf{f} (and of the residuals \mathbf{r}) with respect to the parameters; these form the Jacobian matrix.

    \begin{align*} \cfrac{\partial \mathbf{f}}{\partial \mathbf{a}}& = \mathbf{J} \\ \cfrac{\partial \mathbf{r}}{\partial \mathbf{a}}& = - \mathbf{J} \end{align*}

We also have the Hessian matrix which is defined as

    \begin{align*} H_{jk} = \cfrac{\partial^2 \chi^2}{\partial a_j \partial a_k} \end{align*}

Here is an important result from Matrix calculus for a symmetric matrix \mathbf{W}.

    \begin{align*} \cfrac{\partial (\mathbf{h^TWh})}{\partial \mathbf{x}}= 2\mathbf{h^TW}\cfrac{\partial \mathbf{h}}{\partial \mathbf{x}} \end{align*}
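
As a quick numerical sanity check of this identity (our own aside, not part of the original derivation), we can compare the analytic expression against a finite difference in MATLAB; the particular h(x) and W below are made up for illustration.

% Numerical check of d(h'*W*h)/dx = 2*h'*W*(dh/dx) for a symmetric W,
% using a made-up h(x) and a diagonal (hence symmetric) W.
clear;
x0 = 0.7;
W  = diag([1 4 9]);                 %symmetric weight matrix
h  = @(x) [sin(x); x.^2; exp(-x)];  %arbitrary column vector function of x
dh = @(x) [cos(x); 2*x; -exp(-x)];  %its derivative with respect to x
g  = @(x) h(x)'*W*h(x);             %the scalar h'Wh

analytic = 2*h(x0)'*W*dh(x0);       %right hand side of the identity
dx = 1e-6;
numeric = (g(x0+dx) - g(x0-dx))/(2*dx);  %central finite difference of h'Wh
disp([analytic numeric])            %the two numbers should agree closely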

Newton’s method

We will explore Newton’s method because it is a simple example of an iterative method which we will be using to minimize \chi^2. Consider the following diagram.


Illustration of Newton’s method

We have

    \[f'(x_n) \approx \cfrac{f(x_n)}{x_n - x_{n+1}}\]

We can rearrange the expression as

    \begin{align*} x_{n+1} &= x_n - \cfrac{f(x_n)}{f'(x_n)} \end{align*}

So if we know x_1 the initial guess, we can find x_2. Then we can plug back in x_2 in the next iteration to find x_3 and so on.
We get closer and closer to the point where f(x^*) = 0 with each iteration.

Example. Find the square root of two to high precision.
We can try out Newton’s method to calculate the square root of 2. If we have the equation,

    \begin{align*} x^2 = 2 \end{align*}

Then we can write f(x) = x^2 -2. This function goes to zero at \sqrt{2} so we can apply Newton’s method.

    \begin{align*} x_{n+1} = x_n - \cfrac{x_n^2-2}{2x_n} \end{align*}

clear; format long;
N = 10;
x(1) = 2;
for i = 1:N
x(i+1) = x(i) - (x(i)^2 -2)/(2*x(i));
end

If we look at the contents of x we find,

n x_n
1 2
2 1.500
3 1.416666666666667
4 1.414215686274510
5 1.414213562374690
6 1.414213562373095
7 1.414213562373095

The update rule produces only fractions (rational numbers) if you choose a rational initial guess like 2. In the old days before calculators, one could compute the square root of a number to sufficient accuracy by finding such a fraction and then doing the long division in the last step. It is quite a good method; all 15 displayed digits are correct after the 6th iteration.

There are a number of things that can go wrong with Newton’s method. The successive updates may oscillate, or they may head towards another root you are not looking for. The best way to find a root is to make a plot of f(x) near the root and choose an initial guess close enough that the iteration descends directly to the root, avoiding any bumps in the function. Also, your function may have a restricted domain or singularities where division by zero occurs; these will obviously crash Newton’s method.
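
As a concrete illustration of the oscillation problem (an example of our own choosing, unrelated to the curve fitting data), the function f(x) = x^3 - 2x + 2 with the initial guess x(1) = 0 makes Newton’s method bounce between 0 and 1 forever, while a guess such as x(1) = -2 converges to the real root near -1.77.

% Newton's method oscillating between 0 and 1 for f(x) = x^3 - 2x + 2
clear;
f  = @(x) x.^3 - 2.*x + 2;
fp = @(x) 3.*x.^2 - 2;
x(1) = 0;                     %this guess oscillates; try x(1) = -2 instead
for i = 1:8
x(i+1) = x(i) - f(x(i))/fp(x(i));
end
x                             %displays 0 1 0 1 ... with x(1) = 0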

Gauss-Newton method

The Gauss-Newton algorithm is another iterative curve fitting method. The residuals are

    \begin{align*} r_i = y_i - f(x_i;\mathbf{a}) \end{align*}

One can arrive at the update formula by considering a Taylor expansion of f(x_i;\mathbf{a}) with respect to an update \bm{\delta}.

    \begin{align*} f(x_i; \mathbf{a} + \bm{\delta}) & \approx f(x_i;\mathbf{a}) + \cfrac{\partial f(x_i; \mathbf{a})}{\partial \mathbf{a}} \bm{\delta} \\ \mathbf{f}(\mathbf{a} + \bm{\delta}) & \approx \mathbf{f} + \mathbf{J}\bm{\delta} \end{align*}

At the minimum of \chi^2 we require

    \begin{align*} \cfrac{\partial \chi^2}{\partial \bm{\delta}}=0 \end{align*}

Substituting the expansion into \chi^2 gives

    \begin{align*} \chi^2(\mathbf{a} + \bm{\delta}) & \approx (\mathbf{y} - \mathbf{f}(x_i;\mathbf{a}) - \mathbf{J} \bm{\delta})^T\mathbf{W}(\mathbf{y} - \mathbf{f}(x_i;\mathbf{a}) - \mathbf{J} \bm{\delta})\\ \chi^2(\mathbf{a} + \bm{\delta}) &\approx (\mathbf{r} - \mathbf{J}\bm{\delta})^T\mathbf{W}(\mathbf{r} - \mathbf{J}\bm{\delta}) \end{align*}

We want to take the derivative of \chi^2 with respect to \bm{\delta} and set it equal to zero. Remember the important derivative.

    \begin{align*} 0 &= ( \mathbf{r - J\bm{\delta}})^T \mathbf{W} (\mathbf{-J}) \end{align*}

We can take the transpose of that expression to get

    \begin{align*} 0 & = \mathbf{J}^T\mathbf{W}( \mathbf{r-J\bm{\delta}}) \end{align*}

Remember that \mathbf{W} is symmetric. Solving for \bm{\delta} gives the Gauss Newton update rule.

    \begin{align*} \bm{ \delta} &= (\mathbf{J}^T\mathbf{W}\mathbf{J})^{-1}\mathbf{J}^T \mathbf{W}\mathbf{r}\\ \mathbf{a}_{s+1} &= \mathbf{a}_s + \bm{\delta} \end{align*}

To implement the Gauss-Newton method we only need to code the update rule. Gauss-Newton seems to be pickier about the initial guess than the other methods.

Example. Implement the Gauss-Newton method to fit the critical damping data.
We load the data and iterate the Gauss-Newton update rule for \bm{\delta}.

clear;clf;figure(1);
load criticaldampingdata

errorbar(xi, yi, sigi,'.k');
hold on;
xlabel('time [s]', 'fontsize',20);
ylabel('distance [m]', 'fontsize',20);

N = length(xi);
W = zeros(N,N);
for i = 1:N
W(i,i) = 1/sigi(i)^2;
end
%some initial random guess
A(1) = 1.4
B(1) = -4.3
C(1) = 1.3

S = 10000; %Number of steps

for s = 1:S
for i = 1:N
J(i,1) = exp(-C(s)*xi(i));
J(i,2) = xi(i)*exp(-C(s)*xi(i));
J(i,3) = (A(s) + B(s)*xi(i) )*(-xi(i))*exp(-C(s)*xi(i));
ri(i) = (yi(i) - (A(s)+ B(s)*xi(i))*exp(-C(s)*xi(i)));
end
% we really only have to change one line of code for delta
delta = inv(J'*W*J)*J'*W*ri';
A(s+1)=A(s)+delta(1);
B(s+1)=B(s)+delta(2);
C(s+1)=C(s)+delta(3);
end;

ymodel = (A(end)+B(end).*xi).*exp(-C(end).*xi);
plot(xi, ymodel,'r');
title('Gauss Newton fit','fontsize',20);


Gauss-Newton works

Gradient descent method

Imagine \chi^2 as a multidimensional hyper-surface over the possible fit parameters, \mathbf{a} = (a_1, a_2,\ldots,a_M). A choice of \mathbf{a} corresponds to a location on the hypersurface. We can use the result from vector calculus that -\nabla \chi^2 points downhill towards a local minimum. We can then step in that direction and repeat the process. The gradient of chi-squared is calculated as

    \begin{align*} \nabla \chi^2 = \cfrac{\partial (\mathbf{r^TWr})}{\partial \mathbf{a}}=2\mathbf{r^T W} \cfrac{\partial \mathbf{r}}{\partial \mathbf{a}} = -2 \mathbf{r}^T \mathbf{W} \mathbf{J} \end{align*}

The gradient points up the slope of a function so our update rule should point in the opposite direction. We choose the step size to be proportional to \gamma which is something we set arbitrarily. We can make it 10^{-5} and see if the calculation converges.

    \begin{align*} \mathbf{a}_{s+1}= \mathbf{a}_s + 2\gamma\, \mathbf{r^TWJ} \end{align*}

Example. Implement gradient descent with the critical damping data described earlier in this lesson.
We load the data and do the calculation.

%Implementing gradient descent
clear;clf;figure(1);
load criticaldampingdata

errorbar(xi, yi, sigi,'.k');
hold on;
xlabel('time [s]', 'fontsize',20);
ylabel('distance [m]', 'fontsize',20);

N = length(xi);
W = zeros(N,N);
for i = 1:N
W(i,i) = 1/sigi(i)^2;
end
%some initial random guess
A(1) = .2
B(1) = -.2
C(1) = .3

gamma = 0.00001; %Arbitrary step size
S = 10000; %Number of steps

for s = 1:S
for i = 1:N
J(i,1) = exp(-C(s)*xi(i));
J(i,2) = xi(i)*exp(-C(s)*xi(i));
J(i,3) = (A(s) + B(s)*xi(i) )*(-xi(i))*exp(-C(s)*xi(i));
ri(i) = (yi(i) - (A(s)+ B(s)*xi(i))*exp(-C(s)*xi(i)));
end
delta = 2*gamma*ri*W*J;
A(s+1)=A(s)+delta(1);
B(s+1)=B(s)+delta(2);
C(s+1)=C(s)+delta(3);
end;

ymodel = (A(end)+B(end).*xi).*exp(-C(end).*xi);
plot(xi, ymodel,'r');
title('Gradient descent fit','fontsize',20);


Gradient descent works

Levenberg-Marquardt method

The Levenberg-Marquardt algorithm is one of the most powerful curve fitting methods available. Most commercial software packages use it in one form or another. It is robust, meaning that it will converge to a local minimum for a wide range of initial guesses, although it does not always find the global minimum. The Levenberg-Marquardt algorithm can be thought of as a combination of the Gauss-Newton method and the gradient descent method. To derive the Levenberg-Marquardt algorithm one starts along the same lines as the Gauss-Newton method and arrives at

    \begin{align*} (\mathbf{J^TWJ})\bm{\delta} = \mathbf{J^TW}\mathbf{r} \end{align*}

The Levenberg-Marquardt method slightly modifies this equation to

    \begin{align*} (\mathbf{J^TWJ} + \lambda\, \mathrm{diag} (\mathbf{J^TWJ}))\bm{\delta} = \mathbf{J^TW}\mathbf{r} \end{align*}

Here \mathrm{diag}(\mathbf{J^TWJ}) denotes the diagonal matrix formed from the diagonal components of \mathbf{J^TWJ}, with all off-diagonal components set to zero.
The update \bm{\delta} can be solved for with a matrix inverse.
The Levenberg-Marquardt update rule is given by

    \begin{align*} \bm{\delta} &= [\mathbf{J^TWJ} + \lambda \mathrm{diag} \mathbf{J^TWJ}]^{-1} \mathbf{J^TWr}\\ \mathbf{a}_{s+1} &= \mathbf{a}_s + \bm{\delta} \end{align*}

Example. Implement the Levenberg-Marquardt algorithm in Matlab.
Here is the code.

clear; clf; figure(1);
load criticaldampingdata

errorbar(xi, yi, sigi,'.k');
hold on;
xlabel('time [s]', 'fontsize',20);
ylabel('distance [m]', 'fontsize',20);

N = length(xi);
W = zeros(N,N);
for i = 1:N
W(i,i) = 1/sigi(i)^2;
end
%some initial random guess
A(1) = 1;
B(1) = -4;
C(1) = 1.3;

S = 10000;
lambda = 0.01;

for s = 1:S
for i = 1:N
J(i,1) = exp(-C(s)*xi(i));
J(i,2) = xi(i)*exp(-C(s)*xi(i));
J(i,3) = (A(s) + B(s)*xi(i) )*(-xi(i))*exp(-C(s)*xi(i));
ri(i) = (yi(i) - (A(s)+ B(s)*xi(i))*exp(-C(s)*xi(i)));
end
Q = J’*W*J;
QQ = zeros(3,3);
QQ(1,1) = Q(1,1)*lambda; %so that Q+QQ has diagonal Q(i,i)*(1+lambda)
QQ(2,2) = Q(2,2)*lambda;
QQ(3,3) = Q(3,3)*lambda;
delta = inv(Q+QQ)*J'*W*ri';
A(s+1)=A(s)+delta(1);
B(s+1)=B(s)+delta(2);
C(s+1)=C(s)+delta(3);
end;

ymodel = (A(end)+B(end).*xi).*exp(-C(end).*xi);
plot(xi, ymodel,'-.');
title('Levenberg Marquardt fit','fontsize',20);

Nonlinear least squares uncertainties

If you have already found the optimal parameters by one of the methods above, then it is straightforward to calculate the estimated parameter uncertainties. Recall from the generalized linear theory that we had the matrix D_{jk}. A second way to calculate this matrix is

    \[D_{jk} = (1/2) \cfrac{ \partial^2 \chi^2}{\partial a_j \partial a_k}\]

Now we can find C_{jk} = (\mathbf{D}^{-1})_{jk}, and that leads to the parameter uncertainties {\sigma}_{a_j} = \sqrt{C_{jj}}. One should verify that this method works by testing it on the y = A + Bx model to generate D_{jk}.

For nonlinear models we can write out the matrix elements by calculating the derivatives.

    \begin{align*} D_{jk} = \sum_{i=1}^N \cfrac{1}{\sigma_i^2}\left[\left(\cfrac{\partial f}{\partial a_j} \right) \left( \cfrac{\partial f}{\partial a_k}\right)+ (f-y_i) \cfrac{\partial^2 f}{\partial a_j \partial a_k}\right] \end{align*}

Example. Find the parameter estimate for B and its uncertainty in the straight line model through the origin by calculating D_{jk}. The formula for \chi^2 when we fit the line through the origin was

    \begin{align*} \chi^2 & = \sum_{i=1}^N \cfrac{(y_i - Bx_i)^2}{\sigma_i^2}\\ \chi^2 &= S_{wyy} - 2BS_{wxy} + B^2 S_{wxx} \end{align*}

We can easily find \mathbf{D}; it is a 1 by 1 matrix.

    \begin{align*} \mathbf{D} = (1/2) \cfrac{d^2}{dB^2} \chi^2 = S_{wxx} \end{align*}

Since \mathbf{C} is the inverse of \mathbf{D} we have

    \begin{align*} \mathbf{C} &= \cfrac{1}{S_{wxx}} \end{align*}

Therefore

    \begin{align*} {\sigma}_B &= \cfrac{1}{ \sqrt{ S_{wxx} } } \end{align*}

This is exactly what we got before with this model using different methods.

Example. Calculate D_{jk} for the critical damping problem and then find the uncertainties in the estimates.
For this task, I definitely want to use Mathematica. I notice that D_{jk} is symmetric so we only have to calculate six quantities. To find the derivatives in Mathematica I type the following.

D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, A], A]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, B], B]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, C], C]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, A], B]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, A], C]
D[D[ (1/SS^2)*(y - (A + B*t)*Exp[-C*t])^2, B], C]
The results are implemented in MATLAB

wi = sigi.^(-2);
A=A(end);
B=B(end);
C=C(end);

E1 = exp(-C.*xi);
E2 = exp(-2.*C.*xi);
P1 = (A + B.*xi);
X2 = xi.*xi;
D(1,1) = sum( wi.*E2);
D(2,2) = sum( wi.*X2.*E2);
D(3,3) = sum( wi.*(E2.*X2.*P1.^2 - E1.*X2.*P1.*(-E1.*P1 + yi)));
D(1,2) = sum( wi.*E2.*xi);
D(1,3) = sum( -wi.*E2.*xi.*P1 + wi.*E1.*xi.*(-E1.*P1 + yi));
D(2,3) = sum( -wi.*E2.*X2.*P1 + wi.*E1.*X2.*(-E1.*P1 +yi));
D(2,1) = D(1,2); D(3,1) = D(1,3); D(3,2) = D(2,3); %fill in the lower triangle (D is symmetric)

D
M = inv(D)
A
sqrt(M(1,1))
B
sqrt(M(2,2))
C
sqrt(M(3,3))

With five samples the values are usually pretty close to the true parameters. If you increase the number of samples Ni then you get even closer to the true values. I was expecting that the error bars would be larger and enclose the true values, but this is not always the case. The curve fit (dotted-dashed line) is visually pretty close to the true model, yet small differences remain. One can experiment with the code to try different starting models and initial guesses. This exercise was meant to show the LM method and how to implement it in MATLAB. You can easily now adapt the code to your curve fitting functions.


Levenberg Marquardt works. For this plot, the agreement with the true values is reasonable. A = 2.077(93), B = -5.17(12), and C = 1.026(16)

We have to get into further discussions of statistics later to determine what it means to have a good fit. This depends on the value of chi-squared and the number of degrees of freedom. In the meantime, we are stuck with chi by eye.

NonlinearLeastSquaresCurveFitting.pdf

Visit www.sciencemathmastery.com for more lessons on error analysis, probability, and statistics.

Generalized Linear Least Squares Curve Fitting

Summary

Least squares curve fitting can be generalized to work with any linear model to determine unique parameter estimates and uncertainties in terms of matrix equations. Linear models follow a general form; all other models are classified as nonlinear. While nonlinear models do not have a general analytic solution, some nonlinear models can be transformed into linear models. We give two examples of linearization: an exponential decay model and the enzymatic rate equation, both transformed to straight line models with the help of propagation of error. Polynomials form a class of linear models which we implement in MATLAB.

Linear and nonlinear models

In least squares curve fitting, there is a step where we take derivatives to find the parameter estimates. If those equations are too complicated then we will have a hard time solving them. It is straightforward to solve these equations for so-called linear models. Linear models are of the form

    \begin{align*} f(x_i; \mathbf{a})&= a_1 X_1(x_i) + a_2 X_2(x_i) + \ldots + a_M X_M(x_i) \end{align*}

One example of a linear model with M parameters is a polynomial.

    \begin{align*} f(x_i;\mathbf{a}) &= a_1 + a_2x_i + \ldots + a_{M}x_i^{M-1} \end{align*}

Other linear models might have the x_i replaced with general functions X_j(x_i). This gives a lot of different possible functions, but some are unnatural and do not occur in usual data sets.

    \begin{align*} f(x_i;\mathbf{a}) &= a_1 \sin(x_i) + a_2 e^{x_i} + \ldots + a_{M}\ln(x_i) \end{align*}

Linear means each parameter appears to the first power as an overall coefficient of a function of x. The following model is not linear.

    \begin{align*} f(x_i;\mathbf{a}) &= a_1 \sin(a_2 x_i) + a_3 e^{a_4 x_i} \end{align*}

For models of the nonlinear form, the optimization problem is usually quite difficult to solve. We will consider what to do with nonlinear models later.

To perform a curve fit on a model with M parameters you need at least M different points. A good practice is to have many more measured points than parameters. When models are bloated with a lot of parameters you can usually fit a wide variety of functions. Keep it as simple as possible, but not simpler: use only as many parameters as necessary.

The generalized least squares problem

Now with our linear model of M parameters we would like to derive the best parameter estimates and their uncertainties in the weighted least squares framework. The model is

    \[f(x_i;a_1,a_2,...,a_M) = f(x_i; \mathbf{a}) = \sum_{k=1}^M a_k X_k(x_i)\]

where X_k(x_i) are some general, continuously defined functions. Chi-squared is then

    \begin{align*} \chi^2 &= \sum_{i=1}^N \cfrac{({y}_i- \sum_{k=1}^M a_kX_k(x_i))^2}{{\sigma}_i^2} \end{align*}

To minimize \chi^2, we set each derivative equal to zero.

    \[\cfrac{\partial}{ \partial a_k} \chi^2=0\]

This gives a linear system of M equations, k = 1,2,\ldots, M. These “normal equations” determine the optimal fit parameters \mathbf{a}, which we can write as a vector.

    \[\sum_{i=1}^N \left ( y_i - \sum_{j=1}^M a_j X_j(x_i) \right) \cfrac{X_k(x_i)}{\sigma_i^2} =0\]

Separating terms on each side of the equal sign gives

    \begin{align*} \sum_{i=1}^N \cfrac{y_i X_k(x_i)}{\sigma_i^2}=\sum_{i=1}^N \sum_{j=1}^M {a}_j X_j(x_i)\cfrac{X_k(x_i)}{{\sigma}_i^2} \end{align*}

It is useful to do some linear algebra to solve these equations for the parameters and their uncertainties. The “design matrix,” A_{ij}, is defined as the N row and M column matrix

    \begin{align*} A_{ij} = \cfrac{X_j(x_i)}{{\sigma}_i} \end{align*}

Second we define the column vector \mathbf{B} of M rows.

    \begin{align*} B_k = \sum_{i=1}^N \cfrac{{y}_iX_k(x_i)}{{\sigma}_i^2} \end{align*}

A third M\times M matrix, D_{jk} has inverse C_{jk}.

    \begin{align*} {D}_{jk} &= \sum_{i=1}^N \cfrac{X_j(x_i)X_k(x_i)}{{\sigma}_i^2} \to \mathbf{D} = \mathbf{A}^T \mathbf{A}\\ \mathbf{C}&= \mathbf{D}^{-1} \end{align*}

Now the normal equations can be rewritten as

    \begin{align*} \mathbf{D} \mathbf{{a}} = \mathbf{B} \end{align*}

To solve for the fit parameters \mathbf{{a}} it is easy to numerically compute the inverse in MATLAB.

    \begin{align*} \mathbf{{a}}& = \mathbf{D}^{-1} \mathbf{B} = \mathbf{C}\mathbf{B} \\ {a}_j &= \sum_{k=1}^M C_{jk} \left[ \sum_{i=1}^N \cfrac{y_iX_k(x_i)}{{\sigma}_i^2}\right] \end{align*}

General parameter uncertainties

Now, we can find the uncertainty in the parameters by the propagation of error formula.

    \begin{align*} {\sigma}_{a_j}^2 &= \sum_{i=1}^N \left( \cfrac{\partial a_j}{\partial {y}_i}\right)^2{\sigma}_i^2\\ \cfrac{\partial a_j}{\partial {y}_i}&= \sum_{k=1}^M \cfrac{C_{jk} X_{k}(x_i)}{{\sigma}_i^2}\\ {\sigma}^2_{a_j}&= \sum_{k=1}^M \sum_{l=1}^M C_{jk} C_{jl} \left(\sum_{i=1}^N \cfrac{X_k(x_i)X_l(x_i)}{{\sigma}_i^2} \right)\\ {\sigma}^2_{a_j}&=\sum_{k=1}^M \sum_{l=1}^M C_{jk} C_{jl} D_{lk} \end{align*}

But \displaystyle \sum_{l=1}^M C_{jl}D_{lk} = \delta_{jk}, the Kronecker delta, because \mathbf{C} is the inverse of \mathbf{D}.

    \begin{align*} \delta_{ij} = \begin{cases} 1, & \text{if } i=j,\\ 0, & \text{if } i\neq j. \end{cases} \end{align*}

so the sum reduces to

    \begin{align*} {\sigma}^2_{{a}_j} = C_{jj} \end{align*}

These are the uncertainties in the fitting parameters.
The off diagonal terms of \mathbf{C} also give the covariances. If two different groups use the least squares theory on a linear model they should arrive at the same results for the fitting parameters and their uncertainties which are derived here. To summarize the results for linear least squares models

    \begin{align*} {a}_j &= \sum_{k=1}^M C_{jk} \left[ \sum_{i=1}^N \cfrac{{y}_iX_k(x_i)}{{\sigma}_i^2}\right] \qquad {\sigma}_{{a}_j} = \sqrt{C_{jj}} \end{align*}
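
These matrix formulas are compact enough to implement directly. Below is a minimal MATLAB sketch of a general linear least squares fit; the function name genlinfit and the idea of passing the basis functions X_k as a cell array of function handles are our own choices for illustration, not something defined elsewhere in these lessons.

function [ A sigA C ] = genlinfit( Xfun, xi, yi, sigi )
%[A sigA C] = genlinfit(Xfun, xi, yi, sigi)
%General linear least squares fit y = a_1 X_1(x) + ... + a_M X_M(x),
%where Xfun = {X_1, ..., X_M} is a cell array of function handles.
M = length(Xfun);
N = length(xi);
Ad = zeros(N,M);              %design matrix A_ij = X_j(x_i)/sigma_i
for j = 1:M
Ad(:,j) = Xfun{j}(xi(:))./sigi(:);
end
b = yi(:)./sigi(:);
D = Ad'*Ad;                   %D = A^T A
C = inv(D);                   %covariance matrix
A = D\(Ad'*b);                %best fit parameters, solving D a = B
sigA = sqrt(diag(C));         %parameter uncertainties sqrt(C_jj)
end

For example, [a, siga] = genlinfit({@(x) ones(size(x)), @(x) x, @(x) x.^2}, xi, yi, sigi) should reproduce the weighted quadratic fit performed by the genpolyfit function given later in this lesson.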

Linearizing nonlinear models

Sometimes data following a nonlinear relationship may be transformed to a linear model. One can then estimate the transformed parameters exactly in conjunction with propagation of error. Here is a well known example from biochemistry for the velocity of a chemical reaction illustrating linearization.

    \begin{align*} V = \cfrac{V_{max}[S]}{K_m + [S]} \end{align*}

This equation can be rewritten as

    \begin{align*} \cfrac{1}{V} = \cfrac{K_m}{V_{max}}\cfrac{1}{[S]} + \cfrac{1}{V_{max}} \end{align*}

We start with ([S]_i, V_i, \sigma_{V_i}) as the set of information. \sigma_{V_i} is the standard deviation of the mean of the repeated measurement of V_i. The transformed variables are

    \begin{align*} x_i &= 1/[S]_i \quad y_i = 1/V_i \quad \sigma_i = \cfrac{\sigma_{V_i}}{V_i^2}\\ a_2 &= \cfrac{K_m}{V_{max}} \qquad a_1 = \cfrac{1}{V_{max}} \end{align*}

Propagation of error for a single variable has been used to find \sigma_i. One can then do a weighted fit to a line to find a_1, a_2, \sigma_{a_1}, and \sigma_{a_2}. K_m and V_{max} can then also be found by propagation of error. This transformation generates a linear plot called the Lineweaver-Burk plot.
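
Here is a minimal MATLAB sketch of the transformation, assuming arrays S, V, and sigV holding the substrate concentrations, the mean velocities, and their standard errors (these variable names are our own), and reusing the weighted line fit lineABx from the straight line lesson.

%Lineweaver-Burk linearization sketch; S, V, sigV are assumed to exist
xi   = 1./S;                  %x = 1/[S]
yi   = 1./V;                  %y = 1/V
sigi = sigV./V.^2;            %propagation of error for y = 1/V

[a1 siga1 a2 siga2] = lineABx(xi, yi, sigi);   %a1 = intercept, a2 = slope

Vmax    = 1/a1;               %a1 = 1/Vmax
sigVmax = siga1/a1^2;         %propagation of error for Vmax = 1/a1
Km      = a2/a1;              %a2 = Km/Vmax, so Km = a2/a1
sigKm   = abs(Km)*sqrt((siga1/a1)^2 + (siga2/a2)^2);  %ratio rule (ignores the a1,a2 covariance)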

Another straightforward example of linearization involves exponential decay.

    \begin{align*} N(t) = N_0e^{-kt} \end{align*}

One has the set of information (t_i,N_i,\sigma_{N_i}) where \sigma_{N_i} are the standard deviations of the mean.
Now take the logarithm of both sides.

    \begin{align*} \log(N_i) = \log(N_0) - kt_i \end{align*}

Now we transform

    \begin{align*} x_i &= t_i \quad y_i = \log(N_i) \quad \sigma_i = \sigma_{N_i}/N_i\\ a_1 &= \log(N_0) \quad a_2= -k \end{align*}

If the error bars were originally similar for most points on the linear plot they will be stretched out at longer times after transformation. After finding a_1 and \sigma_{a_1}, you can use error propagation to find N_0 and its uncertainty.

    \[\sigma_{N_0}= e^{a_1}\sigma_{a_1}\]

Example. Fit some simulated exponentially decaying data with linear least squares by linearizing the model.
We generate some model data which are the average of five points of y = 100e^{-t} + normrnd(0,4).

clear;
ti = 0.0:0.1:2.5
N0 = 100;
L = 1;
Ni = 5;
for i = 1:length(ti)
temp = [];
for j = 1:Ni
temp(j) = N0*exp(-L*ti(i))+normrnd(0,4);
end
yi(i) = mean(temp);
ei(i) = std(temp)/sqrt(Ni);
end

figure(1);clf;
errorbar(ti,yi,ei,'.k');
axis([-0.5 3 0 125]);
xlabel('x');
ylabel('N');

figure(2);clf;
zi = log(yi);
si = ei./yi;
errorbar(ti,zi,si,'.k');
axis([-0.5 3 0 5])
xlabel('t');
ylabel('z');
hold on;

[A sigA B sigB chi2] = lineABx(ti, zi, si);
XX = 0:0.1:3
YY = A + XX.*B;
plot(XX,YY,'k');

estN = exp(A)
estsigN = exp(A)*sigA
estL = -B
estsigL = sigB

figure(1);
hold on;
XX = 0:0.01:3
YY = estN.*exp(-estL.*XX);
plot(XX,YY,'k');


Exponential decay with best fit


Linearization and weighted fit

Least squares polynomial

Example. Implement a polynomial least squares fitting function in MATLAB.
Here is the MATLAB function we will use.

function [ A sigA ] = genpolyfit( M, xi, yi, sigi )
% [A sigA ] = genpolyfit( M, xi, yi, sigi)
% Least squares polynomial fit to degree M-1
%

for k = 1:M
X(k,:) = xi.^(k-1);
end

for j = 1:M
for k = 1:M
D(j,k) = sum(sigi.^(-2).*X(j,:).*X(k,:));
end
end

for k = 1:M
B(k) = sum(X(k,:).*yi.*sigi.^(-2));
end

A = inv(D)*B’;
C = inv(D);
for k = 1:M
sigA(k) = sqrt(C(k,k));
end

Example. Simulate some data that can be fit by a parabola.
We can use the least square polynomial function.

clear;figure(1);clf;
A = 1;
B = 2;
C = 3;
Ni = 5;
xi = -10:1:10;
for i = 1:length(xi)
temp = [];
for j = 1:Ni
temp(j) = A + B*xi(i) + C*xi(i).^2 + normrnd(0,10);
end
yi(i) = mean(temp);
si(i) = std(temp)/sqrt(Ni);
end
figure(1);clf;
errorbar(xi,yi,si,'.k', 'markersize',10);
xlabel('x')
ylabel('y')
hold on;
[A sigA] = genpolyfit(3,xi,yi,si);
x1 = -11:0.1:11;
y1 = A(3).*x1.^2 + A(2).*x1 + A(1);
plot(x1,y1, 'r');
axis( [ -11 11 -50 1.1*max(y1)]);


Curve fit to a parabola

GeneralizedLeastSquaresCurveFitting.pdf

Linear Least Squares Curve Fitting of Straight Line Models

Summary

Curve fitting is a useful procedure for testing theoretical models and estimating model parameters and their uncertainties. Curve fitting is carried out by choosing the set of estimated parameters that minimize an error function. The conventional error function for analytic results with calculus is the sum of the squares of residuals. The residuals are the difference between the model prediction and the experimental observation. Weighted fitting takes into account the size of the error bars where measurements with smaller error bars count more heavily towards the minimization. Weighted fits are more robust to the effects of outlying points which generally have larger error bars. The weighted mean is a least squares procedure, fit to y = A, which is used to combine different experimental results of the same quantity. This is how CODATA and NIST calculate the accepted fundamental constants of nature from various experiments. Curve fitting is a tedious calculation and is best done with a software package like MATLAB. We illustrate how to fit the various linear models that are discussed with simulated data in MATLAB. One can either put the total data set into the curve fit or average repeated measurements first and fit those.

The least squares error formula for curve fitting

Suppose we have a set of measurements (x_i, y_i) and we assume the x_i span some range on the x axis, but there may be repetitions. This is in fact recommended: always repeat measurements so you can calculate their uncertainties directly from the data. We will not yet factor in how best to use the uncertainties in curve fitting, but we will consider it later. If we have a model f = A + Bx we define a residual as the difference between the measurement and the model prediction

    \[r_i = y_i - A - Bx_i\]

or in general

    \[r_i = y_i - f(x_i;A,B,\ldots)\]

Now we can start defining some trial error functions. We could choose to minimize the residuals directly.

    \[E = \sum_{i=1}^N (y_i - A - Bx_i)\]

The problem with this error function is that some of the errors will be positive and some will be negative. They basically cancel each other out and you can’t find the best A and B. If we wanted to make the signs the same then we could try absolute values.

    \[E = \sum_{i=1}^N |y_i - A - Bx_i|\]

The problem with this error function is that taking derivatives of absolute value functions is nasty. You can’t get many nice results with calculus using this error function. The next simplest modification of the error function is to take

    \[E = \sum_{i=1}^N (y_i - A - Bx_i)^2\]

Then we look for the least sum of squares of the residuals. This is the convention that most people use, but it is not totally arbitrary. We will discuss in a more advanced lesson some justification for the least squares method.

Fitting y = A + Bx

Expanding the square in E for f(x_i; A,B) = A + Bx_i gives

    \[E = \sum_{i=1}^N (A^2 + 2AB x_i + B^2 x_i^2 - 2A y_i -2B x_i y_i + y_i^2)\]

Using the “S” notation we can write E as

    \[E = A^2S_1 + 2ABS_x + B^2 S_{xx} -2A S_y -2B S_{xy} + S_{yy}\]

where S_1 = N and S_{xy} = \sum_{i=1}^N x_iy_i etc.
The way we find the minimum of E is to take the partial derivatives with respect to A and B and set them equal to zero.

    \begin{align*} \cfrac{\partial E}{\partial A} = 0 \qquad \cfrac{\partial E}{\partial B} = 0 \end{align*}

This gives two equations which can be solved for A and B.

    \begin{align*} {A} S_1 + {B} S_{x}  &=  S_{y} \\ {A} S_{x} + {B}S_{xx} & =  S_{xy}  \end{align*}

This equation can be written as a matrix equation.

    \[\left( \begin{array}{cc} S_1 & S_x \\ S_x & S_{xx} \end{array}\right)  \left( \begin{array}{c} A \\ B\end{array} \right)=  \left( \begin{array}{c} S_y \\ S_{xy}\end{array} \right)\]

As long as the determinant of the first matrix on the left is not zero we can solve these linear equations with Cramer’s rule. The condition is basically that at least two of the x_i be different. You can’t make any sense of the fit if all the measurements are at one point. You need two different x_i to define a linear function.

    \begin{align*} {A} &= \cfrac{   \left| \begin{array}{cc} S_{y} & S_{x}  \\ S_{xy} & S_{xx}   \end{array} \right| }{   \left| \begin{array}{cc} S_1 & S_{x}  \\ S_{x} & S_{xx}   \end{array} \right|} =  \cfrac{S_{y}S_{xx} - S_{x}S_{xy}}{S_1 S_{xx} - S_{x}^2} \\ \\ {B} &= \cfrac{   \left| \begin{array}{cc} S_{1} & S_{y}  \\ S_{x} & S_{xy}   \end{array} \right| }{   \left| \begin{array}{cc} S_1 & S_{x}  \\ S_{x} & S_{xx}   \end{array} \right|} =  \cfrac{S_{1}S_{xy} - S_{x}S_{y}}{S_1 S_{xx} - S_{x}^2} \end{align*}

A common expression is \Delta = NS_{xx} - S_x^2.

Now we find the uncertainties in A and B by using the propagation of error formula.

    \begin{align*} {\sigma}_A^2 &= \sum_{i=1}^N \left( \cfrac{\partial A}{\partial {y}_i}\right)^2 {\sigma}_i^2 \quad  {\sigma}_B^2 = \sum_{i=1}^N \left( \cfrac{\partial B}{\partial {y}_i}\right)^2 {\sigma}_i^2  \end{align*}

If we have not repeated the measurements at each x_i several times then we don’t know what the \sigma_i are. This is a good justification for repeating the measurements. If we used all different x_i then we can just assume that \sigma_i=\sigma_y is the same for all the points. If we use some arguments from statistics we can say

    \[\cfrac{E}{\sigma_y^2} = \cfrac{\displaystyle \sum_{i=1}^N( y_i - A - Bx_i)^2}{\sigma_y^2} \approx N-2\]

There are N points, but two fewer degrees of freedom because we have two fit parameters. Each squared residual is roughly equal to \sigma_y^2. So we can write

    \[\sigma_y^2 = \cfrac{1}{N-2}  \sum_{i=1}^N( y_i - A - Bx_i)^2\]

Now we can do the calculation of the uncertainty. We have

    \[\sum_{i=1}^N y_i = y_1 + y_2 + \ldots + y_N\]

I define 1_i = \cfrac{\partial y_i}{\partial y_i}. It has the properties that 1_i=1 for each i, \sum_{i=1}^N 1_i = N, (1_i)^2 = 1_i, and \sum_{i=1}^N (1_i) x_i = \sum_{i=1}^N x_i.

    \[\left( \cfrac{ \p A}{ \p y_i} \right) = (1_iS_{xx} - S_x x_i)/\Delta\]

    \[\left( \cfrac{ \partial A}{ \partial y_i} \right)^2 = ((1_i)^2S_{xx}^2 + S_x^2 x_i^2 - 2(1_i)S_{xx} S_x x_i)/\Delta^2\]

    \[\sum_{i=1}^N \left( \cfrac{ \partial A}{ \partial y_i} \right)^2 = (NS_{xx}^2 + S_x^2 S_{xx} -2S_{xx} S_x^2)/\Delta^2 = S_{xx}/\Delta\]

    \[{\sigma}_A^2 = (S_{xx}/\Delta) \sigma_y^2\]

    \[\sigma_A = \sigma_y\sqrt{\cfrac{S_{xx}}{\Delta}}\]

A similar calculation for \sigma_B gives

    \[\sigma_B =  \sigma_y \sqrt{\cfrac{N}{\Delta}}\]

If we know each \sigma_i then it makes sense to just do the weighted calculation which we will describe later in this lesson.
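
As a quick sanity check of these formulas (with made-up numbers), the estimates should agree with MATLAB’s built-in polyfit, which performs the same unweighted least squares fit.

%Check the unweighted formulas for A and B against polyfit (made-up data)
clear;
xi = [1 2 3 4 5];
yi = [2.1 3.9 6.2 7.8 10.1];

N   = length(xi);
Sx  = sum(xi);    Sy  = sum(yi);
Sxx = sum(xi.^2); Sxy = sum(xi.*yi);
Delta = N*Sxx - Sx^2;

A = (Sy*Sxx - Sx*Sxy)/Delta;   %intercept
B = (N*Sxy - Sx*Sy)/Delta;     %slope

p = polyfit(xi, yi, 1);        %p(1) is the slope, p(2) is the intercept
disp([A B; p(2) p(1)])         %the two rows should agree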

Fitting y = Bx

For a model like Ohm’s law, V=IR, we want to fit a straight line forced to pass through the origin, y=Bx.
The calculus is similar.

    \[E =  \sum_{i=1}^N  (y_i - Bx_i)^2  = S_{yy} - 2BS_{xy} + B^2 S_{xx}\]

Minimizing with respect to B using \cfrac{dE}{dB} = 0 we find

    \[-2S_{xy} + 2{B} S_{xx} = 0\]

    \[{B} = \cfrac{S_{xy}}{S_{xx}}\]

The propagation of error formula gives the uncertainty of B.

    \[\sigma_B^2 = \sum_{i=1}^N \left( \cfrac{\p B}{\p y_i}\right)^2 \sigma_i^2\]

Again if we don’t know the uncertainties we can use

    \[\sigma_y^2 = \cfrac{1}{N-1} \sum_{i=1}^N (y_i - Bx_i)^2\]

    \[\left( \cfrac{\p B}{\p y_i}\right) = x_i/S_{xx}\]

    \[\left( \cfrac{\p B}{\p y_i}\right)^2 = x_i^2/S_{xx}^2\]

    \[\sigma_B^2 = \sum_{i=1}^N  \left( \cfrac{\p B}{\p y_i}\right)^2 \sigma_y^2 = \cfrac{\sigma_y^2}{S_{xx}}\]

    \[\sigma_B = \cfrac{\sigma_y}{\sqrt{S_{xx}}}\]

Fitting y = A

The last model we will consider is a horizontal line.

    \[E = \sum_{i=1}^N (y_i - A)^2 = S_{yy} - 2A S_y + A^2 S_1\]

So we take the first derivative E'(A) = 0 and find

    \[0 = -2S_y + 2A S_1\]

or

    \[A = S_y/S_1 = S_y/N = \cfrac{ y_1 + y_2 +\ldots + y_N}{N} = \bar{y}\]

That is just the arithmetic mean for A. We can find the standard deviation in the mean by the same procedure we used before.

    \[\left( \cfrac{\p A }{\p y_i}\right) = (1_i)/N\]

    \[\sum_{i=1}^N \left( \cfrac{\partial A }{\partial y_i}\right)^2\sigma_y^2 = \sigma_y^2 \sum_{i=1}^N (1_i)^2/N^2 = \sigma_y^2 /N\]

We use the sample variance for \sigma_y^2, with A = \bar{y}.

    \[\sigma_y^2 = \cfrac{1}{N-1} \sum_{i=1}^N ( y_i - \bar{y})^2\]

So the standard deviation of the mean is given by

    \[\bar{\sigma} = \sigma_y/\sqrt{N}\]


Weighted fits and using chi squared

It is more useful to do curve fitting when you have repeated the measurements. Then you know what the different standard deviations are. If the standard deviation is large then you don’t want to count those points as much as the ones for which the standard deviation is small. Another method of doing least squares curve fitting is to weight the squared residuals by w_i = 1/\sigma_i^2. The weighted sum is called chi-squared, \chi^2.

    \[\chi^2 = \sum_{i=1}^N \cfrac{(y_i - f(x_i;A,B))^2 }{\sigma_i^2}\]

This does not change the math so much that we cannot solve for A and B. You just have to replace each S_? formula with S_{w?}. For example, S_1 \to S_w, S_x \to S_{wx}, and so on.

There are slight differences in how you interpret chi squared as a statistic if you use each pair of data points in the equation or first average the points and calculate their standard deviation of the mean. This boils down to using (x_i,y_i,\sigma_i) or (x_i,y_i = \bar{y}_i, \sigma_i = \bar{\sigma}_i). Often you don’t have access to the individual data points if you look at someone else’s data, so you have to use the average values and the standard deviations of the mean instead of (y_i, \sigma_i). We can then summarize the results from before with the new weighting scheme.

For y = A + Bx.

    \begin{align*}  {A} &= \cfrac{S_{wy}S_{wxx} - S_{wx}S_{wxy}}{S_w S_{wxx} - S_{wx}^2} \qquad &&{\sigma}_A = \sqrt{\cfrac{S_{wxx}}{S_w S_{wxx} - S_{wx}^2} }\\  {B} &= \cfrac{S_{w}S_{wxy} - S_{wx}S_{wy}}{S_w S_{wxx} - S_{wx}^2} \qquad &&{\sigma}_B = \sqrt{\cfrac{S_w}{S_w S_{wxx} - S_{wx}^2} } \end{align*}

For y = Bx. We have B = \cfrac{S_{wxy}}{S_{wxx}}. So we can calculate the uncertainty.

    \[\left( \cfrac{ \partial B}{ \partial y_i} \right) = w_ix_i/S_{wxx}\]

    \[\left( \cfrac{ \partial B}{ \partial y_i} \right)^2 w_i^{-1} = w_i x_i^2/ S_{wxx}^2\]

    \[\sigma_B^2 = \sum_{i=1}^N  \left( \cfrac{ \partial B}{ \partial y_i}\right)^2 w_i^{-1} = \cfrac{1}{S_{wxx}}\]

    \[\sigma_B = \cfrac{1}{\sqrt{S_{wxx}}}\]

Probably the most useful weighted formula is for the weighted mean when y = A= y_w. This allows you to average together measurements.

    \[\chi^2 = \sum_{i=1}^N \cfrac{ (y_i - y_w)^2}{\sigma_i^2} = \sum_{i=1}^N w_i(y_i - y_w)^2\]

    \[\chi^2 = S_{wyy} - 2S_{wy}y_w + S_w y_w^2\]

Taking \cfrac{d\chi^2}{d y_w} = 0 we find

    \[y_w = \cfrac{ S_{wy}}{S_w} = \cfrac{ w_1 y_1 + w_2 y_2 + \ldots + w_N y_N}{w_1 + w_2 + \ldots + w_N}\]

This is referred to as the weighted mean. We can find the uncertainty in the weighted mean

    \[\sigma_{y_w} ^2= \sum_{i=1}^N ( w_i/S_w)^2w_i^{-1} = 1/S_w\]

    \[\sigma_{y_w} = \cfrac{1}{\sqrt{ w_1 + w_2 + \ldots + w_N}}\]

In the limit that all w_i are the same, we recover the arithmetic mean and the standard deviation of the mean for the uncertainty.
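
As a small MATLAB illustration (the two measurements below are invented), we can combine two measurements of the same quantity with different precisions. The more precise measurement dominates the weighted mean, as expected.

%Weighted mean of two made-up measurements: 9.81 +/- 0.03 and 9.87 +/- 0.10
yi   = [9.81 9.87];
sigi = [0.03 0.10];
wi   = sigi.^(-2);
yw   = sum(wi.*yi)/sum(wi)     %weighted mean, roughly 9.815
sigw = 1/sqrt(sum(wi))         %its uncertainty, roughly 0.029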

This formula is quite useful. If you have a series of accurate measurements some of which are precise and others not so much you can combine them to get a more accurate and precise average based on the weighting. Most people are probably already familiar with this in terms of calculating their grades. Say 25 percent for homework, 25 percent for the midterm and 50 percent for the final. Then an average score might look like

    \[\cfrac{ 87 \times 25 + 95 \times 25 + 88 \times 50}{25 + 25 + 50} = 89.5\]

Pretty close, let’s give them an A minus anyways.

MATLAB functions for straight line curve fitting

y = A + Bx

Example. Write a Matlab function that fits a straight line y = A + Bx through a dataset.
We can easily compute all the sums in Matlab. Computing the sums by hand would be quite tedious.

function [ A sigA B sigB chi2] = lineABx( xi, yi, sigi )
%[A sigA B sigB chi2] = lineABx(xi, yi, sigi)
% Least squares fit to a line y = A + Bx
wi = sigi.^(-2);
Sw = sum(wi);
Swx = sum(wi.*xi);
Swxx = sum(wi.*xi.*xi);
Swy = sum(wi.*yi);
Swxy = sum(wi.*xi.*yi);
Delta = Sw*Swxx - Swx^2;
A = (Swy*Swxx - Swx*Swxy)/Delta;
B = (Sw*Swxy - Swx*Swy)/Delta;
sigA = sqrt(Swxx/Delta);
sigB = sqrt(Sw/Delta);
chi2 = sum( wi.*(yi - A - B.*xi).^2);
end

Example. Write a Matlab function that fits an unweighted straight line y = A + Bx through a dataset.
We have

function [ A sigA B sigB] = lineABxU( xi, yi )
%[A sigA B sigB] = lineABxU(xi, yi)
%Unweighted least squares fit to a line y = A + Bx
S = length(xi);
Sx = sum(xi);
Sxx = sum(xi.*xi);
Sy = sum(yi);
Sxy = sum(xi.*yi);
Delta = S*Sxx - Sx^2;
A = (Sy*Sxx - Sx*Sxy)/Delta;
B = (S*Sxy - Sx*Sy)/Delta;
sigma = sqrt(1/(length(xi)-2)*sum((yi - A - B.*xi).^2));
sigA = sigma*sqrt(Sxx/Delta);
sigB = sigma*sqrt(S/Delta);
end

y = Bx

Example. Write a Matlab function to fit a straight line through the origin
We can evaluate the sums easily in Matlab.

function [ B sigB chi2 ] = lineBx( xi, yi, sigi )
%[B sigB chi2] = lineBx(xi, yi, sigi)
%Weighted least squares fit to a line y = Bx
wi = sigi.^(-2);
Swxy = sum(wi.*xi.*yi);
Swxx = sum(wi.*xi.*xi);
B = Swxy/Swxx;
sigB = 1/sqrt(Swxx);
chi2 = sum( wi.*(yi - B.*xi).^2);
end

Example. Write a Matlab function to fit an unweighted straight line through the origin
The code for the unweighted fit is as follows

function [ BU sigBU ] = lineBxU( xi, yi )
%[BU sigBU] = lineBxU(xi, yi)
%Unweighted least squares fit to a line y = Bx
Sxy = sum(xi.*yi);
Sxx = sum(xi.*xi);
BU = Sxy/Sxx;
variance = 1/(length(yi)-1)* sum( (yi - BU.*xi).^2 );
sigma = sqrt(variance);
sigBU = sigma/sqrt(Sxx);
end

y = A

Example. Write a Matlab function to calculate the weighted mean and its uncertainty.
Implementing the theory we have

function [ A sigA chi2] = lineA( xi, yi, sigi )
%[A sigA chi2] = lineA(xi, yi, sigi)
%Least squares fit to a constant y = A
wi = sigi.^(-2);
Sw = sum(wi);
Swy = sum(wi.*yi);
A = Swy/Sw;
sigA = 1/sqrt(Sw);
chi2 = sum( wi.*(yi - A).^2 );
end

MATLAB curve fitting with simulated data

Example. Illustrate a weighted linear fit to y = A + Bx.
For the purpose of simulating data, we will assume a linear model plus random Gaussian noise. The Gaussian noise can be generated with the command \mathrm{normrnd}(\mu, \sigma)=\mathrm{normrnd}(0,3). We will
take N_i = 5 measurements at each point for x = 1,2,3,\ldots, 10, with the model y = 10 + 3x. The goal of this simulation is to compare fitting all the measurements with their standard deviations against fitting the means at each x with their standard deviations of the mean. It will turn out that fitting either way gives identical estimates.

clear; figure(1); clf;
format long;

A = 10;
B = 3;
mu = 0;
sig = 3;
Ni = 5;

x = 1:1:10
allx = [];
ally = [];
allysig = [];
avgy = [];
for i = 1:length(x)
xtemp = [];
temp = [];
tempsig = [];
for j = 1:Ni
err = normrnd(mu,sig);
temp(j) = A + B*x(i) + err;
xtemp = [xtemp x(i)];
end
sigy1 = std(temp);
allx = [allx xtemp];
ally = [ally temp];
allysig = [allysig ones(1,Ni)*sigy1 ];
avgy(i) = [mean(temp)];
avgysig(i) = [sigy1/sqrt(Ni)];
end

[A1 sigA1 B1 sigB1 chi1] = lineABx(allx,ally,allysig);
[A2 sigA2 B2 sigB2 chi2] = lineABx(x, avgy, avgysig);

figure(1);
plot(allx, ally,'.k','markersize',20);
hold on;
plot(x, avgy, 'xr','markersize',20);
axis( [ 0 11 0 1.1*max(ally)]);
xlabel('x','fontsize',20);
ylabel('y','fontsize',20);
x1 = [0 11]
y1 = [ A1 (A1 +B1*11)];
plot(x1,y1);

You can check the fit parameters A1, A2, B1, and B2 to see that they are numerically the same. Using the measurements with the standard deviations is equivalent to using the means with the standard deviations of the mean. Whichever you choose to display is up to you. In physics one usually shows the standard error of the mean because the samples are assumed to be identical and there is often a surplus of points. It might make more sense in biology to show the actual variation of the samples, where they can be different and the number of measurements is smaller. You can modify the code by increasing N_i to show that the parameter estimates become more accurate and precise as N_i is increased.


Model y = 10 + 3x + normrnd(0,3), with five samples using standard deviation and measurements

Model y = 10 + 3x + normrnd(0,3), with five samples using means and standard deviation of the means. We obtain the same fit as before

Example. Show the effect of an outlying point when a linear fit is weighted or unweighted.
We will use simulated data like before but where the last point is an outlier. We will show the effect in an unweighted fit and a weighted fit which is more robust and fits the data better.

clear;
format long;
x = 1:10
y = [ 13.69 14.9 17.1 24.4 25.4 26.6 29.48 32.0 37.36 48.0 ]
sigy = [ 0.58 1.0 1.0 1.2 1.6 1.4 0.86 1.8 0.84 8.1 ]

figure(1);clf;
errorbar(x,y,sigy,'.k','markersize',10);
hold on;
axis( [ 0 11 0 1.3*max(y)]);
xlabel('x','fontsize',20);
ylabel('y','fontsize',20);
title('Effect of an outlying point on unweighted fit','fontsize',18);
[A1 sigA1 B1 sigB1 chi1] = lineABx(x,y, sigy);
[A2 sigA2 B2 sigB2 ] = lineABxU(x, y);

x1 = [0 11]
y1 = [ A1 (A1 +B1*11)];
plot(x1,y1,'k-');
x2 = [0 11]
y2 = [ A2 (A2 +B2*11)];
plot(x2,y2,'k-.');


Model y = 10 + 3x + normrnd(0,3), with five samples, standard deviation of the mean used and an outlying point. The outlying point shifts the slope of the unweighted fit, but does not affect the weighted fit too much.

LinearLeastSquaresCurveFittingOfStraightLineModels.pdf

Propagation of Error with Single and Multiple Independent Variables

Summary

Often it is necessary to calculate the uncertainty of derived quantities. This procedure and convention is called propagation of error. Calculating the propagation of error formula requires knowing how to take derivatives. We give the propagation of error formulas for functions of a single variable and of multiple independent variables. We list some common error propagation formulas that work in most cases. Step by step error propagation can often be used as a shortcut to calculate the uncertainty.
If we know the error propagation formula we can find the terms that are causing the most uncertainty and try to reduce these dominant errors. It is useful to write a short MATLAB script to calculate error propagation, especially when the formula is complex or a lot of significant digits are involved.

How error propagates

As a math problem, I ask you to add 5.2 and 10.11. The result is simply 15.31. The numbers have different numbers of digits and precision: 5.2 is precise to the tenths while 10.11 is precise to the hundredths. This is not a problem in mathematics, but the situation is more subtle in error analysis. To see what is going on you need to look at the uncertainties.

Now, if I ask you to add 5.2 \pm 1.2 and 10.11 \pm 0.76 then what do you do? The first thing you want to ask yourself is if these numbers are properly reported measurements. After a quick check, we see there are two digits of uncertainty that match the precision of the measurement. In error analysis, you still add the numbers to find the sum, but to find the combined error we need a propagation of error formula. In this situation, the rule is

    \[{\sigma}_{A + B} = \sqrt{ {\sigma}_A^2 + {\sigma}_B^2 } = 1.4204\ldots\]

This chapter is all about finding the rules like this. We can now calculate the uncertainty {\sigma}_{A+B} = 1.4204 \to 1.4 and round to two significant digits. Then we match the precision of the calculated quantity to the uncertainty. Our final result is that

    \[(5.2 \pm 1.2) + (10.11 \pm 0.76) = 15.31 \pm 1.4204 \to 15.3 \pm 1.4\]

Notice that if you always start with uncertainties that have two significant digits, you get out a result that has two significant digits of uncertainty. You don’t have to worry that the first number 5.2 has only two significant digits, but the final result 15.3 has three significant digits. This comes about naturally by the golden rule of two significant digits of uncertainty.

There are error propagation rules for single variable functions and multiple variable functions. The single variable situation is not too complicated; you just need to understand differential calculus. For multiple variable functions, you have to worry about whether the variables are correlated or independent. In most cases, we don’t have to worry about correlations and the formulas are straightforward.

Rules of differential calculus refresher

We now do a little review of differentiation because we need derivatives for error propagation. Ordinary derivatives are needed for a single variable function, and partial derivatives are needed for multiple variable functions. The derivative is defined in calculus as

    \begin{align*} f'(x) = \cfrac{df}{dx} = \lim_{\Delta x \to 0} \cfrac{ f( x + \Delta x) - f(x)}{\Delta x} \end{align*}

The following list contains the basic formulas to master differential calculus.

    \[c' = 0 \qquad x' = 1 \qquad (x^n)' = nx^{n-1} \qquad |x|' = \cfrac{|x|}{x}\]

    \[(cu)' = cu' \qquad (c_1u +c_2 v)' = c_1u' + c_2v'\]

    \[(uv)' = u'v + uv' \qquad (1/u)' = -u'/u^2 \qquad (u/v)' = \cfrac{ u'v - uv'}{v^2}\]

    \[f(u(x))' = \cfrac{df}{du}\cfrac{du}{dx} = f'(u)u'(x)\]

    \[(e^x)' = e^x \qquad (a^x)' = a^x \ln a \qquad (\ln x)' = \cfrac{1}{x} \qquad (\log_a x)' = \cfrac{1}{x \ln a}\]

    \[\cfrac{d}{dx} \sin x = +\cos x \quad \cfrac{d}{dx} \tan x = +\sec^2 x \quad \cfrac{d}{dx} \sec x = +\sec x \tan x\]

    \[\cfrac{d}{dx} \cos x = -\sin x \quad \cfrac{d}{dx} \cot x = -\csc^2 x \quad \cfrac{d}{dx} \csc x = -\csc x \cot x\]

    \[\cfrac{d}{dx} \sinh x = +\cosh x \quad \cfrac{d}{dx} \tanh x = +\sech\!^2 x \quad \cfrac{d}{dx} \sech x = - \sech x \,\tanh x\]

    \[\cfrac{d}{dx} \cosh x =+\sinh x \quad \cfrac{d}{dx} \coth x = -\csch\!^2 x \quad \cfrac{d}{dx} \csch x = - \csch x \,\coth x\]

    \[\cfrac{d}{dx} \arcsin x = +\cfrac{1}{\sqrt{ 1 - x^2}} \quad \cfrac{d}{dx} \arctan x = +\cfrac{1}{1 + x^2} \quad \cfrac{d}{dx} \arcsec x = +\cfrac{1}{|x| \sqrt{ x^2-1}}\]

    \[\cfrac{d}{dx} \arccos x = -\cfrac{1}{\sqrt{ 1 - x^2}} \quad \cfrac{d}{dx} \arccot x = -\cfrac{1}{1 + x^2} \quad \cfrac{d}{dx} \arccsc x = -\cfrac{1}{|x| \sqrt{ x^2-1}}\]

    \[\cfrac{d}{dx} \arcsinh x = +\cfrac{1}{\sqrt{ x^2+1}} \quad \cfrac{d}{dx} \arctanh x = +\cfrac{1}{1-x^2} \quad \cfrac{d}{dx} \arcsech x = -\cfrac{1}{x \sqrt{ 1-x^2}}\]

    \[\cfrac{d}{dx} \arccosh x = +\cfrac{1}{\sqrt{ x^2-1}} \quad \cfrac{d}{dx} \arccoth x = +\cfrac{1}{1-x^2} \quad \cfrac{d}{dx} \arccsch x = -\cfrac{1}{|x| \sqrt{1+x^2}}\]

The partial derivative is like the normal derivative except you hold the other variables constant when you take it. Here is the definition

    \[\cfrac{ \partial f(x,y)}{ \partial x} = \lim_{\Delta x \to 0} \cfrac{f(x + \Delta x, y) - f(x,y)}{\Delta x}\]

    \[\cfrac{ \partial f(x,y)}{ \partial y} = \lim_{\Delta y \to 0} \cfrac{f(x, y+ \Delta y) - f(x,y)}{\Delta y}\]

For example, if we wanted to find all the partial derivatives of f(x,y,z) = xy^2z^3 we would use the power rule (x^n)' = nx^{n-1}.
There are three possible partial derivatives

    \[\cfrac{ \partial f}{ \partial x} = y^2 z^3 \quad \cfrac{ \partial f}{ \partial y} = 2xyz^3 \qquad \cfrac{ \partial f}{\partial z} = 3xy^2 z^2\]

Error propagation with one variable

If we have measured A \pm \sigma_A then the uncertainty in the derived quantity Q(A) is given by

    \[{\sigma}_Q = \left | Q'({A}) \right | {\sigma}_A\]

I assume you are familiar with differentiation. One can use a program like Mathematica or your graphing CAS calculator to find derivatives.

Example. The volume of a cube from error propagation.
Suppose a machinist has constructed a precise cube on a milling machine, and you want to find its volume. You repeatedly measure one of the sides to be

    \[s = 1.053 \pm 0.010 \, \mathrm{cm}\]

What is the volume and its uncertainty assuming the length, width, and height are identical?

We calculate the volume as V = s^3 and wait to round until we know the uncertainty.

    \[V = (1.053\, \mathrm{cm})^3 = 1.16757587\ldots \, \mathrm{cm}^3\]

To find the uncertainty we use the error propagation formula

    \begin{align*} {\sigma}_V &= |V'(s)| {\sigma}_s = 3s^2 {\sigma}_s \\ {\sigma}_V&= 3(1.053 \, \mathrm{cm})^2(0.010\, \mathrm{cm}) = 0.033264\ldots\, \mathrm{cm}^3 \end{align*}

After rounding we get V = 1.168(33)\, \mathrm{cm}^3.

Error propagation with multiple independent variables

Often we want to calculate some quantity that requires a series of individual measurements that must be combined. For example, if we wanted to calculate A/B we would first measure A then measure B and then we could calculate A/B. If we are doing error analysis, then to find the error in A/B we will need to apply a formula of error propagation. There is a significant simplification if the two quantities are thought to be independent like the charge and mass of an electron. The error propagation formula for multiple independent variables labeled a_1, a_2, \ldots, a_M is given by

    \[{\sigma}_f^2 = \sum_{j = 1}^M \left( \cfrac{ \partial f}{ \partial a_j}\right)^2 {\sigma}_{a_j}^2\]

We will often use the notation that a_1 = A \quad a_2 = B for simplification when we do calculations. Propagation of error for two independent variables is given by

    \[{\sigma}_f^2 = \left( \cfrac{ \partial f}{\partial A}\right)^2{\sigma}_A^2 + \left( \cfrac{ \partial f}{\partial B}\right)^2{\sigma}_B^2\]

Example. Show that {\sigma}_{A \pm B} = \sqrt{{\sigma}_A^2 + {\sigma}_B^2}.
Evaluating the partial derivatives

    \[\cfrac{ \partial (A \pm B)}{\partial A} = 1 \qquad \cfrac{\partial(A \pm B)}{\partial B} =\pm 1\]

Applying the error propagation formula we have

    \[{\sigma}_{A \pm B} = \sqrt{ (1)^2 {\sigma}_A^2 + (\pm 1)^2 {\sigma}_B^2 } = \sqrt{ {\sigma}_A^2 + {\sigma}_B^2 }\]

Example. Find the perimeter of a rectangle with l = 1.25(22) and w = 4.44(33).
The perimeter is equal to P = 2( l + w ).
We can find the perimeter by plugging in the values
P = 2(1.25 + 4.44) = 11.38
We can find the uncertainty by applying the propagation of error formula

    \[{\sigma}_P^2 = \left( \cfrac{\partial P}{\partial l}\right)^2 {\sigma}_l^2 + \left( \cfrac{\partial P}{\partial w}\right)^2 {\sigma}_w^2\]

    \[\sigma_P = \sqrt{ 4 \sigma_l^2 + 4 \sigma_w^2} = \sqrt{ 4\times 0.22^2 + 4\times 0.33^2} = 0.7932\ldots\]

    \[\sigma_P \to 0.79\]

Our final answer is P = 11.38 \pm 0.79.
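
The same calculation in MATLAB, following the propagation formula above (a minimal sketch):

format long;
l = 1.25;  sigl = 0.22;  % length and its uncertainty
w = 4.44;  sigw = 0.33;  % width and its uncertainty
P = 2*(l + w)                     % perimeter
sigP = sqrt(4*sigl^2 + 4*sigw^2)  % propagated uncertainty

which reproduces P = 11.38 \pm 0.79.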

Example. If Q = AB, then show that

    \[{\sigma}_{AB} = |AB| \sqrt{\cfrac{{\sigma}_A^2}{A^2} + \cfrac{{\sigma}_B^2}{B^2}}\]

Evaluating the partial derivatives

    \[\cfrac{ \partial (AB)}{\partial A} = B \qquad \cfrac{\partial(AB)}{\partial B} = A\]

Applying the error propagation formula we have

    \[{\sigma}_{AB} = \sqrt{ B^2 {\sigma}_A^2 + A^2 {\sigma}_B^2 }\]

Divide both sides by |AB|.

    \[\cfrac{{\sigma}_{AB}}{|AB|} = \cfrac{1}{|AB|}\sqrt{ B^2{\sigma}_A^2 + A^2 {\sigma}_B^2 } =\sqrt{ \cfrac{{\sigma}_A^2}{A^2} + \cfrac{{\sigma}_B^2 }{B^2}}\]

Example. If Q = A/B, then show that

    \[{\sigma}_{A/B} = |A/B| \sqrt{\cfrac{{\sigma}_A^2}{A^2} + \cfrac{{\sigma}_B^2}{B^2}}\]

Evaluating the partial derivatives

    \[\cfrac{ \partial (A/B)}{\partial A} = 1/B \qquad \cfrac{\partial(A/B)}{\partial B} =-A/B^2\]

Applying the error propagation formula we have

    \[{\sigma}_{A/B} = \sqrt{ (1/B)^2 {\sigma}_A^2 + (A^2/B^4) {\sigma}_B^2 }\]

Divide both sides by |A/B|.

    \[\cfrac{{\sigma}_{A/B}}{|A/B|} = \cfrac{1}{|A/B|}\sqrt{ (1/B)^2{\sigma}_A^2 + (A^2/B^4) {\sigma}_B^2 } =\sqrt{ \cfrac{{\sigma}_A^2}{A^2} + \cfrac{{\sigma}_B^2 }{B^2}}\]

Common error formulas

Q= A + B then

    \[{\sigma}_Q = \sqrt{ {\sigma}_A^2 + {\sigma}_B^2}\]

Q = A - B then

    \[{\sigma}_Q = \sqrt{ {\sigma}_A^2 + {\sigma}_B^2}\]

Q = AB then

    \[\cfrac{{\sigma}_Q}{|Q|} = \sqrt{ \cfrac{{\sigma}_A^2}{A^2} + \cfrac{{\sigma}_B^2}{B^2}}\]

Q = A/B then

    \[\cfrac{{\sigma}_Q}{|Q|} = \sqrt{ \cfrac{{\sigma}_A^2}{A^2} + \cfrac{{\sigma}_B^2}{B^2}}\]

Q = AB/C\ldots then

    \[\cfrac{{\sigma}_Q}{|Q|} =\sqrt{ \cfrac{{\sigma}_A^2}{A^2} + \cfrac{{\sigma}_B^2}{B^2} + \cfrac{{\sigma}_C^2}{C^2} + \ldots }\]

Q = A^aB^b/C^c\ldots then

    \[\cfrac{{\sigma}_Q}{|Q|} = \sqrt{ \cfrac{a^2{\sigma}_A^2}{A^2} + \cfrac{b^2{\sigma}_B^2}{B^2} +\cfrac{c^2{\sigma}_C^2}{C^2} + \ldots }\]

Many formulas in physics are of the last form. For example, F = GmM/r^2 or \omega = \sqrt{k/m}.
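
As an illustration of the power-law formula, suppose we had measured a hypothetical spring constant k = 5.00(10) N/m and mass m = 0.2500(50) kg and wanted \omega = \sqrt{k/m} = k^{1/2}m^{-1/2}. The numbers here are made up; the point is that the exponents of 1/2 enter the formula squared.

format long;
k = 5.00;   sigk = 0.10;    % hypothetical spring constant in N/m
m = 0.250;  sigm = 0.005;   % hypothetical mass in kg
w = sqrt(k/m)                                               % angular frequency in rad/s
sigw = w * sqrt( (1/2)^2*sigk^2/k^2 + (1/2)^2*sigm^2/m^2 )  % propagated uncertainty

With these made-up numbers the result rounds to \omega = 4.472(63) rad/s.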

Error propagation step by step

Suppose we want to compute the error in the formula

    \[Q(A,B,C,D) = \cfrac{A + B}{C + D}\]

It is convenient to do it step by step. Find the uncertainty in the numerator and denominator individually then combine the results.
We can let Y = A + B and Z = C+D then find the uncertainties in Y and Z first.

    \[{\sigma}_Y = \sqrt{ {\sigma}_A^2 + {\sigma}_B^2} \qquad {\sigma}_Z = \sqrt{ {\sigma}_C^2 + {\sigma}_D^2}\]

We can then calculate the uncertainty for Y/Z which would be

    \[\cfrac{{\sigma}_Q}{|Q|} =\sqrt{ \cfrac{{\sigma}_Y^2}{Y^2} + \cfrac{{\sigma}_Z^2}{Z^2}}\]

You can check for yourself that you get the same result as if you did the calculation in one step.
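
Here is a quick numerical check in MATLAB with made-up values, comparing the step-by-step result to the full partial-derivative calculation.

format long;
A = 2.0; sigA = 0.1;   % arbitrary values and uncertainties
B = 3.0; sigB = 0.2;
C = 4.0; sigC = 0.3;
D = 5.0; sigD = 0.4;

% step by step
Y = A + B;  sigY = sqrt(sigA^2 + sigB^2);
Z = C + D;  sigZ = sqrt(sigC^2 + sigD^2);
Q = Y/Z;
sigQ_step = abs(Q) * sqrt(sigY^2/Y^2 + sigZ^2/Z^2)

% all at once with partial derivatives
sigQ_full = sqrt( (1/Z)^2*(sigA^2 + sigB^2) + (Y/Z^2)^2*(sigC^2 + sigD^2) )

Both lines print the same number, about 0.0396, up to floating point roundoff.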

If the formula was

    \[f = \cfrac{A-B}{A+B}\]

then we could not do it step by step the same way, because the numerator and the denominator contain the same variables. The contributions from the numerator and denominator are not independent, so their partial derivatives cannot be separated.

The dominant error

The error propagation formula tells you where you should focus your efforts if you want to reduce the uncertainty in a derived quantity.

Example. Suppose you are calculating g from experiments with a pendulum with small oscillations.

    \[\omega^2 =g/l \qquad g = 4\pi^2 \cfrac{l}{T^2}\]

If the fractional uncertainty in the length and period are both ten percent, then how should you proceed to improve the experimental determination of g?
We can easily write down the error propagation formula for this situation:

    \[\cfrac{{\sigma}_g}{g} = \sqrt{ \cfrac{{\sigma}_l^2}{l^2} + 4\cfrac{ {\sigma}_T^2}{T^2}}\]

If the fractional uncertainty in the length is 10 percent and the fractional uncertainty in the period is 10 percent, then the fractional uncertainty in g is

    \[\cfrac{{\sigma}_g}{g} = \sqrt{ 0.1^2 + 4\times 0.1^2 } \approx 0.22\]

The period term contributes four times as much as the length term, so measuring the period more precisely will have the largest effect on reducing the uncertainty in g.
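
A short MATLAB calculation makes the comparison explicit, using the 10 percent fractional uncertainties from the example.

fl = 0.10;                            % fractional uncertainty in the length
fT = 0.10;                            % fractional uncertainty in the period
length_term = fl^2                    % contribution from the length
period_term = 4*fT^2                  % contribution from the period
fg = sqrt(length_term + period_term)  % fractional uncertainty in g

The period term (0.04) is four times the length term (0.01), and fg comes out to about 0.22.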

MATLAB examples

Example. Finding the uncertainty in e^N.
Let N = 3.2524(35); find Q(N) = e^N and its uncertainty.
We know that (e^x)' = e^x. So we easily calculate the uncertainty using

    \[{\sigma}_Q = |e^N| {\sigma}_N = e^N {\sigma}_N\]

Here is what the MATLAB code looks like to calculate that.

format long;
N = 3.2524;         % measured value
sigN = 0.0035;      % its uncertainty
f = exp(N)          % derived quantity e^N
sigf = exp(N)*sigN  % propagated uncertainty
>> f = 25.852311068629906
>> sigf = 0.090483088740205
We do the rounding and find f = 25.852(90)

Example. Finding the uncertainty in \sin(\theta).
What is the uncertainty in Q=\sin( \theta) when \theta = 30.0^\circ \pm 2.5 ^\circ?
First, we convert the numbers to radians. The error propagation formula is then

    \[{\sigma}_Q = | \cos \theta | {\sigma}_\theta\]

It is easy to do all these calculations in MATLAB.

format long;
t = 30;                   % angle in degrees
sigt = 2.5;               % its uncertainty in degrees
trad = t * pi/180;        % convert the angle to radians
sigtrad = sigt * pi/180;  % convert the uncertainty to radians
df = cos(trad)*sigtrad    % propagated uncertainty
f = sin(trad)             % derived quantity
>> 0.03778748675488
>> 0.50000000000000
So the answer is \sin \theta = 0.500(38).

Example. Finding the Stefan-Boltzmann constant and its uncertainty.
What is the value and uncertainty of the Stefan-Boltzmann constant

    \[\sigma = \cfrac{\pi^2 k_B^4}{60 \hbar^3 c^2}\]

Amusingly, the standard notation for the Stefan-Boltzmann constant is already \sigma, so don’t confuse it with an uncertainty.
The propagation of error formula follows the power-law form given above.

    \[{ {\sigma}_{\sigma}} = \sigma \sqrt{4^2\cfrac{ {\sigma}_{k_B}^2}{ k_B^2} + 3^2\cfrac{{\sigma}_{\hbar}^2}{\hbar^2}}\]

Since c is defined it has no uncertainty. Also constant factors like \pi or \sqrt{2} have no uncertainty.
A quick Google search for “NIST hbar”, “NIST c”, and “NIST Boltzmann’s constant” gives the necessary data.

format long;
hb = 1.054571800e-34;   % reduced Planck constant in J s
shb = 0.000000013e-34;  % its uncertainty
c = 299792458;          % speed of light in m/s (exact)
kb = 1.38064852e-23;    % Boltzmann constant in J/K
skb = 0.00000079e-23;   % its uncertainty

SB = pi^2/60 * kb^4/(c^2*hb^3)                 % Stefan-Boltzmann constant
sigSB = SB*(16*skb^2/kb^2+9*shb^2/hb^2)^(0.5)  % propagated uncertainty
>> 5.670366818327269e-08
>> 1.297991325923970e-13

The uncertainty is 0.000013 \times 10^{-8} J m^{-2}s^{-1}K^{-4}. So the final result is

    \[\sigma = 5.670367(13) \times 10^{-8} \mathrm{Jm}^{-2}\mathrm{s}^{-1}\mathrm{K}^{-4}\]

This calculation agrees with the NIST 2016 value.

ErrorPropagationSingleAndMultipleIndependentVariables.pdf

Reporting Measurements and Uncertainties, Significant Digits, and Rounding

Summary

In this lesson, we talk about the number of significant digits used to report measurements and uncertainties. There are conventional rules for the number of significant digits in a number. In practice, the number of significant digits in a measurement is determined by how many significant digits are kept in the uncertainty. The best practice is to use two significant digits of uncertainty and to match the precision of the measurement to the precision of the uncertainty. Reported measurements should be recorded as a value plus or minus another value, with parentheses indicating the error in the last digits, or with scientific notation. For measurements that end in zero or zeros, there may be different conventions for indicating or counting the number of significant digits; an alternative is to use scientific notation, where all the digits shown are significant. Derived quantities calculated from properly reported measurements and uncertainties are reported the same way. Using two significant digits of uncertainty minimizes potential rounding errors.

Correctly Reporting Measurements and Scientific Notation

The best way to learn how to correctly report measurements and uncertainties is to see some examples.
It would be correct to write any of the following.

    \[57.91 \pm 0.46 \quad 57.91(46)\]

    \[(5.791 \pm 0.046) \times 10^1 \quad 5.791(46) \times 10^1\]

Here we have introduced scientific notation. A number in scientific notation is of the form A \times 10^B where 1\le A < 10 and B is an integer. The two digits in parentheses represent the uncertainty in the last two digits of the measurement. It would be incorrect to report any of the following because the precision does not match.

    \[10.47 \pm 0.022 \qquad 10.4 \pm 0.22 \qquad (1.047 \pm 0.22) \times 10^1\]

I would be very suspicious of the data analysis of any scientific report or book that makes this basic error.

Rules for significant digits

The classic rules for the significant digits in a number are as follows. All non-zero digits are significant: in 1204, the digits 1, 2, and 4 are significant for sure. All zeros between non-zero digits are significant: in 1204, the 0 is also significant. Leading zeros are not significant: 0.012 has only two significant digits. Trailing zeros after the decimal point are significant: 12.0 and 0.0400 both have three significant digits. Trailing zeros without a decimal point are not significant: 1000 has only one significant digit.

Keeping 2 significant digits

I see a lot of books that give the rule: only use one significant digit of uncertainty. Unfortunately, this is not what the professionals do. If you go to the NIST or CODATA websites that report the fundamental constants, you will find that they are all given with two digits of uncertainty. We should try to emulate their example.

There is in fact only one rule: round the uncertainty to two significant digits and match the precision of the measurement by rounding. For example, if we have (4200 \pm 43) or (4200 \pm 1500) then we see that 4200 has a different number of significant digits in each instance. In the first case, 4200 has four significant digits; it just happens to end in two zeros. In the second case, 4200 has only two significant digits because the uncertainty has two significant digits and the precision must match. It would be incorrect to write (4212 \pm 1500). The digits 12 are insignificant in this instance and should be changed to zeros.

Rules for Rounding

We also do not write (671941.283962 \pm 1.56932). Although the precision does match, one cannot justify actually knowing that many digits of uncertainty. Such an expression can be fixed, however, and written as (671941.3 \pm 1.6). Notice some rounding is involved. The rules for rounding are as follows. Consider the last two digits of the number as xy, where y is the digit to be dropped. If y is 0, 1, 2, 3, or 4, keep x as it is. If y is 5, 6, 7, 8, or 9, add one to x. Then either remove y or replace it with zero, depending on the situation.

Rounding the uncertainty is easy: (0.0445 \to 0.045), (0.0444 \to 0.044), (12.2 \to 12), (12.9 \to 13), (2501 \to 2500), (2555 \to 2600), (0.999 \to 1.0), and (99.64 \to 100). Notice that 100 still only has two significant digits: the first 1 and the first 0. For example, we could write (145.32\pm 99.9921) as (150 \pm 100).

How to do the calculation?

In practice, we keep all the digits during the calculation and do the rounding at the end. We will usually have an error propagation formula, or a method to calculate the spread of our data, that determines the uncertainty. We round this uncertainty to two significant digits and then match the precision of the calculated measurement to it. For example, (4.221242 \pm 0.43412) \to 4.22(43). If you feel there is ever any ambiguity, use scientific notation to indicate your significant digits.
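
The rounding can also be scripted. Here is a minimal sketch that rounds the uncertainty to two significant digits and matches the precision of the value; the variable names are my own.

val = 4.221242;               % raw value from the calculation
unc = 0.43412;                % raw uncertainty from error propagation
d = floor(log10(unc)) - 1;    % decimal place of the second significant digit
unc2 = round(unc/10^d)*10^d   % uncertainty rounded to two significant digits
val2 = round(val/10^d)*10^d   % value rounded to the same precision

This reproduces 4.22(43) from the example above.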

Uncertainty in derived quantities

If we wanted to add two numbers with uncertainty like (12.1 \pm 1.0) and (11.8 \pm 1.0), then we would get (23.9 \pm 1.4). The uncertainty is calculated as \sqrt{1.0^2 + 1.0^2} in this situation; we will learn this later using propagation of error. If we only used one significant digit, rounding \sqrt{2} \approx 1.41 down to 1 would introduce a 41 percent error in the uncertainty. This is one of the main reasons for using (at least) two significant digits. It is also not necessary by convention to keep an uncertainty to one part in a thousand, so two digits is a sensible compromise.

Exact quantities

Certain fundamental physical constants and mathematical constants are defined exactly, so you do not consider their significant digits. They have no uncertainty; you can pretend they have an infinite number of significant digits. If you are working with the speed of light, then plug into your calculator

    \[c = 299792458 \mathrm{\,ms}^{-1}\]

If you are calculating the area of a circle and your radius is 4.34 then you should use (\pi = 3.1416) just to have more digits than the number you are working with. You can use more digits but it won’t affect the rounding in the end.

Another example, where significant digits do not come into play, is conversion factors. The definition of an inch is that 25.4 mm equals one inch. This doesn’t mean that all your calculations have to come out with 3 significant digits. Keep using the methods that are outlined in this lesson.

ReportingMeasurementsSignificantDigitsAndRounding.pdf