The post Nonlinear Least Squares Curve Fitting appeared first on www.sciencemathmastery.com.

]]>There is no general solution for the minimization of chi-squared to find the best estimates of the parameters when the model is nonlinear. A number of methods have been developed that can often find the best estimates and their uncertainties with arbitrary precision. Iterative methods are usually used to minimize a nonlinear chi-squared. Another brute force method, called the grid method, tests every possible set of parameters with a certain precision. This is usually not possible when the parameter space or the number of parameters are large. Commercial curve fitting software usually implements an iterative method called the Levenberg-Marquardt method. If you have a nonlinear curve fitting software you can also apply it to linear models also if you want, but there is no guarantee you get the analytic solution. In this lesson, we discuss how to implement the grid method for simple problems. We give an example of Newton’s method as a basic iterative method. We explore the gradient descent method, Gauss-Newton method, and the Levenberg Marquardt method both theoretically and in MATLAB. Users of curve fitting software should understand the underlying algorithms and be at least able to check if the software is performing the calculation correctly.

We minimize the error function in least squares theory.

We showed in a previous lesson that if is a linear model, or

Then has a unique global minimum and we can analytically derive expressions for the best estimates of the model parameters and their uncertainties, and .

When is not of the above form then it is nonlinear. An example is

When you take the derivative of for a nonlinear model, you get a nonlinear equation which may have no general analytical solution in terms of linear algebra. One then has to resort to different methods. One typically resorts to iterative methods that require an initial guess for and update rules for the parameters that move towards the minimum of . One repeats this a number of times until the algorithm converges to some minimum of . This may be a local minimum of the actual global minimum. There is not a general test to show that a local minimum is actually the global minimum. It is always a good idea to start with different initial guesses and see if the same minimum is reached. This is no guarantee, but it is generally tolerated.

We want to work on some nonlinear data in this lesson so we can take a simple nonlinear model. For example, a common nonlinear model is a mass oscillating on a spring with damping.

For a certain choice of constants, , the motion is called critically damped and obeys the following solution.

We can add some random noise to that function and take the average of five samples to generate our data set. This way we know how close different approach fit to the actual parameters. Now we generate a dataset in MATLAB and save it as a .mat file.

Example Generate some sample data that represent critical damping.

We take five simulated measurements and average them.

clear;figure(1);clf;

A = 2;

B = -5;

C = 1;

xi = 0:0.15:5;

Ni = 5;

ymodel = (A+B.*xi).*exp(-C.*xi);

yi = [];

sigi = [];

for i = 1:length(xi)

temp = [];

for j = 1:Ni

temp(j) = (A + B.*xi(i)).*exp(-C.*xi(i)) + normrnd(0,0.5);

end

yi(i) = mean(temp);

sigi(i) = std(temp)/sqrt(Ni);

end

subplot(2,1,1);

plot(xi,ymodel,’.-k’,’markersize’,20);

axis([0 5 -2 3]);

hold on;

xlabel( ‘Time [s]’,’fontsize’,20);

ylabel( ‘Position [m]’,’fontsize’,20);

title(‘True values for critical damping’,’fontsize’,20);

subplot(2,1,2)

errorbar(xi,yi, sigi,’.k’,’markersize’,15);

xlabel( ‘Time [s]’,’fontsize’,20);

ylabel( ‘Position [m]’,’fontsize’,20);

title(‘Simulated values for critical damping’,’fontsize’,20);

axis([0 5 -2 3]);

hold on;

plot(xi,ymodel,’-k’,’markersize’,20)

save criticaldampingdata xi yi sigi

If you have the computational power then there is a brute force approach to curve fitting. This method is called the grid method. Suppose from the graph of some linear data you can tell that is between 3 and 4 and is between 0.5 and 1.5. Then you can calculate for every value in between to 0.01 or 0.001 resolution. You then just find the least one by direct search in Matlab. Of course, those might take 10,000 or 100,000 calculations but you will get the same answer as the analytical result. You can do a coarse grid to find local minima and then do a fine grid to find the global minimum. One advantage of the grid method is that you don’t need to know much theory and it is pretty obvious how it works. If you cannot narrow the parameter space to include the global minimum then you might require too many calculations to perform the optimization.

We will now study some iterative methods that can be used to minimize chi-squared. Grid method is not really an iterative method like the rest. You try to cast a net over the parameter surface. You see which loop of the net produces the minimum, then you cast a finer net over that loop and rapidly converge towards a minimum. This method is good because you get out of it what you expect. When you have lots of parameters it might not be possible to cast a fine enough net to progress because there are too many combinations.

Example. Implement the grid method on Matlab to fit a straight line. Compare the results to

We can implement the grid method to fit a straight line model. It works just the same with nonlinear models.

%Grid Method

clear; clf;

A = 10;

B = 3;

Ni = 15;

xi = 1:2:15

for i = 1:length(xi)

tx = [];

for j = 1:Ni

tx(j) = A + B*xi(i) + normrnd(0,3);

end

ei(i) = std(tx)/sqrt(Ni);

yi(i) = sum(tx)/Ni;

end

errorbar(xi,yi,ei,’.k’, ‘markersize’,10);

xlabel(‘trial’)

ylabel(‘measurement’)

hold on;

[A sigA B sigB] = lineABx(xi,yi,ei)

A

sigA

B

sigB

a = 9.5:0.0001:10.5;

b = 2.9:0.0001:3.1;

for j = 1:length(a)

for k = 1:length(b)

chisquared(j,k) = sum( 1./ei.^2.*(yi-b(k).*xi – a(j)).^2 );

end

end

g = min(min(chisquared));

[I J] = find(chisquared == g);

AA = a(I);

BB = b(J);

A

AA

B

BB

We can represent in terms of vectors. Let

We define the residual vector.

We define the weight matrix which is diagonal.

So we can write as

We go back and forth between column and row vectors using the transpose property. Some properties of transpose are

When you multiply an matrix with an matrix you get an matrix. You can use size(A) command in Matlab to see how your matrices are arranged. You can use the transpose operator A’ to convert a column vector to a row vector and vice versa. You may have to check your expression if you haven’t kept track of your transposes.

We will be taking partial derivatives of with respect to the parameters which is the Jacobian matrix.

We also have the Hessian matrix which is defined as

Here is an important result from Matrix calculus for a symmetric matrix .

We will explore Newton’s method because it is a simple example of an iterative method which we will be using to minimize . Consider the following diagram.

Illustration of Newton’s method

We have

We can rearrange the expression as

So if we know the initial guess, we can find . Then we can plug back in in the next iteration to find and so on.

We get closer and closer to the point where with each iteration.

Example Find the square root of two to high precision.

We can try out Newton’s method to calculate the square root of 2. If we have the equation,

Then we can write . This function goes to zero at so we can apply Newton’s method.

clear; format long;

N = 10;

x(1) = 2.

for i = 1:N

x(i+1) = x(i) – (x(i)^2 -2)/(2*x(i));

end

If we look at the contents of we find,

n x_n

1 2

2 1.500

3 1.416666666666667

4 1.414215686274510

5 1.414213562374690

6 1.414213562373095

7 1.414213562373095

The update function just produces fractions or rational numbers if you choose an initial guess like 2. In old times, before the calculator one could calculate the square root of any number to sufficient accuracy by finding the fraction, then doing long division in the last step. It is quite a good method, 15 digits are perfect after the 6th iteration.

There are a number of things that can go wrong with Newton’s method. Each successive update may result in oscillation or the updates may be directed to another solution you are not looking for. The best way to find a root is to make a plot of near the root and plan your initial guess such that it climbs down the hill to the root avoiding any bumps or being too far away from the minimum. Also, your function may have a restricted domain or singularities where division by zero occurs. These will obviously crash Newton’s method.

The Gauss-Newton algorithm is another iterative curve fitting method.

One can arrive at the update formula by considering a Taylor expansion of with respect to an update .

At the minimum of , then

Then we have

We want to take the derivative of with respect to and set it equal to zero. Remember the important derivative.

We can take the transpose of that expression to get

Remember that is symmetric. Solving for gives the Gauss Newton update rule.

To implement the Gauss-Newton method we only need to code for the update rule. Gauss-Newton seems to be more picky about the initial guess than other methods.

Example. Implement the Gauss-Newton method to fit the model data

We change the initial guess and update the delta rule.

clear;clf;figure(1);

load modeldata.mat

errorbar(xi, ]yi, sigi,’.k’);

hold on;

xlabel(‘time [s]’, ‘fontsize’,20);

ylabel(‘distance [m]’, ‘fontsize’,20);

N = length(xi);

W = zeros(N,N);

for i = 1:N

W(i,i) = 1/sigi(i)^2;

end

%some initial random guess

A(1) = 1.4

B(1) = -4.3

C(1) = 1.3

gamma = 0.00001; %Arbitrary step size

S = 10000; %Number of steps

for s = 1:S

for i = 1:N

J(i,1) = exp(-C(s)*xi(i));

J(i,2) = xi(i)*exp(-C(s)*xi(i));

J(i,3) = (A(s) + B(s)*xi(i) )*(-xi(i))*exp(-C(s)*xi(i));

ri(i) = (yi(i) – (A(s)+ B(s)*xi(i))*exp(-C(s)*xi(i)));

end

% we really only have to change one line of code for delta

delta = inv(J’*W*J)*J’*W*ri’;

A(s+1)=A(s)+delta(1);

B(s+1)=B(s)+delta(2);

C(s+1)=C(s)+delta(3);

end;

ymodel = (A(end)+B(end).*xi).*exp(-C(end).*xi);

plot(xi, ymodel,’r’);

title(‘Gauss Newton fits’,’fontsize’,20);

Gauss-Newton works

Imagine as a multidimensional hyper-surface of the possible fit parameters, . A choice of corresponds to a location on the hypersurface. We can use the result from vector calculus that will point towards a local minimum. We can then step in that direction and repeat the process. We take an arbitrary step size of .The gradient of chi squared is calculated as

The gradient points up the slope of a function so our update rule should point in the opposite direction. We choose the step size to be proportional to which is something we set arbitrarily. We can make it and see if the calculation converges.

Example Implement gradient descent with the model data described earlier in this chapter.

We load the data and do the calculation

%Imprementing gradiate descent

clear;clf;figure(1);

load modeldata.mat

errorbar(xi, yi, sigi,’.k’);

hold on;

xlabel(‘time [s]’, ‘fontsize’,20);

ylabel(‘distance [m]’, ‘fontsize’,20);

N = length(xi);

W = zeros(N,N);

for i = 1:N

W(i,i) = 1/sigi(i)^2;

end

%some initial random guess

A(1) = .2

B(1) = -.2

C(1) = .3

gamma = 0.00001; %Arbitrary step size

S = 10000; %Number of steps

for s = 1:S

for i = 1:N

J(i,1) = exp(-C(s)*xi(i));

J(i,2) = xi(i)*exp(-C(s)*xi(i));

J(i,3) = (A(s) + B(s)*xi(i) )*(-xi(i))*exp(-C(s)*xi(i));

ri(i) = (yi(i) – (A(s)+ B(s)*xi(i))*exp(-C(s)*xi(i)));

end

delta = 2*gamma*ri*W*J;

A(s+1)=A(s)+delta(1);

B(s+1)=B(s)+delta(2);

C(s+1)=C(s)+delta(3);

end;

ymodel = (A(end)+B(end).*xi).*exp(-C(end).*xi);

plot(xi, ymodel,’r’);

title(‘Gradient descent fit’,’fontsize’,20);

Gradient descent works

The Levenberg-Marquardt algorithm is one of the most powerful curve fitting methods available. Most commercial software packages use it in one form or another. It is robust, meaning that it will converge to a local minimum for a wide range of initial guesses. It does not always find the global minimum. The Levenberg-Marquardt algorithm can be thought of as a combination of the Gauss-Newton method and the gradient descent method. To derive the Levenberg-Marquardt algorithm one starts along the same lines as the Gauss-Newton method and we arrive at

The Levenberg-Marquardt method slightly modifies this equation to

The diag symbol means only the components along the diagonal are non-zero.

The update can be solved for by an inverse matrix.

The Levenberg-Marquardt update rule is given by

Example Implement the Levenberg Marquardt algorithm in Matlab

Here is the code

clear; clf; figure(1);

load criticaldampingdata

errorbar(xi, yi, sigi,’.k’);

hold on;

xlabel(‘time [s]’, ‘fontsize’,20);

ylabel(‘distance [m]’, ‘fontsize’,20);

N = length(xi);

W = zeros(N,N);

for i = 1:N

W(i,i) = 1/sigi(i)^2;

end

%some initial random guess

A(1) = 1;

B(1) = -4;

C(1) = 1.3;

S = 10000;

lambda = 0.01;

for s = 1:S

for i = 1:N

J(i,1) = exp(-C(s)*xi(i));

J(i,2) = xi(i)*exp(-C(s)*xi(i));

J(i,3) = (A(s) + B(s)*xi(i) )*(-xi(i))*exp(-C(s)*xi(i));

ri(i) = (yi(i) – (A(s)+ B(s)*xi(i))*exp(-C(s)*xi(i)));

end

Q = J’*W*J;

QQ = zeros(3,3);

QQ(1,1) = Q(1,1)*(1+lambda);

QQ(2,2) = Q(2,2)*(1+lambda);

QQ(3,3) = Q(3,3)*(1+lambda);

delta = inv(Q+QQ)*J’*W*ri’;

A(s+1)=A(s)+delta(1);

B(s+1)=B(s)+delta(2);

C(s+1)=C(s)+delta(3);

end;

ymodel = (A(end)+B(end).*xi).*exp(-C(end).*xi);

plot(xi, ymodel,’-.’);

title(‘Levenberg Marquardt fits’,’fontsize’,20);\end{verbatim}

If you can have already found the optimal parameters by one of the various methods we are about to discuss then it is straightforward to calculate the estimated parameter uncertainties. Recall from the generalized linear theory, we had the matrix . There is a second way to calculate this matrix just take

Now we can find and that leads to the parameter uncertainties . One should verify that this method works by testing in on the model to generate .

For nonlinear models we can write out the matrix elements by calculating the derivatives.

Example. Find the parameter estimate for and its uncertainty in the straight line model through the origin by calculating . The formula for when we fit the line through the origin was

We can easily find it is a 1 by 1 matrix.

Since is the inverse of we have

Therefore

This is exactly what we got before with this model using different methods.

Example Calculate for the critical damping problem and then find the uncertainties in the estimates.

For this task, I definitely want to use Mathematica. I notice that is symmetric so we only have to calculate six quantities. To find the derivatives in Mathematica I type the following.

The results are implemented in MATLAB

wi = sigi.^(-2);

A=A(end);

B=B(end);

C=C(end);

E1 = exp(-C.*xi);

E2 = exp(-2.*C.*xi);

P1 = (A + B.*xi);

X2 = xi.*xi;

D(1,1) = sum( wi.*E2);

D(2,2) = sum( wi.*X2.*E2);

D(3,3) = sum( wi.*(E2.*X2.*P1.^2)-E1.*X2.*P1.*(-E1.*P1 + yi));

D(1,2) = sum( wi.*E2.*xi);

D(1,3) = sum( -wi.*E2.*xi.*P1 + wi.*E1.*xi.*(-E1.*P1 + yi));

D(2,3) = sum( -wi.*E2.*X2.*P1 + wi.*E1.*X2.*(-E1.*P1 +yi));

D

M = inv(D)

A

sqrt(M(1,1))

B

sqrt(M(2,2))

C

sqrt(M(3,3))

With five samples the values are usually pretty close to the true parameters. If you increase the number of samples then you get even closer to the true values. I was expecting that the error bars would be larger and enclose the true values, but this is not always the case. The curve fit (dotted-dashed line) is visually pretty close to the true model, yet small differences remain. One can experiment with the code to try different starting models and initial guesses. This exercise was meant to show the LM method and how to implement it in MATLAB. You can easily now adapt the code to your curve fitting functions.

Levenberg Marquardt works. For this plot, the agreement with the true values is reasonable. A = 2.077(93), B = -5.17(12), and C = 1.026(16)

We have to get into further discussions of statistics later to determine what it means to have a good fit. This depends on the value of chi-squared and the number of degrees of freedom. For the meantime, we are stuck with chi by eye.

NonlinearLeastSquaresCurveFitting.pdf

Visit www.sciencemathmastery.com for more lessons on error analysis, probability, and statistics.

The post Nonlinear Least Squares Curve Fitting appeared first on www.sciencemathmastery.com.

]]>The post Generalized Linear Least Squares Curve Fitting appeared first on www.sciencemathmastery.com.

]]>Least squares curve fitting can be generalized to work with any linear model to determine unique parameter estimates and uncertainties in terms of matrix equations. Linear models follow a general form. All other models are classified as nonlinear. While nonlinear models do not have a general analytic solution, some nonlinear models can be transformed into linear models. We give two examples of transforming an exponential model and a transformation of the enzymatic rate equation to straight line models using propagation of error. Polynomials form a class of linear models which we implement in MATLAB.

In least square curve fitting, there is a step where we take derivatives to find the parameter estimates. If those equations are too complicated then we will have a hard time solving them. It is straightforward to solve these equations for so called linear models. Linear models are of the form

One example of a linear model with parameters is a polynomial.

Other linear models might have the replaced with general functions . This gives a lot of different possible functions, but some are unnatural and do not occur in usual data sets.

Linear means the parameter is to the first power out in front. The following model is not linear.

For models of the nonlinear form, the optimization problem is usually quite difficult to solve. We will consider what to do with nonlinear models later.

To perform a curve fit on a model with parameters you need at least different points. A good practice is to have many more measured points than parameters. When models are bloated with a lot of parameters you can usually fit a wide variety of functions. Keep it as simple as possible but not simpler. Use only as few parameters are necessary.

Now with our linear model of parameters we would like to derive the best parameter estimates and their uncertainties in the least squares weighted framework. The model is.

where are some generally defined continuous functions.

To minizise , we set each derivative equal to zero.

This gives a linear system of equations. These “normal equations” give the optimal fit parameters which we can write as a vector.

Taking terms on each side of the equal sign.

It is useful to do some linear algebra to solve these equations for the parameters and their uncertainties. The “design matrix,” , is defined as the row and column matrix

Second we define the column vector of rows.

A third matrix, has inverse .

Now the normal equations can be rewritten as

To solve for the fit parameters it is easy to numerically compute the inverse in MATLAB.

Now, we can find the uncertainty in the parameters by the propagation of error formula.

But the Kroenecker delta because is the inverse of .

so the sum reduces to

These are the uncertainties in the fitting parameters.

The off diagonal terms of also give the covariances. If two different groups use the least squares theory on a linear model they should arrive at the same results for the fitting parameters and their uncertainties which are derived here. To summarize the results for linear least squares models

Sometimes data following a nonlinear relationship may be transformed to a linear model. One can then estimate the transformed parameters exactly in conjunction with propagation of error. Here is a well known example from biochemistry for the velocity of a chemical reaction illustrating linearization.

This equation can be rewritten as

We start with as the set of information. is the standard deviation of the mean of the repeated measurement of . The transformed variables are

Propagation of error for a single variable has been used to find . One can then weighted fit to a line to find , , , and . and can then be found also by propagation of error. This transformation generates a linear plot called the Lineweaver-Burke plot.

Another straightforward example of linearization involves exponential decay.

One has the set of information where are the standard deviations of the mean.

Now take the logarithm of both sides.

Now we transform

If the error bars were originally similar for most points on the linear plot they will be stretched out at longer times after transformation. After finding and , you can use error propagation to find and its uncertainty.

Example. Fit some simulated exponentially decaying data with linear least squares by linearizing the model.

We generate some model data which are the average of five points of + normrnd(0,4).

clear;

ti = 0.0:0.1:2.5

N0 = 100;

L = 1;

Ni = 5;

for i = 1:length(ti)

temp = [];

for j = 1:Ni

temp(j) = N0*exp(-L*ti(i))+normrnd(0,4);

end

yi(i) = mean(temp);

ei(i) = std(temp)/sqrt(Ni);

end

figure(1);clf;

errorbar(ti,yi,ei,’.k’);

axis([-0.5 3 0 125]);

xlabel(‘x’);

ylabel(‘N’);

figure(2);clf;

zi = log(yi);

si = ei./yi;

errorbar(ti,zi,si,’.k’);

axis([-0.5 3 0 5])

xlabel(‘t’);

ylabel(‘z’);

hold on;

[A sigA B sigB chi2] = lineABx(ti, zi, si);

XX = 0:0.1:3

YY = A + XX.*B;

plot(XX,YY,’k’);

estN = exp(A)

estsigN = exp(A)*sigA

estL = -B

estsigL = sigB

figure(1);

hold on;

XX = 0:0.01:3

YY = estN.*exp(-estL.*XX);

plot(XX,YY,’k’);

Exponential decay with best fit

Linearization and weighted fit

Example. Implement a polynomial least squares fitting function in MATLAB.

Here is the MATLAB function we will use.

function [ A sigA ] = genpolyfit( M, xi, yi, sigi )

% [A sigA ] = genpolyfi( M, xi, yi, sigi)

% Least sqaures polynomial fit to degree M-1

%

for k = 1:M

X(k,:) = xi.^(k-1);

end

for j = 1:M

for k = 1:M

D(j,k) = sum(sigi.^(-2).*X(j,:).*X(k,:));

end

end

for k = 1:M

B(k) = sum(X(k,:).*yi.*sigi.^(-2));

end

A = inv(D)*B’;

C = inv(D);

for k = 1:M

sigA(k) = sqrt(C(k,k));

end

Example. Simulate some data that can be fit by a parabola.

We can use the least square polynomial function.

clear;figure(1);clf;

A = 1;

B = 2;

C = 3;

Ni = 5;

xi = -10:1:10;

for i = 1:length(xi)

temp = [];

for j = 1:Ni

temp(j) = A + B*xi(i) + C*xi(i).^2 + normrnd(0,10);

end

yi(i) = mean(temp);

si(i) = std(temp)/sqrt(Ni);

end

figure(1);clf;

errorbar(xi,yi,si,’.k’, ‘markersize’,10);

xlabel(‘x’)

ylabel(‘y’)

hold on;

[A sigA] = genpolyfit(3,xi,yi,si);

x1 = -11:0.1:11;

y1 = A(3).*x1.^2 + A(2).*x1 + A(1);

plot(x1,y1, ‘r’);

axis( [ -11 11 -50 1.1*max(y1)]);’

Curve fit to a parabola

GeneralizedLeastSquaresCurveFitting.pdf

The post Generalized Linear Least Squares Curve Fitting appeared first on www.sciencemathmastery.com.

]]>The post Linear Least Squares Curve Fitting of Straight Line Models appeared first on www.sciencemathmastery.com.

]]>Curve fitting is a useful procedure for testing theoretical models and estimating model parameters and their uncertainties. Curve fitting is carried out by choosing the set of estimated parameters that minimize an error function. The conventional error function for analytic results with calculus is the sum of the squares of residuals. The residuals are the difference between the model prediction and the experimental observation. Weighted fitting takes into account the size of the error bars where measurements with smaller error bars count more heavily towards the minimization. Weighted fits are more robust to the effects of outlying points which generally have larger error bars. The weighted mean is a least squares procedure, fit to , which is used to combine different experimental results of the same quantity. This is how CODATA and NIST calculate the accepted fundamental constants of nature from various experiments. Curve fitting is a tedious calculation and is best done with a software package like MATLAB. We illustrate how to fit the various linear models that are discussed with simulated data in MATLAB. One can either put the total data set into the curve fit or average repeated measurements first and fit those.

Suppose we have a set of measurements and we assume the span some range on the x axis, but there may be repetitions. This is in fact recommended. Always repeat measurements so you can calculate their uncertainties directly from the data. We will not factor in how to use the uncertainties the best way in curve fitting yet, but we will consider it later. If we have a model we define a residual as the difference between the model and the measurement

or in general

Now we can start defining some trial error functions. We could choose to minimize the residuals directly.

The problem with this error function is that some of the errors will be positive and some will be negative. They basically cancel each other out and you can’t find the best and . If we wanted to make the signs the same then we could try absolute values.

The problem with this error function is that taking derivatives of absolute value functions is nasty. You can’t get many nice results with calculus using this error function. The next simplest modification of the error function is to take

Then we look for the least squares of the residuals. This is the convention that most people use, but it is not totally arbitrary. We will discuss in a more advanced lesson some justification for the least squares method.

Expanding the 9 terms in for gives

Using the “S” notation we can write as

where and etc.

The way we find the minimum of is to take the partial derivatives with respect to and and set them equal to zero.

This gives two equation which can be solved for and .

This equation can be written as a matrix equation.

As long as the determinant of the first matrix on the left is not zero we can solve these linear equations with Cramer’s rule. The condition is basically that two of the be different. You can’t make any sense of the fit if all the measurements are just at one point. You need two different to define a linear function.

A common expression is .

Now we find the uncertainties in and by using the propagation of error formula.

If we have not repeated the measurements at each several times then we don’t know what the are. This is a good justification for repeating the measurements. If we used all different then we can just assume that is the same for all the points. If we use some arguments from statistics we can say

There are N points, but two degrees of freedom less because we have two fit parameters. The expression in each parenthesis is roughly equal to . So we can write

Now we can do the calculation of the uncertainty. We have

I define . It has the properties that for each . , ,

A similar calculation for gives

If we know each then it makes sense to just do the weighted calculation which we will describe later in this lesson.

For a model like Ohm’s law, , we want to fit a straight line forced to pass through the origin, .

The calculus is similar.

Minimizing for using we find

The propagation of error formula gives the uncertainty of .

Again if we don’t know the uncertainties we can use

The last model we will consider is a horizontal line.

.

So we take the first derivative and find

or

That is just the arithmetic mean for . We can find the standard deviation in the mean by the same procedure we used before.

We use for where .

So the standard deviation of the mean is given by

.

It is more useful to do curve fitting when you have repeated the measurements. Then you can know what the different standard deviations are. If the standard deviation is large then you don’t want to count those points as much as the ones for which the standard deviation is small. Another method of doing least squares curve fitting is to weight the residuals by . This sum is called chi squared, .

This does not change the math so much that we cannot solve for and . You just have to replace each formula with . For example, , , and so on.

There are slight differences in how you interpret chi squared as a statistic if you use each pair of data points in the equation or first average the points and calculate their standard deviation of the mean. This boils down to using or . Often you don’t have access to the individual data points if you look at someone else’s data, so you have to use the average values and the standard deviations of the mean instead of . We can then summarize the results from before with the new weighting scheme.

For .

For . We have . So we can calculate the uncertainty.

Probably the most useful weighted formula is for the weighted mean when . This allows you to average together measurements.

Taking we find

This is referred to as the weighted mean. We can find the uncertainty in the weighted mean

In the limit, that all are the same we recover the arithmetic mean and the standard deviation of the mean for the uncertainty.

This formula is quite useful. If you have a series of accurate measurements some of which are precise and others not so much you can combine them to get a more accurate and precise average based on the weighting. Most people are probably already familiar with this in terms of calculating their grades. Say 25 percent for homework, 25 percent for the midterm and 50 percent for the final. Then an average score might look like

Pretty close, let’s give them an A minus anyways.

Example. Write a Matlab function that fits a straight line through a dataset.

We can easily compute all the sums in Matlab. Computing the sums by hand would be quite tedious.

function [ A sigA B sigB chi2] = lineABx( xi, yi, sigi )

%[A sigA B sigB chi2] = lineBx(xi, yi, sigi)

% Least sqaures fit to a line y = A + Bx

wi = sigi.^(-2);

Sw = sum(wi);

Swx = sum(wi.*xi);

Swxx = sum(wi.*xi.*xi);

Swy = sum(wi.*yi);

Swxy = sum(wi.*xi.*yi);

Delta = Sw*Swxx – Swx^2;

A = (Swy*Swxx – Swx*Swxy)/Delta;

B = (Sw*Swxy – Swx*Swy)/Delta;

sigA = sqrt(Swxx/Delta);

sigB = sqrt(Sw/Delta);

chi2 = sum( wi.*(yi – A- B.*xi).^2)

end

Example. Write a Matlab function that fits an unweighted straight line through a dataset.

We have

function [ A sigA B sigB] = lineABxU( xi, yi )

%[A sigA B sigB] = lineBx(xi, yi)

%Unweighted least sqaures fit to a line y = A + Bx

S = length(xi);

Sx = sum(xi);

Sxx = sum(xi.*xi);

Sy = sum(yi);

Sxy = sum(xi.*yi);

Delta = S*Sxx – Sx^2;

A = (Sy*Sxx – Sx*Sxy)/Delta;

B = (S*Sxy – Sx*Sy)/Delta;

sigma = sqrt(1/(length(xi)-2)*sum((yi – A – B.*xi).^2));

sigA = sigma*sqrt(Sxx/Delta);

sigB = sigma*sqrt(S/Delta);

end

Example. Write a Matlab function to fit a straight line through the origin

We can evaluate the sums easily in Matlab.

function [ B sigB ] = lineBx( xi, yi, sigi )

%[B sigB chi2] = lineBx(xi, yi, sigi)

%Weightef least sqaured fit to a line y = Bx

wi = sigi.^(-2);

Swxy = sum(wi.*xi.*yi);

Swxx = sum(wi.*xi.*xi);

B = Swxy/Swxx;

sigB = 1/sqrt(Swxx);

chi2 = sum( wi.*(yi – B.*xi).^2)

end

Example. Write a Matlab function to fit an unweighted straight line through the origin

The code for the unweighted fit is as follows

function [ BU sigBU ] = lineBxU( xi, yi )

%[B sigB] = lineBx(xi, yi )

%Weightef least sqaured fit to a line y = Bx

Sxy = sum(xi.*yi);

Sxx = sum(xi.*xi);

BU = Sxy/Sxx;

variance = 1/length(yi)* sum( (yi – BU).^2 );

sigma = sqrt(variance);

sigBU = sigma/sqrt(Sxx);

end

Example. Write a Matlab function to calculated the weighted mean and its uncertainty.

Implementing the theory we have

function [ A sigA chi2] = lineA( xi, yi, sigi )

%[A sigA chi2] = lineA(xi, yi, sigi)

%Least sqaures fit to a constant y = A

wi = sigi.^(-2);

Sw = sum(wi);

Swy = sum(wi.*yi);

A = Swy/Sw;

sigA = 1/sqrt(Sw);

chi2 = sum( wi.*(yi – A).^2 );

end

Example. Illustrate a weighted linear fit to .

For the purpose of simulating, data we will assume a linear model plus random gaussian noise. The gaussian distribution noise can be generated with the command . We will

take measurements at each point for . With the model . The goal of this simulation is to compare fitting the measurements and their standard deviations with the means at each and their standard deviation of the mean. It will turn out that fitting either way gives identical estimates.

clear; figure(1); clf;

format long;

A = 10;

B = 3;

mu = 0;

sig = 3;

Ni = 5;

x = 1:1:10

allx = [];

ally = [];

allysig = [];

avgy = [];

for i = 1:length(x)

xtemp = [];

temp = [];

tempsig = [];

for j = 1:Ni

err = normrnd(mu,sig);

temp(j) = A + B*x(i) + err;

xtemp = [xtemp x(i)];

end

sigy1 = std(temp);

allx = [allx xtemp];

ally = [ally temp];

allysig = [allysig ones(1,Ni)*sigy1 ];

avgy(i) = [mean(temp)];

avgysig(i) = [sigy1/sqrt(Ni)];

end

[A1 sigA1 B1 sigB1 chi1] = lineABx(allx,ally,allysig);

[A2 sigA2 B2 sigB2 chi2] = lineABx(x, avgy, avgysig);

figure(1);

plot(allx, ally,’.k’,’markersize’,20);

hold on;

plot(x, avgy, ‘xr’,’markersize’,20);

axis( [ 0 11 0 1.1*max(ally)]);

xlabel(‘x’,’fontsize’,20);

ylabel(‘y’,’fontsize’,20);

x1 = [0 11]

y1 = [ A1 (A1 +B1*11)];

plot(x1,y1);

You can check the fit parameters A1, A2, B1, and B2 to see that they are numerically the same. Using the measurements with the standard deviations is equivalent to using the means with the standard deviations of the mean. Whichever you choose to display is up to you. In physics usually one shows the standard error of the mean because the samples are assumed to be identical and there are often a surplus of points. It might make more sense in biology to show the actual variation of the samples where they can be different and the number of measurements is less. You can modify the code by increasing to show that the parameter estimates become more accurate and precise as is increased.

Model y = 10 + 3x + normrnd(0,3), with five samples using standard deviation and measurements

Model y = 10 + 3x + normrnd(0,3), with five samples using means and standard deviation of the means. We obtain the same fit as before

Example. Show the effect of an outlying point when a linear fit is weighted or unweighted.

We will use simulated data like before but where the last point is an outlier. We will show the effect in an unweighted fit and a weighted fit which is more robust and fits the data better.

clear;

format long;

x = 1:10

y = [ 13.69 14.9 17.1 24.4 25.4 26.6 29.48 32.0 37.36 48.0 ]

sigy = [ 0.58 1.0 1.0 1.2 1.6 1.4 0.86 1.8 0.84 8.1 ]

figure(1);clf;

errorbar(x,y,sigy,’.k’,’markersize’,10);

hold on;

axis( [ 0 11 0 1.3*max(y)]);

xlabel(‘x’,’fontsize’,20);

ylabel(‘y’,’fontsize’,20);

title(‘Effect of an outlying point on unweighted fit’,’fontsize’,18);

[A1 sigA1 B1 sigB1 chi1] = lineABx(x,y, sigy);

[A2 sigA2 B2 sigB2 ] = lineABxU(x, y);

x1 = [0 11]

y1 = [ A1 (A1 +B1*11)];

plot(x1,y1,’k-‘);

x2 = [0 11]

y2 = [ A2 (A2 +B2*11)];

plot(x2,y2,’k-.’);

Model y = 10 + 3x + normrnd(0,3), with five samples, standard deviation of the mean used and an outlying point. The outlying point shifts the slope of the unweighted fit, but does not effect the weighted fit too much.

LinearLeastSquaresCurveFittingOfStraightLineModels.pdf

The post Linear Least Squares Curve Fitting of Straight Line Models appeared first on www.sciencemathmastery.com.

]]>The post Propagation of Error with Single and Multiple Independent Variables appeared first on www.sciencemathmastery.com.

]]>Often it is necessary to calculate the uncertainty of derived quantities. This procedure and convention is called propagation of error. Calculating the propagation of error formula requires knowing how to take derivatives/ We give the propagation of error formulas for a single variable and multiple independent variable functions. We list some common error propagation formulas that work in most cases. Step by step error propagation can often be used as a shortcut to calculate the uncertainty.

If we know the error propagation formula we can find the terms that are causing the most uncertainty and try to reduce these dominant errors. It is useful to write a short MATLAB script to calculate error propagation, especially when the formula is complex or a lot of significant digits are involved.

As a math problem, I ask you to add 5.2 and 10.11. The result is straightforward 15.31. The numbers have different numbers of digits and precision. 5.2 is precise to the tenths while 10.11 is precise to the hundredths. This is not a problem in mathematics, but the situation is more subtle in error analysis. To see what is going on you need to look at the uncertainties.

Now, if I ask you to add and then what do you do? The first thing you want to ask yourself is if these numbers are properly reported measurements. After a quick check, we see there are two digits of uncertainty that match the precision of the measurement. In error analysis, you still add the numbers to find the sum, but to find the combined error we need a propagation of error formula. In this situation, the rule is

This chapter is all about finding the rules like this. We can now calculate the uncertainty and round to two significant digits. Then we match the precision of the calculated quantity to the uncertainty. Our final result is that

Notice that if you always start with uncertainties that have two significant digits, you get out a result that has two significant digits of uncertainty. You don’t have to worry that the first number 5.2 has only two significant digits, but the final result 15.3 has three significant digits. This comes about naturally by the golden rule of two significant digits of uncertainty.

There are error propagation rules for single variable functions and multiple variable functions. The single variable situation is not too complicated. You just need to understand differential calculus. For the situation with multiple variables functions, you have to worry if the variables are correlated or independent. In most cases, we don’t have to worry about correlations and the formulas are straightforward.

We now do a little review of differentiation because we need derivatives for error propagation. Ordinary derivatives are needed for a single variable function, and partial derivatives are needed for multiple variable functions. The derivative is defined in calculus as

The following list contains the basic formulas to master differential calculus.

The partial derivative is like the normal derivative except you hold the other variables constant when you take it. Here is the definition

For example, if we wanted to find all the partial derivatives of we would use the power rule .

There are three possible partial derivatives

If we have measured then the uncertainty in is given by

I assume you are familiar with differentiation. One can use a program like Mathematica or your graphing CAS calculator to find derivatives.

Example. The volume of a cube from error propagation.

Suppose a machinist has constructed a precise cube on a milling machine, and you want to find its volume. You repeatedly measure one of the sides to be

What is the volume and its uncertainty assuming the length, width, and height are identical?

We calculate the volume as and wait to round until we know the uncertainty.

To find the uncertainty we use the error propagation formula

After rounding we get .

Often we want to calculate some quantity that requires a series of individual measurements that must be combined. For example, if we wanted to calculate we would first measure then measure and then we could calculate . If we are doing error analysis, then to find the error in we will need to apply a formula of error propagation. There is a significant simplification if the two quantities are thought to be independent like the charge and mass of an electron. The error propagation formula for multiple independent variables labeled is given by

We will often use the notation that for simplification when we do calculations. Propagation of error for two independent variables is given by

Example. Show that .

Evaluating the partial derivatives

Applying the error propagation formula we have

Example. Find the perimeter of a rectangle of and .

The perimeter is equal to

We can find the perimeter by plugging in the values

We can find the uncertainty by applying the propagation of error formula

Our final answer is

Example. If , then show that

Evaluating the partial derivatives

Applying the error propagation formula we have

Divide both sides by .

Example. If , then show that

Evaluating the partial derivatives

Applying the error propagation formula we have

Divide both sides by .

then

then

then

then

then

then

Many formulas in physics are of the last form. For example, or .

Suppose we want to compute the error in the formula

It is convenient to do it step by step. Find the uncertainty in the numerator and denominator individually then combine the results.

We can let and then find the uncertainties in and first.

We can then calculate the uncertainty for which would be

You can check for yourself that you get the same result as if you did the calculation in one step.

If the formula was

then we couldn’t do it step by step the same way because the numerator and the denominator contain the same variables that repeat. The partial derivatives of the numerator and denominator cannot be separated.

The error propagation formula tells you where you should focus your efforts if you want to reduce the uncertainty in a derived quantity.

Example. Suppose you are calculating from experiments with a pendulum with small oscillations.

If the fractional uncertainty in the length and period are both ten percent, then how should you proceed to improve the experimental determination of ?

The error propagation formula for this situation.

We can easily write down the error propagation formula

If the fractional uncertainty in the length is 10 percent and the fractional uncertainty in the period is 10 percent then the expression in the square root is

Clearly measuring the period more precisely will have the largest effect on reducing the dominant error in the uncertainty.

Example. Finding the uncertainty in .

Let find and the uncertainty.

We know that . So we easily calculate the uncertainty using

Here is what the Matlab code looks like to calculate that.

format long;

N = 3.2524;

sigN = 0.0035;

f = exp(N)

sigf = exp(N)*sigN

>> f = 25.852311068629906

>> sigf = 0.090483088740205

We do the rounding and find

Example. Finding the uncertainty in .

What is the uncertainty in when ?

First, we convert the numbers to radians. The error propagation formula is then

It is easy to do all these calculations in Matlab.

format long;

t = 30;

sigt = 2.5;

trad = 30 * pi/180;

sigtrad = 2.5 * pi/180;

df = cos(trad)*sigtrad

f = sin(trad)

>> 0.03778748675488

>> 0.50000000000000

So the answer is .

Example. Finding the Stefan-Boltzmann constant and its uncertainty.

What is the value and uncertainty of the Stefan-Boltzmann constant

It’s funny that the notation for the Stefan-Boltzmann constant is already , don’t get confused.

The propagation of error formula is a common one.

Since is defined it has no uncertainty. Also constant factors like or have no uncertainty.

A quick google for “NIST hbar”, “NIST c”, and “NIST Boltzmann’s constant” gives the necessary data.

format long;

hb = 1.054571800e-34;

shb = 0.000000013e-34;

c = 299792458;

kb = 1.38064852e-23;

skb = 0.00000079e-23;

SB = pi^2/60 * kb^4/(c^2*hb^3)

sigSB = SB*(16*skb^2/kb^2+9*shb^2/hb^2)^(0.5)

>> 5.670366818327269e-08

>> 1.297991325923970e-13

The uncertainty is J msK. So the final result is

This calculation agrees with the NIST 2016 value.

ErrorPropagationSingleAndMultipleIndependentVariables.pdf

The post Propagation of Error with Single and Multiple Independent Variables appeared first on www.sciencemathmastery.com.

]]>The post Reporting Measurements and Uncertainties, Significant Digits, and Rounding appeared first on www.sciencemathmastery.com.

]]>In this lesson, we talk about the number of significant digits used to report measurements and uncertainties. There are conventional rules for the number of significant digits in a number. Experts find that the actual number of significant digits in a measurement is a function of how many significant digits in the uncertainty are used. The best practice is to use two significant digits of uncertainty and to match the precision of the measurement and the uncertainty. Reported measurements should be recorded as a value plus or minus another value, with parenthesis indicating the error in the last digits, or with scientific notation. For measurements that end in zero or zeros, there may be different conventions for indicating or calculating the number of significant digits. An alternative method is to use scientific notation, where all the digits are significant. Calculation of derived quantities from properly reported measurements and uncertainties also make new reportable quantities. Using two significant digits of uncertainty minimizes potential rounding errors.

The best way to learn how to correctly report measurements and uncertainties is to see some examples.

It would be correct to write any of the following.

We have introduced here scientific notation. A number in scientific notation is of the form where and is an integer. The two digits in parenthesis represent the uncertainty in the last two digits of the measurement. It would be incorrect to report any of these because the precision does match.

I would be very suspicious of the data analysis of any scientific report or book that makes this basic error.

The classic rules for the significant digits in a number are as follows. All non-zero digits are significant. In 1204 1, 2, and 4 are significant for sure. All zeroes between non-zero digits are significant. In 1204, the 0 is also significant. Leading zeros are not significant. 0.012 has only two significant digits Trailing zeroes after the decimal point are significant. 12.0 and 0.0400 both have three significant digits. Trailing zeroes without a decimal point are not significant. 1000 has only one significant digit.

I see a lot of books that give the rule, only use one significant digit of uncertainty. Unfortunately, this is not what the professionals are doing. If you go to the NIST or CODATA website that reports the fundamental constants. They are all given with two digits of uncertainty. We should try to emulate their example.

There is in fact only one rule. **Round to two significant digits of uncertainty and match the precision of the measurement by rounding.** For example, if we have or then we see that 4200 has a different number of significant digits in each instance. In the first case, 4200 has four significant digits, it just happens to end in two zeros. In the second case, 4200 has only two significant digits because the uncertainty has two significant digits and the precision must match. It would be incorrect to write . The digits 12 are insignificant in this instance and should be changed to zeros.

We also do not write . Although the precision does match, one cannot justify actually knowing that many digits of uncertainty. Such an expression can be fixed however and written as . Notice some rounding is involved. The rules for rounding are as follows. Consider the last two digits of the number as xy. If y is 0, 1, 2, 3, or 4 keep x. If y is 5, 6, 7, 8, or 9 then add one to x. Either remove y or replace y with zero depending on the situation.

Rounding the uncertainty is easy , , , , , , , and . Notice that 100 still only has two significant digits the first 1 and the first 0. For example, we could write as .

In practice, we keep all the digits for our calculation but then do the rounding at the end. We will usually have an error propagation formula or a method to calculate the spread of our data determining the uncertainty. We then round this to two significant digits. Then we can match our calculator value of the measurement. For example, If you feel there is ever any ambiguity then use scientific notation to indicate your significant digits.

If we wanted to add two numbers with uncertainty like and then we would get . The uncertainty is calculated as in this situation. We will learn this later using propagation of error. If we only used one significant digit there would be a major 41 percent rounding error, . This is one of the main reasons for using (at least) two significant digits. It is not necessary to keep an uncertainty to one part in a thousand by convention.

Certain fundamental physical constants or mathematical constants are defined so you don’t consider their significant digits. They have no uncertainty. You can pretend they have an infinite number of significant digits. If you are working with the speed of light then plug into your calculator

If you are calculating the area of a circle and your radius is then you should use just to have more digits than the number you are working with. You can use more digits but it won’t affect the rounding in the end.

Another example, where significant digits do not come into play, is conversion factors. The definition of an inch is that 25.4 mm equals one inch. This doesn’t mean that all your calculations have to come out with 3 significant digits. Keep using the methods that are outlined in this lesson.

ReportingMeasurementsSignificantDigitsAndRounding.pdf

The post Reporting Measurements and Uncertainties, Significant Digits, and Rounding appeared first on www.sciencemathmastery.com.

]]>