Generalized Linear Least Squares Curve Fitting


Least squares curve fitting can be generalized to any linear model, yielding unique parameter estimates and uncertainties in terms of matrix equations. Linear models follow a general form; all other models are classified as nonlinear. While nonlinear models have no general analytic solution, some can be transformed into linear models. We give two examples, transforming an exponential model and the enzymatic rate equation into straight-line models with the help of propagation of error. Polynomials form a class of linear models which we implement in MATLAB.

Linear and nonlinear models

In least squares curve fitting, there is a step where we take derivatives to find the parameter estimates. If those equations are too complicated, we will have a hard time solving them. They are straightforward to solve for so-called linear models, which have the form

    \begin{align*} f(x_i; \mathbf{a})&= a_1 X_1(x_i) + a_2 X_2(x_i) + \ldots + a_M X_M(x_i) \end{align*}

One example of a linear model with M parameters is a polynomial.

    \begin{align*} f(x_i;\mathbf{a}) &= a_1 + a_2x_i + \ldots + a_{M}x_i^{M-1} \end{align*}

Other linear models replace the x_i with general functions X_j(x_i). This allows a great variety of models, though some are unnatural and rarely arise from real data sets. For example,

    \begin{align*} f(x_i;\mathbf{a}) &= a_1 \sin(x_i) + a_2 e^{x_i} + \ldots + a_{M}\ln(x_i) \end{align*}
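To see that such a model really is linear in its parameters, here is a minimal sketch (in Python with NumPy rather than the MATLAB used later; the data and names are ours) fitting f(x; a) = a_1 sin(x) + a_2 e^x by ordinary linear least squares:

```python
import numpy as np

# Model f(x; a) = a1*sin(x) + a2*exp(x): nonlinear in x, but linear in the
# parameters a1 and a2, so linear least squares applies directly.
x = np.linspace(0.0, 2.0, 20)
a_true = np.array([3.0, 0.5])
y = a_true[0] * np.sin(x) + a_true[1] * np.exp(x)   # noise-free synthetic data

# Design matrix with one column per basis function X_j(x_i)
A = np.column_stack([np.sin(x), np.exp(x)])
a_fit, *_ = np.linalg.lstsq(A, y, rcond=None)
```

With noise-free data the recovered coefficients match a_true to machine precision; with noisy data the same call returns the weighted-average best fit.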

Linear means each parameter appears only to the first power, as a multiplying coefficient out in front. The following model is not linear.

    \begin{align*} f(x_i;\mathbf{a}) &= a_1 \sin(a_2 x_i) + a_3 e^{a_4 x_i} \end{align*}

For models of the nonlinear form, the optimization problem is usually quite difficult to solve. We will consider what to do with nonlinear models later.

To perform a curve fit on a model with M parameters you need at least M data points, and good practice is to have many more measured points than parameters. A model bloated with parameters can usually fit a wide variety of functions, so keep it as simple as possible, but no simpler: use only as many parameters as necessary.
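The M-point minimum is easy to see with a parabola: M = 3 parameters are pinned down exactly by N = 3 points, leaving zero residual and no information about the measurement scatter. A quick sketch (Python/NumPy; data values are ours):

```python
import numpy as np

# A parabola has M = 3 parameters, so N = 3 points determine it exactly:
# the "fit" passes through every point and leaves zero residual.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 0.0, 3.0])
coeffs = np.polyfit(x, y, deg=2)        # 3 parameters, 3 points
residual = y - np.polyval(coeffs, x)
```

Only with N > M points do the residuals begin to measure goodness of fit.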

The generalized least squares problem

Now, with our linear model of M parameters, we would like to derive the best parameter estimates and their uncertainties in the weighted least squares framework. The model is

    \[f(x_i;a_1,a_2,...,a_M) = f(x_i; \mathbf{a}) = \sum_{k=1}^M a_k X_k(x_i)\]

where the X_k(x_i) are general, fixed functions of x_i. The weighted chi-squared to minimize is

    \begin{align*} \chi^2 &= \sum_{i=1}^N \cfrac{({y}_i- \sum_{k=1}^M a_kX_k(x_i))^2}{{\sigma}_i^2} \end{align*}

To minimize \chi^2, we set each derivative equal to zero.

    \[\cfrac{\partial}{ \partial a_k} \chi^2=0\]

This gives a system of M linear equations, one for each k = 1, 2, \ldots, M. These “normal equations” determine the optimal fit parameters, which we collect in the vector \mathbf{a}.

    \[\sum_{i=1}^N \left ( y_i - \sum_{j=1}^M a_j X_j(x_i) \right) \cfrac{X_k(x_i)}{\sigma_i^2} =0\]

Separating the terms on each side of the equal sign,

    \begin{align*} \sum_{i=1}^N \cfrac{y_i X_k(x_i)}{\sigma_i^2}=\sum_{i=1}^N \sum_{j=1}^M {a}_j X_j(x_i)\cfrac{X_k(x_i)}{{\sigma}_i^2} \end{align*}

It is useful to use some linear algebra to solve these equations for the parameters and their uncertainties. The “design matrix” A_{ij} is defined as the N-row by M-column matrix

    \begin{align*} A_{ij} = \cfrac{X_j(x_i)}{{\sigma}_i} \end{align*}

Second, we define the column vector \mathbf{B} of M rows; in matrix notation \mathbf{B} = \mathbf{A}^T\mathbf{b}, where b_i = y_i/\sigma_i.

    \begin{align*} B_k = \sum_{i=1}^N \cfrac{{y}_iX_k(x_i)}{{\sigma}_i^2} \end{align*}

Third, we define the M\times M matrix \mathbf{D}, whose inverse we call \mathbf{C}.

    \begin{align*} {D}_{jk} &= \sum_{i=1}^N \cfrac{X_j(x_i)X_k(x_i)}{{\sigma}_i^2} \to \mathbf{D} = \mathbf{A}^T \mathbf{A}\\ \mathbf{C}&= \mathbf{D}^{-1} \end{align*}

Now the normal equations can be rewritten as

    \begin{align*} \mathbf{D} \mathbf{{a}} = \mathbf{B} \end{align*}

To solve for the fit parameters \mathbf{a}, it is easy to compute the inverse numerically in MATLAB.

    \begin{align*} \mathbf{{a}}& = \mathbf{D}^{-1} \mathbf{B} = \mathbf{C}\mathbf{B} \\ {a}_j &= \sum_{k=1}^M C_{jk} \left[ \sum_{i=1}^N \cfrac{y_iX_k(x_i)}{{\sigma}_i^2}\right] \end{align*}

General parameter uncertainties

Now, we can find the uncertainty in the parameters by the propagation of error formula.

    \begin{align*} {\sigma}_{a_j}^2 &= \sum_{i=1}^N \left( \cfrac{\partial a_j}{\partial {y}_i}\right)^2{\sigma}_i^2\\ \cfrac{\partial a_j}{\partial {y}_i}&= \sum_{k=1}^M \cfrac{C_{jk} X_{k}(x_i)}{{\sigma}_i^2}\\ {\sigma}^2_{a_j}&= \sum_{k=1}^M \sum_{l=1}^M C_{jk} C_{jl} \left(\sum_{i=1}^N \cfrac{X_k(x_i)X_l(x_i)}{{\sigma}_i^2} \right)\\ {\sigma}^2_{a_j}&=\sum_{k=1}^M \sum_{l=1}^M C_{jk} C_{jl} D_{lk} \end{align*}

But \displaystyle \sum_{l=1}^M C_{jl}D_{lk} = \delta_{jk}, the Kronecker delta, because \mathbf{C} is the inverse of \mathbf{D}.

    \begin{align*} \delta_{ij} = \begin{cases} 1, & \text{if } i=j,\\ 0, & \text{if } i\neq j. \end{cases} \end{align*}

so the sum reduces to

    \begin{align*} {\sigma}^2_{{a}_j} = C_{jj} \end{align*}

These are the uncertainties in the fitting parameters.
The off-diagonal terms of \mathbf{C} give the covariances between parameters. If two different groups apply least squares theory to the same linear model, they should arrive at the same fit parameters and uncertainties derived here. To summarize the results for linear least squares models,

    \begin{align*} {a}_j &= \sum_{k=1}^M C_{jk} \left[ \sum_{i=1}^N \cfrac{{y}_iX_k(x_i)}{{\sigma}_i^2}\right] \qquad {\sigma}_{{a}_j} = \sqrt{C_{jj}} \end{align*}
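These summary formulas translate almost line-for-line into code. Here is a minimal sketch in Python with NumPy (the worked examples below use MATLAB; the function name and test data here are ours):

```python
import numpy as np

def linear_lsq(X_funcs, x, y, sigma):
    """Weighted linear least squares for f(x; a) = sum_k a_k X_k(x).

    Returns the estimates a_j and uncertainties sigma_{a_j} = sqrt(C_jj),
    following the matrix formulation above.
    """
    # Design matrix A_ij = X_j(x_i) / sigma_i  (N rows, M columns)
    A = np.column_stack([f(x) / sigma for f in X_funcs])
    D = A.T @ A                      # D = A^T A
    C = np.linalg.inv(D)             # covariance matrix C = D^-1
    B = A.T @ (y / sigma)            # B_k = sum_i y_i X_k(x_i) / sigma_i^2
    a = C @ B                        # a = C B
    return a, np.sqrt(np.diag(C))

# Example: weighted straight-line fit y = a1 + a2*x
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x                    # exact line, so the fit is exact
sigma = np.full_like(x, 0.5)
a, sig_a = linear_lsq([np.ones_like, lambda t: t], x, y, sigma)
```

For this straight-line case the returned variances can be checked against the closed-form two-parameter result.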

Linearizing nonlinear models

Sometimes data following a nonlinear relationship can be transformed to fit a linear model. One can then estimate the transformed parameters exactly and recover the original parameters by propagation of error. Here is a well-known example from biochemistry, the Michaelis-Menten equation for the velocity of an enzymatic reaction, illustrating linearization.

    \begin{align*} V = \cfrac{V_{max}[S]}{K_m + [S]} \end{align*}

This equation can be rewritten as

    \begin{align*} \cfrac{1}{V} = \cfrac{K_m}{V_{max}}\cfrac{1}{[S]} + \cfrac{1}{V_{max}} \end{align*}

We start with the data set ([S]_i, V_i, \sigma_{V_i}), where \sigma_{V_i} is the standard deviation of the mean of the repeated measurements of V_i. The transformed variables are

    \begin{align*} x_i &= 1/[S]_i \quad y_i = 1/V_i \quad \sigma_i = \cfrac{\sigma_{V_i}}{V_i^2}\\ a_2 &= \cfrac{K_m}{V_{max}} \qquad a_1 = \cfrac{1}{V_{max}} \end{align*}

Propagation of error for a single variable gives \sigma_i. One can then perform a weighted fit to a line to find a_1, a_2, \sigma_{a_1}, and \sigma_{a_2}; K_m and V_{max} then follow, again by propagation of error. This transformation generates the well-known Lineweaver-Burk plot.
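The whole pipeline, transform, weighted line fit, back-transform, fits in a few lines. A sketch in Python/NumPy (the data and names are ours; noise-free data is used so the recovery is exact):

```python
import numpy as np

# Sketch of the Lineweaver-Burk linearization.
# Start from ([S]_i, V_i, sigma_{V_i}) and transform as in the text.
S = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
Vmax_true, Km_true = 10.0, 2.0
V = Vmax_true * S / (Km_true + S)        # noise-free Michaelis-Menten data
sigma_V = 0.05 * V                       # assumed measurement uncertainties

x = 1.0 / S                              # transformed variables
y = 1.0 / V
sigma = sigma_V / V**2                   # propagation of error for y = 1/V

# Weighted straight-line fit y = a1 + a2*x (normal equations for a line)
w = 1.0 / sigma**2
Delta = np.sum(w) * np.sum(w * x**2) - np.sum(w * x)**2
a1 = (np.sum(w * x**2) * np.sum(w * y) - np.sum(w * x) * np.sum(w * x * y)) / Delta
a2 = (np.sum(w) * np.sum(w * x * y) - np.sum(w * x) * np.sum(w * y)) / Delta

Vmax = 1.0 / a1                          # invert the transformation
Km = a2 / a1
```

With real noisy data the fit would not be exact, and the uncertainties in K_m and V_{max} would follow from \sigma_{a_1}, \sigma_{a_2}, and their covariance.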

Another straightforward example of linearization involves exponential decay.

    \begin{align*} N(t) = N_0e^{-kt} \end{align*}

One has the set of information (t_i,N_i,\sigma_{N_i}) where \sigma_{N_i} are the standard deviations of the mean.
Now take the logarithm of both sides.

    \begin{align*} \log(N_i) = \log(N_0) - kt_i \end{align*}

Now we transform

    \begin{align*} x_i &= t_i \quad y_i = \log(N_i) \quad \sigma_i = \sigma_{N_i}/N_i\\ a_1 &= \log(N_0) \quad a_2= -k \end{align*}

If the error bars were originally similar across the points, the transformation stretches them out at longer times on the linearized plot. After finding a_1 and \sigma_{a_1}, since N_0 = e^{a_1}, single-variable error propagation gives the uncertainty in N_0.

    \[\sigma_{N_0}= e^{a_1}\sigma_{a_1}\]
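Similarly, since k = -a_2, the decay constant inherits the slope's uncertainty directly.

    \[\sigma_{k} = \sigma_{a_2}\]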

Example. Fit simulated exponentially decaying data with linear least squares by linearizing the model.
We generate model data in which each point is the average of five samples of y = 100e^{-t} + normrnd(0,4).

ti = 0.0:0.1:2.5;
N0 = 100;
L = 1;
Ni = 5;
for i = 1:length(ti)
    temp = [];
    for j = 1:Ni
        temp(j) = N0*exp(-L*ti(i)) + normrnd(0,4);
    end
    yi(i) = mean(temp);
    ei(i) = std(temp)/sqrt(Ni);
end
errorbar(ti, yi, ei, '.k');
axis([-0.5 3 0 125]);

% transform to the linear model z = log(N0) - L*t
zi = log(yi);
si = ei./yi;
errorbar(ti, zi, si, '.k');
axis([-0.5 3 0 5]);
hold on;

% weighted straight-line fit z = A + B*t
[A sigA B sigB chi2] = lineABx(ti, zi, si);
XX = 0:0.1:3;
YY = A + XX.*B;
plot(XX, YY, 'r');

% transform the fit parameters back
estN = exp(A)
estsigN = exp(A)*sigA
estL = -B
estsigL = sigB

% plot the fit over the original (unlogged) data
hold on;
XX = 0:0.01:3;
YY = estN.*exp(-estL.*XX);
plot(XX, YY, 'r');

Exponential decay with best fit

Linearization and weighted fit

Least squares polynomial

Example. Implement a polynomial least squares fitting function in MATLAB.
Here is the MATLAB function we will use.

function [ A sigA ] = genpolyfit( M, xi, yi, sigi )
% [A sigA] = genpolyfit( M, xi, yi, sigi)
% Least squares polynomial fit of degree M-1

% basis functions X_k(x) = x^(k-1)
for k = 1:M
    X(k,:) = xi.^(k-1);
end

% D_jk = sum_i X_j(x_i) X_k(x_i) / sigma_i^2
for j = 1:M
    for k = 1:M
        D(j,k) = sum(sigi.^(-2).*X(j,:).*X(k,:));
    end
end

% B_k = sum_i y_i X_k(x_i) / sigma_i^2
for k = 1:M
    B(k) = sum(X(k,:).*yi.*sigi.^(-2));
end

C = inv(D);
A = C*B';
for k = 1:M
    sigA(k) = sqrt(C(k,k));
end
end

Example. Simulate some data that can be fit by a parabola.
We can use the least squares polynomial function.

A = 1;
B = 2;
C = 3;
Ni = 5;
xi = -10:1:10;
for i = 1:length(xi)
    temp = [];
    for j = 1:Ni
        temp(j) = A + B*xi(i) + C*xi(i).^2 + normrnd(0,10);
    end
    yi(i) = mean(temp);
    si(i) = std(temp)/sqrt(Ni);
end
errorbar(xi, yi, si, '.k', 'markersize', 10);
hold on;
[A sigA] = genpolyfit(3, xi, yi, si);
x1 = -11:0.1:11;
y1 = A(3).*x1.^2 + A(2).*x1 + A(1);
plot(x1, y1, 'r');
axis([-11 11 -50 1.1*max(y1)]);

Curve fit to a parabola