MATLAB Examples 2 (covering Statistics Lectures 3 and 4)

Contents

Example 1: Fit a linearized regression model
Example 2: Fit a parametric nonlinear model
Example 3: Another optimization example

Example 1: Fit a linearized regression model

% generate some random data
x = -20:20;
y = 1.5*x.^2 - 60*x + 10 + 30*randn(size(x));

% given these data, we will now try to fit a model
% to the data.  we assume we know what model to use,
% namely, y = ax^2 + bx + c where a, b, and c are
% free parameters.  notice that this was the model
% we used to generate the data.

% since our model is linearized (i.e. linear in its free parameters,
% even though it is nonlinear in x), we can express it using simple
% matrix notation.

% construct the regressor matrix
X = [x(:).^2 x(:) ones(length(x),1)];

% estimate the free parameters of the model
% using ordinary least-squares (OLS) estimation
w = inv(X'*X)*X'*y(:);
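
% (aside: we write out inv(X'*X)*X'*y explicitly to mirror the OLS
% formula.  in practice, MATLAB's backslash operator computes the same
% least-squares solution in a more numerically stable way.  a minimal
% check, not part of the original example:)
walt = X\y(:);  % walt should match w up to numerical precision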

% what are the estimated weights?
% (compare to the true values of a = 1.5, b = -60, and c = 10)
w
w =

           1.5368275257135
          -60.638944801903
          8.15661270608602

% what is the model fit?
modelfit = X*w;

% what is the squared error achieved by this model fit?
squarederror = sum((y(:)-modelfit).^2);
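
% (aside: a common way to summarize fit quality is R^2, the proportion
% of variance in the data explained by the model.  a minimal sketch:)
R2 = 1 - squarederror/sum((y(:)-mean(y(:))).^2);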

% visualize
figure;
hold on;
scatter(x,y,'r.');
ax = axis;
xx = linspace(ax(1),ax(2),100)';
yy = [xx.^2 xx ones(length(xx),1)]*w;
plot(xx,yy,'k-');
xlabel('x');
ylabel('y');
title(sprintf('squared error = %.1f',squarederror));

Example 2: Fit a parametric nonlinear model

% generate some random data
x = 0:.05:3;
y = 5*x.^3.5 + 5*randn(size(x));

% given these data, we will now try to fit a model
% to the data.  we assume we know what model to use,
% namely, y = ax^n where a and n are free parameters.
% notice that this was the model we used to generate the data.

% define optimization options: display progress at each iteration,
% error-check function values (NaN/Inf), impose no limit on the number
% of function evaluations or iterations, and use stopping tolerances
% of 1e-6 on the objective function and on the parameters.
options = optimset('Display','iter','FunValCheck','on', ...
                   'MaxFunEvals',Inf,'MaxIter',Inf, ...
                   'TolFun',1e-6,'TolX',1e-6);

% define bounds for the parameters
%              a    n
paramslb = [-Inf    0];  % lower bound
paramsub = [ Inf  Inf];  % upper bound

% define the initial seed
%            a     n
params0 = [  1     1];

% estimate the free parameters of the model using nonlinear optimization
  % hint: pp is a vector of parameters like [a n],
  %       xx is a vector of x-values like [.1 .4 .5],
  %       and what modelfun does is to output the predicted y-values.
modelfun = @(pp,xx) pp(1)*xx.^pp(2);
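
% quick sanity check of modelfun (illustrative values, not part of the
% fit): with pp = [2 3], the model is y = 2*x^3, so evaluating at
% xx = [1 2 3] should give [2 16 54].
chk = modelfun([2 3],[1 2 3]);  % chk should be [2 16 54]
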
[params,resnorm,residual,exitflag,output] = lsqcurvefit(modelfun,params0,x,y,paramslb,paramsub,options);
                                         Norm of      First-order 
 Iteration  Func-count     f(x)          step          optimality   CG-iterations
     0          3          431921                      7.93e+03
     1          6          431921             10       7.93e+03            0
     2          9          247013            2.5        8.3e+04            0
     3         12          247013        2.81299        8.3e+04            0
     4         15         98286.2       0.703246       1.09e+05            0
     5         18         17315.7        1.40649       3.78e+05            0
     6         21         1894.05       0.327179       2.46e+04            0
     7         24         1482.46       0.939262       9.09e+03            0
     8         27         1241.08       0.422605       1.33e+03            0
     9         30         1237.01     0.00758631            5.5            0
    10         33         1237.01    1.29563e-05       7.82e-05            0

Local minimum possible.

lsqcurvefit stopped because the final change in the sum of squares relative to 
its initial value is less than the selected value of the function tolerance.



% what are the estimated parameters?
% (compare to the true values of a = 5 and n = 3.5)
params
params =

           5.4338041940236          3.42528833322854

% what is the model fit?
modelfit = modelfun(params,x);

% what is the squared error achieved by this model fit?
squarederror = sum((y(:)-modelfit(:)).^2);
% (note that the 'resnorm' output also provides the squared error.)
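
% as a quick check, the difference should be essentially zero:
chkresnorm = resnorm - squarederror;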

% visualize
figure;
hold on;
scatter(x,y,'r.');
ax = axis;
xx = linspace(ax(1),ax(2),100)';
yy = modelfun(params,xx);
plot(xx,yy,'k-');
xlabel('x');
ylabel('y');
title(sprintf('squared error = %.1f',squarederror));

Example 3: Another optimization example

% generate some random data
x = rand(1,1000).^2;

% given these data, we want to find the single value that minimizes the
% sum of the absolute differences between that value and the data points.

% define optimization options, bounds, and the initial seed
% (these are the same options as in the previous example)
options = optimset('Display','iter','FunValCheck','on', ...
                   'MaxFunEvals',Inf,'MaxIter',Inf, ...
                   'TolFun',1e-6,'TolX',1e-6);
paramslb = [];  % [] means no bounds
paramsub = [];
params0 = [0];

% perform the optimization
  % hint: lsqnonlin minimizes the sum of the squares of the vector that
  % costfun returns.  by applying a square-root transformation, the
  % squaring undoes the square root, so we end up minimizing the sum of
  % absolute errors.
costfun = @(pp) sqrt(abs(x-pp));
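
% quick check of this trick (using an arbitrary test value of 0.3, chosen
% purely for illustration): summing the squares of costfun's output should
% reproduce the total absolute error.
chkabs = [sum(costfun(0.3).^2) sum(abs(x-0.3))];  % the two entries should match
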
[params,resnorm,residual,exitflag,output] = lsqnonlin(costfun,params0,paramslb,paramsub,options);
                                         Norm of      First-order 
 Iteration  Func-count     f(x)          step          optimality   CG-iterations
     0          2          346.47                           500
     1          4         346.385    8.61928e-05            490            0
     2          6         341.862     0.00491093            440            0
     3          8         333.538      0.0101418            387            0
     4         10         315.811      0.0259447            306            0
     5         12         308.364      0.0129775            271            0
     6         14         292.776      0.0324211            215            0
     7         16          279.22       0.035143            170            0
     8         18          270.08      0.0314543            118            0
     9         20         263.724       0.031138             88            0
    10         22         259.555      0.0265819             67            0
    11         24         258.266      0.0107348             52            0
    12         26         257.181       0.012298             35            0
    13         28         256.649     0.00864949             25            0
    14         30         256.518     0.00310659             19            0
    15         32         256.401     0.00335775             15            0
    16         34         256.317     0.00312291             12            0
    17         36         256.252     0.00307883              7            0
    18         38         256.245    0.000546973              6            0
    19         40         256.236    0.000864426              5            0
    20         42         256.227      0.0010377              4            0
    21         44         256.224    0.000510969              3            0
    22         46          256.22    0.000677599              3            0
    23         48         256.218    0.000404828              1            0
    24         50         256.218    0.000140253              1            0
    25         52         256.217    0.000193302              1            0
    26         54         256.217     0.00021431              1            0
    27         56         256.217    0.000205629              1            0
    28         58         256.216    9.93736e-05       5.88e-05            0

Local minimum possible.

lsqnonlin stopped because the final change in the sum of squares relative to 
its initial value is less than the selected value of the function tolerance.



% what is the solution?
params
params =

         0.260043242623885

% what is the median of the data?
median(x)
ans =

         0.260365029783367

% notice that the solution is nearly identical to the median of the data.
% (the small discrepancy comes from the tolerances used in the optimization
% and from the fact that, with an even number of data points, the median
% averages the two middle values.)
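
% (aside: a minimal sketch of a direct check, not part of the original
% example: we can minimize the absolute-error cost with fminsearch, which
% should land on essentially the same value as the lsqnonlin solution.)
paramsalt = fminsearch(@(pp) sum(abs(x-pp)),0);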