MATH661 Homework 3 - Least squares problems

Posted: 09/09/21

Due: 09/15/21, 11:55PM

This assigment addresses one of the fundamental topics within scientific computation: finding economical descriptions of complex objects. Some object is described by 𝒚m (with m typically large), and a reduced description is sought by linear combination 𝑨𝒙, with 𝑨m×n (n<m, often nm). The surprisingly simple Euclidean geometry of Fig. 1 (which should be committed to memory) will be shown to have wide-ranging applicability to many different types of problems. The error (or residual) in approximating 𝒚 by 𝑨𝒙 is defined as

𝒓=𝒃-𝑨𝒙,

and 2-norm minimization defines the least-squares problem

min𝒙m||𝒃-𝑨𝒙||.

Figure 1. Least squares (2-norm error minimization) problem.

1Track 1

Consider data 𝒟={(ti,yi)|i=1,2,,m.} obtained by sampling a function f:, with yi=f(ti). An approximation is sought by linear combination

f(t)x1a1(t)+x2a2(t)++xnan(t).

Introduce the vector-valued function A:n (organized as a row vector)

A(t)=[ a1(t) a2(t) an(t) ],

such that

f(t)A(t)𝒙,𝒙=[ x1 x2 xn ]T.

With 𝒕=[ t1 t2 tm ]T a sampling of the function domain, a matrix is defined by

𝑨=A(𝒕)𝒙=[ a1(𝒕) a2(𝒕) an(𝒕) ]𝒙m×n.

Tasks. In each exercise below, construct the least-squares approximant for the stated range of n𝒩, sample points 𝒕, and choice of A(t). Plot in a single figure all components of A(t). Plot the approximants, as well as f in a single figure. Construct a convergence plot of the approximations by representation of point data ={(logn,log||𝒚-𝑨𝒙||)|𝑨m×n,n𝒩.}. For the largest value of n within 𝒩, construct a figure superimposing increasing number of sampling points, m. Comment on what you observe in each individual exercise. Also compare results from the different exercises.

  1. Start with the classical example due to Runge (1901)

    f:[-1,1],f(t)=1(1+25t2),ti=2(i-1)m-1-1, ={16,32,64,128,256},𝒩={4,8,16,32}, A(t)=[ 1 t t2 tn-1 ].

    Solution. With 𝒕m denoting the sampling point vector, and 𝒚m, the function values at the sample points, the least squares problem is

    min𝒙n||𝒚-𝑨𝒙||,

    where

    𝑨=[ 𝟏 𝒕 𝒕n-1 ].

    The solution to the least squares problem

    𝒛=argmin𝒙n||𝒚-𝑨𝒙||,

    furnishes the approximation

    f(t)f(t)=A(t)𝒛,

    that can be sampled at Mm points to assess approximation error.

    When carrying convergence studies such as these, it is convenient to define functions for common tasks:

    sample. Returns a sample of f:[a,b] at m equidistant points

    function sample(a,b,f,m)
      t = LinRange(a,b,m); y = f.(t)
      return t,y
    end;

    plotLSQ. Constructs a figure with plots of:

    1. The m sample points (i.e., data) represented as black dots;

    2. The function sampled at more points, i.e., Mm;

    3. The approximation sampled at M points

    function plotLSQ(a,b,f,Basis,m,n,M)
      data=sample(a,b,f,m); t=data[1]; y=data[2]
      Data=sample(a,b,f,M); T=Data[1]; Y=Data[2]
      A = Basis(t,n); x = A\y; z = Basis(T,n)*x
      plot(t,y,"ok",T,z,"-r",T,Y,"-b"); grid("on");
      xlabel("t"); ylabel("y");
      title("Least squares approximation")
    end;

    plotConv. Construct a convergence plot as n increases for fixed value of m

    function plotConv(t,y,Basis)
      E=zeros(4,2)
      for p=2:5
        n=2^p; A = Basis(t,n)
        x = A\y; logerr = log(2,norm(y-Basis(t,n)*x))
        E[p-1,1]=p; E[p-1,2]=logerr
      end
      plot(E[:,1],E[:,2],"o-"); grid("on")
      xlabel("log(2,n)"); ylabel("log(2,err)");
      title("Convergence plot");
    end;

    This problem solution is obtained by:

    1. defining the function f

      Runge(t)=1/(1+25*t^2);
    2. defining the basis

      function MonomialBasis(t,n)
        m=size(t)[1]; A=ones(m,1);
        for j=1:n-1    
          A = [A t.^j]
        end
        return A
      end;
    3. Invoking plotLSQ with appropriate parameters

      clf(); plotLSQ(-1,1,Runge,MonomialBasis,16,4,64);
      FigPrefix=homedir()*"/courses/MATH661/images/H03";
      savefig(FigPrefix*"Fig01.eps")

      Figure 2. Least squares approximant (red) of Runge function (blue) sampled at (black dots).

    Once the above are defined, cycling through the parameter ranges is straightforward (open figure folds to see code).

    Figure 3. First n=8 monomial basis functions

    data=sample(-1,1,Runge,128); t=data[1]; A=MonomialBasis(t,8);
    clf();
    for k=1:8
      global t
      plot(t,A[:,k])
    end
    grid("on"); xlabel("t"); ylabel("B");
    title("Monomial basis functions");
    FigPrefix=homedir()*"/courses/MATH661/images/H03";
    savefig(FigPrefix*"MonomialBasis.eps")

    Figure 4. Effect of increasing number of monomial basis functions in least squares approximation of Runge function. Equidistant sample points.

    for p=2:5
      clf(); n=2^p
      for q=4:8
        m=2^q; plotLSQ(-1,1,Runge,MonomialBasis,m,n,1024)
      end
      savefig(FigPrefix*"Fig01n="*string(n)*".eps")
    end

    Just as straightforward is the construction of the convergence plots for m. Note that when m=n=32, the error is 𝒪(ϵmach), and the least squares approximant becomes an interpolant. In all cases, as the number of basis functions n increases, the error decreases, Fig. 5. How to reconcile this observation to Fig. 13, where increasing error is observed at interval endpoints? Note that in Fig. 13, a comparison is made between the approximant and the exact function, both evaluated at more data points than present in the least squares approximation (LSQ). In Fig. 5, the error is evaluated only at points within the data used in the LSQ.

    Figure 5. Convergence of monomial approximation to available data.

    clf();
    for q=5:8
      m=2^q;
      data=sample(-1,1,Runge,m); t=data[1]; y=data[2]
      plotConv(t,y,MonomialBasis)
    end
    savefig(FigPrefix*"FigConv.eps")
  2. Instead of the equidistant point samples of the Runge example above use the Chebyshev nodes

    ti=cos(2i-12mπ),

    keeping other parameters as in Problem 1.

    Solution. Using the Chebyshev nodes corresponds to uniform sampling of the composite function h=fg, h:[δ,π-δ], with

    δ=π2m,f(t)=11+25t2,g(θ)=cos(θ).

    It is straightforward to modify plotLSQ to take an additional g argument

    function plotLSQ(a,b,f,g,Basis,m,n,M)
      data=sample(a,b,g,m); θ=data[1]; t=data[2]; y=f.(t)
      Data=sample(t[1],t[m],f,M); T=Data[1]; Y=Data[2]
      A = Basis(t,n); x = A\y; z = Basis(T,n)*x
      plot(t,y,"ok",T,z,"-r",T,Y,"-b"); grid("on");
      xlabel("t"); ylabel("y");
      title("Least squares approximation")
    end;

    The approximant obtained for m=16 sample points with n=4 basis functions is compared at M=64 points in Fig. 6, with the results for the full parameter sweep shown in Fig. 5. Use of the Chebyshev sample points leads to significantly smaller approximation error upon finer sampling of the domain of definition of f.

    g(θ)=cos(θ); m=16; δ=π/(2*m);
    clf(); plotLSQ(δ,π-δ,Runge,g,MonomialBasis,16,4,64);
    FigPrefix=homedir()*"/courses/MATH661/images/H03";
    savefig(FigPrefix*"Fig04.eps")

    Figure 6. Least squares approximant (red) of Runge function (blue) sampled at (black dots).

    Figure 7. Effect of increasing number of monomial basis functions in least squares approximation of Runge function. Chebyshev sample points.

    for p=2:5
      figure(p-1); clf(); n=2^p
      for q=4:8    
        local m=2^q; plotLSQ(δ,π-δ,Runge,g,MonomialBasis,m,n,1024)
      end
      savefig(FigPrefix*"Fig05n="*string(n)*".eps")
    end

    The convergence plot

    Figure 8. Convergenge of least squares approximation, monomial basis, Chebyshev sampling points.

    figure(1); clf(); g(θ)=cos(θ);
    for q=5:8
      local m,δ
      m=2^q; δ=π/(2*m);
      data=sample(δ,π-δ,g,m); θ=data[1]; t=data[2]; y=Runge.(t)  
      plotConv(t,y,MonomialBasis)
    end
    savefig(FigPrefix*"FigConvChebPts.eps")
  3. Instead of the monomial family of the Runge example, use the Fourier basis

    A(t)=[ 1 cosπt sinπt cosπnt sinπnt ]

    keeping other parameters as in Problem 1. In this case 𝑨m×(2n+1).

    Solution. Observe that the Runge function is even f(t)=f(-t), so all coefficients of the odd functions sin(πkt) should be zero. Define the basis set, and verify this numerically.

    function TrigBasis(t,n)
      m=size(t)[1]; A=ones(m,1);
      for k=1:n-1    
        A = [A cos.(π*k*t) sin.(π*k*t)]
      end
      return A
    end;
    data=sample(-1,1,Runge,64); t=data[1]; y=data[2];
    A=TrigBasis(t,16); size(A)

    [ 64 31 ] (1)

    x=A\y; norm(x[3:2:31])

    6.426216646559447e-17

    Figure 9. First n=8 monomial basis functions

    data=sample(-1,1,Runge,128); t=data[1]; A=TrigBasis(t,8);
    clf();
    for k=1:7
      global t
      plot(t,A[:,k])
    end;
    grid("on"); xlabel("t"); ylabel("B");
    title("Fourier basis functions");
    FigPrefix=homedir()*"/courses/MATH661/images/H03";
    savefig(FigPrefix*"TrigBasis.eps")

    n=4

    n=8

    n=16

    n=32

    Figure 10. Effect of increasing number of trigonometric basis functions in least squares approximation of Runge function. Equidistant sample points with m=2q, q=4:8 (from row 1 to row 5). Note that when more basis functions are used than data points, the approximation error is large.

    for p=2:5
      n=2^p; ns=string(n)
      for q=4:8
        local m
        m=2^q; ms=string(m)
        clf(); plotLSQ(-1,1,Runge,TrigBasis,m,n,512) 
        savefig(FigPrefix*"Fig08m="*ms*"n="*ns*".eps")
      end  
    end

    Figure 11. Convergence of least squares approximation, trigonometric basis. Notice the very rapid, exponential decrease in error once enough basis functions are used for the available sample points, a consequence of Parseval's theorem.

    figure(1); clf();
    for q=5:8
      local m,data,t,y
      m=2^q; data=sample(-1,1,Runge,m); t=data[1]; y=data[2]
      plotConv(t,y,TrigBasis)
    end
    savefig(FigPrefix*"FigConvTrig.eps")
  4. Instead of the monomial family of the Runge example, use the piecewise linear B-spline basis

    A(t)=[ N1(t) N2(t) Nn(t) ],
    Ni(t)={ 0, t<ti-1 t-ti-1h ti-1t<ti ti+1-th tit<ti+1 0 ti+1<t .,h=2n-1,

    keeping other parameters as in Problem 1.

    Solution. First define Ni(t), and then the basis set.

    function N(t,ti,h)
      if ((t<=ti-h) || (ti+h<=t)) return 0 end
      if (t<ti) return (t-ti)/h+1
      else      return (ti-t)/h+1 end
    end;
    function LinBsplineBasis(t,n)
      m=size(t)[1]; h=(t[m]-t[1])/(n-1); A=N.(t,t[1],h)
      for k=1:n-1    
        A = [A N.(t,t[1]+k*h,h)]
      end
      return A
    end;
    t=LinRange(-1,1,4); A=LinBsplineBasis(t,2)

    [ 1.0 0 0.6666666666666667 0.33333333333333326 0.33333333333333337 0.6666666666666666 0 1.0 ] (2)

    In contrast to the previous basis sets, the linear splines are non-zero only over the interval (ti-1,ti+1); they are said to have compact support.

    Figure 12. Linear B-spline functions for n=8.

    data=sample(-1,1,Runge,17); t=data[1]; A=LinBsplineBasis(t,9);
    clf();
    for k=1:8
      global t
      plot(t,A[:,k])
    end
    grid("on"); xlabel("t"); ylabel("B");
    title("Linear B-spline basis functions");
    FigPrefix=homedir()*"/courses/MATH661/images/H03";
    savefig(FigPrefix*"BsplineBasis.eps")

    n=4

    n=8

    n=16

    n=32

    Figure 13. Effect of increasing number of trigonometric basis functions in least squares approximation of Runge function. Equidistant sample points with m=2q, q=4:8 (from row 1 to row 5). Note that when more basis functions are used than data points, the approximation error is large.

    Figure 14. Convergence of least squares approximation, B-spline basis.

    figure(1); clf();
    for q=5:8
      local m,data,t,y
      m=2^q; data=sample(-1,1,Runge,m+1); t=data[1]; y=data[2]
      plotConv(t,y,LinBsplineBasis)
    end
    savefig(FigPrefix*"FigConvBspline.eps")
    for p=2:5
      local n=2^p; ns=string(n)
      for q=4:8
        local m=2^q; ms=string(m)
        clf(); plotLSQ(-1,1,Runge,LinBsplineBasis,m+1,n+1,512) 
        savefig(FigPrefix*"Fig09m="*ms*"n="*ns*".eps")
      end  
    end

2Track 2

  1. If 𝑸m×n has orthonormal columns, prove that 𝑷𝑸=𝑸𝑸 is an orthogonal projector onto C(𝑸). Determine the expression of 𝑷𝑨, the projector onto C(𝑨), with 𝑨m×n. Compare the number of arithmetic operations required to compute 𝒚=𝑷𝑨𝒙, by comparison to first determining the QR factorization, 𝑨=𝑸𝑹, and then computing 𝒚=𝑸𝑸𝒙.

    Solution. 𝑷𝑸 is an orthogonal projector if 𝑷𝑸2=𝑷𝑸, and 𝑷𝑸=𝑷𝑸. Both equalities are easily verified

    𝑷𝑸2=(𝑸𝑸)(𝑸𝑸)=𝑸(𝑸𝑸)𝑸=𝑸𝑰n𝑸=𝑸𝑸,𝑷𝑸=(𝑸𝑸)=𝑸𝑸=𝑷𝑸.

    Denote by 𝒙m the projection of 𝒚m onto C(𝑨), 𝒙=𝑨𝒖. Then

    𝑨(𝒚-𝑨𝒖)=𝟎(𝑨𝑨)𝒖=𝑨𝒚𝒖=(𝑨𝑨)-1𝑨𝒚𝒙=𝑨(𝑨𝑨)-1𝑨𝒚.

    The inverse of 𝑨𝑨 exists if 𝑨 is of full rank, and the projector is

    𝑷𝑨=𝑨(𝑨𝑨)-1𝑨.

    Note that

    𝑷𝑨2=𝑨(𝑨𝑨)-1𝑨𝑨(𝑨𝑨)-1𝑨=𝑷𝑨.

    Computing 𝑸𝑹=𝑨 requires (n-1++1)m=𝒪(n2m/2) flops, while computing 𝒚=𝑸𝑸𝒙 requires 2mn flops, for a total of 𝒪((n2/2+2n)m) flops.

    Computing 𝑨𝑨 requires n2m flops. The efficient way to apply 𝑷𝑨 is to compute:

    1. 𝒛=𝑨𝒙, mn flops;

    2. solution of (𝑨𝑨)𝒖=𝒛 (more economical than 𝒖=(𝑨𝑨)-1𝒛), n3/3 flops;

    3. 𝒚=𝑨𝒖, mn flops.

    The total is n2m+2mn+n3/3. It is more economical to first carry out the QR-decomposition and construct the projector onto C(𝑨) as 𝑸𝑸.

    Though this problem can be solved analytically, it is instructive to verify conclusions through computation. Generate a matrix, and time the 𝑷𝑸𝒙 operations

    m=1000; n=250; A=randn(m,n); x=rand(m,1);
    tQR = @elapsed F=qr(A);
    Q=Array(F.Q); tz = @elapsed z=Q'*x;
    ty = @elapsed yPQ=Q*z;
    tPQ = tQR+tz+ty; kFlopsPQ=m*(n^2/2+2*n)/1.0E3; [tPQ kFlopsPQ]

    [ 0.007526675 31750.0 ] (3)

    Now, time the 𝑷𝑨𝒙 operations

    tz = @elapsed z=A'*x;
    tAA = @elapsed AA=A'*A;
    tu = @elapsed u = AA\z;
    ty = @elapsed yPA=A*u;
    tPA=tz+tAA+tu+ty; kFlopsPA=(m*(n^2+2*n)+n^3/3)/1000;[tPA kFlopsPA]

    [ 0.004205787 68208.33333333333 ] (4)

    [tPA/tPQ kFlopsPA/kFlopsPQ]

    [ 0.5587841908943856 2.148293963254593 ] (5)

  2. Continuing Problem 1, determine ||𝑷𝑸||2, and express ||𝑷𝑨||2 in terms of the singular value decomposition of 𝑨. Comment the result, considering, say, length of shadows at various times of day.

    Solution.

  3. A matrix 𝑨=[aij]m×n is said to be banded with bandwidth B if aij=0 for |i-j|>B. Implement the modified Gram-Schmidt algorithm for 𝑨m×n a banded matrix with bandwidth B using as few arithmetic operations as possible.

  4. Solve Problem 1, Track 1.

  5. Solve Problem 4, Track 1.

  6. In Problem 1, Track 1, replace the monomial basis with the Legendre polynomials, whose samples are determined by QR decomposition 𝑸𝑹=𝑨. The resulting least squares problem is now

    min𝒙n||𝒚-𝑸𝒙||2.