MATH661 Homework 3 - Least squares problems

Posted: 09/09/21

Due: 09/15/21, 11:55PM

This assignment addresses one of the fundamental topics within scientific computation: finding economical descriptions of complex objects. Some object is described by $𝒚 \in ℂ^{m}$ (with $m$ typically large), and a reduced description is sought as a linear combination $𝑨 𝒙$ , with $𝑨 \in ℂ^{m \times n}$ ( $n < m$ , often $n ≪ m$ ). The surprisingly simple Euclidean geometry of Fig. 1 (which should be committed to memory) will be shown to have wide-ranging applicability to many different types of problems. The error (or residual) in approximating $𝒚$ by $𝑨 𝒙$ is defined as

𝒓 = 𝒚 - 𝑨 𝒙,

and 2-norm minimization defines the least-squares problem

{min}_{𝒙 \in ℂ^{n}} || 𝒚 - 𝑨 𝒙 || .

Figure 1. Least squares (2-norm error minimization) problem.
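The geometry of Fig. 1 can be checked numerically on a small example. The sketch below is only an illustration under assumed data (the matrix `A` and vector `y` are hypothetical, not part of the assignment): it solves the least-squares problem with the backslash operator and verifies that the residual is orthogonal to the columns of $𝑨$ .

```julia
using LinearAlgebra

# Hypothetical data: m = 5 samples, n = 2 columns (constant and linear terms)
A = [1.0 0.0; 1.0 1.0; 1.0 2.0; 1.0 3.0; 1.0 4.0]
y = [1.0, 2.0, 2.0, 4.0, 5.0]

x = A \ y                 # least-squares solution for rectangular A
r = y - A*x               # residual vector
orth = norm(A' * r)       # orthogonality of r to C(A), roundoff level expected
xNormal = (A'*A) \ (A'*y) # same solution from the normal equations
```

The quantity `orth` being at roundoff level reflects that the minimizing residual is perpendicular to $C (𝑨)$ .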

1 Track 1

Consider data $𝒟 = \{(t_{i}, y_{i}) \mid i = 1, 2, \dots, m\}$ obtained by sampling a function $f : ℝ \to ℝ$ , with $y_{i} = f (t_{i})$ . An approximation is sought by linear combination

f (t) ≅ x_{1} a_{1} (t) + x_{2} a_{2} (t) + \dots + x_{n} a_{n} (t) .

Introduce the vector-valued function $A : ℝ \to ℝ^{n}$ (organized as a row vector)

A (t) = [\begin{array}{llll} a_{1} (t) & a_{2} (t) & \dots & a_{n} (t) \end{array}],

such that

f (t) ≅ A (t) 𝒙, 𝒙 = {[\begin{array}{llll} x_{1} & x_{2} & \dots & x_{n} \end{array}]}^{T} .

With $𝒕 = {[\begin{array}{llll} t_{1} & t_{2} & \dots & t_{m} \end{array}]}^{T}$ a sampling of the function domain, a matrix is defined by

𝑨 = A (𝒕) = [\begin{array}{llll} a_{1} (𝒕) & a_{2} (𝒕) & \dots & a_{n} (𝒕) \end{array}] \in ℝ^{m \times n} .
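As a minimal sketch of this construction, the matrix can be assembled column by column by evaluating each basis function at all sample points. The names `basis`, `t`, and the choice of three monomials are assumptions made only for illustration.

```julia
# Hypothetical basis a_1(t)=1, a_2(t)=t, a_3(t)=t^2
basis = [t -> 1.0, t -> t, t -> t^2]

# m = 5 equidistant sample points on [-1,1]
t = collect(LinRange(-1, 1, 5))

# Column j holds a_j evaluated at all sample points, so A is m-by-n
A = hcat([a.(t) for a in basis]...)
```

Here `size(A)` is `(5, 3)`, i.e., $m = 5$ rows (samples) and $n = 3$ columns (basis functions).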

Tasks. In each exercise below, construct the least-squares approximant for the stated range of $n \in 𝒩$ , sample points $𝒕$ , and choice of $A (t)$ . Plot all components of $A (t)$ in a single figure. Plot the approximants, as well as $f$ , in a single figure. Construct a convergence plot of the approximations from the point data $ℰ = \{(\log n, \log || 𝒚 - 𝑨 𝒙 ||) \mid 𝑨 \in ℝ^{m \times n}, n \in 𝒩\}$ . For the largest value of $n$ within $𝒩$ , construct a figure superimposing the approximants obtained for an increasing number of sampling points, $m \in ℳ$ . Comment on what you observe in each individual exercise. Also compare results from the different exercises.

Start with the classical example due to Runge (1901)

\begin{array}{l} f : [- 1, 1] \to ℝ, f (t) = \frac{1}{(1 + 25 t^{2})}, t_{i} = \frac{2 (i - 1)}{m - 1} - 1, \\ ℳ = {16, 32, 64, 128, 256}, 𝒩 = {4, 8, 16, 32}, \\ A (t) = [\begin{array}{lllll} 1 & t & t^{2} & \dots & t^{n - 1} \end{array}] . \end{array}

Solution. With $𝒕 \in ℝ^{m}$ denoting the sampling point vector, and $𝒚 \in ℝ^{m}$ , the function values at the sample points, the least squares problem is

{min}_{𝒙 \in ℝ^{n}} || 𝒚 - 𝑨 𝒙 ||,

where

𝑨 = [\begin{array}{llll} 𝟏 & 𝒕 & \dots & 𝒕^{n - 1} \end{array}] .

The solution to the least squares problem

𝒛 = {argmin}_{𝒙 \in ℝ^{n}} || 𝒚 - 𝑨 𝒙 ||,

furnishes the approximation

f (t) ≅ \tilde{f} (t) = A (t) 𝒛,

that can be sampled at $M ⩾ m$ points to assess approximation error.

When carrying out convergence studies such as these, it is convenient to define functions for common tasks:

$•$ $\circ$ sample. Returns a sample of $f : [a, b] \to ℝ$ at $m$ equidistant points

∴

function sample(a,b,f,m)
  t = LinRange(a,b,m); y = f.(t)   # m equidistant points and function samples
  return t,y
end;

∴

$•$ $\circ$ plotLSQ. Constructs a figure with plots of:

The $m$ sample points (i.e., data) represented as black dots;
The function sampled at more points, i.e., $M ⩾ m$ ;
The approximation sampled at $M$ points

∴

function plotLSQ(a,b,f,Basis,m,n,M)
  data=sample(a,b,f,m); t=data[1]; y=data[2]
  Data=sample(a,b,f,M); T=Data[1]; Y=Data[2]
  A = Basis(t,n); x = A\y; z = Basis(T,n)*x
  plot(t,y,"ok",T,z,"-r",T,Y,"-b"); grid("on");
  xlabel("t"); ylabel("y");
  title("Least squares approximation")
end;

∴

$•$ $\circ$ plotConv. Construct a convergence plot as $n$ increases for fixed value of $m$

∴

function plotConv(t,y,Basis)
  E=zeros(4,2)
  for p=2:5
    n=2^p; A = Basis(t,n)
    x = A\y; logerr = log(2,norm(y-Basis(t,n)*x))
    E[p-1,1]=p; E[p-1,2]=logerr
  end
  plot(E[:,1],E[:,2],"o-"); grid("on")
  xlabel("log(2,n)"); ylabel("log(2,err)");
  title("Convergence plot");
end;

∴

This problem solution is obtained by:

defining the function $f$
∴

Runge(t)=1/(1+25*t^2);
∴

defining the basis

∴

function MonomialBasis(t,n)
  m=size(t)[1]; A=ones(m,1)
  for j=1:n-1
    A = [A t.^j]   # append the column t^j
  end
  return A
end;

∴

Invoking plotLSQ with appropriate parameters

∴	clf(); plotLSQ(-1,1,Runge,MonomialBasis,16,4,64);

∴	FigPrefix=homedir()*"/courses/MATH661/images/H03";

∴	savefig(FigPrefix*"Fig01.eps")

∴

Figure 2. Least squares approximant (red, $n = 4$ ) of the Runge function (blue), sampled at $m = 16$ equidistant points (black dots).

Once the above are defined, cycling through the parameter ranges is straightforward (open figure folds to see code).

$•$ $\circ$

Figure 3. First $n = 8$ monomial basis functions

∴	data=sample(-1,1,Runge,128); t=data[1]; A=MonomialBasis(t,8);

∴

clf();

∴	for k=1:8; global t; plot(t,A[:,k]); end

∴	grid("on"); xlabel("t"); ylabel("B");

∴	title("Monomial basis functions");

∴	FigPrefix=homedir()*"/courses/MATH661/images/H03";

∴	savefig(FigPrefix*"MonomialBasis.eps")

∴

$•$ $\circ$

Figure 4. Effect of increasing number of monomial basis functions in least squares approximation of Runge function. Equidistant sample points.

∴

for p=2:5
  clf(); n=2^p
  for q=4:8
    m=2^q; plotLSQ(-1,1,Runge,MonomialBasis,m,n,1024)
  end
  savefig(FigPrefix*"Fig01n="*string(n)*".eps")   # strings are joined with *
end

∴

Just as straightforward is the construction of the convergence plots for $m \in ℳ$ . Note that when $m = n = 32$ , the error is $𝒪 (ϵ_{mach})$ , and the least squares approximant becomes an interpolant. In all cases, as the number of basis functions $n$ increases, the error decreases, Fig. 5. How can this observation be reconciled with Fig. 4, where increasing error is observed at the interval endpoints? Note that in Fig. 4, a comparison is made between the approximant and the exact function, both evaluated at more points than are present in the least squares approximation (LSQ). In Fig. 5, the error is evaluated only at the data points used in the LSQ.
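The distinction between the error at the data points and the error on a finer sampling can be illustrated by a short computation. The following is a sketch under assumed parameters ( $m = n = 16$ , $M = 256$ , chosen here only for illustration), with a compact restatement of the monomial basis under the hypothetical name `Monomials` so that the fragment is self-contained.

```julia
using LinearAlgebra

Runge(t) = 1/(1+25*t^2)
# Monomial basis restated compactly: columns 1, t, ..., t^(n-1)
Monomials(t, n) = hcat([t.^j for j in 0:n-1]...)

m, n, M = 16, 16, 256                     # assumed illustration parameters
t = LinRange(-1, 1, m); y = Runge.(t)     # data used in the LSQ
T = LinRange(-1, 1, M); Y = Runge.(T)     # finer sampling for assessment

x = Monomials(t, n) \ y
errData = norm(y - Monomials(t, n)*x, Inf)   # error at the data points
errFine = norm(Y - Monomials(T, n)*x, Inf)   # error on the finer sampling
```

With $m = n$ the approximant interpolates the data, so `errData` is small, while `errFine` captures the oscillations between equidistant nodes near the interval endpoints.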

$•$ $\circ$

Figure 5. Convergence of monomial approximation to available data.

∴

clf();

∴

for q=5:8
  m=2^q; data=sample(-1,1,Runge,m); t=data[1]; y=data[2]
  plotConv(t,y,MonomialBasis)
end

∴	savefig(FigPrefix*"FigConv.eps")

∴

Instead of the equidistant point samples of the Runge example above use the Chebyshev nodes

t_{i} = \cos (\frac{2 i - 1}{2 m} π),

keeping other parameters as in Problem 1.

Solution. Using the Chebyshev nodes corresponds to uniform sampling of the composite function $h = f \circ g$ , $h : [δ, π - δ] \to ℝ$ , with

δ = \frac{π}{2 m}, f (t) = \frac{1}{1 + 25 t^{2}}, g (θ) = \cos (θ) .
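A quick numerical check of this correspondence is sketched below, assuming $m = 16$ : the cosine of a uniform sampling of $[δ, π - δ]$ is compared with the Chebyshev node formula. The variable names are chosen here for illustration only.

```julia
m = 16; δ = π/(2*m)

θ = LinRange(δ, π-δ, m)       # uniform sampling of [δ, π-δ]
tUniform = cos.(θ)            # g(θ) = cos(θ) applied to the uniform sample

# Chebyshev nodes from the formula t_i = cos((2i-1)π/(2m))
tFormula = [cos((2*i-1)*π/(2*m)) for i in 1:m]

maxdiff = maximum(abs.(tUniform - tFormula))
```

The two node sets agree to roundoff since $θ_{i} = δ + (i - 1) π / m = (2 i - 1) π / (2 m)$ .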

It is straightforward to modify plotLSQ to take an additional $g$ argument

∴

function plotLSQ(a,b,f,g,Basis,m,n,M)
  data=sample(a,b,g,m); θ=data[1]; t=data[2]; y=f.(t)
  Data=sample(t[1],t[m],f,M); T=Data[1]; Y=Data[2]
  A = Basis(t,n); x = A\y; z = Basis(T,n)*x
  plot(t,y,"ok",T,z,"-r",T,Y,"-b"); grid("on");
  xlabel("t"); ylabel("y");
  title("Least squares approximation")
end;

∴

The approximant obtained for $m = 16$ sample points with $n = 4$ basis functions is compared at $M = 64$ points in Fig. 6, with the results for the full parameter sweep shown in Fig. 7. Use of the Chebyshev sample points leads to significantly smaller approximation error upon finer sampling of the domain of definition of $f$ .

∴	g(θ)=cos(θ); m=16; δ=π/(2*m);

∴	clf(); plotLSQ(δ,π-δ,Runge,g,MonomialBasis,16,4,64);

∴	FigPrefix=homedir()*"/courses/MATH661/images/H03";

∴	savefig(FigPrefix*"Fig04.eps")

∴

Figure 6. Least squares approximant (red, $n = 4$ ) of the Runge function (blue), sampled at $m = 16$ Chebyshev nodes (black dots).

$•$ $\circ$

Figure 7. Effect of increasing number of monomial basis functions in least squares approximation of Runge function. Chebyshev sample points.

∴

for p=2:5
  figure(p-1); clf(); n=2^p
  for q=4:8
    local m=2^q; local δ=π/(2*m)   # δ follows m so the samples are Chebyshev nodes
    plotLSQ(δ,π-δ,Runge,g,MonomialBasis,m,n,1024)
  end
  savefig(FigPrefix*"Fig05n="*string(n)*".eps")   # strings are joined with *
end

∴

The convergence plot is shown in Fig. 8.

$•$ $\circ$

Figure 8. Convergence of least squares approximation, monomial basis, Chebyshev sampling points.

∴	figure(1); clf(); g(θ)=cos(θ);

∴

for q=5:8
  local m,δ
  m=2^q; δ=π/(2*m)
  data=sample(δ,π-δ,g,m); θ=data[1]; t=data[2]; y=Runge.(t)
  plotConv(t,y,MonomialBasis)
end

∴	savefig(FigPrefix*"FigConvChebPts.eps")

∴

Instead of the monomial family of the Runge example, use the Fourier basis

A (t) = [\begin{array}{llllll} 1 & \cos π t & \sin π t & \dots & \cos π n t & \sin π n t \end{array}]

keeping other parameters as in Problem 1. In this case $𝑨 \in ℝ^{m \times (2 n + 1)}$ .

$•$ $\circ$ Solution. Observe that the Runge function is even $f (t) = f (- t)$ , so all coefficients of the odd functions $\sin (π k t)$ should be zero. Define the basis set, and verify this numerically.

∴

function TrigBasis(t,n)
  m=size(t)[1]; A=ones(m,1)
  for k=1:n-1   # k=1,...,n-1 gives 2n-1 columns in total
    A = [A cos.(π*k*t) sin.(π*k*t)]
  end
  return A
end;

∴	data=sample(-1,1,Runge,64); t=data[1]; y=data[2];

∴	A=TrigBasis(t,16); size(A)

$[\begin{array}{c} 64 \\ 31 \end{array}]$ (1)

∴	x=A\y; norm(x[3:2:31])

$6.426216646559447 e - 17$

∴

$•$ $\circ$

Figure 9. First seven Fourier basis functions (the constant, and $\cos π k t$ , $\sin π k t$ for $k = 1, 2, 3$ )

∴	data=sample(-1,1,Runge,128); t=data[1]; A=TrigBasis(t,8);

∴

clf();

∴	for k=1:7; global t; plot(t,A[:,k]); end;

∴	grid("on"); xlabel("t"); ylabel("B");

∴	title("Fourier basis functions");

∴	FigPrefix=homedir()*"/courses/MATH661/images/H03";

∴	savefig(FigPrefix*"TrigBasis.eps")

∴

$•$ $\circ$

$n = 4$	$n = 8$	$n = 16$	$n = 32$

Figure 10. Effect of increasing number of trigonometric basis functions in least squares approximation of Runge function. Equidistant sample points with $m = 2^{q}$ , $q = 4 : 8$ (from row 1 to row 5). Note that when more basis functions are used than data points, the approximation error is large.

∴

for p=2:5
  n=2^p; ns=string(n)
  for q=4:8
    local m
    m=2^q; ms=string(m)
    clf(); plotLSQ(-1,1,Runge,TrigBasis,m,n,512)
    savefig(FigPrefix*"Fig08m="*ms*"n="*ns*".eps")   # strings are joined with *
  end
end

∴

$•$ $\circ$

Figure 11. Convergence of least squares approximation, trigonometric basis. Notice the very rapid, exponential decrease in error once enough basis functions are used for the available sample points, a consequence of Parseval's theorem.

∴	figure(1); clf();

∴

for q=5:8
  local m,data,t,y
  m=2^q; data=sample(-1,1,Runge,m); t=data[1]; y=data[2]
  plotConv(t,y,TrigBasis)
end

∴	savefig(FigPrefix*"FigConvTrig.eps")

∴

Instead of the monomial family of the Runge example, use the piecewise linear $B$ -spline basis

A (t) = [\begin{array}{llll} N_{1} (t) & N_{2} (t) & \dots & N_{n} (t) \end{array}],

N_{i} (t) = \begin{cases} 0, & t < t_{i - 1} \\ \frac{t - t_{i - 1}}{h}, & t_{i - 1} ⩽ t < t_{i} \\ \frac{t_{i + 1} - t}{h}, & t_{i} ⩽ t < t_{i + 1} \\ 0, & t_{i + 1} ⩽ t \end{cases}, \quad h = \frac{2}{n - 1},

keeping other parameters as in Problem 1.

$•$ $\circ$ Solution. First define $N_{i} (t)$ , and then the basis set.

∴

function N(t,ti,h)
  if ((t<=ti-h) || (ti+h<=t))
    return 0   # outside the support (ti-h, ti+h)
  end
  if (t<ti)
    return (t-ti)/h+1   # rising branch
  else
    return (ti-t)/h+1   # falling branch
  end
end;

∴

function LinBsplineBasis(t,n)
  m=size(t)[1]; h=(t[m]-t[1])/(n-1)
  A=N.(t,t[1],h)
  for k=1:n-1
    A = [A N.(t,t[1]+k*h,h)]   # hat function centered at t[1]+k*h
  end
  return A
end;

∴	t=LinRange(-1,1,4); A=LinBsplineBasis(t,2)

$[\begin{array}{cc} 1.0 & 0 \\ 0.6666666666666667 & 0.33333333333333326 \\ 0.33333333333333337 & 0.6666666666666666 \\ 0 & 1.0 \end{array}]$ (2)

∴

In contrast to the previous basis sets, the linear splines are non-zero only over the interval $(t_{i - 1}, t_{i + 1})$ ; they are said to have compact support.
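One consequence of compact support is that each row of the basis matrix has at most two non-zero entries. The sketch below illustrates this under assumed parameters ( $m = 33$ , $n = 9$ ), using a closed-form hat function `hat` (a hypothetical name, equivalent to the piecewise definition of $N_{i}$ ) so that the fragment is self-contained.

```julia
# Closed form of the piecewise linear B-spline centered at ti with half-width h
hat(t, ti, h) = max(0.0, 1 - abs(t - ti)/h)

m, n = 33, 9                        # assumed illustration parameters
t = collect(LinRange(-1, 1, m))
h = 2/(n-1)

# A[i,j] = N_j(t_i), with the hat centers at -1 + (j-1)h
A = [hat(t[i], -1 + (j-1)*h, h) for i in 1:m, j in 1:n]

# Number of non-negligible entries in each row, and the row sums
nzPerRow = [count(>(1e-12), A[i, :]) for i in 1:m]
rowSums = vec(sum(A, dims=2))
```

In addition to the sparsity seen in `nzPerRow`, the row sums equal one to roundoff, since the hat functions form a partition of unity on $[- 1, 1]$ .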

$•$ $\circ$

Figure 12. Linear $B$ -spline functions for $n = 8$ .

∴	data=sample(-1,1,Runge,17); t=data[1]; A=LinBsplineBasis(t,9);

∴

clf();

∴	for k=1:8; global t; plot(t,A[:,k]); end

∴	grid("on"); xlabel("t"); ylabel("B");

∴	title("Linear B-spline basis functions");

∴	FigPrefix=homedir()*"/courses/MATH661/images/H03";

∴	savefig(FigPrefix*"BsplineBasis.eps")

∴

$•$ $\circ$

$n = 4$	$n = 8$	$n = 16$	$n = 32$

Figure 13. Effect of increasing number of linear $B$ -spline basis functions in least squares approximation of Runge function. Equidistant sample points with $m = 2^{q}$ , $q = 4 : 8$ (from row 1 to row 5). Note that when more basis functions are used than data points, the approximation error is large.

$•$ $\circ$

Figure 14. Convergence of least squares approximation, $B$ -spline basis.

∴	figure(1); clf();

∴

for q=5:8
  local m,data,t,y
  m=2^q; data=sample(-1,1,Runge,m+1); t=data[1]; y=data[2]
  plotConv(t,y,LinBsplineBasis)
end

∴	savefig(FigPrefix*"FigConvBspline.eps")

∴

∴

for p=2:5
  local n=2^p; ns=string(n)
  for q=4:8
    local m=2^q; ms=string(m)
    clf(); plotLSQ(-1,1,Runge,LinBsplineBasis,m+1,n+1,512)
    savefig(FigPrefix*"Fig09m="*ms*"n="*ns*".eps")   # strings are joined with *
  end
end

∴

2 Track 2

If $𝑸 \in ℂ^{m \times n}$ has orthonormal columns, prove that $𝑷_{𝑸} = 𝑸 𝑸^{*}$ is an orthogonal projector onto $C (𝑸)$ . Determine the expression of $𝑷_{𝑨}$ , the projector onto $C (𝑨)$ , with $𝑨 \in ℂ^{m \times n}$ . Compare the number of arithmetic operations required to compute $𝒚 = 𝑷_{𝑨} 𝒙$ directly with the number required when first determining the $Q R$ factorization, $𝑨 = 𝑸 𝑹$ , and then computing $𝒚 = 𝑸 𝑸^{*} 𝒙$ .

Solution. $𝑷_{𝑸}$ is an orthogonal projector if $𝑷_{𝑸}^{2} = 𝑷_{𝑸}$ , and $𝑷_{𝑸} = 𝑷_{𝑸}^{*}$ . Both equalities are easily verified

𝑷_{𝑸}^{2} = (𝑸 𝑸^{*}) (𝑸 𝑸^{*}) = 𝑸 (𝑸^{*} 𝑸) 𝑸^{*} = 𝑸 𝑰_{n} 𝑸^{*} = 𝑸 𝑸^{*}, 𝑷_{𝑸}^{*} = {(𝑸 𝑸^{*})}^{*} = 𝑸 𝑸^{*} = 𝑷_{𝑸} .
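The two identities can also be checked numerically. The following is a minimal sketch with an assumed random test matrix; the sizes $m = 8$ , $n = 3$ and the variable names are chosen only for illustration.

```julia
using LinearAlgebra

m, n = 8, 3
# Hypothetical Q with orthonormal columns, from the QR factorization of a random matrix
Q = Matrix(qr(randn(ComplexF64, m, n)).Q)[:, 1:n]

P = Q*Q'                       # candidate orthogonal projector onto C(Q)

errOrtho = norm(Q'*Q - I)      # Q*Q = I_n
errIdem  = norm(P*P - P)       # idempotence, P^2 = P
errHerm  = norm(P' - P)        # self-adjointness, P* = P
errRange = norm(P*Q - Q)       # vectors in C(Q) are left unchanged
```

All four quantities are expected to be at roundoff level.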

Denote by $𝒙 \in ℂ^{m}$ the projection of $𝒚 \in ℂ^{m}$ onto $C (𝑨)$ , $𝒙 = 𝑨 𝒖$ . Then

𝑨^{*} (𝒚 - 𝑨 𝒖) = 𝟎 \Rightarrow (𝑨^{*} 𝑨) 𝒖 = 𝑨^{*} 𝒚 \Rightarrow 𝒖 = {(𝑨^{*} 𝑨)}^{- 1} 𝑨^{*} 𝒚 \Rightarrow 𝒙 = 𝑨 {(𝑨^{*} 𝑨)}^{- 1} 𝑨^{*} 𝒚 .

The inverse of $𝑨^{*} 𝑨$ exists if $𝑨$ is of full column rank, and the projector is

𝑷_{𝑨} = 𝑨 {(𝑨^{*} 𝑨)}^{- 1} 𝑨^{*} .

Note that

𝑷_{𝑨}^{2} = 𝑨 {(𝑨^{*} 𝑨)}^{- 1} 𝑨^{*} 𝑨 {(𝑨^{*} 𝑨)}^{- 1} 𝑨^{*} = 𝑷_{𝑨} .
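Since both $𝑷_{𝑨}$ and $𝑸 𝑸^{*}$ (with $𝑸 𝑹 = 𝑨$ ) project orthogonally onto $C (𝑨)$ , they should coincide. A sketch of a numerical check follows, assuming a random real matrix of full column rank with illustrative sizes $m = 10$ , $n = 4$ .

```julia
using LinearAlgebra

m, n = 10, 4
A = randn(m, n)                    # assumed test matrix, full column rank with probability one

PA = A * ((A'*A) \ A')             # P_A = A (A*A)^{-1} A*, using a solve instead of an inverse
Q  = Matrix(qr(A).Q)[:, 1:n]       # orthonormal basis for C(A)
PQ = Q*Q'

errSame = norm(PA - PQ)            # the two projectors agree
errIdem = norm(PA*PA - PA)         # P_A is idempotent
```

For ill-conditioned $𝑨$ the agreement would be expected to degrade, since forming $𝑨^{*} 𝑨$ squares the condition number.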

Computing $𝑸 𝑹 = 𝑨$ requires $(n - 1 + \dots + 1) m = 𝒪 (n^{2} m / 2)$ flops, while computing $𝒚 = 𝑸 𝑸^{*} 𝒙$ requires $2 m n$ flops, for a total of $𝒪 ((n^{2} / 2 + 2 n) m)$ flops.

Computing $𝑨^{*} 𝑨$ requires $n^{2} m$ flops. The efficient way to apply $𝑷_{𝑨}$ is to compute:

$𝒛 = 𝑨^{*} 𝒙$ , $m n$ flops;
solution of $(𝑨^{*} 𝑨) 𝒖 = 𝒛$ (more economical than $𝒖 = {(𝑨^{*} 𝑨)}^{- 1} 𝒛$ ), $n^{3} / 3$ flops;
$𝒚 = 𝑨 𝒖$ , $m n$ flops.

The total is $n^{2} m + 2 m n + n^{3} / 3$ . It is more economical to first carry out the $Q R$ -decomposition and construct the projector onto $C (𝑨)$ as $𝑸 𝑸^{*}$ .
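The two operation counts above can be compared for specific sizes. The sketch below simply evaluates the stated estimates; the function names `flopsPQ` and `flopsPA` are introduced here for illustration, and the sizes are those of the timing experiment that follows.

```julia
# Operation count estimates stated in the text
flopsPQ(m, n) = m*(n^2/2 + 2n)          # QR factorization, then y = Q Q* x
flopsPA(m, n) = m*(n^2 + 2n) + n^3/3    # form A*A, solve, and two matrix-vector products

m, n = 1000, 250
kFlopsPQ = flopsPQ(m, n)/1.0E3
kFlopsPA = flopsPA(m, n)/1.0E3
ratio = kFlopsPA/kFlopsPQ
```

For $n ≪ m$ the ratio approaches 2, since the dominant terms are $n^{2} m$ and $n^{2} m / 2$ .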

Though this problem can be solved analytically, it is instructive to verify conclusions through computation. Generate a matrix, and time the $𝑷_{𝑸} 𝒙$ operations

∴	m=1000; n=250; A=randn(m,n); x=rand(m,1);

∴	tQR = @elapsed F=qr(A);

∴	Q=Array(F.Q); tz = @elapsed z=Q'*x;

∴	ty = @elapsed yPQ=Q*z;

∴	tPQ = tQR+tz+ty; kFlopsPQ=m*(n^2/2+2n)/1.0E3; [tPQ kFlopsPQ]

$[\begin{array}{cc} 0.007526675 & 31750.0 \end{array}]$ (3)

∴

Now, time the $𝑷_{𝑨} 𝒙$ operations

∴	tz = @elapsed z=A'*x;

∴	tAA = @elapsed AA=A'*A;

∴	tu = @elapsed u = AA\z;

∴	ty = @elapsed yPA=A*u;

∴	tPA=tz+tAA+tu+ty; kFlopsPA=(m*(n^2+2n)+n^3/3)/1000;[tPA kFlopsPA]

$[\begin{array}{cc} 0.004205787 & 68208.33333333333 \end{array}]$ (4)

∴	[tPA/tPQ kFlopsPA/kFlopsPQ]

$[\begin{array}{cc} 0.5587841908943856 & 2.148293963254593 \end{array}]$ (5)

∴

Continuing Problem 1, determine ${|| 𝑷_{𝑸} ||}_{2}$ , and express ${|| 𝑷_{𝑨} ||}_{2}$ in terms of the singular value decomposition of $𝑨$ . Comment on the result, considering, say, the length of shadows at various times of day.

Solution.
A matrix $𝑨 = [a_{i j}] \in ℂ^{m \times n}$ is said to be banded with bandwidth $B$ if $a_{i j} = 0$ for $| i - j | > B$ . Implement the modified Gram-Schmidt algorithm for $𝑨 \in ℂ^{m \times n}$ a banded matrix with bandwidth $B$ using as few arithmetic operations as possible.
Solve Problem 1, Track 1.
Solve Problem 4, Track 1.
In Problem 1, Track 1, replace the monomial basis with the Legendre polynomials, whose samples are determined by $Q R$ decomposition $𝑸 𝑹 = 𝑨$ . The resulting least squares problem is now
${min}_{𝒙 \in ℝ^{n}} {|| 𝒚 - 𝑸 𝒙 ||}_{2} .$