MATH661 HW03 - SVD applications


Posted: 09/13/23

Due: 09/20/23, 11:59PM

Tracks 1 & 2: 1. Track 2: 2.

This homework marks your first foray into realistic scientific computation: you will apply concepts from linear algebra, specifically the singular value decomposition (SVD), to analyze Hurricane Lee, still active at the time of writing.

1 Problem setup

1.1 Image data

Image data are often processed with the tools of linear algebra. In this assignment you will work with satellite images of Hurricane Lee from the 2023 season. Open the following folds and execute the code to set up your environment.

$\circ$ The following commands, run in a Bash shell, bring up an animation of Hurricane Lee (the ImageMagick package must be installed and on the system path).

$\circ$ The code uses various Julia packages to process image data and carry out linear algebra operations. The following commands build the appropriate Julia environment: the Julia packages are downloaded, compiled, and stored in your local Julia library. Note: compiling the Images package takes a long time, and it is best done within a terminal window. The packages need to be compiled only once.

$\circ$ Once compiled, the packages are imported into the current environment.

$\circ$ With appropriate packages in place and available, the satellite imagery can be imported into the Julia environment and used to insert images such as the one in Fig. (1).

∴	pre=homedir()*"/courses/MATH661/data/weather/";

∴	gif=load(pre*"HurricaneLee.gif");

∴	(mx,my,nf)=size(gif)

$(720, 850, 100)$


The following commands display the $n^{th}$ frame of the animation and save it in a homework directory. Use this template to save interesting images.

∴	n=20; frame = Gray.(gif[:,:,n]);

∴	A = Float32.(frame); clf(); imshow(A,cmap="gray");

∴	hwdir=homedir()*"/courses/MATH661/homework/H03";

∴	savefig(hwdir*"/H03Fig01.png");


$\circ$ Image data must be quantified, i.e., transformed into numbers, prior to analysis. Here, matrices of 32-bit floats are obtained from the gray-scale intensity values associated with each pixel. The 720 by 850 pixel images might be too large for machines with limited hardware resources, so adapt the data window (Fig. 2) to obtain reasonable execution times.

∴	window = Gray.(gif[51:650,101:750,:]);

∴	data = Float32.(window); clf(); imshow(data[:,:,nf],cmap="gray");

∴	(nx,ny,nf) = size(data)

$(600, 650, 100)$


Figure 1. Night-time satellite imagery of hurricane Lee. Superimposed on the familiar overall anti-cyclone pattern are small-scale features (lightning, cloud patterns in the arms). The SVD can be used to distinguish between small and large scale features.

Figure 2. Data window of a time snapshot of hurricane Lee.

1.2 Computing the SVD

The SVD of the array of floats obtained from an image identifies correlations in gray-level intensity between pixel positions, an encoded description of weather physics. Here are the Julia instructions to compute the SVD of one frame of image data. It is instructive to plot the singular values $Σ = diag (σ_{1}, σ_{2}, \dots)$ , in log coordinates.

∴	n=32; A=data[:,:,n]; U,S,V=svd(A);

∴	clf(); plot(log10.(S),"o"); xlabel(L"mode number $k$"); ylabel(L"lg(\sigma_k)");

∴	title("Singular values of hurricane Lee image"); grid("on");

∴	hwdir=homedir()*"/courses/MATH661/homework/H03"; savefig(hwdir*"/H03Fig03a.eps");

∴	clf(); plot(log10.(S[1:100]),"o"); xlabel(L"mode number $k$"); ylabel(L"lg(\sigma_k)");

∴	title("Singular values of hurricane Lee image"); grid("on");

∴	savefig(hwdir*"/H03Fig03b.eps");


Figure 3. The singular values of a hurricane satellite image decay rapidly, from $\sim 10^{2.5}$ to $\sim 10^{0}$, a decrease of more than two orders of magnitude over the first 100 modes (singular vectors). This indicates that considerable data compression is possible.
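The compression claim can be made quantitative. The Julia sketch below (`storage`, `truncation_error` and `check_truncation` are illustrative names, not part of the course code) counts the numbers needed to store $q$ modes of an $n_x \times n_y$ image, and uses the Eckart–Young theorem, by which the relative Frobenius error of the best rank-$q$ approximation is $\sqrt{\sum_{j>q} σ_j^2 / \sum_j σ_j^2}$, so the error can be read directly off the singular values plotted in Fig. 3.

```julia
using LinearAlgebra

# Numbers stored for q SVD modes of an nx-by-ny image:
# q left singular vectors (nx entries each), q right ones (ny each), q values
storage(nx, ny, q) = q * (nx + ny + 1)

# Eckart–Young: relative Frobenius error of the best rank-q approximation,
#   ‖A - A_q‖_F / ‖A‖_F = sqrt(Σ_{j>q} σ_j²) / sqrt(Σ_j σ_j²)
truncation_error(S, q) = sqrt(sum(abs2, S[q+1:end]) / sum(abs2, S))

# Compare the formula with the error of an explicit rank-q reconstruction
function check_truncation(A, q)
    U, S, V = svd(A)
    Aq = U[:, 1:q] * Diagonal(S[1:q]) * V[:, 1:q]'
    return norm(A - Aq) / norm(A), truncation_error(S, q)
end

# 600×650 window, q = 100 modes: 125100 numbers instead of 390000
ratio = storage(600, 650, 100) / (600 * 650)   # ≈ 0.32
```

For a frame such as `data[:,:,32]`, `check_truncation(data[:,:,32], 100)` should return two nearly equal numbers.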

$\circ$ Define a function rsvd that sums the rank-one terms $p$ through $q$ of an SVD,

𝑨 = 𝑼 𝚺 𝑽^{T} ≅ 𝑩 = \sum_{j = p}^{q} σ_{j} 𝒖_{j} 𝒗_{j}^{T} .

The sum from $p = 1$ to $q$ contains the first $q$ dominant correlations, identified with large-scale weather patterns. The sum from $p = 101$ to $q = 140$ can be identified with small-scale patterns.
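The definition of rsvd is folded away in the notebook. A minimal sketch consistent with the call `rsvd(p,q,U,S,V)` used in the solutions, assuming `U,S,V` come from `U,S,V=svd(A)` as in the code above (so `V` holds the right singular vectors, not $𝑽^T$), is:

```julia
using LinearAlgebra

# Sketch of rsvd(p,q,U,S,V): the sum of rank-one terms σ_j u_j v_j^T for
# j = p..q. Multiplying the column blocks gives the same result as an explicit
# loop over outer products, but with a single matrix product.
function rsvd(p, q, U, S, V)
    return U[:, p:q] * Diagonal(S[p:q]) * V[:, p:q]'
end

# Example with the frame used above:
# U, S, V = svd(data[:, :, 32])
# B = rsvd(1, 12, U, S, V)      # large-scale part, modes 1..12
# C = rsvd(101, 140, U, S, V)   # small-scale part, modes 101..140
```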

Figure 4. (Left) Reconstruction of the Hurricane Lee image from the first $q = 12$ modes, showing large-scale patterns. (Right) Correlated small-scale patterns obtained by summing modes $p = 101$ to $q = 140$.

2 Common problems (both tracks)

Consider large-scale weather patterns $𝑩_{k}$ , $𝑩_{k - 1}$ , $𝑩_{k - 2}$ obtained from the $p$ most significant modes from frames $k, k - 1, k - 2$ (i.e., different times in the past). Assuming a constant rate of change leads to the prediction

𝑷_{k} = 𝑩_{k - 1} + (𝑩_{k - 1} - 𝑩_{k - 2})     (3)

for the known large-scale weather $𝑩_{k}$ . The prediction error is $ε_{k, 1} (p) = {|| 𝑷_{k} - 𝑩_{k} ||}_{F} / {|| 𝑩_{k} ||}_{F}$ . Present a plot of the prediction error for various values of $p$ over the recorded hurricane data. Analyze your results to answer the question: “can overall storm evolution be predicted by linear extrapolation of past data?”

Solution.

$\circ$ Define a function to construct the prediction at time $k$ by linear extrapolation of modes $p$ to $q$ at times $k - 1, k - 2$.

$\circ$ Define a function to compare the coarse grained prediction and data

∴

function Compare(k,p,q,data,predict)
  P=predict(k,p,q,data)
  A = data[:,:,k]; U,S,V=svd(A); B=rsvd(p,q,U,S,V)
  fig=figure(1,figsize=(10,4)); clf()
  (ax1,ax2) = fig.subplots(1,2)
  fig.suptitle("Comparison using modes "*string(p)*" to "*string(q))
  ax1.imshow(P,cmap="gray"); ax1.set_title("Prediction")
  ax2.imshow(B,cmap="gray"); ax2.set_title("Data")
  savefig(hwdir*"/anim/frame"*lpad(string(k),4,"0")*".png")
  err = norm(P-B)/norm(B)
end;
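`Compare` calls a prediction function `LinPredict(k,p,q,data)` whose definition is folded away. A possible sketch, written in the style of the `QuadPredict` function used later for quadratic extrapolation and implementing formula (3), is given here; it is a hypothetical reconstruction, and rsvd is repeated in one line so the block runs on its own.

```julia
using LinearAlgebra

# Modes p..q of an SVD (same role as rsvd; repeated so the block is standalone)
rsvd(p, q, U, S, V) = U[:, p:q] * Diagonal(S[p:q]) * V[:, p:q]'

# Hypothetical LinPredict: linear extrapolation of modes p..q,
#   P_k = B_{k-1} + (B_{k-1} - B_{k-2}),  formula (3)
function LinPredict(k, p, q, data)
    A1 = data[:,:,k-1]; U1,S1,V1 = svd(A1); B1 = rsvd(p,q,U1,S1,V1)
    A2 = data[:,:,k-2]; U2,S2,V2 = svd(A2); B2 = rsvd(p,q,U2,S2,V2)
    return B1 + (B1 - B2)
end
```

With the windowed `data`, `LinPredict(10,1,10,data)` would return a 600 × 650 matrix that `Compare` plots next to $𝑩_{10}$.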

∴	Compare(10,1,10,data,LinPredict)

$0.07692018$

∴	size(data)[3]

$100$


The above implementation returns the prediction error and also plots the prediction and available data. The images are saved to a work directory to construct an animation.
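The error `norm(P-B)/norm(B)` in `Compare` matches the definition of $ε_{k,1}(p)$ because, for a matrix argument, `norm` from `LinearAlgebra` is the Frobenius norm; the induced 2-norm used in the Track 2 questions is `opnorm`. A small check (`relerr` is an illustrative name):

```julia
using LinearAlgebra

# Relative error as in ε_{k,1}(p): `norm` of a matrix is its Frobenius norm
relerr(P, B) = norm(P - B) / norm(B)

M = [3.0 0.0; 0.0 4.0]
norm(M)     # Frobenius norm: sqrt(3² + 4²) = 5
opnorm(M)   # induced 2-norm: largest singular value, 4
```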

$\circ$ Define a function to plot the prediction error over the available data

∴

function ErrPlot(p,q,data,np,s)
  n=size(data)[3]; err=ones(n)
  for k=3:n
    err[k] = Compare(k,p,q,data,LinPredict)
    sleep(0.25) # Allow plot display during loop
  end
  figure(np); plot(3:n,err[3:n],s);
  title("Prediction error")
  grid("on"); xlabel("Time step"); ylabel(L"\epsilon");
  savefig(hwdir*"/H03Fig06.png");
  return err
end;

∴	np=3; figure(np); clf(); err01to10=ErrPlot(1,10,data,np,".");

∴	err01to05=ErrPlot(1,5,data,np,"xr");

∴	err01to20=ErrPlot(1,20,data,np,"ok");

∴	figure(np); savefig(hwdir*"/H03Fig05.png");

∴	figure(1); Compare(40,1,5,data,LinPredict); savefig(hwdir*"/H03Fig06a.png");

∴	Compare(40,1,10,data,LinPredict); savefig(hwdir*"/H03Fig06b.png");

∴	Compare(40,1,20,data,LinPredict); savefig(hwdir*"/H03Fig06c.png");


Encapsulating the comparison in a function now allows study of how the error evolves in time and how it changes with the number of modes.

Figure 5. Linear extrapolation error, unprocessed images: ( $\times$ ) modes $p = 1$ to $q = 5$ ; ( $\cdot$ ) modes $p = 1$ to $q = 10$ ; ( $•$ ) modes $p = 1$ to $q = 20$ .

The relative error between prediction and observed data (Fig. 5) increases with the number of SVD modes (Fig. 6). This might seem surprising at first glance, but the comparison is done on raw data. In almost all real-world applications, additional pre- and post-processing of the data is needed to:

ensure identical illumination in the two images;

align and rotate the images to match overall features (aka “image registration”);

determine whether pixel-by-pixel comparison is meaningful, as opposed to derived measures: total cloud cover, position of the center, etc.
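As an illustration of the first item, one simple illumination correction (an assumption, not the prescribed method) standardizes each frame to zero mean and unit standard deviation before comparing pixels, so that uniform changes of brightness and contrast cancel out. `standardize` and `normerr` are hypothetical helper names.

```julia
using LinearAlgebra, Statistics

# Assumed illumination correction: rescale a frame to zero mean and unit
# standard deviation, removing a uniform brightness offset and contrast gain
standardize(A) = (A .- mean(A)) ./ std(A)

# Relative Frobenius error computed after standardizing prediction and data
normerr(P, B) = norm(standardize(P) - standardize(B)) / norm(standardize(B))

B = rand(4, 5)
P = 2.5 .* B .+ 0.3        # same scene, brighter and with more contrast
norm(P - B) / norm(B)      # large raw error
normerr(P, B)              # zero up to rounding
```

Registration (alignment and rotation) is not addressed by this sketch.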

Bonus point project (4 points). Carry out the above operations. Of particular relevance is the recognition of overall structure in data, in this case the spiral pattern of the anti-cyclone. Think of a way to use this knowledge of the expected overall pattern to better compare predictions to data.

Figure 6. Comparison of coarse-grained prediction with data for $5, 10, 20$ modes from top to bottom.

Repeat the above construction of an error plot for local weather patterns $𝑪_{k}, 𝑪_{k - 1}, 𝑪_{k - 2}$, obtained from less significant modes $p$ to $q$, that lead to the prediction

𝑸_{k} = 𝑪_{k - 1} + (𝑪_{k - 1} - 𝑪_{k - 2}) .

Analyze your results to answer the question: “are small-scale storm features predictable?”

Solution. The key word in the problem statement is “repeat”. Defining purpose-specific functions is not only good coding practice, but saves time in studying similar problems. Here, the previously defined functions are simply invoked with new parameters.


Figure 7. Linear extrapolation error, unprocessed images: ( $\times$ ) modes $p = 30$ to $q = 60$ ; ( $•$ ) modes $p = 60$ to $q = 90$ .

The $𝒪 (1)$ relative error (Fig. 7) would at first glance indicate that linear extrapolation has no predictive capability for the fine-scale structure. However, the error definition is inappropriate:

the error measure is global, while the modes reflect structure at smaller spatial scales;
illumination correction and registration have not been performed.

As in the previous case, several additional steps have to be carried out on the fine scale data in order to determine whether linear extrapolation is an efficient predictor. In particular, the smaller spatial extent has to be taken into account. A typical processing sequence would:

Ensure equal illumination and registration
Define a window of smaller size, and construct the mean and variance observed when translating the window over the entire image. This would be a data-extracted model of small-scale features of the hurricane
Use observed mean and variance to construct extrapolated small-scale features.
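The second step of this sequence could be sketched as follows. `window_stats`, the window size `w` and the step `s` are illustrative choices, and `C` stands for a small-scale image such as `rsvd(30,60,U,S,V)`.

```julia
using Statistics

# Hypothetical helper: slide a w-by-w window over the image C with step s
# (non-overlapping windows when s == w) and return, for each window position,
# the mean and the sample variance of the pixels it covers.
function window_stats(C::AbstractMatrix, w::Integer; s::Integer = w)
    nx, ny = size(C)
    rows = 1:s:(nx - w + 1)            # top-left row index of each window
    cols = 1:s:(ny - w + 1)            # top-left column index of each window
    mu   = zeros(length(rows), length(cols))
    sig2 = zeros(length(rows), length(cols))
    for (a, i) in enumerate(rows), (b, j) in enumerate(cols)
        patch = view(C, i:i+w-1, j:j+w-1)
        mu[a, b]   = mean(patch)
        sig2[a, b] = var(patch)
    end
    return mu, sig2
end
```

Summaries of `mu` and `sig2` over all window positions could then serve as the statistics used in the third step.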

Repeat the above for combined large ( $𝑩$ ) and small scale ( $𝑪$ ) weather patterns. Experiment with different weights $u, v$ in the linear combination $u 𝑩 + v 𝑪$ , $u + v = 1$ .

Solution.

$\circ$ Define a function to combine both large and small scales into a weighted average

$\circ$ Define a function to compare the combined large/small-scale prediction and data

∴

function wCompare(k,pu,qu,u,pv,qv,v,data,wpredict)
  P=wpredict(k,pu,qu,u,pv,qv,v,data); A=data[:,:,k]
  err = norm(P-A)/norm(A)
  fig=figure(1,figsize=(10,4)); clf()
  (ax1,ax2) = fig.subplots(1,2)
  fig.suptitle("Comparison using modes "*string(pu)*" to "*string(qu)*" u="*string(u)*
               " and "*string(pv)*" to "*string(qv)*" v="*string(v))
  ax1.imshow(P,cmap="gray"); ax1.set_title("Prediction")
  ax2.imshow(A,cmap="gray"); ax2.set_title("Data")
  savefig(hwdir*"/anim/frame"*lpad(string(k),4,"0")*".png")
  return err 
end;
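The function `wLinPredict` passed to `wCompare` is folded away. Assuming it forms $u$ times the linear extrapolation of modes `pu:qu` plus $v$ times that of modes `pv:qv`, a sketch follows, together with a hypothetical `wscan` that computes the `wCompare` error for several weights $u$ (with $v = 1 - u$) without plotting. rsvd and `LinPredict` are repeated in compact form so the block runs on its own.

```julia
using LinearAlgebra

# Modes p..q of an SVD (compact rsvd, repeated so the block is standalone)
rsvd(p, q, U, S, V) = U[:, p:q] * Diagonal(S[p:q]) * V[:, p:q]'

# Linear extrapolation of modes p..q from frames k-1 and k-2, formula (3)
function LinPredict(k, p, q, data)
    U1, S1, V1 = svd(data[:, :, k-1]); B1 = rsvd(p, q, U1, S1, V1)
    U2, S2, V2 = svd(data[:, :, k-2]); B2 = rsvd(p, q, U2, S2, V2)
    return B1 + (B1 - B2)
end

# Hypothetical wLinPredict: weighted sum of large-scale (modes pu..qu) and
# small-scale (modes pv..qv) predictions, intended for u + v = 1
wLinPredict(k, pu, qu, u, pv, qv, v, data) =
    u * LinPredict(k, pu, qu, data) + v * LinPredict(k, pv, qv, data)

# Relative error ‖P - A_k‖_F / ‖A_k‖_F (as in wCompare) for each u, v = 1 - u
function wscan(k, pu, qu, pv, qv, data; us = 0:0.1:1)
    A = data[:, :, k]
    return [norm(wLinPredict(k, pu, qu, u, pv, qv, 1 - u, data) - A) / norm(A) for u in us]
end

# Usage, e.g.: errs = wscan(10, 1, 10, 30, 60, data)
```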

∴	wCompare(10,1,10,0.7,30,60,0.3,data,wLinPredict)

$0.33098787783018163$


The above implementation returns the prediction error and also plots the prediction and available data. The images are saved to a work directory to construct an animation.

$\circ$ Define a function to plot the prediction error over the available data


Figure 8. Linear extrapolation error using large and small modes

Formula (3) expresses a prediction of degree one in time $t$,

𝑩 (t) = 𝑩_{k - 1} + (t - k + 1) (𝑩_{k - 1} - 𝑩_{k - 2}),

evaluated at $t = k$. Construct a quadratic prediction based upon data $𝑩_{k - 1}, 𝑩_{k - 2}, 𝑩_{k - 3}$, and compare it with the degree-one prediction of question 1. Recall that a quadratic can be constructed from knowledge of three points. Again, experiment with various values of $k, p$.

Solution. Consider $y (t)$ with known values $y_{i} = y (i)$ at $i = 0, 1, 2$ . The quadratic passing through these three points is

y (t) = \frac{1}{2} (t - 1) (t - 2) y_{0} - t (t - 2) y_{1} + \frac{1}{2} t (t - 1) y_{2},

and predicts the value

y (3) = y_{0} - 3 y_{1} + 3 y_{2},

hence quadratic predictions are given by

𝑷_{k} = 𝑩_{k - 3} - 3 𝑩_{k - 2} + 3 𝑩_{k - 1} .
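The extrapolation weights can be checked numerically: evaluating the Lagrange basis polynomials on nodes $0, \dots, d$ at the extrapolation point reproduces both formula (3) and the quadratic prediction. `lagrange_weights` is an illustrative helper; node $i$ stands for frame $k - d - 1 + i$, so the prediction point $t = d + 1$ corresponds to frame $k$.

```julia
# Hypothetical helper: weights w such that the degree-d polynomial through
# (0, y_0), …, (d, y_d) takes the value Σ_i w_i y_i at t (Lagrange basis
# polynomials evaluated at t)
function lagrange_weights(d::Integer, t::Real)
    nodes = 0:d
    return [prod((t - xj) / (xi - xj) for xj in nodes if xj != xi) for xi in nodes]
end

lagrange_weights(1, 2)   # [-1.0, 2.0]:      P_k = -B_{k-2} + 2B_{k-1}, formula (3)
lagrange_weights(2, 3)   # [1.0, -3.0, 3.0]: P_k = B_{k-3} - 3B_{k-2} + 3B_{k-1}
```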


Figure 9. (Left) Errors of large scale quadratic prediction; (Right) Comparison.

Errors for quadratic extrapolation (Fig. 9) are higher than those for linear extrapolation (Fig. 5). In addition to the error sources discussed previously (illumination, registration), higher-order extrapolation generally exhibits larger errors (to be discussed during the presentation of polynomial interpolation).
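One way to see why higher-order extrapolation tends to increase errors, under the simplifying assumption that each past frame carries independent, zero-mean noise of equal variance $η^2$: the prediction $\sum_i w_i 𝑩_i$ then carries noise of variance $η^2 \sum_i w_i^2$. The short computation below (`amplification` is an illustrative name) compares the linear weights $(-1, 2)$ with the quadratic weights $(1, -3, 3)$.

```julia
# Variance amplification of an extrapolation P = Σ_i w_i B_i when each past
# frame carries independent, zero-mean noise of variance η² (an assumption):
# Var(P) = η² Σ_i w_i²
amplification(w) = sum(abs2, w)

lin  = amplification([-1, 2])      # 5,  weights of formula (3)
quad = amplification([1, -3, 3])   # 19, quadratic weights
sqrt(quad / lin)                   # ≈ 1.95: ratio of noise standard deviations
```

Under this model the quadratic prediction roughly doubles the noise level of the linear one.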

Carrying out the error computation, say for the large-scale modes, is easily done with the previously defined functions and a new quadratic prediction function:

∴

function QuadPredict(k,p,q,data)
  # Modes p..q of the three previous frames
  A1 = data[:,:,k-1]; U1,S1,V1=svd(A1); B1=rsvd(p,q,U1,S1,V1)
  A2 = data[:,:,k-2]; U2,S2,V2=svd(A2); B2=rsvd(p,q,U2,S2,V2)
  A3 = data[:,:,k-3]; U3,S3,V3=svd(A3); B3=rsvd(p,q,U3,S3,V3)
  # Quadratic extrapolation: P_k = B_{k-3} - 3B_{k-2} + 3B_{k-1}
  P = B3 + 3*(B1-B2)
  return P
end;

∴

function ErrPlot(p,q,data,np,s)
  # Redefines ErrPlot for QuadPredict: three past frames are needed, so k starts at 4
  n=size(data)[3]; err=ones(n)
  for k=4:n
    err[k] = Compare(k,p,q,data,QuadPredict)
    sleep(0.25) # Allow plot display during loop
  end
  figure(np); plot(4:n,err[4:n],s);
  title("Prediction error")
  grid("on"); xlabel("Time step"); ylabel(L"\epsilon");
  savefig(hwdir*"/H03Fig09.png");
  return err
end;

∴	np=3; figure(np); clf(); err=ErrPlot(1,10,data,np,".");

∴

$0.12888846$


3 Track 2 questions

Use the SVD to investigate how familiar properties for $a \in ℝ$ might extend to matrices $𝑨 \in ℝ^{m \times m}$ . For example, on the real axis adding $- a$ to $a$ results in the null element, $a - a = 0$ , so we say $a$ is a distance $| a |$ from zero. This easily generalizes to $ℝ^{m}$ , and the minimal modification of $𝒂 \in ℝ^{m}$ to obtain zero is $- 𝒂$ , $𝒂 + (- 𝒂) = 𝟎$ , and $𝒂$ is distance $|| 𝒂 ||$ away from zero. Consider now matrices $𝑨 \in ℝ^{m \times m}$ , with $rank (𝑨) = m$ . How far is this matrix from “zero”? By “zero” we shall understand a singular matrix with $rank (𝑨) < m$ .

All real numbers $x \in ℝ$ can be obtained as the limit of a sequence of rationals $\{p_{n} / q_{n}\}$, $p_{n}, q_{n} \in ℤ$, $q_{n} \neq 0$. We say that the rationals $ℚ$ form a dense subset of $ℝ$. What about matrix spaces?

Find $𝑿$ of minimal norm such that $𝑨 + 𝑿$ is singular. State how far $𝑨$ is from “zero”?

Solution. Seek insight from $m = 1$ for which the problem is to find $x$ of minimal norm such that $a + x = 0$ . From

$0 = | a + x | ⩾ | a | - | x | \Rightarrow | x | ⩾ | a | \quad \text{and} \quad 0 = | x + a | ⩾ | x | - | a | \Rightarrow | x | ⩽ | a |$

deduce that $| x | = | a |$; the solution is $x = - a$. Recall that $|| 𝑨 ||$ (the induced 2-norm in the context of an SVD) is the maximal amplification factor along any direction in $ℝ^{m}$; when $m = 1$ it reduces to $| a |$, the amplification factor of multiplication by $a$. What changes when $m > 1$? There are now different amplification factors of $𝑨$ along different directions. The maximum amplification factor (i.e., the matrix norm) is given by the largest singular value

$σ_{1} = || 𝑨 || = \sup_{|| 𝒖 || = 1} || 𝑨 𝒖 ||, \quad \text{and} \quad || 𝑨 𝒖 || ⩽ σ_{1} || 𝒖 || .$

There is now also a minimal amplification factor, the smallest singular value

$σ_{m} = \inf_{|| 𝒖 || = 1} || 𝑨 𝒖 ||, \quad \text{and} \quad || 𝑨 𝒖 || ⩾ σ_{m} || 𝒖 || .$

The matrix $𝑨$ is singular if $σ_{m} = 0$ , stating that along at least one direction $𝒗$ the scaling factor is null, $𝑨 𝒗 = 𝟎$ , in which case $𝑿 = 𝟎$ is of minimal norm and trivially satisfies the problem conditions. Assume now that $𝑨$ is not singular, $σ_{m} > 0$ , and investigate the smallest amplification factor of $𝑨 + 𝑿$ , seeking a lower bound for $|| (𝑨 + 𝑿) 𝒖 ||$ along some arbitrary direction $𝒖$ ,

$|| (𝑨 + 𝑿) 𝒖 || = || 𝑨 𝒖 + 𝑿 𝒖 || ⩾ || 𝑨 𝒖 || - || 𝑿 𝒖 || ⩾ (σ_{m} - || 𝑿 ||) || 𝒖 || .$

The above states that if $|| 𝑿 || < σ_{m}$ then $|| (𝑨 + 𝑿) 𝒖 || > 0$ for $|| 𝒖 || = 1$, and $𝑨 + 𝑿$ is not singular. Hence, for $𝑨 + 𝑿$ to be singular the condition $|| 𝑿 || ⩾ σ_{m}$ must hold. Use the SVD $𝑨 = 𝑼 𝚺 𝑽^{T}$ to construct

$𝑿 = 𝑼 diag (0, \dots, 0, - σ_{m}) 𝑽^{T}$

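that makes the sum $𝑨 + 𝑿 = 𝑼 diag (σ_{1}, \dots, σ_{m - 1}, 0) 𝑽^{T}$ singular, with $|| 𝑿 || = σ_{m}$, the minimal allowed norm. Hence $𝑨$ is a distance $σ_{m}$ from “zero”, i.e., from the set of singular matrices.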
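A numerical check of this construction is sketched below; `min_singular_perturbation` is an illustrative name, and the 2 × 2 matrix is an arbitrary nonsingular example.

```julia
using LinearAlgebra

# X = U diag(0, …, 0, -σ_m) V^T for a square matrix A with SVD U Σ V^T:
# A + X = U diag(σ_1, …, σ_{m-1}, 0) V^T is singular and ‖X‖ = σ_m
function min_singular_perturbation(A::AbstractMatrix)
    U, S, V = svd(A)
    d = zeros(length(S))
    d[end] = -S[end]              # remove only the smallest singular value
    return U * Diagonal(d) * V'
end

A = [2.0 1.0; 1.0 3.0]
X = min_singular_perturbation(A)
opnorm(X), svdvals(A)[end]        # equal: distance from A to the singular matrices
svdvals(A + X)[end]               # zero up to rounding
```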
Prove that any matrix in $ℝ^{m \times m}$ is the limit of a sequence of matrices of full rank, i.e., the set of full-rank matrices is a dense subset of $ℝ^{m \times m}$ .

Solution. The SVD of $𝑨$ of rank $r$ is $𝑨 = 𝑼 𝚺 𝑽^{T}$ with $𝚺 = diag (σ_{1}, σ_{2}, \dots, σ_{r}, 0, \dots, 0)$. Define sequences $\{s_{n}^{(j)}\}_{n \in ℕ}$ of positive numbers such that

$\lim_{n \to \infty} s_{n}^{(j)} = σ_{j}$, e.g.,

$s_{n}^{(j)} = \frac{n + 1}{n} σ_{j} \text{ for } j ⩽ r, \qquad s_{n}^{(j)} = \frac{n + 1}{n^{2}} \text{ for } j > r .$

Then $𝑨_{n} = 𝑼 𝑺_{n} 𝑽^{T}$ with $𝑺_{n} = diag (s_{n}^{(1)}, s_{n}^{(2)}, \dots, s_{n}^{(m)})$ are of full rank, since all $s_{n}^{(j)} > 0$, and $\lim_{n \to \infty} 𝑨_{n} = 𝑨$, since $|| 𝑨_{n} - 𝑨 || = || 𝑺_{n} - 𝚺 || = \max_{j} | s_{n}^{(j)} - σ_{j} | \to 0$.
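The construction can be tried numerically. The sketch below (`full_rank_approx` is an illustrative name) determines the numerical rank with a relative tolerance `tol`, an assumption needed because computed "zero" singular values are only zero up to rounding.

```julia
using LinearAlgebra

# Sketch of A_n = U S_n V^T from the proof: nonzero singular values are scaled
# by (n+1)/n, zero ones replaced by (n+1)/n^2 > 0, so A_n has full rank
function full_rank_approx(A::AbstractMatrix, n::Integer; tol = 1e-12)
    U, S, V = svd(A)
    r = count(>(tol * S[1]), S)          # numerical rank of A
    Sn = [j <= r ? (n + 1) / n * S[j] : (n + 1) / n^2 for j in eachindex(S)]
    return U * Diagonal(Sn) * V'
end

A = [1.0 2.0 3.0; 2.0 4.0 6.0; 1.0 0.0 1.0]                      # rank 2
[rank(full_rank_approx(A, n)) for n in (10, 100, 1000)]          # rank 3 each time
[opnorm(full_rank_approx(A, n) - A) for n in (10, 100, 1000)]    # decreases like σ_1/n
```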