"Maymester MATH547 Linear Algebra for Applications in Data Science"

1.MATH547 Homework 3

Topic:	Math@UNC environment
Post date:	May 20, 2020
Due date:	May 21, 2020

1.1.Background

This homework investigates consequences of the fundamental theorem of linear algebra and applications of the singular value decomposition.

1.2.Theoretical questions

Consider a linear mapping $𝒇 : U \to V$, from vector space $𝒰 = (U, ℝ, +, \cdot)$ with basis $\{𝒖_{1}, \dots, 𝒖_{n}\}$, to $𝒱 = (V, ℝ, +, \cdot)$, with basis $\{𝒗_{1}, \dots, 𝒗_{m}\}$.

Is $\{𝒇 (𝒖_{1}), \dots, 𝒇 (𝒖_{n})\}$ a basis for $𝒱$?

Solution. Not necessarily. For example, if $𝒇 (𝒖_{j}) = 𝟎$ for some $j$, the set contains the zero vector and is linearly dependent.

If $nullity (𝒇) = 0$ must $m = n$?

Solution. No. Let $𝑨 \in ℝ^{m \times n}$ be the matrix associated with $𝒇$. If $nullity (𝒇) = 0$ then $N (𝑨) = \{𝟎\}$, and the only solution to $𝒇 (𝒙) = 𝟎$ is $𝒙 = 𝟎$. This requires $n ⩽ m$ but not $n = m$. Consider $𝒇 : ℝ \to ℝ^{2}$, $𝒇 (x) = {[\begin{array}{cc} x & 2 x \end{array}]}^{T} = 𝑨 x$, with $𝑨 = {[\begin{array}{cc} 1 & 2 \end{array}]}^{T} \in ℝ^{2 \times 1}$, for which $nullity (𝒇) = 0$ and $m = 2$, $n = 1$, $m \neq n$.
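
The counterexample can be checked numerically. The following is a minimal Octave sketch (variable names are illustrative, not taken from the course files) that obtains the nullity from the rank-nullity theorem for the $2 \times 1$ matrix of the mapping $𝒇 (x) = {[\begin{array}{cc} x & 2 x \end{array}]}^{T}$.

```octave
% Counterexample f: R -> R^2, f(x) = [x; 2x] = A*x
A = [1; 2];                    % A is m x n with m = 2 rows and n = 1 column
[m, n] = size(A);
nullity = n - rank(A);         % rank-nullity theorem: n = rank + nullity
printf("m = %d, n = %d, nullity = %d\n", m, n, nullity);
```

The nullity is zero even though the matrix is not square, which is all the counterexample needs.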

If $m = n$ and $𝒖_{i} = 𝒗_{i}$ for $i = 1, \dots, m$, what is the matrix $𝑨$ representing $𝒇$?

Solution. The identity matrix $𝑨 = 𝑰$. Apply the linear mapping $𝒗 = 𝒇 (𝒖) = 𝑨 𝒖$ to the basis vectors $\{𝒖_{1}, \dots, 𝒖_{m}\}$, to obtain
$𝑽 = [\begin{array}{cccc} 𝒗_{1} & 𝒗_{2} & \dots & 𝒗_{m} \end{array}] = 𝑨 𝑼 = [\begin{array}{cccc} 𝑨 𝒖_{1} & 𝑨 𝒖_{2} & \dots & 𝑨 𝒖_{m} \end{array}] .$
Since the columns of $𝑼$ form a basis, they are linearly independent and $𝑼$ is invertible,
$𝑨 = 𝑽 𝑼^{- 1} .$
The statement $𝒖_{i} = 𝒗_{i}$ can be interpreted in one of two ways (these descriptions are said to be duals of one another, and distinguish between a vector as a geometrical entity and its coordinates in some given basis):
1. $𝒖_{i}$ is a vector that contains the coordinates of the $i^{th}$ basis vector of $U$ in the $𝑰$ basis, $𝒖_{i} = 𝑰 𝒖_{i}$ . Similarly, $𝒗_{i}$ is a vector that contains the coordinates of the $i^{th}$ basis vector of $V$ in the $𝑰$ basis, $𝒗_{i} = 𝑰 𝒗_{i}$ . In this case $𝒖_{i} = 𝒗_{i}$ implies $𝑼 = 𝑽$ , and $𝑨 = 𝑰$ . This interpretation was emphasized in this course, and is appropriate for $U = V = ℝ^{m}$ .
2. $𝒖_{i}$ is the $i^{th}$ basis vector of $U$ , whose coordinates in that basis are $𝒃_{i}$ given by $𝒖_{i} = 𝑼 𝒃_{i}$ . Similarly $𝒗_{i} = 𝑽 𝒄_{i}$ . The statement $𝒖_{i} = 𝒗_{i}$ now implies $𝑼 𝒃_{i} = 𝑽 𝒄_{i}$ , such that
  $𝒄_{i} = 𝑽^{- 1} 𝑼 𝒃_{i}$
  and the matrix associated with the mapping would be $𝑨 = 𝑽^{- 1} 𝑼$ . This interpretation is appropriate when mapping between different vector spaces of the same dimension, $U \neq V$ , $\dim (U) = \dim (V)$ .
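
The relation $𝑨 𝑼 = 𝑽$ from the first interpretation can be tried on a small example. This is only a sketch: the two bases of $ℝ^{3}$ below are arbitrary illustrative choices, not data from the homework.

```octave
% Sketch: A maps each u_i to v_i, so A*U = V and A = V*inv(U).
U = [1 1 0; 0 1 1; 0 0 1];     % columns: an illustrative basis of R^3 (assumption)
V = [2 0 0; 1 1 0; 0 1 3];     % columns: another illustrative basis of R^3 (assumption)
A = V / U;                     % solves A*U = V without forming inv(U) explicitly
A_same = U / U;                % case u_i = v_i for all i, i.e. V = U
disp(A);
disp(A_same);                  % the identity matrix, up to rounding
```

Using the right-division operator instead of `inv` is a common numerical choice; either form expresses $𝑨 = 𝑽 𝑼^{- 1}$.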

Determine the singular value decomposition and pseudo-inverse of a matrix $𝑨 \in ℝ^{1 \times n}$ (i.e., a row vector).

Solution. The SVD is $𝑨 = 𝑼 𝚺 𝑽^{T}$ , $𝑼 \in ℝ^{m \times m}$ orthogonal, $𝑽 \in ℝ^{n \times n}$ orthogonal, $𝚺 \in ℝ_{+}^{m \times n}$ diagonal. For $𝑨$ a row vector, $m = 1$ , and
$𝚺 = [\begin{array}{cccc} σ_{1} & 0 & \dots & 0 \end{array}]$
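with $σ_{1} = {|| 𝑨 ||}_{2}$ . By definition of the matrix $2$ -norm
$σ_{1} = {sup}_{{|| 𝒙 ||}_{2} = 1} {|| 𝑨 𝒙 ||}_{2} .$
Since $𝑨 = 𝒂^{T}$ ,
$𝑨 𝒙 = 𝒂^{T} 𝒙 = \sum_{i = 1}^{n} a_{i} x_{i}, \quad \sum_{i = 1}^{n} x_{i}^{2} = 1 .$
By the Cauchy-Schwarz inequality $| 𝒂^{T} 𝒙 | ⩽ {|| 𝒂 ||}_{2} {|| 𝒙 ||}_{2}$ , with equality for $𝒙 = 𝒂 / {|| 𝒂 ||}_{2}$ . The largest possible value of $|| 𝑨 𝒙 ||$ when $|| 𝒙 || = 1$ is therefore $σ_{1} = {|| 𝑨 ||}_{2} = {|| 𝒂 ||}_{2} = (\sum_{i = 1}^{n} a_{i}^{2})^{1 / 2}$ . The SVD is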
$𝑨 = [1] [\begin{array}{cccc} σ_{1} & 0 & \dots & 0 \end{array}] [\begin{array}{c} \frac{𝒂^{T}}{σ_{1}} \\ 𝑽_{n - 1}^{T} \end{array}]$
with $𝑽_{n - 1}^{T} 𝒂 = 𝟎$ , $𝑽_{n - 1}^{T} 𝑽_{n - 1} = 𝑰_{n - 1}$ (orthonormal columns), assuming $𝒂 \neq 𝟎$ .
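
The pseudo-inverse follows from the same factorization. With $𝚺^{+} = {[\begin{array}{cccc} 1 / σ_{1} & 0 & \dots & 0 \end{array}]}^{T}$ , the product $𝑨^{+} = 𝑽 𝚺^{+} 𝑼^{T}$ keeps only the first column of $𝑽$ , giving $𝑨^{+} = 𝒂 / σ_{1}^{2} = 𝒂 / (𝒂^{T} 𝒂)$ for $𝒂 \neq 𝟎$ , where $σ_{1} = {|| 𝒂 ||}_{2}$ . A small Octave check, with an arbitrary illustrative vector:

```octave
a = [3; 0; 4];                 % illustrative column vector (assumption); A = a'
A = a';
s = svd(A);                    % the single singular value of the 1 x n matrix
Ap = a / (a' * a);             % pseudo-inverse from the formula A^+ = a / (a'*a)
printf("sigma_1 = %g, norm(a) = %g\n", s(1), norm(a));
printf("difference from pinv(A): %g\n", norm(Ap - pinv(A)));
```

For this vector the singular value equals the Euclidean norm of $𝒂$ rather than its largest entry in absolute value, and $𝑨 𝑨^{+} = 1$ .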

1.3.Ordered bases for the fundamental spaces and painting motifs

The fundamental theorem of linear algebra partitions the domain and codomain of a linear mapping. The singular value decomposition provides orthogonal bases for each of the subspaces arising in the partition. The bases are ordered according to the amplification behavior of the linear mapping, expressed through the norm of successive restrictions of the mapping. This approach is closely aligned with typical problems in data science, and can be used in a variety of scenarios. In this homework linear algebra methods will first be used in a field far removed from the physical sciences: extracting the quirks of a painter's style from the overall composition of a painting, and applying one artist's style to another artist's composition. This is an often-encountered data science problem: distinguishing between small and large scale features of data.

First steps in solving the homework questions will be carried out in class. Each of the following subsections is a homework question, with 1 grade point awarded for a correct solution. Here are some initial Octave instructions to define a directory to save images into and define a function to read an image.

octave]

cd homework; mkdir hw03; cd hw03

octave]

im=imread("/home/student/courses/MATH547ML/data/paintings/Andy_Warhol_127.jpg");

octave]

imshow(im)

octave]

function im=pread(name)
  im=imread(strcat("/home/student/courses/MATH547ML/data/paintings/",name));
  im=rgb2gray(im);
end

octave]

im=pread("Andy_Warhol_127.jpg");

octave]

imshow(im); size(im)

ans = 360 357

octave]

1.3.1.Data input mappings

Define a linear mapping that rescales data within an image file to some specified size $p_{x} \times p_{y}$ . Determine whether the mapping is data-preserving, and if not, quantify the amount of data loss.

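Solution. One such linear mapping is to simply take a portion of the image (cropping). The mapping is not data-preserving when the image is larger than the target: if the initial image was $q_{x} \times q_{y}$ with $q_{x} ⩾ p_{x}$ , $q_{y} ⩾ p_{y}$ , the data loss is $q_{x} q_{y} - p_{x} p_{y}$ pixel values.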

octave]

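function im1=psize(im0,px,py)
  % Crop the central px-by-py block of im0 (assumes px<=mx and py<=my)
  [mx,my]=size(im0);
  cx=floor(mx/2); cy=floor(my/2);
  px2=floor(px/2); py2=floor(py/2);
  % cx-px2+1:cx-px2+px has exactly px entries; cx-px2:cx+px2 would have 2*px2+1
  im1 = im0(cx-px2+1:cx-px2+px,cy-py2+1:cy-py2+py);
end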

octave]

imshow(psize(im,256,256))

octave]

imshow(psize(pread("Vincent_van_Gogh_167.jpg"),256,256))

octave]
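
To see why cropping is a linear mapping, it can be written with explicit selection matrices. The sketch below uses a small synthetic array in place of a painting; the names `Rsel`, `Csel` and `K` are illustrative and do not appear in the course files. It also compares the pixel count lost with the nullity of the mapping acting on the image stored as a vector.

```octave
im0 = reshape(1:30, 5, 6);                   % synthetic 5x6 "image" (stand-in data)
px = 3; py = 4;                              % target size
[qx, qy] = size(im0);
r0 = floor((qx - px)/2); c0 = floor((qy - py)/2);
Iq = eye(qx); Rsel = Iq(r0+1:r0+px, :);      % px x qx matrix selecting rows
Jq = eye(qy); Csel = Jq(c0+1:c0+py, :);      % py x qy matrix selecting columns
im1 = Rsel * im0 * Csel';                    % cropped image, px x py
K = kron(Csel, Rsel);                        % vec(im1) = K * vec(im0)
loss = numel(im0) - numel(im1);              % pixel values discarded
nullity = size(K, 2) - rank(K);              % dimension of the null space of K
printf("loss = %d, nullity = %d\n", loss, nullity);
```

In this example the number of discarded pixel values coincides with the nullity of `K`, which is one way to quantify the data loss of the mapping.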

1.3.2.Images as data: change of basis

Take the largest possible portion of a painting of size $p \times p$ with $p = 2^{q}$ . Interpret the resulting image as a vector $𝒃 \in ℝ^{m}$ with $m = p^{2} = 2^{2 q}$ . The image is thus specified as a linear combination of the columns of the identity matrix $𝑰 \in ℝ^{m \times m}$ ,

𝒃 = 𝑰 𝒃 = b_{1} 𝒆_{1} + b_{2} 𝒆_{2} + \dots + b_{m} 𝒆_{m},

and describes illumination on a pixel-by-pixel basis. A column vector of $𝑰$ can be interpreted as the binary (base-2) representation of a natural number from the set $P = \{1, 2, 3, \dots, 2^{m}\}$ , namely $2^{j - 1} \overset{g}{\to} 𝒆_{j}$ .

𝒆_{j}^{T} = [\begin{array}{ccccccc} 0 & \dots & 0 & 1 & 0 & \dots & 0 \end{array}] \in ℝ^{m}

n_{j} = 00 \dots 010 \dots 0_{2} = 2^{j - 1}

Note that this is one choice among the possible combinations of $m$ objects chosen from a set with $2^{m}$ elements, $C (2^{m}, m)$ . Even for small $m$ , the number of choices is enormous

C (2^{m}, m) = \frac{2^{m}!}{m! (2^{m} - m)!}, m = 16 \Rightarrow C (2^{16}, 16) = \frac{2^{16}!}{16! (2^{16} - 16)!} ≅ 5.5 \times 10^{63} .
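
The order of magnitude quoted above can be checked without evaluating the factorials, which overflow in floating point. A minimal Octave sketch, assuming only the identity $C (N, k) = \prod_{i = 1}^{k} (N - k + i) / i$ :

```octave
N = 2^16; k = 16;
% log C(N,k) = sum over i = 1..k of log((N-k+i)/i); summing logarithms avoids overflow
logC = sum(log((N-k+1:N) ./ (1:k)));
C = exp(logC);
printf("C(2^16,16) is approximately %.2e\n", C);
```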

Construct a new basis by random choice of numbers from $P$ and the $g$ mapping. Check if the basis is orthogonal.

Solution. Let $M$ be the largest representable integer on a computer. If $M < m = p^{2}$ then the binary number representation would not construct a basis, since the vector $𝒖$ associated with a binary number $B_{𝒖} > M$ is not within span $\{𝒗_{1}, \dots, 𝒗_{m}\}$ associated with binary numbers $\{B_{1}, \dots, B_{m}\}$ , all of which satisfy $B_{i} ⩽ M$ . Choose $m < M$ . Orthogonality of the vectors associated with binary numbers $B_{i}, B_{j}$ corresponds to a zero result from the bitwise AND operation.

octave]

m=4*4;

octave]

B=randi(flintmax()-1,m,1);

octave]

C=zeros(m,m);
for i=1:m
  for j=1:m
    C(i,j)=bitand(B(i),B(j));
  end;
end

octave]

max(max(C))

ans = 8.9756e+15

octave]

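For a pair of nonzero vectors with 0/1 entries, linear dependence is the same as equality in this binary representation, so pairwise dependence can be tested by comparing the numbers.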

octave]

D=zeros(m,m);
for i=1:m
  for j=i+1:m
    D(i,j) = B(i)==B(j);
  end;
end

octave]

max(max(D))

ans = 0

octave]

From the above, the vectors obtained are pairwise distinct but not orthogonal.
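
The bitwise test can be complemented by working with the 0/1 vectors themselves. Note that the loop filling `C` also visits $i = j$ , where `bitand(B(i),B(i))` equals `B(i)`, so only the off-diagonal entries say anything about orthogonality; pairwise distinctness also does not by itself guarantee that $m$ vectors form a basis. The following sketch is one possible check, under the assumption that the random numbers are drawn from $1, \dots, 2^{m} - 1$ so that each fits in $m$ bits (the session above draws from a larger range); `W` and `G` are illustrative names.

```octave
m = 16;                              % vector length for a 4x4 image
B = randi(2^m - 1, m, 1);            % m random m-bit numbers (assumed range)
W = (dec2bin(B, m) - '0')';          % column j holds the binary digits of B(j)
G = W' * W;                          % Gram matrix: G(i,j) counts shared 1 bits
offdiag = G - diag(diag(G));         % remove the diagonal before testing
is_orthogonal = all(offdiag(:) == 0);
is_basis = (rank(W) == m);           % m independent vectors in R^m form a basis
printf("orthogonal: %d, basis: %d\n", is_orthogonal, is_basis);
```

The result changes from run to run because the numbers are random; an off-diagonal zero in `G` corresponds to a zero `bitand` for that pair.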

1.3.3.Images as data: positive checkerboard basis

Construct another basis that corresponds to successive halving of image regions by factors $(2^{k}, 2^{l})$ along the horizontal and vertical dimensions. Are basis vectors orthogonal?

Solution. There are many possible approaches to this problem. In the following, a full solution is presented, highlighting that application of linear algebra concepts to practical data science problems usually involves non-trivial preprocessing of the data. Since the solution involves considerable technique from outside of linear algebra, full credit is awarded to all attempts at this problem.

First consider the simplest possible case. A $1 \times 1$ image cannot be halved, hence start with $2 \times 2$ images for which

\begin{array}{cc} 1 & 0 \\ 1 & 0 \end{array}, \begin{array}{cc} 0 & 1 \\ 0 & 1 \end{array}, \begin{array}{cc} 1 & 1 \\ 0 & 0 \end{array}, \begin{array}{cc} 0 & 0 \\ 1 & 1 \end{array}, \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array}, \begin{array}{cc} 0 & 1 \\ 1 & 0 \end{array}

give 6 possible patterns encoded by row vectors

𝑩^{T} = [\begin{array}{cccc} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{array}] .

Only four basis vectors are required and can be chosen as

𝑪 = [\begin{array}{cccc} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{array}] .

octave]

B=[1 0 1 0; 0 1 0 1; 1 1 0 0; 0 0 1 1; 1 0 0 1; 0 1 1 0];
C=[1 0 1 0; 1 1 0 0; 0 1 1 0; 0 0 1 1];

octave]

[rank(B) rank(C)]

ans = 4 4

octave]

The pattern suggested above is alternating groups of $n$ 1's and $n$ 0's, that are then shifted right by $1, \dots, n$ places. Define a function to generate alternating 1,0 bit groupings for images of size $m = p \times p$ , $p = 2^{q}$ . There are $2^{2 q - 1}$ such numbers.

octave]

q=2; p=2^q; m=p^2; disp([q p m])

2 4 16

octave]

g=zeros(2^(2*q-1),1); o=2^m-1;
for i=0:q
  k=2^i; l=2^k; disp([k l]);
  g(i+1) = bitshift(l-1,2^i);
  dec2bin(g(i+1))
end

1 2
ans = 10
2 4
ans = 1100
4 16
ans = 11110000
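
The alternating-group description can also be tried directly on vectors instead of integers. The sketch below is one reading of that description, with the shifts taken to be cyclic (an assumption, since the text only says "shifted right"); for the $2 \times 2$ case, $m = 4$ , it reproduces the six patterns listed earlier. The name `P` is illustrative.

```octave
m = 4;                                   % 2x2 image, as in the example above
P = [];
n = 1;                                   % group length: n ones followed by n zeros
while n < m
  base = repmat([ones(1, n) zeros(1, n)], 1, m/(2*n));
  for s = 0:2*n-1
    P = [P; circshift(base, s, 2)];      % cyclic shift right by s places
  end
  n = 2*n;
end
disp(P);
printf("patterns: %d, rank: %d\n", rows(P), rank(P));
```

Only $m$ of the generated rows can be linearly independent, so a basis is obtained by selecting a full-rank subset, as was done for $𝑪$ above.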

octave]

1.3.4.Images as data: mixed-sign checkerboard basis

Map the binary digits $\{0, 1\}$ from the positive checkerboard basis to integers $\{- 1, 1\}$ . Check if the newly obtained basis is orthogonal. Display approximations of the image that result from the first $2^{l}$ basis vectors, $l = 2 q, 2 q - 2, 2 q - 4, 2 q - 6, 2 q - 8$ .

Solution. As above, since the solution involves considerable technique from outside of linear algebra, full credit is awarded to all attempts at this problem.
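
For the smallest case, $2 \times 2$ images, the mapping $b \to 2 b - 1$ can be worked out completely. The sketch below is not the course solution: it starts from the six 0/1 patterns of the previous question, observes that they pair up as negatives of one another after the mapping, and adds the constant pattern (an assumption, since it is not among the six) to obtain four candidate basis vectors. The image vector `b` is made-up data.

```octave
B = [1 0 1 0; 0 1 0 1; 1 1 0 0; 0 0 1 1; 1 0 0 1; 0 1 1 0];  % 0/1 patterns, one per row
S = 2*B - 1;                         % map 0 -> -1 and 1 -> +1
% Rows 2, 4, 6 of S are the negatives of rows 1, 3, 5: keep one of each pair
H = [ones(1, 4); S([1 3 5], :)]';    % columns: constant pattern plus three sign patterns
G = H' * H;                          % Gram matrix of the candidate basis
disp(G);                             % a multiple of the identity when columns are orthogonal
b = [10; 20; 30; 60];                % illustrative 2x2 image stored as a vector (assumption)
c = H' * b / 4;                      % coordinates of b, valid because H'*H = 4*I
for k = 1:4
  bk = H(:, 1:k) * c(1:k);           % approximation from the first k basis vectors
  printf("k = %d, error = %g\n", k, norm(b - bk));
end
```

In this small case the mixed-sign vectors are mutually orthogonal, whereas the 0/1 patterns they come from are not, and the approximation error falls to rounding level once all four vectors are used.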

1.3.5.Images as mappings

Alternatively, an image can be interpreted as a matrix $𝑨$ , hence a mapping. From

𝑨 = 𝑨 𝑰 = 𝑨 [\begin{array}{cccc} 𝒆_{1} & 𝒆_{2} & \dots & 𝒆_{m} \end{array}] = [\begin{array}{cccc} 𝑨 𝒆_{1} & 𝑨 𝒆_{2} & \dots & 𝑨 𝒆_{m} \end{array}],

the image can be interpreted as the transformation of the image encoded by $𝑰$ . Denote by $𝑨_{k}$ the rank- $k$ approximation of $𝑨$ from the singular value decomposition $𝑨 = 𝑼 𝚺 𝑽^{T}$

𝑨_{k} = \sum_{l = 1}^{k} σ_{l} 𝒖_{l} 𝒗_{l}^{T} .

Display the images that correspond to $k = m, m / 2, m / 4, m / 8$ .

Solution. Construct and represent each $𝑨_{k}$ .

octave]

function im=svdim(S,U,V,p)
  im=S(1,1)*U(:,1)*V(:,1)';
  for k=2:p
    im=im+S(k,k)*U(:,k)*V(:,k)';
  end;
end

octave]

figure(3); clf; imshow(im)

octave]

% svd needs floating-point input; images read by imread are typically uint8
[U,S,V]=svd(double(im),1);

octave]

figure(2); plot(log10(diag(S)),'o');

octave]

figure(1); clf; imagesc(svdim(S,U,V,10)); colormap(gray); print -deps HW03Fig01.eps

octave]

figure(1); clf; imagesc(svdim(S,U,V,20)); colormap(gray); print -deps HW03Fig02.eps

octave]

figure(1); clf; imagesc(svdim(S,U,V,40)); colormap(gray); print -deps HW03Fig03.eps

octave]

Figure 1. Successive SVD approximations of an image with $k = 10, 20, 40$ rank-one updates.
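
The sum of rank-one updates computed by `svdim` can also be written as a product of truncated factors. The sketch below uses a small built-in matrix as a stand-in for an image (an assumption made to keep the example self-contained) and compares the spectral-norm error of the truncation with the first discarded singular value.

```octave
A = magic(8);                                  % stand-in for an image (assumption)
[U, S, V] = svd(A);
s = diag(S);
k = 2;
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';     % same sum as the svdim loop, in matrix form
err2 = norm(A - Ak, 2);                        % spectral-norm error of the truncation
printf("error = %g, sigma_(k+1) = %g\n", err2, s(k+1));
```

The two printed values agree up to rounding, which is why the decay of the singular values plotted above indicates how many modes an approximation needs.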

1.3.6.Extracting and applying motifs

Consider images from two different artists, $𝑨, 𝑩$ and their singular value decompositions

𝑨 = 𝑺 𝚲 𝑻^{T}, 𝑩 = 𝑼 𝚺 𝑽^{T} .

Let $q = rank (𝑨)$ , $r = rank (𝑩)$ . Construct and display images that take the large scale features from $𝑨$ combined with small scale features from $𝑩$ ,

𝑪 = \sum_{l = 1}^{min (k, q)} λ_{l} 𝒔_{l} 𝒕_{l}^{T} + \sum_{l = max (1, r - k)}^{r} σ_{l} 𝒖_{l} 𝒗_{l}^{T},

for $k = m / 2, m / 4, m / 8, m / 16$ .

Solution. Choose the first $40$ modes from the composition image, and modes 50 to 60 from the style image.

octave]

ims=pread("Vincent_van_Gogh_50.jpg");

octave]

figure(4); imshow(ims)

octave]

imC=psize(im,256,256); imS=psize(ims,256,256);

octave]

figure(1); clf; imshow(imC); print -deps Hw03Fig04.eps;

octave]

figure(2); clf; imshow(imS); print -deps HW03Fig05.eps;

octave]

% convert to double before svd; cropped images keep the uint8 type of the input
[Uc,Sc,Vc]=svd(double(imC),1); [Us,Ss,Vs]=svd(double(imS),1);

octave]

function im=svdimpq(S,U,V,p,q)
  im=S(p,p)*U(:,p)*V(:,p)';
  for k=p+1:q
    im=im+S(k,k)*U(:,k)*V(:,k)';
  end;
end

octave]

imCS = svdim(Sc,Uc,Vc,40) + svdimpq(Ss,Us,Vs,50,60);

octave]

figure(5); clf; imagesc(imCS); colormap(gray); print -deps Hw03Fig06.eps;

octave]

Figure 2. Applying the style of Van Gogh to a work by Warhol.
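
The same combination of modes can be tried without the painting files. The following sketch replaces both images with synthetic matrices (a smooth one for the composition, a finely patterned one for the style); these stand-ins and the mode ranges `k`, `p`, `q` are assumptions chosen only for illustration.

```octave
n = 32; x = linspace(0, 1, n)';
imC = 255 * exp(-(x - x').^2 / 0.1);          % smooth stand-in for the composition
imS = 10 * toeplitz(mod(1:n, 5));             % patterned stand-in for the style
[Uc, Sc, Vc] = svd(imC);
[Us, Ss, Vs] = svd(imS);
k = 4; p = 5; q = 8;                          % illustrative mode ranges
large = Uc(:, 1:k) * Sc(1:k, 1:k) * Vc(:, 1:k)';   % leading modes of the composition
small = Us(:, p:q) * Ss(p:q, p:q) * Vs(:, p:q)';   % modes p..q of the style
imCS = large + small;
printf("norm of style part = %g, sigma_p = %g\n", norm(small, 2), Ss(p, p));
```

The spectral norm of the style contribution equals the singular value of its first retained mode, so the choice of `p` controls how strongly the style shows against the composition.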