"Maymester MATH547 Linear Algebra for Applications in Data Science"

1.MATH547 Homework 2

Topic:	Math@UNC environment
Post date:	May 18, 2020
Due date:	May 19, 2020

1.1.Background

This homework investigates the matrix fundamental spaces, and the concept of linear dependence and independence.

1.2.Theoretical questions

Is the zero vector a linear combination of any non-empty set of vectors?

Answer. Yes, $𝒱 = (V, S, +, \cdot)$ , $𝒗 \in V$ , $0 \in S$ since $S$ a field, $0 \cdot 𝒗 = 𝟎$ , by vector space properties.
If $S \subseteq V$ , with $V$ a vector space then span( $S$ ) equals the intersection of all subspaces of $V$ that contain $S$ .

Answer. Let $I$ denote the intersection of all subspaces of $V$ that contain $S$ . Prove $span (S) = I$ by double inclusion:
1. Consider $𝒖 \in span (S)$ , and let $U$ be an arbitrary subspace of $V$ that contains $S$ . A subspace must be closed, hence $𝒖 \in U$ . Since $𝒖, U$ are arbitrary, $span (S) \subseteq I$ .
2. Establish $I \subseteq span (S)$ by contradiction. Assume $𝒖 \in I$ , but $𝒖 \notin span (S)$ . If $𝒖 \in I$ , it must belong to all vector subpaces of $V$ , including the $span (S)$ subspace, hence $𝒖 \in span (S)$ , contradiction. Conclude that if $𝒖 \in I$ it follows that $𝒖 \in span (S)$ , hence $I \subseteq span (S)$ .
Can a vector space have more than one basis?

Answer. Yes. In $(ℝ, ℝ, +, \cdot)$ both ${1}$ and ${- 1}$ are bases.
Must a vector space have a finite basis?

Answer. No, the vector space of functions that can be differentiated an arbitrary number of times is not of finite dimension.

1.3.Fundamental spaces and linear dependence in image processing

1.3.1.Data input

Images are often processed using techniques from linear algebra. In this assignment a database of facial images from MIT will be used to investigate the fundamental vector subspaces associated with a matrix (linear mapping) and concepts of linear dependence. The database is available in a format readily loaded into Octave, with gray-scale images stored as column vectors of a matrix $𝑨 \in ℝ^{m \times n} .$ There are $n$ images, each with $m$ pixels, and the two-dimensional image size in pixels is $p_{x} \times p_{y}$ . The images have been pre-processed to remove non-uniform background illumination, noise, off-centering, different face size, and also been scaled such that the norm of each column vector is equal to one. The resulting images represent common facial features, but may seem distant from the much more detailed processing of visual information that leads a human to recognize a face. After loading the database, a directory is created for this assignment and set as the current directory

octave]

cd /home/student/courses/MATH547ML/data/faces; load faces

octave]

[m,n]=size(A); px=floor(sqrt(m)); py=m/px; disp([m n px py])

16384 99 128 128

octave]

cd /home/student/courses/MATH547ML;

octave]

mkdir homework; cd homework; mkdir hw02; cd hw02

octave]

pwd

ans = /home/student/courses/MATH547ML/homework/hw02

octave]

1.3.2.Utility functions

Define functions to display a facial image, and save an image to a file.

octave]

function shwface(a,px,py)
  im=reshape(-a,px,py)'; colormap(gray);
  imagesc(im);
end

octave]

shwface(A(:,1),px,py)

octave]

function savface(a,px,py,fname)
  im=reshape(-a,px,py)'; colormap(gray);
  imagesc(im); print(fname,"-deps");
end

octave]

savface(A(:,1),px,py,"face1");

octave]

savface(A(:,2),px,py,"face2");

octave]

savface(A(:,3),px,py,"face3");

octave]

Figure 1.

1.3.3.Questions to investigate

Functionals to assess data variability.

Define an average of the data

𝒃 = \frac{1}{n} \sum_{j = 1}^{n} 𝒂_{j}

, with

𝑨 = [\begin{array}{cccc} 𝒂_{1} & 𝒂_{2} & \dots & 𝒂_{n} \end{array}]

. Compute the norms

u_{j} = {|| 𝒂_{j} - 𝒃 ||}_{p}

and angle cosines

c_{j} = 𝒃^{T} 𝒂_{j} / || 𝒃 ||

(recall that preprocessing ensured

|| 𝒂_{j} || = 1

). Plot the results and comment on information provided by different norms. Assess data variability.

Solution. Compute the average image

octave]

b=mean(A')'; shwface(b,px,py);

octave]

u=zeros(n,4);

octave]

for i=1:n
  u(i,1) = norm(A(:,i)-b,1)/norm(b,1);
  u(i,2) = norm(A(:,i)-b,2)/norm(b,2);
  u(i,3) = norm(A(:,i)-b,3)/norm(b,3);
  u(i,4) = norm(A(:,i)-b,Inf)/norm(b,Inf);
end

octave]

idx=1:n; plot(idx,u(:,1),'k',idx,u(:,2),'r',idx,u(:,3),'b',idx,u(:,4),'g')

octave]

print -depsc Q1Fig1.eps

octave]

c = b'*A/norm(b);

octave]

figure(2); plot(idx,c); print -depsc Q1Fig2.eps

octave]

In all norms $|| 𝒂_{j} - 𝒃 || ≅ 10$ while $|| 𝒂_{j} || = 1$ , so the deviation from the average is large. Sice $c_{j} ≅ 0.1$ the cosine is almost constant. Deduce that the images are concentrated on the intersection of a hypercone of vertex angle $θ ≅ arc \cos 0.1$ with a hypersphere of radius $r ≅ 10$ .

Figure 2.

Functional to assess data redundancy.

If vectors

𝒙, 𝒚 \in ℝ^{m}

are colinear

𝒙^{T} 𝒚 / || 𝒙 || / || 𝒚 || = \pm 1

, one vector can be recovered from the other through a scaling operation

𝒙 = a 𝒚

, and the data is redundant. If the vectors are orthogonal,

𝒙^{T} 𝒚 = 0

, data in

𝒙

is independent of data in

𝒚

. Assess whether facial data in

𝑨

is independent by computing the angle cosines

s_{j k} = 𝒂_{j}^{T} 𝒂_{k}

Solution. $𝑺 = 𝑨^{T} 𝑨$ contains all the scalar products $s_{j k}$ . Contour lines of $s (j, k) = 𝒂_{j}^{T} 𝒂_{k}$ (Fig. 3) shows that the column vectors of $𝑨$ are close to orthogonal since the off-diagonal elements are very small compared to the diagonal elements. hence there is no redundancy in the data.

octave]

S=A'*A;

octave]

figure(1); surf(S)

octave]

figure(2); contour(S); print -depsc Q2Fig1.eps

octave]

[S(10,10) S(10,20) S(10,30)]

ans = 1.00000134838 -0.00000026211 -0.00000039673

octave]

min(diag(S))

ans = 1.00000

octave]

Figure 3. Scalar product of data vectors indicate linear independence; no redundancy

Column space to assess data sampling.

The Octave orth function returns an orthognal set of vectors that span the column space of a matrix given as its argument. Display a few such vectors and comment on their utility by comparison to the column vectors of

𝑨

Solution. Apply orth to the first 8 images in $𝑨$ to obtain $𝑩$ , and display first 3 columns of each (Fig. 4). Both $𝑨, 𝑩$ are equally useful in face identification since $𝑨$ was already close to orthogonal (previous question).

octave]

B=orth(A(:,1:8));

octave]

figure(1); shwface(A(:,1),128,128); print -depsc Q3Fig1a.eps

octave]

figure(2); shwface(B(:,1),128,128); print -depsc Q3Fig1b.eps

octave]

figure(1); shwface(A(:,2),128,128); print -depsc Q3Fig2a.eps

octave]

figure(2); shwface(B(:,2),128,128); print -depsc Q3Fig2b.eps

octave]

figure(1); shwface(A(:,3),128,128); print -depsc Q3Fig3a.eps

octave]

figure(2); shwface(B(:,3),128,128); print -depsc Q3Fig3b.eps

octave]

Figure 4. Images in $𝑨$ (top row), compared to images in $𝑩$ bottom row.

The same conclusion arises from sampling every other pixel in the horizontal and vertical directions to obtain $64 \times 64$ images, called subsampling (the shwface displays the image twice in this case)

octave]

B=orth(A(1:4:m,1:8));

octave]

figure(1); shwface(A(1:4:m,1),64,64); print -depsc Q3Fig4a.eps

octave]

figure(2); shwface(B(:,1),64,64); print -depsc Q3Fig4b.eps

octave]

figure(1); shwface(A(1:4:m,2),64,64); print -depsc Q3Fig5a.eps

octave]

figure(2); shwface(B(:,2),64,64); print -depsc Q3Fig5b.eps

octave]

figure(1); shwface(A(1:4:m,3),64,64); print -depsc Q3Fig6a.eps

octave]

figure(2); shwface(B(:,3),64,64); print -depsc Q3Fig6b.eps

octave]

Figure 5. Images in $𝑨$ (top row), compared to images in $𝑩$ bottom row.

Left null space to assess missing data.

The Octave null function returns an orthognal set of vectors that span the null space of a matrix given as its argument. Display a few vectors of

N (𝑨^{T})

and comment on what data cannot be obtained by linear combination of columns of

𝑨

Solution. Compute $N (𝑨^{T})$ on the $64 \times 64$ images obtained by subsampling, and display a few columns (Fig. 6). Notice that the vectors in the left null space contain isolated non-zero components. Linear combination of the facila images would not be able to represent such isolated pixels.

octave]

N=null(A(1:4:m,1:8)');

octave]

figure(1); shwface(N(:,1),64,64);  print -depsc Q4Fig1a.eps

octave]

figure(1); shwface(N(:,31),64,64);  print -depsc Q4Fig1b.eps

octave]

figure(1); shwface(N(:,61),64,64);  print -depsc Q4Fig1c.eps

octave]

Figure 6. Elements of $N (𝑨^{T})$

Linear combinations.

Define a few new faces from linear combinations of

p = 2, 3, \dots, 8

columns of

𝑨

Solution. Define scaling coefficients that sum to 1 for each $p$ value, find the linear combination, display the image.

octave]

x=[0.5; 0.5]; b=A(:,1:2)*x;
shwface(b,128,128); print -depsc Q5Fig1.eps

octave]

x=[0.2; 0.3; 0.5]; b=A(:,1:3)*x;
shwface(b,128,128); print -depsc Q5Fig2.eps

octave]

x=[0.1; 0.2; 0.2; 0.5]; b=A(:,1:4)*x;
shwface(b,128,128); print -depsc Q5Fig3.eps

octave]

x=[0.1; 0.2; 0.2; 0.2; 0.3]; b=A(:,1:5)*x;
shwface(b,128,128); print -depsc Q5Fig4.eps

octave]

x=[0.1; 0.2; 0.2; 0.2; 0.2; 0.1]; b=A(:,1:6)*x;
shwface(b,128,128); print -depsc Q5Fig5.eps

octave]

x=[0.1; 0.2; 0.2; 0.2; 0.1; 0.1; 0.1]; b=A(:,1:7)*x;
shwface(b,128,128); print -depsc Q5Fig6.eps

octave]

x=[0.1; 0.1; 0.1; 0.2; 0.2; 0.1; 0.1; 0.1]; b=A(:,1:8)*x;
shwface(b,128,128); print -depsc Q5Fig7.eps

octave]

Figure 7.

Linear dependence.

Is the facial data within

𝑨

linearly dependent or independent?

Solution. From the computation of the scalar products, $𝑺 = 𝑨^{T} 𝑨$ with $𝑺$ close to diagonal, deduce that images within $𝑨$ are linearly independent.