Lecture 7: The Singular Value Decomposition

1. Mappings as data

1.1. Vector spaces of mappings and matrix representations

A vector space can be formed from all linear mappings from the vector space 𝒰=(U,S,+,⋅) to another vector space 𝒱=(V,S,+,⋅),

ℒ=(L,S,+,⋅),  L={𝒇 | 𝒇:U→V, 𝒇(a𝒖+b𝒗)=a𝒇(𝒖)+b𝒇(𝒗)},

with addition and scaling of linear mappings defined by (𝒇+𝒈)(𝒖)=𝒇(𝒖)+𝒈(𝒖) and (a𝒇)(𝒖)=a𝒇(𝒖). Let B={𝒖1,𝒖2,…} denote a basis for the domain U of the linear mappings within ℒ, such that the linear mapping 𝒇 is represented by the matrix

𝑨=[ 𝒇(𝒖1) 𝒇(𝒖2) ⋯ ].

When the domain and codomain are the real vector spaces U=ℝⁿ, V=ℝᵐ, the above is a standard matrix of real numbers, 𝑨∈ℝᵐˣⁿ. For linear mappings between infinite-dimensional vector spaces, the matrix is understood in a generalized sense to contain an infinite number of columns that are elements of the codomain V. For example, the indefinite integral is a linear mapping on the vector space 𝒞∞ of functions that can be differentiated to any order,

∫ : 𝒞∞ → 𝒞∞,  v(t) = ∫ u(t) dt,

and for the monomial basis B={1,t,t²,…}, the indefinite integral is represented by the generalized matrix

𝑨 = [ t  t²/2  t³/3  ⋯ ].

Truncation of the MacLaurin series u(t)=∑_{j≥0} uj tʲ, with uj=u(j)(0)/j!, to n terms, and sampling of u∈𝒞∞ at the points t1,…,tm, forms a standard matrix of real numbers

𝑨 = [ 𝒕  𝒕²/2  𝒕³/3  ⋯  𝒕ⁿ/n ] ∈ ℝᵐˣⁿ,  𝒕ʲ = [ t1ʲ ⋯ tmʲ ]T.

The values of the function u∈𝒞∞ at t1,…,tm are approximated by

𝒖 = 𝑩𝒙 ≅ [ u(t1) ⋯ u(tm) ]T,

with 𝒙 denoting the coordinates of 𝒖 in the basis of sampled monomials 𝑩 = [ 𝟏 𝒕 𝒕² ⋯ 𝒕ⁿ⁻¹ ]. The above argument states that the values 𝒚 of 𝒗, the primitive of 𝒖, at the same sample points are given by

𝒚=𝑨𝒙,

as can indeed be verified through term-by-term integration of the MacLaurin series.
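As a concrete check of this construction, the following sketch (with hypothetical choices u(t)=eᵗ, m=5 sample points in [0.1,0.5], and n=4 retained terms) builds the sampled monomial matrix 𝑩 and the sampled matrix 𝑨 of integrated monomials, and compares 𝑩𝒙 to u and 𝑨𝒙 to the primitive eᵗ−1 at the sample points:

using LinearAlgebra
m=5; n=4; t=collect(range(0.1,0.5,length=m));   # hypothetical sample points
B=hcat((t.^j for j in 0:n-1)...);               # sampled monomials 1,t,...,t^(n-1)
A=hcat((t.^j/j for j in 1:n)...);               # sampled images t, t^2/2, ..., t^n/n
x=[1/factorial(j) for j in 0:n-1];              # MacLaurin coordinates of u(t)=exp(t)
maximum(abs.(B*x-exp.(t))), maximum(abs.(A*x-(exp.(t) .- 1)))  # both small truncation errors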

As is to be expected, matrices can also be organized as a vector space ℳ, which is essentially the representation of the associated vector space of linear mappings,

ℳ=(M,S,+,⋅),  M={𝑨 | 𝑨=[ 𝒇(𝒖1) 𝒇(𝒖2) ⋯ ], 𝒇∈L}.

The addition 𝑪=𝑨+𝑩 and scaling 𝑺=a𝑹 of matrices are given in terms of the matrix components by

cij=aij+bij,sij=arij.

1.2. Measurement of mappings

From the above it is apparent that linear mappings and matrices can also be considered as data, and a first step in the analysis of such data is the definition of functionals that attach a single scalar label to each linear mapping or matrix. Of particular interest is the definition of a norm functional that characterizes, in an appropriate sense, the size of a linear mapping.

Consider first the case of finite matrices with real components, 𝑨∈ℝᵐˣⁿ, that represent linear mappings between real vector spaces, 𝒇:ℝⁿ→ℝᵐ. The columns 𝒂1,…,𝒂n of 𝑨∈ℝᵐˣⁿ could be placed into a single column vector 𝒄 with mn components,

𝒄 = [ 𝒂1; 𝒂2; ⋮; 𝒂n ].

Subsequently the norm of the matrix 𝑨 could be defined as the norm of the vector 𝒄. An example of this approach is the Frobenius norm

||𝑨||F = ||𝒄||2 = ( ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|² )^{1/2}.
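In Julia (a minimal check, with a hypothetical 2×2 matrix), the Frobenius norm coincides with the 2-norm of the stacked-column vector:

using LinearAlgebra
A=[2 -1; 3 1]; c=vec(A);     # stack the columns of A into a single vector
norm(c,2), norm(A)           # identical values: the Frobenius norm of A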

A drawback of the above approach is that the structure of the matrix and its close relationship to a linear mapping are lost. A more useful characterization of the size of a mapping is to consider its amplification behavior. The motivation is readily understood starting from linear mappings between the reals, f:ℝ→ℝ, which are of the form f(x)=ax. When given an argument of unit magnitude, |x|=1, the mapping returns a real number with magnitude |a|. For mappings 𝒇:ℝ²→ℝ² within the plane, arguments that satisfy ||𝒙||2=1 lie on the unit circle, with components 𝒙=[ cosθ sinθ ]T, and have images through 𝒇 given analytically by

𝒇(𝒙) = 𝑨𝒙 = [ 𝒂1 𝒂2 ][ cosθ; sinθ ] = cosθ 𝒂1 + sinθ 𝒂2,

and these images trace an ellipse.

Figure 1. Mapping of the unit circle by 𝒇(𝒙)=𝑨𝒙, 𝑨=[ 2 -1; 3 1 ].

using LinearAlgebra, PyPlot                         # svd, Diagonal; PyPlot for plotting
n=250; h=2.0*π/n; θ=h*(1:n); c=cos.(θ); s=sin.(θ);  # points on the unit circle
a1=[2; 3]; a2=[-1; 1]; A=[a1 a2]

[ 2  -1
  3   1 ]

fx = c.*a1[1]+s.*a2[1]; fy = c.*a1[2]+s.*a2[2];   # image of the unit circle: cosθ a1 + sinθ a2
clf(); grid("on"); plot(c,s); axis("equal");       # unit circle
plot(fx,fy,"r");                                   # image ellipse
F=svd(A); U=F.U; Σ=Diagonal(F.S); Vt=F.Vt; V=Vt';
σ1=Σ[1,1]; σ2=Σ[2,2];
z=[0; 0]; u1=σ1*[z U[:,1]]; u2=σ2*[z U[:,2]];      # ellipse semi-axes σ1*u1, σ2*u2
v1=[z V[:,1]]; v2=[z V[:,2]];                      # right singular vectors (unit length)
cd(homedir()*"/courses/MATH661/images");
plot(u1[1,:],u1[2,:],"r");
plot(u2[1,:],u2[2,:],"r");
plot(v1[1,:],v1[2,:],"b");
plot(v2[1,:],v2[2,:],"b");
savefig("L08Fig01.eps")

From the above, the mapping associated with 𝑨 amplifies some directions more than others. This suggests defining the size of a matrix or of a mapping by the maximal amplification of unit-norm vectors within the domain.

Definition. For vector spaces U,V with norms ||·||U : U→ℝ₊, ||·||V : V→ℝ₊, the induced norm of 𝒇:U→V is

||𝒇|| = sup{ ||𝒇(𝒙)||V : ||𝒙||U = 1 }.

Definition. For vector spaces ℝⁿ, ℝᵐ with norms ||·||(n) : ℝⁿ→ℝ₊, ||·||(m) : ℝᵐ→ℝ₊, the induced norm of the matrix 𝑨∈ℝᵐˣⁿ is

||𝑨|| = sup{ ||𝑨𝒙||(m) : ||𝒙||(n) = 1 }.

In the above, any vector norm can be used within the domain and codomain.
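In Julia the induced norms for p=1,2,∞ are provided by LinearAlgebra's opnorm; a minimal sketch (hypothetical 2×2 matrix) compares the induced 2-norm against the supremum definition by sampling unit-norm arguments on the circle:

using LinearAlgebra
A=[2 -1; 3 1]; θ=2π*(0:999)/1000;
X=[cos.(θ)'; sin.(θ)'];                       # unit 2-norm arguments on the circle
maximum(sqrt.(sum((A*X).^2,dims=1))), opnorm(A,2)   # sampled supremum vs induced 2-norm
opnorm(A,1), opnorm(A,Inf)                    # induced 1-norm and infinity-norm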

2. The Singular Value Decomposition (SVD)

The fundamental theorem of linear algebra partitions the domain and codomain of a linear mapping 𝒇:U→V. For real vector spaces U=ℝⁿ, V=ℝᵐ the partition properties are stated in terms of the fundamental subspaces of the associated matrix 𝑨 as

C(𝑨) ⊕ N(𝑨T) = ℝᵐ,  C(𝑨) ⟂ N(𝑨T),
C(𝑨T) ⊕ N(𝑨) = ℝⁿ,  C(𝑨T) ⟂ N(𝑨).

The dimension of the column and row spaces r=dim C(𝑨)=dim C(𝑨T) is the rank of the matrix, n−r is the nullity of 𝑨, and m−r is the nullity of 𝑨T. An infinite number of bases could be defined for the domain and codomain. It is of great theoretical and practical interest to define bases with properties that facilitate insight or computation.

2.1. Orthogonal matrices

The above partitions of the domain and codomain are orthogonal, and suggest searching for orthogonal bases within these subspaces. Introduce a matrix representation for the bases

𝑼=[ 𝒖1 𝒖2 ⋯ 𝒖m ]∈ℝᵐˣᵐ,  𝑽=[ 𝒗1 𝒗2 ⋯ 𝒗n ]∈ℝⁿˣⁿ,

with C(𝑼)=ℝᵐ and C(𝑽)=ℝⁿ. Orthogonality between columns 𝒖i, 𝒖j for i≠j is expressed as 𝒖iT𝒖j=0. For i=j, the inner product is positive, 𝒖iT𝒖i>0, and since scaling of the columns of 𝑼 preserves the spanning property C(𝑼)=ℝᵐ, it is convenient to impose 𝒖iT𝒖i=1. Such behavior is concisely expressed as a matrix product

𝑼T𝑼=𝑰m,

with 𝑰m the identity matrix in ℝᵐˣᵐ. Expanded in terms of the column vectors of 𝑼, the first equality is

[ 𝒖1 𝒖2 ⋯ 𝒖m ]T [ 𝒖1 𝒖2 ⋯ 𝒖m ] = [ 𝒖1T; 𝒖2T; ⋮; 𝒖mT ][ 𝒖1 𝒖2 ⋯ 𝒖m ]
 = [ 𝒖1T𝒖1  𝒖1T𝒖2  ⋯  𝒖1T𝒖m
     𝒖2T𝒖1  𝒖2T𝒖2  ⋯  𝒖2T𝒖m
     ⋮         ⋮          ⋱   ⋮
     𝒖mT𝒖1  𝒖mT𝒖2  ⋯  𝒖mT𝒖m ] = 𝑰m.

It is useful to determine if a matrix 𝑿 exists such that 𝑼𝑿=𝑰m, or

𝑼𝑿=𝑼[ 𝒙1 𝒙2 ⋯ 𝒙m ]=[ 𝒆1 𝒆2 ⋯ 𝒆m ].

The columns of 𝑿 are the coordinates of the column vectors of 𝑰m in the basis 𝑼, and can readily be determined

𝑼𝒙j=𝒆j ⇒ 𝑼T𝑼𝒙j=𝑼T𝒆j ⇒ 𝑰m𝒙j=[ 𝒖1T; 𝒖2T; ⋮; 𝒖mT ]𝒆j ⇒ 𝒙j=(𝑼T)j,

where (𝑼T)j is the jth column of 𝑼T, hence 𝑿=𝑼T, leading to

𝑼T𝑼=𝑰=𝑼𝑼T.

Note that the second equality

[ 𝒖1 𝒖2 ⋯ 𝒖m ][ 𝒖1 𝒖2 ⋯ 𝒖m ]T = [ 𝒖1 𝒖2 ⋯ 𝒖m ][ 𝒖1T; 𝒖2T; ⋮; 𝒖mT ] = 𝒖1𝒖1T+𝒖2𝒖2T+⋯+𝒖m𝒖mT = 𝑰

acts as a normalization condition on the rank-one matrices 𝑼j=𝒖j𝒖jT.

Definition. A square matrix 𝑼 is said to be orthogonal if 𝑼T𝑼=𝑼𝑼T=𝑰.
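A quick numerical check of the definition, using the left singular vector matrix of a random matrix (a hypothetical example) as the orthogonal 𝑼:

using LinearAlgebra
U=svd(rand(4,4)).U;                 # orthogonal factor from an SVD
opnorm(U'*U-I), opnorm(U*U'-I)      # both zero to machine precision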

2.2. Intrinsic basis of a linear mapping

Given a linear mapping 𝒇:U→V, expressed as 𝒚=𝒇(𝒙)=𝑨𝒙, the simplest description of the action of 𝑨 would be a simple scaling, as exemplified by 𝒈(𝒙)=a𝒙, which has associated matrix a𝑰. Recall that specification of a vector is typically done in terms of the identity matrix, 𝒃=𝑰𝒃, but may be more insightfully given in some other basis, 𝑨𝒙=𝑰𝒃. This suggests that especially useful bases for the domain and codomain would reduce the action of a linear mapping to scaling along orthogonal directions: evaluate 𝒚=𝑨𝒙 by first re-expressing 𝒚 in another basis 𝑼, 𝑼𝒔=𝑰𝒚, and re-expressing 𝒙 in another basis 𝑽, 𝑽𝒓=𝑰𝒙. The condition that the linear operator reduces to simple scaling in these new bases is expressed as si=σiri for i=1,…,min(m,n), with σi the scaling coefficient along each direction. These conditions can be collected into the matrix-vector product 𝒔=𝚺𝒓, where 𝚺∈ℝᵐˣⁿ has the same dimensions as 𝑨 and is given by

𝚺 = [ σ1  0   ⋯  0   0  ⋯  0
      0   σ2  ⋯  0   0  ⋯  0
      ⋮    ⋮    ⋱   ⋮    ⋮       ⋮
      0   0   ⋯  σr  0  ⋯  0
      0   0   ⋯  0   0  ⋯  0
      ⋮    ⋮        ⋮    ⋮       ⋮
      0   0   ⋯  0   0  ⋯  0 ].

Imposing the condition that 𝑼,𝑽 are orthogonal leads to

𝑼𝒔=𝒚 ⇒ 𝒔=𝑼T𝒚,  𝑽𝒓=𝒙 ⇒ 𝒓=𝑽T𝒙,

which can be replaced into 𝒔=𝚺𝒓 to obtain

𝑼T𝒚=𝚺𝑽T𝒙 ⇒ 𝒚=𝑼𝚺𝑽T𝒙.

From the above, the orthogonal bases 𝑼,𝑽 and scaling coefficients 𝚺 that are sought must satisfy 𝑨=𝑼𝚺𝑽T.
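This reduction to scaling can be verified numerically; the sketch below (hypothetical 𝑨 and 𝒙) checks that the coordinates 𝒔=𝑼T𝒚 and 𝒓=𝑽T𝒙 satisfy si=σiri:

using LinearAlgebra
A=[2 -1; 3 1]; x=[1.0; 2.0]; y=A*x;
F=svd(A); s=F.U'*y; r=F.Vt*x;       # coordinates of y in basis U and of x in basis V
s ≈ F.S .* r                        # the mapping acts as scaling along these directions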

Theorem. Every matrix 𝑨∈ℝᵐˣⁿ has a singular value decomposition (SVD)

𝑨=𝑼𝚺𝑽T,

with properties:

  1. 𝑼∈ℝᵐˣᵐ is an orthogonal matrix, 𝑼T𝑼=𝑰m;

  2. 𝑽∈ℝⁿˣⁿ is an orthogonal matrix, 𝑽T𝑽=𝑰n;

  3. 𝚺∈ℝᵐˣⁿ is diagonal, 𝚺=diag(σ1,…,σp), p=min(m,n), and σ1≥σ2≥⋯≥σp≥0.

Proof. The proof of the SVD makes use of properties of the norm, concepts from analysis, and complete induction. Adopting the 2-norm, set σ1=||𝑨||2,

σ1 = sup{ ||𝑨𝒙||2 : ||𝒙||2 = 1 }.

The domain ||𝒙||2=1 is compact (closed and bounded), and the extreme value theorem implies that ||𝑨𝒙||2 attains its maximum on it, hence there must exist vectors 𝒖1,𝒗1 of unit norm such that 𝑨𝒗1=σ1𝒖1, or equivalently σ1=𝒖1T𝑨𝒗1. Introduce orthogonal bases 𝑼1, 𝑽1 for ℝᵐ, ℝⁿ whose first column vectors are 𝒖1, 𝒗1, and compute

𝑼1T𝑨𝑽1 = [ 𝒖1T; ⋮; 𝒖mT ][ 𝑨𝒗1 𝑨𝒗2 ⋯ 𝑨𝒗n ] = [ σ1 𝒘T; 𝟎 𝑩 ] = 𝑪.

In the above, 𝒘T is a row vector with n−1 components 𝒖1T𝑨𝒗j, j=2,…,n, while the entries below σ1 vanish since 𝒖iT𝑨𝒗1=σ1𝒖iT𝒖1=0 for i=2,…,m. Introduce vectors

𝒚=[ σ1; 𝒘 ],  𝒛=𝑪𝒚=[ σ1²+𝒘T𝒘; 𝑩𝒘 ],

and note ||𝑪𝒚||2 = ||𝒛||2 = ( (σ1²+𝒘T𝒘)² + ||𝑩𝒘||2² )^{1/2} ≥ σ1²+𝒘T𝒘 = (σ1²+𝒘T𝒘)^{1/2} ||𝒚||2, since ||𝒚||2 = (σ1²+𝒘T𝒘)^{1/2}. From ||𝑼1T𝑨𝑽1|| = ||𝑨|| = σ1 = ||𝑪|| ≥ (σ1²+𝒘T𝒘)^{1/2} it results that 𝒘=𝟎. By induction, assume that 𝑩 has a singular value decomposition, 𝑩=𝑼2𝚺2𝑽2T, such that

𝑼1T𝑨𝑽1 = [ σ1 𝟎T; 𝟎 𝑼2𝚺2𝑽2T ] = [ 1 𝟎T; 𝟎 𝑼2 ][ σ1 𝟎T; 𝟎 𝚺2 ][ 1 𝟎T; 𝟎 𝑽2T ],

and the orthogonal matrices arising in the singular value decomposition of 𝑨 are

𝑼 = 𝑼1[ 1 𝟎T; 𝟎 𝑼2 ],  𝑽T = [ 1 𝟎T; 𝟎 𝑽2T ]𝑽1T.

The scaling coefficients σj are called the singular values of 𝑨. The columns of 𝑼 are called the left singular vectors, and those of 𝑽 are called the right singular vectors.
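The theorem can be verified numerically; the sketch below (hypothetical random 5×3 matrix) forms the full SVD, reconstructs 𝑨, and checks the ordering of the singular values and σ1=||𝑨||2:

using LinearAlgebra
A=rand(5,3); F=svd(A,full=true); U=F.U; V=F.V;
Σ=[Diagonal(F.S); zeros(2,3)];              # 5×3 diagonal matrix of singular values
opnorm(U*Σ*V'-A)                            # zero to machine precision
issorted(F.S,rev=true), opnorm(A) ≈ F.S[1]  # ordered singular values, σ1=||A||2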

The fact that the scaling coefficients are norms of 𝑨 and of submatrices of 𝑨, σ1=||𝑨||2, is of crucial importance in applications. Carrying out the computation of the matrix products

𝑨 = [ 𝒖1 𝒖2 ⋯ 𝒖r 𝒖r+1 ⋯ 𝒖m ] [ σ1  0   ⋯  0   0  ⋯  0
                                 0   σ2  ⋯  0   0  ⋯  0
                                 ⋮    ⋮    ⋱   ⋮    ⋮       ⋮
                                 0   0   ⋯  σr  0  ⋯  0
                                 0   0   ⋯  0   0  ⋯  0
                                 ⋮    ⋮        ⋮    ⋮       ⋮ ] [ 𝒗1T; 𝒗2T; ⋮; 𝒗rT; 𝒗r+1T; ⋮; 𝒗nT ]
  = [ 𝒖1 𝒖2 ⋯ 𝒖r 𝒖r+1 ⋯ 𝒖m ] [ σ1𝒗1T; σ2𝒗2T; ⋮; σr𝒗rT; 𝟎T; ⋮; 𝟎T ]

leads to a representation of 𝑨 as a sum

𝑨 = ∑_{i=1}^{r} σi𝒖i𝒗iT = σ1𝒖1𝒗1T + σ2𝒖2𝒗2T + ⋯ + σr𝒖r𝒗rT,  r ≤ min(m,n).

Each product 𝒖i𝒗iT is a matrix of rank one, and is called a rank-one update. Truncation of the above sum to p terms leads to an approximation of 𝑨

𝑨 ≅ 𝑨p = ∑_{i=1}^{p} σi𝒖i𝒗iT.

In many cases the singular values exhibit rapid, exponential decay, σ1 ≫ σ2 ≫ ⋯, such that the approximation above is an accurate representation of the matrix 𝑨 even for small p.
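A sketch of such a truncation (hypothetical 8×6 random matrix, p=3 retained terms); the 2-norm error of the rank-p approximation equals the first neglected singular value σ(p+1):

using LinearAlgebra
A=rand(8,6); F=svd(A); p=3;
Ap=F.U[:,1:p]*Diagonal(F.S[1:p])*F.Vt[1:p,:];   # sum of the first p rank-one updates
opnorm(A-Ap), F.S[p+1]                          # approximation error equals σ(p+1)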

Figure 2. Successive SVD approximations of Andy Warhol's painting, Marilyn Diptych (~1960), with k=10,20,40 rank-one updates.

3. SVD solution of linear algebra problems

The SVD can be used to solve common problems within linear algebra.

Change of coordinates.
To change from vector coordinates 𝒃 in the canonical basis 𝑰∈ℝᵐˣᵐ to coordinates 𝒙 in some other basis 𝑨∈ℝᵐˣᵐ, a solution to the equation 𝑰𝒃=𝑨𝒙 can be found by the following steps (sketched in code after the list).

  1. Compute the SVD, 𝑼𝚺𝑽T=𝑨;

  2. Find the coordinates of 𝒃 in the orthogonal basis 𝑼, 𝒄=𝑼T𝒃;

  3. Scale the coordinates of 𝒄 by the inverse of the singular values, yi=ci/σi, i=1,…,m, such that 𝚺𝒚=𝒄 is satisfied;

  4. Find 𝒙 from its coordinates 𝒚 in basis 𝑽, 𝒙=𝑽𝒚.
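A minimal sketch of the four steps above (hypothetical basis 𝑨 and coordinates 𝒃), checked against the direct solve:

using LinearAlgebra
A=[2 -1; 3 1]; b=[1.0; 2.0];
F=svd(A);                     # step 1: compute the SVD
c=F.U'*b;                     # step 2: coordinates of b in basis U
y=c./F.S;                     # step 3: scale by inverse singular values
x=F.V*y;                      # step 4: x = V y
x ≈ A\b                       # agrees with the direct solution of A x = b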

Best 2-norm approximation.
In the above 𝑨 was assumed to be a basis, hence r=rank(𝑨)=m. If the columns of 𝑨 do not form a basis, r<m, then 𝒃∈ℝᵐ might not be reachable by linear combinations within C(𝑨). The closest vector to 𝒃 in the 2-norm is however found by the same steps, with the simple modification that in Step 3 the scaling is carried out only for the non-zero singular values, yi=ci/σi, i=1,…,r.
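A sketch for the rank-deficient case (hypothetical 3×3 matrix of rank r=2); scaling only the r non-zero singular values yields the closest point in C(𝑨):

using LinearAlgebra
A=[1.0 2 3; 4 5 6; 7 8 9]; b=[1.0; 0; 0]; r=2;   # rank(A)=2, b outside C(A)
F=svd(A); Ur=F.U[:,1:r];
y=(Ur'*b)./F.S[1:r];                             # Step 3 restricted to non-zero σi
x=F.V[:,1:r]*y;
norm(A*x-b), norm(b-Ur*(Ur'*b))                  # residual equals distance from b to C(A)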

The pseudo-inverse.
From the above, finding either the solution of 𝑨𝒙=𝑰𝒃 or the best approximation possible if 𝑨 is not of full rank, can be written as a sequence of matrix multiplications using the SVD

(𝑼𝚺𝑽T)𝒙=𝒃 ⇒ 𝑼(𝚺𝑽T𝒙)=𝒃 ⇒ 𝚺𝑽T𝒙=𝑼T𝒃 ⇒ 𝑽T𝒙=𝚺+𝑼T𝒃 ⇒ 𝒙=𝑽𝚺+𝑼T𝒃,

where the matrix 𝚺+∈ℝⁿˣᵐ (notice the inversion of dimensions) is defined as a matrix with elements 1/σi on the diagonal for the non-zero singular values σi, and zero otherwise, and is called the pseudo-inverse of 𝚺. Similarly the matrix

𝑨+=𝑽𝚺+𝑼T

that allows stating the solution of 𝑨𝒙=𝒃 simply as 𝒙=𝑨+𝒃 is called the pseudo-inverse of 𝑨. Note that in practice 𝑨+ is not explicitly formed. Rather, the notation 𝑨+ is simply a concise reference to carrying out steps 1-4 above.
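For illustration only, a sketch of forming 𝚺+ and 𝑨+ explicitly (with the same hypothetical rank-2 matrix), compared against Julia's built-in pinv:

using LinearAlgebra
A=[1.0 2 3; 4 5 6; 7 8 9];
F=svd(A); tol=1e-10*F.S[1];
Σp=Diagonal([σ>tol ? 1/σ : 0.0 for σ in F.S]);   # invert only the non-zero singular values
Ap=F.V*Σp*F.U';                                  # pseudo-inverse A+ = V Σ+ UT
opnorm(Ap-pinv(A))                               # matches the library pseudo-inverse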