The Singular Value Decomposition

Synopsis. The fundamental theorem of linear algebra (FTLA) establishes the completeness of the framework in which matrix-vector multiplication represents both a linear combination and the evaluation of a linear mapping. To solve the main problems of linear algebra effectively within this framework, bases must be constructed for the fundamental subspaces of a matrix. For the bases to be computationally efficient they should be orthonormal. The existence of such basis sets is guaranteed by the singular value decomposition theorem.

1. Orthogonal matrices

The fundamental theorem of linear algebra partitions the domain and codomain of a linear mapping 𝒇:U→V. For real vector spaces U=ℝn, V=ℝm the partition properties are stated in terms of the fundamental subspaces of the associated matrix 𝑨 as

C(𝑨) ⊕ N(𝑨T) = ℝm,  C(𝑨) ⊥ N(𝑨T),
C(𝑨T) ⊕ N(𝑨) = ℝn,  C(𝑨T) ⊥ N(𝑨).

The common dimension of the column and row spaces, r=dim C(𝑨)=dim C(𝑨T), is the rank of the matrix, n-r is the nullity of 𝑨, and m-r is the nullity of 𝑨T. An infinite number of bases could be defined for the domain and codomain. It is of great theoretical and practical interest to define bases with properties that facilitate insight or computation.

The above partitions of the domain and codomain are orthogonal, and suggest searching for orthogonal bases within these subspaces. Introduce a matrix representation for the bases

𝑼=[ 𝒖1 𝒖2 … 𝒖m ]∈ℝm×m,  𝑽=[ 𝒗1 𝒗2 … 𝒗n ]∈ℝn×n,

with C(𝑼)=ℝm and C(𝑽)=ℝn. Orthogonality between columns 𝒖i, 𝒖j for i≠j is expressed as 𝒖iT𝒖j=0. For i=j, the inner product is positive, 𝒖iT𝒖i>0, and since scaling of the columns of 𝑼 preserves the spanning property C(𝑼)=ℝm, it is convenient to impose 𝒖iT𝒖i=1. These conditions are concisely expressed as the matrix product

𝑼T𝑼=𝑰m,

with 𝑰m the identity matrix in ℝm×m. Expanded in terms of the column vectors of 𝑼, this equality is

\[
[\,\boldsymbol{u}_1\ \boldsymbol{u}_2\ \dots\ \boldsymbol{u}_m\,]^T[\,\boldsymbol{u}_1\ \boldsymbol{u}_2\ \dots\ \boldsymbol{u}_m\,]
= \begin{bmatrix} \boldsymbol{u}_1^T \\ \boldsymbol{u}_2^T \\ \vdots \\ \boldsymbol{u}_m^T \end{bmatrix}
[\,\boldsymbol{u}_1\ \boldsymbol{u}_2\ \dots\ \boldsymbol{u}_m\,]
= \begin{bmatrix}
\boldsymbol{u}_1^T\boldsymbol{u}_1 & \boldsymbol{u}_1^T\boldsymbol{u}_2 & \cdots & \boldsymbol{u}_1^T\boldsymbol{u}_m \\
\boldsymbol{u}_2^T\boldsymbol{u}_1 & \boldsymbol{u}_2^T\boldsymbol{u}_2 & \cdots & \boldsymbol{u}_2^T\boldsymbol{u}_m \\
\vdots & \vdots & \ddots & \vdots \\
\boldsymbol{u}_m^T\boldsymbol{u}_1 & \boldsymbol{u}_m^T\boldsymbol{u}_2 & \cdots & \boldsymbol{u}_m^T\boldsymbol{u}_m
\end{bmatrix}
= \boldsymbol{I}_m.
\]

It is useful to determine if a matrix 𝑿 exists such that 𝑼𝑿=𝑰m, or

𝑼𝑿=𝑼[ 𝒙1 𝒙2 … 𝒙m ]=[ 𝒆1 𝒆2 … 𝒆m ].

The columns of 𝑿 are the coordinates of the column vectors of 𝑰m in the basis 𝑼, and can readily be determined

𝑼𝒙j=𝒆j ⇒ 𝑼T𝑼𝒙j=𝑼T𝒆j ⇒ 𝑰m𝒙j=𝑼T𝒆j ⇒ 𝒙j=(𝑼T)j,

where (𝑼T)j is the jth column of 𝑼T, hence 𝑿=𝑼T, leading to

𝑼T𝑼=𝑰=𝑼𝑼T.

Note that the second equality

[ 𝒖1 𝒖2 … 𝒖m ][ 𝒖1 𝒖2 … 𝒖m ]T = 𝒖1𝒖1T + 𝒖2𝒖2T + ⋯ + 𝒖m𝒖mT = 𝑰

acts as a normalization condition on the matrices 𝑼j=𝒖j𝒖jT.

Definition. A square matrix 𝑼 is said to be orthogonal if 𝑼T𝑼=𝑼𝑼T=𝑰.
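As a numerical illustration of this definition, the following minimal sketch (assuming NumPy; the size m and the random test matrix are arbitrary) builds an orthogonal matrix from a QR factorization and checks both products, as well as the sum of the rank-one matrices 𝒖j𝒖jT discussed above.

```python
import numpy as np

m = 4
rng = np.random.default_rng(0)

# Take U as the orthogonal Q factor of the QR factorization of a random matrix.
U, _ = np.linalg.qr(rng.standard_normal((m, m)))

# Both products equal the identity up to roundoff.
print(np.allclose(U.T @ U, np.eye(m)))   # True
print(np.allclose(U @ U.T, np.eye(m)))   # True

# The rank-one matrices U_j = u_j u_j^T sum to the identity.
P = sum(np.outer(U[:, j], U[:, j]) for j in range(m))
print(np.allclose(P, np.eye(m)))         # True
```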

2. Intrinsic basis of a linear mapping

Given a linear mapping 𝒇:U→V, expressed as 𝒚=𝒇(𝒙)=𝑨𝒙, the simplest description of the action of 𝑨 would be a simple scaling, as exemplified by 𝒈(𝒙)=a𝒙, which has a𝑰 as its associated matrix. Recall that a vector is typically specified in terms of the identity matrix, 𝒃=𝑰𝒃, but may be more insightfully given in some other basis, 𝑨𝒙=𝑰𝒃. This suggests that especially useful bases for the domain and codomain would reduce the action of a linear mapping to scaling along orthogonal directions: evaluate 𝒚=𝑨𝒙 by first re-expressing 𝒚 in another basis 𝑼, 𝑼𝒔=𝑰𝒚, and re-expressing 𝒙 in another basis 𝑽, 𝑽𝒓=𝑰𝒙. The condition that the linear operator reduces to simple scaling in these new bases is expressed as si=σiri for i=1,…,min(m,n), with σi the scaling coefficient along each direction. These conditions can be collected into a matrix-vector product 𝒔=𝚺𝒓, where 𝚺∈ℝm×n has the same dimensions as 𝑨 and is given by

\[
\boldsymbol{\Sigma} = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \sigma_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sigma_r & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix}.
\]

Imposing the condition that 𝑼,𝑽 are orthogonal leads to

𝑼𝒔=𝒚 ⇒ 𝒔=𝑼T𝒚,  𝑽𝒓=𝒙 ⇒ 𝒓=𝑽T𝒙,

which can be substituted into 𝒔=𝚺𝒓 to obtain

𝑼T𝒚=𝚺𝑽T𝒙 ⇒ 𝒚=𝑼𝚺𝑽T𝒙.

From the above, the orthogonal bases 𝑼, 𝑽 and scaling coefficients 𝚺 that are sought must satisfy 𝑨=𝑼𝚺𝑽T. The SVD theorem states that such factors 𝑼, 𝚺, 𝑽 do indeed exist.
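Before stating the theorem, the reduction of 𝒚=𝑨𝒙 to scaling of coordinates can be checked numerically. The sketch below is illustrative only: it assumes NumPy, uses a small random test matrix, obtains 𝑼, 𝚺, 𝑽 from the library routine numpy.linalg.svd, and evaluates the mapping through the coordinate changes 𝒓=𝑽T𝒙, 𝒔=𝚺𝒓, 𝒚=𝑼𝒔.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Full SVD: U is m-by-m, Vt holds V^T (n-by-n), s the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

r = Vt @ x            # coordinates of x in the basis V
s_coord = Sigma @ r   # scaling along each direction, s = Sigma r
y = U @ s_coord       # back to canonical coordinates, y = U s

print(np.allclose(y, A @ x))  # True: evaluating A x reduces to scaling
```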

Theorem. Every matrix 𝑨∈ℝm×n has a singular value decomposition (SVD)

𝑨=𝑼𝚺𝑽T,

with properties:

  1. 𝑼∈ℝm×m is an orthogonal matrix, 𝑼T𝑼=𝑰m;

  2. 𝑽∈ℝn×n is an orthogonal matrix, 𝑽T𝑽=𝑰n;

  3. 𝚺∈ℝm×n is diagonal, 𝚺=diag(σ1,…,σp), p=min(m,n), and σ1 ≥ σ2 ≥ … ≥ σp ≥ 0.
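These properties can be verified numerically for a small random matrix. The sketch below assumes NumPy; note that numpy.linalg.svd returns 𝑽T rather than 𝑽, and the singular values as a vector from which the rectangular 𝚺 must be assembled.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A)     # full SVD by default
Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, s)

print(np.allclose(U.T @ U, np.eye(m)))          # property 1: U orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(n)))        # property 2: V orthogonal
print(np.all(s[:-1] >= s[1:]), np.all(s >= 0))  # property 3: ordered, non-negative
print(np.allclose(U @ Sigma @ Vt, A))           # A = U Sigma V^T
```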

The scaling coefficients σj are called the singular values of 𝑨. The columns of 𝑼 are called the left singular vectors, and those of 𝑽 are called the right singular vectors. Carrying out the matrix products

\[
\boldsymbol{A} = \begin{bmatrix} \boldsymbol{u}_1 & \cdots & \boldsymbol{u}_r & \boldsymbol{u}_{r+1} & \cdots & \boldsymbol{u}_m \end{bmatrix}
\begin{bmatrix}
\sigma_1 & & & & \\
& \ddots & & & \\
& & \sigma_r & & \\
& & & 0 & \\
& & & & \ddots
\end{bmatrix}
\begin{bmatrix} \boldsymbol{v}_1^T \\ \vdots \\ \boldsymbol{v}_r^T \\ \boldsymbol{v}_{r+1}^T \\ \vdots \\ \boldsymbol{v}_n^T \end{bmatrix}
= \begin{bmatrix} \boldsymbol{u}_1 & \cdots & \boldsymbol{u}_r & \boldsymbol{u}_{r+1} & \cdots & \boldsymbol{u}_m \end{bmatrix}
\begin{bmatrix} \sigma_1 \boldsymbol{v}_1^T \\ \sigma_2 \boldsymbol{v}_2^T \\ \vdots \\ \sigma_r \boldsymbol{v}_r^T \\ \boldsymbol{0} \end{bmatrix}
\]

leads to a representation of 𝑨 as a sum

𝑨 = ∑_{i=1}^{r} σi𝒖i𝒗iT,

with r ≤ min(m,n). Written out in full, the above sum is

𝑨 = σ1𝒖1𝒗1T + σ2𝒖2𝒗2T + ⋯ + σr𝒖r𝒗rT.

Each product 𝒖i𝒗iT is a matrix of rank one, and is called a rank-one update. Truncation of the above sum to p terms leads to an approximation of 𝑨

𝑨 ≈ 𝑨p = ∑_{i=1}^{p} σi𝒖i𝒗iT.

In very many cases the singular values exhibit rapid decay, σ1 ≫ σ2 ≫ ⋯, such that the approximation above is an accurate representation of the matrix 𝑨 even for p ≪ r.
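A small numerical illustration of such truncation, assuming NumPy (the test matrix is built with artificially fast singular value decay, and the truncation level p=5 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 40

# Construct a test matrix with rapidly decaying singular values sigma_i = 2^(1-i).
Q1, _ = np.linalg.qr(rng.standard_normal((m, m)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 2.0 ** -np.arange(min(m, n))
A = Q1[:, :len(sigma)] * sigma @ Q2[:, :len(sigma)].T

# Rank-p truncation A_p = sum of the first p terms sigma_i u_i v_i^T.
U, s, Vt = np.linalg.svd(A)
p = 5
Ap = U[:, :p] * s[:p] @ Vt[:p, :]

# The 2-norm error of the truncation equals the first discarded singular value.
print(np.linalg.norm(A - Ap, 2), s[p])
```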

3. SVD solution of linear algebra problems

The SVD can be used to solve common problems within linear algebra.

Linear systems.
To change from vector coordinates 𝒃 in the canonical basis 𝑰∈ℝm×m to coordinates 𝒙 in some other basis 𝑨∈ℝm×m, a solution to the equation 𝑰𝒃=𝑨𝒙 can be found by the following steps (a numerical sketch follows the list).

  1. Compute the SVD, 𝑼𝚺𝑽T=𝑨;

  2. Find the coordinates of 𝒃 in the orthogonal basis 𝑼, 𝒄=𝑼T𝒃;

  3. Scale the coordinates 𝒄 by the inverses of the singular values, yi=ci/σi, i=1,…,m, such that 𝚺𝒚=𝒄 is satisfied;

  4. Recover 𝒙 from its coordinates 𝒚 in the basis 𝑽, 𝒙=𝑽𝒚.
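A minimal sketch of these four steps, assuming NumPy and a small random matrix 𝑨 that is taken to be of full rank:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 4
A = rng.standard_normal((m, m))   # assumed full rank, so its columns form a basis
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)   # step 1: A = U Sigma V^T
c = U.T @ b                   # step 2: coordinates of b in the basis U
y = c / s                     # step 3: undo the scaling, y_i = c_i / sigma_i
x = Vt.T @ y                  # step 4: x = V y

print(np.allclose(A @ x, b))  # True
```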

Least squares.
In the above, 𝑨 was assumed to be a basis, hence r=rank(𝑨)=m. If the columns of 𝑨 do not form a basis, r<m, then 𝒃∈ℝm might not be reachable by linear combinations within C(𝑨). The closest vector to 𝒃 in the 2-norm is, however, found by the same steps, with the simple modification that in Step 3 the scaling is carried out only for the non-zero singular values, yi=ci/σi, i=1,…,r.
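A sketch of this modification, assuming NumPy; the matrix below is deliberately rank deficient, the threshold used to detect zero singular values is an illustrative choice, and the result is compared against the library least-squares routine numpy.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 6, 4, 2

# A rank-r matrix and a right-hand side b that is generally not in C(A).
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)
tol = 1e-10 * s[0]                 # illustrative threshold for "zero" singular values
rank = int(np.sum(s > tol))

c = U.T @ b
y = np.zeros(n)
y[:rank] = c[:rank] / s[:rank]     # scale only the non-zero singular values
x = Vt.T @ y

# The result reaches the same closest point in C(A) as the library routine.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(A @ x, A @ x_ref))  # True
```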

The pseudo-inverse.
From the above, finding either the solution of 𝑨𝒙=𝑰𝒃 or the best possible approximation when 𝑨 is not of full rank can be written as a sequence of matrix multiplications using the SVD

(𝑼𝚺𝑽T)𝒙=𝒃 ⇒ 𝑼(𝚺𝑽T𝒙)=𝒃 ⇒ 𝚺𝑽T𝒙=𝑼T𝒃 ⇒ 𝑽T𝒙=𝚺+𝑼T𝒃 ⇒ 𝒙=𝑽𝚺+𝑼T𝒃,

where the matrix 𝚺+∈ℝn×m (notice the inversion of dimensions) is defined as the matrix with elements 1/σi on the diagonal for the non-zero singular values, and zeros elsewhere, and is called the pseudo-inverse of 𝚺. Similarly the matrix

𝑨+=𝑽𝚺+𝑼T

that allows stating the solution of 𝑨𝒙=𝒃 simply as 𝒙=𝑨+𝒃 is called the pseudo-inverse of 𝑨. Note that in practice 𝑨+ is not explicitly formed. Rather, the notation 𝑨+ is simply a concise reference to carrying out Steps 1-4 above.
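Purely as an illustration (again, 𝑨+ would not be formed in practice), the factored form 𝑽𝚺+𝑼T can be assembled and compared against the library pseudo-inverse numpy.linalg.pinv, assuming NumPy and a full-column-rank test matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)

# Sigma^+ has the reciprocals of the non-zero singular values on its diagonal
# and reversed dimensions n-by-m; here A has full column rank, so all of s is used.
Sigma_plus = np.zeros((n, m))
np.fill_diagonal(Sigma_plus, 1.0 / s)

A_plus = Vt.T @ Sigma_plus @ U.T
print(np.allclose(A_plus, np.linalg.pinv(A)))                 # True
x = A_plus @ b                                                # least-squares solution
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```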