Lecture 7: The Singular Value Decomposition

1. Mappings as data

1.1. Vector spaces of mappings and matrix representations

A vector space can be formed from all linear mappings from the vector space 𝒰=(U,S,+,⋅) to another vector space 𝒱=(V,S,+,⋅),

ℒ=(L,S,+,⋅),  L={𝒇 | 𝒇:U→V, 𝒇(a𝒖+b𝒗)=a𝒇(𝒖)+b𝒇(𝒗)},

with addition and scaling of linear mappings defined by (𝒇+𝒈)(𝒖)=𝒇(𝒖)+𝒈(𝒖) and (a𝒇)(𝒖)=a𝒇(𝒖). Let B={𝒖1,𝒖2,…} denote a basis for the domain U of the linear mappings within ℒ, such that the linear mapping 𝒇 is represented by the matrix

𝑨=[ 𝒇(𝒖1) 𝒇(𝒖2) ⋯ ].

When the domain and codomain are the real vector spaces U=ℝⁿ, V=ℝᵐ, the above is a standard matrix of real numbers, 𝑨∈ℝᵐˣⁿ. For linear mappings between infinite-dimensional vector spaces, the matrix is understood in a generalized sense to contain an infinite number of columns that are elements of the codomain V. For example, the indefinite integral is a linear mapping on the vector space 𝒞∞ of functions that can be differentiated to any order,

∫ : 𝒞∞ → 𝒞∞,  v(t) = ∫ u(t) dt,

and for the monomial basis B={1,t,t²,…}, the indefinite integral is represented by the generalized matrix

𝑨 = [ t  t²/2  t³/3  ⋯ ].

Truncation of the MacLaurin series u(t)=∑_{j≥0} uj tʲ, with uj=u(j)(0)/j!, to n terms, and sampling of u∈𝒞∞ at the points t1,…,tm, forms a standard matrix of real numbers

𝑨 = [ 𝒕  𝒕²/2  𝒕³/3  ⋯  𝒕ⁿ/n ] ∈ ℝᵐˣⁿ,  𝒕ʲ = [ t1ʲ ⋯ tmʲ ]T.

The values of the function u∈𝒞∞ at t1,…,tm are approximated by

𝒖 = 𝑩𝒙 ≅ [ u(t1) ⋯ u(tm) ]T,

with 𝒙 denoting the coordinates of 𝒖 in the basis of sampled monomials 𝑩 = [ 𝟏 𝒕 𝒕² ⋯ 𝒕ⁿ⁻¹ ]. The above argument states that the values 𝒚 of 𝒗, the primitive of 𝒖, at the same sample points are given by

𝒚=𝑨𝒙,

as can indeed be verified through term-by-term integration of the MacLaurin series.
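As a concrete check of this construction, the following sketch (with hypothetical choices u(t)=eᵗ, m=5 sample points in [0.1,0.5], and n=4 retained terms) builds the sampled monomial matrix 𝑩 and the sampled matrix 𝑨 of integrated monomials, and compares 𝑩𝒙 to u and 𝑨𝒙 to the primitive eᵗ−1 at the sample points:

using LinearAlgebra
m=5; n=4; t=collect(range(0.1,0.5,length=m));   # hypothetical sample points
B=hcat((t.^j for j in 0:n-1)...);               # sampled monomials 1,t,...,t^(n-1)
A=hcat((t.^j/j for j in 1:n)...);               # sampled images t, t^2/2, ..., t^n/n
x=[1/factorial(j) for j in 0:n-1];              # MacLaurin coordinates of u(t)=exp(t)
maximum(abs.(B*x-exp.(t))), maximum(abs.(A*x-(exp.(t) .- 1)))  # both small truncation errors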

As is to be expected, matrices can also be organized as a vector space ℳ, which is essentially the representation of the associated vector space of linear mappings,

ℳ=(M,S,+,⋅),  M={𝑨 | 𝑨=[ 𝒇(𝒖1) 𝒇(𝒖2) ⋯ ], 𝒇∈L}.

The addition 𝑪=𝑨+𝑩 and scaling 𝑺=a𝑹 of matrices are given in terms of the matrix components by

cij=aij+bij,sij=arij.

1.2. Measurement of mappings

From the above it is apparent that linear mappings and matrices can also be considered as data, and a first step in the analysis of such data is the definition of functionals that attach a single scalar label to each linear mapping or matrix. Of particular interest is the definition of a norm functional that characterizes, in an appropriate sense, the size of a linear mapping.

Consider first the case of finite matrices with real components, 𝑨∈ℝᵐˣⁿ, that represent linear mappings between real vector spaces, 𝒇:ℝⁿ→ℝᵐ. The columns 𝒂1,…,𝒂n of 𝑨∈ℝᵐˣⁿ could be placed into a single column vector 𝒄 with mn components,

𝒄 = [ 𝒂1; 𝒂2; ⋮; 𝒂n ].

Subsequently the norm of the matrix 𝑨 could be defined as the norm of the vector 𝒄. An example of this approach is the Frobenius norm

||𝑨||F = ||𝒄||2 = ( ∑_{i=1}^{m} ∑_{j=1}^{n} |aij|² )^{1/2}.
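In Julia (a minimal check, with a hypothetical 2×2 matrix), the Frobenius norm coincides with the 2-norm of the stacked-column vector:

using LinearAlgebra
A=[2 -1; 3 1]; c=vec(A);     # stack the columns of A into a single vector
norm(c,2), norm(A)           # identical values: the Frobenius norm of A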

A drawback of the above approach is that the structure of the matrix and its close relationship to a linear mapping are lost. A more useful characterization of the size of a mapping is to consider its amplification behavior. The motivation is readily understood starting from linear mappings between the reals, f:ℝ→ℝ, which are of the form f(x)=ax. When given an argument of unit magnitude, |x|=1, the mapping returns a real number with magnitude |a|. For mappings 𝒇:ℝ²→ℝ² within the plane, arguments that satisfy ||𝒙||2=1 lie on the unit circle, with components 𝒙=[ cosθ sinθ ]T, and have images through 𝒇 given analytically by

𝒇(𝒙) = 𝑨𝒙 = [ 𝒂1 𝒂2 ][ cosθ; sinθ ] = cosθ 𝒂1 + sinθ 𝒂2,

and these images trace an ellipse.

Figure 1. Mapping of the unit circle by 𝒇(𝒙)=𝑨𝒙, 𝑨=[ 2 -1; 3 1 ].

using LinearAlgebra, PyPlot                         # svd, Diagonal; PyPlot for plotting
n=250; h=2.0*π/n; θ=h*(1:n); c=cos.(θ); s=sin.(θ);  # points on the unit circle
a1=[2; 3]; a2=[-1; 1]; A=[a1 a2]

[ 2  -1
  3   1 ]

fx = c.*a1[1]+s.*a2[1]; fy = c.*a1[2]+s.*a2[2];   # image of the unit circle: cosθ a1 + sinθ a2
clf(); grid("on"); plot(c,s); axis("equal");       # unit circle
plot(fx,fy,"r");                                   # image ellipse
F=svd(A); U=F.U; Σ=Diagonal(F.S); Vt=F.Vt; V=Vt';
σ1=Σ[1,1]; σ2=Σ[2,2];
z=[0; 0]; u1=σ1*[z U[:,1]]; u2=σ2*[z U[:,2]];      # ellipse semi-axes σ1*u1, σ2*u2
v1=[z V[:,1]]; v2=[z V[:,2]];                      # right singular vectors (unit length)
cd(homedir()*"/courses/MATH661/images");
plot(u1[1,:],u1[2,:],"r");
plot(u2[1,:],u2[2,:],"r");
plot(v1[1,:],v1[2,:],"b");
plot(v2[1,:],v2[2,:],"b");
savefig("L08Fig01.eps")

From the above, the mapping associated with 𝑨 amplifies some directions more than others. This suggests defining the size of a matrix or of a mapping by the maximal amplification of unit-norm vectors within the domain.

Definition. For vector spaces U,V with norms ||·||U : U→ℝ₊, ||·||V : V→ℝ₊, the induced norm of 𝒇:U→V is

||𝒇|| = sup{ ||𝒇(𝒙)||V : ||𝒙||U = 1 }.

Definition. For vector spaces ℝⁿ, ℝᵐ with norms ||·||(n) : ℝⁿ→ℝ₊, ||·||(m) : ℝᵐ→ℝ₊, the induced norm of the matrix 𝑨∈ℝᵐˣⁿ is

||𝑨|| = sup{ ||𝑨𝒙||(m) : ||𝒙||(n) = 1 }.

In the above, any vector norm can be used within the domain and codomain.
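In Julia the induced norms for p=1,2,∞ are provided by LinearAlgebra's opnorm; a minimal sketch (hypothetical 2×2 matrix) compares the induced 2-norm against the supremum definition by sampling unit-norm arguments on the circle:

using LinearAlgebra
A=[2 -1; 3 1]; θ=2π*(0:999)/1000;
X=[cos.(θ)'; sin.(θ)'];                       # unit 2-norm arguments on the circle
maximum(sqrt.(sum((A*X).^2,dims=1))), opnorm(A,2)   # sampled supremum vs induced 2-norm
opnorm(A,1), opnorm(A,Inf)                    # induced 1-norm and infinity-norm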

2. The Singular Value Decomposition (SVD)

The fundamental theorem of linear algebra partitions the domain and codomain of a linear mapping 𝒇:U→V. For real vector spaces U=ℝⁿ, V=ℝᵐ the partition properties are stated in terms of the fundamental subspaces of the associated matrix 𝑨 as

C(𝑨) ⊕ N(𝑨T) = ℝᵐ,  C(𝑨) ⟂ N(𝑨T),
C(𝑨T) ⊕ N(𝑨) = ℝⁿ,  C(𝑨T) ⟂ N(𝑨).

The dimension of the column and row spaces r=dim C(𝑨)=dim C(𝑨T) is the rank of the matrix, n−r is the nullity of 𝑨, and m−r is the nullity of 𝑨T. An infinite number of bases could be defined for the domain and codomain. It is of great theoretical and practical interest to define bases with properties that facilitate insight or computation.

2.1. Orthogonal matrices

The above partitions of the domain and codomain are orthogonal, and suggest searching for orthogonal bases within these subspaces. Introduce a matrix representation for the bases

𝑼=[ 𝒖1 𝒖2 ⋯ 𝒖m ]∈ℝᵐˣᵐ,  𝑽=[ 𝒗1 𝒗2 ⋯ 𝒗n ]∈ℝⁿˣⁿ,

with C(𝑼)=ℝᵐ and C(𝑽)=ℝⁿ. Orthogonality between columns 𝒖i, 𝒖j for i≠j is expressed as 𝒖iT𝒖j=0. For i=j, the inner product is positive, 𝒖iT𝒖i>0, and since scaling of the columns of 𝑼 preserves the spanning property C(𝑼)=ℝᵐ, it is convenient to impose 𝒖iT𝒖i=1. Such behavior is concisely expressed as a matrix product

𝑼T𝑼=𝑰m,

with 𝑰m the identity matrix in ℝᵐˣᵐ. Expanded in terms of the column vectors of 𝑼, the first equality is

[ 𝒖1 𝒖2 ⋯ 𝒖m ]T [ 𝒖1 𝒖2 ⋯ 𝒖m ] = [ 𝒖1T; 𝒖2T; ⋮; 𝒖mT ][ 𝒖1 𝒖2 ⋯ 𝒖m ]
 = [ 𝒖1T𝒖1  𝒖1T𝒖2  ⋯  𝒖1T𝒖m
     𝒖2T𝒖1  𝒖2T𝒖2  ⋯  𝒖2T𝒖m
     ⋮         ⋮          ⋱   ⋮
     𝒖mT𝒖1  𝒖mT𝒖2  ⋯  𝒖mT𝒖m ] = 𝑰m.

It is useful to determine if a matrix 𝑿 exists such that 𝑼𝑿=𝑰m, or

𝑼𝑿=𝑼[ 𝒙1 𝒙2 ⋯ 𝒙m ]=[ 𝒆1 𝒆2 ⋯ 𝒆m ].

The columns of 𝑿 are the coordinates of the column vectors of 𝑰m in the basis 𝑼, and can readily be determined

𝑼𝒙j=𝒆j ⇒ 𝑼T𝑼𝒙j=𝑼T𝒆j ⇒ 𝑰m𝒙j=[ 𝒖1T; 𝒖2T; ⋮; 𝒖mT ]𝒆j ⇒ 𝒙j=(𝑼T)j,

where (𝑼T)j is the jth column of 𝑼T, hence 𝑿=𝑼T, leading to

𝑼T𝑼=𝑰=𝑼𝑼T.

Note that the second equality

[ 𝒖1 𝒖2 ⋯ 𝒖m ][ 𝒖1 𝒖2 ⋯ 𝒖m ]T = [ 𝒖1 𝒖2 ⋯ 𝒖m ][ 𝒖1T; 𝒖2T; ⋮; 𝒖mT ] = 𝒖1𝒖1T+𝒖2𝒖2T+⋯+𝒖m𝒖mT = 𝑰

acts as a normalization condition on the rank-one matrices 𝑼j=𝒖j𝒖jT.

Definition. A square matrix 𝑼 is said to be orthogonal if 𝑼T𝑼=𝑼𝑼T=𝑰.
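A quick numerical check of the definition, using the left singular vector matrix of a random matrix (a hypothetical example) as the orthogonal 𝑼:

using LinearAlgebra
U=svd(rand(4,4)).U;                 # orthogonal factor from an SVD
opnorm(U'*U-I), opnorm(U*U'-I)      # both zero to machine precision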

2.2. Intrinsic basis of a linear mapping

Given a linear mapping 𝒇:U→V, expressed as 𝒚=𝒇(𝒙)=𝑨𝒙, the simplest description of the action of 𝑨 would be a simple scaling, as exemplified by 𝒈(𝒙)=a𝒙, which has associated matrix a𝑰. Recall that specification of a vector is typically done in terms of the identity matrix, 𝒃=𝑰𝒃, but may be more insightfully given in some other basis, 𝑨𝒙=𝑰𝒃. This suggests that especially useful bases for the domain and codomain would reduce the action of a linear mapping to scaling along orthogonal directions: evaluate 𝒚=𝑨𝒙 by first re-expressing 𝒚 in another basis 𝑼, 𝑼𝒔=𝑰𝒚, and re-expressing 𝒙 in another basis 𝑽, 𝑽𝒓=𝑰𝒙. The condition that the linear operator reduces to simple scaling in these new bases is expressed as si=σiri for i=1,…,min(m,n), with σi the scaling coefficient along each direction. These conditions can be collected into the matrix-vector product 𝒔=𝚺𝒓, where 𝚺∈ℝᵐˣⁿ has the same dimensions as 𝑨 and is given by

𝚺 = [ σ1  0   ⋯  0   0  ⋯  0
      0   σ2  ⋯  0   0  ⋯  0
      ⋮    ⋮    ⋱   ⋮    ⋮       ⋮
      0   0   ⋯  σr  0  ⋯  0
      0   0   ⋯  0   0  ⋯  0
      ⋮    ⋮        ⋮    ⋮       ⋮
      0   0   ⋯  0   0  ⋯  0 ].

Imposing the condition that 𝑼,𝑽 are orthogonal leads to

𝑼𝒔=𝒚 ⇒ 𝒔=𝑼T𝒚,  𝑽𝒓=𝒙 ⇒ 𝒓=𝑽T𝒙,

which can be replaced into 𝒔=𝚺𝒓 to obtain

𝑼T𝒚=𝚺𝑽T𝒙 ⇒ 𝒚=𝑼𝚺𝑽T𝒙.

From the above, the orthogonal bases 𝑼,𝑽 and scaling coefficients 𝚺 that are sought must satisfy 𝑨=𝑼𝚺𝑽T.
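This reduction to scaling can be verified numerically; the sketch below (hypothetical 𝑨 and 𝒙) checks that the coordinates 𝒔=𝑼T𝒚 and 𝒓=𝑽T𝒙 satisfy si=σiri:

using LinearAlgebra
A=[2 -1; 3 1]; x=[1.0; 2.0]; y=A*x;
F=svd(A); s=F.U'*y; r=F.Vt*x;       # coordinates of y in basis U and of x in basis V
s ≈ F.S .* r                        # the mapping acts as scaling along these directions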

Theorem. Every matrix 𝑨∈ℝᵐˣⁿ has a singular value decomposition (SVD)

𝑨=𝑼𝚺𝑽T,

with properties:

  1. 𝑼∈ℝᵐˣᵐ is an orthogonal matrix, 𝑼T𝑼=𝑰m;

  2. 𝑽∈ℝⁿˣⁿ is an orthogonal matrix, 𝑽T𝑽=𝑰n;

  3. 𝚺∈ℝᵐˣⁿ is diagonal, 𝚺=diag(σ1,…,σp), p=min(m,n), and σ1≥σ2≥⋯≥σp≥0.

Proof. The proof of the SVD makes use of properties of the norm, concepts from analysis, and complete induction. Adopting the 2-norm, set σ1=||𝑨||2,

σ1 = sup{ ||𝑨𝒙||2 : ||𝒙||2 = 1 }.

The domain ||𝒙||2=1 is compact (closed and bounded), and the extreme value theorem implies that ||𝑨𝒙||2 attains its maximum on it, hence there must exist vectors 𝒖1,𝒗1 of unit norm such that 𝑨𝒗1=σ1𝒖1, or equivalently σ1=𝒖1T𝑨𝒗1. Introduce orthogonal bases 𝑼1, 𝑽1 for ℝᵐ, ℝⁿ whose first column vectors are 𝒖1, 𝒗1, and compute

𝑼1T𝑨𝑽1 = [ 𝒖1T; ⋮; 𝒖mT ][ 𝑨𝒗1 𝑨𝒗2 ⋯ 𝑨𝒗n ] = [ σ1 𝒘T; 𝟎 𝑩 ] = 𝑪.

In the above, 𝒘T is a row vector with n−1 components 𝒖1T𝑨𝒗j, j=2,…,n, while the entries below σ1 vanish since 𝒖iT𝑨𝒗1=σ1𝒖iT𝒖1=0 for i=2,…,m. Introduce vectors

𝒚=[ σ1; 𝒘 ],  𝒛=𝑪𝒚=[ σ1²+𝒘T𝒘; 𝑩𝒘 ],

and note ||𝑪𝒚||2 = ||𝒛||2 = ( (σ1²+𝒘T𝒘)² + ||𝑩𝒘||2² )^{1/2} ≥ σ1²+𝒘T𝒘 = (σ1²+𝒘T𝒘)^{1/2} ||𝒚||2, since ||𝒚||2 = (σ1²+𝒘T𝒘)^{1/2}. From ||𝑼1T𝑨𝑽1|| = ||𝑨|| = σ1 = ||𝑪|| ≥ (σ1²+𝒘T𝒘)^{1/2} it results that 𝒘=𝟎. By induction, assume that 𝑩 has a singular value decomposition, 𝑩=𝑼2𝚺2𝑽2T, such that

𝑼1T𝑨𝑽1 = [ σ1 𝟎T; 𝟎 𝑼2𝚺2𝑽2T ] = [ 1 𝟎T; 𝟎 𝑼2 ][ σ1 𝟎T; 𝟎 𝚺2 ][ 1 𝟎T; 𝟎 𝑽2T ],

and the orthogonal matrices arising in the singular value decomposition of 𝑨 are

𝑼 = 𝑼1[ 1 𝟎T; 𝟎 𝑼2 ],  𝑽T = [ 1 𝟎T; 𝟎 𝑽2T ]𝑽1T.

The scaling coefficients σj are called the singular values of 𝑨. The columns of 𝑼 are called the left singular vectors, and those of 𝑽 are called the right singular vectors.
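The theorem can be verified numerically; the sketch below (hypothetical random 5×3 matrix) forms the full SVD, reconstructs 𝑨, and checks the ordering of the singular values and σ1=||𝑨||2:

using LinearAlgebra
A=rand(5,3); F=svd(A,full=true); U=F.U; V=F.V;
Σ=[Diagonal(F.S); zeros(2,3)];              # 5×3 diagonal matrix of singular values
opnorm(U*Σ*V'-A)                            # zero to machine precision
issorted(F.S,rev=true), opnorm(A) ≈ F.S[1]  # ordered singular values, σ1=||A||2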

The fact that the scaling coefficients are norms of 𝑨 and of submatrices of 𝑨, σ1=||𝑨||2, is of crucial importance in applications. Carrying out the computation of the matrix products

𝑨 = [ 𝒖1 𝒖2 ⋯ 𝒖r 𝒖r+1 ⋯ 𝒖m ] [ σ1  0   ⋯  0   0  ⋯  0
                                 0   σ2  ⋯  0   0  ⋯  0
                                 ⋮    ⋮    ⋱   ⋮    ⋮       ⋮
                                 0   0   ⋯  σr  0  ⋯  0
                                 0   0   ⋯  0   0  ⋯  0
                                 ⋮    ⋮        ⋮    ⋮       ⋮ ] [ 𝒗1T; 𝒗2T; ⋮; 𝒗rT; 𝒗r+1T; ⋮; 𝒗nT ]
  = [ 𝒖1 𝒖2 ⋯ 𝒖r 𝒖r+1 ⋯ 𝒖m ] [ σ1𝒗1T; σ2𝒗2T; ⋮; σr𝒗rT; 𝟎T; ⋮; 𝟎T ]

leads to a representation of 𝑨 as a sum

𝑨 = ∑_{i=1}^{r} σi𝒖i𝒗iT = σ1𝒖1𝒗1T + σ2𝒖2𝒗2T + ⋯ + σr𝒖r𝒗rT,  r ≤ min(m,n).

Each product 𝒖i𝒗iT is a matrix of rank one, and is called a rank-one update. Truncation of the above sum to p terms leads to an approximation of 𝑨

𝑨 ≅ 𝑨p = ∑_{i=1}^{p} σi𝒖i𝒗iT.

In many cases the singular values exhibit rapid, exponential decay, σ1 ≫ σ2 ≫ ⋯, such that the approximation above is an accurate representation of the matrix 𝑨 even for small p.
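A sketch of such a truncation (hypothetical 8×6 random matrix, p=3 retained terms); the 2-norm error of the rank-p approximation equals the first neglected singular value σ(p+1):

using LinearAlgebra
A=rand(8,6); F=svd(A); p=3;
Ap=F.U[:,1:p]*Diagonal(F.S[1:p])*F.Vt[1:p,:];   # sum of the first p rank-one updates
opnorm(A-Ap), F.S[p+1]                          # approximation error equals σ(p+1)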

Figure 2. Successive SVD approximations of Andy Warhol's painting, Marilyn Diptych (~1960), with k=10,20,40 rank-one updates.

3. SVD solution of linear algebra problems

The SVD can be used to solve common problems within linear algebra.

Change of coordinates.
To change from vector coordinates 𝒃 in the canonical basis 𝑰∈ℝᵐˣᵐ to coordinates 𝒙 in some other basis 𝑨∈ℝᵐˣᵐ, a solution to the equation 𝑰𝒃=𝑨𝒙 can be found by the following steps (sketched in code after the list).

  1. Compute the SVD, 𝑼𝚺𝑽T=𝑨;

  2. Find the coordinates of 𝒃 in the orthogonal basis 𝑼, 𝒄=𝑼T𝒃;

  3. Scale the coordinates of 𝒄 by the inverse of the singular values, yi=ci/σi, i=1,…,m, such that 𝚺𝒚=𝒄 is satisfied;

  4. Find 𝒙 from its coordinates 𝒚 in basis 𝑽, 𝒙=𝑽𝒚.
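A minimal sketch of the four steps above (hypothetical basis 𝑨 and coordinates 𝒃), checked against the direct solve:

using LinearAlgebra
A=[2 -1; 3 1]; b=[1.0; 2.0];
F=svd(A);                     # step 1: compute the SVD
c=F.U'*b;                     # step 2: coordinates of b in basis U
y=c./F.S;                     # step 3: scale by inverse singular values
x=F.V*y;                      # step 4: x = V y
x ≈ A\b                       # agrees with the direct solution of A x = b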

Best 2-norm approximation.
In the above 𝑨 was assumed to be a basis, hence r=rank(𝑨)=m. If the columns of 𝑨 do not form a basis, r<m, then 𝒃∈ℝᵐ might not be reachable by linear combinations within C(𝑨). The closest vector to 𝒃 in the 2-norm is however found by the same steps, with the simple modification that in Step 3 the scaling is carried out only for the non-zero singular values, yi=ci/σi, i=1,…,r.
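A sketch for the rank-deficient case (hypothetical 3×3 matrix of rank r=2); scaling only the r non-zero singular values yields the closest point in C(𝑨):

using LinearAlgebra
A=[1.0 2 3; 4 5 6; 7 8 9]; b=[1.0; 0; 0]; r=2;   # rank(A)=2, b outside C(A)
F=svd(A); Ur=F.U[:,1:r];
y=(Ur'*b)./F.S[1:r];                             # Step 3 restricted to non-zero σi
x=F.V[:,1:r]*y;
norm(A*x-b), norm(b-Ur*(Ur'*b))                  # residual equals distance from b to C(A)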

The pseudo-inverse.
From the above, finding either the solution of 𝑨𝒙=𝑰𝒃 or the best approximation possible if 𝑨 is not of full rank, can be written as a sequence of matrix multiplications using the SVD

(𝑼𝚺𝑽T)𝒙=𝒃 ⇒ 𝑼(𝚺𝑽T𝒙)=𝒃 ⇒ 𝚺𝑽T𝒙=𝑼T𝒃 ⇒ 𝑽T𝒙=𝚺+𝑼T𝒃 ⇒ 𝒙=𝑽𝚺+𝑼T𝒃,

where the matrix 𝚺+∈ℝⁿˣᵐ (notice the inversion of dimensions) is defined as a matrix with elements 1/σi on the diagonal for the non-zero singular values σi, and zero otherwise, and is called the pseudo-inverse of 𝚺. Similarly the matrix

𝑨+=𝑽𝚺+𝑼T

that allows stating the solution of 𝑨𝒙=𝒃 simply as 𝒙=𝑨+𝒃 is called the pseudo-inverse of 𝑨. Note that in practice 𝑨+ is not explicitly formed. Rather, the notation 𝑨+ is simply a concise reference to carrying out steps 1-4 above.
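For illustration only, a sketch of forming 𝚺+ and 𝑨+ explicitly (with the same hypothetical rank-2 matrix), compared against Julia's built-in pinv:

using LinearAlgebra
A=[1.0 2 3; 4 5 6; 7 8 9];
F=svd(A); tol=1e-10*F.S[1];
Σp=Diagonal([σ>tol ? 1/σ : 0.0 for σ in F.S]);   # invert only the non-zero singular values
Ap=F.V*Σp*F.U';                                  # pseudo-inverse A+ = V Σ+ UT
opnorm(Ap-pinv(A))                               # matches the library pseudo-inverse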