The Singular Value Decomposition

Synopsis. The fundamental theorem of linear algebra (FTLA) establishes the completeness of the framework in which matrix-vector multiplication represents both a linear combination and the evaluation of a linear mapping. To solve the main problems of linear algebra effectively within this framework, bases must be constructed for the fundamental subspaces of a matrix. For the bases to be computationally efficient they should be orthonormal. The existence of such basis sets is guaranteed by the singular value decomposition theorem.

1. Orthogonal matrices

The fundamental theorem of linear algebra partitions the domain and codomain of a linear mapping 𝒇:U→V. For real vector spaces U=ℝn, V=ℝm the partition properties are stated in terms of the fundamental subspaces of the associated matrix 𝑨 as

C(𝑨) ⊕ N(𝑨T) = ℝm,  C(𝑨) ⊥ N(𝑨T),
C(𝑨T) ⊕ N(𝑨) = ℝn,  C(𝑨T) ⊥ N(𝑨).

The common dimension of the column and row spaces, r=dim C(𝑨)=dim C(𝑨T), is the rank of the matrix, n-r is the nullity of 𝑨, and m-r is the nullity of 𝑨T. An infinite number of bases could be defined for the domain and codomain. It is of great theoretical and practical interest to define bases with properties that facilitate insight or computation.

The above partitions of the domain and codomain are orthogonal, and suggest searching for orthogonal bases within these subspaces. Introduce a matrix representation for the bases

𝑼=[ 𝒖1 𝒖2 … 𝒖m ]∈ℝm×m,  𝑽=[ 𝒗1 𝒗2 … 𝒗n ]∈ℝn×n,

with C(𝑼)=ℝm and C(𝑽)=ℝn. Orthogonality between columns 𝒖i, 𝒖j for i≠j is expressed as 𝒖iT𝒖j=0. For i=j, the inner product is positive, 𝒖iT𝒖i>0, and since scaling of the columns of 𝑼 preserves the spanning property C(𝑼)=ℝm, it is convenient to impose 𝒖iT𝒖i=1. These conditions are concisely expressed as the matrix product

𝑼T𝑼=𝑰m,

with 𝑰m the identity matrix in ℝm×m. Expanded in terms of the column vectors of 𝑼, this equality is

\[
[\,\boldsymbol{u}_1\ \boldsymbol{u}_2\ \dots\ \boldsymbol{u}_m\,]^T[\,\boldsymbol{u}_1\ \boldsymbol{u}_2\ \dots\ \boldsymbol{u}_m\,]
= \begin{bmatrix} \boldsymbol{u}_1^T \\ \boldsymbol{u}_2^T \\ \vdots \\ \boldsymbol{u}_m^T \end{bmatrix}
[\,\boldsymbol{u}_1\ \boldsymbol{u}_2\ \dots\ \boldsymbol{u}_m\,]
= \begin{bmatrix}
\boldsymbol{u}_1^T\boldsymbol{u}_1 & \boldsymbol{u}_1^T\boldsymbol{u}_2 & \cdots & \boldsymbol{u}_1^T\boldsymbol{u}_m \\
\boldsymbol{u}_2^T\boldsymbol{u}_1 & \boldsymbol{u}_2^T\boldsymbol{u}_2 & \cdots & \boldsymbol{u}_2^T\boldsymbol{u}_m \\
\vdots & \vdots & \ddots & \vdots \\
\boldsymbol{u}_m^T\boldsymbol{u}_1 & \boldsymbol{u}_m^T\boldsymbol{u}_2 & \cdots & \boldsymbol{u}_m^T\boldsymbol{u}_m
\end{bmatrix}
= \boldsymbol{I}_m.
\]

It is useful to determine if a matrix 𝑿 exists such that 𝑼𝑿=𝑰m, or

𝑼𝑿=𝑼[ 𝒙1 𝒙2 … 𝒙m ]=[ 𝒆1 𝒆2 … 𝒆m ].

The columns of 𝑿 are the coordinates of the column vectors of 𝑰m in the basis 𝑼, and can readily be determined

𝑼𝒙j=𝒆j ⇒ 𝑼T𝑼𝒙j=𝑼T𝒆j ⇒ 𝑰m𝒙j=𝑼T𝒆j ⇒ 𝒙j=(𝑼T)j,

where (𝑼T)j is the jth column of 𝑼T, hence 𝑿=𝑼T, leading to

𝑼T𝑼=𝑰=𝑼𝑼T.

Note that the second equality

[ 𝒖1 𝒖2 … 𝒖m ][ 𝒖1 𝒖2 … 𝒖m ]T = 𝒖1𝒖1T + 𝒖2𝒖2T + ⋯ + 𝒖m𝒖mT = 𝑰

acts as a normalization condition on the matrices 𝑼j=𝒖j𝒖jT.

Definition. A square matrix 𝑼 is said to be orthogonal if 𝑼T𝑼=𝑼𝑼T=𝑰.
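As a numerical illustration of this definition, the following minimal sketch (assuming NumPy; the size m and the random test matrix are arbitrary) builds an orthogonal matrix from a QR factorization and checks both products, as well as the sum of the rank-one matrices 𝒖j𝒖jT discussed above.

```python
import numpy as np

m = 4
rng = np.random.default_rng(0)

# Take U as the orthogonal Q factor of the QR factorization of a random matrix.
U, _ = np.linalg.qr(rng.standard_normal((m, m)))

# Both products equal the identity up to roundoff.
print(np.allclose(U.T @ U, np.eye(m)))   # True
print(np.allclose(U @ U.T, np.eye(m)))   # True

# The rank-one matrices U_j = u_j u_j^T sum to the identity.
P = sum(np.outer(U[:, j], U[:, j]) for j in range(m))
print(np.allclose(P, np.eye(m)))         # True
```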

2. Intrinsic basis of a linear mapping

Given a linear mapping 𝒇:U→V, expressed as 𝒚=𝒇(𝒙)=𝑨𝒙, the simplest description of the action of 𝑨 would be a simple scaling, as exemplified by 𝒈(𝒙)=a𝒙, which has a𝑰 as its associated matrix. Recall that a vector is typically specified in terms of the identity matrix, 𝒃=𝑰𝒃, but may be more insightfully given in some other basis, 𝑨𝒙=𝑰𝒃. This suggests that especially useful bases for the domain and codomain would reduce the action of a linear mapping to scaling along orthogonal directions: evaluate 𝒚=𝑨𝒙 by first re-expressing 𝒚 in another basis 𝑼, 𝑼𝒔=𝑰𝒚, and re-expressing 𝒙 in another basis 𝑽, 𝑽𝒓=𝑰𝒙. The condition that the linear operator reduces to simple scaling in these new bases is expressed as si=σiri for i=1,…,min(m,n), with σi the scaling coefficient along each direction. These conditions can be collected into a matrix-vector product 𝒔=𝚺𝒓, where 𝚺∈ℝm×n has the same dimensions as 𝑨 and is given by

\[
\boldsymbol{\Sigma} = \begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \sigma_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \sigma_r & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix}.
\]

Imposing the condition that 𝑼,𝑽 are orthogonal leads to

𝑼𝒔=𝒚 ⇒ 𝒔=𝑼T𝒚,  𝑽𝒓=𝒙 ⇒ 𝒓=𝑽T𝒙,

which can be substituted into 𝒔=𝚺𝒓 to obtain

𝑼T𝒚=𝚺𝑽T𝒙 ⇒ 𝒚=𝑼𝚺𝑽T𝒙.

From the above, the orthogonal bases 𝑼, 𝑽 and scaling coefficients 𝚺 that are sought must satisfy 𝑨=𝑼𝚺𝑽T. The SVD theorem states that such factors 𝑼, 𝚺, 𝑽 do indeed exist.
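Before stating the theorem, the reduction of 𝒚=𝑨𝒙 to scaling of coordinates can be checked numerically. The sketch below is illustrative only: it assumes NumPy, uses a small random test matrix, obtains 𝑼, 𝚺, 𝑽 from the library routine numpy.linalg.svd, and evaluates the mapping through the coordinate changes 𝒓=𝑽T𝒙, 𝒔=𝚺𝒓, 𝒚=𝑼𝒔.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Full SVD: U is m-by-m, Vt holds V^T (n-by-n), s the min(m, n) singular values.
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)

r = Vt @ x            # coordinates of x in the basis V
s_coord = Sigma @ r   # scaling along each direction, s = Sigma r
y = U @ s_coord       # back to canonical coordinates, y = U s

print(np.allclose(y, A @ x))  # True: evaluating A x reduces to scaling
```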

Theorem. Every matrix 𝑨∈ℝm×n has a singular value decomposition (SVD)

𝑨=𝑼𝚺𝑽T,

with properties:

  1. 𝑼∈ℝm×m is an orthogonal matrix, 𝑼T𝑼=𝑰m;

  2. 𝑽∈ℝn×n is an orthogonal matrix, 𝑽T𝑽=𝑰n;

  3. 𝚺∈ℝm×n is diagonal, 𝚺=diag(σ1,…,σp), p=min(m,n), and σ1 ≥ σ2 ≥ … ≥ σp ≥ 0.
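These properties can be verified numerically for a small random matrix. The sketch below assumes NumPy; note that numpy.linalg.svd returns 𝑽T rather than 𝑽, and the singular values as a vector from which the rectangular 𝚺 must be assembled.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A)     # full SVD by default
Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, s)

print(np.allclose(U.T @ U, np.eye(m)))          # property 1: U orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(n)))        # property 2: V orthogonal
print(np.all(s[:-1] >= s[1:]), np.all(s >= 0))  # property 3: ordered, non-negative
print(np.allclose(U @ Sigma @ Vt, A))           # A = U Sigma V^T
```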

The scaling coefficients σj are called the singular values of 𝑨. The columns of 𝑼 are called the left singular vectors, and those of 𝑽 are called the right singular vectors. Carrying out the matrix products

\[
\boldsymbol{A} = \begin{bmatrix} \boldsymbol{u}_1 & \cdots & \boldsymbol{u}_r & \boldsymbol{u}_{r+1} & \cdots & \boldsymbol{u}_m \end{bmatrix}
\begin{bmatrix}
\sigma_1 & & & & \\
& \ddots & & & \\
& & \sigma_r & & \\
& & & 0 & \\
& & & & \ddots
\end{bmatrix}
\begin{bmatrix} \boldsymbol{v}_1^T \\ \vdots \\ \boldsymbol{v}_r^T \\ \boldsymbol{v}_{r+1}^T \\ \vdots \\ \boldsymbol{v}_n^T \end{bmatrix}
= \begin{bmatrix} \boldsymbol{u}_1 & \cdots & \boldsymbol{u}_r & \boldsymbol{u}_{r+1} & \cdots & \boldsymbol{u}_m \end{bmatrix}
\begin{bmatrix} \sigma_1 \boldsymbol{v}_1^T \\ \sigma_2 \boldsymbol{v}_2^T \\ \vdots \\ \sigma_r \boldsymbol{v}_r^T \\ \boldsymbol{0} \end{bmatrix}
\]

leads to a representation of 𝑨 as a sum

𝑨 = ∑_{i=1}^{r} σi𝒖i𝒗iT,

with r ≤ min(m,n). Written out in full, the above sum is

𝑨 = σ1𝒖1𝒗1T + σ2𝒖2𝒗2T + ⋯ + σr𝒖r𝒗rT.

Each product 𝒖i𝒗iT is a matrix of rank one, and is called a rank-one update. Truncation of the above sum to p terms leads to an approximation of 𝑨

𝑨 ≈ 𝑨p = ∑_{i=1}^{p} σi𝒖i𝒗iT.

In very many cases the singular values exhibit rapid decay, σ1 ≫ σ2 ≫ ⋯, such that the approximation above is an accurate representation of the matrix 𝑨 even for p ≪ r.
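A small numerical illustration of such truncation, assuming NumPy (the test matrix is built with artificially fast singular value decay, and the truncation level p=5 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 40

# Construct a test matrix with rapidly decaying singular values sigma_i = 2^(1-i).
Q1, _ = np.linalg.qr(rng.standard_normal((m, m)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 2.0 ** -np.arange(min(m, n))
A = Q1[:, :len(sigma)] * sigma @ Q2[:, :len(sigma)].T

# Rank-p truncation A_p = sum of the first p terms sigma_i u_i v_i^T.
U, s, Vt = np.linalg.svd(A)
p = 5
Ap = U[:, :p] * s[:p] @ Vt[:p, :]

# The 2-norm error of the truncation equals the first discarded singular value.
print(np.linalg.norm(A - Ap, 2), s[p])
```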

3. SVD solution of linear algebra problems

The SVD can be used to solve common problems within linear algebra.

Linear systems.
To change from vector coordinates 𝒃 in the canonical basis 𝑰∈ℝm×m to coordinates 𝒙 in some other basis 𝑨∈ℝm×m, a solution to the equation 𝑰𝒃=𝑨𝒙 can be found by the following steps (a numerical sketch follows the list).

  1. Compute the SVD, 𝑼𝚺𝑽T=𝑨;

  2. Find the coordinates of 𝒃 in the orthogonal basis 𝑼, 𝒄=𝑼T𝒃;

  3. Scale the coordinates 𝒄 by the inverses of the singular values, yi=ci/σi, i=1,…,m, such that 𝚺𝒚=𝒄 is satisfied;

  4. Recover 𝒙 from its coordinates 𝒚 in the basis 𝑽, 𝒙=𝑽𝒚.
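A minimal sketch of these four steps, assuming NumPy and a small random matrix 𝑨 that is taken to be of full rank:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 4
A = rng.standard_normal((m, m))   # assumed full rank, so its columns form a basis
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)   # step 1: A = U Sigma V^T
c = U.T @ b                   # step 2: coordinates of b in the basis U
y = c / s                     # step 3: undo the scaling, y_i = c_i / sigma_i
x = Vt.T @ y                  # step 4: x = V y

print(np.allclose(A @ x, b))  # True
```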

Least squares.
In the above, 𝑨 was assumed to be a basis, hence r=rank(𝑨)=m. If the columns of 𝑨 do not form a basis, r<m, then 𝒃∈ℝm might not be reachable by linear combinations within C(𝑨). The closest vector to 𝒃 in the 2-norm is, however, found by the same steps, with the simple modification that in Step 3 the scaling is carried out only for the non-zero singular values, yi=ci/σi, i=1,…,r.
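A sketch of this modification, assuming NumPy; the matrix below is deliberately rank deficient, the threshold used to detect zero singular values is an illustrative choice, and the result is compared against the library least-squares routine numpy.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 6, 4, 2

# A rank-r matrix and a right-hand side b that is generally not in C(A).
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)
tol = 1e-10 * s[0]                 # illustrative threshold for "zero" singular values
rank = int(np.sum(s > tol))

c = U.T @ b
y = np.zeros(n)
y[:rank] = c[:rank] / s[:rank]     # scale only the non-zero singular values
x = Vt.T @ y

# The result reaches the same closest point in C(A) as the library routine.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(A @ x, A @ x_ref))  # True
```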

The pseudo-inverse.
From the above, finding either the solution of 𝑨𝒙=𝑰𝒃 or the best possible approximation when 𝑨 is not of full rank can be written as a sequence of matrix multiplications using the SVD

(𝑼𝚺𝑽T)𝒙=𝒃 ⇒ 𝑼(𝚺𝑽T𝒙)=𝒃 ⇒ 𝚺𝑽T𝒙=𝑼T𝒃 ⇒ 𝑽T𝒙=𝚺+𝑼T𝒃 ⇒ 𝒙=𝑽𝚺+𝑼T𝒃,

where the matrix 𝚺+∈ℝn×m (notice the inversion of dimensions) is defined as the matrix with elements 1/σi on the diagonal for the non-zero singular values, and zeros elsewhere, and is called the pseudo-inverse of 𝚺. Similarly the matrix

𝑨+=𝑽𝚺+𝑼T

that allows stating the solution of 𝑨𝒙=𝒃 simply as 𝒙=𝑨+𝒃 is called the pseudo-inverse of 𝑨. Note that in practice 𝑨+ is not explicitly formed. Rather, the notation 𝑨+ is simply a concise reference to carrying out Steps 1-4 above.
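Purely as an illustration (again, 𝑨+ would not be formed in practice), the factored form 𝑽𝚺+𝑼T can be assembled and compared against the library pseudo-inverse numpy.linalg.pinv, assuming NumPy and a full-column-rank test matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

U, s, Vt = np.linalg.svd(A)

# Sigma^+ has the reciprocals of the non-zero singular values on its diagonal
# and reversed dimensions n-by-m; here A has full column rank, so all of s is used.
Sigma_plus = np.zeros((n, m))
np.fill_diagonal(Sigma_plus, 1.0 / s)

A_plus = Vt.T @ Sigma_plus @ U.T
print(np.allclose(A_plus, np.linalg.pinv(A)))                 # True
x = A_plus @ b                                                # least-squares solution
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```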