Eigenproblems

Synopsis. The simple question of what directions remain unchanged by a linear mapping leads to widely useful concepts. In particular, constructive algorithms emerge for the orthonormal bases of the fundamental matrix subspaces predicted by the singular value decomposition. Several additional matrix factorizations arise and offer insight into linear mappings.

1.Determinants

Linear mappings of a vector space onto itself, 𝒇:ℝ^m→ℝ^m, are characterized by a square matrix 𝑨∈ℝ^{m×m} whose column vectors can be considered as the edges of a geometric object in ℝ^m. When m=2 a parallelogram is obtained with area S=bh, with b the length of the base edge and h the height. The parallelogram edges are specified by the vectors 𝒂1,𝒂2 oriented at angles φ1,φ2 with respect to the x1-axis. With θ=φ2-φ1, the parallelogram height is

h=||𝒂2||sinθ=||𝒂2||sin(φ2-φ1)=||𝒂2||(sinφ2cosφ1-sinφ1cosφ2).

The trigonometric functions can be related to the edge vectors through

cosφ1=a11/||𝒂1||,sinφ1=a21/||𝒂1||,cosφ2=a12/||𝒂2||,sinφ2=a22/||𝒂2||.

The above allow statement of a formula for the parallelogram area strictly in terms of edge vector components

S=||𝒂1||||𝒂2||(sinφ2cosφ1-sinφ1cosφ2)=a11a22-a12a21.

By this formula S can be either positive or negative; the sign indicates the ordering of the edge vectors, since the parallelogram can be specified either by the edges [ 𝒂1 𝒂2 ] or [ 𝒂2 𝒂1 ].
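For example, a quick numerical check in Julia (the edge vectors below are arbitrary choices):

using LinearAlgebra

a1 = [3.0, 1.0]; a2 = [1.0, 2.0]   # arbitrary edge vectors in ℝ^2
S = a1[1]*a2[2] - a1[2]*a2[1]      # signed area from the edge components
S ≈ det([a1 a2])                   # true: S equals the determinant of [a1 a2]
det([a2 a1]) ≈ -S                  # true: swapping the edges flips the sign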

Figure 1. Area of a parallelogram (the hyperparallelepiped in two dimensions) with edge vectors 𝒂1,𝒂2∈ℝ^2.

Recall that the norm ||𝒗|| of a vector 𝒗∈ℝ^m is a functional giving the magnitude of a vector. The norm ||𝑨|| of a matrix 𝑨∈ℝ^{m×n} is also a functional, specifying the maximum amplification of vector norm through the linear mapping 𝒇(𝒙)=𝑨𝒙, as encountered in the SVD singular values. The above calculations suggest yet another functional, the signed area (volume) of the geometric object with edge vectors 𝒂1,…,𝒂m. This functional is a mapping from the vector space of m×m matrices (ℝ^{m×m},ℝ,+,⋅) to the reals and is known as the determinant of a matrix.

Definition. The determinant of a square matrix 𝑨=[ 𝒂1 … 𝒂m ]∈ℝ^{m×m}

|A|=det(A)=| a11 a12 … a1m ; a21 a22 … a2m ; ⋮ ; am1 am2 … amm |

is a real number giving the (oriented) volume of the parallelepiped spanned by the matrix column vectors.

The geometric interpretation of a determinant leads to algebraic computation rules. For example, consider the parallelogram with collinear edge vectors, 𝒂2=α𝒂1. In this case the parallelogram area is zero. In general, whenever one of the vectors within 𝒮={𝒂1,…,𝒂m} is linearly dependent on the others the dimension of span(𝒮) is less than m, and the volume of the associated parallelepiped is zero. Deduce that

𝑨∈ℝ^{m×m}, rank(𝑨)<m ⇒ det(𝑨)=0. (1)

Swapping a pair of edges changes the orientation of the edge vectors and leads to a sign change of the determinant. Another swap changes the sign again. A sign σ(P)=(-1)^s, the parity of a permutation, is associated with any permutation P, where s is the number of pair swaps needed to carry out the permutation P. It results that applying a permutation matrix onto 𝑨 changes the determinant sign

det(𝑷𝑨)=det(𝑨𝑷)=σ(P)det(𝑨). (2)

Recall that 𝑷𝑨 permutes rows and 𝑨𝑷 permutes columns. The edges of a geometric object could be specified in either column format or row format with no change in the geometric properties leading to

det(𝑨)=det(𝑨T). (3)

Scaling the length of an edge changes the volume in accordance with

det[ α𝒂1 𝒂2 … 𝒂m ]=αdet[ 𝒂1 𝒂2 … 𝒂m ]. (4)

With reference to Fig. (1), consider edge 𝒂1 decomposed into two parts along the same direction 𝒂1=α𝒂1+(1-α)𝒂1=𝒃1+𝒄1. The area of the parallelogram with edge 𝒂1 is the sum of those with edges 𝒃1,𝒄1. This generalizes to arbitrary decompositions of the 𝒂1 vector, leading to the rule

det[ 𝒂1 𝒂2 … 𝒂m ]=det[ 𝒃1+𝒄1 𝒂2 … 𝒂m ]=det[ 𝒃1 𝒂2 … 𝒂m ]+det[ 𝒄1 𝒂2 … 𝒂m ]. (5)

The above two rules (4,5) state that the determinant is a linear mapping in the first column. In conjunction with column permutation, deduce that the determinant is linear in any column or row of the matrix 𝑨. Consider now an important consequence: the effect upon the determinant of adding a multiple of one column to another column

det([ 𝒂1+α𝒂2 𝒂2 … 𝒂m ])=det([ 𝒂1 𝒂2 … 𝒂m ])+αdet([ 𝒂2 𝒂2 … 𝒂m ]).

Since the second term in the above sum has two identical columns its value is zero and

det([ 𝒂1+α𝒂2 𝒂2 … 𝒂m ])=det([ 𝒂1 𝒂2 … 𝒂m ]),

stating that adding a multiple of a column to another column does not change the value of a determinant. Similarly, adding a multiple of a row to another row does not change the value of a determinant. The practical importance of these rules is that the row combination operations from Gaussian elimination do not change the value of a determinant, while row swaps only flip its sign.
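This is how determinants are computed in practice: factor 𝑨 by Gaussian elimination (LU factorization), multiply the pivots, and account for the sign of the row permutation. A minimal Julia sketch (permparity is a small helper defined here, not a library function; the matrix is an arbitrary example):

using LinearAlgebra

# Parity (-1)^s of a permutation vector p, counting the pair swaps s needed to sort it.
function permparity(p)
    p = copy(p); s = 1
    for i in eachindex(p)
        while p[i] != i
            j = p[i]
            p[i], p[j] = p[j], p[i]   # one pair swap
            s = -s
        end
    end
    return s
end

A = [2.0 1.0 1.0; 4.0 -6.0 0.0; -2.0 7.0 2.0]   # arbitrary nonsingular example
F = lu(A)                                        # row-pivoted factorization: A[F.p, :] = F.L * F.U
permparity(F.p) * prod(diag(F.U)) ≈ det(A)       # true: pivots give the determinant up to the swap sign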

The determinant of the identity matrix equals one, det(𝑰)=1, and that of a matrix product is the product of determinants of the factors

det(𝑨𝑩)=det(𝑨)det(𝑩).

The determinant of a diagonal matrix 𝑫=diag(d1,,dm) is the product of its diagonal components

det(𝑫)=d1d2…dm.

The properties can be used in conjunction with known matrix factorizations.

  1. Determinant of an orthogonal 𝑸 matrix is ±1. Since 𝑸T𝑸=𝑰

    det(𝑸T𝑸)=det(𝑸T)det(𝑸)=[det(𝑸)]^2=1 ⇒ det(𝑸)=±1.
  2. The absolute value of a determinant equals the product of the singular values (a numerical check follows this list). Since det(𝑼)=±1 and det(𝑽T)=±1,

    det(𝑨)=det(𝑼𝚺𝑽T)=det(𝑼)det(𝚺)det(𝑽T)=±det(𝚺)=±σ1σ2…σm.

    In particular if rank(𝑨)<m, det(𝑨)=0.
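Both properties are easily verified numerically (a short Julia sketch; the matrix is an arbitrary example):

using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]          # arbitrary nonsingular example
F = svd(A)
abs(det(A)) ≈ prod(F.S)         # true: |det(A)| equals the product of the singular values
abs(det(F.U)) ≈ 1               # true: orthogonal factors have determinant ±1
abs(det(F.Vt)) ≈ 1              # true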

The determinant can be defined either in geometric terms as above or in algebraic terms. The general algebraic definition is specified in terms of the components of the matrix 𝑨=[aij] as

det(𝑨)=∑P σ(P) a1i1 a2i2 … amim,

with the sum carried out over all permutations of m numbers, one of which is specified as

( 1 2 … m ; i1 i2 … im ).

There are m! such permutations, a number that grows rapidly with m, hence the algebraic definition holds little interest for practical computation, though it was the object of considerable historical interest through Cramer's rule for solving linear systems.
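For small matrices the permutation-sum definition can nonetheless be implemented directly. A minimal Julia sketch (leibniz_det is a helper written here for illustration, not a library routine; it generates the m! permutations recursively, tracking the parity through the inversions created at each step):

# Leibniz (permutation-sum) definition of the determinant; impractical beyond small m.
function leibniz_det(A)
    m = size(A, 1)
    total = zero(eltype(A))
    function visit(perm, used, sgn)
        if length(perm) == m
            total += sgn * prod(A[i, perm[i]] for i in 1:m)
            return
        end
        for j in 1:m
            used[j] && continue
            used[j] = true
            # appending j creates one inversion per earlier entry larger than j
            visit([perm; j], used, sgn * (-1)^count(>(j), perm))
            used[j] = false
        end
    end
    visit(Int[], falses(m), 1)
    return total
end

using LinearAlgebra
A = rand(4, 4)
leibniz_det(A) ≈ det(A)          # true, but the sum above already has 4! = 24 terms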

2.Eigenvalues and the characteristic polynomial

The eigenproblem is to find the invariant directions of a linear mapping 𝒇:ℝ^m→ℝ^m, meaning those non-zero input vectors that lead to an output in the same direction, perhaps scaled by a factor λ

𝒇(𝒙)=λ𝒙, 𝑨𝒙=λ𝒙, 𝒙≠𝟎.

The above can be restated as

(λ𝑰-𝑨)𝒙=𝟎  or  (𝑨-λ𝑰)𝒙=𝟎.

The above is obviously satisfied for 𝒙=𝟎, but this case is excluded as a solution to the eigenproblem since the zero vector does not uniquely specify a direction in ℝ^m. For there to be a non-zero solution to the eigenproblem the null space of λ𝑰-𝑨 must be of dimension at least one. It results that the matrix λ𝑰-𝑨 is not of full rank, often stated as λ𝑰-𝑨 is singular, and at least one of the singular values of λ𝑰-𝑨 must be zero. Through the determinant properties it results that

det(λ𝑰-𝑨)=0.

The algebraic definition of a determinant leads to

det(λ𝑰-𝑨)=p(λ)=λ^m+c_{m-1}λ^{m-1}+…+c1λ+c0,

such that det(λ𝑰-𝑨) is a polynomial of degree m in λ. It is known through the fundamental theorem of algebra that a polynomial of degree m has m roots λ1,λ2,…,λm, with some perhaps repeated

p(λ)=(λ-λ1)(λ-λ2)…(λ-λm).

Note that in general the roots can be complex even for polynomials with real coefficients, as exemplified by p(λ)=λ^2+1. If λ1,λ2,…,λK are the distinct roots, each repeated m1,m2,…,mK times

p(λ)=(λ-λ1)^{m1}(λ-λ2)^{m2}…(λ-λK)^{mK}.

The number of times a root is repeated is called the algebraic multiplicity of an eigenvalue.

For each eigenvalue λk the matrix 𝑨-λk𝑰 has a non-trivial null space, called the eigenspace of λk

Ek=N(𝑨-λk𝑰).

The eigenvector associated with an eigenvalue is a member of this eigenspace

𝑨𝒙=λ𝒙, 𝒙∈N(𝑨-λ𝑰).

Note that eigenvectors can be scaled by any non-zero number since

𝑨𝒙=λ𝒙𝑨(α𝒙)=λ(α𝒙).

In practice, it is customary to enforce ||𝒙||=1. The dimension of the eigenspace is called the geometric multiplicity of the eigenvalue λk

nk=dim(N(𝑨-λk𝑰)).

The conclusion from the above is that m solutions of the eigenproblem exist: there are m eigenvalues, not necessarily distinct and counted with their algebraic multiplicities, with associated eigenvectors. As usual, instead of the individual statements 𝑨𝒙k=λk𝒙k it is more efficient to group eigenvalues and eigenvectors into matrices

𝑿=[ 𝒙1 𝒙2 … 𝒙m ], 𝚲=diag(λ1,λ2,…,λm)

and state the eigenproblem in matrix terms as

𝑨𝑿=𝑿𝚲.
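In Julia the eigen function returns both pieces, so the matrix statement can be checked directly (the matrix below is an arbitrary example):

using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]               # arbitrary example
F = eigen(A)                         # eigenvalues in F.values, eigenvectors in F.vectors
X, Λ = F.vectors, Diagonal(F.values)
A * X ≈ X * Λ                        # true: the matrix form of the eigenproblem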

3.Eigendecomposition

3.1.Simple cases

Insight into eigenproblem solution is gained by considering common geometric transformations.

Projection matrix. The matrix

𝑷=𝒒𝒒T∈ℝ^{m×m}, ||𝒒||=1,

projects a vector onto C(𝒒). Vectors within C(𝒒) are unaffected, 𝑷𝒒=(𝒒𝒒T)𝒒=𝒒(𝒒T𝒒)=𝒒⋅1=𝒒, hence 𝒒 is an eigenvector with associated eigenvalue λ=1. The 𝑷 matrix is of rank one and by the FTLA, dim N(𝑷T)=m-1. In the SVD of 𝑷=𝑼𝚺𝑽T,

𝑼=[ 𝒒 𝒖2 … 𝒖m ], 𝑷𝒖j=𝟎=0⋅𝒖j, j=2,3,…,m.

It results that any vector 𝒛 within the left null space, 𝒛∈N(𝑷T), is an eigenvector with associated eigenvalue λ=0. In matrix form the eigenproblem solution is

𝑷𝑼=𝑼diag(1,0,0,…,0)=𝑼𝚲

and since 𝑼 is orthogonal, 𝑼𝑼T=𝑰, multiplication on the right by 𝑼T is possible leading to

𝑷=𝑼𝚲𝑼T.
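A brief numerical illustration (the unit vector below is a random example):

using LinearAlgebra

q = normalize(rand(4))        # random unit vector, ||q|| = 1
P = q * q'                    # rank-one projector onto C(q)
P * q ≈ q                     # true: q is an eigenvector for λ = 1
eigvals(P)                    # ≈ [0, 0, 0, 1]: eigenvalue 0 with multiplicity m-1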

Reflection matrix. The matrix

𝑹=2𝒒𝒒T-𝑰∈ℝ^{m×m}, ||𝒒||=1,

is the reflector across 𝒒. Vectors collinear with 𝒒 do not change orientation, 𝑹𝒒=𝒒, and are therefore eigenvectors with associated eigenvalue λ=1. Vectors 𝒖 orthogonal to 𝒒 are reversed in direction and are hence eigenvectors with associated eigenvalue λ=-1

𝑹𝒖=(2𝒒𝒒T-𝑰)𝒖=2𝒒(𝒒T𝒖)-𝒖=-𝒖.

In ℝ^m there are m-1 mutually orthonormal vectors orthogonal to 𝒒. The eigenproblem solution can again be stated in matrix form as

𝑹𝑼=𝑼𝚲, 𝑼=[ 𝒒 𝒖2 … 𝒖m ], 𝚲=diag(1,-1,-1,…,-1).

Since 𝑼 is an orthogonal matrix, it results that

𝑹=𝑼𝚲𝑼T.
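Again a small check (the specific vectors are chosen only for illustration):

using LinearAlgebra

q = [1.0, 2.0, 2.0] / 3           # a unit vector
R = 2q*q' - I                     # reflector across the direction of q
R * q ≈ q                         # true: λ = 1
u = [2.0, -1.0, 0.0] / sqrt(5.0)  # a unit vector orthogonal to q
R * u ≈ -u                        # true: λ = -1
sort(eigvals(R))                  # ≈ [-1, -1, 1]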

Rotation matrix. The matrix

𝑹(θ)=[ cosθ -sinθ sinθ cosθ ],

represents the isometric rotation of two-dimensional vectors. If θ=0, 𝑹=𝑰 with eigenvalues λ1=λ2=1, and eigenvector matrix 𝑿=𝑰. For θ=π, the eigenvalues are λ1=λ2=-1, again with eigenvector matrix 𝑿=𝑰. If sinθ≠0, the orientation of any non-zero 𝒙∈ℝ^2 changes upon rotation by θ. The characteristic polynomial has complex roots

p(λ)=(λ-cosθ)^2+sin^2θ ⇒ λ1,2=cosθ±i sinθ=e^{±iθ}

and the directions of invariant orientation have complex components (are outside the real plane ℝ^2)

𝑿=[ 1 -1 i i ], 𝑹𝑿=[ e^{-iθ} -e^{iθ} ie^{-iθ} ie^{iθ} ]=[ 1 -1 i i ][ e^{-iθ} 0 0 e^{iθ} ].
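The complex eigenvalues are readily confirmed numerically (θ=π/3 is an arbitrary choice):

using LinearAlgebra

θ = π/3
R = [cos(θ) -sin(θ); sin(θ) cos(θ)]
λ = sort(eigvals(R), by = imag)      # complex conjugate pair
λ ≈ [exp(-im*θ), exp(im*θ)]          # true: the eigenvalues are e^{∓iθ}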

3.2.Vectors and matrices with complex components

The above rotation case is an example of complex values arising in the solution of an eigenproblem for a matrix with real coefficients. Fortunately, the framework for working with vectors and matrices that have complex components is almost identical to working in the reals. The only significant difference is the relationship between the two-norm and the inner product. When 𝒖,𝒗∈ℝ^m the inner product 𝒖T𝒗 corresponds to the dot product and the two-norm of a vector can be defined as

||𝒖||2=(𝒖T𝒖)^{1/2}=(u1^2+…+um^2)^{1/2}.

Consider a complex number z=x+iy, taken to represent a point at coordinates (x,y) in the ℝ^2 plane. The magnitude or absolute value of z is defined as the distance from the origin to the point (x,y) in ℝ^2

|z|=(x^2+y^2)^{1/2}.

Introducing the complex conjugate z̄=x-iy (the reflection of (x,y) across the x-axis), the absolute value can also be stated as

|z|=(z z̄)^{1/2}.

Extending this idea to vectors of complex numbers, transposition is combined with taking the conjugate into the operation of taking the adjoint, denoted by a ∗ superscript

𝒖∗=(𝒖̄)T: for 𝒖=[ u1 u2 … um ]T, 𝒖∗=[ ū1 ū2 … ūm ].

Everywhere a transposition appears when dealing with vectors in ℝ^m, it is replaced by taking the adjoint when working with vectors in ℂ^m. Most notably, a matrix 𝑼∈ℝ^{m×m} is said to be orthogonal if

𝑼𝑼T=𝑼T𝑼=𝑰.

For 𝑼∈ℂ^{m×m} the matrix is said to be unitary if

𝑼𝑼∗=𝑼∗𝑼=𝑰.

Orthogonality of 𝒖,𝒗∈ℂ^m is expressed as 𝒖∗𝒗=0. The FTLA is restated as

C(𝑨)⊕N(𝑨∗)=ℂ^m, C(𝑨∗)⊕N(𝑨)=ℂ^n, C(𝑨)⊥N(𝑨∗), C(𝑨∗)⊥N(𝑨),

and the SVD as

𝑨∈ℂ^{m×n}, 𝑼∈ℂ^{m×m}, 𝑽∈ℂ^{n×n}, 𝚺∈ℝ+^{m×n}: 𝑨=𝑼𝚺𝑽∗.
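In Julia the postfix ' operator is precisely this adjoint (conjugate transpose), so the definitions can be checked directly (the vector and matrix below are arbitrary examples):

using LinearAlgebra

u = [1.0 + 2.0im, 3.0 - 1.0im]
norm(u)^2 ≈ real(u' * u)          # true: ||u||^2 = u∗u, not uTu
U = [1 1; im -im] / sqrt(2)       # an example of a unitary matrix
U' * U ≈ Matrix(I, 2, 2)          # true: U∗U = I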

3.3.General solution of the eigenproblem

As exemplified above, solving an eigenproblem 𝑨𝒙=λ𝒙, 𝑨∈ℂ^{m×m}, requires:

  1. Finding the roots λ1,…,λm of the characteristic polynomial p(λ)=det(λ𝑰-𝑨);

  2. Finding bases for the eigenspaces Ek=N(𝑨-λk𝑰).

The matrix form of the solution is then stated as

𝑨𝑿=𝑿𝚲, 𝑿=[ 𝒙1 𝒙2 … 𝒙m ], 𝚲=diag(λ1,λ2,…,λm).

If the matrix 𝑿 has linearly independent columns (it is non-singular), it can then be inverted leading to the eigendecomposition

𝑨=𝑿𝚲𝑿^-1,

yet another factorization of the matrix 𝑨. Such eigendecompositions are very useful when repeatedly applying a linear mapping since 𝒇(𝒇(𝒙))=𝒇(𝑨𝒙)=𝑨^2𝒙. In general after k applications of 𝒇 the matrix 𝑨^k arises. If an eigendecomposition is available, then

𝑨^k=(𝑿𝚲𝑿^-1)(𝑿𝚲𝑿^-1)…(𝑿𝚲𝑿^-1)=𝑿𝚲^k𝑿^-1,

and 𝚲^k is easily computable

𝚲^k=diag(λ1^k,…,λm^k).
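For example (a small Julia check; matrix and power are arbitrary choices):

using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]
F = eigen(A)
X, Λ = F.vectors, Diagonal(F.values)
k = 5
X * Λ^k * inv(X) ≈ A^k            # true: repeated application via the eigendecomposition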

The simple eigenproblem for

𝑨=[ 1 1 0 1 ],

reveals issues that might arise in more general situations. The characteristic polynomial

det(λ𝑰-𝑨)=| λ-1 -1 0 λ-1 |=(λ-1)^2,

has a repeated root λ1=λ2=1. Note that

𝑨-λ1𝑰=[ 0 1 0 0 ]

is already in row echelon form indicating that rank(𝑨-λ1𝑰)=1. The FTLA then states that

dim(N(𝑨-λ1𝑰))=1,

and the matrix form of the eigenproblem solution is

𝑨𝑿=𝑿𝚲,𝑿=[ 𝒙1 𝒙1 ],𝚲=diag(1,1).

Note that the second column of 𝑿 is the same as the first, hence 𝑿 has linearly dependent columns and cannot be inverted. The matrix 𝑨 has an eigenproblem solution but does not have an eigendecomposition. An eigendecomposition is possible only when for each eigenvalue the algebraic multiplicity equals the geometric multiplicity.

An immediate question is to identify those matrices for which an eigendecomposition is possible and perhaps of a particularly simple form. A matrix 𝑨∈ℂ^{m×m} is said to be normal if

𝑨𝑨∗=𝑨∗𝑨.

A normal matrix has a unitary eigendecomposition: there exists 𝑸∈ℂ^{m×m} unitary that satisfies

𝑨𝑸=𝑸𝚲 ⇒ 𝑨=𝑸𝚲𝑸∗, 𝚲=diag(λ1,…,λm), 𝑸𝑸∗=𝑸∗𝑸=𝑰.

In the real case the appropriate condition is symmetry, 𝑨T=𝑨 (a real matrix satisfying 𝑨𝑨T=𝑨T𝑨 is normal but, like the rotation matrix above, may still have complex eigenvalues and eigenvectors); for symmetric 𝑨 there exists 𝑸∈ℝ^{m×m} orthogonal that satisfies

𝑨𝑸=𝑸𝚲 ⇒ 𝑨=𝑸𝚲𝑸T, 𝚲=diag(λ1,…,λm), 𝑸𝑸T=𝑸T𝑸=𝑰.
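For a real symmetric matrix this orthogonal eigendecomposition is easily checked numerically (the matrix below is an arbitrary symmetric example):

using LinearAlgebra

A = [2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 4.0]   # real symmetric, hence normal
F = eigen(Symmetric(A))
Q, Λ = F.vectors, Diagonal(F.values)
Q' * Q ≈ Matrix(I, 3, 3)                      # true: the eigenvectors form an orthogonal matrix
Q * Λ * Q' ≈ A                                # true: A = QΛQT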

Computational procedures to solve an eigenproblem become quite complicated for the general case and have been the object of extensive research. Fortunately the problem is well understood and solution procedures have been implemented in all major computational systems. In Julia for example:

using LinearAlgebra; A=[1 0; 0 2]; eigvals(A)

[ 1.0 2.0 ] (6)

eigvecs(A)

[ 1.0 0.0 0.0 1.0 ] (7)

A=[1 1; 0 1]; eigvals(A)

[ 1.0 1.0 ] (8)

eigvecs(A)

[ 1.0 -1.0 0.0 2.220446049250313e-16 ] (9)

4.Finding the SVD

An important application of eigendecompositions is the actual computation of the SVD. The SVD theorem simply asserted the existence of intrinsic orthogonal bases for the domain and codomain of a linear mapping in which the mapping behaves as a simple scaling. It did not provide a procedure to find those bases in general.

From the above, though an eigendecomposition of a general 𝑨∈ℝ^{m×n} may not exist, an orthogonal eigendecomposition always exists for 𝑩=𝑨𝑨T since

(𝑨𝑨T)T=(𝑨T)T𝑨T=𝑨𝑨T,

verifies that 𝑩 is normal. Consider now the SVD 𝑨=𝑼𝚺𝑽T to find that

𝑩=𝑨𝑨T=𝑼𝚺𝑽T𝑽𝚺T𝑼T=𝑼𝚺𝚺T𝑼T ⇒ 𝑩𝑼=𝑼𝚲, 𝚲=𝚺𝚺T,

stating that the left singular vectors 𝑼 are the eigenvectors of 𝑨𝑨T. Likewise 𝑪=𝑨T𝑨 is a normal matrix and from

𝑪=𝑽𝚺T𝑼T𝑼𝚺𝑽T=𝑽𝚺T𝚺𝑽T ⇒ 𝑪𝑽=𝑽𝚪, 𝚪=𝚺T𝚺,

the right singular vectors are found as the eigenvectors of 𝑨T𝑨. The singular values of 𝑨 are the square roots of the eigenvalues of either 𝑩 or 𝑪.

The two eigenproblems above are solved whenever a computation of the SVD is invoked within Julia.
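The procedure can also be mimicked directly (a minimal Julia sketch; the matrix is an arbitrary example, and the eigenvector-based factors agree with svd only up to the ordering and signs of columns):

using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]                             # arbitrary 3×2 example
B, C = A * A', A' * A                                        # both symmetric, hence normal
sort(sqrt.(eigvals(Symmetric(C))), rev = true) ≈ svd(A).S   # true: singular values recovered
U = eigen(Symmetric(B)).vectors                              # left singular vectors, up to ordering and sign
V = eigen(Symmetric(C)).vectors                              # right singular vectors, up to ordering and sign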