Eigenproblems

Synopsis. The simple question of what directions remain unchanged by a linear mapping leads to widely useful concepts. In particular, constructive algorithms emerge for the orthonormal bases of the fundamental matrix subspaces predicted by the singular value decomposition. Several additional matrix factorizations arise and offer insight into linear mappings.

1.Determinants

Linear mappings of a vector space onto itself, 𝒇:ℝ^m→ℝ^m, are characterized by a square matrix 𝑨∈ℝ^{m×m} whose column vectors can be considered as the edges of a geometric object in ℝ^m. When m=2 a parallelogram is obtained with area S=bh, with b the length of the base edge and h the height. The parallelogram edges are specified by the vectors 𝒂1,𝒂2 oriented at angles φ1,φ2 with respect to the x1-axis. With θ=φ2-φ1, the parallelogram height is

h=||𝒂2||sinθ=||𝒂2||sin(φ2-φ1)=||𝒂2||(sinφ2cosφ1-sinφ1cosφ2).

The trigonometric functions can be related to the edge vectors through

cosφ1=a11/||𝒂1||,sinφ1=a21/||𝒂1||,cosφ2=a12/||𝒂2||,sinφ2=a22/||𝒂2||.

The above allow statement of a formula for the parallelogram area strictly in terms of edge vector components

S=||𝒂1||||𝒂2||(sinφ2cosφ1-sinφ1cosφ2)=a11a22-a12a21.

By this formula S can be either positive or negative; the sign indicates the ordering of the edge vectors, since the parallelogram can be specified either by the edges [ 𝒂1 𝒂2 ] or [ 𝒂2 𝒂1 ].
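For example, a quick numerical check in Julia (the edge vectors below are arbitrary choices):

using LinearAlgebra

a1 = [3.0, 1.0]; a2 = [1.0, 2.0]   # arbitrary edge vectors in ℝ^2
S = a1[1]*a2[2] - a1[2]*a2[1]      # signed area from the edge components
S ≈ det([a1 a2])                   # true: S equals the determinant of [a1 a2]
det([a2 a1]) ≈ -S                  # true: swapping the edges flips the sign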

Figure 1. Area of a parallelogram (the hyperparallelepiped in two dimensions) with edge vectors 𝒂1,𝒂2∈ℝ^2.

Recall that the norm ||𝒗|| of a vector 𝒗∈ℝ^m is a functional giving the magnitude of a vector. The norm ||𝑨|| of a matrix 𝑨∈ℝ^{m×n} is also a functional, specifying the maximum amplification of vector norm through the linear mapping 𝒇(𝒙)=𝑨𝒙, as encountered in the SVD singular values. The above calculations suggest yet another functional, the signed area (volume) of the geometric object with edge vectors 𝒂1,…,𝒂m. This functional is a mapping from the vector space of m×m matrices (ℝ^{m×m},ℝ,+,⋅) to the reals and is known as the determinant of a matrix.

Definition. The determinant of a square matrix 𝑨=[ 𝒂1 … 𝒂m ]∈ℝ^{m×m}

|A|=det(A)=| a11 a12 … a1m ; a21 a22 … a2m ; ⋮ ; am1 am2 … amm |

is a real number giving the (oriented) volume of the parallelepiped spanned by the matrix column vectors.

The geometric interpretation of a determinant leads to algebraic computation rules. For example, consider the parallelogram with collinear edge vectors, 𝒂2=α𝒂1. In this case the parallelogram area is zero. In general, whenever one of the vectors within 𝒮={𝒂1,…,𝒂m} is linearly dependent on the others the dimension of span(𝒮) is less than m, and the volume of the associated parallelepiped is zero. Deduce that

𝑨∈ℝ^{m×m}, rank(𝑨)<m ⇒ det(𝑨)=0. (1)

Swapping a pair of edges changes the orientation of the edge vectors and leads to a sign change of the determinant. Another swap changes the sign again. A sign σ(P)=(-1)^s, the parity of a permutation, is associated with any permutation P, where s is the number of pair swaps needed to carry out the permutation P. It results that applying a permutation matrix onto 𝑨 changes the determinant sign

det(𝑷𝑨)=det(𝑨𝑷)=σ(P)det(𝑨). (2)

Recall that 𝑷𝑨 permutes rows and 𝑨𝑷 permutes columns. The edges of a geometric object could be specified in either column format or row format with no change in the geometric properties leading to

det(𝑨)=det(𝑨T). (3)

Scaling the length of an edge changes the volume in accordance with

det[ α𝒂1 𝒂2 … 𝒂m ]=αdet[ 𝒂1 𝒂2 … 𝒂m ]. (4)

With reference to Fig. (1), consider edge 𝒂1 decomposed into two parts along the same direction 𝒂1=α𝒂1+(1-α)𝒂1=𝒃1+𝒄1. The area of the parallelogram with edge 𝒂1 is the sum of those with edges 𝒃1,𝒄1. This generalizes to arbitrary decompositions of the 𝒂1 vector, leading to the rule

det[ 𝒂1 𝒂2 … 𝒂m ]=det[ 𝒃1+𝒄1 𝒂2 … 𝒂m ]=det[ 𝒃1 𝒂2 … 𝒂m ]+det[ 𝒄1 𝒂2 … 𝒂m ]. (5)

The above two rules (4,5) state that the determinant is a linear mapping in the first column. In conjunction with column permutation, deduce that the determinant is linear in any column or row of the matrix 𝑨. Consider now an important consequence: the effect upon the determinant of adding a multiple of one column to another column

det([ 𝒂1+α𝒂2 𝒂2 … 𝒂m ])=det([ 𝒂1 𝒂2 … 𝒂m ])+αdet([ 𝒂2 𝒂2 … 𝒂m ]).

Since the second term in the above sum has two identical columns its value is zero and

det([ 𝒂1+α𝒂2 𝒂2 … 𝒂m ])=det([ 𝒂1 𝒂2 … 𝒂m ]),

stating that adding a multiple of a column to another column does not change the value of a determinant. Similarly, adding a multiple of a row to another row does not change the value of a determinant. The practical importance of these rules is that the row combination operations from Gaussian elimination do not change the value of a determinant, while row swaps only flip its sign.
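This is how determinants are computed in practice: factor 𝑨 by Gaussian elimination (LU factorization), multiply the pivots, and account for the sign of the row permutation. A minimal Julia sketch (permparity is a small helper defined here, not a library function; the matrix is an arbitrary example):

using LinearAlgebra

# Parity (-1)^s of a permutation vector p, counting the pair swaps s needed to sort it.
function permparity(p)
    p = copy(p); s = 1
    for i in eachindex(p)
        while p[i] != i
            j = p[i]
            p[i], p[j] = p[j], p[i]   # one pair swap
            s = -s
        end
    end
    return s
end

A = [2.0 1.0 1.0; 4.0 -6.0 0.0; -2.0 7.0 2.0]   # arbitrary nonsingular example
F = lu(A)                                        # row-pivoted factorization: A[F.p, :] = F.L * F.U
permparity(F.p) * prod(diag(F.U)) ≈ det(A)       # true: pivots give the determinant up to the swap sign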

The determinant of the identity matrix equals one, det(𝑰)=1, and that of a matrix product is the product of determinants of the factors

det(𝑨𝑩)=det(𝑨)det(𝑩).

The determinant of a diagonal matrix 𝑫=diag(d1,,dm) is the product of its diagonal components

det(𝑫)=d1d2…dm.

The properties can be used in conjunction with known matrix factorizations.

  1. Determinant of an orthogonal 𝑸 matrix is ±1. Since 𝑸T𝑸=𝑰

    det(𝑸T𝑸)=det(𝑸T)det(𝑸)=[det(𝑸)]^2=1 ⇒ det(𝑸)=±1.
  2. The absolute value of a determinant equals the product of the singular values (a numerical check follows this list). Since det(𝑼)=±1 and det(𝑽T)=±1,

    det(𝑨)=det(𝑼𝚺𝑽T)=det(𝑼)det(𝚺)det(𝑽T)=±det(𝚺)=±σ1σ2…σm.

    In particular if rank(𝑨)<m, det(𝑨)=0.
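Both properties are easily verified numerically (a short Julia sketch; the matrix is an arbitrary example):

using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]          # arbitrary nonsingular example
F = svd(A)
abs(det(A)) ≈ prod(F.S)         # true: |det(A)| equals the product of the singular values
abs(det(F.U)) ≈ 1               # true: orthogonal factors have determinant ±1
abs(det(F.Vt)) ≈ 1              # true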

The determinant can be defined either in geometric terms as above or in algebraic terms. The general algebraic definition is specified in terms of the components of the matrix 𝑨=[aij] as

det(𝑨)=∑P σ(P) a1i1 a2i2 … amim,

with the sum carried out over all permutations of m numbers, one of which is specified as

( 1 2 … m ; i1 i2 … im ).

There are m! such permutations, a number that grows rapidly with m, hence the algebraic definition holds little interest for practical computation, though it was the object of considerable historical interest through Cramer's rule for solving linear systems.
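For small matrices the permutation-sum definition can nonetheless be implemented directly. A minimal Julia sketch (leibniz_det is a helper written here for illustration, not a library routine; it generates the m! permutations recursively, tracking the parity through the inversions created at each step):

# Leibniz (permutation-sum) definition of the determinant; impractical beyond small m.
function leibniz_det(A)
    m = size(A, 1)
    total = zero(eltype(A))
    function visit(perm, used, sgn)
        if length(perm) == m
            total += sgn * prod(A[i, perm[i]] for i in 1:m)
            return
        end
        for j in 1:m
            used[j] && continue
            used[j] = true
            # appending j creates one inversion per earlier entry larger than j
            visit([perm; j], used, sgn * (-1)^count(>(j), perm))
            used[j] = false
        end
    end
    visit(Int[], falses(m), 1)
    return total
end

using LinearAlgebra
A = rand(4, 4)
leibniz_det(A) ≈ det(A)          # true, but the sum above already has 4! = 24 terms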

2.Eigenvalues and the characteristic polynomial

The eigenproblem is to find the invariant directions of a linear mapping 𝒇:ℝ^m→ℝ^m, meaning those non-zero input vectors that lead to an output in the same direction, perhaps scaled by a factor λ

𝒇(𝒙)=λ𝒙, 𝑨𝒙=λ𝒙, 𝒙≠𝟎.

The above can be restated as

(λ𝑰-𝑨)𝒙=𝟎  or  (𝑨-λ𝑰)𝒙=𝟎.

The above is obviously satisfied for 𝒙=𝟎, but this case is excluded as a solution to the eigenproblem since the zero vector does not uniquely specify a direction in ℝ^m. For there to be a non-zero solution to the eigenproblem the null space of λ𝑰-𝑨 must be of dimension at least one. It results that the matrix λ𝑰-𝑨 is not of full rank, often stated as λ𝑰-𝑨 is singular, and at least one of the singular values of λ𝑰-𝑨 must be zero. Through the determinant properties it results that

det(λ𝑰-𝑨)=0.

The algebraic definition of a determinant leads to

det(λ𝑰-𝑨)=p(λ)=λ^m+c_{m-1}λ^{m-1}+…+c1λ+c0,

such that det(λ𝑰-𝑨) is a polynomial of degree m in λ. It is known through the fundamental theorem of algebra that a polynomial of degree m has m roots λ1,λ2,…,λm, with some perhaps repeated

p(λ)=(λ-λ1)(λ-λ2)…(λ-λm).

Note that in general the roots can be complex even for polynomials with real coefficients, as exemplified by p(λ)=λ^2+1. If λ1,λ2,…,λK are the distinct roots, each repeated m1,m2,…,mK times

p(λ)=(λ-λ1)^{m1}(λ-λ2)^{m2}…(λ-λK)^{mK}.

The number of times a root is repeated is called the algebraic multiplicity of an eigenvalue.

For each eigenvalue λk the matrix 𝑨-λk𝑰 has a non-trivial null space, called the eigenspace of λk

Ek=N(𝑨-λk𝑰).

The eigenvector associated with an eigenvalue is a member of this eigenspace

𝑨𝒙=λ𝒙, 𝒙∈N(𝑨-λ𝑰).

Note that eigenvectors can be scaled by any non-zero number since

𝑨𝒙=λ𝒙𝑨(α𝒙)=λ(α𝒙).

In practice, it is customary to enforce ||𝒙||=1. The dimension of the eigenspace is called the geometric multiplicity of the eigenvalue λk

nk=dim(N(𝑨-λk𝑰)).

The conclusion from the above is that m solutions of the eigenproblem exist: there are m eigenvalues, not necessarily distinct and counted with their algebraic multiplicities, with associated eigenvectors. As usual, instead of the individual statements 𝑨𝒙k=λk𝒙k it is more efficient to group eigenvalues and eigenvectors into matrices

𝑿=[ 𝒙1 𝒙2 … 𝒙m ], 𝚲=diag(λ1,λ2,…,λm)

and state the eigenproblem in matrix terms as

𝑨𝑿=𝑿𝚲.
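In Julia the eigen function returns both pieces, so the matrix statement can be checked directly (the matrix below is an arbitrary example):

using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]               # arbitrary example
F = eigen(A)                         # eigenvalues in F.values, eigenvectors in F.vectors
X, Λ = F.vectors, Diagonal(F.values)
A * X ≈ X * Λ                        # true: the matrix form of the eigenproblem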

3.Eigendecomposition

3.1.Simple cases

Insight into eigenproblem solution is gained by considering common geometric transformations.

Projection matrix. The matrix

𝑷=𝒒𝒒T∈ℝ^{m×m}, ||𝒒||=1,

projects a vector onto C(𝒒). Vectors within C(𝒒) are unaffected, 𝑷𝒒=(𝒒𝒒T)𝒒=𝒒(𝒒T𝒒)=𝒒⋅1=𝒒, hence 𝒒 is an eigenvector with associated eigenvalue λ=1. The 𝑷 matrix is of rank one and by the FTLA, dim N(𝑷T)=m-1. In the SVD of 𝑷=𝑼𝚺𝑽T,

𝑼=[ 𝒒 𝒖2 … 𝒖m ], 𝑷𝒖j=𝟎=0⋅𝒖j, j=2,3,…,m.

It results that any vector 𝒛 within the left null space, 𝒛∈N(𝑷T), is an eigenvector with associated eigenvalue λ=0. In matrix form the eigenproblem solution is

𝑷𝑼=𝑼diag(1,0,0,…,0)=𝑼𝚲

and since 𝑼 is orthogonal, 𝑼𝑼T=𝑰, multiplication on the right by 𝑼T is possible leading to

𝑷=𝑼𝚲𝑼T.
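A brief numerical illustration (the unit vector below is a random example):

using LinearAlgebra

q = normalize(rand(4))        # random unit vector, ||q|| = 1
P = q * q'                    # rank-one projector onto C(q)
P * q ≈ q                     # true: q is an eigenvector for λ = 1
eigvals(P)                    # ≈ [0, 0, 0, 1]: eigenvalue 0 with multiplicity m-1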

Reflection matrix. The matrix

𝑹=2𝒒𝒒T-𝑰∈ℝ^{m×m}, ||𝒒||=1,

is the reflector across 𝒒. Vectors collinear with 𝒒 do not change orientation, 𝑹𝒒=𝒒, and are therefore eigenvectors with associated eigenvalue λ=1. Vectors 𝒖 orthogonal to 𝒒 are reversed in direction and are hence eigenvectors with associated eigenvalue λ=-1

𝑹𝒖=(2𝒒𝒒T-𝑰)𝒖=2𝒒(𝒒T𝒖)-𝒖=-𝒖.

In ℝ^m there are m-1 mutually orthonormal vectors orthogonal to 𝒒. The eigenproblem solution can again be stated in matrix form as

𝑹𝑼=𝑼𝚲, 𝑼=[ 𝒒 𝒖2 … 𝒖m ], 𝚲=diag(1,-1,-1,…,-1).

Since 𝑼 is an orthogonal matrix, it results that

𝑹=𝑼𝚲𝑼T.
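Again a small check (the specific vectors are chosen only for illustration):

using LinearAlgebra

q = [1.0, 2.0, 2.0] / 3           # a unit vector
R = 2q*q' - I                     # reflector across the direction of q
R * q ≈ q                         # true: λ = 1
u = [2.0, -1.0, 0.0] / sqrt(5.0)  # a unit vector orthogonal to q
R * u ≈ -u                        # true: λ = -1
sort(eigvals(R))                  # ≈ [-1, -1, 1]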

Rotation matrix. The matrix

𝑹(θ)=[ cosθ -sinθ sinθ cosθ ],

represents the isometric rotation of two-dimensional vectors. If θ=0, 𝑹=𝑰 with eigenvalues λ1=λ2=1, and eigenvector matrix 𝑿=𝑰. For θ=π, the eigenvalues are λ1=λ2=-1, again with eigenvector matrix 𝑿=𝑰. If sinθ≠0, the orientation of any non-zero 𝒙∈ℝ^2 changes upon rotation by θ. The characteristic polynomial has complex roots

p(λ)=(λ-cosθ)^2+sin^2θ ⇒ λ1,2=cosθ±i sinθ=e^{±iθ}

and the directions of invariant orientation have complex components (are outside the real plane ℝ^2)

𝑿=[ 1 -1 i i ], 𝑹𝑿=[ e^{-iθ} -e^{iθ} ie^{-iθ} ie^{iθ} ]=[ 1 -1 i i ][ e^{-iθ} 0 0 e^{iθ} ].
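The complex eigenvalues are readily confirmed numerically (θ=π/3 is an arbitrary choice):

using LinearAlgebra

θ = π/3
R = [cos(θ) -sin(θ); sin(θ) cos(θ)]
λ = sort(eigvals(R), by = imag)      # complex conjugate pair
λ ≈ [exp(-im*θ), exp(im*θ)]          # true: the eigenvalues are e^{∓iθ}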

3.2.Vectors and matrices with complex components

The above rotation case is an example of complex values arising in the solution of an eigenproblem for a matrix with real coefficients. Fortunately, the framework for working with vectors and matrices that have complex components is almost identical to working in the reals. The only significant difference is the relationship between the two-norm and the inner product. When 𝒖,𝒗∈ℝ^m the inner product 𝒖T𝒗 corresponds to the dot product and the two-norm of a vector can be defined as

||𝒖||2=(𝒖T𝒖)^{1/2}=(u1^2+…+um^2)^{1/2}.

Consider a complex number z=x+iy, taken to represent a point at coordinates (x,y) in the ℝ^2 plane. The magnitude or absolute value of z is defined as the distance from the origin to the point (x,y) in ℝ^2

|z|=(x^2+y^2)^{1/2}.

Introducing the complex conjugate z̄=x-iy (the reflection of (x,y) across the x-axis), the absolute value can also be stated as

|z|=(z z̄)^{1/2}.

Extending this idea to vectors of complex numbers, transposition is combined with taking the conjugate into the operation of taking the adjoint, denoted by a ∗ superscript

𝒖∗=(𝒖̄)T: for 𝒖=[ u1 u2 … um ]T, 𝒖∗=[ ū1 ū2 … ūm ].

Everywhere a transposition appears when dealing with vectors in ℝ^m, it is replaced by taking the adjoint when working with vectors in ℂ^m. Most notably, a matrix 𝑼∈ℝ^{m×m} is said to be orthogonal if

𝑼𝑼T=𝑼T𝑼=𝑰.

For 𝑼∈ℂ^{m×m} the matrix is said to be unitary if

𝑼𝑼∗=𝑼∗𝑼=𝑰.

Orthogonality of 𝒖,𝒗∈ℂ^m is expressed as 𝒖∗𝒗=0. The FTLA is restated as

C(𝑨)⊕N(𝑨∗)=ℂ^m, C(𝑨∗)⊕N(𝑨)=ℂ^n, C(𝑨)⊥N(𝑨∗), C(𝑨∗)⊥N(𝑨),

and the SVD as

𝑨∈ℂ^{m×n}, 𝑼∈ℂ^{m×m}, 𝑽∈ℂ^{n×n}, 𝚺∈ℝ+^{m×n}: 𝑨=𝑼𝚺𝑽∗.
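In Julia the postfix ' operator is precisely this adjoint (conjugate transpose), so the definitions can be checked directly (the vector and matrix below are arbitrary examples):

using LinearAlgebra

u = [1.0 + 2.0im, 3.0 - 1.0im]
norm(u)^2 ≈ real(u' * u)          # true: ||u||^2 = u∗u, not uTu
U = [1 1; im -im] / sqrt(2)       # an example of a unitary matrix
U' * U ≈ Matrix(I, 2, 2)          # true: U∗U = I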

3.3.General solution of the eigenproblem

As exemplified above, solving an eigenproblem 𝑨𝒙=λ𝒙, 𝑨∈ℂ^{m×m}, requires:

  1. Finding the roots λ1,…,λm of the characteristic polynomial p(λ)=det(λ𝑰-𝑨);

  2. Finding bases for the eigenspaces Ek=N(𝑨-λk𝑰).

The matrix form of the solution is then stated as

𝑨𝑿=𝑿𝚲, 𝑿=[ 𝒙1 𝒙2 … 𝒙m ], 𝚲=diag(λ1,λ2,…,λm).

If the matrix 𝑿 has linearly independent columns (it is non-singular), it can then be inverted leading to the eigendecomposition

𝑨=𝑿𝚲𝑿^-1,

yet another factorization of the matrix 𝑨. Such eigendecompositions are very useful when repeatedly applying a linear mapping since 𝒇(𝒇(𝒙))=𝒇(𝑨𝒙)=𝑨^2𝒙. In general after k applications of 𝒇 the matrix 𝑨^k arises. If an eigendecomposition is available, then

𝑨^k=(𝑿𝚲𝑿^-1)(𝑿𝚲𝑿^-1)…(𝑿𝚲𝑿^-1)=𝑿𝚲^k𝑿^-1,

and 𝚲^k is easily computable

𝚲^k=diag(λ1^k,…,λm^k).
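For example (a small Julia check; matrix and power are arbitrary choices):

using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]
F = eigen(A)
X, Λ = F.vectors, Diagonal(F.values)
k = 5
X * Λ^k * inv(X) ≈ A^k            # true: repeated application via the eigendecomposition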

The simple eigenproblem for

𝑨=[ 1 1 0 1 ],

reveals issues that might arise in more general situations. The characteristic polynomial

det(λ𝑰-𝑨)=| λ-1 -1 0 λ-1 |=(λ-1)^2,

has a repeated root λ1=λ2=1. Note that

𝑨-λ1𝑰=[ 0 1 0 0 ]

is already in row echelon form indicating that rank(𝑨-λ1𝑰)=1. The FTLA then states that

dim(N(𝑨-λ1𝑰))=1,

and the matrix form of the eigenproblem solution is

𝑨𝑿=𝑿𝚲,𝑿=[ 𝒙1 𝒙1 ],𝚲=diag(1,1).

Note that the second column of 𝑿 is the same as the first, hence 𝑿 has linearly dependent columns and cannot be inverted. The matrix 𝑨 has an eigenproblem solution but does not have an eigendecomposition. An eigendecomposition is possible only when for each eigenvalue the algebraic multiplicity equals the geometric multiplicity.

An immediate question is to identify those matrices for which an eigendecomposition is possible and perhaps of a particularly simple form. A matrix 𝑨∈ℂ^{m×m} is said to be normal if

𝑨𝑨∗=𝑨∗𝑨.

A normal matrix has a unitary eigendecomposition: there exists 𝑸∈ℂ^{m×m} unitary that satisfies

𝑨𝑸=𝑸𝚲 ⇒ 𝑨=𝑸𝚲𝑸∗, 𝚲=diag(λ1,…,λm), 𝑸𝑸∗=𝑸∗𝑸=𝑰.

In the real case the appropriate condition is symmetry, 𝑨T=𝑨 (a real matrix satisfying 𝑨𝑨T=𝑨T𝑨 is normal but, like the rotation matrix above, may still have complex eigenvalues and eigenvectors); for symmetric 𝑨 there exists 𝑸∈ℝ^{m×m} orthogonal that satisfies

𝑨𝑸=𝑸𝚲 ⇒ 𝑨=𝑸𝚲𝑸T, 𝚲=diag(λ1,…,λm), 𝑸𝑸T=𝑸T𝑸=𝑰.
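For a real symmetric matrix this orthogonal eigendecomposition is easily checked numerically (the matrix below is an arbitrary symmetric example):

using LinearAlgebra

A = [2.0 1.0 0.0; 1.0 3.0 1.0; 0.0 1.0 4.0]   # real symmetric, hence normal
F = eigen(Symmetric(A))
Q, Λ = F.vectors, Diagonal(F.values)
Q' * Q ≈ Matrix(I, 3, 3)                      # true: the eigenvectors form an orthogonal matrix
Q * Λ * Q' ≈ A                                # true: A = QΛQT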

Computational procedures to solve an eigenproblem become quite complicated for the general case and have been the object of extensive research. Fortunately the problem is well understood and solution procedures have been implemented in all major computational systems. In Julia for example:

using LinearAlgebra; A=[1 0; 0 2]; eigvals(A)

[ 1.0 2.0 ] (6)

eigvecs(A)

[ 1.0 0.0 0.0 1.0 ] (7)

A=[1 1; 0 1]; eigvals(A)

[ 1.0 1.0 ] (8)

eigvecs(A)

[ 1.0 -1.0 0.0 2.220446049250313e-16 ] (9)

4.Finding the SVD

An important application of eigendecompositions is the actual computation of the SVD. The SVD theorem simply asserted the existence of intrinsic orthogonal bases for the domain and codomain of a linear mapping in which the mapping behaves as a simple scaling. It did not provide a procedure to find those bases in general.

From the above, though an eigendecomposition of a general 𝑨∈ℝ^{m×n} may not exist, an orthogonal eigendecomposition always exists for 𝑩=𝑨𝑨T since

(𝑨𝑨T)T=(𝑨T)T𝑨T=𝑨𝑨T,

verifies that 𝑩 is normal. Consider now the SVD 𝑨=𝑼𝚺𝑽T to find that

𝑩=𝑨𝑨T=𝑼𝚺𝑽T𝑽𝚺T𝑼T=𝑼𝚺𝚺T𝑼T ⇒ 𝑩𝑼=𝑼𝚲, 𝚲=𝚺𝚺T,

stating that the left singular vectors 𝑼 are the eigenvectors of 𝑨𝑨T. Likewise 𝑪=𝑨T𝑨 is a normal matrix and from

𝑪=𝑽𝚺T𝑼T𝑼𝚺𝑽T=𝑽𝚺T𝚺𝑽T ⇒ 𝑪𝑽=𝑽𝚪, 𝚪=𝚺T𝚺,

the right singular vectors are found as the eigenvectors of 𝑨T𝑨. The singular values of 𝑨 are the square roots of the eigenvalues of either 𝑩 or 𝑪.

The two eigenproblems above are solved whenever a computation of the SVD is invoked within Julia.
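The procedure can also be mimicked directly (a minimal Julia sketch; the matrix is an arbitrary example, and the eigenvector-based factors agree with svd only up to the ordering and signs of columns):

using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]                             # arbitrary 3×2 example
B, C = A * A', A' * A                                        # both symmetric, hence normal
sort(sqrt.(eigvals(Symmetric(C))), rev = true) ≈ svd(A).S   # true: singular values recovered
U = eigen(Symmetric(B)).vectors                              # left singular vectors, up to ordering and sign
V = eigen(Symmetric(C)).vectors                              # right singular vectors, up to ordering and sign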