MATH661

Lecture 11: $L U$ Algorithm Variants

The basic procedures within $L U$ factorization can be adapted to account for special structure of the system matrix $𝑨$ or to obtain properties associated with $𝑨$ .

1.Determinants

For a square matrix $𝑨 \in ℝ^{m \times m}$, $\det (𝑨) \in ℝ$ is the oriented volume enclosed by the column vectors of $𝑨$ (a parallelepiped)
Geometric interpretation of determinants
Determinant calculation rules
Algebraic definition of a determinant

Definition. The determinant of a square matrix $𝑨 = (\begin{array}{lll} 𝒂_{1} & \dots & 𝒂_{m} \end{array}) \in ℝ^{m \times m}$ is a real number

$\det (A) = | \begin{array}{cccc} a_{11} & a_{12} & \dots & a_{1 m} \\ a_{21} & a_{22} & \dots & a_{2 m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ a_{m 1} & a_{m 2} & \dots & a_{m m} \end{array} | \in ℝ$

giving the (oriented) volume of the parallelepiped spanned by matrix column vectors.

$m = 2$
$𝑨 = (\begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}), \quad \det (𝑨) = | \begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} |$
$m = 3$
$𝑨 = (\begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}), \quad \det (𝑨) = | \begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array} |$

Computation of a determinant with $m = 2$
$| \begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} | = a_{11} a_{22} - a_{12} a_{21}$
Computation of a determinant with $m = 3$
$\begin{array}{rcl} | \begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array} | & = & a_{11} a_{22} a_{33} + a_{21} a_{32} a_{13} + a_{31} a_{12} a_{23} \\ & & - a_{13} a_{22} a_{31} - a_{23} a_{32} a_{11} - a_{33} a_{12} a_{21} \end{array}$
Where do these determinant computation rules come from? Two viewpoints
- Geometric viewpoint: determinants express parallelepiped volumes
- Algebraic viewpoint: determinants are computed from all possible products that can be formed from choosing a factor from each row and each column

$m = 2$

Figure 1.
In two dimensions a “parallelepiped” becomes a parallelogram with area given as
$\text{Area} = (\text{length of base}) \times (\text{height})$
Take $𝒂_{1}$ as the base, with length $b = || 𝒂_{1} ||$. Vector $𝒂_{1}$ is at angle $φ_{1}$ to the $x_{1}$-axis, $𝒂_{2}$ is at angle $φ_{2}$ to the $x_{1}$-axis, and the angle between $𝒂_{1}$, $𝒂_{2}$ is $θ = φ_{2} - φ_{1}$. The height has length
$h = || 𝒂_{2} || \sin θ = || 𝒂_{2} || \sin (φ_{2} - φ_{1}) = || 𝒂_{2} || (\sin φ_{2} \cos φ_{1} - \sin φ_{1} \cos φ_{2})$
Since $𝒂_{1} = {(\begin{array}{cc} a_{11} & a_{21} \end{array})}^{T}$ and $𝒂_{2} = {(\begin{array}{cc} a_{12} & a_{22} \end{array})}^{T}$ are the columns of $𝑨$, use $\cos φ_{1} = a_{11} / || 𝒂_{1} ||$, $\sin φ_{1} = a_{21} / || 𝒂_{1} ||$, $\cos φ_{2} = a_{12} / || 𝒂_{2} ||$, $\sin φ_{2} = a_{22} / || 𝒂_{2} ||$ to obtain
$\text{Area} = || 𝒂_{1} || \, || 𝒂_{2} || (\sin φ_{2} \cos φ_{1} - \sin φ_{1} \cos φ_{2}) = a_{11} a_{22} - a_{12} a_{21}$
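The base-times-height argument can be checked numerically. The following is a minimal sketch in plain Python; the function names and the sample vectors are illustrative choices, not part of the lecture. Both angles are measured from the $x_{1}$-axis.

```python
import math

def area_from_angles(a1, a2):
    """Oriented area as base |a1| times height |a2| sin(phi2 - phi1)."""
    phi1 = math.atan2(a1[1], a1[0])  # angle of a1 to the x1-axis
    phi2 = math.atan2(a2[1], a2[0])  # angle of a2 to the x1-axis
    base = math.hypot(a1[0], a1[1])
    height = math.hypot(a2[0], a2[1]) * math.sin(phi2 - phi1)
    return base * height

def det2(a1, a2):
    """Determinant of the 2-by-2 matrix whose columns are a1 and a2."""
    a11, a21 = a1
    a12, a22 = a2
    return a11 * a22 - a12 * a21

a1, a2 = (3.0, 1.0), (1.0, 2.0)
area = area_from_angles(a1, a2)
```

Swapping the two columns reverses the orientation, so both functions change sign together.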

The geometric interpretation of a determinant as an oriented volume is useful in establishing rules for calculation with determinants:
- Determinant of matrix with repeated columns is zero (since two edges of the parallelepiped are identical). Example for $m = 3$
  $Δ = | \begin{array}{ccc} a & a & u \\ b & b & v \\ c & c & w \end{array} | = a b w + b c u + c a v - u b c - v c a - w a b = 0$
  This is more easily seen using the column notation
  $Δ = \det (\begin{array}{cccc} 𝒂_{1} & 𝒂_{1} & 𝒂_{3} & \dots \end{array}) = 0$
- Determinant of matrix with linearly dependent columns is zero (since one edge lies in the 'hyperplane' formed by all the others)

Separating sums in a column (similar for rows)
$\det (\begin{array}{cccc} 𝒂_{1} + 𝒃_{1} & 𝒂_{2} & \dots & 𝒂_{m} \end{array}) = \det (\begin{array}{cccc} 𝒂_{1} & 𝒂_{2} & \dots & 𝒂_{m} \end{array}) + \det (\begin{array}{cccc} 𝒃_{1} & 𝒂_{2} & \dots & 𝒂_{m} \end{array})$
with $𝒂_{i}, 𝒃_{1} \in ℝ^{m}$
Scalar multiple of a column (similar for rows)
$\det (\begin{array}{cccc} α 𝒂_{1} & 𝒂_{2} & \dots & 𝒂_{m} \end{array}) = α \det (\begin{array}{cccc} 𝒂_{1} & 𝒂_{2} & \dots & 𝒂_{m} \end{array})$
with $α \in ℝ$
Linear combinations of columns (similar for rows)
$\det (\begin{array}{cccc} 𝒂_{1} & 𝒂_{2} & \dots & 𝒂_{m} \end{array}) = \det (\begin{array}{cccc} 𝒂_{1} & α 𝒂_{1} + 𝒂_{2} & \dots & 𝒂_{m} \end{array})$
with $α \in ℝ$ .

A determinant of size $m$ can be expressed as a sum of determinants of size $m - 1$ by expansion along a row or column
$\begin{array}{rcl} | \begin{array}{ccccc} a_{11} & a_{12} & a_{13} & \dots & a_{1 m} \\ a_{21} & a_{22} & a_{23} & \dots & a_{2 m} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ a_{m 1} & a_{m 2} & a_{m 3} & \dots & a_{m m} \end{array} | & = & a_{11} | \begin{array}{cccc} a_{22} & a_{23} & \dots & a_{2 m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ a_{m 2} & a_{m 3} & \dots & a_{m m} \end{array} | - a_{12} | \begin{array}{cccc} a_{21} & a_{23} & \dots & a_{2 m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ a_{m 1} & a_{m 3} & \dots & a_{m m} \end{array} | \\ & & + a_{13} | \begin{array}{ccccc} a_{21} & a_{22} & a_{24} & \dots & a_{2 m} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ a_{m 1} & a_{m 2} & a_{m 4} & \dots & a_{m m} \end{array} | - \dots \\ & & + {(- 1)}^{m + 1} a_{1 m} | \begin{array}{cccc} a_{21} & a_{22} & \dots & a_{2, m - 1} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ a_{m 1} & a_{m 2} & \dots & a_{m, m - 1} \end{array} | \end{array}$
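Expansion along the first row translates directly into a recursive procedure. The sketch below is plain Python operating on lists of rows; the name `det_cofactor` is an illustrative choice. It is meant to mirror the formula, not to be efficient.

```python
def det_cofactor(A):
    """Determinant by expansion along the first row (A is a list of rows)."""
    m = len(A)
    if m == 1:
        return A[0][0]
    total = 0
    for j in range(m):
        # Minor: delete the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # Signs alternate +, -, +, ... along the row.
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total

A = [[1, 2, 3], [-1, 0, 1], [-2, -1, 4]]
d = det_cofactor(A)
```

Each call spawns $m$ calls of size $m - 1$, so the work grows like $m!$; this is only practical for very small matrices.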

The formal definition of a determinant
$\det A = \sum_{σ \in Σ} ν (σ) a_{1 i_{1}} a_{2 i_{2}} \dots a_{m i_{m}}$
sums over all $m!$ permutations $σ = (i_{1}, \dots, i_{m})$ of $(1, \dots, m)$, with $ν (σ) = \pm 1$ the sign of the permutation. Its evaluation requires on the order of $m \cdot m!$ operations, a number that increases rapidly with $m$.
A more economical approach is to use row and column combinations to create zeros and then reduce the size of the determinant, an algorithm reminiscent of Gauss elimination for linear systems.

Example:
$| \begin{array}{ccc} 1 & 2 & 3 \\ - 1 & 0 & 1 \\ - 2 & - 1 & 4 \end{array} | = | \begin{array}{ccc} 1 & 2 & 3 \\ 0 & 2 & 4 \\ 0 & 3 & 10 \end{array} | = | \begin{array}{cc} 2 & 4 \\ 3 & 10 \end{array} | = 20 - 12 = 8$
The first equality comes from linear combinations of rows: row 1 is added to row 2, and row 1 multiplied by 2 is added to row 3. These linear combinations maintain the value of the determinant. The second equality comes from expansion along the first column, whose only non-zero entry is $a_{11} = 1$.
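The elimination idea in this example can be sketched in plain Python as follows. Two points go beyond the text and are assumptions of this sketch: rows are swapped to bring the largest available pivot into position (each swap flips the sign of the determinant), and the function name `det_elimination` is illustrative.

```python
def det_elimination(A):
    """Determinant by reduction to upper triangular form, about m^3/3 multiply-adds."""
    U = [[float(a) for a in row] for row in A]  # work on a copy
    m = len(U)
    det = 1.0
    for k in range(m):
        # Assumption of this sketch: pick the largest pivot in column k.
        p = max(range(k, m), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0.0:
            return 0.0  # whole column is zero: columns are linearly dependent
        if p != k:
            U[k], U[p] = U[p], U[k]
            det = -det  # a row swap changes the sign
        det *= U[k][k]
        for i in range(k + 1, m):
            factor = U[i][k] / U[k][k]
            # Subtracting a multiple of row k leaves the determinant unchanged.
            for j in range(k, m):
                U[i][j] -= factor * U[k][j]
    return det

d = det_elimination([[1, 2, 3], [-1, 0, 1], [-2, -1, 4]])
```

The determinant is accumulated as the product of the pivots, which is the determinant of the final triangular matrix up to the sign changes from swaps.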

1.1.Cross product

Consider $u, v \in ℝ^{3}$ . We've introduced the idea of a scalar product
$u \cdot v = u^{T} v = u_{1} v_{1} + u_{2} v_{2} + u_{3} v_{3}$
in which from two vectors one obtains a scalar
We've also introduced the idea of an outer product
$u v^{T} = (\begin{array}{c} u_{1} \\ u_{2} \\ u_{3} \end{array}) (\begin{array}{ccc} v_{1} & v_{2} & v_{3} \end{array}) = (\begin{array}{ccc} u_{1} v_{1} & u_{1} v_{2} & u_{1} v_{3} \\ u_{2} v_{1} & u_{2} v_{2} & u_{2} v_{3} \\ u_{3} v_{1} & u_{3} v_{2} & u_{3} v_{3} \end{array})$
in which a matrix is obtained from two vectors
Another product of two vectors is also useful, the cross product, most conveniently expressed in determinant-like form
$u \times v = | \begin{array}{ccc} 𝒆_{1} & 𝒆_{2} & 𝒆_{3} \\ u_{1} & u_{2} & u_{3} \\ v_{1} & v_{2} & v_{3} \end{array} | = (u_{2} v_{3} - v_{2} u_{3}) 𝒆_{1} + (u_{3} v_{1} - v_{3} u_{1}) 𝒆_{2} + (u_{1} v_{2} - v_{1} u_{2}) 𝒆_{3}$
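The component formula can be written out directly. The following plain-Python sketch uses 0-based indices, so `u[0]` corresponds to $u_{1}$; the function names are illustrative.

```python
def cross(u, v):
    """Cross product of two vectors in R^3, following the determinant-like formula."""
    return (
        u[1] * v[2] - v[1] * u[2],  # e1 component
        u[2] * v[0] - v[2] * u[0],  # e2 component
        u[0] * v[1] - v[0] * u[1],  # e3 component
    )

def dot(u, v):
    """Scalar product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

u, v = (1, 2, 3), (4, 5, 6)
w = cross(u, v)
```

The result is orthogonal to both factors, which can be verified with the scalar product: `dot(w, u)` and `dot(w, v)` both vanish.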

2.Structured Matrices

The special structure of a matrix can be exploited to obtain more efficient factorizations. Evaluation of the linear combination $𝑨 𝒙 = x_{1} 𝒂_{1} + \dots + x_{n} 𝒂_{n}$ requires $m n$ floating point operations (flops) for $𝑨 \in ℂ^{m \times n}$ . Evaluation of $p$ linear combinations $𝑨 𝑿$ , $𝑿 \in ℂ^{n \times p}$ requires $m n p$ flops. If it is possible to evaluate $𝑨 𝒙$ with fewer operations, the matrix is said to be structured. Examples include:

Banded matrices $𝑨 = [a_{i j}]$ , $a_{i j} = 0$ if $i - j > l$ or $j - i > u$ , with $l, u$ denoting the lower and upper bandwidths. If $l = u = 0$ the matrix is diagonal. If $l = u = b$ the matrix is said to have bandwidth $B = 2 b + 1$ , i.e., for $b = 1$ , the matrix is tridiagonal, and for $b = 2$ the matrix is pentadiagonal. Lower triangular matrices have $u = 0$ , while upper triangular matrices have $l = 0$ . The $𝑨 𝒙$ product requires $(l + u + 1) m$ flops.
Sparse matrices have $r$ non-zero elements per row or $c$ non-zero elements per column. The $𝑨 𝒙$ product requires $r m$ or $c n$ flops.
Circulant matrices $𝑨 = [a_{i j}]$ are square and have $a_{i j} = f ((i - j) \bmod m)$, a property that can be exploited to compute $𝑨 𝒙$ using $𝒪 (m \log m)$ operations.
For square, rank-deficient matrices $𝑨 \in ℂ^{m \times m}$, $rank (𝑨) = r$, $𝑨 𝒙$ can be evaluated in $𝒪 (r m)$ flops by writing $𝑨 = 𝑿 𝒀^{*}$ with $𝑿, 𝒀 \in ℂ^{m \times r}$ and computing $𝑿 (𝒀^{*} 𝒙)$.
When $𝑨, 𝑿$ are symmetric (hence square), $𝑨 𝑿$ requires $𝒪 (m^{3} / 2)$ flops instead of $m^{3}$ .
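Two of these cases can be sketched in plain Python. The function names, the dense list-of-rows storage, and the restriction to real entries are assumptions made for illustration; a practical banded code would store only the diagonals.

```python
def banded_matvec(A, x, l, u):
    """y = A x for square A with lower bandwidth l and upper bandwidth u.

    Only entries with -u <= i - j <= l are touched: about (l + u + 1) m products.
    """
    m = len(A)
    y = [0.0] * m
    for i in range(m):
        for j in range(max(0, i - l), min(m, i + u + 1)):
            y[i] += A[i][j] * x[j]
    return y

def low_rank_matvec(X, Y, x):
    """y = X (Y^T x) for A = X Y^T with real X, Y of size m-by-r: O(r m) work."""
    m, r = len(X), len(X[0])
    # t = Y^T x has r components.
    t = [sum(Y[i][k] * x[i] for i in range(m)) for k in range(r)]
    return [sum(X[i][k] * t[k] for k in range(r)) for i in range(m)]

# Tridiagonal example (l = u = 1).
T = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
y_banded = banded_matvec(T, [1, 2, 3, 4], 1, 1)

# Rank-one example: A = X Y^T = [[1, 0, 1], [2, 0, 2], [3, 0, 3]].
y_low_rank = low_rank_matvec([[1], [2], [3]], [[1], [0], [1]], [1, 1, 1])
```

In both cases the saving comes from never forming or visiting the entries that the structure makes redundant.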

3.Cholesky factorization of positive definite hermitian matrices

3.1.Symmetric matrices, hermitian matrices

Special structure of a matrix is typically associated with underlying symmetries of a particular phenomenon. For example, the law of action and reaction in dynamics (Newton's third law) leads to real symmetric matrices, $𝑨 \in ℝ^{m \times m}$ , $𝑨^{T} = 𝑨$ . Consider a system of $m$ point masses with nearest-neighbor interactions on the real line where the interaction force depends on relative position. Assume that the force exerted by particle $i + 1$ on particle $i$ is linear

$f_{i + 1, i} = f (u_{i + 1} - u_{i}) = k (u_{i + 1} - u_{i})$,

with $u_{i}$ denoting displacement from an equilibrium position. The law of action and reaction then states that

$f_{i, i + 1} = - f_{i + 1, i} = k (u_{i} - u_{i + 1})$.

If the same force law holds at all positions, then

$f_{i - 1, i} = k (u_{i - 1} - u_{i})$.

The force on particle $i$ is given by the sum of forces from neighboring particles $i - 1, i + 1$

$f_{i} = f_{i - 1, i} + f_{i + 1, i} = k (u_{i - 1} - u_{i}) + k (u_{i + 1} - u_{i}) = k (u_{i + 1} - 2 u_{i} + u_{i - 1})$.

Introducing $𝒇, 𝒖 \in ℝ^{m}$ , and assuming $u_{0} = u_{m + 1} = 0$ , the above is stated as

$𝒇 = 𝑲 𝒖$,

where $𝑲 = k \, \mathrm{diag} ([\begin{array}{lll} 1 & - 2 & 1 \end{array}])$ is a symmetric matrix, $𝑲 = 𝑲^{T}$, a direct consequence of the law of action and reaction. The matrix $𝑲$ is in this case tridiagonal as a consequence of the assumption of nearest-neighbor interactions. Recall that matrices represent linear mappings, hence

$𝑲 = [\begin{array}{llll} 𝒇 (𝒆_{1}) & 𝒇 (𝒆_{2}) & \dots & 𝒇 (𝒆_{m}) \end{array}]$,

with $𝒇 (𝒖)$ the linear force-displacement mapping (Fig. 2), which yields the same symmetric, tridiagonal matrix.

Figure 2. Image of $𝒆_{i}$ through mapping representing a linear force is $𝒇 (𝒆_{i}) = k {[\begin{array}{lllll} \dots & 1 & - 2 & 1 & \dots \end{array}]}^{T}$ .
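The column-by-column construction of $𝑲$ can be sketched in plain Python. The names `force` and `force_matrix`, the default $k = 1$, and the size $m = 4$ are illustrative assumptions; the boundary condition $u_{0} = u_{m + 1} = 0$ is the one stated above.

```python
def force(u, k=1.0):
    """f_i = k (u_{i+1} - 2 u_i + u_{i-1}) with u_0 = u_{m+1} = 0."""
    padded = [0.0] + list(u) + [0.0]
    return [k * (padded[i] - 2.0 * padded[i + 1] + padded[i + 2])
            for i in range(len(u))]

def force_matrix(m, k=1.0):
    """Assemble K column by column from the images f(e_j) of the unit vectors."""
    columns = []
    for j in range(m):
        e_j = [1.0 if i == j else 0.0 for i in range(m)]
        columns.append(force(e_j, k))
    # columns[j][i] is the entry K[i][j]; return K as a list of rows.
    return [[columns[j][i] for j in range(m)] for i in range(m)]

K = force_matrix(4)
```

Symmetry and the tridiagonal pattern can then be read off the assembled matrix rather than imposed on it.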

This concept can be extended to complex matrices $𝑨 \in ℂ^{m \times m}$ through $𝑨^{*} = 𝑨$, in which case $𝑨$ is said to be self-adjoint or hermitian. Again, this property is often associated with desired physical properties, such as the requirement of real observable quantities in quantum mechanics. Diagonal elements of a hermitian matrix must be real, and for any $𝒙, 𝒚 \in ℂ^{m}$, the computation

$\overline{𝒙^{*} 𝑨 𝒚} = {(𝒙^{*} 𝑨 𝒚)}^{*} = 𝒚^{*} 𝑨^{*} 𝒙 = 𝒚^{*} 𝑨 𝒙$,

implies for $𝒙 = 𝒚$ that

$\overline{𝒙^{*} 𝑨 𝒙} = 𝒙^{*} 𝑨 𝒙$,

hence $𝒙^{*} 𝑨 𝒙$ is real.
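A small numerical check of this property, as a sketch in plain Python using the built-in complex type. The matrix and vector are arbitrary illustrative choices, and `quad_form` is a hypothetical helper name.

```python
def quad_form(A, x):
    """Evaluate x^* A x for a square matrix A (list of rows) and vector x."""
    m = len(x)
    return sum(complex(x[i]).conjugate() * A[i][j] * x[j]
               for i in range(m) for j in range(m))

# A hermitian matrix: real diagonal, a_21 is the conjugate of a_12.
A = [[2.0, 1 - 1j],
     [1 + 1j, 3.0]]
x = [1 + 2j, -1j]
q = quad_form(A, x)
```

For this choice the imaginary parts of the two off-diagonal contributions cancel exactly, leaving a real value.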

3.2.Positive-definite matrices

The work (i.e., energy stored in the system) done by all the forces in the above point mass system is

$𝒲 = \frac{1}{2} 𝒖^{T} 𝑲 𝒖$,

and physical considerations state that $𝒲 ⩾ 0$. This leads to the following definitions.

Definition. A hermitian matrix $𝑨 \in ℂ^{m \times m}$ is positive definite if for any non-zero $𝒙 \in ℂ^{m},$ $𝒙^{*} 𝑨 𝒙 > 0$ .

Definition. A hermitian matrix $𝑨 \in ℂ^{m \times m}$ is positive semi-definite if for any non-zero $𝒙 \in ℂ^{m},$ $𝒙^{*} 𝑨 𝒙 ⩾ 0$ .

If $𝑨$ is hermitian positive definite, then so is $𝑿^{*} 𝑨 𝑿$ for any $𝑿 \in ℂ^{m \times n}$ of full column rank, $n ⩽ m$, since then $𝑿 𝒚 \neq 𝟎$ for any non-zero $𝒚 \in ℂ^{n}$. Choosing

$𝑿 = [\begin{array}{lll} 𝒆_{1} & \dots & 𝒆_{n} \end{array}] \in ℂ^{m \times n}$

gives $𝑨_{n} = 𝑿^{*} 𝑨 𝑿$, the $n^{th}$ principal submatrix of $𝑨$, itself a hermitian positive definite matrix. Choosing $𝑿 = 𝒆_{j}$ shows that the $j^{th}$ diagonal element of $𝑨$ is positive, $a_{j j} = 𝒆_{j}^{T} 𝑨 𝒆_{j} > 0$.

3.3.Symmetric factorization of positive-definite hermitian matrices

The structure of a hermitian positive definite matrix $𝑨 \in ℂ^{m \times m}$ , can be preserved by modification of $L U$ -factorization. The resulting algorithm is known as Cholesky factorization, and its first stage is stated as

$𝑨 = [\begin{array}{ll} a_{11} & 𝒘^{*} \\ 𝒘 & 𝑩 \end{array}] = [\begin{array}{ll} α & 𝟎 \\ 𝒘 / α & 𝑰 \end{array}] [\begin{array}{ll} 1 & 𝟎^{*} \\ 𝟎 & 𝑪 \end{array}] [\begin{array}{ll} α & 𝒘^{*} / α \\ 𝟎 & 𝑰 \end{array}] = [\begin{array}{ll} α & 𝟎 \\ 𝒘 / α & 𝑰 \end{array}] [\begin{array}{ll} α & 𝒘^{*} / α \\ 𝟎 & 𝑪 \end{array}] = [\begin{array}{ll} a_{11} & 𝒘^{*} \\ 𝒘 & 𝑪 + 𝒘 𝒘^{*} / a_{11} \end{array}]$,

with $α = \sqrt{a_{11}}$, which is real since $a_{11} > 0$, whence $𝑪 = 𝑩 - 𝒘 𝒘^{*} / a_{11}$. Repeating the stage-1 step

$𝑨 = 𝑳_{1} 𝑨_{1} 𝑳_{1}^{*}$,

leads to

$𝑨 = 𝑳_{1} 𝑳_{2} 𝑨_{2} 𝑳_{2}^{*} 𝑳_{1}^{*} = \dots = 𝑳 𝑳^{*}, \quad 𝑳 = 𝑳_{1} 𝑳_{2} \dots 𝑳_{m}$.

The resulting Cholesky algorithm is half as expensive as standard $L U$ -factorization.

Algorithm (Cholesky factorization, $A = L L^{*}$ )

$𝑳 = 𝑨$
for $i = 1 : m$
  for $j = i + 1 : m$
    $L [j : m, j] = L [j : m, j] - L [j : m, i] \overline{L} [j, i] / L [i, i]$
  end
  $L [i : m, i] = L [i : m, i] / \sqrt{L [i, i]}$
end
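A plain-Python sketch of the algorithm above, using 0-based indices and lists of rows. Three things are assumptions of this sketch and not part of the stated algorithm: the input is taken to be hermitian without checking, a non-positive pivot raises an error, and the strictly upper triangle is set to zero at the end so that the returned matrix is lower triangular.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^* for hermitian positive definite A."""
    m = len(A)
    L = [[complex(a) for a in row] for row in A]  # copy, L = A
    for i in range(m):
        pivot = L[i][i].real
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        for j in range(i + 1, m):
            # L[j:m, j] -= L[j:m, i] * conj(L[j, i]) / L[i, i]
            factor = L[j][i].conjugate() / pivot
            for r in range(j, m):
                L[r][j] -= L[r][i] * factor
        # L[i:m, i] /= sqrt(L[i, i])
        s = math.sqrt(pivot)
        for r in range(i, m):
            L[r][i] /= s
    # Assumption of this sketch: clear the unused upper triangle.
    for i in range(m):
        for j in range(i + 1, m):
            L[i][j] = 0j
    return L

def times_adjoint(L):
    """Form L L^* to check the factorization."""
    m = len(L)
    return [[sum(L[i][k] * L[j][k].conjugate() for k in range(m))
             for j in range(m)] for i in range(m)]

A = [[2.0, 1 - 1j], [1 + 1j, 3.0]]
L = cholesky(A)
```

Only the lower triangle of the working array is read or written by the two update steps, which is where the saving over $L U$ comes from.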
