Lecture 30: Irregular Sparsity

1. Finite element discretization

For the steady-state heat equation -∇⋅(α∇u) = f with spatially-varying diffusivity, symmetric discretizations on uniform grids lead to systems 𝑨𝒖=𝒄 with 𝑨=𝑨ᵀ and a regular sparsity pattern. Irregular domain discretization leads to more complicated sparsity patterns that require different approaches to solving the linear system. It is important to link the changes in the structure of 𝑨 to specific aspects of the approximation procedure. Consider the difficulties of applying finite difference discretization on a domain Ω of arbitrary shape with boundary Γ=∂Ω (Fig. 1). At a grid node (i,j) closer to the boundary than the uniform spacing h, centered finite difference formulas would refer to undefined values outside the domain, while one-sided finite difference formulas would fail to take into account boundary values for the problem. Taylor series expansions could be used,

$$u_A = u((i+\xi)h,\, jh) = u_{i,j} + \left(\frac{\partial u}{\partial x}\right)_{i,j}(\xi h) + \frac{1}{2}\left(\frac{\partial^2 u}{\partial x^2}\right)_{i,j}(\xi h)^2 + \cdots$$

$$u_{i+1,j} = u_{i,j} + \left(\frac{\partial u}{\partial x}\right)_{i,j} h + \frac{1}{2}\left(\frac{\partial^2 u}{\partial x^2}\right)_{i,j} h^2 + \cdots$$

from which elimination of the second derivative leads to an approximation of the first derivative as

$$\left(\frac{\partial u}{\partial x}\right)_{i,j} = \frac{u_A - \xi^2\, u_{i+1,j} - (1-\xi^2)\, u_{i,j}}{\xi h\,(1-\xi)}. \qquad (1)$$

Note that setting ξ = -1 would place A at a grid node, u_A = u_{i-1,j}, and from (1) the familiar centered finite difference approximation of the first derivative

$$\left(\frac{\partial u}{\partial x}\right)_{i,j} = \frac{u_{i+1,j} - u_{i-1,j}}{2h},$$

is recovered. For an arbitrary domain the values of ξ, η would vary from node to node and the resulting linear system 𝑨𝒖=𝒄 would no longer be symmetric. From a physical perspective this might be surprising at first, since the operator ℒ = -∇⋅(α∇) is isotropic, but that is a property of the infinitesimal description. Upon irregular discretization the problem 𝑨𝒖=𝒄 is only an approximation of the physical problem ℒu = f, and can exhibit different behavior, in this case loss of isotropy near the boundaries.
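As a quick plausibility check of (1), here is a minimal sketch (Python; the one-dimensional test function u(x) = sin x, the node position x0, and the offset ξ = -0.4 are illustrative choices, not part of the lecture):

import numpy as np

# Check formula (1) on u(x) = sin x: node at x0, boundary point A at x0 + xi*h,
# interior neighbor at x0 + h; the exact derivative is cos(x0).
u, du = np.sin, np.cos
x0, xi = 1.0, -0.4

for h in (0.1, 0.05, 0.025):
    uA, u0, u1 = u(x0 + xi*h), u(x0), u(x0 + h)
    dudx = (uA - xi**2*u1 - (1 - xi**2)*u0)/(xi*h*(1 - xi))
    print(h, abs(dudx - du(x0)))    # error decreases as O(h^2)

Setting xi = -1 in the same sketch reproduces the centered difference values discussed above.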

Figure 1. Left: Modified finite difference stencil near a boundary not aligned with the grid. Boundary points A, B are at distances ξh, ηh from the nearest interior node, with ξ, η ∈ (-1,1). Right: Triangles covering the domain.

Computing the appropriate mesh size fractions (ξh, ηh) for all grid points near a boundary is an onerous task, and suggests seeking a different approach. A fruitful idea is to separate the problem of geometric description from that of the physics expressed by some operator ℒ. Domains within ℝᵈ of arbitrary complexity can be approximated to any desired precision by a simplicial covering. Simplices are the simplest geometric objects with non-zero measure μ in a space. For d=1 these are line segments that can approximate arbitrary curves. The corresponding simplices for d=2 and d=3 are triangles and tetrahedra, respectively. Consider d=2 and specify a set of triangles {Tₖ | k=1,2,…,n} with vertices Vⱼ, j=1,…,m, that form a partition of precision ε > 0 of the domain Ω,

$$\forall\, k, l \in \{1,2,\ldots,n\},\; k \neq l: \quad \mu(T_k \cap T_l) = 0, \qquad \left|\, \mu\!\left(\bigcup_{k=1}^{n} T_k\right) - \mu(\Omega) \right| \leq \varepsilon.$$

The above states that intersections of distinct triangles must have zero measure in d=2, i.e., triangles can share edges or vertices but cannot overlap over a non-zero area, and that the area of the union of the triangles approximates the area of the overall domain Ω to within ε.

Figure 2. Left: Triangulation of a domain with a hole. Right: Triangle form function

In a finite difference discretization the function u: Ω → ℝ is approximated by a set of values {u_{i,j}}, often referred to as a grid function. Similarly, a set of values uⱼ ≈ u(xⱼ, yⱼ) can be defined at the triangle vertices Vⱼ(xⱼ, yⱼ). Denote the vertex coordinates of triangle T by (xⱼ, yⱼ), j=1,2,3. Values of u(x,y) within the triangle T are determined through piecewise interpolation, a generalization of one-dimensional B-splines, using the form functions

$$N_1(x,y) = \frac{1}{2A}\begin{vmatrix} 1 & 1 & 1 \\ x & x_2 & x_3 \\ y & y_2 & y_3 \end{vmatrix}, \quad
N_2(x,y) = \frac{1}{2A}\begin{vmatrix} 1 & 1 & 1 \\ x_1 & x & x_3 \\ y_1 & y & y_3 \end{vmatrix}, \quad
N_3(x,y) = \frac{1}{2A}\begin{vmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x \\ y_1 & y_2 & y \end{vmatrix},$$

with A the triangle area

$$A = \frac{1}{2}\begin{vmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix}.$$

Note that for (x,y) ∈ T the form functions give the fractions of the overall area occupied by the interior triangles formed by (x,y) and pairs of vertices, such that Nⱼ(x,y) ∈ [0,1]. The linear spline interpolation p₁ of u based upon the vertex values u₁, u₂, u₃ is

$$u(x,y) \approx p_1(x,y) = \sum_{j=1}^{3} u_j N_j(x,y), \qquad (2)$$

the familiar form of a linear combination. It is customary to set Nⱼ(x,y) = 0 for (x,y) ∉ T, recovering the framework of B-splines. Since each form function thus extended is non-zero only over the single triangle T, such an approach is commonly referred to as a finite element method (FEM).
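A minimal sketch of evaluating the form functions and the interpolant (2) follows; the function name form_functions, the sample triangle, and the vertex values are illustrative choices, not part of the lecture:

import numpy as np

# Evaluate N_1, N_2, N_3 from the determinant formulas above, then form p_1(x,y).
def form_functions(verts, x, y):
    (x1, y1), (x2, y2), (x3, y3) = verts
    A = 0.5*np.linalg.det([[1., 1., 1.], [x1, x2, x3], [y1, y2, y3]])
    N1 = 0.5/A*np.linalg.det([[1., 1., 1.], [x, x2, x3], [y, y2, y3]])
    N2 = 0.5/A*np.linalg.det([[1., 1., 1.], [x1, x, x3], [y1, y, y3]])
    N3 = 0.5/A*np.linalg.det([[1., 1., 1.], [x1, x2, x], [y1, y2, y]])
    return np.array([N1, N2, N3])

verts = [(0., 0.), (1., 0.), (0., 1.)]       # sample triangle vertices
u_vert = np.array([1., 2., 3.])              # sample vertex values u_1, u_2, u_3
N = form_functions(verts, 0.25, 0.25)
print(N, N.sum(), N @ u_vert)                # N_j in [0,1], sum to 1; p_1(0.25, 0.25)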

Various approaches can be applied to derive an algebraic system for the vertex values from the conservation law of interest. Consider the operator ℒ = -∇⋅(α∇) and the static equilibrium equation ℒu = f in Ω with Dirichlet boundary conditions u = g on Γ = ∂Ω. When u denotes temperature, this is a statement of thermal equilibrium where heat fluxes q = -α∇u balance external heating f and imposed temperature values on the boundary. One commonly used approach closely resembles the least squares approximation of 𝒃 ∈ ℝᵐ,

$$\min_{\boldsymbol{x} \in \mathbb{R}^n} \|\boldsymbol{b} - \boldsymbol{A}\boldsymbol{x}\|.$$

The approximant 𝒃̃ of 𝒃 in this case is its projection onto C(𝑨), 𝒃̃ = 𝑸𝑸ᵀ𝒃, with 𝑨 = 𝑸𝑹 the (incomplete) QR decomposition of 𝑨. The error of this approximation, 𝒆 = 𝒃̃ - 𝒃 ∈ N(𝑨ᵀ), is orthogonal to C(𝑨),

𝑸T𝒆=𝑸T(𝑸𝑸T𝒃-𝒃)=𝟎. (3)
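A small numerical illustration of (3) is sketched below (NumPy; the sizes m = 8, n = 3 and the random data are arbitrary choices): project 𝒃 onto C(𝑨) through the reduced QR factors and check that the error has no component along the columns of 𝑸.

import numpy as np

# Project b onto C(A) and verify the orthogonality condition (3).
rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Q, R = np.linalg.qr(A)           # Q has n orthonormal columns spanning C(A)
b_approx = Q @ (Q.T @ b)         # projection of b onto C(A)
e = b_approx - b                 # approximation error, lies in N(A^T)
print(np.linalg.norm(Q.T @ e))   # ~ machine zero, confirming Q^T e = 0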

The generalization of (3), in which the finite-dimensional vector 𝒃 ∈ ℝᵐ is replaced by the function u ∈ C²(Ω) that satisfies ℒu = f, is

$$\left( N_i,\; \mathcal{L}\sum_{j=1}^{3} u_j N_j(x,y) - f \right) = 0, \quad i = 1,2,3, \qquad (4)$$

for each triangle Tₖ, with (u,v) denoting the scalar product

$$(u, v) = \int_{\Omega} u(x,y)\, v(x,y)\, d\omega.$$

The analogy can be understood by recognizing that finite element approximants lie within the span of the form functions {Nᵢ⁽ᵏ⁾} over all triangles Tₖ and their vertices i=1,2,3. This is known as a Galerkin method, with (4) expressing orthogonality of the error e = ℒu - f to all form functions {Nᵢ⁽ᵏ⁾}, leading to

$$\left( N_i,\; \mathcal{L}\sum_{j=1}^{3} u_j N_j(x,y) - f \right) = 0 \;\Rightarrow\; \sum_{j=1}^{3} \left( \int_{T} N_i(x,y)\, \mathcal{L} N_j(x,y)\, d\omega \right) u_j = \int_{T} N_i(x,y)\, f(x,y)\, d\omega.$$

The null result of applying the second-order differential operator ℒ = -∇⋅(α∇) to a linear form function Nⱼ is avoided through integration by parts (divergence theorem),

$$\int_{T_k} N_i(x,y)\, \mathcal{L} N_j(x,y)\, d\omega = -\int_{T_k} N_i(x,y)\, \nabla\cdot\!\left[ \alpha \nabla N_j(x,y) \right] d\omega = \int_{T_k} \alpha\, \nabla N_i(x,y) \cdot \nabla N_j(x,y)\, d\omega = a_{ij}^{(k)}.$$

Assembling contributions from all triangles Tₖ results in a linear system 𝑨𝒖=𝒄, expressing an approximation of the steady-state heat equation ℒu = f.
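The assembly step can be sketched as follows, assuming a constant diffusivity α on each triangle so that the element integrals reduce to a_ij = α·area·∇Nᵢ⋅∇Nⱼ (the gradients of linear form functions are constant). Dirichlet boundary conditions and sparse storage are omitted; the function name assemble and the two-triangle test mesh are illustrative choices.

import numpy as np

# 'tris' holds the global vertex indices of each triangle; A is dense only for clarity.
def assemble(verts, tris, alpha=1.0):
    m = len(verts)
    A = np.zeros((m, m))
    for tri in tris:
        (x1, y1), (x2, y2), (x3, y3) = verts[tri]
        area = 0.5*((x2 - x1)*(y3 - y1) - (x3 - x1)*(y2 - y1))
        grads = np.array([[y2 - y3, x3 - x2],
                          [y3 - y1, x1 - x3],
                          [y1 - y2, x2 - x1]])/(2*area)
        A[np.ix_(tri, tri)] += alpha*area*(grads @ grads.T)
    return A

verts = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
tris = np.array([[0, 1, 2], [0, 2, 3]])      # two triangles covering a square
print(assemble(verts, tris))

The example prints a symmetric matrix whose zero pattern reflects the vertex connectivity of the two triangles.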

It is illuminating to note that though the physical process itself is isotropic, the FEM approximation can lead to a non-symmetric system matrix 𝑨 due to the different sizes of the triangulation elements. The fact that the approximation depends on the domain discretization is not surprising; this also occurred for finite difference approximations, as evidenced by the eigenvalue dependence on grid spacing h, e.g., λₗ = 4 sin²(lπh/2). The particularity of FEM discretization is that the single parameter h has been replaced by the individual geometry of all triangles within the domain partition. It is to be expected that the resulting matrices will exhibit condition numbers that are monotonic with respect to maxₖ μ(Tₖ)/minₖ μ(Tₖ), the ratio of the largest triangle area to the smallest. This is readily understood: when minₖ μ(Tₖ) → 0 the spanning set {Nᵢ⁽ᵏ⁾} becomes nearly linearly dependent, since one of its members approaches the zero element. The same effect is obtained if the aspect ratio of a triangle becomes large (i.e., one of its angles is close to zero), since again the spanning set is close to linearly dependent.

A finite element system matrix 𝑨 will still exhibit sparsity, since each form function is non-zero on only one triangle. The sparsity pattern is, however, determined by the connectivity, i.e., the number of triangles meeting at each shared vertex. A typical sparsity pattern is shown in Fig. 3. If the physical principle of action and reaction (Newton's third law) is respected by the discretization, the matrix will still be symmetric, a considerable advantage with respect to the use of Taylor series to extend finite difference methods to arbitrary domains.

Figure 3. Non-zero elements of a matrix 𝑨 of dimensions m×m, m = 3948, from the Harwell-Boeing collection.

2. Krylov methods, Arnoldi iteration

From the above general observations it becomes apparent that the solution techniques considered up to now are inadequate. Factorization methods such as LU or QR would lead to fill-in and loss of sparsity. Additive splitting is no longer trivially implemented, since connectivity has to be accounted for other than by simple loops. The already slow convergence rate of methods based upon additive splitting is likely to degrade further, or the iteration might even diverge, due to the influence the spatial discretization has upon the eigenvalues of the iteration matrix 𝑴 = 𝑰 - 𝑩𝑨. Similar considerations apply to gradient descent.

An alternative approach is to seek a suitable basis {𝒒₁, 𝒒₂, …, 𝒒ₘ} in which to iteratively construct improved approximations 𝒖ₖ of the solution 𝒖 of the discretized system 𝑨𝒖=𝒄,

$$\boldsymbol{u} \approx \boldsymbol{u}_k = \boldsymbol{Q}_n \boldsymbol{x}, \qquad \boldsymbol{Q}_n = \begin{bmatrix} \boldsymbol{q}_1 & \boldsymbol{q}_2 & \cdots & \boldsymbol{q}_n \end{bmatrix} \in \mathbb{R}^{m \times n}.$$

Vectors within the basis set should be economical to compute and also lead to fast convergence, in the sense that the coefficient vector 𝒙 should have components that rapidly decrease in absolute value. One idea is to recognize that for a sparse system matrix 𝑨 with an average of p ≪ m nonzero elements per row, the cost to evaluate the matrix-vector product 𝑨𝒖 is only 𝒪(mp), as opposed to 𝒪(m²) for a full system with p = m (a minimal sketch of such a sparse product is given below). This suggests considering the vector set

{𝒃, 𝑨𝒃, 𝑨²𝒃, …},

starting from some arbitrary vector 𝒃. The resulting sequence of vectors has been encountered already in the power iteration method for computing eigenvalues and eigenvectors of 𝑨: for large n, 𝑨ⁿ𝒃 will tend to align with the eigenvector direction associated with the largest-magnitude eigenvalue, leading to the ill-conditioned matrices

$$\boldsymbol{V}_n = \begin{bmatrix} \boldsymbol{b} & \boldsymbol{A}\boldsymbol{b} & \cdots & \boldsymbol{A}^{n-1}\boldsymbol{b} \end{bmatrix} \in \mathbb{R}^{m \times n}.$$
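As promised above, a minimal compressed sparse row (CSR) sketch of the 𝒪(mp) product 𝑨𝒖 follows; the array names ptr, col, val and the 3×3 example matrix are illustrative choices.

import numpy as np

# 'ptr' marks where each row's stored entries begin in 'val' and 'col'.
def csr_matvec(ptr, col, val, u):
    v = np.zeros(len(ptr) - 1)
    for i in range(len(ptr) - 1):
        for k in range(ptr[i], ptr[i+1]):    # only stored nonzeros are touched
            v[i] += val[k]*u[col[k]]
    return v

ptr = [0, 2, 5, 7]                           # A = [[2,-1,0],[-1,2,-1],[0,-1,2]]
col = [0, 1, 0, 1, 2, 1, 2]
val = [2., -1., -1., 2., -1., -1., 2.]
print(csr_matvec(ptr, col, val, np.array([1., 1., 1.])))   # -> [1. 0. 1.]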

As in the development of power iteration into the QR method for eigenvalue approximation, the ill-conditioning of 𝑽ₙ can be eliminated by orthogonalization of 𝑽ₙ. In fact, the procedure can be organized so as to iteratively add one more vector 𝒒ₙ₊₁ to the vectors 𝑸ₙ = [𝒒₁ 𝒒₂ ⋯ 𝒒ₙ] already obtained from orthogonalization of 𝑽ₙ. Start in iteration n = 1 from 𝒒₁ = 𝒃/||𝒃||. A new direction is obtained through multiplication by 𝑨, 𝒗₂ = 𝑨𝒒₁. Gram-Schmidt orthogonalization leads to

$$\boldsymbol{v}_2 = h_{11}\boldsymbol{q}_1 + h_{21}\boldsymbol{q}_2, \qquad h_{11} = \boldsymbol{q}_1^T \boldsymbol{v}_2, \quad h_{21} = \|\boldsymbol{v}_2 - h_{11}\boldsymbol{q}_1\|, \quad \boldsymbol{q}_2 = (\boldsymbol{v}_2 - h_{11}\boldsymbol{q}_1)/h_{21}.$$

The above can be written as

$$\boldsymbol{A}[\boldsymbol{q}_1] = \begin{bmatrix} \boldsymbol{q}_1 & \boldsymbol{q}_2 \end{bmatrix} \begin{bmatrix} h_{11} \\ h_{21} \end{bmatrix}, \qquad \boldsymbol{A}\boldsymbol{Q}_1 = \boldsymbol{Q}_2 \boldsymbol{H}_1. \qquad (5)$$

Note that

$$C(\boldsymbol{V}_n) = C(\boldsymbol{Q}_n) = \operatorname{span}\{\boldsymbol{b},\, \boldsymbol{A}\boldsymbol{b},\, \boldsymbol{A}^2\boldsymbol{b},\, \ldots,\, \boldsymbol{A}^{n-1}\boldsymbol{b}\},$$

thus constructing a sequence of vector spaces of increasing dimension, C(𝑸₁) ⊂ C(𝑸₂) ⊂ ⋯ ⊂ C(𝑸ₙ), when 𝒃 is not an eigenvector of 𝑨. These are known as Krylov spaces, 𝒦ₙ = C(𝑸ₙ). In the nth iteration (5) generalizes to

$$\boldsymbol{A}\boldsymbol{Q}_n = \boldsymbol{A}\begin{bmatrix} \boldsymbol{q}_1 & \boldsymbol{q}_2 & \cdots & \boldsymbol{q}_n \end{bmatrix} = \begin{bmatrix} \boldsymbol{A}\boldsymbol{q}_1 & \boldsymbol{A}\boldsymbol{q}_2 & \cdots & \boldsymbol{A}\boldsymbol{q}_n \end{bmatrix} = \begin{bmatrix} \boldsymbol{q}_1 & \boldsymbol{q}_2 & \cdots & \boldsymbol{q}_n & \boldsymbol{q}_{n+1} \end{bmatrix} \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1,n} \\ h_{21} & h_{22} & \cdots & h_{2,n} \\ 0 & h_{32} & \cdots & h_{3,n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & h_{n+1,n} \end{bmatrix} = \boldsymbol{Q}_{n+1}\boldsymbol{H}_n. \qquad (6)$$

The resulting algorithm is known as the Arnoldi iteration.

Algorithm (Arnoldi)

𝒃, 𝒒1=𝒃/||𝒃||

for n=1,2,

𝒗=𝑨𝒒n

for j=1 to n

hjn=𝒒jT𝒗

𝒗=𝒗-hjn𝒒j

end

hn+1,n=||𝒗||

𝒒n+1=𝒗/hn+1,n

end
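A runnable sketch of the Arnoldi iteration is given below (NumPy; the test matrix and sizes are arbitrary choices, and breakdown with h_{n+1,n} = 0 is not handled). It also verifies relation (6).

import numpy as np

# After n steps, Q holds n+1 orthonormal columns and H is the (n+1)-by-n
# upper Hessenberg matrix satisfying A Q_n = Q_{n+1} H_n.
def arnoldi(A, b, n):
    m = len(b)
    Q = np.zeros((m, n + 1))
    H = np.zeros((n + 1, n))
    Q[:, 0] = b/np.linalg.norm(b)
    for k in range(n):
        v = A @ Q[:, k]
        for j in range(k + 1):               # Gram-Schmidt against q_1, ..., q_{k+1}
            H[j, k] = Q[:, j] @ v
            v -= H[j, k]*Q[:, j]
        H[k + 1, k] = np.linalg.norm(v)
        Q[:, k + 1] = v/H[k + 1, k]
    return Q, H

A = np.diag([2., 3., 4., 5.]) + 0.1*np.ones((4, 4))
b = np.ones(4)
Q, H = arnoldi(A, b, 3)
print(np.linalg.norm(A @ Q[:, :3] - Q @ H))  # ~ machine zero, verifying (6)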

3. GMRES

Approximate solutions 𝑸ₙ𝒖ₙ ∈ C(𝑸ₙ) to the system 𝑨𝒖=𝒄, with coordinate vector 𝒖ₙ ∈ ℝⁿ, can now be obtained by choosing the starting vector of the embedded Krylov spaces as 𝒃 = 𝒄 and solving the least squares problem

$$\min_{\boldsymbol{u}_n \in \mathbb{R}^n} \|\boldsymbol{A}\boldsymbol{Q}_n \boldsymbol{u}_n - \boldsymbol{c}\|. \qquad (7)$$

Problem (7) is reformulated using (6) as

$$\min_{\boldsymbol{u}_n} \|\boldsymbol{Q}_{n+1}\boldsymbol{H}_n \boldsymbol{u}_n - \boldsymbol{c}\| \;\Leftrightarrow\; \min_{\boldsymbol{u}_n} \|\boldsymbol{H}_n \boldsymbol{u}_n - \boldsymbol{w}\|,$$

with 𝒘 = ||𝒄||𝒆₁, since ||𝑸ₙ₊₁𝑯ₙ𝒖ₙ - 𝒄|| = ||𝑸ₙ₊₁ᵀ(𝑸ₙ₊₁𝑯ₙ𝒖ₙ - 𝒄)||. This is known as the generalized minimal residual algorithm (GMRES).

Algorithm (GMRES)

𝒄, s=||𝒄||, 𝒒1=𝒄/s

for n=1,2,

𝒗=𝑨𝒒n

for j=1 to n

hjn=𝒒jT𝒗

𝒗=𝒗-hjn𝒒j

end

hn+1,n=||𝒗||

# Least squares approximation 𝑯ₙ𝒖ₙ ≈ s𝒆₁

𝑷𝑹=qr(𝑯n); 𝒘=s𝑷T𝒆1; 𝒖n=𝑹\𝒘

if ||𝒖ₙ - 𝒖ₙ₋₁|| ≤ ε return(𝒖ₙ)

𝒒n+1=𝒗/hn+1,n

end
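The GMRES algorithm above can be sketched as follows (NumPy). Two simplifications are made for illustration: the small least squares problem H_n y ≈ s e_1 is solved with numpy.linalg.lstsq rather than an updated QR factorization, and convergence is tested on the residual ||𝑨𝒖 - 𝒄|| rather than on the increment ||𝒖ₙ - 𝒖ₙ₋₁||. The test matrix is an arbitrary choice.

import numpy as np

# Build the Arnoldi basis from c; at each step solve the small least squares
# problem and form the approximation in C(Q_n).
def gmres(A, c, tol=1e-10, max_n=50):
    m = len(c)
    s = np.linalg.norm(c)
    Q = np.zeros((m, max_n + 1))
    H = np.zeros((max_n + 1, max_n))
    Q[:, 0] = c/s
    u = np.zeros(m)
    for n in range(1, max_n + 1):
        v = A @ Q[:, n - 1]
        for j in range(n):
            H[j, n - 1] = Q[:, j] @ v
            v -= H[j, n - 1]*Q[:, j]
        H[n, n - 1] = np.linalg.norm(v)
        w = s*np.eye(n + 1)[:, 0]                        # right-hand side s e_1
        y, *_ = np.linalg.lstsq(H[:n + 1, :n], w, rcond=None)
        u = Q[:, :n] @ y                                 # approximation Q_n y in C(Q_n)
        if np.linalg.norm(A @ u - c) <= tol:
            break
        Q[:, n] = v/H[n, n - 1]
    return u

A = np.diag([2., 3., 4., 5.]) + 0.1*np.ones((4, 4))
c = np.ones(4)
print(np.linalg.norm(A @ gmres(A, c) - c))               # small residual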