Lecture 29: Gradient Descent Methods

Alternatives to exploiting problem structure by operator splitting are suggested by the least action principle. The action

$$S(q, \dot{q}) = \int_{t_0}^{t_1} L\left(t, q(t), \dot{q}(t)\right)\, dt,$$

is a functional over the phase space of the system, e.g., $S : \mathbb{R}^{2n} \to \mathbb{R}$ for a system composed of $n$ point masses. The least action principle states that the observed trajectory minimizes the action, hence it is to be expected that optimization algorithms that solve the problem

$$\min_{q, \dot{q}} S$$

would be of interest. This indeed is the case and leads to a class of methods that can exhibit remarkably fast convergence and more readily generalize to variable physical properties and arbitrary domain geometry.

1. Spatially dependent diffusivity

A first step in considering linear operators that still exhibit structure but are more complex than the constant-coefficient discretization of the Laplacian $\nabla^2$ is to consider spatially varying diffusivity, in which case the steady-state heat equation in domain $\Omega$ becomes

$$-\nabla \cdot (\alpha \nabla u) = f, \qquad (1)$$

again with Dirichlet boundary conditions $u = b$ on $\partial\Omega$. Maintaining simple domain geometry for now, the centered finite-difference discretization of (1) on $\Omega = [0,1] \times [0,1]$ with grid points $(x_i = ih,\ y_j = jh)$, $h = 1/(n+1)$, becomes

$$-\alpha_{i+1/2,j}\, u_{i+1,j} - \alpha_{i-1/2,j}\, u_{i-1,j} - \alpha_{i,j+1/2}\, u_{i,j+1} - \alpha_{i,j-1/2}\, u_{i,j-1} + 4\alpha_{i,j}\, u_{i,j} = c_{i,j}, \qquad (2)$$

where $\alpha_{i,j} = (\alpha_{i+1/2,j} + \alpha_{i-1/2,j} + \alpha_{i,j+1/2} + \alpha_{i,j-1/2})/4$ denotes a diffusivity average at $(i,j)$ and $\boldsymbol{c}$ contains the boundary conditions and forcing term as before, $\boldsymbol{c} = \boldsymbol{b} + h^2 \boldsymbol{f}$. The sparsity pattern is the same as in the constant-diffusivity case, but the system $\boldsymbol{A}\boldsymbol{u} = \boldsymbol{c}$ now has a system matrix with variable coefficients. The matrix $\boldsymbol{A}$ expresses a self-adjoint operator through a symmetric discretization, namely centered finite differences on a uniform grid. It can therefore be expected to be symmetric, $\boldsymbol{A} = [a_{k,r}] = \boldsymbol{A}^T = [a_{r,k}]$, as verified by considering row $k = (j-1)n + i$, which has non-zero components in columns $k$, $k \pm 1$, $k \pm n$. It is sufficient to verify symmetry for entries within the lower triangle of $\boldsymbol{A}$. The $(k, k-1)$ component is the coefficient of $u_{i-1,j}$ in (2), $a_{k,k-1} = -\alpha_{i-1/2,j}$. Symmetry of $\boldsymbol{A}$ would require $a_{k,k-1} = a_{k-1,k}$. The $a_{k-1,k}$ component arises from row $k-1$,

$$-\alpha_{i-1/2,j}\, u_{i,j} - \alpha_{i-3/2,j}\, u_{i-2,j} - \alpha_{i-1,j+1/2}\, u_{i-1,j+1} - \alpha_{i-1,j-1/2}\, u_{i-1,j-1} + 4\alpha_{i-1,j}\, u_{i-1,j} = c_{i-1,j}.$$

The diagonal element for row $k-1$ has indices $(i-1,j)$ and the $k$th column has indices $(i,j)$, for which $a_{k-1,k} = -\alpha_{i-1/2,j}$, indeed verifying $a_{k,k-1} = a_{k-1,k}$. Such opaque index manipulations can readily be avoided by the symmetry considerations stated above: a self-adjoint operator expressed through a symmetric discretization. The physical argument is even simpler. Diffusivity expresses how readily heat is transferred between two spatial positions of unequal temperature, and there is no reason for this material property to differ when considering the heat flux from point $P$ to point $Q$, $q_{PQ} = \alpha_{PQ}(u_P - u_Q)$, from that from point $Q$ to $P$, $q_{QP} = \alpha_{QP}(u_Q - u_P)$. Setting $q_{PQ} = -q_{QP}$ to account for the direction of heat flow leads to $\alpha_{PQ} = \alpha_{QP}$, and this material property is reflected in the symmetry of $\boldsymbol{A}$. Note that even though the operator $\nabla \cdot (\alpha \nabla)$ might be self-adjoint under appropriate boundary conditions, an unsymmetric discretization such as one-sided finite differences can lead to a non-symmetric system matrix $\boldsymbol{A}$.
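
The symmetry of the discretization (2) can also be verified numerically. Below is a minimal sketch, assuming NumPy/SciPy; the assembly routine `assemble` and the sample diffusivity field are illustrative choices, not notation from the text:

```python
import numpy as np
import scipy.sparse as sp

def assemble(n, alpha):
    """Assemble the system matrix of (2) on the n-by-n interior grid,
    h = 1/(n+1), with diffusivity sampled at the cell faces."""
    h = 1.0 / (n + 1)
    A = sp.lil_matrix((n * n, n * n))
    for j in range(1, n + 1):
        for i in range(1, n + 1):
            k = (j - 1) * n + (i - 1)            # 0-based row index
            aE = alpha((i + 0.5) * h, j * h)     # alpha_{i+1/2,j}
            aW = alpha((i - 0.5) * h, j * h)     # alpha_{i-1/2,j}
            aN = alpha(i * h, (j + 0.5) * h)     # alpha_{i,j+1/2}
            aS = alpha(i * h, (j - 0.5) * h)     # alpha_{i,j-1/2}
            A[k, k] = aE + aW + aN + aS          # = 4 alpha_{i,j}
            if i < n: A[k, k + 1] = -aE
            if i > 1: A[k, k - 1] = -aW
            if j < n: A[k, k + n] = -aN
            if j > 1: A[k, k - n] = -aS
    return A.tocsr()

alpha = lambda x, y: 1.0 + x + y                 # sample diffusivity field
A = assemble(16, alpha)
print(abs(A - A.T).max())                        # 0.0: A is symmetric
```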

The implications for iterative method convergence can again be surmised from the one-dimensional case with homogeneous boundary conditions, $\partial_x(\alpha(x)\, \partial_x u) = f$, $u(0) = u(1) = 0$. The convergence rate of an iterative method depends on the eigenvalues of the matrix $\boldsymbol{A}$ obtained by discretization of the operator $\partial_x(\alpha(x)\, \partial_x)$. The regular Sturm-Liouville eigenproblem $\partial_x(\alpha(x)\, \partial_x u) = \lambda u$, $u(0) = u(1) = 0$, is known to have a solution, albeit one difficult to obtain analytically. Replacing analytical estimates by a numerical experiment taking $\alpha(x) = 1 + cx$, Fig. 1 shows that convergence becomes marginally slower as the diffusivity gradient $c$ increases, though the main difficulty is the spectral radius $\rho(\boldsymbol{M}) \approx 1$ already present for constant diffusivity.

Figure 1. Spectral radius of the Jacobi iteration matrix $\boldsymbol{M} = \boldsymbol{I} - \boldsymbol{D}^{-1}\boldsymbol{A}$ for $\partial_x(\alpha(x)\, \partial_x u) = f$ with increasing diffusivity gradient, $\alpha = 1 + cx$.
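
The experiment behind Fig. 1 can be reproduced along the following lines; a minimal sketch assuming NumPy, with the helper name `jacobi_radius` chosen here for illustration:

```python
import numpy as np

def jacobi_radius(m, c):
    """Spectral radius of M = I - D^{-1} A for the 1D discretization of
    d/dx(alpha(x) du/dx) = f with alpha = 1 + c x on m interior points."""
    h = 1.0 / (m + 1)
    x = np.arange(1, m + 1) * h
    aE = 1.0 + c * (x + h / 2)                   # alpha_{i+1/2}
    aW = 1.0 + c * (x - h / 2)                   # alpha_{i-1/2}
    A = np.diag(aE + aW) - np.diag(aE[:-1], 1) - np.diag(aW[1:], -1)
    M = np.eye(m) - A / (aE + aW)[:, None]       # M = I - D^{-1} A
    return np.max(np.abs(np.linalg.eigvals(M)))

for c in [0, 1, 5, 10]:
    print(c, jacobi_radius(64, c))               # rho(M) stays near 1 for all c
```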

2. Steepest descent

The heat equation can be obtained as the stationary solution, $\delta\Phi = 0$, of an optimization problem for the functional

$$\Phi(u, \nabla u) = -\int_\Omega \left[ \frac{1}{2}\, \alpha\, (\nabla u) \cdot (\nabla u) + u f \right] d\boldsymbol{x}, \qquad (3)$$

among all functions $u$ that satisfy the boundary condition $u = b$ on $\partial\Omega$. The above can be understood as the generalization of the one-dimensional case

$$\Phi(u, u') = -\int_0^1 \left( \frac{1}{2}\, \alpha\, u' u' + u f \right) dx.$$

The stationarity condition for $\Phi$ is

$$\delta\Phi = -\int_0^1 \left( \alpha u'\, \delta u' + f\, \delta u \right) dx = -\int_0^1 \left( \alpha u'\, \frac{d}{dx}\delta u + f\, \delta u \right) dx = -\left[ \alpha u'\, \delta u \right]_{x=0}^{x=1} + \int_0^1 \left( \frac{d}{dx}(\alpha u') - f \right) \delta u\, dx = 0.$$

Since all admissible $u$ must satisfy the boundary conditions, the perturbations are null at the endpoints, $\delta u(0) = \delta u(1) = 0$, and stationarity for arbitrary perturbations $\delta u$ implies that

$$\frac{d}{dx}(\alpha u') - f = 0,$$

the one-dimensional variable diffusivity heat equation.

How can the above observations guide algorithm construction? The key point is that the discrete problem should also be expressible as an optimization problem for $\Phi : \mathbb{R}^m \to \mathbb{R}$,

$$\Phi(\boldsymbol{u}) = \frac{1}{2}\, \boldsymbol{u}^T \boldsymbol{A} \boldsymbol{u} - \boldsymbol{u}^T \boldsymbol{c} = \frac{1}{2} \sum_{j=1}^m \sum_{k=1}^m u_j a_{jk} u_k - \sum_{j=1}^m u_j c_j,$$

with $\boldsymbol{A} = [a_{jk}]$. The discrete stationarity condition is $\nabla_{\boldsymbol{u}} \Phi = 0$, leading to

$$\frac{\partial \Phi}{\partial u_l} = \frac{1}{2} \sum_{j=1}^m \sum_{k=1}^m \left( \delta_{lj} a_{jk} u_k + u_j a_{jk} \delta_{lk} \right) - \sum_{j=1}^m \delta_{lj} c_j.$$

Using the Kronecker delta properties $\delta_{ll} = 1$, $\delta_{lj} = 0$ for $l \neq j$ gives

$$\frac{\partial \Phi}{\partial u_l} = \frac{1}{2} \sum_{k=1}^m a_{lk} u_k + \frac{1}{2} \sum_{j=1}^m u_j a_{jl} - c_l,$$

which for symmetric $\boldsymbol{A}$ leads to

$$\frac{\partial \Phi}{\partial u_l} = \sum_{j=1}^m a_{lj} u_j - c_l = 0 \;\Rightarrow\; \boldsymbol{A}\boldsymbol{u} = \boldsymbol{c}. \qquad (4)$$
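
Identity (4) is easy to sanity-check numerically by comparing $\boldsymbol{A}\boldsymbol{u} - \boldsymbol{c}$ against a finite-difference approximation of $\nabla\Phi$; a short sketch assuming NumPy, with a randomly generated symmetric positive definite matrix standing in for the discretized operator:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 6
B = rng.standard_normal((m, m))
A = B @ B.T + m * np.eye(m)                 # symmetric positive definite test matrix
c = rng.standard_normal(m)
u = rng.standard_normal(m)

Phi = lambda v: 0.5 * v @ A @ v - v @ c     # discrete functional

# central finite-difference approximation of the gradient of Phi
eps = 1e-6
g = np.zeros(m)
for l in range(m):
    e = np.zeros(m)
    e[l] = eps
    g[l] = (Phi(u + e) - Phi(u - e)) / (2 * eps)

print(np.max(np.abs(g - (A @ u - c))))      # ~1e-9: grad Phi = A u - c
```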

Symmetric discretization of the self-adjoint operator $\nabla \cdot (\alpha \nabla u)$ produces a symmetric matrix that is unitarily diagonalizable, $\boldsymbol{A} = \boldsymbol{Q} \boldsymbol{\Lambda} \boldsymbol{Q}^T$, and, as seen previously, has strictly positive eigenvalues. Hence stationary points of $\Phi(\boldsymbol{u})$ are minima and the solution to $\boldsymbol{A}\boldsymbol{u} = \boldsymbol{c}$ can be sought by minimizing $\Phi(\boldsymbol{u})$.

Equation (4) states that the gradient of $\Phi$ is opposite to the direction of the residual, $\nabla \Phi = \boldsymbol{A}\boldsymbol{u} - \boldsymbol{c} = -\boldsymbol{r}$. Since the gradient is the direction of fastest increase of $\Phi$, travel in the opposite direction will decrease $\Phi$, leading to an update

$$\boldsymbol{u}_{k+1} = \boldsymbol{u}_k + \beta_k \boldsymbol{r}_k, \qquad (5)$$

of the current approximation $\boldsymbol{u}_k$. The correction direction is also referred to as a search direction for the optimization procedure. In the residual correction formulation

$$\boldsymbol{r}_k = \boldsymbol{c} - \boldsymbol{A}\boldsymbol{u}_k, \quad \boldsymbol{e}_k = \boldsymbol{B}\boldsymbol{r}_k, \quad \boldsymbol{u}_{k+1} = \boldsymbol{u}_k + \boldsymbol{e}_k,$$

steepest descent corresponds to the choice $\boldsymbol{B} = \beta_k \boldsymbol{I}$. The remaining question is to determine how far to travel along the search direction $-\nabla\Phi(\boldsymbol{u}_k) = \boldsymbol{r}_k$. As $\beta$ increases, the local gradient direction changes. Steepest descent proceeds along the $\boldsymbol{r}_k$ direction until further decrease is no longer possible, that is, when the new gradient direction is orthogonal to the previous one:

$$\boldsymbol{r}_k^T \boldsymbol{r}_{k+1} = 0 \;\Rightarrow\; \boldsymbol{r}_k^T \left( \boldsymbol{c} - \boldsymbol{A}\boldsymbol{u}_{k+1} \right) = \boldsymbol{r}_k^T \left[ \boldsymbol{c} - \boldsymbol{A}\left( \boldsymbol{u}_k + \beta_k \boldsymbol{r}_k \right) \right] = \boldsymbol{r}_k^T \left( \boldsymbol{r}_k - \beta_k \boldsymbol{A} \boldsymbol{r}_k \right) = 0 \;\Rightarrow\; \beta_k = \frac{\boldsymbol{r}_k^T \boldsymbol{r}_k}{\boldsymbol{r}_k^T \boldsymbol{A} \boldsymbol{r}_k}.$$
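
Collecting (5) and the optimal step length gives the full iteration. A minimal sketch, assuming NumPy and a symmetric positive definite `A`:

```python
import numpy as np

def steepest_descent(A, c, maxiter=500, tol=1e-8):
    """Solve A u = c by update (5) with the optimal step length beta_k."""
    u = np.zeros_like(c)
    r = c - A @ u                     # initial residual
    for _ in range(maxiter):
        Ar = A @ r
        beta = (r @ r) / (r @ Ar)     # beta_k = r^T r / (r^T A r)
        u = u + beta * r              # travel along the residual direction
        r = r - beta * Ar             # r_{k+1} = c - A u_{k+1}
        if np.linalg.norm(r) < tol:
            break
    return u
```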

The convergence rate is given by the spectral radius of $\boldsymbol{M} = \boldsymbol{I} - \boldsymbol{B}\boldsymbol{A}$, which becomes

$$\boldsymbol{M} = \boldsymbol{I} - \beta_k \boldsymbol{A} = \boldsymbol{I} - \frac{\boldsymbol{r}_k^T \boldsymbol{r}_k}{\boldsymbol{r}_k^T \boldsymbol{A} \boldsymbol{r}_k}\, \boldsymbol{A}.$$

Recall that the one-dimensional, constant-diffusivity heat equation had eigenvalues of $\boldsymbol{A}$

$$\nu_l = 4 \sin^2\left( \frac{l \pi h}{2} \right), \quad l = 1, 2, \dots, m.$$

The eigenvalues of the gradient descent iteration matrix are therefore

$$\lambda_l = 1 - \beta_k \nu_l, \quad l = 1, 2, \dots, m.$$

Since $\beta_k$ is the inverse of a Rayleigh quotient, if the residual is in the direction of eigenvector $l$, then $\beta_k = 1/\nu_l$ and $\lambda_l = 0$, suggesting the possibility of fast convergence. However, the eigenvalues $\nu_l$ of $\boldsymbol{A}$ are spread throughout the interval $(0, 4)$, such that the residual components in other eigendirections are not significantly reduced. The typical behavior of gradient descent is rapid decrease of the residual in the first few iterations followed by slow convergence to the solution. Consider the problem

$$-u_{xx} = \pi^2 \sum_{k=1}^K k^2 \sin(k \pi x), \quad u(0) = u(1) = 0,$$

with solution

$$u(x) = \sum_{k=1}^K \sin(k \pi x).$$

The behavior of gradient descent is shown in Fig. 2. A good approximation of the solution shape is obtained after only a few gradient descent iterations, but convergence thereafter is slow.

Figure 2. Convergence of gradient descent. Blue: exact solution. Orange, green, red: iterates after 4, 40, 400 iterations.
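
The experiment of Fig. 2 can be set up along these lines, reusing the `steepest_descent` sketch above; the grid size and the value of `K` are illustrative choices:

```python
import numpy as np

m, K = 127, 4
h = 1.0 / (m + 1)
x = np.arange(1, m + 1) * h
ks = np.arange(1, K + 1)

# tridiagonal stencil of -u'' (scaled by h^2), homogeneous Dirichlet BCs
A = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
f = np.pi**2 * (ks**2 * np.sin(np.pi * np.outer(x, ks))).sum(axis=1)
c = h**2 * f
u_exact = np.sin(np.pi * np.outer(x, ks)).sum(axis=1)

for iters in [4, 40, 400]:
    u = steepest_descent(A, c, maxiter=iters, tol=0.0)
    print(iters, np.max(np.abs(u - u_exact)))   # error stalls after early gains
```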

3. Conjugate gradient

Steepest descent is characterized by a correction in the direction of the residual (5). Enforcing $\boldsymbol{r}_k^T \boldsymbol{r}_{k+1} = 0$ leads to orthogonality of both successive residuals and successive correction directions. A more insightful interpretation of (3) is to recognize the role of the scalar products

$$(f, g) = \int_\Omega \alpha\, (\nabla f) \cdot (\nabla g)\, d\boldsymbol{x}, \qquad (\boldsymbol{u}, \boldsymbol{v}) = \boldsymbol{u}^T \boldsymbol{A} \boldsymbol{v},$$

in the continuum and discrete cases, respectively. Similarly to how vectors that satisfy $\boldsymbol{u}^T \boldsymbol{v} = 0$ are said to be orthogonal, those that satisfy $\boldsymbol{u}^T \boldsymbol{A} \boldsymbol{v} = 0$ are said to be $\boldsymbol{A}$-conjugate. Gradient descent minimizes $\Phi$ along each search direction, with orthogonality of successive residuals measured in the Euclidean scalar product. However, the variational formulation suggests that a more appropriate norm is the $\boldsymbol{A}$-norm

$$\| \boldsymbol{e}_k \|_{\boldsymbol{A}} = \left( \boldsymbol{e}_k^T \boldsymbol{A} \boldsymbol{e}_k \right)^{1/2}.$$

This leads to a modification of the search directions $\boldsymbol{p}_k$, which are no longer taken in the direction of the residual and mutually orthogonal, but rather $\boldsymbol{A}$-conjugate:

$$\boldsymbol{p}_{k+1}^T \boldsymbol{A} \boldsymbol{p}_k = 0.$$

Algorithm Conjugate gradient

$\boldsymbol{x}_0 = \boldsymbol{0}, \quad \boldsymbol{r}_0 = \boldsymbol{c}, \quad \boldsymbol{p}_0 = \boldsymbol{r}_0$

for $k = 1 : \text{MaxIter}$

$\qquad \beta_k = \boldsymbol{r}_{k-1}^T \boldsymbol{r}_{k-1} \,/\, (\boldsymbol{p}_{k-1}^T \boldsymbol{A} \boldsymbol{p}_{k-1})$

$\qquad \boldsymbol{x}_k = \boldsymbol{x}_{k-1} + \beta_k \boldsymbol{p}_{k-1}$

$\qquad \boldsymbol{r}_k = \boldsymbol{r}_{k-1} - \beta_k \boldsymbol{A} \boldsymbol{p}_{k-1}$

$\qquad \gamma_k = \boldsymbol{r}_k^T \boldsymbol{r}_k \,/\, (\boldsymbol{r}_{k-1}^T \boldsymbol{r}_{k-1})$

$\qquad \boldsymbol{p}_k = \boldsymbol{r}_k + \gamma_k \boldsymbol{p}_{k-1}$

end
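
A direct transcription of the algorithm into Python; a sketch assuming NumPy, with illustrative names. Note how the $\gamma_k$-weighted update keeps successive search directions $\boldsymbol{A}$-conjugate:

```python
import numpy as np

def conjugate_gradient(A, c, maxiter=200, tol=1e-10):
    """Solve A x = c for symmetric positive definite A."""
    x = np.zeros_like(c)
    r = c.copy()                      # r_0 = c since x_0 = 0
    p = r.copy()                      # p_0 = r_0
    rho = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        beta = rho / (p @ Ap)         # step length beta_k
        x = x + beta * p
        r = r - beta * Ap
        rho_new = r @ r
        if np.sqrt(rho_new) < tol:
            break
        gamma = rho_new / rho         # gamma_k
        p = r + gamma * p             # new direction, A-conjugate to the old
        rho = rho_new
    return x
```

On the test problem of Fig. 2, this routine drives the residual to near machine precision in far fewer iterations than steepest descent, reflecting the minimization of the $\boldsymbol{A}$-norm of the error over a growing set of $\boldsymbol{A}$-conjugate directions.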