Alternatives to exploiting problem structure by operator splitting are suggested by the least action principle. The action $S = \int_{t_0}^{t_1} L\, \mathrm{d}t$ is a functional over the phase space of the system, e.g., with $L = T - V$ for a system composed of point masses. The least action principle states that the observed trajectory minimizes the action, hence it is to be expected that optimization algorithms that solve such minimization problems would be of interest. This indeed is the case, and leads to a class of methods that can exhibit remarkably fast convergence and that generalize more readily to variable physical properties and arbitrary domain geometry.
A first step in considering linear operators that still exhibit structure but are more complex than the constant-coefficient discretization of the Laplacian is to consider spatially-varying diffusivity $k(\boldsymbol{x}) > 0$, in which case the steady-state heat equation in domain $\Omega$ becomes

$$ -\nabla \cdot \left( k \nabla u \right) = f, \tag{1} $$

again with Dirichlet boundary conditions $u = u_\Gamma$ on $\partial\Omega$. Maintaining simple domain geometry for now, the centered finite-difference discretization of (1) on $\Omega = [0,1]$ with $m$ interior grid points, $x_i = ih$, $h = 1/(m+1)$, becomes

$$ -k_{i-1/2}\, u_{i-1} + \left( k_{i-1/2} + k_{i+1/2} \right) u_i - k_{i+1/2}\, u_{i+1} = b_i, \quad i = 1, \dots, m, \tag{2} $$
where $k_{i \pm 1/2} = \frac{1}{2}(k_i + k_{i \pm 1})$ denotes a diffusivity average at $x_{i \pm 1/2}$ and $\boldsymbol{b}$ contains the boundary conditions and forcing term as before, $b_i = h^2 f_i$. The sparsity pattern is the same as in the constant diffusivity case, but the system $A \boldsymbol{u} = \boldsymbol{b}$ has a system matrix with variable coefficients. The matrix expresses a self-adjoint operator through a symmetric discretization, namely centered finite differences on a uniform grid. It can therefore be expected to be symmetric, $A = A^T$, as verified by considering row $i$, which has non-zero components in columns $i-1$, $i$, $i+1$. It is sufficient to verify symmetry for entries within the lower triangle of $A$. The component $a_{i,i-1}$ is the coefficient of $u_{i-1}$ in (2), $a_{i,i-1} = -k_{i-1/2}$. Symmetry of $A$ would require $a_{i-1,i} = a_{i,i-1}$. The component $a_{i-1,i}$ arises from row $i-1$ of (2).
The superdiagonal element of row $i-1$ has indices $(i-1, i)$, and is the coefficient of $u_i$ in row $i-1$ of (2), $a_{i-1,i} = -k_{(i-1)+1/2} = -k_{i-1/2}$, indeed verifying $a_{i-1,i} = a_{i,i-1}$. Such opaque index manipulations can readily be avoided by the symmetry considerations stated above: a self-adjoint operator expressed through a symmetric discretization. The physical argument is even simpler. Diffusivity expresses how readily heat is transferred between two spatial positions of unequal temperature, and there is no reason for this material property to differ when considering the heat flux from point $x_i$ to point $x_j$, from that from point $x_j$ to $x_i$. Setting $q_{ij} = -q_{ji}$ to account for the direction of heat flow leads to $k_{ij} = k_{ji}$, and this material property is reflected in symmetry of $A$. Note that even though the operator might be self-adjoint under appropriate boundary conditions, an unsymmetric discretization such as one-sided finite differences can lead to a non-symmetric system matrix $A \neq A^T$.
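These index checks can also be carried out numerically. The following is a minimal sketch (not from the text), assuming homogeneous Dirichlet conditions and the illustrative diffusivity $k(x) = 1 + x$:

```python
import numpy as np

def assemble(m, k, f):
    """Assemble system (2): tridiagonal A u = b for -(k u')' = f, u(0)=u(1)=0."""
    h = 1.0 / (m + 1)
    x = np.linspace(0.0, 1.0, m + 2)           # grid including boundary points
    kh = 0.5 * (k(x[:-1]) + k(x[1:]))          # diffusivity averages k_{i+1/2}
    A = np.zeros((m, m))
    for i in range(m):
        A[i, i] = kh[i] + kh[i + 1]            # coefficient of u_i
        if i > 0:
            A[i, i - 1] = -kh[i]               # coefficient of u_{i-1}
        if i < m - 1:
            A[i, i + 1] = -kh[i + 1]           # coefficient of u_{i+1}
    return A, h**2 * f(x[1:-1])                # h^2 absorbed into b, as in (2)

A, b = assemble(16, lambda x: 1.0 + x, lambda x: np.ones_like(x))
print(np.allclose(A, A.T))                     # True: discretization is symmetric
```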
The implications for iterative method convergence can again be surmised from the one-dimensional case with homogeneous boundary conditions, $u(0) = u(1) = 0$. The convergence rate for an iterative method depends on the eigenvalues of the matrix $A$ obtained by discretization of the operator $\mathcal{L}u = -(k u')'$. The regular Sturm-Liouville eigenproblem

$$ -(k u')' = \lambda u, \quad u(0) = u(1) = 0, $$

is known to have a solution, albeit one difficult to obtain analytically. Replacing analytical estimates by a numerical experiment taking a linearly varying diffusivity, Fig. 1 shows that convergence becomes marginally slower as the diffusivity gradient increases, though the main difficulty is the spectral radius close to one already present for constant diffusivity.
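The experiment of Fig. 1 is not reproduced here, but a sketch in the same spirit, reusing `assemble` from above with the illustrative choice $k(x) = 1 + \beta x$, shows how the extreme eigenvalues vary with the diffusivity gradient:

```python
# Extreme eigenvalues of A as the diffusivity gradient beta increases
# (k(x) = 1 + beta*x is an illustrative choice, not from the text).
for beta in [0.0, 1.0, 4.0]:
    A_b, _ = assemble(64, lambda x: 1.0 + beta * x, lambda x: np.ones_like(x))
    lam = np.linalg.eigvalsh(A_b)              # real eigenvalues since A = A^T
    print(f"beta={beta}: lam_min={lam[0]:.4f}, lam_max={lam[-1]:.2f}, "
          f"cond={lam[-1] / lam[0]:.1f}")
```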
The heat equation can be obtained as the stationary solution, $\delta I = 0$, to an optimization problem for the functional

$$ I[u] = \int_\Omega \left( \frac{1}{2} k\, \nabla u \cdot \nabla u - f u \right) \mathrm{d}\boldsymbol{x}, \tag{3} $$
among all functions that satisfy the boundary condition $u = u_\Gamma$ on $\partial\Omega$. The above can be understood as the generalization of the one-dimensional case

$$ I[u] = \int_0^1 \left[ \frac{1}{2} k(x) \left( u'(x) \right)^2 - f(x)\, u(x) \right] \mathrm{d}x. $$
The stationarity condition for perturbations $u + \varepsilon w$ around a stationary point $u$ is

$$ \left. \frac{\mathrm{d}}{\mathrm{d}\varepsilon} I[u + \varepsilon w] \right|_{\varepsilon = 0} = \int_0^1 \left( k u' w' - f w \right) \mathrm{d}x = 0. $$

Since all candidate functions $u + \varepsilon w$ must satisfy the boundary conditions, the perturbations are null at the endpoints, $w(0) = w(1) = 0$, and integration by parts gives

$$ \int_0^1 \left[ (k u')' + f \right] w \, \mathrm{d}x = 0. $$

Stationarity for arbitrary perturbations $w$ then implies that

$$ -(k u')' = f, $$
the one-dimensional variable diffusivity heat equation.
How can the above observations guide algorithm construction? The key point is that the discrete problem $A\boldsymbol{u} = \boldsymbol{b}$ should also be expressible as an optimization problem for

$$ I(\boldsymbol{u}) = \frac{1}{2}\, \boldsymbol{u}^T A\, \boldsymbol{u} - \boldsymbol{u}^T \boldsymbol{b} = \frac{1}{2} \sum_{i,j=1}^m u_i\, a_{ij}\, u_j - \sum_{i=1}^m u_i\, b_i, $$

with $\boldsymbol{u} \in \mathbb{R}^m$. The discrete stationarity condition is $\partial I / \partial u_k = 0$ for $k = 1, \dots, m$, leading to

$$ \frac{\partial I}{\partial u_k} = \frac{1}{2} \sum_{i,j=1}^m \left( \delta_{ik}\, a_{ij}\, u_j + u_i\, a_{ij}\, \delta_{jk} \right) - \sum_{i=1}^m \delta_{ik}\, b_i. $$

Using the Kronecker delta properties $\partial u_i / \partial u_k = \delta_{ik}$, $\delta_{ik} = 1$ for $i = k$ ($\delta_{ik} = 0$ otherwise), gives

$$ \frac{\partial I}{\partial u_k} = \frac{1}{2} \left( \sum_{j=1}^m a_{kj}\, u_j + \sum_{i=1}^m u_i\, a_{ik} \right) - b_k, $$

which for symmetric $A = A^T$ leads to

$$ \nabla I(\boldsymbol{u}) = A \boldsymbol{u} - \boldsymbol{b} = \boldsymbol{0}. \tag{4} $$
Symmetric discretization of the self-adjoint operator produces a symmetric matrix $A = A^T$ that is unitarily diagonalizable, $A = Q \Lambda Q^T$ with $Q^T Q = I$, and, as seen previously, with strictly positive eigenvalues. Hence stationary points of $I$ are minima, and the solution to $A\boldsymbol{u} = \boldsymbol{b}$ can be sought by minimizing $I(\boldsymbol{u})$.
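The identity (4) can be spot-checked by comparing $A\boldsymbol{u} - \boldsymbol{b}$ against centered-difference approximations of $\partial I / \partial u_k$; a sketch using the system assembled above:

```python
# Numerical check of (4): gradient of the quadratic functional equals A u - b.
rng = np.random.default_rng(0)
u = rng.standard_normal(b.size)
I = lambda v: 0.5 * v @ A @ v - v @ b          # discrete functional I(u)
g = A @ u - b                                   # gradient claimed by (4)
eps = 1e-6
for k in range(3):                              # spot-check a few components
    e = np.zeros_like(u); e[k] = eps
    print((I(u + e) - I(u - e)) / (2 * eps), g[k])
```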
Equation (4) states that the gradient of $I$ is opposite the direction of the residual, $\nabla I(\boldsymbol{u}) = -(\boldsymbol{b} - A\boldsymbol{u}) = -\boldsymbol{r}$. Since the gradient is the direction of fastest increase of $I$, travel in the opposite direction will decrease $I$, leading to an update

$$ \boldsymbol{u}^{(n+1)} = \boldsymbol{u}^{(n)} + \alpha_n\, \boldsymbol{r}^{(n)} \tag{5} $$

of the current approximation $\boldsymbol{u}^{(n)}$. The correction direction is also referred to as a search direction for the optimization procedure. In the residual correction formulation

$$ \boldsymbol{u}^{(n+1)} = \boldsymbol{u}^{(n)} + \alpha_n\, \boldsymbol{d}^{(n)}, $$
steepest descent corresponds to the choice $\boldsymbol{d}^{(n)} = \boldsymbol{r}^{(n)}$. The remaining question is to determine how far to travel along the search direction. As $\alpha_n$ increases, the local gradient direction changes. Steepest descent proceeds along the direction until further decrease is no longer possible, that is, when the new gradient direction is orthogonal to the previous one,

$$ \left( \boldsymbol{r}^{(n+1)} \right)^T \boldsymbol{r}^{(n)} = 0 \;\Rightarrow\; \alpha_n = \frac{ \left( \boldsymbol{r}^{(n)} \right)^T \boldsymbol{r}^{(n)} }{ \left( \boldsymbol{r}^{(n)} \right)^T A\, \boldsymbol{r}^{(n)} }. $$
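In code, the update (5) with this optimal step might read as follows (a sketch, with `res_hist` recording residual norms for later inspection):

```python
def steepest_descent(A, b, u, nsteps):
    """Iterate (5) with the locally optimal step length alpha_n."""
    res_hist = []
    for _ in range(nsteps):
        r = b - A @ u                           # residual = -gradient of I
        alpha = (r @ r) / (r @ (A @ r))         # step from orthogonality condition
        u = u + alpha * r
        res_hist.append(np.linalg.norm(r))
    return u, res_hist
```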
The convergence rate is given by the spectral radius of the iteration matrix $B = I - \alpha A$, through which (5) becomes

$$ \boldsymbol{u}^{(n+1)} = \left( I - \alpha_n A \right) \boldsymbol{u}^{(n)} + \alpha_n\, \boldsymbol{b}. $$

Recall that the one-dimensional, constant diffusivity heat equation had eigenvalues of

$$ \lambda_j = 2 - 2 \cos(j \pi h) = 4 \sin^2\!\left( \frac{j \pi h}{2} \right), \quad j = 1, \dots, m. $$

Eigenvalues of the gradient descent iteration matrix are therefore

$$ \mu_j = 1 - \alpha \lambda_j = 1 - 4 \alpha \sin^2\!\left( \frac{j \pi h}{2} \right). $$
Since $\alpha_n$ is the inverse of a Rayleigh quotient,

$$ \alpha_n = \left[ \frac{ \left( \boldsymbol{r}^{(n)} \right)^T A\, \boldsymbol{r}^{(n)} }{ \left( \boldsymbol{r}^{(n)} \right)^T \boldsymbol{r}^{(n)} } \right]^{-1}, $$

if the residual is in the direction of eigenvector $\boldsymbol{q}_j$ then $\alpha_n = 1/\lambda_j$ and $\mu_j = 0$, suggesting the possibility of fast convergence. However, the eigenvalues of $A$ are distributed throughout the interval $[0, 4]$, such that the residual components along the other eigendirections are not significantly reduced. The typical behavior of gradient descent is a rapid decrease of the residual in the first few iterations, followed by slow convergence to the solution. Consider a test problem of the form (2) with known analytical solution. The behavior of gradient descent is shown in Fig. 2: a good approximation of the solution shape is obtained after only a few gradient descent iterations, but convergence thereafter is slow.
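This behavior can be reproduced with the `steepest_descent` sketch above applied to the system assembled earlier:

```python
# Residual history: rapid decrease over the first iterations, then stagnation.
u_sd, res = steepest_descent(A, b, np.zeros_like(b), 200)
print(res[0], res[5], res[50], res[-1])
```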
Steepest descent is characterized by a correction in the direction of the residual (5). Enforcing $\left( \boldsymbol{r}^{(n+1)} \right)^T \boldsymbol{r}^{(n)} = 0$ leads to orthogonality of both successive residuals and successive correction directions. A more insightful interpretation of (3) is to recognize the role of the scalar products

$$ (u, v)_k = \int_\Omega k\, \nabla u \cdot \nabla v \, \mathrm{d}\boldsymbol{x}, \qquad (\boldsymbol{u}, \boldsymbol{v})_A = \boldsymbol{u}^T A\, \boldsymbol{v}, $$

in the continuum and discrete cases, respectively. Similarly to how vectors that satisfy $\boldsymbol{u}^T \boldsymbol{v} = 0$ are said to be orthogonal, those that satisfy $\boldsymbol{u}^T A\, \boldsymbol{v} = 0$ are said to be $A$-conjugate. Gradient descent minimizes the 2-norm of the error at each iteration. However, the variational formulation suggests that a more appropriate norm is the $A$-norm

$$ \| \boldsymbol{u} \|_A = \left( \boldsymbol{u}^T A\, \boldsymbol{u} \right)^{1/2}. $$
This leads to a modification of the search directions $\boldsymbol{d}^{(n)}$, which are no longer taken in the direction of the residual and mutually orthogonal, but rather $A$-conjugate,

$$ \left( \boldsymbol{d}^{(n+1)} \right)^T A\, \boldsymbol{d}^{(n)} = 0. $$
Algorithm (Conjugate gradient)

$\boldsymbol{r}^{(0)} = \boldsymbol{b} - A \boldsymbol{u}^{(0)}$; $\boldsymbol{d}^{(0)} = \boldsymbol{r}^{(0)}$
for $n = 0, 1, 2, \dots$
  $\alpha_n = \left( \boldsymbol{r}^{(n)} \right)^T \boldsymbol{r}^{(n)} \big/ \left( \boldsymbol{d}^{(n)} \right)^T A\, \boldsymbol{d}^{(n)}$
  $\boldsymbol{u}^{(n+1)} = \boldsymbol{u}^{(n)} + \alpha_n\, \boldsymbol{d}^{(n)}$
  $\boldsymbol{r}^{(n+1)} = \boldsymbol{r}^{(n)} - \alpha_n A\, \boldsymbol{d}^{(n)}$
  $\beta_n = \left( \boldsymbol{r}^{(n+1)} \right)^T \boldsymbol{r}^{(n+1)} \big/ \left( \boldsymbol{r}^{(n)} \right)^T \boldsymbol{r}^{(n)}$
  $\boldsymbol{d}^{(n+1)} = \boldsymbol{r}^{(n+1)} + \beta_n\, \boldsymbol{d}^{(n)}$
end
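A direct transcription into Python, again as a sketch for the dense test system assembled earlier (a production implementation would use sparse matrix-vector products):

```python
def conjugate_gradient(A, b, u, tol=1e-10):
    """Conjugate gradient for symmetric positive definite A."""
    r = b - A @ u
    d = r.copy()                                # first direction: the residual
    rr = r @ r
    for _ in range(b.size):                     # at most m steps in exact arithmetic
        Ad = A @ d
        alpha = rr / (d @ Ad)                   # optimal step along d
        u = u + alpha * d
        r = r - alpha * Ad                      # updated residual
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        beta = rr_new / rr
        d = r + beta * d                        # next direction is A-conjugate
        rr = rr_new
    return u

u_cg = conjugate_gradient(A, b, np.zeros_like(b))
print(np.linalg.norm(b - A @ u_cg))             # small: system solved
```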