Lecture 28: Linear Operator Splitting

A first example of problem-specific algorithms is afforded by consideration of the steady-state diffusion problem

-∇2u=-Δu=f, (1)

where u(𝒙),đ’™âˆˆÎ©âŠ†â„d can be understood to denote the temperature in an infinitesimal volume positioned at 𝒙 in d-dimensional space, and f is a local rate of heat generation. The above arises from setting the time derivative zero in ∂tu-â–”u=f. Though often stated in this thermal language, the Poisson equation (1) is generally valid for unresolved transport by Brownian motion. The mathematical concept of an “infinitesimal volume” is interpreted as setting some length scale ℓ much smaller than the length scale L characterizing the size of the domain Ω. As an example, for heat conduction in a column of water of length L=1m the length scale of a quasi-infinitesimal volume can be considered as, say, ℓ=1ÎŒm. There is no physical significance to the mathematical limit process ℓ→0 due to the discrete structure of matter, and for all practical purposes setting ℓ=1ÎŒm is an acceptable threshold to delimit a phenomenon of interest to behavior that can be neglected. In this case the phenomenon of interest is the average transport of thermal energy in volumes of size greater than đ’Ș(ℓd). The detailed Brownian motion of the water molecules that occurs on length scales s≈1nmâ‰Șℓ can be neglected and is said to be unresolved. The only observable effect of this motion is that temperature gradients lead to a heat flux as described by 𝒇(u)=-α∇u (Fourier's law). The same equation (1) arises in epidemiology when ℓ=10m, an average separation between an infected and susceptible individual, L=10km, the size of a large city, and u is reinterpreted as the percentage of infected individuals in the population. Again, the detailed Brownian steps of size s≈10cmâ‰Șℓ taken by individuals can be neglected.

1.Poisson equation discretization

Understanding the underlying unresolved Brownian motion is useful in constructing numerical solutions. For f=0, (1) becomes ∇2u=0,which states that there is no net heat flux in an infinitesimal volume, div⋅(α∇u)=0, colloquially: “what flows in on one side goes out on the other”. A function that satisfies Laplace's equation ∇2u=0 is said to be harmonic. For d=1, the ordinary differential equation ∂x2u=0 is obtained with solution u(x)=a+bx, and boundary conditions u0 at x=0 and u1 at x=1 gives u(x)=u0+(u1-u0)x. A temperature difference u0≠u1 at the boundaries induces diffusive fluxes that lead to the mean value u(1/2)=(u0+u1)/2 at the midpoint. An analogous statement is made for d>1 starting from Green's formula on ball B with spherical boundary S of radius R

B(uΔv-vΔu)dω=S=∂B(u∂v∂n-v∂u∂n)dσ,

for scalar functions u,v. For d=3 and v=1/r the mean value theorem

u(𝒙)=14πR2S=∂Bu(𝒚)dσ(𝒚), (2)

is obtained, which states that the value of a harmonic function is the average of the values on a surrounding sphere. For d=2 the analogous statement is

u(𝒂)=12πr02πu(𝒂+rcosξ𝒆x+rsinξ𝒆y)dΞ. (3)

Midpoint quadrature of (3) over four subintervals gives

u0=u1+u2+u3+u44. (4)

Apply the above on a grid covering some arbitrary domain Ω with boundary ÎŁ=∂Ω to obtain

ui,j=14(ui,j-1+ui-1,j+ui+1,j+ui,j+1). (5)

Complications arise for general boundaries ÎŁ, but for a square domain Ω=[0,1]×[0,1] grid points (xi=ih,yj=jh) align with the boundary, h=1/(n+1). Instead of two indices, one for each spatial direction, organize the grid values u through a single index k=(j-1)n+i, with uk denoting the value at the interior points (xi,yj). The vector of interior grid values 𝒖=[ u1 
 um ] has m=n2 components, and the mean value theorem leads to the linear system

∘

𝑹𝒖=𝒃, (6)

where 𝑹 has a regular sparsity pattern induced by the uniform spacing of the grid

𝑹=[ 4 -1 0 
 0 -1 0 
 
 0 -1 4 -1 0 
 0 -1 0 
 0 ⋱ ⋱ ⋱ ⋱ ].

The vector 𝒃 contains the boundary values, for example b1=u1,0+u0,1.

Figure 1. Left: Mean value theorem leads to u0=(u1+u2+u3+u4)/4. Middle: Five-point finite difference stencil for Laplace operator. Right: Structure of 𝑹 matrix resulting from discretization of Laplace operator.

The system (6) was obtained by discretization of the mean-value integral (3). The same linear system is also obtained by discretization of the differential equation

-∇2u=-uxx-uyy=0,

where indices denote differentiation. The minus sign arises from compatibility with the unsteady form of the heat equation ∂tu-∇2u=f. A centered finite difference approximation of the derivatives on the uniform grid leads to

uxx(xi,yj)≅ui-1,j-2ui,j+ui+1,jh2,uyy(xi,yj)≅ui,j-1-2ui,j+ui,j+1h2,

and (6) is recovered. For the Poisson equation f≠0 the right hand side changes to

𝑹𝒖=𝒃+h2𝒇=𝒄 (7)

with fk=f(xi,yj). It is often the case that the same discrete system arises from both the differential and the integral formulation of a conservation law on a uniform grid.

2.Matrix splitting iteration

The above discussion of the underlying physics of the Poisson equation productively guides construction of numerical solution procedures. Solving the linear system (7) by general factorizations such as 𝑹=đ‘łđ‘Œ or 𝑹=𝑾đ‘č is costly in terms of memory usage since the sparisty pattern is lost. For the uniform grid and square domain considered above the matrix 𝑹 need not be explicitly stored at all since ai,i=4, and ai,j=-1 when |i-j|=1 or |i-j|=n. The discrete mean value theorem (5) suggests that some approximation 𝒖(l) can be improved by the iteration

ui,j(l+1)=14(ui,j-1(l)+ui-1,j(l)+ui+1,j(l)+ui,j+1(l)). (8)

The above is known as Jacobi iteration, and can be stated in matrix form by expressing 𝑹 as

𝑹=𝑳+đ‘«+đ‘Œ, (9)

with 𝑳,đ‘Œ containing non-zero components of 𝑹 below, above the diagonal, and đ‘« containing the diagonal of 𝑹. In contrast to the multiplicative decompositions considered up to now, the QR, LU, SVD, eigen or Schur decompositions, the decomposition (9) is now additive. Note that in (9) 𝑳,đ‘Œ are strictly lower, upper diagonal matrices with zeros on the diagonal in contrast to the notation for the standard LU factorization algorithm. Recall that the utility of matrix multiplication was associated with the representation of linear mapping composition. Additive decompositions such as (9) generally are useful when separating different aspects of a physical process, and are a simple example of operator splitting. For the discrete Poisson system Jacobi iteration can be expressed as

𝑹𝒖=𝒄⇒(𝑳+đ‘«+đ‘Œ)𝒖=𝒄⇒𝒖(l+1)=đ‘«-1(𝒄-𝑳𝒖(l)-đ‘Œđ’–(l)). (10)

Several variants of the idea can be pursued. The matrix splitting (9) is useful in theoretical convergence analysis, but implementations directly use (8) within loops over the (i,j) indices. Updated values can be immediately utilized leading to either of the following iterations

𝒖(l+1)=đ‘«-1(𝒄-𝑳𝒖(l+1)-đ‘Œđ’–(l)),𝒖(l+1)=đ‘«-1(𝒄-𝑳𝒖(l)-đ‘Œđ’–(l+1)), (11)

depending on loop organization. These are known as Gauss-Seidel iterations. Convergence might accelerated by extrapolation,

𝒖(l+1)=𝒖(l)+ω[đ‘«-1(𝒄-𝑳𝒖(l)-đ‘Œđ’–(l+1))-𝒖(l)]=(1-ω)𝒖(l)+Ï‰đ‘«-1(𝒄-𝑳𝒖(l)-đ‘Œđ’–(l+1)), (12)

where the new iteration is continued by factor ω in the direction of the Gauss-Seidel update. When ω>1 this is known as successive over-relaxation (SOR) and goes further in the Gauss-Seidel direction. Choosing 0<ω<1 leads to successive under-relaxation.

3.Convergence analysis

Turning now from algorithm construction to analysis of its behavior, simplify notation by letting 𝒖k denote the current iterate. The previous notation 𝒖(l) was convenient since individual components were referenced as in ui,j(l), but convergence analysis is determined by the properties of the operator splitting and not of the current iterate. Introduce the error đœčk at iteration k as the difference between the exact solution 𝒖 and the current iterate 𝒖k, đœčk=𝒖-𝒖k. Also introduce the residual 𝒓k=𝒄-𝑹𝒖k=𝑹đœčk, and the correction to the current iterate 𝒆k=𝒖k+1-𝒖k

The above methods can be formulated as a residual correction algorithm through the steps:

  1. residual computation, 𝒓k=𝒄-𝑹𝒖k

  2. correction computation, 𝒆k=đ‘©đ’“k

  3. approximation update, 𝒖k+1=𝒖k+𝒆k

When đ‘©=𝑹-1 the exact solution is recovered in one step

𝒆k=𝑹-1(𝒄-𝑹𝒖k)=𝒖-𝒖k⇒𝒖k+1=𝒖k+𝒆k=𝒖.

Iterative methods use some approximation of the (unknown) inverse, đ‘©â‰…đ‘š-1. Jacobi iteration uses đ‘©=đ‘«-1 since

𝒖k+1=𝒖k+đ‘«-1[𝒄-(𝑳+đ‘«+đ‘Œ)𝒖k]=đ‘«-1[𝒄-(𝑳+đ‘Œ)𝒖k],

recovering (10). Table 1 shows several common choices for đ‘©. Two key aspects govern the choice of the inverse approximant:

  1. Computational efficiency stated as a requirement that each iteration cost either đ’Ș(m) or đ’Ș(mlogm) operations;

  2. Capturing the essential aspects of 𝑹.

Jacobi đ‘«-1 Forward Gauss-Seidel (đ‘«+𝑳)-1
Weighted Jacobi Ï‰đ‘«-1 Backward Gauss-Seidel (đ‘«+đ‘Œ)-1
SOR ω(đ‘«+Ï‰đ‘ł)-1 Symmetric Gauss-Seidel (đ‘«+đ‘Œ)-1đ‘«(đ‘«+𝑳)-1
Symmetric SOR ω(2-ω)(đ‘«+Ï‰đ‘Œ)-1đ‘«(đ‘«+Ï‰đ‘ł) Richardson Ï‰đ‘°

Table 1. Common iterative methods

The iteration converges to the solution if ||đœčk||→0 for increasing k. The error at iteration k+1 is expressed as

đœčk+1=𝒖-𝒖k+1=𝒖-(𝒖k+đ‘©đ’“k)=đœčk-đ‘©đ‘š(𝒖-𝒖k)=(𝑰-đ‘©đ‘š)đœčk=(𝑰-đ‘©đ‘š)k+1đœč0. (13)

The repeated matrix multiplication indicates that the eigenstructure of the iteration matrix 𝑮=𝑰-đ‘©đ‘š determines iteration convergence. Indeed the above is simply power iteration for 𝑮 and can be expected to converge as

đœčk→Ό1kc1𝒒1,

with (ÎŒ1,𝒒1) the eigenpair that corresponds to the largest eigenvalue in absolute value, 𝑮𝒒1=ÎŒ1𝒒1, known as the spectral radius of 𝑮, denoted by ρ(𝑮). Clearly, the above iterations will exhibit linear order of convergence when ρ(𝑮)<1. The rate of convergence sk at iteration k is estimated by the Rayleigh quotient

sk=đœčkT𝑮đœčkđœčkTđœčk=đœčkTđœčk+1đœčkTđœčk,

and is monitored in implementations of iterative methods, and determined by the eigenvalues λ=1-ÎŒ of đ‘©đ‘š.

The eigenstructure of đ‘©đ‘š is difficult for arbitrary matrices 𝑹, but can be carried out when 𝑹 has special structure induced by known physical phenomena. The relation between analytical and numerical formulations plays an essential role in convergence analysis. The diffusion equation (5) leads to a symmetric matrix 𝑹, 𝑹=𝑹T due to two aspects:

  1. the chosen discretization is symmetric using centered finite differences;

  2. the operator itself exhibits symmetry.

Insight into the above two aspects is most readily gained from the one-dimensional case -uxx=f with homogeneous Dirichlet boundary conditions u(0)=u(1)=0. The linear system 𝑹𝒖=𝒇 obtained from the centered finite difference discretization

-ui-1+2ui-ui+1=h2fi,i=1,
,m,h=1/(m+1),u0=um+1=0,

has a symmetric tridiagonal system matrix 𝑹=diag([ -1 2 -1 ]). The 𝑹T=𝑹 symmetry can be expressed through scalar products in a way that generalizes to differential operators. Recall that a real-valued scalar product (𝒖,𝒗) must satisfy symmetry (𝒗,𝒖)=(𝒖,𝒗). For 𝒖,𝒗∈ℝm the standard inner product (𝒖,𝒗)=𝒖T𝒗 has this property. Consider the action of the operator 𝑹∈ℝm×m on the two terms. If (𝑹𝒖,𝒗)=(𝒖,𝑹𝒗) the operator 𝑹 is said to be symmetric. For the inner product

(𝑹𝒖,𝒗)=(𝑹𝒖)T𝒗=𝒖T𝑹T𝒗,(𝒖,𝑹𝒗)=𝒖T𝑹𝒗,

and the two expressions are equal if 𝑹=𝑹T. The same approach extends to the d=1 diffusion operator L=-∂xx using the scalar product

(f,g)=01f(x)g(x)dx.

Applying integration by parts

(∂xxf,g)=01f''(x)g(x)dx=[f'g]x=0x=1-01f'(x)g'(x)dx=[f'g]x=0x=1-[fg']x=0x=1+01f(x)g''(x)dx=b+(f,∂xxg).

For homogeneous boundary conditions f(0)=f(1)=g(0)=g(1)=0, the symmetry condition is satisfied. Note that symmetry involves both the operator anId the boundary conditions of the problem. For d=2 the scalar product

(u,v)=Ωu(x,y)v(x,y)dxdy,

is defined on the unit square Ω=[0,1]×[0,1], and for homogeneous Dirichlet boundary conditions two applications of Green's formula leads to (∇2u,v)=(u,∇2v), and the Laplace operator is symmetric.

For d=1, xj=jh, h=1/(m+1), the eigenvalues Μkof 𝑹=diag([ 1-2 1 ]) are inferred from those of the -∂xx operator with homogeneous boundary conditions at x=0, x=1

-∂xxsin(Îșπx)=(Îșπ)2sin(Îșπx),Îș∈ℕ.

Positing that an eigenvector 𝒒 of 𝑹 is the discretization of the continuum eigenfunction leads to

qj=sin(Îșπxj)=sin(Îșπjh)=sin(jÎșπm+1),

hypothesis that is verified by the calculation of component j of 𝑹𝒒

(𝑹𝒒)j=-sin[(j-1)Îșπm+1]+2sin[jÎșπm+1]-sin[(j+1)Îșπm+1]=2[1-cos(Îșπm+1)]sin(jÎșπm+1),

thereby obtaining the eigenvalue

vÎș=2[1-cos(Îșπm+1)]=4sin2[Îșπh2],

which recovers the analytical eigenvalue in the h→0 limit

limh→0vÎșh2=(Îșπ)2.

For Jacobi đ‘©=đ‘«-1, so the eigenvalues of đ‘©đ‘š are

λÎș=2sin2[Îșπh2],Îș=1,2,
,m.

The eigenvalues of 𝑮=𝑰-đ‘©đ‘š are therefore

ÎŒÎș=1-2sin2[Îșπh2]=cos(Îșπh)

Replacing h=1/(m+1) the largest eigenvalue is obtained for Îș=1

ÎŒmax=ÎŒ1=cos(πm+1).

For large m, ÎŒ1âȘ…1, and slow convergence is expected as verified in the numerical experiment from Fig. 2.

∘

Figure 2. Convergence of Jacobi iteration. Blue: exact solution. Orange, green, red: iterates after 1000, 2000, 3000 iterations.

Jacobi iteration and its variants have limited practical utility by comparison to other iterative procedures due to slow convergence, but the concept of operator spliting has wide applicability. A more consequential example of operator splitting is to consider the advection diffusion equation

qt+𝒗⋅∇q=αΔq,

which can be interpreted as stating that the time evolution of q is due to the effect of a diffusion operator 𝒜=αΔ and an advection operator ℬ=-𝒗⋅∇

qt=(𝒜+ℬ)q.

Suppose both operators are discretized leading to matrices 𝑹,đ‘© and the discrete system

𝒒t=(𝑹+đ‘©)𝒒,

with initial condition 𝒒(𝒙,t=0)=𝒒0. Advancing the solution by a time step Δt can be written as

𝒒(t+Δt)=eΔt(𝑹+đ‘©)𝒒(t),

and can be separated into two stages

𝒒(l+1)=eΔt𝑹eΔtđ‘©đ’’(l).

The quantity 𝒒=eΔtđ‘©đ’’(l) captures advection effects and 𝒒(l+1)=eΔt𝑹𝒒 is the diffusion correction. Since advection and diffusion are markedly different physical effects describing resolved versus unresolved transport, it can be expected that matrices 𝑹,đ‘© have different properties that require specific solution procedures.

By contrast, the Jacobi iteration splitting 𝑹=𝑳+đ‘«+đ‘Œ does not separate physical effects and simply is suggested by the sparsity of 𝑹 and computational efficiency per iteration. For example, the forward Gauss-Seidel iteration with đ‘©=(đ‘«+𝑳)-1 leads to

𝒖k+1=𝒖k+(đ‘«+𝑳)-1[𝒄-(𝑳+đ‘«+đ‘Œ)𝒖k]=(đ‘«+𝑳)-1(𝒄-đ‘Œđ’–k)⇒(đ‘«+𝑳)𝒖k+1=𝒄-đ‘Œđ’–k. (14)

The matrix đ‘«+𝑳 is (non-strictly) lower-triangular and (14) is easily solved by forward substitution. The implementation is very simple to express while preserving two-dimensional indexing.

Algorithm (Componentwise forward Gauss-Seidel)

fori=1:mx

forj=1:my

u(i,j)=[c(i,j)+u(i+1,j)+u(i-1,j)+u(i,j+1)+u(i,j-1)]/4

end

end

In the above implementation a single memory space is used for 𝒖 with new values taking place of the old, leading to just four floating point additions and one multiplication per grid point. Since the algorithm essentially takes the current average of neighboring values, it is also known as a relaxation method, smoothing out spatial variations in 𝒖.

Though such simple implementation is desirable, the non-physical splitting 𝑹=𝑳+đ‘«+đ‘Œ and associated slow convergence usually outweighs ease of coding effort and suggests looking for alternative approaches. The only scenarios where such simply implemented iterations find practical use is in parallel execution and as a preliminary modification of the system prior to use of some other algorithm.