Lecture 24: Nonlinear Vector Operator Equations

1.Multivariate root-finding algorithms

Consider now nonlinear finite-dimensional mappings 𝒇:dd, and the root-finding problem

𝒇(𝒙)=𝟎, (1)

whose set of solutions generalize the linear mapping concept of a null space, N(𝑨)={𝒙|𝑨𝒙=𝟎.,𝑨d×d}. As in the scalar-valued case, algorithms are sought to construct an approximating sequence {𝒙k}k whose limit is a root of (1), by approximating 𝒇 with 𝒈k, and solving

𝒈k(𝒙)=0. (2)

Multivariate approximation is however considerably more complex than univariate approximation. For example, consider d=2, 𝒇:22, and the univariate monomial interpolants in Lagrange form

t𝒇(s,t)=i=0m𝒇(xi,t)lix(s),s𝒇(s,t)=j=0n𝒇(s,yj)ljy(t),

with

lix(s)=k=0m's-xkxi-xk,ljy(s)=l=0n't-ylyj-yl.

The operator t carries out interpolation at fixed t value of the data set 𝒟x={(xi,𝒇(xi,t)),i=0,,m}. Similarly, operator s carries out interpolation at fixed s value of the data set 𝒟y={(yj,𝒇(s,yj)),j=0,,n}. Multivariate interpolation of the data set

𝒟={(xi,yj,𝒇(xi,yj)),i=0,,m,j=0,,n},

can be carried out through multiple operator composition procedures.

Operator product

Define =ts as

𝒇(s,t)=(ts)𝒇(s,t)=t(s𝒇(s,t))=t(i=0m𝒇(xi,t)lix(s))=i=0mj=0n𝒇(xi,yj)lix(s)ljy(t).

Operator Boolean sum

Define =ts as =t+s-ts

𝒇(s,t)=i=0m𝒇(xi,t)lix(s)+j=0n𝒇(s,yj)ljy(t)-i=0mj=0n𝒇(xi,yj)lix(s)ljy(t).

Bivariate (d=2) root-finding algorithms already exemplify the additional complexity in constructing root finding algorithms. The goal is to determine a new approximation (xk,yk) from the prior approximants

(x0,y0),,(xk-2,yk-2),(xk-1,yk-1).

Whereas in the scalar case two prior points allowed construction of a linear approximant, the two points in data

𝒟={(xk-2,yk-2),(xk-1,yk-1)}

are insufficient to determine

𝒇=i=k-2k-1j=k-2k-1𝒇(xi,yj)lix(s)ljy(t),

which requires four data points. Various approaches to exploit the additional degrees of freedom are available, of which the class of quasi-Newton methods finds widespread applicability.

2.Quasi-Newton methods

A linear multivariate approximant in d dimensions requires 2d data. A Hermite interpolant based upon function and partial derivative values can be constructed, but it is more direct to truncate the multivariate Taylor series

𝒇(𝒙)=𝒇(𝒙k)+𝒇𝒙(𝒙k)(𝒙-𝒙k)+,

where

𝑱=𝒇𝒙=[ f1x1 f1x2 f1xd f2x1 f2x2 f2xd fdx1 fdx2 fdxd ]=𝒇,

is the Jacobian matrix of 𝒇. Setting 𝒇(𝒙k+1)=𝟎, as the condition for the next iterate leads to the update

𝑱(𝒙k)(𝒙k+1-𝒙k)=-𝒇(𝒙k),

a linear system that is solved at each iteration. Computation of the multiple partial derivatives arising in the Jacobian might not be possible or too expensive, hence approximations are sought 𝑩k𝑱(𝒙k), similar in principle to the approximation of a tangent by a secant. In such quasi-Newton methods, a secant condition on 𝑩k is stated as

𝑩k(𝒙k-𝒙k-1)=𝒇(𝒙k)-𝒇(𝒙k-1),

and corresponds to a truncation of the Taylor series expansion around 𝒙k-1. The above secant condition is not sufficient by itself to determine 𝑩k, hence additional considerations can be imposed.

  1. Recalling that the scalar Newton method for finding roots of f(x)=0 converges in a region where f',f''>0, imposing analogous behavior for 𝑩k suggests itself. This is typically done by requiring 𝑩k to be symmetric positive definite.

  2. Assuming convergence of the approximating sequence {𝒙k}k to a root, 𝑩k+1 should be close to the previous approximation suggesting the condtion

    min𝑩k+1||𝑩k+1-𝑩k||.

    Various algorithms arise from a particular choice of norm and procedure to apply (2).

One widely used quasi-Newton method, arising from a rank-two update at each iteration to maintain positive definiteness, is the Broyden-Fletcher-Goldfard-Shanno update

𝑩k+1=𝑩k+𝒚k𝒚kT𝒚kT𝒔k-𝑩k𝒔k𝒔kT𝑩kT𝒔kT𝑩k𝒔k,

where the updates are determined by

  1. Solving 𝑩k𝒑k=-[𝒇(𝒙k)-𝒇(𝒙k-1)] to find a search direction 𝒑k ;

  2. Finding the distance along the search direction by αk=argmin||𝒇(𝒙k+αk𝒑k)||2 ;

  3. Updating the approximation 𝒔k=αk𝒑k, 𝒙k+1=𝒙k+𝒔k

  4. Computing 𝒚k=𝒇(𝒙k+1)-𝒇(𝒙k).

3.Gradient descent methods