Lecture 23: Nonlinear scalar operator equations

1.Root-finding algorithms

The null space of a linear mapping represented through matrix 𝑨m×n is defined as N(𝑨)={𝒙|𝑨𝒙=𝟎.}, the set of all points that have a null image through the mapping. The null space is a vector subspace of the domain of the linear mapping. A first step in the study of nonlinear mappings is to consider the generalization of the concept of a null set, starting with the simplest case,

f(x)=0 (1)

where f:, fCp(), p0, i.e., f has p continuous derivatives. It is assumed that a closed form analytical solution is not available, and algorithms are sought to construct an approximating sequence {xn}n whose limit is a root of (1). The general approach is to replace (1) with

gn(x)=0, (2)

where gn is some approximation of f, and xn the root of (2) can be easily determined.

1.1.First-degree polynomial approximants

Secant method.
Consider g(x)=ax+b (a linear function, but not a linear mapping for b0), an approximant of f based upon data {(xn-2,fn-2=f(xn-2)),(xn-1,fn-1=f(xn-1))}, given in Newton interpolant form by

gn(x)=fn-2+fn-1-fn-2xn-1-xn-2(x-xn-2). (3)

The solution of (3) is

xn=xn-2-fn-2fn-1-fn-2(xn-1-xn-2)=xn-2fn-1-xn-1fn-2fn-1-fn-2,

an iteration known as the secant method. The error in root approximation is

en=xn-x=en-2-fn-2fn-1-fn-2(en-1-en-2),

and can be estimated by Taylor series expansions around the root x for which f(x)=0,

fn-k=f(xn-k)=f'(x)(xn-k-x)+12f''(x)(xn-k-x)2+=f'en-k+12f''en-k2+,

where derivatives f',f'' are assumed to be evaluated at x. In the result

en=en-2-f'en-2+12f''en-22+f'(en-1-en-2)+12f''(en-12-en-22)+(en-1-en-2)=en-2[1-f'+12f''en-2+f'+12f''(en-1+en-2)+],

assuming f'(x)0, (i.e., x is a simple root) gives

en=en-2[1-1+cen-2+1+c(en-1+en-2)+],c=12(f''/f').

For small errors, to first order the above can be written as

en=en-2[1-(1+cen-2)(1-c(en-1+en-2))]=cen-1en-2.

Assuming p-order convergence of en,

|en|A|en-1|p,

leads to

Ap+1|en-2|p2cA|en-2|p+1|en-2|p2-p-1cA-p.

Since c,A are finite while en0, the above asymptotic relation can only be satisfied if

p2-p-1=0p=1+521.62,

hence the secant method exhibits superlinear, but subquadratic convergence.

Newton-Raphson method.
A different linear approximant arises from the Hermite interpolant based on data

{(xn-1,fn-1=f(xn-1),fn-1'=f'(xn-1))},

which is given in Newton form as

gn(x)=fn-1+fn-1'(x-xn-1),

with root

xn=xn-1-fn-1fn-1', (4)

an iteration known as the Newton-Raphson method. The error is given by

en=xn-x=en-1-fn-1fn-1'. (5)

Taylor series exapnsion around the root gives for small en-1,

en=en-1-f'en-1+12f''en-12+f'+f''en-1+=en-1[1-1+cen-1+1+2cen-1+]en-1[1-(1+cen-1)(1-2cen-1)].

The resulting expression

encen-12=12f''f'en-12, (6)

states quadratic convergence for Newton's method. This faster convergence than the secant method requires however knowledge of the derivative, and the computational expense of evaluating it.

The above estimate assumes convergence of {xn}n, but this is not guaranteed in general. Newton's method requires an accurate initial approximation x0, within a neighborhood of the root in which f is increasing, f'>0, and convex, f''>0. Equivalently, since roots of f are also roots of -f, Newton's method converges when f',f''<0. In both cases (6) in the prior iteration states that en-1=xn-1-r>0, hence xn-1>r . Since f is increasing f(xn-1)>f(r)=0, hence (5) implies en<en-1. Thus the sequence {en}n is decreasing and bounded below by zero, hence limnen=0, and Newton's method converges.

1.2.Second-degree polynomial approximants

An immediate extension of the above approach is to increase the accuracy of the approximant by seeking a higher-degree polynomial interpolant. The expense of the resulting algorithm increases rapidly though, and in practice linear and quadratic approximants are the most widely used. Consider the Hermite interpolant based on data

{(xn-1,fn-1=f(xn-1),fn-1'=f'(xn-1),,fn-1''=f''(xn-1))},

given in Newton form as

gn(x)=fn-1+fn-1'(x-xn-1)+12fn-1''(x-xn-1)2=C+Bs+As2,

with roots

xn=xn-1+-fn-1'±(fn-1')2-2fn-1fn-1''fn-1''.

Tha above exhibits the difficulties arising in higher-order interpolants. The iteration requires evaluation of a square root, and checking for a positive discriminant.

Halley's method.
Algebraic manipulations can avoid the appearance of radicals in a root-finding iteration. As an example, Halley's method

xn=xn-1-2fn-1fn-1'2(fn-1')2-fn-1fn-1'',

exhibits cubic convergence.

2.Composite approximations

The secant iteration

xn=xn-2-fn-2fn-1-fn-2(xn-1-xn-2)=xn-2-fn-2,

in the limit of xn-2xn-1 recovers Newton's method

xn=xn-1-fn-1fn-1'.

This suggests seeking advantageous approximations of the derivative

xn=xn-1-fn-1,

based upon some step-size sequence {hn}. Since f(xn)0, the choice hn-1=f(xn-1) suggests itself, leading to Steffensen's method

xn=xn-1-fn-1=xn-1-fn-1gn-1,gn-1=-1.

Steffensen's method exhibits quadratic convergence, just like Newton's method, but does not require knowledge of the derivative. The higher order by comparison to the secant method is a direct result of the derivative approximation

f'(xn-1),

which, remarkably, utilizes a composite approximation

f(xn-1+f(xn-1))=(f(1+f))(xn-1).

Such composite techniques are a prominent feature of various nonlinear approximations such as a k-layer deep neural network 𝒇(𝒙)=(𝒍k𝒍k-1𝒍1)(𝒙).

3.Fixed-point iteration

The above iterative sequences have the form

xn=F(xn-1),

and the root is a fixed point of the iteration

x=F(x).

For example, in Newton's method

F(x)=x-f(x)f'(x),

and indeed at a root x=F(x). Characterization of mappings F that lead to convergent approximation sequences is of interest and leads to the following definition and theorem.

Definition. A function F:[a,b][a,b] is said to be a contractive mapping if x,y[a,b] there exists c(0,1) such that

|F(x)-F(y)|c|x-y|.

Theorem. (Contractive Mapping theorem). If F:[a,b][a,b] is a contractive mapping then F has a unique fixed point x[a,b], x=F(x).

The fixed point theorem is an entry point to the study of non-additive approximation sequences.

Example 1. The sequence

x1=p,x2=p+p,(p>0)

is expressed recursively as

xn+1=p+xn,

and has the limit

x=p+p+p+,

that is the fixed point of F,

x=F(x)=p+x=1+1+p2.

Over the interval [0,p+1], F is a contraction since

F'(x)=12p+x12p<1.

Example 2. The sequence

x1=1p,x2=1p+1p,(p>0)

is expressed recursively as

xn+1=1p+xn,

and has the limit

x=1p+1p+,

that is the fixed point of F,

x=F(x)=1p+x=-p+p2+12.

Over the interval [0,1], F is a contraction since

|F'(x)|=1(p+x)21p2<1.