MATH661

Lecture 24: Nonlinear Vector Operator Equations

1. Multivariate root-finding algorithms

Consider now nonlinear finite-dimensional mappings $𝒇 : ℝ^{d} \to ℝ^{d}$ , and the root-finding problem

𝒇 (𝒙) = 𝟎,

(1)

whose set of solutions generalizes the linear mapping concept of a null space, $N (𝑨) = {𝒙 | 𝑨 𝒙 = 𝟎}$ for $𝑨 \in ℂ^{d \times d}$. As in the scalar-valued case, algorithms are sought to construct an approximating sequence ${𝒙_{k}}_{k \in ℕ}$ whose limit is a root of (1), by approximating $𝒇$ with $𝒈_{k}$, and solving

𝒈_{k} (𝒙) = 𝟎 .

(2)

Multivariate approximation is, however, considerably more complex than univariate approximation. For example, consider $d = 2$, $𝒇 : ℝ^{2} \to ℝ^{2}$, and the univariate polynomial interpolants in Lagrange form

ℒ_{t} 𝒇 (s, t) = \sum_{i = 0}^{m} 𝒇 (x_{i}, t) l_{i}^{x} (s), ℒ_{s} 𝒇 (s, t) = \sum_{j = 0}^{n} 𝒇 (s, y_{j}) l_{j}^{y} (t),

with

l_{i}^{x} (s) = \prod_{k = 0, k \neq i}^{m} \frac{s - x_{k}}{x_{i} - x_{k}}, l_{j}^{y} (t) = \prod_{l = 0, l \neq j}^{n} \frac{t - y_{l}}{y_{j} - y_{l}} .

The operator $ℒ_{t}$ interpolates, at a fixed value of $t$, the data set $𝒟_{x} = {(x_{i}, 𝒇 (x_{i}, t)), i = 0, \dots, m}$. Similarly, the operator $ℒ_{s}$ interpolates, at a fixed value of $s$, the data set $𝒟_{y} = {(y_{j}, 𝒇 (s, y_{j})), j = 0, \dots, n}$. Multivariate interpolation of the data set

𝒟 = {(x_{i}, y_{j}, 𝒇 (x_{i}, y_{j})), i = 0, \dots, m, j = 0, \dots, n},

can be carried out through multiple operator composition procedures.

Operator product: Define $ℒ = ℒ_{t} \otimes ℒ_{s}$ as
$ℒ 𝒇 (s, t) = (ℒ_{t} ℒ_{s}) 𝒇 (s, t) = ℒ_{t} (ℒ_{s} 𝒇 (s, t)) = ℒ_{t} (\sum_{j = 0}^{n} 𝒇 (s, y_{j}) l_{j}^{y} (t)) = \sum_{i = 0}^{m} \sum_{j = 0}^{n} 𝒇 (x_{i}, y_{j}) l_{i}^{x} (s) l_{j}^{y} (t) .$
Operator Boolean sum: Define $ℒ = ℒ_{t} \oplus ℒ_{s}$ as $ℒ = ℒ_{t} + ℒ_{s} - ℒ_{t} ℒ_{s}$, such that
$ℒ 𝒇 (s, t) = \sum_{i = 0}^{m} 𝒇 (x_{i}, t) l_{i}^{x} (s) + \sum_{j = 0}^{n} 𝒇 (s, y_{j}) l_{j}^{y} (t) - \sum_{i = 0}^{m} \sum_{j = 0}^{n} 𝒇 (x_{i}, y_{j}) l_{i}^{x} (s) l_{j}^{y} (t) .$
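The two compositions can be compared numerically. The following is a minimal sketch, assuming a scalar test function (standing in for one component of $𝒇$) and small node sets chosen only for illustration; the function names `lagrange_basis`, `tensor_product`, and `boolean_sum` are hypothetical. It shows that the operator product needs only the grid values $𝒇 (x_{i}, y_{j})$, while the Boolean sum uses $𝒇$ along the grid lines $s = x_{i}$ and $t = y_{j}$.

```python
import numpy as np

def lagrange_basis(nodes, i, s):
    """Evaluate the i-th Lagrange basis polynomial of the given nodes at s."""
    value = 1.0
    for k, node in enumerate(nodes):
        if k != i:  # primed product: omit the k = i factor
            value *= (s - node) / (nodes[i] - node)
    return value

def tensor_product(f, xs, ys, s, t):
    """Operator product: uses only the grid values f(x_i, y_j)."""
    return sum(f(xi, yj) * lagrange_basis(xs, i, s) * lagrange_basis(ys, j, t)
               for i, xi in enumerate(xs) for j, yj in enumerate(ys))

def boolean_sum(f, xs, ys, s, t):
    """Boolean sum L_t + L_s - L_t L_s: uses f along the grid lines."""
    along_x = sum(f(xi, t) * lagrange_basis(xs, i, s) for i, xi in enumerate(xs))
    along_y = sum(f(s, yj) * lagrange_basis(ys, j, t) for j, yj in enumerate(ys))
    return along_x + along_y - tensor_product(f, xs, ys, s, t)

# Assumed test function and nodes (illustrative only).
f = lambda x, y: np.sin(x) * np.exp(y)
xs = np.array([0.0, 0.5, 1.0])   # x_0, ..., x_m with m = 2
ys = np.array([0.0, 1.0])        # y_0, ..., y_n with n = 1

# Error of each interpolant at an off-grid point.
s, t = 0.3, 0.4
err_product = abs(tensor_product(f, xs, ys, s, t) - f(s, t))
err_boolean = abs(boolean_sum(f, xs, ys, s, t) - f(s, t))
print(err_product, err_boolean)
```

The operator product reproduces $𝒇$ at the grid nodes, whereas the Boolean sum reproduces $𝒇$ along entire grid lines, at the price of requiring function values on those lines rather than at finitely many points.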

1.1. First-degree polynomial approximants

Secant method.

Bivariate ($d = 2$) root-finding already exemplifies the additional complexity in constructing root-finding algorithms. The goal is to determine a new approximation $(x_{k}, y_{k})$ from the prior approximants

(x_{0}, y_{0}), \dots, (x_{k - 2}, y_{k - 2}), (x_{k - 1}, y_{k - 1}) .

Whereas in the scalar case two prior points allowed construction of a linear approximant, the two points in data

𝒟 = {(x_{k - 2}, y_{k - 2}), (x_{k - 1}, y_{k - 1})}

are insufficient to determine

ℒ 𝒇 = \sum_{i = k - 2}^{k - 1} \sum_{j = k - 2}^{k - 1} 𝒇 (x_{i}, y_{j}) l_{i}^{x} (s) l_{j}^{y} (t),

which requires function values at the four points $(x_{i}, y_{j})$, $i, j \in {k - 2, k - 1}$. Various approaches to exploit the additional degrees of freedom are available, of which the class of quasi-Newton methods finds widespread applicability.

Newton, quasi-Newton methods.

A linear multivariate approximant in $d$ dimensions, constructed as the above operator product, requires $2^{d}$ data. A Hermite interpolant based upon function and partial derivative values can be constructed, but it is more direct to truncate the multivariate Taylor series

𝒇 (𝒙) = 𝒇 (𝒙_{k}) + \frac{\partial 𝒇}{\partial 𝒙} (𝒙_{k}) (𝒙 - 𝒙_{k}) + \dots,

where

𝑱 = \frac{\partial 𝒇}{\partial 𝒙} = \left[ \begin{array}{cccc} \frac{\partial f_{1}}{\partial x_{1}} & \frac{\partial f_{1}}{\partial x_{2}} & \dots & \frac{\partial f_{1}}{\partial x_{d}} \\ \frac{\partial f_{2}}{\partial x_{1}} & \frac{\partial f_{2}}{\partial x_{2}} & \dots & \frac{\partial f_{2}}{\partial x_{d}} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ \frac{\partial f_{d}}{\partial x_{1}} & \frac{\partial f_{d}}{\partial x_{2}} & \dots & \frac{\partial f_{d}}{\partial x_{d}} \end{array} \right] = \nabla 𝒇,

is the Jacobian matrix of $𝒇$. Setting the linear truncation to zero at the next iterate $𝒙_{k + 1}$, as an approximation of the condition $𝒇 (𝒙_{k + 1}) = 𝟎$, leads to the update

𝑱 (𝒙_{k}) (𝒙_{k + 1} - 𝒙_{k}) = - 𝒇 (𝒙_{k}),

a linear system that is solved at each iteration. Computation of the partial derivatives arising in the Jacobian might be impossible or too expensive, hence approximations $𝑩_{k} ≅ 𝑱 (𝒙_{k})$ are sought, similar in principle to the approximation of a tangent by a secant. In such quasi-Newton methods, a secant condition on $𝑩_{k}$ is stated as

𝑩_{k} (𝒙_{k} - 𝒙_{k - 1}) = 𝒇 (𝒙_{k}) - 𝒇 (𝒙_{k - 1}),

and corresponds to a truncation of the Taylor series expansion around $𝒙_{k - 1}$. The secant condition provides only $d$ equations for the $d^{2}$ components of $𝑩_{k}$, hence is not sufficient by itself to determine $𝑩_{k}$, and additional conditions are imposed.
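The Newton update, in which the exact Jacobian is used and a linear system is solved at each iteration, can be sketched as follows. This is a minimal illustration rather than a robust solver: the function name `newton`, the tolerance, and the test problem (intersection of the circle $x^{2} + y^{2} = 4$ with the hyperbola $x y = 1$) are assumptions made for the example.

```python
import numpy as np

def newton(f, jacobian, x0, tol=1e-12, max_iter=50):
    """Newton iteration: solve J(x_k) (x_{k+1} - x_k) = -f(x_k) at each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:      # stop once the residual is small
            break
        step = np.linalg.solve(jacobian(x), -fx)
        x = x + step
    return x

# Assumed test problem: x^2 + y^2 = 4 and x y = 1.
def f(x):
    return np.array([x[0]**2 + x[1]**2 - 4.0, x[0] * x[1] - 1.0])

def jacobian(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [x[1],       x[0]]])

root = newton(f, jacobian, [2.0, 0.5])
# Analytical root in the first quadrant with x > y, for comparison.
exact = np.array([(np.sqrt(6.0) + np.sqrt(2.0)) / 2.0,
                  (np.sqrt(6.0) - np.sqrt(2.0)) / 2.0])
print(root, np.linalg.norm(root - exact))
```

Each iteration requires all $d^{2}$ partial derivatives and one linear solve, which is the cost that quasi-Newton approximations $𝑩_{k}$ aim to reduce.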

Recalling that the scalar Newton method for finding roots of $f (x) = 0$ converges in a region where $f^{'}, f^{''} > 0$, imposing analogous behavior for $𝑩_{k}$ suggests itself. This is typically done by requiring $𝑩_{k}$ to be symmetric positive definite.
Assuming convergence of the approximating sequence ${𝒙_{k}}_{k \in ℕ}$ to a root, $𝑩_{k + 1}$ should be close to the previous approximation $𝑩_{k}$, suggesting the condition
$\min_{𝑩_{k + 1}} \| 𝑩_{k + 1} - 𝑩_{k} \| .$
Various algorithms arise from a particular choice of norm and procedure to apply (2).
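One well-known instance, not named in these notes, is Broyden's rank-one update, which is the minimizer of the change in the Frobenius norm subject to the secant condition. The sketch below is an illustration under stated assumptions: the function names, the full step $𝒔_{k} = 𝒑_{k}$ without line search, the initial matrix (the exact Jacobian at the starting point), and the test problem are all choices made for the example.

```python
import numpy as np

def broyden_update(B, s, y):
    """Least-change (Frobenius norm) update satisfying B_new @ s = y."""
    return B + np.outer(y - B @ s, s) / (s @ s)

def broyden_solve(f, x0, B0, tol=1e-10, max_iter=100):
    """Quasi-Newton iteration with full steps and Broyden's rank-one update."""
    x = np.asarray(x0, dtype=float)
    B = np.asarray(B0, dtype=float)
    fx = f(x)
    for _ in range(max_iter):
        if np.linalg.norm(fx) < tol:
            break
        s = np.linalg.solve(B, -fx)          # step from B_k s_k = -f(x_k)
        x_new = x + s
        f_new = f(x_new)
        B = broyden_update(B, s, f_new - fx)  # enforce the secant condition
        x, fx = x_new, f_new
    return x

# Assumed test problem: x^2 + y^2 = 4 and x y = 1.
def f(x):
    return np.array([x[0]**2 + x[1]**2 - 4.0, x[0] * x[1] - 1.0])

x0 = np.array([2.0, 0.5])
B0 = np.array([[4.0, 1.0], [0.5, 2.0]])      # Jacobian of f evaluated at x0
root = broyden_solve(f, x0, B0)
print(root, np.linalg.norm(f(root)))
```

After the first step no further derivative evaluations are needed; only function values enter the update. The rank-one update does not preserve symmetry or positive definiteness, which motivates rank-two alternatives.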

One widely used quasi-Newton method, arising from a rank-two update at each iteration to maintain positive definiteness, is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update

𝑩_{k + 1} = 𝑩_{k} + \frac{𝒚_{k} 𝒚_{k}^{T}}{𝒚_{k}^{T} 𝒔_{k}} - \frac{𝑩_{k} 𝒔_{k} 𝒔_{k}^{T} 𝑩_{k}^{T}}{𝒔_{k}^{T} 𝑩_{k} 𝒔_{k}},

where the updates are determined by

Solving $𝑩_{k} 𝒑_{k} = - 𝒇 (𝒙_{k})$ to find a search direction $𝒑_{k}$;
Finding the distance along the search direction by $α_{k} = \arg\min_{α} \| 𝒇 (𝒙_{k} + α 𝒑_{k}) \|_{2}$;
Updating the approximation $𝒔_{k} = α_{k} 𝒑_{k}$, $𝒙_{k + 1} = 𝒙_{k} + 𝒔_{k}$;
Computing $𝒚_{k} = 𝒇 (𝒙_{k + 1}) - 𝒇 (𝒙_{k})$.
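The steps above can be sketched in a few lines. This is a rough illustration under several assumptions, not a production algorithm: the search direction is taken from $𝑩_{k} 𝒑_{k} = - 𝒇 (𝒙_{k})$, the quasi-Newton analogue of the Newton update; the line search is replaced by a crude minimization over a fixed set of trial step lengths; $𝑩_{0}$ is the identity; and the update is skipped whenever $𝒚_{k}^{T} 𝒔_{k}$ is not safely positive. The test function is chosen to have a symmetric positive definite Jacobian, in line with the requirement placed on $𝑩_{k}$.

```python
import numpy as np

def bfgs_root(f, x0, tol=1e-10, max_iter=200):
    """Quasi-Newton root-finding with the BFGS update of B_k (sketch)."""
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                       # assumed initial approximation B_0 = I
    fx = f(x)
    alphas = np.linspace(0.05, 2.0, 40)      # assumed trial step lengths
    for _ in range(max_iter):
        if np.linalg.norm(fx) < tol:
            break
        # Step 1: search direction from B_k p_k = -f(x_k).
        p = np.linalg.solve(B, -fx)
        # Step 2: crude line search, minimizing ||f(x_k + alpha p_k)||_2 on a grid.
        residuals = [np.linalg.norm(f(x + a * p)) for a in alphas]
        alpha = alphas[int(np.argmin(residuals))]
        # Step 3: update the approximation.
        s = alpha * p
        x_new = x + s
        # Step 4: change in function values.
        f_new = f(x_new)
        y = f_new - fx
        # Rank-two update, skipped if the curvature y^T s is not positive.
        if y @ s > 1e-12 * np.linalg.norm(y) * np.linalg.norm(s):
            Bs = B @ s
            B = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
        x, fx = x_new, f_new
    return x

# Assumed test problem with symmetric positive definite Jacobian A + 3 diag(x^2).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

def f(x):
    return A @ x + x**3 - b

root = bfgs_root(f, np.zeros(2))
print(root, np.linalg.norm(f(root)))
```

For a general $𝒇$ whose Jacobian is not symmetric positive definite, the quantity $𝒚_{k}^{T} 𝒔_{k}$ can be negative, in which case the rank-two update no longer maintains positive definiteness; the guard in the sketch simply skips the update in that case.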