Lecture 25: Introduction to nonlinear approximation

All of the approximation techniques presented so far have been based upon linear approximation. For example, the polynomial interpolant

pm(t)=a0+a1t++amtm

of function f(t) based upon data set 𝒟={(xi,yi),i=0,1,,m} is linear in the unkown coefficients a0,,am. A topic of current research is the development of nonlinear approximation procedures, for instance an approximation of f: by g:

f(t)g(t,𝒂),

where g is nonlinear in the parameters 𝒂n. Whereas the fundamental theorem of linear algebra completely characterizes linear approximants, there is currently no complete theory of nonlinear approximants. This leads to the ubiquity of heuristic techniques such as the mimicking of biological neurons that leads to artificial neuron approximants. The moniker of “machine learning” has been associated with adoption of such techniques, though the field is perhaps more insightfully seen as a natural development of linear approximants to consideration of nonlinear approximants.

1.Historical analogue - operator calculus

The appearance of heuristic solution techniques in nonlinear approximation is typical of exploration of new mathematical fields. An instructive comparable development is the refinement of the formal rules of Heaviside operator calculus into a complete theory of distributions after some six decades of mathematical research.

1.1.Heavisde study of telgraphist equation

In late nineteenth century, telegrapher's equations, a system of linear PDEs for current I(x,t) and voltage V(x,t)

xV(x,t)=-LtI(x,t)-RI(x,t)
xI(x,t)=-CtC(x,t)-GV(x,t)

Heaviside avoided solution of the PDEs by reduction to an algebraic formulation historical formulation, e.g., for the ODE for y(t)

dydt+ay=b

Heaviside considered the associated algebraic problem for Y(s)

sY+aY=bY(s)=ba+sy(t)=-1[Y(s)]

1.2.Development of mathematical theory of operator calculus

Why should I refuse a good dinner simply because I don't understand the digestive processes involved? (Heaviside, ?)

Heaviside's formal framework (1890's) for solving ODEs was discounted since it lacked mathematical rigour.

2.Basic approximation theory

Consider function f:d, d1 assumed large, f of unknown form, difficult to compute for general input. Seek g:n, T:dn such that

||f-gT||<ε

for some ε>0.

2.1.Linear approximation example

Choose a basis set (Monomials, Exponentials, Wavelets) {ϕ1,ϕ2,} to approximation of L2() functions in Hibert space

gn(t)=j=1n(f,ϕj)ϕj=j=1ncjϕj

The approximation is convergent if

limn||f-gT||=0,

This assumes cj=(f,ϕj) rapidly decrease.

Theorem. (Parseval) The Fourier transform is unitary. For A,B:, square integrable, 2π-periodic with Fourier series

A(t)=n=-aneint,B(t)=n=-bneint,

n=-anbn=12π-ππA(t)B(t)dt.

Bessel inequality:

j=1n|(f,ϕj)|2||f||2.

Fourier coefficient decay: for fC(k-1)(), f(k-1) absolutely continuous,

|cn|min0jk||f(j)||1|n|j.

In practice: coefficients decay as

Fourier coefficients for analytic functions decay faster than any monomial power cn=ο(n-p),p, a property known as exponential convergence.

Denote such approximations by , and they are linear

(αf+βg)=α(f)+β(g)

2.2.Non-Linear approximation example

Choose a basis set (Monomials, Exponentials, Wavelets) {ϕ1,ϕ2,} to approximation of L2() functions in Hibert space

gn(t)=j=1ncjϕj

Let Φn={φk(1),φk(2),,φk(n)} such

(f,φk(1))(f,φk(2))(f,φk(n)).

Choose cj=(f,φk(j)), and

gn(t)=j=1ncjϕj.

Denote such approximations by 𝒢, and they are non-linear.

3.Nonlinear approximation by composition

Consider function f:d, d1 assumed large, f of unknown form, difficult to compute for general input. Seek g:n, T:dn such that

||f-gT||<ε

for some ε>0.

What questions do you ask?

Does Texist?

f,ε,T,suchthat||f-gT||<ε

Can arbitrary ε be achieved?

Can we construct T?

How big is n?