All of the approximation techniques presented so far have been based upon linear approximation. For example, the polynomial interpolant
of function based upon data set is linear in the unkown coefficients . A topic of current research is the development of nonlinear approximation procedures, for instance an approximation of by
where is nonlinear in the parameters . Whereas the fundamental theorem of linear algebra completely characterizes linear approximants, there is currently no complete theory of nonlinear approximants. This leads to the ubiquity of heuristic techniques such as the mimicking of biological neurons that leads to artificial neuron approximants. The moniker of “machine learning” has been associated with adoption of such techniques, though the field is perhaps more insightfully seen as a natural development of linear approximants to consideration of nonlinear approximants.
The appearance of heuristic solution techniques in nonlinear approximation is typical of exploration of new mathematical fields. An instructive comparable development is the refinement of the formal rules of Heaviside operator calculus into a complete theory of distributions after some six decades of mathematical research.
In late nineteenth century, telegrapher's equations, a system of linear PDEs for current and voltage
Heaviside avoided solution of the PDEs by reduction to an algebraic formulation historical formulation, e.g., for the ODE for
Heaviside considered the associated algebraic problem for
Why should I refuse a good dinner simply because I don't understand the digestive processes involved? (Heaviside, ?)
Heaviside's formal framework (1890's) for solving ODEs was discounted since it lacked mathematical rigour.
Russian mathematician 1920's established first results (Vladimirov)
Theory of Distributions (Schwartz, 1950s)
Consider function , assumed large, of unknown form, difficult to compute for general input. Seek , such that
for some .
Choose a basis set (Monomials, Exponentials, Wavelets) to approximation of functions in Hibert space
The approximation is convergent if
This assumes rapidly decrease.
Bessel inequality:
Fourier coefficient decay: for , absolutely continuous,
In practice: coefficients decay as
for functions with discontinuities on a set of Lebesgue measure 0;
for functions with discontinuous first derivative on a set of Lebesgue measure 0;
for functions with discontinuous second derivative on a set of Lebesgue measure 0.
Fourier coefficients for analytic functions decay faster than any monomial power , a property known as exponential convergence.
Denote such approximations by , and they are linear
Choose a basis set (Monomials, Exponentials, Wavelets) to approximation of functions in Hibert space
Let such
Choose =, and
Denote such approximations by , and they are non-linear.
Consider function , assumed large, of unknown form, difficult to compute for general input. Seek , such that
for some .
What questions do you ask?
By what procedure?
with simple modifications of identity (ReLU)
At what cost?