Lecture 14: Interpolation

The linear algebra concepts arising from the study of linear mappings between vector spaces, 𝒇: U → V, 𝒇(α𝒖+β𝒗)=α𝒇(𝒖)+β𝒇(𝒗), are widely applicable to nonlinear functions also. The study of nonlinear approximation starts with the simplest case of approximation of a function with scalar values and arguments, f: ℝ → ℝ, through additive corrections.

1.Function spaces

An immediate application of the linear algebra framework is to construct vector spaces of real functions ℱ=(F,+,·), with F={f | f: ℝ → ℝ}, and the addition and scaling operations induced from ℝ,

(αf+βg)(t)=αf(t)+βg(t),  ∀f,g∈F, ∀α,β∈ℝ.

Comparing with the real vector space ℛ_m=(ℝ^m,+,·), in which the analogous operation is α𝒖+β𝒗, 𝒖,𝒗∈ℝ^m, α,β∈ℝ, or componentwise

(α𝒖+β𝒗)_i=α u_i+β v_i,  i=1,2,…,m,

the key difference that arises is the dimension of the set of vectors. Finite-dimensional vectors within ℝ^m can be regarded as functions defined on a finite set, u:{1,2,…,m}→ℝ, with u(i)=u_i. The elements of F are however functions defined on ℝ, a set with cardinality 𝔠=2^{ℵ_0}, with ℵ_0 the cardinality of the naturals ℕ. This leads to a review of the concept of a basis for the infinite-dimensional case.

1.1.Infinite dimensional basis set

In the finite-dimensional case, 𝑩∈ℝ^{m×m} constituted a basis if any vector 𝒚∈ℝ^m could be expressed uniquely as a linear combination of the column vectors of 𝑩,

∀𝒚∈ℝ^m, ∃! 𝒄∈ℝ^m such that 𝒚=𝑩𝒄=c_1 𝒃_1+⋯+c_m 𝒃_m.

While the above finite sum is well defined, there is no consistent definition of an infinite sum of vectors. As a simple example, in the vector space of real numbers ℛ_1=(ℝ,+,·), any finite sum of reals is well defined, for instance

S_n=Σ_{k=0}^{n} (-1)^k = { 1 if n even, 0 if n odd },

but the limit lim_{n→∞} S_n cannot be determined. This leads to the necessity of using only finite linear combinations to span a vector space 𝒱=(V,S,+,·). First, define linear independence of an infinite (possibly uncountable) set of vectors ℬ={v_γ | γ∈Γ, v_γ∈V}, where Γ is some indexing set.

Definition. The vector set ℬ={v_γ | γ∈Γ, v_γ∈V} is linearly independent if, for any finite selection v_{γ_1},…,v_{γ_n}∈ℬ, the only n scalars x_1,…,x_n∈S that satisfy

x_1 v_{γ_1}+⋯+x_n v_{γ_n}=0,  γ_i∈Γ, (1)

are x_1=0, x_2=0, …, x_n=0.

The important aspect of the above definition is that linear independence of ℬ requires linear independence of all of its finite subsets. The same approach is applied in the definition of a spanning set.

Definition. Vectors within the set ℬ={v_γ | γ∈Γ, v_γ∈V} span V, stated as V=span(ℬ), if for any u∈V there exist n scalars x_1,…,x_n∈S such that

x_1 v_{γ_1}+⋯+x_n v_{γ_n}=u,  γ_i∈Γ. (2)

This now allows a generally-applicable definition of basis and dimension.

Definition. The vector set ℬ={v_γ | γ∈Γ, v_γ∈V} is a basis for vector space 𝒱=(V,S,+,·) if

  1. ℬ is linearly independent;

  2. span(ℬ)=V.

Definition. The dimension of a vector space 𝒱=(V,S,+,·) is the cardinality of a basis set ℬ, dim(𝒱)=|ℬ|.

The use of finite sums to define linear independence and bases is not overly restrictive since it can be proven that every vector space has a basis. The proof of this theorem is based on Zorn's lemma from set theory; it asserts the existence of a basis, but gives no constructive procedure. The difficulty of practical construction of bases for infinite-dimensional vector spaces is ascertained through basic examples.

Example. (ℝ^∞) As a generalization of ℛ_m=(ℝ^m,ℝ,+,·), consider the vector space ℝ^∞ of real sequences {x_n}_{n∈ℕ}, represented as vectors with a countably infinite number of components, 𝒙=[x_1  x_2  x_3  …]^T. Linear combinations are defined componentwise by

α𝒙+β𝒚=[αx_1+βy_1  αx_2+βy_2  αx_3+βy_3  …]^T.

Let 𝒆_i denote the vector of all zeros except a one in the i-th position. In ℝ^m, the identity matrix 𝑰=[𝒆_1 … 𝒆_m] was a basis, but this does not generalize to ℝ^∞; for example the vector 𝒗=[1  1  1  …]^T cannot be obtained by a finite linear combination of the 𝒆_i vectors. In fact, no countable set of vectors spans ℝ^∞.

Example. (P(ℝ)) The vector space of polynomials on the real line, P(ℝ)={p | p(t)=c_n t^n+c_{n-1} t^{n-1}+⋯+c_0, n∈ℕ, c_i∈ℝ, i=0,1,…,n}, has an easily constructed basis, namely the set of the monomials

ℳ(t)={t^n | n∈ℕ},

an infinite set with the same cardinality as the naturals, |ℳ|=|ℕ|=ℵ_0.

1.2.Alternatives to the concept of a basis

The difficulty in ascribing significance to an infinite sum of vectors Σ_{i=1}^{∞} 𝒗_i can be resolved by endowing the vector space with additional structure, in particular a way to define convergence of the partial sums

𝒔_n=Σ_{i=1}^{n} 𝒗_i

to a limit, lim_{n→∞} 𝒔_n=𝒗.

Fourier series.
One approach is the introduction of an inner product (𝒖,𝒗) and the associated norm ||𝒖||=(𝒖,𝒖)^{1/2}. A considerable advantage of this approach is that it not only allows infinite linear combinations, but also the definition of orthonormal spanning sets. An example is the vector space C[-π,π] of continuous functions defined on [-π,π], with the inner product

(f,g)=(1/π) ∫_{-π}^{π} f(t) g(t) dt,

and norm ||f||=(f,f)^{1/2}. An orthonormal spanning set for C[-π,π] is given by

{1/√2} ∪ {cos(nt) | n∈ℕ_+} ∪ {sin(nt) | n∈ℕ_+}.

Complete inner-product vector spaces are known as Hilbert spaces.
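A quick numerical check of this orthonormality (a minimal sketch, not part of the original notes; the midpoint-rule quadrature and the choice of the first four set members are illustrative):

N=10000; h=2π/N; tq=range(-π+h/2,π-h/2,length=N);
ip(f,g) = sum(f.(tq).*g.(tq))*h/π;      # (f,g)=(1/π)∫f g dt by midpoint rule
B=[t->1/sqrt(2), t->cos(t), t->sin(t), t->cos(2t)];
[ip(B[i],B[j]) for i=1:4, j=1:4]        # Gram matrix, approximately the identity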

Taylor series.
Convergence of infinite sums can be determined through a norm, without the need for an inner product. An example is the space of real-analytic functions with the inf-norm

||f||_∞ = sup_{t∈ℝ} |f(t)|,

for which a spanning set is given by the monomials {1, t, t^2, …}, and the infinite expansion

f(t)=Σ_{k=0}^{∞} a_k (t-c)^k

is convergent, with coefficients given by the Taylor series

f(t)=f(c)+(f'(c)/1!)(t-c)+⋯,  a_k=f^{(k)}(c)/k!.

Note that orthogonality of the spanning set cannot be established, absent an inner product.
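For instance, for f(t)=e^t about c=0 all coefficients are a_k=1/k!, and a truncated expansion already approximates f closely near c (a small illustrative sketch):

# Degree-10 Taylor partial sum of exp about c=0, coefficients a_k=1/k!
c=0; n=10; a=[1/factorial(k) for k=0:n];
taylor(t)=sum(a[k+1]*(t-c)^k for k=0:n);
[exp(1.0) taylor(1.0)]                  # agree to roughly 8 digits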

1.3.Common function spaces

Several function spaces find widespread application in scientific computation. An overview is provided in Table 1.

B(ℝ)       bounded functions
C(ℝ)       continuous functions
C^r(ℝ)     with continuous derivatives up to order r
C_c(ℝ)     continuous, with compact support
C_c^r(ℝ)   with continuous derivatives up to order r, and compact support
C_0(ℝ)     continuous, vanishing at infinity
C^∞(ℝ)     smooth functions
L^p(ℝ)     with finite p-norm, ||f||_p=(∫_{-∞}^{∞} |f(t)|^p dt)^{1/p}
W^{k,p}(ℝ) Sobolev space, with norm ||f||_{k,p}=(Σ_{i=0}^{k} ||f^{(i)}||_p^p)^{1/p}

Table 1. Common vector spaces of functions

2.Interpolation

The interpolation problem seeks the representation of a function f known only through a sample data set 𝒟={(x_i, y_i=f(x_i)) | i=0,…,m} ⊂ ℝ×ℝ, by an approximant p(t), obtained through combination of elements from some family of computable functions, ℬ={b_0,…,b_n | b_k: ℝ→ℝ}. The approximant p(t) is an interpolant of 𝒟 if

p(x_i)=f(x_i)=y_i,  i=0,…,m,

i.e., p(t) passes through the known nodes (x_i,y_i) of the function f. The objective is to use p(t) thus determined to approximate the function f at other points. Assuming x_0<x_1<⋯<x_m, evaluation of p(t) at t∈(x_0,x_m) is an interpolation, while evaluation at t<x_0 or t>x_m is an extrapolation. The basic problems arising in interpolation are the choice of the function family ℬ, the solvability and cost of determining and evaluating the interpolant, and the estimation of the error made at points outside the data set, each taken up below.

Algorithms for interpolation of real functions can readily be extended to more complicated objects, e.g., interpolation of matrix representations of operators. Implementation is aided by programming language polymorphism as in Julia.

2.1.Additive corrections

As is to be expected, a widely used combination technique is linear combination,

p(t)=c_0 b_0(t)+c_1 b_1(t)+⋯+c_n b_n(t).

The idea is to capture the nonlinearity of f(t) through the functions b_0(t),…,b_n(t), while maintaining the framework of linear combinations. Sampling b_j(t) at the nodes x_i of a data set 𝒟 constructs the vectors

𝒃_j=b_j(𝒙)=[b_j(x_0)  …  b_j(x_m)]^T ∈ ℝ^{m+1},

which gathered together into a matrix 𝑩=[𝒃_0  …  𝒃_n] lead to the formulation of the interpolation problem as

𝑩𝒄=𝒚=𝑰𝒚,  𝑩∈ℝ^{(m+1)×(n+1)}. (3)

Before choosing some specific function set ℬ, some general observations are useful.

  1. The function values y_i=f(x_i), i=0,…,m, are directly incorporated into the interpolation problem (3). Any estimate of the error at other points requires additional information on f. Such information can be furnished by bounds on the function values, or knowledge of its derivatives, for example.

  2. A solution to (3) exists if 𝒚∈C(𝑩), the column space of 𝑩. Economical interpolations would use n<m functions in the set ℬ, hopefully n≪m.
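These observations translate directly into code: form 𝑩 column by column by sampling each b_j at the data abscissas, then solve (3). A minimal Julia sketch (the three-monomial set and the data are illustrative assumptions, not from the notes):

# Build B by sampling the function set b at the abscissas x, then solve Bc=y
InterpMatrix(b,x) = [b[j](x[i]) for i=1:length(x), j=1:length(b)];
b=[t->1, t->t, t->t^2];                 # example set B: first three monomials
x=[-1; 0; 1]; y=[2; 1; 2];              # sample data with m=n=2
B=InterpMatrix(b,x); c=B\y              # recovers p(t)=1+t^2, c=[1,0,1]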

2.2.Polynomial interpolation

Monomial form of interpolating polynomial.
As noted above, the vector space of polynomials P(ℝ) has an easily constructed basis, that of the monomials m_j(t)=t^j, which shall be organized as a row vector of functions

ℳ(t)=[1  t  t^2  …].

With ℳ_{n+1}(t) denoting the row vector of the first n+1 monomials,

ℳ_{n+1}(t)=[1  t  …  t^n],

a polynomial of degree n is the linear combination

p(t)=ℳ_{n+1}(t) 𝒂=[1  t  …  t^n][a_0  a_1  …  a_n]^T=a_0+a_1 t+⋯+a_n t^n.

Let 𝑴∈ℝ^{(m+1)×(n+1)} denote the matrix obtained from evaluation of the first n+1 monomials at the sample points 𝒙=[x_0  x_1  …  x_m]^T,

𝑴=ℳ_{n+1}(𝒙).

The above notation conveys that the finite-dimensional matrix 𝑴∈ℝ^{(m+1)×(n+1)} results from evaluation of the row vector of monomial basis functions ℳ_{n+1}: ℝ→ℝ^{1×(n+1)} at the column vector of sample points 𝒙∈ℝ^{m+1}. The interpolation condition p(𝒙)=𝒚 leads to the linear system

𝑴𝒂=𝒚. (4)

For a solution to exist for arbitrary 𝒚, 𝑴 must be of full rank, hence m=n, in which case 𝑴 becomes the Vandermonde matrix

𝑴=[ 1  x_0  …  x_0^n
    1  x_1  …  x_1^n
    ⋮
    1  x_n  …  x_n^n ],

known to be ill-conditioned. Since 𝑴 is square and of full rank, (4) has a unique solution.
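The ill-conditioning is easily observed numerically; a brief sketch (equidistant sample points in [-1,1] assumed):

using LinearAlgebra
# Growth of the Vandermonde condition number with polynomial degree n
for n in (5,10,15,20)
  xn=range(-1,1,length=n+1)
  V=[xn[i]^j for i=1:n+1, j=0:n]        # Vandermonde matrix
  println((n, cond(V)))
end

The condition number grows roughly exponentially with n, so coefficients obtained from (4) are reliable only for modest n.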

Finding the polynomial coefficients by solving the above linear system requires 𝒪(n^3/3) operations. Evaluation of the monomial form is economically accomplished in 𝒪(n) operations through Horner's scheme

p(t)=a_0+(a_1+⋯+(a_{n-2}+(a_{n-1}+a_n t)t)t⋯)t. (5)

Figure 1. Monomial basis over interval [-π,π]

Algorithm (Horner's scheme)

Input: t∈ℝ, 𝒂∈ℝ^{n+1}

Output: p(t)=a_0+a_1 t+⋯+a_n t^n

p=a_n

for k=n-1 downto 0

p=a_k+p·t

end

return p
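A direct Julia transcription of the scheme (a minimal sketch; a_0 is stored in a[1] to fit 1-based indexing):

function Horner(t,a)
  # Evaluate a_0+a_1*t+...+a_n*t^n in O(n) operations
  n=length(a)-1; p=a[n+1]
  for k=n:-1:1
    p=a[k]+p*t                          # fold in the next lower coefficient
  end
  return p
end;
Horner(2,[1; -2; 3])                    # p(t)=1-2t+3t^2, returns p(2)=9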

Lagrange form of interpolating polynomial.
It is possible to reduce the operation count needed to find the interpolating polynomial by carrying out an LU decomposition of the monomial matrix 𝑴,

ℳ_{n+1}(𝒙)=𝑴=𝑳𝑼.

Let ℒ_{n+1}(t)=[ℓ_0(t)  ℓ_1(t)  …  ℓ_n(t)] denote another set of basis functions that evaluates to the identity matrix at the sample points 𝒙, such that ℒ_{n+1}(𝒙)=𝑰,

ℳ_{n+1}(𝒙)=𝑴=𝑳𝑼=𝑰𝑳𝑼=ℒ_{n+1}(𝒙)𝑳𝑼.

For arbitrary t, the relationship

ℳ_{n+1}(t)=ℒ_{n+1}(t)𝑳𝑼

describes a linear mapping between the monomials ℳ_{n+1}(t) and the ℒ_{n+1}(t) functions, a mapping which is invertible since 𝑴=𝑳𝑼 is of full rank,

ℒ_{n+1}(t)=ℳ_{n+1}(t)𝑼^{-1}𝑳^{-1}.

Note that organization of bases as row vectors of functions leads to linear mappings expressed through right factors.

The LU factorization of the Vandermonde matrix can be determined analytically, as exemplified for n=3 by

( 1  x_0  x_0^2  x_0^3
  1  x_1  x_1^2  x_1^3
  1  x_2  x_2^2  x_2^3
  1  x_3  x_3^2  x_3^3 ) =
( 1  0                    0                                        0
  1  1                    0                                        0
  1  (x_0-x_2)/(x_0-x_1)  1                                        0
  1  (x_0-x_3)/(x_0-x_1)  (x_0-x_3)(x_3-x_1)/[(x_0-x_2)(x_2-x_1)]  1 ) ·
( 1  x_0      x_0^2               x_0^3
  0  x_1-x_0  x_1^2-x_0^2         x_1^3-x_0^3
  0  0        (x_0-x_2)(x_1-x_2)  (x_0-x_2)(x_1-x_2)(x_0+x_1+x_2)
  0  0        0                   -(x_0-x_3)(x_3-x_1)(x_3-x_2) )

Both factors can be inverted analytically, e.g., for n=3,

𝑳^{-1}=( 1                                        0                                        0                                        0
        -1                                       1                                        0                                        0
        (x_1-x_2)/(x_0-x_1)                      (x_2-x_0)/(x_0-x_1)                      1                                        0
        (x_1-x_3)(x_3-x_2)/[(x_0-x_1)(x_0-x_2)]  (x_0-x_3)(x_2-x_3)/[(x_0-x_1)(x_1-x_2)]  (x_0-x_3)(x_1-x_3)/[(x_0-x_2)(x_2-x_1)]  1 ),

𝑼^{-1}=( 1  x_0/(x_0-x_1)  -x_0 x_1/[(x_0-x_2)(x_2-x_1)]  x_0 x_1 x_2/[(x_0-x_3)(x_3-x_1)(x_3-x_2)]
        0  1/(x_1-x_0)    (x_0+x_1)/[(x_0-x_2)(x_2-x_1)]  -[x_1 x_2+x_0(x_1+x_2)]/[(x_0-x_3)(x_3-x_1)(x_3-x_2)]
        0  0              1/[(x_0-x_2)(x_1-x_2)]          (x_0+x_1+x_2)/[(x_0-x_3)(x_3-x_1)(x_3-x_2)]
        0  0              0                               -1/[(x_0-x_3)(x_3-x_1)(x_3-x_2)] ).

The functions that result for n=3

{ (t-x_1)(t-x_2)(t-x_3)/[(x_0-x_1)(x_0-x_2)(x_0-x_3)],
  (t-x_0)(t-x_2)(t-x_3)/[(x_1-x_0)(x_1-x_2)(x_1-x_3)],
  (t-x_0)(t-x_1)(t-x_3)/[(x_2-x_0)(x_2-x_1)(x_2-x_3)],
  (t-x_0)(t-x_1)(t-x_2)/[(x_3-x_0)(x_3-x_1)(x_3-x_2)] },

can be generalized as

ℓ_i(t)=∏'_{j=0}^{n} (t-x_j)/(x_i-x_j),

known as the Lagrange basis set, where the prime on the product symbol skips the index j=i. Note that each member of the basis is a polynomial of degree n.

By construction, through the condition ℒ_{n+1}(𝒙)=𝑰, a Lagrange basis function evaluated at a sample point is

ℓ_i(x_j)=δ_{ij}={ 1, i=j;  0, i≠j }.

A polynomial of degree n is expressed as a linear combination of the Lagrange basis functions by

p(t)=ℒ_{n+1}(t) 𝒄=[ℓ_0(t)  ℓ_1(t)  …  ℓ_n(t)][c_0  c_1  …  c_n]^T=c_0 ℓ_0(t)+c_1 ℓ_1(t)+⋯+c_n ℓ_n(t).

The interpolant of the data {(x_i, y_i=f(x_i)), i=0,1,…,n} is determined through the conditions

p(𝒙)=ℒ_{n+1}(𝒙) 𝒄=𝑰𝒄=𝒄=𝒚,

i.e., the linear combination coefficients are simply the sampled function values, c_i=y_i=f(x_i), yielding the Lagrange form of the interpolating polynomial

p(t)=Σ_{i=0}^{n} y_i ℓ_i(t)=Σ_{i=0}^{n} y_i ∏'_{j=0}^{n} (t-x_j)/(x_i-x_j). (7)

Determining the linear combination coefficients may be without cost, but evaluation of the Lagrange form (7) of the interpolating polynomial requires 𝒪(n^2) operations, significantly more costly than the 𝒪(n) operations required by Horner's scheme (5).

Algorithm (Lagrange evaluation)

Input: 𝒙, 𝒚∈ℝ^{n+1}, t∈ℝ

Output: p(t)=Σ_{i=0}^{n} y_i ∏'_{j=0}^{n} (t-x_j)/(x_i-x_j)

p=0

for i=0 to n

w=1

for j=0 to n

if j≠i then w=w·(t-x_j)/(x_i-x_j)

end

p=p+w·y_i

end

return p

function Lagrange(t,x,y)
  # Evaluate the Lagrange form p(t)=Σ y[i] l_i(t); O(n^2) per evaluation point
  n=length(x)-1; p=0
  for i=1:n+1
    w=1                                 # accumulates l_i(t)
    for j=1:n+1
      if (i!=j) w=w*(t-x[j])/(x[i]-x[j]); end
    end
    p = p + w*y[i]
  end
  return p
end;
p2(t)=3*t^2-2*t+1;
x=[-2 0 2]; y=p2.(x);
t=-3:3; [p2.(t) Lagrange.(t,Ref(x),Ref(y))]

[ 34.0  34.0
  17.0  17.0
   6.0   6.0
   1.0   1.0
   2.0   2.0
   9.0   9.0
  22.0  22.0 ] (8)

Figure 2. Lagrange basis for n=6 for sin(x) over interval [-π,π]

using PyPlot
n=6; x=range(-π,π,length=n+1); y=sin.(x);
t=range(-π,π,length=10*n);
M=ones(length(t),length(x));   # columns receive the Lagrange basis functions
function lagrange(j,t,x)
  # j-th Lagrange basis function evaluated at t, for nodes x
  n=length(x)-1; l=1
  for i=1:n+1
    if (i!=j) l = l*(t-x[i])/(x[j]-x[i]); end
  end
  return l
end;

for j=1:n+1
  M[:,j] = lagrange.(j,t,Ref(x))
end
figure(1); clf();
for j=1:n+1
  plot(t,M[:,j])
end
xlabel("t"); ylabel("l(i,t)"); grid("on"); title("Lagrange basis");
imdir=homedir()*"/courses/MATH661/images/";
savefig(imdir*"L18LagrangeBasis.eps")

A reformulation of the Lagrange basis can however reduce the operation count. Let ℓ(t)=∏_{k=0}^{n} (t-x_k), and rewrite ℓ_i(t) as

ℓ_i(t)=∏'_{j=0}^{n} (t-x_j)/(x_i-x_j)=ℓ(t) w_i/(t-x_i),

with the weights

w_i=∏'_{j=0}^{n} 1/(x_i-x_j),

depending only on the sample abscissas x_i, but not on the function values y_i. The interpolating polynomial is now

p(t)=Σ_{i=0}^{n} y_i ℓ_i(t)=ℓ(t) Σ_{i=0}^{n} y_i w_i/(t-x_i).

Interpolation of the function g(t)=1 would give

1=ℓ(t) Σ_{i=0}^{n} w_i/(t-x_i),

and taking the ratio yields

p(t)=[ Σ_{i=0}^{n} y_i w_i/(t-x_i) ] / [ Σ_{i=0}^{n} w_i/(t-x_i) ], (9)

known as the barycentric Lagrange formula (by analogy to computation of a center of mass). Evaluation of the weights w_i costs 𝒪(n^2) operations, but this can be done once for any given set of x_i. The evaluation of p(t) then becomes an 𝒪(2n) process, comparable in cost to Horner's scheme.

Algorithm (Barycentric Lagrange evaluation)

Input: 𝒙, 𝒚∈ℝ^{n+1}, t∈ℝ

Output: p(t)=(Σ_{i=0}^{n} y_i w_i/(t-x_i)) / (Σ_{i=0}^{n} w_i/(t-x_i))

for i=0 to n

w_i=1

for j=0 to n

if j≠i then w_i=w_i/(x_i-x_j)

end

end

q=r=0

for i=0 to n

s=w_i/(t-x_i); q=q+y_i·s; r=r+s

end

p=q/r

return p

function BaryLagrange(t,x,y)
  # Barycentric Lagrange: O(n^2) one-time weights, O(n) per evaluation point
  n=length(x)-1; w=ones(size(x));
  for i=1:n+1
    for j=1:n+1
      if (i!=j) w[i]=w[i]/(x[i]-x[j]); end
    end
  end
  q=r=0
  for i=1:n+1
    d=t-x[i]
    if d≈0 return y[i]; end             # t coincides with node x[i]
    s=w[i]/d; q=q+y[i]*s; r=r+s
  end
  return q/r
end;
p2(t)=3*t^2-2*t+1;
x=[-2 0 2]; y=p2.(x);
t=-3:3; [p2.(t) BaryLagrange.(t,Ref(x),Ref(y))]

[ 34  33.99999999999999
  17  17
   6   6.0
   1   1
   2   2.0
   9   9
  22  21.999999999999996 ] (10)

Newton form of interpolating polynomial.
Inverting only one factor of the mapping ℳ_{n+1}(t)=ℒ_{n+1}(t)𝑳𝑼 yields yet another basis set, 𝒮_{n+1}(t)=[N_0(t)  N_1(t)  …  N_n(t)],

ℳ_{n+1}(t) 𝑼^{-1}=ℒ_{n+1}(t) 𝑳=𝒮_{n+1}(t).

The first four basis polynomials are

{ 1,  (t-x_0)/(x_1-x_0),  (t-x_0)(t-x_1)/[(x_2-x_0)(x_2-x_1)],  (t-x_0)(t-x_1)(t-x_2)/[(x_3-x_0)(x_3-x_1)(x_3-x_2)] },

with N_0(t)=1, and in general

N_k(t)=∏_{j=0}^{k-1} (t-x_j)/(x_k-x_j),

for k>0.

Computation of the scaling factors w_k=1/∏_{j=0}^{k-1}(x_k-x_j) would require 𝒪(n^2/2) operations, but can be avoided by redefining the basis set as 𝒩_{n+1}(t)=[n_0(t)  n_1(t)  …  n_n(t)], with n_0(t)=1, and

n_k(t)=∏_{j=0}^{k-1} (t-x_j),

known as the Newton basis. As usual, the coefficients 𝒅∈ℝ^{n+1} of the linear combination of Newton polynomials

p(t)=𝒩_{n+1}(t) 𝒅=[n_0(t)  n_1(t)  …  n_n(t)][d_0  d_1  …  d_n]^T=d_0 n_0(t)+d_1 n_1(t)+⋯+d_n n_n(t),

are determined from the interpolation conditions p(𝒙)=𝒚. The resulting linear system is of triangular form,

[ 1  0        0                   ⋯  0
  1  x_1-x_0  0                   ⋯  0
  1  x_2-x_0  (x_2-x_0)(x_2-x_1)  ⋯  0
  1  x_3-x_0  (x_3-x_0)(x_3-x_1)  ⋯  0
  ⋮                                ⋱  ⋮
  1  x_n-x_0  (x_n-x_0)(x_n-x_1)  ⋯  ∏_{j=0}^{n-1}(x_n-x_j) ] [ d_0  d_1  d_2  ⋯  d_n ]^T = [ y_0  y_1  y_2  ⋯  y_n ]^T

and readily solved by forward substitution.

The first few coefficients are

d_0=y_0,  d_1=(y_1-d_0)/(x_1-x_0)=(y_1-y_0)/(x_1-x_0),
d_2=[y_2-(x_2-x_0) d_1-d_0]/[(x_2-x_0)(x_2-x_1)]=[(y_2-y_1)/(x_2-x_1)-(y_1-y_0)/(x_1-x_0)]/(x_2-x_0).

The forward substitution is efficiently expressed through the definition of divided differences

[y_i]=y_i,  [y_{i+1},y_i]=([y_{i+1}]-[y_i])/(x_{i+1}-x_i)=(y_{i+1}-y_i)/(x_{i+1}-x_i),  [y_{i+2},y_{i+1},y_i]=([y_{i+2},y_{i+1}]-[y_{i+1},y_i])/(x_{i+2}-x_i),

or in general, the kth divided difference

[y_{i+k},y_{i+k-1},…,y_i]=([y_{i+k},y_{i+k-1},…,y_{i+1}]-[y_{i+k-1},y_{i+k-2},…,y_i])/(x_{i+k}-x_i),

given in terms of divided differences of order k-1. The forward substitution computations are conveniently organized in a table, useful both for hand computation and for code implementation.

i    x_i    [y_i]    [y_i,y_{i-1}]    [y_i,y_{i-1},y_{i-2}]    …
0    x_0    y_0      -                -
1    x_1    y_1      [y_1,y_0]        -
2    x_2    y_2      [y_2,y_1]        [y_2,y_1,y_0]
⋮
n    x_n    y_n      [y_n,y_{n-1}]    [y_n,y_{n-1},y_{n-2}]

Table 2. Table of divided differences. The Newton basis coefficients 𝒅 are the diagonal terms.

Algorithm (Forward substitution, Newton coefficients)

Input: 𝒙, 𝒚∈ℝ^{n+1}

Output: 𝒅∈ℝ^{n+1}

𝒅=𝒚

for i=1 to n

for j=n downto i

d_j=(d_j-d_{j-1})/(x_j-x_{j-i})

end

end

The above algorithm requires 𝒪(n^2/2) operations, after which the Newton form of the interpolating polynomial

p(t)=[y_0]·1+[y_1,y_0](t-x_0)+[y_2,y_1,y_0](t-x_0)(t-x_1)+⋯+[y_n,y_{n-1},…,y_0](t-x_0)(t-x_1)⋯(t-x_{n-1}),

can be evaluated in 𝒪(n) operations.

Algorithm (Newton polynomial evaluation)

Input: 𝒙, 𝒅∈ℝ^{n+1}, t∈ℝ

Output: p(t)=d_0+d_1(t-x_0)+⋯+d_n(t-x_0)(t-x_1)⋯(t-x_{n-1})

p=d_0; r=1

for k=1 to n

r=r·(t-x_{k-1})

p=p+d_k·r

end

return p
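The two algorithms combine into a short Julia sketch (an illustrative implementation with 1-based storage, not shown in the original notes):

function NewtonCoeffs(x,y)
  # Divided-difference coefficients by forward substitution, O(n^2/2) operations
  n=length(x)-1; d=float.(y)
  for i=1:n
    for j=n+1:-1:i+1
      d[j]=(d[j]-d[j-1])/(x[j]-x[j-i])
    end
  end
  return d
end;
function NewtonEval(t,x,d)
  # Evaluate d_0+d_1(t-x_0)+...+d_n(t-x_0)...(t-x_{n-1}) in O(n) operations
  n=length(x)-1; p=d[1]; r=1
  for k=1:n
    r=r*(t-x[k]); p=p+d[k+1]*r
  end
  return p
end;
x=[-2; 0; 2]; y=[17; 1; 9];             # samples of p2(t)=3t^2-2t+1
d=NewtonCoeffs(x,y); NewtonEval.(-3:3,Ref(x),Ref(d))  # reproduces the values in (8)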

Figure 3. Newton basis for n=6 for sin(x) over interval [-π,π]

3.Interpolation error

As mentioned, a polynomial interpolant of f: ℝ→ℝ already incorporates the function values y_i=f(x_i), i=0,…,n, so additional information on f is required to estimate the error

e(t)=f(t)-p(t),

when t is not one of the sample points. One approach is to assume that f is smooth, f∈C^∞(ℝ), in which case the error is given by

f(t)-p(t)=f^{(n+1)}(ξ_t)/(n+1)! · ∏_{i=0}^{n} (t-x_i)=f^{(n+1)}(ξ_t)/(n+1)! · w(t), (13)

for some ξ_t∈[x_0,x_n], assuming x_0<x_1<⋯<x_n. The above error estimate is obtained by repeated application of Rolle's theorem to the function

Φ(u)=f(u)-p(u)-[(f(t)-p(t))/w(t)] w(u),

which has n+2 roots at t, x_0, x_1, …, x_n; hence its derivative of order n+1 must have a root in the interval (x_0,x_n), denoted by ξ_t,

Φ^{(n+1)}(ξ_t)=(d^{n+1}Φ/du^{n+1})(ξ_t)=0=f^{(n+1)}(ξ_t)-(f(t)-p(t))/w(t) · (n+1)!,

establishing (13). Though the error estimate seems promising due to the (n+1)! in the denominator, the derivative f^{(n+1)} can be arbitrarily large even for a smooth function. This behavior arises for the Runge function f(t)=1/[1+(5t)^2] (Fig. 4), for which a typical higher-order derivative appears as

f^{(10)}(t)=35437500000000 (107421875 t^{10}-64453125 t^8+7218750 t^6-206250 t^4+1375 t^2-1)/(25t^2+1)^{11},  ||f^{(10)}||_∞≈3.5×10^{13}.

The behavior might be attributable to the presence of poles of f in the complex plane at t_{1,2}=±i/5, but the interpolant of the holomorphic function g(t)=exp(-(5t)^2), with a power series similar to that of f,

f(t)≈1-25t^2+625t^4-15625t^6+O(t^7),  g(t)≈1-25t^2+625t^4/2-15625t^6/6+O(t^7),

also exhibits large errors (Fig. 4), and also has a high-order derivative of large norm, ||g^{(10)}||_∞≈3×10^{11},

g^{(10)}(t)=1562500000 e^{-25t^2} (62500000 t^{10}-56250000 t^8+15750000 t^6-1575000 t^4+47250 t^2-189).

Figure 4. Interpolants of f(t)=1/[1+(5t)^2] and g(t)=exp(-(5t)^2), based on equidistant sample points.

using PyPlot
function MonomialBasis(t,n)
  # Sample the first n monomials 1,t,...,t^(n-1) at the points in t
  m=size(t)[1]; A=ones(m,1);
  for j=1:n-1
    A = [A t.^j]
  end
  return A
end;
function plotInterp(a,b,f,Basis,m,n,M,txt)
  # Interpolate m samples of f on [a,b] with n basis functions;
  # compare against an M-point resampling of f
  data=sample(a,b,f,m); t=data[1]; y=data[2]
  Data=sample(a,b,f,M); T=Data[1]; Y=Data[2]
  A = Basis(t,n); x = A\y; z = Basis(T,n)*x
  plot(t,y,"ok",T,z,"-r",T,Y,"-b"); grid("on");
  xlabel("t"); ylabel("y");
  title(txt)
end;
function sample(a,b,f,m)
  # m equidistant samples of f over [a,b]
  t = LinRange(a,b,m); y = f.(t)
  return t,y
end;
f(t)=1/(1+25*t^2); g(t)=exp(-(5*t)^2);
FigPrefix=homedir()*"/courses/MATH661/images/L19";
clf(); plotInterp(-1,1,f,MonomialBasis,10,10,100,"Interpolant of f");
savefig(FigPrefix*"Fig01a.eps")
clf(); plotInterp(-1,1,g,MonomialBasis,10,10,100,"Interpolant of g");
savefig(FigPrefix*"Fig01b.eps")

3.1.Error minimization - Chebyshev polynomials

Since ||f^{(n+1)}||_∞ is problem-specific, the remaining avenue for error control suggested by formula (13) is a favorable choice of the sample points x_i, i=0,…,n. The polynomial

w(t)=∏_{i=0}^{n} (t-x_i)

is monic (the coefficient of the highest power is unity), and any interval [a,b] can be mapped to the interval [-1,1] by t=2(s-a)/(b-a)-1, leading to consideration of the problem

min_𝒙 ||w||_∞=min_𝒙 max_{-1≤t≤1} |w(t)|,

i.e., finding the sample points or roots of w(t) that lead to the smallest possible inf-norm of w(t). Plots of the Lagrange basis (L18, Fig. 2), or Legendre basis, suggest study of basis functions that have distinct roots in the interval [-1,1] and attain the same maximum. The trigonometric functions satisfy these criteria, and can be used to construct the Chebyshev family of polynomials through

T_n(x)=cos[n cos^{-1}(x)]=cos(nθ),  cos θ=x,  θ=cos^{-1} x.

The trigonometric identity

cos[(n+1)θ]+cos[(n-1)θ]=2cosθcos(nθ)

leads to a recurrence relation for the Chebyshev polynomials

T_{n+1}(x)=2x T_n(x)-T_{n-1}(x),  T_0(x)=1,  T_1(x)=x.
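The recurrence provides an 𝒪(n) evaluation procedure, cross-checked below against the cosine definition (a minimal sketch; names are illustrative):

function Chebyshev(n,x)
  # T_n(x) by the three-term recurrence T_k = 2x T_{k-1} - T_{k-2}
  Tm=one(x); T=x
  n==0 && return Tm
  for k=2:n
    Tm,T = T,2x*T-Tm
  end
  return T
end;
[Chebyshev(6,0.3) cos(6*acos(0.3))]     # both ≈ 0.2545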

The definition in terms of the cosine function easily leads to the roots x_i, T_n(x_i)=0,

cos(nθ)=0 ⇒ nθ_i=(2i-1)π/2 ⇒ θ_i=(2i-1)π/(2n) ⇒ x_i=cos[(2i-1)π/(2n)],  i=1,…,n,

and the extrema x_j, T_n(x_j)=(-1)^j,

cos(nθ)=±1 ⇒ nθ_j=jπ ⇒ x_j=cos(jπ/n),  j=0,1,…,n.

The Chebyshev polynomials are not themselves monic, but can readily be rescaled through

P_n(x)=2^{1-n} T_n(x), n>0,  P_0(x)=1.

Since |T_n(x)|=|cos(nθ)|≤1, the above suggests that the monic polynomials P_n have ||P_n||_∞=2^{1-n}, small for large n, and they are indeed, among all monic polynomials defined on [-1,1], the ones with the smallest inf-norm.

Theorem. Any monic polynomial p:[-1,1]→ℝ of degree n satisfies the inf-norm lower bound

||p||_∞=max_{-1≤t≤1} |p(t)| ≥ 2^{1-n}.

Proof. By contradiction, assume the monic polynomial p:[-1,1]→ℝ of degree n has ||p||_∞<2^{1-n}. Construct a comparison with the Chebyshev polynomials by evaluating p at the extrema x_j=cos(jπ/n),

(-1)^j p(x_j) ≤ |p(x_j)| < 2^{1-n}=(-1)^j P_n(x_j)=(-1)^j 2^{1-n} T_n(x_j).

Since the above states (-1)^j p(x_j)<(-1)^j P_n(x_j), deduce

(-1)^j [p(x_j)-P_n(x_j)]<0,  for j=0,1,…,n. (14)

However, p and P_n both being monic of degree n implies that p-P_n is a polynomial of degree at most n-1 that would have to change sign n times to satisfy (14), and thus have n roots, contradicting the fundamental theorem of algebra. □

Figure 5. First n=6 Chebyshev polynomials

3.2.Best polynomial approximant

Based on the above, the optimal choice of n+1 sample points is given by the roots x_j=cos(θ_j) of the Chebyshev polynomial T_{n+1}(x) of degree n+1, for which cos[(n+1)θ]=0,

x_j=cos[ (π/(n+1)) (j+1/2) ],  j=0,…,n.

For this choice of sample points the interpolation error has the bound

|f(t)-p_n(t)|=| f^{(n+1)}(ξ_t)/(n+1)! · ∏_{i=0}^{n} (t-x_i) | ≤ |f^{(n+1)}(ξ_t)|/(n+1)! · ||P_{n+1}||_∞ ≤ ||f^{(n+1)}||_∞/((n+1)! 2^n).
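The effect of this choice is readily checked by comparing maximal interpolation errors for the Runge function on equidistant versus Chebyshev sample points; a sketch reusing the BaryLagrange routine defined above:

# Max interpolation error on [-1,1]: equidistant vs Chebyshev nodes, n=10
f(t)=1/(1+25*t^2); n=10; tt=range(-1,1,length=1001);
xe=collect(range(-1,1,length=n+1));  ye=f.(xe);
xc=[cos(π*(j+0.5)/(n+1)) for j=0:n]; yc=f.(xc);
ee=maximum(abs.(f.(tt)-BaryLagrange.(tt,Ref(xe),Ref(ye))));
ec=maximum(abs.(f.(tt)-BaryLagrange.(tt,Ref(xc),Ref(yc))));
[ee ec]

For n=10 the equidistant-node error remains 𝒪(1) near the interval ends, while the Chebyshev-node error drops by roughly an order of magnitude, consistent with the behavior seen in Fig. 4.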