The monomial basis $\{1,t,t^2,\dots\}$ for the vector space of polynomials, and bases derived from it (Lagrange, Newton, etc.), allow the definition of an approximant $p_n \cong f$ for real functions, e.g., for smooth functions $f\in C^\infty$. A different approach to approximation in infinite-dimensional vector spaces such as $C^0[a,b]$ or $L^2[a,b]$ is to endow the vector space with a scalar product $(f,g)$ and associated norm $\lVert f\rVert=\sqrt{(f,f)}$. The availability of a norm allows the definition of convergence of sequences and series.
All Hilbert spaces have orthonormal bases, and of special interest are bases that arise from Sturm-Liouville problems of relevance to the approximation task.
The space $L^2[0,2\pi]$ of periodic, square-integrable functions is a Hilbert space ($p=2$ is the only Hilbert space among the $L^p$ function spaces), and has a basis

$$ \left\{ \frac{1}{\sqrt{2\pi}},\ \frac{\cos kt}{\sqrt{\pi}},\ \frac{\sin kt}{\sqrt{\pi}} \right\}_{k\in\mathbb{N}} $$

that is orthonormal with respect to the scalar product

$$ (f,g)=\int_0^{2\pi} f(t)\,g(t)\,\mathrm{d}t. $$

An element $f\in L^2[0,2\pi]$ can be expressed as the linear combination

$$ f(t)=\frac{a_0}{\sqrt{2\pi}}+\sum_{k=1}^{\infty}\left( a_k\,\frac{\cos kt}{\sqrt{\pi}}+b_k\,\frac{\sin kt}{\sqrt{\pi}} \right). $$
An alternative orthonormal basis is formed by the exponentials

$$ \varphi_k(t)=\frac{1}{\sqrt{2\pi}}\,e^{\mathrm{i}kt},\quad k\in\mathbb{Z}, $$

orthonormal with respect to the scalar product

$$ (f,g)=\int_0^{2\pi} f(t)\,\overline{g(t)}\,\mathrm{d}t. $$
The partial sum

$$ f_n(t)=\sum_{k=-n}^{n} c_k\,e^{\mathrm{i}kt} $$

has coefficients determined by projection

$$ c_k=\frac{(f,e^{\mathrm{i}kt})}{(e^{\mathrm{i}kt},e^{\mathrm{i}kt})}=\frac{1}{2\pi}\int_0^{2\pi} f(t)\,e^{-\mathrm{i}kt}\,\mathrm{d}t, $$

that can be approximated by the Darboux sum on the partition $t_j=2\pi j/m$, $j=0,1,\dots,m-1$,

$$ c_k\cong\frac{1}{m}\sum_{j=0}^{m-1} f(t_j)\,e^{-\mathrm{i}kt_j}=\frac{1}{m}\sum_{j=0}^{m-1} f_j\,w^{-kj}, $$

with

$$ w=e^{2\pi\mathrm{i}/m} $$

denoting the $m$th root of unity. The Fourier coefficients are obtained through a linear mapping

$$ c=\frac{1}{m}\,\boldsymbol{F}f, $$

with $f=[\,f_0\ f_1\ \dots\ f_{m-1}\,]^T$, $c=[\,c_0\ c_1\ \dots\ c_{m-1}\,]^T$, and $\boldsymbol{F}\in\mathbb{C}^{m\times m}$ with elements

$$ F_{kj}=w^{-kj}. $$
The above discrete Fourier transform can be seen as a change of basis from the nodal basis, in which the coefficients of $f$ are the sample values $f_j=f(t_j)$, to the spectral basis, in which the coefficients are the $c_k$.
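As a concrete check of this change of basis, the following sketch (assuming NumPy; the variable names are illustrative) builds the full matrix $\boldsymbol{F}$ with elements $F_{kj}=w^{-kj}$ and compares the direct matrix-vector product against `np.fft.fft`, which uses the same sign convention without the $1/m$ normalization.

```python
import numpy as np

# Build the DFT matrix F with elements F_{kj} = w^{-kj}, w = exp(2*pi*i/m),
# and compare the O(m^2) matrix-vector product against numpy's FFT.
m = 8
w = np.exp(2j * np.pi / m)
k, j = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
F = w ** (-k * j)                      # full m-by-m matrix, O(m^2) storage

t = 2 * np.pi * np.arange(m) / m       # partition points t_j = 2*pi*j/m
f = np.sin(t) + 0.5 * np.cos(3 * t)    # sample vector f_j = f(t_j)

c = (F @ f) / m                        # Fourier coefficients by direct product
assert np.allclose(c, np.fft.fft(f) / m)
```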
Carrying out the matrix-vector product $\boldsymbol{F}f$ directly would require $\mathcal{O}(m^2)$ operations, but the cyclic structure of the matrix, arising from the exponentiation of the root of unity $w$, can be exploited to reduce the computational effort. Assume $m=2n$ and separate the even- and odd-indexed components of $f$:

$$ c_k=\frac{1}{m}\left( \sum_{j=0}^{n-1} f_{2j}\,w^{-2jk} + w^{-k}\sum_{j=0}^{n-1} f_{2j+1}\,w^{-2jk} \right), $$

where $w^2=e^{2\pi\mathrm{i}/n}$ is the $n$th root of unity.
Through the above, the matrix-vector product is reduced to two smaller matrix-vector products, each requiring $\mathcal{O}(n^2)=\mathcal{O}(m^2/4)$ operations. For $m=2^q$, recursion of the above procedure reduces the overall operation count to $\mathcal{O}(m\log_2 m)$, or, in general for $m$ composed of a small number of prime factors, $\mathcal{O}(m\log m)$. The overall algorithm is known as the fast Fourier transform or FFT.
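The even/odd recursion above can be sketched directly in code (a minimal radix-2 implementation, assuming NumPy; `fft_radix2` is an illustrative name, not a library routine):

```python
import numpy as np

def fft_radix2(f):
    """Recursive radix-2 FFT: splits even/odd samples, O(m log m) total.
    Assumes len(f) is a power of two; returns sum_j f_j w^{-kj}."""
    m = len(f)
    if m == 1:
        return f.astype(complex)
    even = fft_radix2(f[0::2])          # F_n applied to even samples
    odd = fft_radix2(f[1::2])           # F_n applied to odd samples
    twiddle = np.exp(-2j * np.pi * np.arange(m // 2) / m)  # w^{-k}
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

f = np.random.rand(16)
assert np.allclose(fft_radix2(f), np.fft.fft(f))
```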
One step of the FFT can be understood as a special matrix factorization

$$ \boldsymbol{F}_m=\begin{bmatrix} \boldsymbol{I}_n & \boldsymbol{D}_n \\ \boldsymbol{I}_n & -\boldsymbol{D}_n \end{bmatrix} \begin{bmatrix} \boldsymbol{F}_n & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{F}_n \end{bmatrix} \boldsymbol{P}_m, $$

where $\boldsymbol{D}_n=\operatorname{diag}\!\left(1,w^{-1},\dots,w^{-(n-1)}\right)$ is diagonal and $\boldsymbol{P}_m$ is the even-odd permutation matrix. Though the matrix $\boldsymbol{F}_m$ is full (all elements are non-zero), its factors are sparse, with many zero elements. The matrix $\boldsymbol{F}_m$ is said to be data sparse, in the sense that its specification requires many fewer than $m^2$ numbers. Other examples of data-sparse matrices include:
Toeplitz matrices, which have constant diagonal terms, e.g., for $m=3$,

$$ \boldsymbol{T}=\begin{bmatrix} a_0 & a_{-1} & a_{-2} \\ a_1 & a_0 & a_{-1} \\ a_2 & a_1 & a_0 \end{bmatrix}, $$

or in general, the $m^2$ elements of $\boldsymbol{T}\in\mathbb{R}^{m\times m}$ can be specified in terms of $2m-1$ numbers through $T_{ij}=a_{i-j}$.
Rank-1 updates arising in the singular value or eigenvalue decompositions have the form

$$ \boldsymbol{A}=\boldsymbol{u}\boldsymbol{v}^T, $$

and the $2m$ components of $\boldsymbol{u},\boldsymbol{v}\in\mathbb{R}^m$ are sufficient to specify the matrix $\boldsymbol{A}\in\mathbb{R}^{m\times m}$ with $m^2$ components. This can be generalized to the exterior (Kronecker) product of matrices $\boldsymbol{B}\in\mathbb{R}^{p\times q}$, $\boldsymbol{C}\in\mathbb{R}^{r\times s}$, through

$$ \boldsymbol{A}=\boldsymbol{B}\otimes\boldsymbol{C},\qquad A_{(i-1)r+k,\,(j-1)s+l}=B_{ij}\,C_{kl}. $$

The $pqrs$ components of $\boldsymbol{A}$ are specified through only $pq+rs$ components of $\boldsymbol{B},\boldsymbol{C}$.
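Data sparsity pays off when the full matrix is never formed. A small sketch of this for the Kronecker product (assuming NumPy; it uses the standard identity $(\boldsymbol{B}\otimes\boldsymbol{C})\operatorname{vec}(\boldsymbol{X})=\operatorname{vec}(\boldsymbol{C}\boldsymbol{X}\boldsymbol{B}^T)$ with column-major vectorization):

```python
import numpy as np

# Data sparsity of a Kronecker product: kron(B, C) has (pr)x(qs) entries
# but is fully specified by the pq + rs entries of B and C. A product
# with kron(B, C) never needs the full matrix:
# (B kron C) vec(X) = vec(C X B^T), at far lower cost.
p = q = r = s = 4
B = np.random.rand(p, q)
C = np.random.rand(r, s)
X = np.random.rand(s, q)               # vec(X) plays the role of the input

direct = np.kron(B, C) @ X.reshape(-1, order="F")   # forms the full matrix
sparse = (C @ X @ B.T).reshape(-1, order="F")       # uses only B and C
assert np.allclose(direct, sparse)
```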
The relevance to approximation of functions typically arises due to basis sets that are solutions of Sturm-Liouville problems. In the case of the Fourier transform, the exponentials $e^{\mathrm{i}kt}$ are eigenfunctions of the Sturm-Liouville problem

$$ -u''=\lambda u,\qquad u(0)=u(2\pi),\ u'(0)=u'(2\pi), $$
with eigenvalues $\lambda_k=k^2$. The solution set $\{u_k\}_{k\in\mathbb{N}}$ of a general Sturm-Liouville problem, to find $u,\lambda$ such that

$$ -\left[p(t)\,u'\right]'+q(t)\,u=\lambda\,w(t)\,u,\quad t\in[a,b], $$

together with appropriate boundary conditions, forms an orthonormal basis under the scalar product

$$ (f,g)=\int_a^b f(t)\,g(t)\,w(t)\,\mathrm{d}t, $$

and leads to approximations of the form

$$ f_n=\sum_{k=1}^{n} c_k u_k,\qquad c_k=(f,u_k). $$

Parseval's theorem states that

$$ \lVert f\rVert^2=(f,f)=\sum_{k=1}^{\infty} \lvert c_k\rvert^2, $$
read as an equality between the energy of $f$ and that of its spectrum $\{c_k\}$. By analogy to the finite-dimensional case, the Fourier transform is unitary in that it preserves lengths in the norm induced by the scalar product with weight function $w$.
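The discrete analogue of this unitarity is easy to verify (a sketch assuming NumPy): dividing the DFT by $\sqrt{m}$ gives a unitary matrix, so the 2-norm ("energy") of the samples equals that of the scaled coefficients.

```python
import numpy as np

# Discrete Parseval: the normalized DFT matrix U = F / sqrt(m) is unitary,
# so it preserves the 2-norm of the sample vector.
m = 32
f = np.random.rand(m)
c = np.fft.fft(f) / np.sqrt(m)         # unitary normalization
assert np.isclose(np.linalg.norm(f), np.linalg.norm(c))
```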
The bases arising from Sturm-Liouville problems are single-indexed, giving functions of increasing resolution over the entire definition domain. For example, $\sin kt$ resolves ever finer features over all of $[0,2\pi]$ as $k$ increases. When applied to a function with localized features, the number of terms $n$ must be increased, with increased resolution over the entire domain. This leads to uneconomical approximation series with many terms, as exemplified by the Gibbs phenomenon in approximation of a step function, e.g., $f(t)=-1$ for $t\in[0,\pi)$, and $f(t)=1$ for $t\in[\pi,2\pi)$. The approach can be represented as the decomposition of a space of functions by the direct sum
$$ \mathcal{F}=V_1\oplus V_2\oplus\cdots=\bigoplus_{k=1}^{\infty} V_k, $$

with $V_k=\operatorname{span}\{u_k\}$, for example

$$ V_k=\operatorname{span}\{e^{\mathrm{i}kt}\} $$

for the Fourier series.
Approximation of functions with localized features is more efficiently accomplished by choosing some generating function $\psi(t)$ and then defining a set of functions through translation and scaling, say

$$ \psi_{j,k}(t)=2^{j/2}\,\psi(2^j t-k),\qquad j,k\in\mathbb{Z}. $$
Such systems are known as wavelets, and the simplest example is the step function

$$ \psi(t)=\begin{cases} 1 & 0\le t<\tfrac12,\\ -1 & \tfrac12\le t<1,\\ 0 & \text{otherwise}, \end{cases} $$

with $\psi$ having support on the half-open interval $[0,1)$. The set $\{\psi_{j,k}\}_{j,k\in\mathbb{Z}}$ is known as the Haar orthonormal basis for $L^2(\mathbb{R})$ since

$$ (\psi_{j,k},\psi_{l,m})=\delta_{jl}\,\delta_{km}. $$
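The orthonormality relations can be spot-checked numerically (a sketch assuming NumPy; `haar` and `psi` are illustrative names). A midpoint quadrature on a dyadic grid is exact here because the integrands are piecewise constant on that grid.

```python
import numpy as np

def haar(t):
    """Mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def psi(j, k, t):
    """Translated/scaled family psi_{j,k}(t) = 2^{j/2} haar(2^j t - k)."""
    return 2 ** (j / 2) * haar(2 ** j * t - k)

# Spot-check (psi_{j,k}, psi_{l,m}) = delta_{jl} delta_{km} by midpoint rule.
N = 1024
t = (np.arange(N) + 0.5) / N           # midpoints on [0, 1)
dt = 1.0 / N
assert np.isclose(np.sum(psi(2, 1, t) * psi(2, 1, t)) * dt, 1.0)  # unit norm
assert np.isclose(np.sum(psi(2, 1, t) * psi(2, 2, t)) * dt, 0.0)  # translates
assert np.isclose(np.sum(psi(1, 0, t) * psi(2, 0, t)) * dt, 0.0)  # scales
```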
Approximations based upon a wavelet basis,

$$ f(t)\cong\sum_{j}\sum_{k} d_{j,k}\,\psi_{j,k}(t),\qquad d_{j,k}=(f,\psi_{j,k}), $$

allow identification of localized features in $f$.
The costly evaluation of scalar products in the double summation can be avoided by a reformulation of the expansion as

$$ f(t)\cong\sum_{k} c_{0,k}\,\varphi_{0,k}(t)+\sum_{j\ge 0}\sum_{k} d_{j,k}\,\psi_{j,k}(t), \tag{1} $$

with $\varphi_{j,k}(t)=2^{j/2}\,\varphi(2^j t-k)$. In addition to $\psi$ (the “mother” wavelet), an auxiliary scaling function $\varphi$ (the “father” wavelet) is defined, for example

$$ \varphi(t)=\begin{cases} 1 & 0\le t<1,\\ 0 & \text{otherwise}, \end{cases} $$

for the Haar wavelet system.
The above approach is known as a multiresolution representation and is based upon a hierarchical decomposition of the space of functions, e.g.,

$$ L^2(\mathbb{R})=V_0\oplus W_0\oplus W_1\oplus\cdots, $$

with

$$ V_j=\operatorname{span}\{\varphi_{j,k}\}_{k\in\mathbb{Z}},\qquad W_j=\operatorname{span}\{\psi_{j,k}\}_{k\in\mathbb{Z}}. $$

The hierarchical decomposition is based upon the vector subspace inclusions

$$ V_0\subset V_1\subset V_2\subset\cdots\subset L^2(\mathbb{R}), $$

and the relations

$$ V_{j+1}=V_j\oplus W_j,\qquad V_j\perp W_j, $$

that state that the orthogonal complement of $V_j$ within $V_{j+1}$ is $W_j$. Analogous to the FFT, a fast wavelet transformation can be defined to compute the coefficients of (1).
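For the Haar system, the fast wavelet transform reduces to repeated averaging and differencing of neighbor pairs. The sketch below (assuming NumPy; `fwt_haar`/`ifwt_haar` are illustrative names) shows the forward and inverse sweeps; the total cost is $m+m/2+\cdots=\mathcal{O}(m)$ for $m=2^q$ samples.

```python
import numpy as np

def fwt_haar(f):
    """Fast (Haar) wavelet transform: each sweep averages/differences
    neighbor pairs, producing scaling coefficients c (passed to the next,
    coarser level) and detail coefficients d."""
    c = np.asarray(f, dtype=float)
    details = []
    while len(c) > 1:
        s = (c[0::2] + c[1::2]) / np.sqrt(2)   # coarser scaling coeffs
        d = (c[0::2] - c[1::2]) / np.sqrt(2)   # detail (wavelet) coeffs
        details.append(d)
        c = s
    return c, details[::-1]                    # coarsest level first

def ifwt_haar(c, details):
    """Inverse transform: undo each averaging/differencing sweep."""
    for d in details:
        s = np.empty(2 * len(c))
        s[0::2] = (c + d) / np.sqrt(2)
        s[1::2] = (c - d) / np.sqrt(2)
        c = s
    return c

f = np.random.rand(16)
c, details = fwt_haar(f)
assert np.allclose(ifwt_haar(c, details), f)
```

The $1/\sqrt{2}$ factors keep each sweep orthogonal, so the transform preserves the 2-norm of the sample vector, mirroring the unitarity discussed above.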