Model Reduction

1. Projection of mappings

The least-squares problem

$$ \min_{\bm{x} \in \mathbb{R}^n} \| \bm{y} - \bm{A} \bm{x} \| \qquad (1) $$

focuses on a simpler representation of a data vector $\bm{y} \in \mathbb{R}^m$ as a linear combination of the column vectors of $\bm{A} \in \mathbb{R}^{m \times n}$. Consider some phenomenon modeled as a function between vector spaces $\bm{f}: X \to Y$, such that for input parameters $\bm{x} \in X$, the state of the system is $\bm{y} = \bm{f}(\bm{x})$. For most models $\bm{f}$ is differentiable, a transcription of the condition that the system should not exhibit jumps in behavior when changing the input parameters. Then, by an appropriate choice of units and origin, a linearized model

$$ \bm{y} = \bm{A} \bm{x}, \quad \bm{A} \in \mathbb{R}^{m \times n}, $$

is obtained if $\bm{y} \in C(\bm{A})$, expressed as (1) if $\bm{y} \notin C(\bm{A})$.
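The two cases can be checked numerically. Below is a minimal sketch, assuming numpy, with $\bm{A}$ and the data randomly generated only for illustration: the least-squares solver reproduces $\bm{y}$ exactly when $\bm{y} \in C(\bm{A})$, and returns the minimizer of (1) otherwise.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))

# Case y in C(A): the linearized model y = A x holds exactly.
x_true = rng.standard_normal(n)
y_in = A @ x_true
x, *_ = np.linalg.lstsq(A, y_in, rcond=None)
print(np.linalg.norm(y_in - A @ x))   # ~0: y is reproduced exactly

# Case y not in C(A): (1) yields the best approximation within C(A).
y_out = y_in + rng.standard_normal(m)
x, *_ = np.linalg.lstsq(A, y_out, rcond=None)
print(np.linalg.norm(y_out - A @ x))  # > 0: least-squares residual
```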

A simpler description is often sought, typically based on the recognition that the inputs and outputs of the model can themselves be obtained as linear combinations $\bm{x} = \bm{B} \bm{u}$, $\bm{y} = \bm{C} \bm{v}$, involving a smaller set of parameters $\bm{u} \in \mathbb{R}^q$, $\bm{v} \in \mathbb{R}^p$, with $p < m$, $q < n$. The column spaces of the matrices $\bm{B} \in \mathbb{R}^{n \times q}$, $\bm{C} \in \mathbb{R}^{m \times p}$ are vector subspaces of the original sets of inputs and outputs, $C(\bm{B}) \leq \mathbb{R}^n$, $C(\bm{C}) \leq \mathbb{R}^m$. The sets of column vectors of $\bm{B}, \bm{C}$ each form a reduced basis for the system inputs and outputs if they are chosen to be of full rank. The reduced bases are assumed to have been orthonormalized through the Gram-Schmidt procedure, such that $\bm{B}^T \bm{B} = \bm{I}_q$ and $\bm{C}^T \bm{C} = \bm{I}_p$. Expressing the model inputs and outputs in terms of the reduced bases leads to

$$ \bm{C} \bm{v} = \bm{A} \bm{B} \bm{u} \Rightarrow \bm{v} = \bm{C}^T \bm{A} \bm{B} \bm{u} \Rightarrow \bm{v} = \bm{R} \bm{u}. $$

The matrix $\bm{R} = \bm{C}^T \bm{A} \bm{B}$ is called the reduced system matrix and is associated with a mapping $\bm{g}: U \to V$ that is a restriction of the mapping $\bm{f}$ to the vector subspaces $U, V$. When $\bm{f}$ is an endomorphism, $\bm{f}: X \to X$, $m = n$, the same reduced basis is used for both inputs and outputs, $\bm{x} = \bm{B} \bm{u}$, $\bm{y} = \bm{B} \bm{v}$, and the reduced system is

$$ \bm{v} = \bm{R} \bm{u}, \quad \bm{R} = \bm{B}^T \bm{A} \bm{B}. $$
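A small numerical sketch of the construction (assuming numpy; thin QR factorization is used here in place of Gram-Schmidt to orthonormalize randomly generated bases) confirms that the reduced output $\bm{R} \bm{u}$ agrees with $\bm{C}^T \bm{A} \bm{B} \bm{u}$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p, q = 10, 8, 3, 2
A = rng.standard_normal((m, n))       # full model matrix

# Orthonormal reduced bases; thin QR stands in for Gram-Schmidt.
B, _ = np.linalg.qr(rng.standard_normal((n, q)))   # B^T B = I_q
C, _ = np.linalg.qr(rng.standard_normal((m, p)))   # C^T C = I_p

R = C.T @ A @ B                       # reduced system matrix
u = rng.standard_normal(q)
print(np.allclose(R @ u, C.T @ (A @ (B @ u))))     # v = R u
```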

Since $\bm{B}$ is assumed to have orthonormal columns, the projector onto $C(\bm{B})$ is $\bm{P}_{\bm{B}} = \bm{B} \bm{B}^T$. Applying the projector to the initial model

$$ \bm{P}_{\bm{B}} \bm{y} = \bm{P}_{\bm{B}} \bm{A} \bm{x} $$

leads to $\bm{B} \bm{B}^T \bm{y} = \bm{B} \bm{B}^T \bm{A} \bm{x}$, and since $\bm{v} = \bm{B}^T \bm{y}$ the relation $\bm{B} \bm{v} = \bm{B} \bm{B}^T \bm{A} \bm{B} \bm{u}$ is obtained, conveniently grouped as

$$ \bm{B} \bm{v} = \bm{B} (\bm{B}^T \bm{A} \bm{B}) \bm{u} \Rightarrow \bm{B} \bm{v} = \bm{B} (\bm{R} \bm{u}), $$

again leading to the reduced model $\bm{v} = \bm{R} \bm{u}$. The above calculation highlights that the reduced model is a projection of the full model $\bm{y} = \bm{A} \bm{x}$ onto $C(\bm{B})$.
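The projection interpretation can be verified numerically as well. In the sketch below (again assuming numpy and randomly generated data), applying $\bm{P}_{\bm{B}}$ to the full model output for an input drawn from $C(\bm{B})$ coincides with lifting the reduced output back through $\bm{B}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, q = 6, 2
A = rng.standard_normal((n, n))                    # endomorphism, m = n
B, _ = np.linalg.qr(rng.standard_normal((n, q)))   # B^T B = I_q

P_B = B @ B.T                                      # projector onto C(B)
R = B.T @ A @ B                                    # reduced system matrix

u = rng.standard_normal(q)
x = B @ u                                          # input drawn from C(B)
# Projecting the full model matches lifting the reduced one: P_B A x = B (R u)
print(np.allclose(P_B @ (A @ x), B @ (R @ u)))
```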

2. Reduced bases

2.1. Correlation matrices

Correlation coefficient.
Consider two functions $x_1, x_2: \mathbb{R} \to \mathbb{R}$ that represent data streams in time of inputs $x_1(t)$ and outputs $x_2(t)$ of some system. A basic question arising in modeling and data science is whether the inputs and outputs are themselves in a functional relationship. This usually is a consequence of incomplete knowledge of the system: while $x_1, x_2$ might be assumed to be the most relevant input and output quantities, this is not yet fully established. A typical approach is then to carry out repeated measurements, leading to a data set $D = \{ (x_1(t_i), x_2(t_i)) \mid i = 1, \dots, N \}$, thus defining a relation. Let $\bm{x}_1, \bm{x}_2 \in \mathbb{R}^N$ denote the vectors containing the input and output values. The mean values $\mu_1, \mu_2$ of the input and output are estimated by the statistics

$$ \mu_1 \cong \langle x_1 \rangle = \frac{1}{N} \sum_{i=1}^N x_1(t_i) = E[x_1], \quad \mu_2 \cong \langle x_2 \rangle = \frac{1}{N} \sum_{i=1}^N x_2(t_i) = E[x_2], $$

where $E$ is the expectation, seen to be a linear mapping $E: \mathbb{R}^N \to \mathbb{R}$ whose associated matrix is

$$ \bm{E} = \frac{1}{N} \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}, $$

and the means are also obtained by matrix-vector multiplication (a linear combination),

$$ \langle x_1 \rangle = \bm{E} \bm{x}_1, \quad \langle x_2 \rangle = \bm{E} \bm{x}_2. $$
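As a brief illustration of the expectation as a linear mapping (a sketch assuming numpy; the data is random), the row matrix $\bm{E}$ reproduces the sample mean by matrix-vector multiplication:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100
x1 = rng.standard_normal(N)

E = np.ones((1, N)) / N      # matrix of the expectation mapping E: R^N -> R
print((E @ x1).item())       # mean as a matrix-vector product
print(x1.mean())             # agrees with the usual sample mean
```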

Deviation from the mean is measured by the standard deviation, defined for $x_1, x_2$ by

$$ \sigma_1 = \sqrt{E[(x_1 - \mu_1)^2]}, \quad \sigma_2 = \sqrt{E[(x_2 - \mu_2)^2]}. $$

Note that the standard deviations are no longer linear mappings of the data.

Assume that the origin is chosen such that $\langle x_1 \rangle = \langle x_2 \rangle = 0$. One tool to establish whether the relation $D$ is also a function is to compute the correlation coefficient

$$ \rho(x_1, x_2) = \frac{E[x_1 x_2]}{\sigma_1 \sigma_2} = \frac{E[x_1 x_2]}{\sqrt{E[x_1^2]\, E[x_2^2]}}, $$

that can be expressed in terms of a scalar product and 2-norm as

$$ \rho(x_1, x_2) = \frac{\bm{x}_1^T \bm{x}_2}{\| \bm{x}_1 \|\, \| \bm{x}_2 \|}. $$

Squaring each side of the norm property $\| \bm{x}_1 + \bm{x}_2 \| \leq \| \bm{x}_1 \| + \| \bm{x}_2 \|$ leads to

$$ (\bm{x}_1 + \bm{x}_2)^T (\bm{x}_1 + \bm{x}_2) \leq \bm{x}_1^T \bm{x}_1 + \bm{x}_2^T \bm{x}_2 + 2 \| \bm{x}_1 \|\, \| \bm{x}_2 \| \Rightarrow \bm{x}_1^T \bm{x}_2 \leq \| \bm{x}_1 \|\, \| \bm{x}_2 \|, $$

known as the Cauchy-Schwarz inequality, which implies $-1 \leq \rho(x_1, x_2) \leq 1$. Depending on the value of $\rho$, the variables $x_1(t), x_2(t)$ are said to be:

  1. uncorrelated, if $\rho = 0$;

  2. correlated, if $\rho = 1$;

  3. anti-correlated, if $\rho = -1$.

The numerator of the correlation coefficient is known as the covariance of $x_1, x_2$,

$$ \mathrm{cov}(x_1, x_2) = E[x_1 x_2]. $$

The correlation coefficient can be interpreted as a normalization of the covariance, and the relation

$$ \mathrm{cov}(x_1, x_2) = \bm{x}_1^T \bm{x}_2 = \rho(x_1, x_2)\, \| \bm{x}_1 \|\, \| \bm{x}_2 \|, $$

is the two-variable version of a more general relationship encountered when the system inputs and outputs become vectors.
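These formulas are straightforward to evaluate numerically. The sketch below (assuming numpy, with a synthetic noisy-signal pair chosen for illustration) computes $\rho$ from the scalar product and norms and compares it with numpy's built-in correlation coefficient:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 500
t = np.linspace(0.0, 1.0, N)
x1 = np.sin(2 * np.pi * t)
x2 = x1 + 0.1 * rng.standard_normal(N)    # noisy copy of x1

# Shift to zero mean, as assumed when choosing the origin.
x1 = x1 - x1.mean()
x2 = x2 - x2.mean()

rho = (x1 @ x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))
print(rho)                                # close to 1: strongly correlated
print(np.corrcoef(x1, x2)[0, 1])          # matches numpy's built-in
```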

Patterns in data.
Consider now a related problem: whether the input and output parameters $\bm{x} \in \mathbb{R}^n$, $\bm{y} \in \mathbb{R}^m$ thought to characterize a system are actually well chosen, or whether they are redundant in the sense that a more insightful description is furnished by $\bm{u} \in \mathbb{R}^q$, $\bm{v} \in \mathbb{R}^p$ with fewer components, $p < m$, $q < n$. Applying the same ideas as in the correlation coefficient, a sequence of $N$ measurements is made, leading to the data sets

$$ \bm{X} = \begin{bmatrix} \bm{x}_1 & \bm{x}_2 & \cdots & \bm{x}_n \end{bmatrix} \in \mathbb{R}^{N \times n}, \quad \bm{Y} = \begin{bmatrix} \bm{y}_1 & \bm{y}_2 & \cdots & \bm{y}_m \end{bmatrix} \in \mathbb{R}^{N \times m}. $$

Again, by appropriate choice of the origin, the means of the above measurements are assumed to be zero,

$$ E[\bm{x}] = \bm{0}, \quad E[\bm{y}] = \bm{0}. $$

Covariance matrices can be constructed by

$$ \bm{C}_{\bm{X}} = \bm{X}^T \bm{X} = \begin{bmatrix} \bm{x}_1^T \\ \bm{x}_2^T \\ \vdots \\ \bm{x}_n^T \end{bmatrix} \begin{bmatrix} \bm{x}_1 & \bm{x}_2 & \cdots & \bm{x}_n \end{bmatrix} = \begin{bmatrix} \bm{x}_1^T \bm{x}_1 & \bm{x}_1^T \bm{x}_2 & \cdots & \bm{x}_1^T \bm{x}_n \\ \bm{x}_2^T \bm{x}_1 & \bm{x}_2^T \bm{x}_2 & \cdots & \bm{x}_2^T \bm{x}_n \\ \vdots & \vdots & \ddots & \vdots \\ \bm{x}_n^T \bm{x}_1 & \bm{x}_n^T \bm{x}_2 & \cdots & \bm{x}_n^T \bm{x}_n \end{bmatrix} \in \mathbb{R}^{n \times n}. $$

Consider now the SVDs $\bm{C}_{\bm{X}} = \bm{N} \bm{\Lambda} \bm{N}^T$, $\bm{X} = \bm{U} \bm{\Sigma} \bm{S}^T$, and from

$$ \bm{C}_{\bm{X}} = \bm{X}^T \bm{X} = (\bm{U} \bm{\Sigma} \bm{S}^T)^T \bm{U} \bm{\Sigma} \bm{S}^T = \bm{S} \bm{\Sigma}^T \bm{U}^T \bm{U} \bm{\Sigma} \bm{S}^T = \bm{S} \bm{\Sigma}^T \bm{\Sigma} \bm{S}^T = \bm{N} \bm{\Lambda} \bm{N}^T, $$

identify $\bm{N} = \bm{S}$ and $\bm{\Lambda} = \bm{\Sigma}^T \bm{\Sigma}$.

Recall that the SVD returns an ordered set of singular values $\sigma_1 \geq \sigma_2 \geq \cdots$, and associated singular vectors. In many applications the singular values decrease quickly, often exponentially fast. Taking the first $q$ singular modes then gives a basis set suitable for model reduction,

$$ \bm{x} = \bm{S}_q \bm{u} = \begin{bmatrix} \bm{s}_1 & \bm{s}_2 & \cdots & \bm{s}_q \end{bmatrix} \bm{u}. $$
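The identification $\bm{N} = \bm{S}$, $\bm{\Lambda} = \bm{\Sigma}^T \bm{\Sigma}$ and the choice of the reduced basis $\bm{S}_q$ can be illustrated as follows (a sketch assuming numpy, with synthetic data built to have $q$ dominant directions):

```python
import numpy as np

rng = np.random.default_rng(5)
N, n, q = 200, 6, 2

# Synthetic measurements dominated by q directions, plus small noise.
modes, _ = np.linalg.qr(rng.standard_normal((n, q)))
X = rng.standard_normal((N, q)) @ modes.T + 0.01 * rng.standard_normal((N, n))
X = X - X.mean(axis=0)                    # zero-mean columns, E[x] = 0

U, sigma, St = np.linalg.svd(X, full_matrices=False)
S = St.T                                  # X = U Sigma S^T
C_X = X.T @ X

# Lambda = Sigma^T Sigma: eigenvalues of C_X are squared singular values.
print(np.allclose(C_X @ S, S @ np.diag(sigma**2)))
print(sigma)                              # rapid decay beyond the first q

S_q = S[:, :q]                            # reduced basis, x ~ S_q u
```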