Model Reduction

1. Projection of mappings

1.1. Reduced matrices

The least-squares problem

\min_{\bm{x} \in \mathbb{R}^n} \| \bm{y} - \bm{A} \bm{x} \|   (1)

focuses on a simpler representation of a data vector $\bm{y} \in \mathbb{R}^m$ as a linear combination of the column vectors of $\bm{A} \in \mathbb{R}^{m \times n}$. Consider some phenomenon modeled as a function between vector spaces $\bm{f} : X \to Y$, such that for input parameters $\bm{x} \in X$, the state of the system is $\bm{y} = \bm{f}(\bm{x})$. For most models $\bm{f}$ is differentiable, a transcription of the condition that the system should not exhibit jumps in behavior when changing the input parameters. Then, by appropriate choice of units and origin, a linearized model

\bm{y} = \bm{A} \bm{x}, \quad \bm{A} \in \mathbb{R}^{m \times n},

is obtained, exact if $\bm{y} \in C(\bm{A})$, and expressed as the least-squares problem (1) if $\bm{y} \notin C(\bm{A})$.

A simpler description is often sought, typically based on recognition that the inputs and outputs of the model can themselves be obtained as linear combinations $\bm{x} = \bm{B} \bm{u}$, $\bm{y} = \bm{C} \bm{v}$, involving a smaller set of parameters $\bm{u} \in \mathbb{R}^q$, $\bm{v} \in \mathbb{R}^p$, with $p < m$, $q < n$. The column spaces of the matrices $\bm{B} \in \mathbb{R}^{n \times q}$, $\bm{C} \in \mathbb{R}^{m \times p}$ are vector subspaces of the original sets of inputs and outputs, $C(\bm{B}) \leq \mathbb{R}^n$, $C(\bm{C}) \leq \mathbb{R}^m$. The sets of column vectors of $\bm{B}, \bm{C}$ each form a reduced basis for the system inputs and outputs if they are chosen to be of full rank. The reduced bases are assumed to have been orthonormalized through the Gram-Schmidt procedure, such that $\bm{B}^T \bm{B} = \bm{I}_q$ and $\bm{C}^T \bm{C} = \bm{I}_p$. Expressing the model inputs and outputs in terms of the reduced bases leads to

\bm{C} \bm{v} = \bm{A} \bm{B} \bm{u} \Rightarrow \bm{v} = \bm{C}^T \bm{A} \bm{B} \bm{u} \Rightarrow \bm{v} = \bm{R} \bm{u}.

The matrix $\bm{R} = \bm{C}^T \bm{A} \bm{B} \in \mathbb{R}^{p \times q}$ is called the reduced system matrix and is associated with a mapping $\bm{g} : U \to V$ that is the restriction of the mapping $\bm{f}$ to the vector subspaces $U, V$. When $\bm{f}$ is an endomorphism, $\bm{f} : X \to X$, $m = n$, the same reduced basis is used for both inputs and outputs, $\bm{x} = \bm{B} \bm{u}$, $\bm{y} = \bm{B} \bm{v}$, and the reduced system is

\bm{v} = \bm{R} \bm{u}, \quad \bm{R} = \bm{B}^T \bm{A} \bm{B}.

Since the columns of $\bm{B}$ are orthonormal, the projector onto $C(\bm{B})$ is $\bm{P}_{\bm{B}} = \bm{B} \bm{B}^T$. Applying the projector to the initial model

\bm{P}_{\bm{B}} \bm{y} = \bm{P}_{\bm{B}} \bm{A} \bm{x}

leads to $\bm{B} \bm{B}^T \bm{y} = \bm{B} \bm{B}^T \bm{A} \bm{x}$, and since $\bm{v} = \bm{B}^T \bm{y}$, the relation $\bm{B} \bm{v} = \bm{B} \bm{B}^T \bm{A} \bm{B} \bm{u}$ is obtained, conveniently grouped as

\bm{B} \bm{v} = \bm{B} (\bm{B}^T \bm{A} \bm{B}) \bm{u} \Rightarrow \bm{B} \bm{v} = \bm{B} (\bm{R} \bm{u}),

again leading to the reduced model $\bm{v} = \bm{R} \bm{u}$. The above calculation highlights that the reduced model is a projection of the full model $\bm{y} = \bm{A} \bm{x}$ onto $C(\bm{B})$.
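As a concrete illustration, the following minimal sketch (assuming NumPy; the matrix $\bm{A}$, dimensions, and random basis directions are arbitrary choices, not part of the text above) forms an orthonormal reduced basis by QR factorization and checks that the reduced model reproduces the projection of the full model onto $C(\bm{B})$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 8, 3                          # full and reduced dimensions (illustrative)
A = rng.standard_normal((m, m))      # full system matrix (endomorphism case)

# Orthonormal reduced basis: QR of q arbitrary directions gives B^T B = I_q
B, _ = np.linalg.qr(rng.standard_normal((m, q)))

R = B.T @ A @ B                      # reduced system matrix, q x q

# For an input x = B u lying in C(B), the reduced model gives P_B A x
u = rng.standard_normal(q)
x = B @ u
v = R @ u
print(np.allclose(B @ v, B @ B.T @ A @ x))   # True: B v = P_B A x
```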

1.2. Dynamical system model reduction

An often encountered situation is the reduction of a large-dimensional dynamical system

\bm{M} \ddot{\bm{x}} + \bm{D} \dot{\bm{x}} + \bm{K} \bm{x} = \bm{f}, \quad \bm{M}, \bm{D}, \bm{K} \in \mathbb{R}^{m \times m}, \quad \bm{x}, \bm{f} : \mathbb{R}_+ \to \mathbb{R}^m,   (2)
\dot{\bm{x}} = \frac{\mathrm{d} \bm{x}}{\mathrm{d} t}, \quad \ddot{\bm{x}} = \frac{\mathrm{d} \dot{\bm{x}}}{\mathrm{d} t},

a generalization to multiple degrees of freedom of the damped oscillator equation

m \ddot{x} + d \dot{x} + k x = f.

In (2), $\bm{x}(t)$ are the time-dependent coordinates of the system, $\bm{f}(t)$ the forces acting on the system, and $\bm{M}, \bm{D}, \bm{K}$ are the mass, drag, and stiffness matrices, respectively.

When $m \gg 1$, a reduced description is sought by linear combination of $n \ll m$ basis vectors

\bm{x} \cong \tilde{\bm{x}} = \bm{B} \bm{y} \Rightarrow \bm{M} \bm{B} \ddot{\bm{y}} + \bm{D} \bm{B} \dot{\bm{y}} + \bm{K} \bm{B} \bm{y} = \bm{f}.

Choose $\bm{B} \in \mathbb{R}^{m \times n}$ to have orthonormal columns, and project (2) onto $C(\bm{B})$ by multiplication with the projector $\bm{P} = \bm{B} \bm{B}^T$:

\bm{B} \bm{B}^T \bm{M} \bm{B} \ddot{\bm{y}} + \bm{B} \bm{B}^T \bm{D} \bm{B} \dot{\bm{y}} + \bm{B} \bm{B}^T \bm{K} \bm{B} \bm{y} = \bm{B} \bm{B}^T \bm{f}
\Rightarrow \bm{B} \left( \bm{B}^T \bm{M} \bm{B} \ddot{\bm{y}} + \bm{B}^T \bm{D} \bm{B} \dot{\bm{y}} + \bm{B}^T \bm{K} \bm{B} \bm{y} - \bm{B}^T \bm{f} \right) = \bm{0} \Rightarrow \bm{B} \bm{z} = \bm{0}.

Since $N(\bm{B}) = \{ \bm{0} \}$, deduce $\bm{z} = \bm{0}$, hence

\bm{B}^T \bm{M} \bm{B} \ddot{\bm{y}} + \bm{B}^T \bm{D} \bm{B} \dot{\bm{y}} + \bm{B}^T \bm{K} \bm{B} \bm{y} = \bm{B}^T \bm{f}.

Introduce the notations

\tilde{\bm{M}} = \bm{B}^T \bm{M} \bm{B}, \quad \tilde{\bm{D}} = \bm{B}^T \bm{D} \bm{B}, \quad \tilde{\bm{K}} = \bm{B}^T \bm{K} \bm{B}

for the reduced mass, drag, and stiffness matrices, with $\tilde{\bm{M}}, \tilde{\bm{D}}, \tilde{\bm{K}} \in \mathbb{R}^{n \times n}$ of smaller size. The reduced coordinates and forces are

\tilde{\bm{f}} = \bm{B}^T \bm{f}, \quad \bm{y}, \tilde{\bm{f}} \in \mathbb{R}^n.

The resulting reduced dynamical system is

\tilde{\bm{M}} \ddot{\bm{y}} + \tilde{\bm{D}} \dot{\bm{y}} + \tilde{\bm{K}} \bm{y} = \tilde{\bm{f}}.
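A minimal sketch of this reduction, assuming NumPy and placeholder system matrices (the diagonal stiffness and Rayleigh-type damping below are illustrative choices, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 100, 4                        # full and reduced dimensions (illustrative)

M = np.eye(m)                        # placeholder mass matrix
K = np.diag(np.linspace(1.0, 10.0, m))   # placeholder stiffness matrix
D = 0.05 * M + 0.01 * K              # assumed Rayleigh-type damping
f = rng.standard_normal(m)           # placeholder forcing

B, _ = np.linalg.qr(rng.standard_normal((m, n)))   # orthonormal reduced basis

# Reduced matrices and forcing: n x n and n-vector instead of m x m and m-vector
M_r, D_r, K_r = B.T @ M @ B, B.T @ D @ B, B.T @ K @ B
f_r = B.T @ f
print(M_r.shape, D_r.shape, K_r.shape, f_r.shape)   # (4, 4) three times, (4,)
```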

2. Reduced bases

One element is missing from the description of model reduction above: how is $\bm{B}$ determined? Domain-specific knowledge can often dictate an appropriate basis (e.g., a Fourier basis for periodic phenomena). Alternatively, an appropriate basis can be extracted from observations of the phenomenon, an approach known as data-driven modeling.

2.1. Correlation matrices

Correlation coefficient.
Consider two functions $x_1, x_2 : \mathbb{R} \to \mathbb{R}$ that represent data streams in time of inputs $x_1(t)$ and outputs $x_2(t)$ of some system. A basic question arising in modeling and data science is whether the inputs and outputs are themselves in a functional relationship. This usually is a consequence of incomplete knowledge of the system, such that while $x_1, x_2$ might be assumed to be the most relevant input and output quantities, this is not yet fully established. A typical approach is then to carry out repeated measurements, leading to a data set $D = \{ (x_1(t_i), x_2(t_i)) \mid i = 1, \dots, N \}$, thus defining a relation. Let $\bm{x}_1, \bm{x}_2 \in \mathbb{R}^N$ denote vectors containing the input and output values. The mean values $\mu_1, \mu_2$ of the input and output are estimated by the statistics

\mu_1 \cong \bar{x}_1 = \frac{1}{N} \sum_{i=1}^{N} x_1(t_i) = E[x_1], \quad \mu_2 \cong \bar{x}_2 = \frac{1}{N} \sum_{i=1}^{N} x_2(t_i) = E[x_2],

where $E$ is the expectation, seen to be a linear mapping $E : \mathbb{R}^N \to \mathbb{R}$, whose associated matrix is

\bm{E} = \frac{1}{N} \begin{bmatrix} 1 & 1 & \dots & 1 \end{bmatrix},

and the means are also obtained by matrix-vector multiplication (linear combination),

\bar{x}_1 = \bm{E} \bm{x}_1, \quad \bar{x}_2 = \bm{E} \bm{x}_2.
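For instance, a small check (NumPy assumed; the sample values are arbitrary) that the averaging matrix reproduces the sample mean:

```python
import numpy as np

x1 = np.array([2.0, 4.0, 6.0])            # arbitrary sample values
E = np.full((1, x1.size), 1.0 / x1.size)  # expectation as a 1 x N matrix
print(E @ x1, x1.mean())                  # [4.] 4.0 -- the same mean
```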

Deviation from the mean is measured by the standard deviation, defined for $x_1, x_2$ by

\sigma_1 = \sqrt{E[(x_1 - \mu_1)^2]}, \quad \sigma_2 = \sqrt{E[(x_2 - \mu_2)^2]}.

Note that the standard deviations are no longer linear mappings of the data.

Assume that the origin is chosen such that $\bar{x}_1 = \bar{x}_2 = 0$. One tool to establish whether the relation $D$ is also a function is to compute the correlation coefficient

\rho(x_1, x_2) = \frac{E[x_1 x_2]}{\sigma_1 \sigma_2} = \frac{E[x_1 x_2]}{\sqrt{E[x_1^2]} \sqrt{E[x_2^2]}},

that can be expressed in terms of a scalar product and the 2-norm as

\rho(x_1, x_2) = \frac{\bm{x}_1^T \bm{x}_2}{\| \bm{x}_1 \| \, \| \bm{x}_2 \|}.
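A quick numerical check (NumPy; the synthetic signals below are illustrative assumptions) compares this dot-product form against the statistical definition:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 200)
x1 = np.sin(2 * np.pi * t)                          # input stream
x2 = 0.8 * x1 + 0.1 * rng.standard_normal(t.size)   # noisy output stream

x1 = x1 - x1.mean()                  # center the data so the means vanish
x2 = x2 - x2.mean()

rho = (x1 @ x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))
print(rho, np.corrcoef(x1, x2)[0, 1])    # the two values agree
```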

Squaring each side of the norm property $\| \bm{x}_1 + \bm{x}_2 \| \leq \| \bm{x}_1 \| + \| \bm{x}_2 \|$ leads to

(\bm{x}_1 + \bm{x}_2)^T (\bm{x}_1 + \bm{x}_2) \leq \bm{x}_1^T \bm{x}_1 + \bm{x}_2^T \bm{x}_2 + 2 \| \bm{x}_1 \| \, \| \bm{x}_2 \| \Rightarrow \bm{x}_1^T \bm{x}_2 \leq \| \bm{x}_1 \| \, \| \bm{x}_2 \|,

known as the Cauchy-Schwarz inequality, which implies $-1 \leq \rho(x_1, x_2) \leq 1$. Depending on the value of $\rho$, the variables $x_1(t), x_2(t)$ are said to be:

  1. uncorrelated, if $\rho = 0$;

  2. correlated, if $\rho = 1$;

  3. anti-correlated, if $\rho = -1$.

The numerator of the correlation coefficient is known as the covariance of $x_1, x_2$,

\operatorname{cov}(x_1, x_2) = E[x_1 x_2].

The correlation coefficient can be interpreted as a normalization of the covariance, and (up to the $1/N$ factor in the expectation) the relation

\operatorname{cov}(x_1, x_2) = \bm{x}_1^T \bm{x}_2 = \rho(x_1, x_2) \, \| \bm{x}_1 \| \, \| \bm{x}_2 \|,

is the two-variable version of a more general relationship encountered when the system inputs and outputs become vectors.

Patterns in data.
Consider now a related problem: whether the input and output parameters $\bm{x} \in \mathbb{R}^n$, $\bm{y} \in \mathbb{R}^m$ thought to characterize a system are actually well chosen, or whether they are redundant in the sense that a more insightful description is furnished by $\bm{u} \in \mathbb{R}^q$, $\bm{v} \in \mathbb{R}^p$ with fewer components, $p < m$, $q < n$. Applying the same ideas as in the correlation coefficient, a sequence of $N$ measurements is made, leading to the data sets

\bm{X} = \begin{bmatrix} \bm{x}_1 & \bm{x}_2 & \dots & \bm{x}_n \end{bmatrix} \in \mathbb{R}^{N \times n}, \quad \bm{Y} = \begin{bmatrix} \bm{y}_1 & \bm{y}_2 & \dots & \bm{y}_m \end{bmatrix} \in \mathbb{R}^{N \times m}.

Again, by appropriate choice of the origin, the means of the above measurements are assumed to be zero,

E[\bm{x}] = \bm{0}, \quad E[\bm{y}] = \bm{0}.

Covariance matrices can be constructed by

\bm{C}_{\bm{X}} = \bm{X}^T \bm{X} = \begin{bmatrix} \bm{x}_1^T \\ \bm{x}_2^T \\ \vdots \\ \bm{x}_n^T \end{bmatrix} \begin{bmatrix} \bm{x}_1 & \bm{x}_2 & \dots & \bm{x}_n \end{bmatrix} = \begin{bmatrix} \bm{x}_1^T \bm{x}_1 & \bm{x}_1^T \bm{x}_2 & \dots & \bm{x}_1^T \bm{x}_n \\ \bm{x}_2^T \bm{x}_1 & \bm{x}_2^T \bm{x}_2 & \dots & \bm{x}_2^T \bm{x}_n \\ \vdots & \vdots & \ddots & \vdots \\ \bm{x}_n^T \bm{x}_1 & \bm{x}_n^T \bm{x}_2 & \dots & \bm{x}_n^T \bm{x}_n \end{bmatrix} \in \mathbb{R}^{n \times n}.

Consider now the SVDs $\bm{C}_{\bm{X}} = \bm{N} \bm{\Lambda} \bm{N}^T$, $\bm{X} = \bm{U} \bm{\Sigma} \bm{S}^T$, and from

\bm{C}_{\bm{X}} = \bm{X}^T \bm{X} = (\bm{U} \bm{\Sigma} \bm{S}^T)^T \bm{U} \bm{\Sigma} \bm{S}^T = \bm{S} \bm{\Sigma}^T \bm{U}^T \bm{U} \bm{\Sigma} \bm{S}^T = \bm{S} \bm{\Sigma}^T \bm{\Sigma} \bm{S}^T = \bm{N} \bm{\Lambda} \bm{N}^T,

identify $\bm{N} = \bm{S}$ and $\bm{\Lambda} = \bm{\Sigma}^T \bm{\Sigma}$.

Recall that the SVD returns an ordered set of singular values $\sigma_1 \geq \sigma_2 \geq \dots$, and associated singular vectors. In many applications the singular values decrease quickly, often exponentially fast. Taking the first $q$ singular modes then gives a basis set suitable for model reduction,

\bm{x} = \bm{S}_q \bm{u} = \begin{bmatrix} \bm{s}_1 & \bm{s}_2 & \dots & \bm{s}_q \end{bmatrix} \bm{u}.
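The identification $\bm{N} = \bm{S}$, $\bm{\Lambda} = \bm{\Sigma}^T \bm{\Sigma}$ and the extraction of a reduced basis can be checked numerically; the sketch below (NumPy; the synthetic low-rank-plus-noise data is an illustrative assumption) builds $\bm{S}_q$ from the first $q$ right singular vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
N, n, q = 500, 10, 2                 # samples, variables, reduced dimension

# Synthetic centered data dominated by q latent directions (illustrative)
X = rng.standard_normal((N, q)) @ rng.standard_normal((q, n))
X += 0.01 * rng.standard_normal((N, n))
X -= X.mean(axis=0)                  # zero-mean columns

_, sigma, St = np.linalg.svd(X, full_matrices=False)
C = X.T @ X                          # covariance matrix C_X (unnormalized)
lam = np.linalg.eigvalsh(C)[::-1]    # eigenvalues of C_X, descending

print(np.allclose(lam, sigma**2))    # True: Lambda = Sigma^T Sigma
S_q = St[:q].T                       # reduced basis: first q right singular vectors
```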

3. Stochastic systems - Karhunen-Loève theorem

The data reduction inherent in SVD representations is a generic feature of natural phenomena. A paradigm for physical systems is the evolution of correlated behavior against a backdrop of thermal energy, typically represented as a form of noise.

One mathematical technique to model such systems is the definition of a stochastic process $\{ X_t \}_{a \leq t \leq b}$, where for each fixed $t$, $X_t$ is a random variable, i.e., a measurable function $X : \Omega \to E$ from a set of possible outcomes $\Omega$ to a measurable space $E$. The set $\Omega$ is the sample space of a probability triple $(\Omega, \mathcal{F}, P)$, where for $S \subseteq E$

P(X \in S) = P(\{ \omega \in \Omega \mid X(\omega) \in S \}).

A measurable space is a set coupled with a procedure to determine measurable subsets, known as a $\sigma$-algebra.

Theorem. Let $X_t$ be a zero-mean ($\mathbb{E}[X_t] = 0$), square-integrable stochastic process defined over the probability space $(\Omega, \mathcal{F}, P)$ and indexed by $t$, $a \leq t \leq b$. Then $X_t$ admits a representation

X_t = \sum_{k=1}^{\infty} Z_k e_k(t),

with

Z_k = \int_a^b X_t \, e_k(t) \, \mathrm{d} t, \quad \mathbb{E}[Z_k] = 0, \quad \mathbb{E}[Z_i Z_j] = \delta_{ij} \sigma_j.
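As a classical example, for the Wiener process on $[0, 1]$ the eigenfunctions are $e_k(t) = \sqrt{2} \sin((k - \tfrac{1}{2}) \pi t)$. The sketch below (NumPy; the discretization sizes are arbitrary choices) estimates the leading Karhunen-Loève mode from an ensemble of sample paths via the SVD of the previous section and compares it with the analytical $e_1$:

```python
import numpy as np

rng = np.random.default_rng(4)
N, paths = 200, 2000                     # time steps, sample paths (illustrative)
t = (np.arange(N) + 0.5) / N             # midpoints of [0, 1] subintervals

dW = rng.standard_normal((paths, N)) / np.sqrt(N)
W = np.cumsum(dW, axis=1)                # zero-mean Wiener sample paths

# Empirical KL modes: right singular vectors of the path ensemble
_, _, Vt = np.linalg.svd(W, full_matrices=False)

e1 = np.sqrt(2.0) * np.sin(0.5 * np.pi * t)   # analytical first mode
e1_hat = Vt[0] * np.sqrt(N)              # rescale to unit L2([0, 1]) norm
print(abs(e1 @ e1_hat) / N)              # close to 1: the modes align
```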