MATH590: Approximation in d

Abstract

The methods of linear algebra are used to distinguish between different gaits.

Table of contents

1 Karhunen-Loève theorem ?

2Singular-value decomposition ?

3Covariance matrices ?

4Data-driven bases ?

5Application to gait analysis ?

1Karhunen-Loève theorem

A probability space is a triplet (Ω,,P) with:

Rather improperly named, a random variable X:ΩE, is a function defined on a sample space with values in a measurable space (e.g., d). For some measurable subset SE, the probability of XS is

Pr(XS)=P({ωΩ|X(ω)S.})

A stochastic process Xt(ω) is indexed collection of random variables. Often the index is time, and Xt:×ΩE. A centered stochastic process has mean value zero

𝔼[Xt(ω)]=0,

with 𝔼 the expectation operator.

The Karhunen-Loève theorem affirms the existence of a canonical description of a stochastic process as a linear combination of random variables Zk with time-dependent coefficients ek(t), or, conversely, as a linear combination of time-varying functions ek(t) with random coefficients Zk

Xt(ω)=X(t,ω)=k=1Zkek(t).

2Singular-value decomposition

As will be explained in detail in the STC module, the singular value decomposition is a discrete form of the Karhunen-Loève theorem. Let the matrix 𝐗 denote multiple samples of some real-valued, centered stochastic process

𝐗=( X(t1,ω1) X(t1,ω2) X(t1,ωN) X(t2,ω1) X(t2,ω2) X(t2,ωN) X(tm,ω1) X(tm,ω2) X(tm,ωN) )m×N.

There exist orthogonal matrices 𝐔m×m, 𝐕N×N, and the quasi-diagonal positive matrix 𝚺=diag(σ1,,σr,0,,0)+m×N such that

𝐗=𝐔𝚺𝐕T,

known as the singular value decomposition (SVD). Introducing the column vectors of 𝐔,𝐕

𝐔=( 𝐮1 𝐮2 𝐮m ),𝐕=( 𝐯1 𝐯2 𝐯N ),

the SVD can be rewritten in two important forms:

Sum of rank-1 updates

𝐗=k=1rσk𝐮k𝐯kT

Bases for linear operator 𝐗
𝐗( 𝐯1 𝐯N )=( 𝐗𝐯1 𝐗𝐯r 𝐗𝐯N )=( σ1𝐮1 σr𝐮r 𝟎 𝟎 ).

3Covariance matrices

The covariance of two centered random variables X(ti,ω)=Xi,X(tj,ω)=Xj is

cov[Xi,Xj]=𝔼[XiXj],

typically approximated through a statistic from N observations

𝔼[XiXj]=1Nk=1NXi(ωk)Xj(ωk)

The covariance matrix 𝐂 of m centered random variables X(t1,ω),,X(tm,ω) has entries

Cij=cov[Xi,Xj],

and is expressed as the matrix product

𝐂=1N𝐗𝐗Tm×m.

By construction the covariance matrix is symmetric positive definite (spd) and therefore admits an orthogonal eigendecomposition

𝐂=𝐔𝚲𝐔T=k=1mλk𝐮k𝐮kT.

Assume (by re-labeling if necessary) that λ1λ2λm. It is often the case that p eigenvalues dominate over all others (the strongly correlated modes)

λ1λ2λpλp+1λm,

and the correlation matrix is approximated by the first p rank-1 updates

𝐂k=1pλk𝐮k𝐮kT.

4Data-driven bases

The dominant correlated modes form a natural basis set for analysis of data. Rather than solving the covariance matrix eigenproblem the SVD of 𝐗 is used since

𝐗=𝐔𝚺𝐕T,N𝐂=𝐗𝐗T=𝐔𝚺𝐕T𝐕𝚺T𝐔T=𝐔𝚺𝚺T𝐔T.

5Application to gait analysis

Consider the data obtained from many individual gait measurements (either from different individuals or at different times for the same individual). The goal is to identify differences w.r.t. a mean gait and use those differences to (1) identify either an individual or (2) a particular type of walking (climbing stairs versus level walking). The procedure is demonstrated here for the second problem.

octave>

dir='/home/student/courses/MATH590/NUMdata';

chdir(dir);

LastName='Mitran';

d=textread(strcat(LastName,'.data'));

mu=max(size(d))/7; disp(mu)

14190

/home/student/courses/MATH590/NUMdata

octave>

data=reshape(d,[7,mu])';

data(1:6,1:7)

( 0 0 0 0 0 0 0 0.004 -0.7803 0.2999 -1.799 -8.0329 16.868 3.0539 0.005 -0.0953 0.0762 -0.0111 0.085944 21.033 12.152 0.006 -0.0953 0.0762 -0.0111 0.085944 21.033 12.152 0.007 -0.6242 -0.0203 -0.9742 -0.017189 42.279 -1.5986 0.025 -0.6242 -0.0203 -0.9742 -0.017189 42.279 -1.5986 )

octave>

Interpolate to obtain even-spaced data, and plot the data.

octave>

t0=data(1,1); t1=data(mu,1); ni = 2^ceil(log2(mu)); dt=(t1-t0)/ni;

ti=(0:ni-1)*dt; ti=ti';

ai=interp1(data(:,1),data(:,3),ti);

fid=fopen('periods.data','w');

fprintf(fid,'%f %f\n',[ti ai]');

fclose(fid);

octave>

GNUplot]

cd '/home/student/courses/MATH590/NUMdata'

set terminal postscript eps enhanced color

set style line 1 lt 2 lc rgb "0x00006400" lw 3

plot 'periods.data' u 1:2 w l ls 1

GNUplot]

Find the Fourier spectrum of the vertical acceleration data.

octave>

Ai=fft(ai); PAi=log10(Ai.*conj(Ai));

fid=fopen('spectrum.data','w');

fprintf(fid,'%f\n',PAi(1:ni/4));

fclose(fid);

octave>

GNUplot]

cd '/home/student/courses/MATH590/NUMdata'

set terminal postscript eps enhanced color

set style line 1 lt 2 lc rgb "0x00000064" lw 3

plot 'spectrum.data' ls 1

GNUplot]

There are several peaks within the Fourier spectrum. Seek the peak corresponding to a natural step period of approximately Ts=(t1-t0)/nSteps0.5s. This turns out to be close to the global peak as shown in the following calculations

octave>

ks=floor((t1-t0)/(0.5))

325

octave>

[PAimx imx]=max(PAi);

octave>

[PAimx imx]

( 7.3053 299 )

octave>

N=imx-1;

octave>

Ts=(t1-t0)/N

0.54679

octave>

Isolate the steps, find an average step waveform g, and compute the centered data matrix 𝐗

octave>

m = floor(ni/N); t=(0:m-1)'*dt;

a = reshape(ai(2:m*N+1),[m,N]); g = mean(a')';

X = a - repmat(g,1,N);

octave>

fid=fopen('steps.data','w');

i=1; while(i<N)

fprintf(fid,'%f %f\n',[t a(:,i)]');

fprintf(fid,'\n');

i=i+5;

endwhile;

fclose(fid);

octave>

fid=fopen('gstep.data','w');

fprintf(fid,'%f %f\n',[t g]');

fclose(fid);

octave>

GNUplot]

cd '/home/student/courses/MATH590/NUMdata'

set terminal postscript eps enhanced color

set style line 1 lt 2 lc rgb "0x00000064" lw 3

set style line 2 lt 2 lc rgb "0x00ff0000" lw 6

plot 'steps.data' u 1:2 w l ls 1, 'gstep.data' u 1:2 w l ls 2

GNUplot]

Find the natural basis for the problem by computing the SVD of 𝐗, and investigate the dominant singular values

octave>

[U,S,V]=svd(X,0);

octave>

(diag(S)(1:6))'

( 108.031 93.528 23.645 19.604 17.588 15.726 )

octave>

fid=fopen('gaits.data','w');

fprintf(fid,'%f %f %f %f\n',[t U(:,1:2) g]');

fclose(fid);

octave>

From the above each step can be characterized by the first two components 𝐮1,𝐮2. Plot these modes and compare to the average gait 𝐠.

GNUplot]

cd '/home/student/courses/MATH590/NUMdata'

set terminal postscript eps enhanced color

set style line 1 lt 2 lc rgb "0x00000064" lw 3

set style line 2 lt 2 lc rgb "0x00006464" lw 3

set style line 3 lt 2 lc rgb "0x00FF0000" lw 3

plot 'gaits.data' u 1:2 w l ls 1, ” u 1:3 w l ls 2, ” u 1:4 w l ls 3

GNUplot]

Find the coordinates 𝐜 of each step along these directions.

octave>

c=X'*U(:,1:2);

fid=fopen('coefs.data','w');

fprintf(fid,'%f %f\n',c');

fclose(fid);

octave>

size(c)

( 298 2 )

octave>

GNUplot]

cd '/home/student/courses/MATH590/NUMdata'

set terminal postscript eps enhanced color

set style line 1 lt 2 lc rgb "0x00000064" lw 3

plot 'coefs.data' u 1:2 w p ls 1

GNUplot]

At this point, an economical representation of each of the N steps has been obtained, and the next question is to classify the observed results. This forms the objective of clustering analysis that relies on the mathematical theory of sets, as presented in the next module.