Vector Space Dimension

Synopsis. The vector subspaces of a matrix characterize what vectors can or cannot be reached by linear combination of matrix columns or rows. How large are these subspaces? An answer is provided by the concept of minimal spanning sets, and the number of vectors in a minimal spanning set allows a precise definition of the intuitive concept of dimension. Upon completion of the framework for vector operations, a first example of the relevance of the theory is considered in the problem of classification of electrocardiograms as indicators of healthy or diseased states.

1. Linear dependence and independence

1.1. Zero factors

For the simple scalar mapping f:ℝ→ℝ, f(x)=ax, the condition f(x)=0 implies either that a=0 or x=0. The function f is a linear mapping, and a=0 can be understood as defining the zero mapping f(x)=0. When a≠0 the condition ax=0 necessarily implies that x=0. Scalar multiplication satisfies the zero product property: if a product is equal to zero, one of the factors must be zero.

Linear mappings between vector spaces, 𝒇:ℝ^n→ℝ^m, 𝒇(𝒙)=𝑨𝒙, 𝑨∈ℝ^(m×n), can exhibit remarkably different behavior. As in the scalar case, a zero mapping is defined by 𝑨=𝟎, in which case 𝒇(𝒙)=𝟎 for all 𝒙. In contrast to the scalar case, even when 𝑨≠𝟎, the equation 𝑨𝒙=𝟎 no longer necessarily implies that 𝒙=𝟎. For example,

𝑨=[ 𝒂1 𝒂2 ]=[ 1 1 ; 2 2 ],  𝑨𝒙=[ 1 1 ; 2 2 ][ 1 ; -1 ]=[ 0 ; 0 ]=𝟎. (1)

The linear combination above can be read as traveling x1=1 in the direction of 𝒂1, followed by traveling x2=-1 in the direction of 𝒂2. Since 𝒂1=𝒂2, the net result of the travel is arrival at the origin 𝑨𝒙=𝟎. Though 𝑨 has two column vectors, they both specify the same direction and are said to contain redundant data. A similar situation arises for

𝑩=[ 𝒃1 𝒃2 𝒃3 ]=[ 1 -1 1 ; 2 0 4 ; 3 1 7 ],  𝑩𝒙=[ 1 -1 1 ; 2 0 4 ; 3 1 7 ][ 2 ; 1 ; -1 ]=[ 0 ; 0 ; 0 ]=𝟎. (2)

The columns of 𝑩 specify three different directions, but the directions satisfy 𝒃3=2𝒃1+𝒃2. This implies that 𝒃3∈span{𝒃1,𝒃2}. Whereas the vectors 𝒂1,𝒂2 are colinear in ℝ², the vectors 𝒃1,𝒃2,𝒃3 are coplanar within ℝ³. In both cases matrix-vector multiplication is seen not to satisfy the zero product property, and from 𝑨𝒙=𝟎 one cannot deduce that either 𝑨=𝟎 or 𝒙=𝟎. This behavior arises from the definition of matrix-vector multiplication as a linear combination of the columns.

There are however cases when 𝑴𝒙=𝟎 implies that 𝒙=𝟎, most simply for the case 𝑴=𝑰. The need to distinguish between cases that satisfy or do not satisfy the zero product property leads to a general concept of linear dependence.
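These statements are easy to check numerically. A brief sketch in Julia, using the matrices 𝑨 and 𝑩 from the examples above:

```julia
# A has colinear columns: the nonzero vector x = [1, -1] is sent to zero
A = [1 1; 2 2]
println(A * [1; -1])          # [0, 0]

# B satisfies b3 = 2*b1 + b2, so x = [2, 1, -1] is also sent to zero
B = [1 -1 1; 2 0 4; 3 1 7]
println(B * [2; 1; -1])       # [0, 0, 0]
```

For 𝑴=𝑰, in contrast, 𝑴𝒙=𝒙, so 𝑴𝒙=𝟎 does force 𝒙=𝟎.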

Definition. The vectors 𝒂1,𝒂2,…,𝒂n∈V are linearly dependent if there exist n scalars x1,…,xn∈S, at least one of which is different from zero, such that

x1𝒂1+⋯+xn𝒂n=𝟎.

Introducing a matrix representation of the vectors

𝑨=[ 𝒂1 𝒂2 ⋯ 𝒂n ];  𝒙=[ x1 x2 ⋯ xn ]T

allows restating linear dependence as the existence of a non-zero vector, 𝒙≠𝟎, such that 𝑨𝒙=𝟎. Linear dependence can also be written as 𝑨𝒙=𝟎 ⇏ 𝒙=𝟎: one cannot deduce from the fact that the linear mapping 𝒇(𝒙)=𝑨𝒙 attains a zero value that the argument itself is zero. The converse of this statement would be that the only way to obtain 𝑨𝒙=𝟎 is for 𝒙=𝟎, or 𝑨𝒙=𝟎 ⇒ 𝒙=𝟎, leading to the concept of linear independence.

Definition. The vectors 𝒂1,𝒂2,…,𝒂n∈V are linearly independent if the only n scalars x1,…,xn∈S that satisfy

x1𝒂1+⋯+xn𝒂n=𝟎, (3)

are x1=0, x2=0,…,xn=0.

If 𝑨∈ℝ^(m×n) contains n linearly independent vectors 𝒂1,…,𝒂n∈ℝ^m, the null space of 𝑨 contains only one element, the zero vector:

N(𝑨)={𝒙:𝑨𝒙=𝟎}={𝟎} ⇔ 𝒂1,…,𝒂n linearly independent.
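Whether N(𝑨)={𝟎} can be checked with the nullspace function from Julia's LinearAlgebra standard library; a small sketch (the matrix M below is an arbitrary example with independent columns):

```julia
using LinearAlgebra

A = [1 1; 2 2]                  # dependent columns
println(size(nullspace(A), 2))  # 1: a one-dimensional null space

M = [1 0; 0 1; 1 1]             # independent columns
println(size(nullspace(M), 2))  # 0: N(M) = {0}
```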

1.2. Orthogonality

Establishing linear independence through the above definition requires algebraic calculations to deduce that x1𝒂1+⋯+xn𝒂n=𝟎 necessarily implies x1=0,…,xn=0. There is an important case, suggested by the behavior of the column vectors of the identity matrix 𝑰, that simplifies the calculations. Distinct column vectors 𝒆1,…,𝒆m within 𝑰 are orthogonal,

𝒆iT𝒆j=0, 1≤i,j≤m, i≠j,

and each individual vector is of unit 2-norm

𝒆iT𝒆i=1, 1≤i≤m.

In this case, multiplying the equation x1𝒆1+⋯+xm𝒆m=𝟎 by 𝒆jT immediately leads to xj=0, and the column vectors of 𝑰 are linearly independent. In general, any set of orthogonal non-zero vectors 𝒖1,…,𝒖n is linearly independent by the same argument. In matrix terms, the vectors are collected into a matrix 𝑼=[ 𝒖1 𝒖2 ⋯ 𝒖n ] and

𝑼T𝑼=[ 𝒖1T ; 𝒖2T ; ⋮ ; 𝒖nT ][ 𝒖1 𝒖2 ⋯ 𝒖n ]=[ 𝒖1T𝒖1 𝒖1T𝒖2 ⋯ 𝒖1T𝒖n ; 𝒖2T𝒖1 𝒖2T𝒖2 ⋯ 𝒖2T𝒖n ; ⋮ ; 𝒖nT𝒖1 𝒖nT𝒖2 ⋯ 𝒖nT𝒖n ]=[ ||𝒖1||² 0 ⋯ 0 ; 0 ||𝒖2||² ⋯ 0 ; ⋮ ; 0 0 ⋯ ||𝒖n||² ].

Definition. The column vectors 𝒖1,𝒖2,…,𝒖n∈ℝ^m of matrix 𝑼∈ℝ^(m×n) are orthogonal if

𝑼T𝑼=diag(||𝒖1||²,…,||𝒖n||²).

An especially simple and useful case is when the norms of all orthogonal vectors are equal to one.

Definition. The column vectors 𝒒1,𝒒2,…,𝒒n∈ℝ^m of matrix 𝑸∈ℝ^(m×n) are orthonormal if

𝑸T𝑸=𝑰.

Somewhat confusingly, a square matrix with orthonormal columns is said to be orthogonal.

Definition. The matrix 𝑸∈ℝ^(m×m) is orthogonal if

𝑸T𝑸=𝑸𝑸T=𝑰.

Example. The rotation matrix in ℝ² is orthogonal,

𝑹θ=[ cosθ -sinθ ; sinθ cosθ ],  𝑹θ𝑹θT=𝑹θT𝑹θ=[ cos²θ+sin²θ 0 ; 0 cos²θ+sin²θ ]=𝑰.

Example. A rotation matrix in ℝ³ is orthogonal,

𝑹θ=[ cosθ -sinθ 0 ; sinθ cosθ 0 ; 0 0 1 ],  𝑹θ𝑹θT=𝑹θT𝑹θ=[ cos²θ+sin²θ 0 0 ; 0 cos²θ+sin²θ 0 ; 0 0 1 ]=𝑰.
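Both rotation examples can be verified numerically; a sketch for an arbitrarily chosen angle:

```julia
using LinearAlgebra

θ = π/6                          # arbitrary angle
R2 = [cos(θ) -sin(θ); sin(θ) cos(θ)]
println(R2'*R2 ≈ Matrix(1.0I, 2, 2))   # true

R3 = [cos(θ) -sin(θ) 0; sin(θ) cos(θ) 0; 0 0 1]
println(R3'*R3 ≈ Matrix(1.0I, 3, 3))   # true
```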

Example. The reflection matrix across the direction 𝒒 of unit norm in ℝ^m,

𝑹𝒒=2𝒒𝒒T-𝑰,

is orthogonal

𝑹𝒒𝑹𝒒T=(2𝒒𝒒T-𝑰)(2𝒒𝒒T-𝑰)T=(2𝒒𝒒T-𝑰)(2𝒒𝒒T-𝑰)=4𝒒𝒒T𝒒𝒒T-4𝒒𝒒T+𝑰=4𝒒𝒒T-4𝒒𝒒T+𝑰=𝑰

since 𝒒𝒒T𝒒𝒒T=𝒒(𝒒T𝒒)𝒒T=𝒒(1)𝒒T=𝒒𝒒T.
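A numerical check of the reflector, with an arbitrarily chosen unit-norm direction 𝒒:

```julia
using LinearAlgebra

q = normalize([1.0, 2.0, 2.0])   # arbitrary direction scaled to unit norm
Rq = 2q*q' - I                   # reflection matrix across q
println(Rq*Rq' ≈ Matrix(1.0I, 3, 3))   # true: Rq is orthogonal
println(Rq*q ≈ q)                      # true: q itself is unchanged
```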

2. Basis and dimension

2.1. Minimal spanning sets

Vector spaces are closed under linear combination, and the span of a vector set ℬ={𝒂1,𝒂2,…} defines a vector subspace. If the entire set of vectors can be obtained from a spanning set, V=span ℬ, extending ℬ by an additional element, 𝒞=ℬ∪{𝒃} with 𝒃∈V, would be redundant since span ℬ=span 𝒞. Avoiding redundancy leads to consideration of a minimal spanning set. This is formalized by the concept of a basis, which also leads to a characterization of the size of a vector space by the cardinality of a basis set.

Definition. A set of vectors 𝒖1,…,𝒖n∈U is a basis for vector space 𝒰=(U,S,+,⋅) if

  1. 𝒖1,,𝒖n are linearly independent;

  2. span{𝒖1,,𝒖n}=U.

Definition. The number n of vectors 𝒖1,…,𝒖n∈U within a basis is the dimension of the vector space 𝒰=(U,S,+,⋅).

The above definitions underlie statements such as “ℝ³ represents three-dimensional space”. Since any 𝒗∈ℝ³ can be expressed as

𝒗=𝑰𝒗=v1𝒆1+v2𝒆2+v3𝒆3,

it results that {𝒆1,𝒆2,𝒆3} is a spanning set. The equation x1𝒆1+x2𝒆2+x3𝒆3=𝟎 leads to

[ x1 ; x2 ; x3 ]=[ 0 ; 0 ; 0 ]=𝟎,

hence {𝒆1,𝒆2,𝒆3} are independent, and therefore form a basis. The cardinality of the set {𝒆1,𝒆2,𝒆3} is equal to three, and, indeed, 3 represents three-dimensional space.
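The same argument is easily carried out numerically; a sketch verifying that the columns of 𝑰 form a basis of ℝ³:

```julia
using LinearAlgebra

E = Matrix(1.0I, 3, 3)      # columns e1, e2, e3
println(rank(E))            # 3: independent and spanning, dimension three

v = [2.0, -1.0, 3.0]        # arbitrary vector
println(E*v == v)           # true: v = v1*e1 + v2*e2 + v3*e3
```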

2.2. Dimensions of matrix spaces

The domain ℝ^n and co-domain ℝ^m of the linear mapping 𝒇:ℝ^n→ℝ^m, 𝒇(𝒙)=𝑨𝒙, are vector spaces. Within these vector spaces, subspaces associated with the linear mapping are defined by:

  1. the column space, C(𝑨)=span{𝒂1,…,𝒂n}⊆ℝ^m;

  2. the row space, C(𝑨T)⊆ℝ^n;

  3. the null space, N(𝑨)={𝒙:𝑨𝒙=𝟎}⊆ℝ^n;

  4. the left null space, N(𝑨T)={𝒚:𝑨T𝒚=𝟎}⊆ℝ^m.

The dimensions of these subspaces arise so often in applications as to warrant formal definition.

Definition. The rank of a matrix 𝑨∈ℝ^(m×n) is the dimension of its column space.

Definition. The nullity of a matrix 𝑨∈ℝ^(m×n) is the dimension of its null space.

The dimension of the row space is equal to that of the column space. This is a simple consequence of how scalars are organized into vectors. When the preferred organization is into column vectors, a linear combination is expressed as

𝒃=𝑨𝒙=x1𝒂1+⋯+xn𝒂n.

The same linear combination can also be expressed through rows by taking the transpose, an operation that swaps rows and columns,

𝒃T=(𝑨𝒙)T=𝒙T𝑨T=x1𝒂1T+⋯+xn𝒂nT.

Any statement about the linear combination of column vectors x1𝒂1+⋯+xn𝒂n also holds for the linear combination of row vectors x1𝒂1T+⋯+xn𝒂nT. In particular, the dimension of the column space equals that of the row space.
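These dimensions are directly computable. A sketch using the matrix 𝑩 from the opening examples, whose third column is a combination of the first two:

```julia
using LinearAlgebra

B = [1 -1 1; 2 0 4; 3 1 7]      # b3 = 2*b1 + b2
println((rank(B), rank(B')))    # (2, 2): column and row space dimensions agree
println(size(nullspace(B), 2))  # 1: nullity, and rank + nullity = n = 3
```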

3. Signal compression

3.1. Electrocardiograms

The first forays into the significance of matrix-associated subspaces through the geometry of lines or planes are useful, but belie the utility of these concepts in applications. Consider the problem of long-term care for cardiac patients. Electrocardiograms (ECGs) are recorded periodically, often over decades, and need to be stored for comparison and assessment of disease progression. Heartbeats arise from electrical signals occurring at frequencies of around f=250 Hz (cycles/second). A “single-lead” ECG to detect heart arrhythmia can require T=180 s recording time, so a vector representation would require m=fT=4.5×10⁴∼𝒪(10⁵) components, 𝒃∈ℝ^m,

𝒃=𝑰𝒃=b1𝒆1+b2𝒆2+⋯+bm𝒆m, (4)

where component bj is the voltage recorded at time tj=j(δt), and δt=1/f is the sampling time interval.

The linear combination (4) seems a natural way to organize recorded data, but is it the most insightful? Recall the inclined plane example in which forces seemed to be naturally described in a Cartesian system consisting of the horizontal and vertical directions, yet a more insightful description was seen to correspond to normal and tangential directions. Might there be an alternative linear combination, say

𝒃=x1𝒉1+x2𝒉2+⋯+xm𝒉m, (5)

that would represent the ECG in a more meaningful manner? The vectors entering into the linear combination are, as usual, organized into a matrix

𝑯=[ 𝒉1 𝒉2 ⋯ 𝒉m ]∈ℝ^(m×m),

such that the ECG is obtained through a matrix-vector product 𝒃=𝑯𝒙, 𝒙∈ℝ^m. In particular, in this alternative description, could a smaller number of components suffice to capture the essence of the ECG? Select the first n components by defining

𝑯n=[ 𝒉1 𝒉2 ⋯ 𝒉n ]∈ℝ^(m×n),  𝒄=𝑯n𝒚,  𝒄∈ℝ^m,  𝒚∈ℝ^n. (6)

The two-norm of the vectors' difference is the error ε=||𝒃-𝒄||2 incurred by keeping only n components. If 𝒄 is close to 𝒃 then ε should be small, and if n is much smaller than m, a compressed version of the ECG is obtained.
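A minimal sketch of this error measure, with an orthogonal 𝑯 generated here from a QR factorization purely for illustration (one natural choice of coefficients is 𝒚=𝑯nT𝒃):

```julia
using LinearAlgebra

m, n = 8, 4
H = Matrix(qr(randn(m, m)).Q)   # illustrative orthogonal matrix
b = randn(m)                    # stand-in for a recorded signal
y = H[:, 1:n]' * b              # coefficients of the first n columns
c = H[:, 1:n] * y               # compressed representation
ε = norm(b - c)                 # two-norm error of keeping n of m components
println(ε <= norm(b))           # true: projection cannot increase the norm
```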

At first sight, the representations (5,6) might seem more costly than (4), since not only do the scaling factors 𝒙∈ℝ^m or 𝒚∈ℝ^n have to be stored, but also the m² components of 𝑯. This difficulty is eliminated if 𝑯=[ 𝒉1 𝒉2 ⋯ 𝒉m ] is obtained by some simple rule, much as 𝑰=[ 𝒆1 𝒆2 ⋯ 𝒆m ] can be specified by any one of a number of procedures.

3.2. The identity matrix revisited

Within the algebraic structure of a vector space there is an identity element 1 with respect to the operation of scaling the vector 𝒃∈ℝ^m,

1𝒃=𝒃.

Analogously, the identity matrix 𝑰∈ℝ^(m×m) acts as an identity element with respect to matrix-vector multiplication

𝑰𝒃=𝒃.

Since matrix-matrix multiplication is simply successive matrix-vector multiplications,

𝑨𝑩=𝑨[ 𝒃1 𝒃2 ⋯ 𝒃p ]=[ 𝑨𝒃1 𝑨𝒃2 ⋯ 𝑨𝒃p ],

the identity matrix 𝑰 is an identity element for matrix multiplication

𝑰𝑩=𝑩.

Computer storage of the identity matrix 𝑰∈ℝ^(m×m) might naively seem to require m² locations, one for every component, but its remarkable structure implies specification of only a single number, the size m of 𝑰. The identity matrix can then be reconstructed as needed through a variety of procedures.

Predefined identity operator

In Julia, I is a predefined operator (exported by the LinearAlgebra standard library) from which the full identity matrix can be reconstructed if needed

using LinearAlgebra
m=3; Matrix(1.0I,m,m)

[ 1.0 0.0 0.0 ; 0.0 1.0 0.0 ; 0.0 0.0 1.0 ] (7)

Component definition

In terms of components

𝑰=[δij], 1≤i,j≤m,  δij={ 1 if i=j ; 0 otherwise },

where δij is known as the Kronecker delta and is readily transcribed into Julia.

δ(i,j) = (i == j) ? 1 : 0;
Id(m) = [δ(i,j) for i in 1:m, j in 1:m]; Id(3)

[ 1 0 0 ; 0 1 0 ; 0 0 1 ] (8)

Column vector definition

Another construction is by circular shifts of the column vector 𝒆1=[ 1 0 ⋯ 0 ]T∈ℝ^m.

e1T(m) = [1; zeros(m-1,1)]; e1T(3)

[ 1.0 ; 0.0 ; 0.0 ] (9)

ekT(m,k) = circshift(e1T(m),k); [ekT(3,0) ekT(3,1) ekT(3,2)]

[ 1.0 0.0 0.0 ; 0.0 1.0 0.0 ; 0.0 0.0 1.0 ] (10)

Diagonal matrix construction

The diagonal structure of 𝑰 also defines a reconstruction.

Diagonal(ones(3,3))

[ 1.0 0.0 0.0 ; 0.0 1.0 0.0 ; 0.0 0.0 1.0 ] (11)

3.3. Exterior product construction of 𝑰

A conceptually different reconstruction of 𝑰 when m=2^p uses block matrix operations and builds up larger matrices from smaller ones. This technique is quite instructive.

Since larger versions of the identity matrix will be obtained from smaller ones, a notation to specify the matrix size is needed. Let 𝑰k denote the identity matrix of size m×m with m=2^k. Start from k=0, m=1, in which case 𝑰0=[1]; also define

𝑰1=[ 1 0 ; 0 1 ].

The next matrix in this sequence would be 𝑰2 in which a block structure can be identified

𝑰2=[ 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ]=[ 𝑰1 𝟎 ; 𝟎 𝑰1 ]=[ 1⋅𝑰1 0⋅𝑰1 ; 0⋅𝑰1 1⋅𝑰1 ].

The scaling coefficients applied to each 𝑰1 block are recognized to be the entries of 𝑰1 itself. This suggests that the next matrix in the sequence could be obtained as

𝑰3=[ 1⋅𝑰2 0⋅𝑰2 ; 0⋅𝑰2 1⋅𝑰2 ].

It is useful to introduce a notation for these operations: 𝑰2=𝑰1⊗𝑰1, 𝑰3=𝑰1⊗𝑰2.

Definition. The exterior product of matrices 𝑨=[aij]∈ℝ^(m×n) and 𝑩∈ℝ^(p×q) is the matrix 𝑪∈ℝ^((mp)×(nq))

𝑪=𝑨⊗𝑩=[ a11𝑩 a12𝑩 ⋯ a1n𝑩 ; a21𝑩 a22𝑩 ⋯ a2n𝑩 ; ⋮ ; am1𝑩 am2𝑩 ⋯ amn𝑩 ].

Recall that the inner product of 𝒖,𝒗∈ℝ^m is the scalar 𝒖T𝒗, and one can think of an inner product as “reducing the dimensions of its factors”. In contrast, the exterior product “increases the dimensions of its factors”. An example of the exterior product has already been met, namely the projection matrix along the direction 𝒒 of unit norm (||𝒒||=1),

𝑷𝒒=𝒒𝒒T=𝒒⊗𝒒T=[ q1𝒒 q2𝒒 ⋯ qm𝒒 ].

Using the exterior product definition, the matrix 𝑰q is defined in terms of previous terms in the sequence as

𝑰q=𝑰1⊗𝑰q-1, q>1.

Such definitions based upon previous terms are said to be recursive.
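The recursion 𝑰q=𝑰1⊗𝑰q-1 is directly expressible in Julia, whose kron function implements exactly the block-scaling product defined above (the helper name Iq is chosen here for illustration):

```julia
using LinearAlgebra

# Recursive construction: I_0 = [1], I_q = I_1 ⊗ I_{q-1}
Iq(q) = q == 0 ? fill(1, 1, 1) : kron([1 0; 0 1], Iq(q - 1))

println(Iq(3) == Matrix(1I, 8, 8))   # true: the 2^3 × 2^3 identity
```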

3.4. Walsh-Hadamard matrices

An alternative to specifying the signal 𝒃∈ℝ^m as the linear combination 𝒃=𝑰𝒃 is now constructed by a different recursion procedure. As in the identity matrix case, let m=2^p and start from p=0, m=1 with 𝑯0=[1]. For the next term choose, however,

𝑯1=[ 1 1 ; 1 -1 ],

and define in general

𝑯q=𝑯1⊗𝑯q-1.

Julia can be extended through the Hadamard package to include definitions of the above Hadamard matrices.

using Hadamard
H0=hadamard(2^0)

[ 1 ] (12)

H1=hadamard(2^1)

[ 1 1 ; 1 -1 ] (13)

H2=hadamard(2^2)

[ 1 1 1 1 ; 1 -1 1 -1 ; 1 1 -1 -1 ; 1 -1 -1 1 ] (14)
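The recursion 𝑯q=𝑯1⊗𝑯q-1 can be reproduced with kron and checked against these matrices (the helper name Hq is chosen for illustration):

```julia
using LinearAlgebra

# Recursive construction: H_0 = [1], H_q = H_1 ⊗ H_{q-1}
Hq(q) = q == 0 ? fill(1, 1, 1) : kron([1 1; 1 -1], Hq(q - 1))

println(Hq(2) == [1 1 1 1; 1 -1 1 -1; 1 1 -1 -1; 1 -1 -1 1])   # true
```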

The column vectors of the identity matrix are mutually orthogonal as expressed by

𝑰T𝑰=[ 𝒆1T ; 𝒆2T ; ⋮ ; 𝒆mT ][ 𝒆1 𝒆2 ⋯ 𝒆m ]=[ 𝒆1T𝒆1 𝒆1T𝒆2 ⋯ 𝒆1T𝒆m ; 𝒆2T𝒆1 𝒆2T𝒆2 ⋯ 𝒆2T𝒆m ; ⋮ ; 𝒆mT𝒆1 𝒆mT𝒆2 ⋯ 𝒆mT𝒆m ]=𝑰.

The Hadamard matrices have similar behavior in that

𝑯qT𝑯q=2^q 𝑰m, m=2^q,

and thus are seen to have orthogonal columns.

H2'*H2

[ 4 0 0 0 ; 0 4 0 0 ; 0 0 4 0 ; 0 0 0 4 ] (15)

The structure of Hadamard matrices allows a reconstruction from small cases just as simple as that of the identity matrix, but the components of a Hadamard matrix are quite different. Rather than displaying the components, a graphical visualization is insightful. The spy(A) function displays a plot of the non-zero components of a matrix.

using LinearAlgebra, Hadamard, PyPlot
q=5; m=2^q; Iq=Matrix(1.0I,m,m); Hq=hadamard(m);
clf(); subplot(1,2,1); spy(Iq); subplot(1,2,2); spy(Hq.+1);
cd(homedir()*"/courses/MATH347DS/images"); savefig("L03Fig01.eps");

Figure 1. The structure of the identity and Hadamard matrices for m=25=32.

3.5. Sample sets and reconstructions

The behavior of the column vectors of 𝑰 and 𝑯 is instructive. Denote by b:[0,T]→ℝ the ECG recording that would be obtained if measurements were carried out for all t∈[0,T]. This is obviously impossible since it would require an infinite number of measurements. In practice, only the values bi=b(ti) are recorded for ti=i(δt), a sample of the infinite set of possible values. The sample set values are organized into the vector 𝒃. Function values b(t) for t≠ti can be reconstructed through some assumption, for instance that

b(t)=b(ti)=bi for t∈[ti,ti+1).

The above states that b(t) remains constant from ti to ti+δt. A simple example with b(t)=sin(t) illustrates the approach.

using PyPlot
T=2*pi; n=16; δt=T/n; t=(0:n-1)*δt; b=sin.(t);
m=n^2; δts=T/m; ts=(0:m-1)*δts; s=sin.(ts);
p=4*n; δtw=T/p; tw=(0:p-1)*δtw; w=reshape((b.*[1 1 1 1])',(1,p))';
figure(3); clf(); plot(t,b,"o",ts,s,tw,w);
title("Reconstruction of sine from samples");
savefig("L04Fig02.eps");

Figure 2. The reconstruction is a step function.

The reconstruction exhibits steps at which the function value changes. The column vectors of 𝑰 are well suited for such reconstructions.

using LinearAlgebra, Hadamard, PyPlot
n=8; In=Matrix(1.0I,n,n); Hn=hadamard(n);
figure(5); clf();
for j=1:n
   subplot(n,2,2*j-1); plot(In[:,j])   # column j of the identity matrix
   subplot(n,2,2*j);   plot(Hn[:,j])   # column j of the Hadamard matrix
end
subplot(n,2,1); title("Identity matrix");
subplot(n,2,2); title("Hadamard matrix");
savefig("L04Fig03.eps");

Figure 3. Comparison of column vectors of 𝑰,𝑯.

3.6. Naive ECG compression

What happens if the linear combinations expressing 𝒃∈ℝ^m

𝒃=𝑰𝒃=b1𝒆1+⋯+bm𝒆m=c1𝒉1+⋯+cm𝒉m=𝑯𝒄

are truncated? The scaling coefficients of the Hadamard linear combination are readily found

𝑯𝒄=𝒃 ⇒ 𝑯T(𝑯𝒄)=𝑯T𝒃 ⇒ (𝑯T𝑯)𝒄=𝑯T𝒃 ⇒ m𝑰𝒄=𝑯T𝒃 ⇒ 𝒄=(1/m)𝑯T𝒃.

Consider 𝒖 to be the truncation of 𝒃=𝑰𝒃 to n<m terms

𝒖=b1𝒆1+⋯+bn𝒆n.

Given the significance of the components bi, the above simply drops the ECG recording for times t>n(δt). The truncation of the Hadamard linear combination

𝒗=c1𝒉1+⋯+cn𝒉n

behaves quite differently.

using MAT
DataFileName = homedir()*"/courses/MATH347DS/data/ecg/ECGData.mat";
DataFile = matopen(DataFileName,"r");
dict = read(DataFile,"ECGData");
data = dict["Data"]';
size(data)

[ 65536 162 ] (16)

q=12; m=2^q; k=15; b=data[1:m,k];
Iq=Matrix(1.0I,m,m); Hq=hadamard(m); c=(1/m)*transpose(Hq)*b;
n=2^10; u=Iq[:,1:n]*b[1:n]; v=Hq[:,1:n]*c[1:n];
figure(1); clf(); subplot(3,1,1); plot(b);
subplot(3,1,2); plot(u);
subplot(3,1,3); plot(v);
cd(homedir()*"/courses/MATH347DS/images"); savefig("L03Fig02.eps");

Figure 4. Comparison of original ECG (top) containing m=212=4096 time samples with truncated linear combinations based upon n=210=1024 terms of the linear combinations using identity matrix 𝑰 (middle), and Hadamard matrix 𝑯 (bottom).

3.7.Basis subsets and compression

Neither of the truncated linear combinations seems particularly useful. Truncation of the 𝑰-based linear combination cuts off part of the signal, while that of the 𝑯-based linear combination introduces heart pulses not present in the original signal. The problem is the significance given to the ordering of the vectors in the linear combination. Consider first the truncation of 𝒃=𝑰𝒃 used to obtain 𝒖. Rather than cutting off part of the signal, an alternative data compression is to use a larger time sample, say 4δt instead of δt, a process known as down-sampling,

𝒅=𝒃[1:4:m]∈ℝ^(m/4).

The fast Walsh-Hadamard transform fwht from the Hadamard package offers another route to such coefficients; the plot compares 𝒄 with w=fwht(b).

w=fwht(b);
figure(6); clf(); plot(c,".");
plot(w,".");
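A minimal down-sampling sketch on sine samples (synthetic data standing in for an ECG), keeping every fourth value:

```julia
m = 16
t = (0:m-1) * (2π/m)
b = sin.(t)
d = b[1:4:m]                    # down-sample: keep every 4th component
println(length(d) == m ÷ 4)     # true: m/4 components retained
```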