Formal Rules

1.Algebraic structures

1.1.Typical structures

A vector space has been introduced as a 4-tuple 𝒱=(V,S,+,) with specific behavior of the vector addition and scaling operations. Arithmetic operations between scalars were implicitly assumed to be similar to those of the real numbers, but also must be specified to obtain a complete definition of a vector space. Algebra defines various structures that specify the behavior operations with objects. Knowledge of these structures is useful not only in linear algebra, but also in other mathematical approaches to data analysis such as topology or geometry.

Groups.
A group is a 2-tuple 𝒢=(G,+) containing a set G and an operation + with properties from Table 2. If a,bG, a+b=b+a, the group is said to be commutative. Besides the familiar example of integers under addition (,+), symmetry groups that specify spatial or functional relations are of particular interest. The rotations by 0,π2,π,3π2 or vertices of a square form a group.

Addition rules
a+bG Closure
a+(b+c)=(a+b)+c Associativity
0+a=a Identity element
a+(-a)=0 Inverse element

Table 1. Group 𝒢=(G,+) properties, for a,b,cG

Rings.
A ring is a 3-tuple =(R,+,) containing a set R and two operations +, with properties from Table 2. As is often the case, a ring is more complex structure built up from simpler algebraic structures. With respect to addition a ring has the properties of a commutative group. Only associativity and existence of an identity element is imposed for multiplication. Matrix addition and multiplication has the structure of ring (m×m,+,).

Addition rules
(R,+) is a commutative (Abelian) group
Multiplication rules
abR Closure
(ab)c=a(bc) Associativity
a1=1a=a Identity element
Distributivity
a(b+c)=(ab)+(ac) on the left
(a+b)c=(ac)+(bc) on the right

Table 2. Ring =(R,+,) properties, for a,b,cR.

Fields.
A ring is a 3-tuple =(F,+,) containing a set F and two operations +,, each with properties of a commutative group, but with special behavior for the inverse of the null element. The multiplicative inverse is denoted as a-1. Scalars S in the definition of a vector space must satisfy the properties of a field. Since the operations are often understood from context a field might be referred to as the full 3-tuple, or, more concisely just through the set of elements as in the definition of a vector space.

Addition rules
(F,+) is a commutative (Abelian) group
Multiplication rules
(F,) is a commutative group except Associativity
that 0-1 does not exist
Distributivity
a(b+c)=(ab)+(ac)

Table 3. Field =(F,+,) properties, for a,b,cF.

Using the above definitions, a vector space 𝒱=(V,S,+,) can be described as a commutative group (V,+) combined with a field S that satisfies the scaling properties a𝒖V, a(𝒖+𝒗)=a𝒖+b𝒗, (a+b)𝒖=a𝒖+b𝒖, a(b𝒖)=(ab)𝒖, 1𝒖=𝒖, for a,bS, 𝒖,𝒗V.

1.2.Vector subspaces

A central interest in data science is to seek simple description of complex objects. A typical situation is that many instances of some object of interest are initially given as an m-tuple 𝒗m with large m. Assuming that addition and scaling of such objects can cogently be defined, a vector space is obtained, say over the field of reals with an Euclidean distance, Em. Examples include for instance recordings of medical data (electroencephalograms, electrocardiograms), sound recordings, or images, for which m can easily reach in to the millions. A natural question to ask is whether all the m real numbers are actually needed to describe the observed objects, or perhaps there is some intrinsic description that requires a much smaller number of descriptive parameters, that still preserves the useful idea of linear combination. The mathematical transcription of this idea is a vector subspace.

Definition. (Vector Subspace) . 𝒰=(U,S,+,), U, is a vector subspace of vector space 𝒱=(V,S,+,) over the same field of scalars S, denoted by 𝒰𝒱, if UV and a,bS, 𝒖,𝒗U, the linear combination a𝒖+b𝒗U.

The above states a vector subspace must be closed under linear combination, and have the same vector addition and scaling operations as the enclosing vector space. The simplest vector subspace of a vector space is the null subspace that only contains the null element, U={𝟎}. In fact any subspace must contain the null element 𝟎, or otherwise closure would not be verified for the particular linear combination 𝒖+(-𝒖)=𝟎. If UV, then 𝒰 is said to be a proper subspace of 𝒱, denoted by 𝒰<𝒱.

Setting n-m components equal to zero in the real space m defines a proper subspace whose elements can be placed into a one-to-one correspondence with the vectors within n. For example, setting component m of 𝒙m equal to zero gives 𝒙=[ x1 x2 xm-1 0 ]T that while not a member of m-1, is in a one-to-one relation with 𝒙'=[ x1 x2 xm-1 ]Tm-1. Dropping the last component of 𝒚m, 𝒚=[ y1 y2 ym-1 ym ]T gives vector 𝒚'=[ y1 y2 ym-1 ]m-1, but this is no longer a one-to-one correspondence since for some given 𝒚', the last component ym could take any value.

octave] 
m=3; x=[1; 2; 0]; xp=x(1:2); disp(xp)

1

2

octave] 
y=[1; 2; 3]; yp=y(1:2); disp(yp)

1

2

octave] 

Vector subspaces arise in decomposition of a vector space. The converse, composition of vector spaces 𝒰=(U,S,+,) 𝒱=(V,S,+,) is also defined in terms of linear combination. A vector 𝒙3 can be obtained as the linear combination

𝒙=[ x1 x2 x3 ]=[ x1 0 0 ]+[ 0 x2 x3 ],

but also as

𝒙=[ x1 x2 x3 ]=[ x1 x2-a 0 ]+[ 0 a x3 ],

for some arbitrary a. In the first case, 𝒙 is obtained as a unique linear combination of a vector from the set U={[ x1 0 0 ]T|x1.} with a vector from V={[ 0 x2 x3 ]T|x2,x3.}. In the second case, there is an infinity of linear combinations of a vector from V with another from W={[ x1 x2 0 ]T|x1,x2.} to the vector 𝒙. This is captured by a pair of definitions to describe vector space composition.

Definition. Given two vector subspaces 𝒰=(U,S,+,), 𝒱=(V,S,+,) of the space 𝒲=(W,S,+,), the sum is the vector space 𝒰+𝒱=(U+V,S,+,), where the sum of the two sets of vectors U,V is U+V={𝒖+𝒗|𝒖U,𝒗V}.

Definition. Given two vector subspaces 𝒰=(U,S,+,), 𝒱=(V,S,+,) of the space 𝒲=(W,S,+,), the direct sum is the vector space 𝒰𝒱=(UV,S,+,), where the direct sum of the two sets of vectors U,V is UV={𝒖+𝒗|!𝒖U,!𝒗V}. (unique decomposition)

Since the same scalar field, vector addition, and scaling is used , it is more convenient to refer to vector space sums simply by the sum of the vector sets U+V, or UV, instead of specifying the full tuplet for each space. This shall be adopted henceforth to simplify the notation.

octave] 
u=[1; 0; 0]; v=[0; 2; 3]; vp=[0; 1; 3]; w=[1; 1; 0]; disp([u+v vp+w]) 

1 1

2 2

3 3

octave] 

In the previous example, the essential difference between the two ways to express 𝒙3 is that UV={𝟎}, but VW={[ 0 a 0 ]T|a.}{𝟎}, and in general if the zero vector is the only common element of two vector spaces then the sum of the vector spaces becomes a direct sum.In practice, the most important procedure to construct direct sums or check when an intersection of two vector subspaces reduces to the zero vector is through an inner product.

Definition. Two vector subspaces U,V of the real vector space m are orthogonal, denoted as UV if 𝒖T𝒗=0 for any 𝒖U,𝒗V.

Definition. Two vector subspaces U,V of U+V are orthogonal complements, denoted U=V, V=U if they are orthogonal subspaces, UV, and UV={𝟎}, i.e., the null vector is the only common element of both subspaces.

octave] 
disp([u'*v vp'*w])

0 1

octave] 

The above concept of orthogonality can be extended to other vector subspaces, such as spaces of functions. It can also be extended to other choices of an inner product, in which case the term conjugate vector spaces is sometimes used.

The concepts of sum and direct sum of vector spaces used linear combinations of the form 𝒖+𝒗. This notion can be extended to arbitrary linear combinations.

Definition. In vector space 𝒱=(V,S,+,), the span of vectors 𝒂1,𝒂2,,𝒂nV,is the set of vectors reachable by linear combination

span{𝒂1,𝒂2,,𝒂n}={𝒃V|x1,,xnSsuchthat𝒃=x1𝒂1++xn𝒂n}.

Note that for real vector spaces a member of the span of the vectors {𝒂1,𝒂2,,𝒂n} is the vector 𝒃 obtained from the matrix vector multiplication

𝒃=𝑨𝒙=[ 𝒂1 𝒂2 𝒂n ][ x1 x2 xn ].

From the above, the span is a subset of the co-domain of the linear mapping 𝒇(𝒙)=𝑨𝒙.

2.Vector subspaces of a linear mapping

The wide-ranging utility of linear algebra essentially results a complete characterization of the behavior of a linear mapping between vector spaces 𝒇:UV, 𝒇(a𝒖+b𝒗)=a𝒇(𝒖)+b𝒇(𝒗). For some given linear mapping the questions that arise are:

  1. Can any vector within V be obtained by evaluation of 𝒇?

  2. Is there a single way that a vector within V can be obtained by evaluation of 𝒇?

Linear mappings between real vector spaces 𝒇:nm, have been seen to be completely specified by a matrix 𝑨m×n. It is common to frame the above questions about the behavior of the linear mapping 𝒇(𝒙)=𝑨𝒙 through sets associated with the matrix 𝑨. To frame an answer to the first question, a set of reachable vectors is first defined.

Definition. The column space (or range) of matrix 𝑨m×n is the set of vectors reachable by linear combination of the matrix column vectors

C(𝑨)=range(𝑨)={𝒃m|𝒙nsuchthat𝒃=𝑨𝒙}.

By definition, the column space is included in the co-domain of the function 𝒇(𝒙)=𝑨𝒙, C(𝑨)m, and is readily seen to be a vector subspace of m. The question that arises is whether the column space is the entire co-domain C(𝑨)=m that would signify that any vector can be reached by linear combination. If this is not the case then the column space would be a proper subset, C(𝑨)m, and the question is to determine what part of the co-domain cannot be reached by linear combination of columns of 𝑨. Consider the orthogonal complement of C(𝑨) defined as the set vectors orthogonal to all of the column vectors of 𝑨, expressed through inner products as

𝒂1T𝒚=0,𝒂2T𝒚=0,,𝒂nT𝒚=0.

This can be expressed more concisely through the transpose operation

𝑨=[ 𝒂1 𝒂2 𝒂n ],𝑨T𝒚=[ 𝒂1T 𝒂2T 𝒂nT ]=[ 𝒂1T𝒚 𝒂2T𝒚 𝒂nT𝒚 ],

and leads to the definition of a set of vectors for which 𝑨T𝒚=𝟎

Definition. The left null space (or cokernel) of a matrix 𝑨m×n is the set

N(𝑨T)=null(𝑨T)={𝒚m|𝑨T𝒚=𝟎}.

Note that the left null space is also a vector subspace of the co-domain of 𝒇(𝒙)=𝑨𝒙, N(𝑨T)m. The above definitions suggest that both the matrix and its transpose play a role in characterizing the behavior of the linear mapping 𝒇=𝑨𝒙, so analagous sets are define for the transpose 𝑨T.

Definition. The row space (or corange) of a matrix 𝑨m×n is the set

R(𝑨)=C(𝑨T)=range(𝑨T)={𝒄n|𝒚m𝒄=𝑨T𝒚}n

Definition. The null space of a matrix 𝑨m×n is the set

N(𝑨)=null(𝑨)={𝒙n|𝑨𝒙=𝟎}n

Examples. Consider a linear mapping between real spaces 𝒇:nm, defined by 𝒚=𝒇(𝒙)=𝑨𝒙=[ y1 yn ]T, with 𝑨m×n.

  1. For n=1, m=3,

    𝑨=[ 1 0 0 ],𝑨T=[ 1 0 0 ],

    the column space C(𝑨) is the y1-axis, and the left null space N(𝑨T) is the y2y3-plane. Vectors that span these spaces are returned by the Octave orth and null functions.

    octave] 
    A=[1; 0; 0]; disp(orth(A));
    disp('-----'); disp(null(A'))

    -1

    -0

    -0

    ––-

    0 0

    1 0

    0 1

    octave] 

  2. For n=2, m=3,

    𝑨=[ 1 -1 0 0 0 0 ]=[ 𝒂1 𝒂2 ],𝑨T=[ 1 0 0 -1 0 0 ],

    the columns of 𝑨 are colinear, 𝒂2=-𝒂1, and the column space C(𝑨) is the y1-axis, and the left null space N(𝑨T) is the y2y3-plane, as before.

    octave] 
    A=[1 -1; 0 0; 0 0]; disp(orth(A));
    disp('-----'); disp(null(A'))

    -1.00000

    -0.00000

    -0.00000

    ––-

    0 0

    1 0

    0 1

    octave] 

  3. For n=2, m=3,

    𝑨=[ 1 0 0 1 0 0 ],𝑨T=[ 1 0 0 0 1 0 ],

    the column space C(𝑨) is the y1y2-plane, and the left null space N(𝑨T) is the y3-axis.

    octave] 
    A=[1 0; 0 1; 0 0]; disp(orth(A));
    disp('-----'); disp(null(A'))

    -1 -0

    -0 -1

    -0 -0

    ––-

    0

    0

    1

    octave] 

  4. For n=2, m=3,

    𝑨=[ 1 1 1 -1 0 0 ],𝑨T=[ 1 1 0 1 -1 0 ],

    the same C(𝑨), N(𝑨T) are obtained, albeit with a different set of spanning vectors returned by orth.

    octave] 
    A=[1 1; 1 -1; 0 0]; disp(orth(A));
    disp('-----'); disp(null(A'))

    0.70711 0.70711

    0.70711 -0.70711

    -0.00000 -0.00000

    ––-

    0

    0

    1

    octave] 

  5. For n=3, m=3,

    𝑨=[ 1 1 3 1 -1 -1 1 1 3 ]=[ 𝒂1 𝒂2 𝒂3 ],
    𝑨T=[ 1 1 1 1 -1 1 3 -1 3 ]=[ 𝒂1T 𝒂2T 𝒂3T ],𝑨T𝒚=[ 𝒂1T𝒚 𝒂2T𝒚 𝒂3T𝒚 ]

    since 𝒂3=𝒂1+2𝒂2, the orthogonality condition 𝑨T𝒚=𝟎 is satisfied by vectors of form 𝒚=[ a 0 -a ], a.

    octave] 
    A=[1 1 3; 1 -1 -1; 1 1 3]; disp(orth(A));
    disp('-----'); disp(null(A'))

    0.69157 0.14741

    -0.20847 0.97803

    0.69157 0.14741

    ––-

    0.70711

    0.00000

    -0.70711

    octave] 

The above low dimensional examples are useful to gain initial insight into the significance of the spaces C(𝑨),N(𝑨T). Further appreciation can be gained by applying the same concepts to processing of images. A gray-scale image of size px by py pixels can be represented as a vector with m=pxpy components, 𝒃[0,1]mm. Even for a small image with px=py=128=27 pixels along each direction, the vector 𝒃 would have m=214 components. An image can be specified as a linear combination of the columns of the identity matrix

𝒃=𝑰𝒃=[ 𝒆1 𝒆2 𝒆m ][ b1 b2 bm ],

with bi the gray-level intensity in pixel i. Similar to the inclined plane example from §1, an alternative description as a linear combination of another set of vectors 𝒂1,,𝒂m might be more relevant. One choice of greater utility for image processing mimics the behavior of the set {1,cost,cos2t,,sint,sin2t,} that extends the second example in §1, would be for m=4

𝑨=[ 𝒂1 𝒂2 𝒂3 𝒂4 ]=[ 1 1 1 0 1 1 0 1 1 0 1 1 1 0 0 0 ].