Linear Mappings

Synopsis. Vectors have been introduced to represent complicated objects whose description requires m numbers, and the procedure of linear combination allows construction of new vectors. Alternative insights into some object might be obtained by transformation of vectors. Of all possible transformations, those that are compatible with linear combinations are of special interest. It turns out that matrices are not only important in organizing collections of vectors, but also in representing such transformations, referred to as linear mappings.

1.Functions

1.1.Relations

The previous chapter focused on mathematical expression of the concept of quantification, the act of associating human observation with measurements, as a first step of scientific inquiry. Consideration of different types of quantities led to various types of numbers, vectors as groupings of numbers, and matrices as groupings of vectors. Symbols were introduced for these quantities along with some initial rules for manipulating such objects, laying the foundation for an algebra of vectors and matrices. Science seeks to not only observe, but to also explain, which now leads to additional operations for working with vectors and matrices that will define the framework of linear algebra.

Explanations within scientific inquiry are formulated as hypotheses, from which predictions are derived and tested. A widely applied mathematical transcription of this process is to organize hypotheses and predictions as two sets X and Y, and then construct another set R of all of the instances in which an element of X is associated with an element of Y. The set of all possible instances of x∈X and y∈Y is the Cartesian product of X with Y, denoted as X×Y={(x,y) | x∈X, y∈Y}, a construct already encountered in the definition of the real 2-space ℛ₂=(ℝ²,ℝ,+,⋅) where ℝ²=ℝ×ℝ. Typically, not all possible tuples (x,y)∈X×Y are relevant, leading to the following definition.

Definition. (Relation) . A relation R between two sets X,Y is a subset of the Cartesian product X×Y, R⊆X×Y.

The key concept is that of associating an input x∈X with an output y∈Y. Inverting the approach and associating an output to an input is also useful, leading to the definition of an inverse relation as R⁻¹⊆Y×X, R⁻¹={(y,x) | (x,y)∈R}. Note that an inverse exists for any relation, and the inverse of an inverse is the original relation, (R⁻¹)⁻¹=R. From the above, a relation is a triplet (a tuple with three elements), (X,Y,R), that will often be referred to by just its last member R.

Homogeneous relations.
Many types of relations are defined in mathematics and encountered in linear algebra, and establishing properties of specific relations is an important task within data science. A commonly encountered type of relation is one from a set onto itself, known as a homogeneous relation. For homogeneous relations H⊆A×A, it is common to replace the set membership notation (a,b)∈H, stating that a∈A is in relationship H with b∈A, by the binary operator notation a H b. Familiar examples include the equality and less-than relationships between reals, E,L⊆ℝ×ℝ, in which (a,b)∈E is replaced by a=b, and (a,b)∈L is replaced by a<b. The equality relationship is its own inverse, and the inverse of the less-than relationship is the greater-than relation G⊆ℝ×ℝ, G=L⁻¹, a<b ⇔ b>a.

1.2.Functions

Functions between sets X and Y are a specific type of relation, one that arises often in science. For a given input x∈X, theories that predict a single possible output y∈Y are of particular scientific interest.

Definition. (Function) . A function from set X to set Y is a relation F⊆X×Y that associates to x∈X a single y∈Y.

The above intuitive definition can be transcribed in precise mathematical terms as: F⊆X×Y is a function if (x,y)∈F and (x,z)∈F implies y=z. Since it is a particular kind of relation, a function is a triplet of sets (X,Y,F), but with a special, common notation to denote the triplet by f:X→Y, with F={(x,f(x)) | x∈X, f(x)∈Y} and the property that (x,y)∈F ⇔ y=f(x). The set X is the domain and the set Y is the codomain of the function f. The value from the domain x∈X is the argument of the function, associated with the function value y=f(x). The function value y is said to be returned by evaluation y=f(x).

All relations can be inverted, and inversion of a function defines a new relation, but that relation might not itself be a function. For example, the relation S⁻¹={(α,a),(β,a),(γ,a)} is a function, but its inverse (S⁻¹)⁻¹=S={(a,α),(a,β),(a,γ)} is not, since it associates three different outputs with the single input a.

Familiar functions include the trigonometric, exponential, and logarithm functions.

Simple functions such as sin, cos, exp, log are predefined in Julia, and can be applied to each component of a vector argument by broadcasting, denoted by a period in front of the parentheses enclosing the argument.

θ=π; [sin(θ) cos(θ) exp(θ) log(θ)]

[ 0.0 -1.0 23.140692632779267 1.1447298858494002 ] (1)

θ=0:π/6:π; short(x)=round(x,digits=6); short.(sin.(θ))'

[ 0.0 0.5 0.866025 1.0 0.866025 0.5 0.0 ] (2)

short.(log2.(1:8))'

[ 0.0 1.0 1.584963 2.0 2.321928 2.584963 2.807355 3.0 ] (3)

A construct that will often be used is to interpret a vector within Eᵐ as a function, since 𝒗∈ℝᵐ with components 𝒗=[ v1 v2 ⋯ vm ]T also defines a function v:{1,2,…,m}→ℝ, with values v(i)=vi. As the number of components grows, the function v can provide better approximations of some continuous function f∈𝒞⁰(ℝ) through the function values vi=v(i)=f(xi) at distinct sample points x1,x2,…,xm.

The above function examples are all defined on a domain of scalars or naturals and return scalar values. Within linear algebra the particular interest is in functions defined on sets of vectors from some vector space 𝒱=(V,S,+,⋅) that return either scalars, f:V→S, or vectors from some other vector space 𝒲=(W,S,+,⋅), 𝒈:V→W. The codomain of a vector-valued function might be the same set of vectors as its domain, 𝒉:V→V. The fundamental operation within linear algebra is the linear combination a𝒖+b𝒗 with a,b∈S, 𝒖,𝒗∈V. A key aspect is to characterize how a function behaves when given a linear combination as its argument, for instance f(a𝒖+b𝒗) or 𝒈(a𝒖+b𝒗).

1.3.Linear functionals

Consider first the case of a function defined on a set of vectors that returns a scalar value. Such scalars can be interpreted as labels attached to a vector, and are very often encountered in applications, whether involving natural phenomena or data analysis.

Definition. (Functional) . A functional on vector space 𝒱=(V,S,+,⋅) is a function from the set of vectors V to the set of scalars S of the vector space 𝒱.

Definition. (Linear Functional) . The functional f:V→S on vector space 𝒱=(V,S,+,⋅) is a linear functional if for any two vectors 𝒖,𝒗∈V and any two scalars a,b∈S

f(a𝒖+b𝒗)=af(𝒖)+bf(𝒗). (4)
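For instance, the functional on ℝ³ that sums the components of its argument is linear. A brief Julia check of property (4), a minimal sketch using randomly generated vectors and scalars, might read:

f(v) = sum(v)                    # a functional on R^3: sum of components
u = rand(3); v = rand(3); a = rand(); b = rand()
f(a*u + b*v) ≈ a*f(u) + b*f(v)   # returns true, verifying linearity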

1.4.Linear mappings

Consider now functions 𝒇:V→W from vector space 𝒱=(V,S,+,⋅) to another vector space 𝒲=(W,T,+,⋅). As before, the action of such functions on linear combinations is of special interest.

Definition. (Linear Mapping) . A function 𝒇:V→W, from vector space 𝒱=(V,S,+,⋅) to vector space 𝒲=(W,S,+,⋅) is called a linear mapping if for any two vectors 𝒖,𝒗∈V and any two scalars a,b∈S

𝒇(a𝒖+b𝒗)=a𝒇(𝒖)+b𝒇(𝒗). (5)

The image of a linear combination a𝒖+b𝒗 through a linear mapping is another linear combination a𝒇(𝒖)+b𝒇(𝒗), and linear mappings are said to preserve the structure of a vector space.

Note that f:ℝ→ℝ defined as y=f(x)=ax+b represents a line in the (x,y)-plane, but is not a linear mapping for b≠0 since

f(x+z)=a(x+z)+b ≠ ax+b+az+b=f(x)+f(z).

Matrix-vector multiplication has been introduced as a concise way to specify a linear combination

𝒇(𝒙)=𝑨𝒙=x1𝒂1+⋯+xn𝒂n,

with 𝒂1,…,𝒂n the columns of the matrix, 𝑨=[ 𝒂1 𝒂2 ⋯ 𝒂n ]. This is a linear mapping between the real spaces ℝⁿ and ℝᵐ, 𝒇:ℝⁿ→ℝᵐ, and indeed any linear mapping between real spaces can be given as a matrix-vector product. To see this for a linear mapping 𝒇 defined on ℝᵐ, consider some 𝒙∈ℝᵐ

𝒙=[ x1 x2 ⋯ xm ]T=x1[ 1 0 ⋯ 0 ]T+x2[ 0 1 ⋯ 0 ]T+⋯+xm[ 0 0 ⋯ 1 ]T=x1𝒆1+x2𝒆2+⋯+xm𝒆m.

Applying the linear mapping 𝒇 to 𝒙 leads to

𝒇(𝒙)=𝒇(x1𝒆1+x2𝒆2+⋯+xm𝒆m)=x1𝒇(𝒆1)+x2𝒇(𝒆2)+⋯+xm𝒇(𝒆m).

The matrix 𝑨 with columns 𝒂1=𝒇(𝒆1),…,𝒂m=𝒇(𝒆m) now allows finding

𝒇(𝒙)=𝑨𝒙,

through a matrix-vector multiplication for any input vector 𝒙. The matrix 𝑨 thus defined is a representation of the linear mapping 𝒇. As will be shown later, it is not the only possible representation.
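As an illustrative sketch, the matrix representation can be assembled column by column from the images of the unit vectors. The mapping f below is a hypothetical example from ℝ² to ℝ³:

f(x) = [x[1]+2x[2]; 3x[1]; x[1]-x[2]]   # a linear mapping f: R^2 -> R^3
e1 = [1; 0]; e2 = [0; 1]
A = [f(e1) f(e2)]                        # columns are the images of the unit vectors
x = [2; -1]; [A*x f(x)]                  # the two columns coincide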

2.Measurements

Vectors within the real space ℝᵐ can be completely specified by m real numbers, and m is large in many realistic applications. The task of describing the elements of a vector space 𝒱=(V,S,+,⋅) by simpler means arises. Within data science this leads to classification problems in accordance with some relevant criteria, and one of the simplest classifications is to attach a scalar label to a vector. Commonly encountered labels include the magnitude of a vector or its orientation with respect to another vector.

2.1.Norms

The above observations lead to the mathematical concept of a norm as a tool to evaluate vector magnitude. Recall that a vector space is specified by two sets and two operations, 𝒱=(V,S,+,⋅), and the behavior of a norm with respect to each of these components must be defined. The desired behavior includes the following properties, collected in the formal definition below.

Unique value

The magnitude of a vector 𝒗∈V should be a unique scalar, requiring the definition of a function. The scalar could have irrational values and should allow ordering of vectors by size, so the function should be from V to ℝ, f:V→ℝ. On the real line the point at coordinate x is at distance |x| from the origin, and to mimic this usage the norm of 𝒗∈V is denoted as ||𝒗||, leading to the definition of a function ||⋅||:V→ℝ+, ℝ+={a | a∈ℝ, a≥0}.

Null vector case

Provision must be made for the only distinguished element of V, the null vector 𝟎. It is natural to associate the null vector with the null scalar element, ||𝟎||=0. A crucial additional property is also imposed, namely that the null vector is the only vector whose norm is zero, ||𝒗||=0 ⇒ 𝒗=𝟎. From knowledge of a single scalar value, an entire vector can be determined. This property arises at key junctures in linear algebra, notably in providing a link to mathematical analysis, and is needed to establish the fundamental theorem of linear algebra or the singular value decomposition encountered later.

Scaling

Transferring the scaling operation 𝒗=a𝒖 to norms leads to imposing ||𝒗||=|a| ||𝒖||. This property ensures commensurability of vectors, meaning that the magnitude of vector 𝒗 can be expressed as a multiple of some standard vector magnitude ||𝒖||.

Vector addition

Position vectors from the origin to coordinates x,y>0 on the real line can be added and |x+y|=|x|+|y|. If however the position vectors point in different directions, x>0, y<0, then |x+y|<|x|+|y|. For a general vector space the analogous property is known as the triangle inequality, ||𝒖+𝒗||≤||𝒖||+||𝒗|| for 𝒖,𝒗∈V.

Definition. (Norm) . A norm on the vector space 𝒱=(V,S,+,⋅) is a function ||⋅||:V→ℝ+ that for 𝒖,𝒗∈V, a∈S satisfies:

  1. ||𝒗||=0 ⇒ 𝒗=𝟎;

  2. ||a𝒖||=|a|||𝒖||;

  3. ||𝒖+𝒗||||𝒖||+||𝒗||.

A commonly encountered norm of 𝒗∈ℝᵐ is the Euclidean norm

||𝒗||=√(v1^2+⋯+vm^2)=(v1^2+⋯+vm^2)^{1/2}=(∑_{j=1}^{m} vj^2)^{1/2},

useful in many physics applications. The form of the above norm, square root of sum of squares of components, can be generalized to obtain other useful norms.

Definition. (p-Norm in ℝᵐ) . The p-norm on the real vector space ℛₘ=(ℝᵐ,ℝ,+,⋅) for p≥1 is the function ||⋅||p:ℝᵐ→ℝ+ with values ||𝒙||p=(|x1|^p+|x2|^p+⋯+|xm|^p)^{1/p}, or

||𝒙||p=(∑_{i=1}^{m} |xi|^p)^{1/p} for 𝒙∈ℝᵐ. (6)

Note that the Euclidean norm corresponds to p=2, and is often called the 2-norm. Denote by xi the largest component in absolute value of 𝒙∈ℝᵐ. As p increases, |xi|^p becomes dominant with respect to all other terms in the sum, suggesting the definition of the ∞-norm by

||𝒙||∞=max_{1≤i≤m} |xi|.

This also works for vectors with equal components, since the number of components m is finite while p→∞, as exemplified for 𝒙=[ a a ⋯ a ]T, by ||𝒙||p=(m|a|^p)^{1/p}=m^{1/p}|a|, with m^{1/p}→1 as p→∞.
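The approach of ||𝒙||p toward ||𝒙||∞ as p grows can be observed numerically; a brief sketch, assuming the LinearAlgebra package provides the norm function used below:

using LinearAlgebra
x = [1.0; -2.0; 3.0]
[norm(x,1) norm(x,2) norm(x,10) norm(x,100) norm(x,Inf)]   # p-norms approach the largest |x_i| = 3.0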

Figure 1. Regions within ℝ² for which ||𝒙||p≤1, for p=1,2,3,∞.

Vector norms arise very often in applications, especially in data science since they can be used to classify data, and are implemented in software systems such as Julia, in which the norm function (provided by the LinearAlgebra package) with a single argument computes the most commonly encountered norm, the 2-norm. If a second argument p is specified, the p-norm is computed.

x=[1; 1; 1]; [norm(x) sqrt(3)]

[ 1.7320508075688772 1.7320508075688772 ] (7)

m=9; x=ones(m,1); [norm(x) sqrt(m)]

[ 3.0 3.0 ] (8)

m=4; x=ones(m,1); [norm(x,1) m]

[ 4.0 4.0 ] (9)

2.2.Inner product

Norms are functionals that define what is meant by the size of a vector, but they are not linear. Even in the simplest case of the real line, the linearity relation |x+y|=|x|+|y| is not verified for x>0, y<0. Nor do norms characterize the familiar geometric concept of orientation of a vector. A particularly important orientation from Euclidean geometry is orthogonality between two vectors. Another function is required, one that takes two vector arguments to enable characterizing their relative orientation. It would return a scalar, hence s:V×V→S, with S often chosen as the set of real numbers ℝ.

Definition. (Inner Product) . A real inner product in the vector space 𝒱=(V,ℝ,+,⋅) is a function s:V×V→ℝ with properties

Symmetry

For any 𝒂,𝒙∈V, s(𝒂,𝒙)=s(𝒙,𝒂).

Linearity in second argument

For any 𝒂,𝒙,𝒚∈V, α,β∈ℝ, s(𝒂,α𝒙+β𝒚)=αs(𝒂,𝒙)+βs(𝒂,𝒚).

Positive definiteness

For any 𝒙∈V\{𝟎}, s(𝒙,𝒙)>0.

A commonly encountered inner product is the dot product of two vectors 𝒂,𝒙∈ℝᵐ

𝒂⋅𝒙=a1x1+⋯+amxm.

Using the convention of representing 𝒂,𝒙 as column vectors, the dot product is also expressed as

𝒂T𝒙=[ a1 a2 ⋯ am ][ x1 x2 ⋯ xm ]T,

and is therefore a matrix multiplication between 𝒂T∈ℝ^{1×m} and 𝒙∈ℝ^{m×1} resulting in a scalar, also referred to as a scalar product. Inner products also provide a procedure to evaluate geometrical quantities and relationships.

Vector norm

The square of the 2-norm of 𝒙∈ℝᵐ is given as

s(𝒙,𝒙)=𝒙T𝒙=||𝒙||₂².

In general, the square root of s(𝒙,𝒙) satisfies the properties of a norm, and is called the norm induced by an inner product

||𝒙||=s(𝒙,𝒙)^{1/2}.

A real space together with the scalar product s(𝒙,𝒚)=𝒙T𝒚 and induced norm ||𝒙||=s(𝒙,𝒙)^{1/2} defines a Euclidean vector space Eᵐ.

Orientation

In ℝ² the point specified by polar coordinates (r,θ) has the Cartesian coordinates x1=r cosθ, x2=r sinθ, and position vector 𝒙=[ x1 x2 ]T. The inner product

𝒆1T𝒙=[ 1 0 ][ x1 x2 ]T=1⋅x1+0⋅x2=r cosθ,

is seen to contain information on the relative orientation of 𝒙 with respect to 𝒆1. In general, the angle θ between two vectors 𝒙,𝒚 within any vector space with a scalar product can be defined by

cosθ=s(𝒙,𝒚)/[s(𝒙,𝒙)s(𝒚,𝒚)]^{1/2}=s(𝒙,𝒚)/(||𝒙|| ||𝒚||),

which becomes

cosθ=𝒙T𝒚/(||𝒙|| ||𝒚||),

in a Euclidean space, 𝒙,𝒚∈ℝᵐ.

Orthogonality

In ℝ² two vectors are orthogonal if the angle between them is such that cosθ=0, and this can be extended to an arbitrary vector space 𝒱=(V,ℝ,+,⋅) with a scalar product by stating that 𝒙,𝒚∈V are orthogonal if s(𝒙,𝒚)=0. In ℝᵐ vectors 𝒙,𝒚∈ℝᵐ are orthogonal if 𝒙T𝒚=0.
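These quantities are readily evaluated numerically; a minimal sketch, assuming LinearAlgebra is loaded for the norm function:

using LinearAlgebra
x = [1.0; 1.0]; y = [1.0; -1.0]
s = x'*y                        # scalar product, zero for orthogonal vectors
cosθ = s/(norm(x)*norm(y))      # cosine of the angle between x and y
[s cosθ]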

3.Linear mapping matrices

3.1.Common geometric transformations

Several geometric transformations are linear mappings and are widely used in applications.

Stretching.
Different stretch ratios along separate axes in ℝᵐ are described by the linear mapping s:ℝᵐ→ℝᵐ,

s(𝒙)=[ λ1x1 λ2x2 ⋯ λmxm ]T.

The matrix associated with stretching is

𝑺=[ s(𝒆1) s(𝒆2) ⋯ s(𝒆m) ]=[ λ1 0 ⋯ 0; 0 λ2 ⋯ 0; ⋮; 0 0 ⋯ λm ]=diag(λ1,λ2,…,λm),

and has a remarkably simple form known as a diagonal matrix.
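Such a matrix can be formed directly in Julia; a brief sketch using the Diagonal constructor from the LinearAlgebra package:

using LinearAlgebra
λ = [2.0; 3.0; 0.5]
S = Diagonal(λ)            # diagonal stretching matrix
x = [1.0; 1.0; 1.0]
S*x                        # each component scaled by its own ratio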

Projection.
A very important transformation is projection of a vector 𝒗∈ℝᵐ along the direction of another vector 𝒖≠𝟎. It is convenient to define a vector of unit length along the direction of 𝒖 by

𝒒=𝒖/||𝒖||.

The resulting vector 𝒘=p𝒒(𝒗)∈ℝᵐ has the same number of components as 𝒗, and has length ||𝒗|| cosθ along the direction of 𝒒, where θ is the angle between 𝒗 and 𝒒, stated as

𝒘=(||𝒗||cosθ)𝒒.

The matrix associated with projection is

𝑷𝒒=[ p𝒒(𝒆1) p𝒒(𝒆2) ⋯ p𝒒(𝒆m) ].

The jth unit vector 𝒆j is at angle θj with respect to 𝒒,

cosθj=𝒒T𝒆j=[ q1 q2 ⋯ qm ][ 0 ⋯ 1 ⋯ 0 ]T=qj,

and the projection of 𝒆j along 𝒒 is

p𝒒(𝒆j)=(||𝒆j||cosθj)𝒒=qj𝒒.

Gathering projections of all the unit vectors within the identity matrix gives

𝑷𝒒=[ q1𝒒 q2𝒒 ⋯ qm𝒒 ].

Note that 𝑷𝒒 contains m column vectors that are all scalings of the 𝒒 vector, by coefficients q1,q2,…,qm that are the components of 𝒒 itself. Since scaling is a linear combination, the above m linear combinations can be expressed as a matrix-matrix product

𝑷𝒒=𝒒[ q1 q2 ⋯ qm ],

leading to the remarkably simple expression

𝑷𝒒=𝒒𝒒T∈ℝ^{m×m}.

Figure 2. Projection (p𝒒) and reflection (r𝒒) operations in two dimensions
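The expression 𝑷𝒒=𝒒𝒒T is easily checked numerically; a minimal sketch, assuming LinearAlgebra is loaded, also verifies that projecting twice gives the same result as projecting once:

using LinearAlgebra
u = [3.0; 4.0]; q = u/norm(u)   # unit vector along u
P = q*q'                        # projection matrix P_q
v = [1.0; 2.0]
[P*v P*(P*v)]                   # the two columns coincide: P_q P_q = P_q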

Reflection.
Another widely used geometric transformation that is also a linear mapping is the reflection of a vector 𝒗 across another vector 𝒖. As before, introduce a unit vector in the direction of 𝒖, 𝒒=𝒖/||𝒖||, and let 𝒛=𝒓𝒒(𝒗)=𝑹𝒒𝒗 be the reflection of 𝒗 across 𝒒. The reflection matrix can be constructed from the previous projection matrix. Start from the vector addition

𝒘=𝑷𝒒𝒗=(𝒒𝒒T)𝒗=𝒗+𝒚,

that can be interpreted as stating that the projection 𝒘 is obtained from 𝒗 by addition of the vector 𝒚. The reflection of 𝒗 across 𝒒 is obtained by starting at 𝒗, and adding 2𝒚,

𝒛=𝒗+2(𝒘-𝒗)=2𝒘-𝒗=2(𝒒𝒒T)𝒗-𝒗=2(𝒒𝒒T)𝒗-𝑰𝒗=[2(𝒒𝒒T)-𝑰]𝒗=𝑹𝒒𝒗.

The resulting reflection matrix is

𝑹𝒒=2(𝒒𝒒T)-𝑰.
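A quick numerical check, assuming LinearAlgebra is loaded for norm and the identity I, confirms that reflecting twice across the same direction recovers the original vector, 𝑹𝒒𝑹𝒒=𝑰:

using LinearAlgebra
u = [1.0; 2.0]; q = u/norm(u)
R = 2*(q*q') - I          # reflection matrix R_q
v = [3.0; -1.0]
R*(R*v) ≈ v               # returns true: two reflections give back v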

Rotation in ℝ².
The previous geometric transformations are valid in ℝᵐ for arbitrary m. Rotation mappings are not as readily generalizable. In two dimensions, consider rθ(𝒗)=𝑹θ𝒗, with

𝑹θ=[ rθ(𝒆1) rθ(𝒆2) ]=[ cosθ -sinθ; sinθ cosθ ].

Figure 3. Rotation rθ(𝒗)=𝑹θ𝒗 in two dimensions.

Rotation in ℝ³.
The axis of the above two-dimensional rotation is a third direction perpendicular to the x1x2-plane. A vector 𝒗∈ℝ³ would not change its third coordinate under such a transformation rθ,3(𝒗), hence the associated rotation matrix is readily obtained as

𝑹θ=[ rθ,3(𝒆1) rθ,3(𝒆2) rθ,3(𝒆3) ]=[ cosθ -sinθ 0; sinθ cosθ 0; 0 0 1 ].
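Rotation matrices are likewise simple to construct and apply; a brief sketch rotating 𝒆1 by θ=π/2 and checking that the three-dimensional version leaves the third coordinate unchanged:

θ = π/2
Rθ = [cos(θ) -sin(θ); sin(θ) cos(θ)]
Rθ*[1; 0]                                          # approximately [0; 1]
Rθ3 = [cos(θ) -sin(θ) 0; sin(θ) cos(θ) 0; 0 0 1]
Rθ3*[1; 0; 5]                                      # third coordinate remains 5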

3.2.Matrix-matrix product

From two functions f:A→B and g:B→C, a composite function h=g∘f, h:A→C, is defined by

h(x)=g(f(x)).

Consider linear mappings between Euclidean spaces 𝒇:ℝⁿ→ℝᵐ, 𝒈:ℝᵐ→ℝᵖ. Recall that linear mappings are expressed as matrix-vector multiplications

𝒇(𝒙)=𝑨𝒙, 𝒈(𝒚)=𝑩𝒚, 𝑨∈ℝ^{m×n}, 𝑩∈ℝ^{p×m}.

The composite function 𝒉=𝒈∘𝒇 is 𝒉:ℝⁿ→ℝᵖ, defined by

𝒉(𝒙)=𝒈(𝒇(𝒙))=𝒈(𝑨𝒙)=𝑩𝑨𝒙.

Note that the intermediate vector 𝒖=𝑨𝒙 is subsequently multiplied by the matrix 𝑩. The composite function 𝒉 is itself a linear mapping

𝒉(a𝒙+b𝒚)=𝑩𝑨(a𝒙+b𝒚)=𝑩(a𝑨𝒙+b𝑨𝒚)=a𝑩𝑨𝒙+b𝑩𝑨𝒚=a𝒉(𝒙)+b𝒉(𝒚),

so it can also be expressed as a matrix-vector multiplication

𝒉(𝒙)=𝑪𝒙=𝑩𝑨𝒙. (10)

Using the above, 𝑪 is defined as the product of matrix 𝑩 with matrix 𝑨

𝑪=𝑩𝑨.

The columns of 𝑪 can be determined from those of 𝑨 by considering the action of 𝒉 on the column vectors of the identity matrix 𝑰=[ 𝒆1 𝒆2 ⋯ 𝒆n ]∈ℝ^{n×n}. First, note that

𝑨𝒆1=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 1 0 ⋯ 0 ]T=𝒂1, …, 𝑨𝒆j=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 0 ⋯ 1 ⋯ 0 ]T=𝒂j, …, 𝑨𝒆n=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 0 0 ⋯ 1 ]T=𝒂n. (11)

The above can be repeated for the matrix 𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ] giving

𝒉(𝒆1)=𝑪𝒆1=𝒄1, …, 𝒉(𝒆j)=𝑪𝒆j=𝒄j, …, 𝒉(𝒆n)=𝑪𝒆n=𝒄n. (12)

Combining the above equations leads to 𝒄j=𝑩𝒂j, or

𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ]=𝑩[ 𝒂1 𝒂2 ⋯ 𝒂n ].

From the above the matrix-matrix product 𝑪=𝑩𝑨 is seen to simply be a grouping of all the products of 𝑩 with the column vectors of 𝑨,

𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ]=[ 𝑩𝒂1 𝑩𝒂2 ⋯ 𝑩𝒂n ].

The above results can readily be verified computationally.

a1=[1; 2]; a2=[3; 4]; A=[a1 a2]

[ 1 3 2 4 ] (13)

b1=[-1; 1; 3]; b2=[2; -2; 3]; B=[b1 b2]

[ -1 2 1 -2 3 3 ] (14)

C=B*A

[ 3 5 -3 -5 9 21 ] (15)

c1=B*a1; c2=B*a2; [c1 c2]

[ 3 5 -3 -5 9 21 ] (16)

3.3.Properties of the matrix-matrix product

Matrix-matrix products have been seen to arise from composition of linear mappings, and their properties arise from such compositions. Whereas matrix addition is a direct consequence of component-by-component operations, matrix products exhibit some particularities.

Non-commutativity

Consider 𝒈(𝒗)=𝑩𝒗 to be rotation of vectors in ℝ² by angle θ=π/4, and 𝒇(𝒗)=𝑨𝒗 to be reflection across the 𝒆1 direction. The associated matrices are

𝑨=[ 1 0; 0 -1 ], 𝑩=(1/√2)[ 1 -1; 1 1 ].

Rotation followed by reflection of the vector 𝒆1 is

(𝒇∘𝒈)(𝒆1)=𝒇(𝒈(𝒆1))=𝑨𝑩𝒆1=(1/√2)[ 1 -1 ]T,

whereas reflection followed by rotation is

(𝒈∘𝒇)(𝒆1)=𝒈(𝒇(𝒆1))=𝑩𝑨𝒆1=(1/√2)[ 1 1 ]T,

a different result, highlighting that the order of applying linear transformations is important. Indeed, from

𝑨𝑩=[ 1 0; 0 -1 ](1/√2)[ 1 -1; 1 1 ]=(1/√2)[ 1 -1; -1 -1 ], 𝑩𝑨=(1/√2)[ 1 -1; 1 1 ][ 1 0; 0 -1 ]=(1/√2)[ 1 1; 1 -1 ],

it is seen that 𝑨𝑩≠𝑩𝑨, and in general matrix multiplication is not commutative.
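The computation is readily reproduced; a brief sketch with the matrices above:

A = [1 0; 0 -1]; B = [1 -1; 1 1]/sqrt(2)
A*B == B*A            # false: the products differ
[A*B B*A]             # the two products side by side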

Associativity

Consider now three linear mappings 𝒇(𝒙)=𝑨𝒙, 𝒈(𝒙)=𝑩𝒙, 𝒉(𝒙)=𝑪𝒙, and compare the results of (𝒇∘𝒈)∘𝒉 to 𝒇∘(𝒈∘𝒉)

((𝒇∘𝒈)∘𝒉)(𝒙)=(𝒇∘𝒈)(𝒉(𝒙))=(𝑨𝑩)(𝑪𝒙),
(𝒇∘(𝒈∘𝒉))(𝒙)=𝒇((𝒈∘𝒉)(𝒙))=𝑨(𝑩𝑪𝒙).

Since (𝒇∘𝒈)(𝒉(𝒙))=𝒇(𝒈(𝒉(𝒙)))=(𝒇∘(𝒈∘𝒉))(𝒙), it follows that the above two expressions should also be equal for arbitrary 𝒙,

(𝑨𝑩)𝑪=𝑨(𝑩𝑪),

and matrix multiplication is said to be associative.
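Associativity can be checked numerically on randomly generated matrices of compatible sizes; a minimal sketch (the ≈ comparison allows for floating-point roundoff):

A = rand(3,4); B = rand(4,2); C = rand(2,5)
(A*B)*C ≈ A*(B*C)     # returns true up to roundoff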

Product transposition

The transpose of a linear combination is a straightforward reorganization of vector components as rows instead of columns,

(𝑨𝒙)T=(x1𝒂1+⋯+xn𝒂n)T=x1𝒂1T+⋯+xn𝒂nT.

The above can be expressed as a matrix multiplication

(𝑨𝒙)T=[ x1 x2 ⋯ xn ][ 𝒂1T; 𝒂2T; ⋯; 𝒂nT ]=𝒙T𝑨T.

Extending the above to multiple linear combinations gives

(𝑨𝑿)T=([ 𝑨𝒙1 𝑨𝒙2 ⋯ 𝑨𝒙p ])T=[ (𝑨𝒙1)T; (𝑨𝒙2)T; ⋯; (𝑨𝒙p)T ]=[ 𝒙1T𝑨T; 𝒙2T𝑨T; ⋯; 𝒙pT𝑨T ]=[ 𝒙1T; 𝒙2T; ⋯; 𝒙pT ]𝑨T=𝑿T𝑨T,

that is, the transpose of a product is the product of the transposes, taken in reverse order.
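A brief numerical check of the transposition rule on randomly generated matrices:

A = rand(3,4); X = rand(4,2)
(A*X)' ≈ X'*A'        # returns true: transpose of the product equals reversed product of transposes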

3.4.Block matrix operations

In expressions such as

𝑨T=([ 𝒂1 𝒂2 ⋯ 𝒂n ])T=[ 𝒂1T; 𝒂2T; ⋯; 𝒂nT ],

the column vectors 𝒂1,𝒂2,…,𝒂n can be interpreted as blocks of size m×1 within the matrix 𝑨 of size m×n. A pervasive task within linear algebra applications to large data sets is to break the problem down into smaller parts. Consider that a matrix 𝑴 of size m×n can be broken up into blocks

𝑴=[ 𝑼 𝑽; 𝑿 𝒀 ].

The dimensions of the blocks have to be compatible, and if 𝑼 has size p×q and 𝒀 has size r×s, it must hold that

m=p+r,n=q+s,

and matrix 𝑿 has size r×q, 𝑽 has size p×s.

Block transposition

The transpose of 𝑴 is

𝑴T=[ 𝑼T 𝑿T; 𝑽T 𝒀T ].

Block addition

Assuming compatibility of block dimensions, addition is carried out block-by-block.

𝑴+𝑵=[ 𝑼 𝑽; 𝑿 𝒀 ]+[ 𝑷 𝑸; 𝑹 𝑺 ]=[ 𝑼+𝑷 𝑽+𝑸; 𝑿+𝑹 𝒀+𝑺 ].

Block multiplication

The “rows-over-columns” rule carries over to blocks

𝑴𝑵=[ 𝑼 𝑽; 𝑿 𝒀 ][ 𝑷 𝑸; 𝑹 𝑺 ]=[ 𝑼𝑷+𝑽𝑹 𝑼𝑸+𝑽𝑺; 𝑿𝑷+𝒀𝑹 𝑿𝑸+𝒀𝑺 ],

where non-commutativity of matrix multiplication has to be respected.
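Block operations can be verified by comparing the blockwise formula with the full product; a minimal sketch with randomly generated blocks of compatible sizes:

U = rand(2,3); V = rand(2,2); X = rand(1,3); Y = rand(1,2)   # blocks of M
P = rand(3,2); Q = rand(3,4); R = rand(2,2); S = rand(2,4)   # blocks of N
M = [U V; X Y]; N = [P Q; R S]
M*N ≈ [U*P+V*R U*Q+V*S; X*P+Y*R X*Q+Y*S]                     # blockwise product matches full product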