Lecture 5: Linear Functionals and Mappings

1.Functions

1.1.Relations

A general procedure to relate input values from set X to output values from set Y is to first construct the set of all possible instances of x∈X and y∈Y, which is the Cartesian product of X with Y, denoted as X×Y={(x,y) | x∈X, y∈Y}. Usually only some associations of inputs to outputs are of interest leading to the following definition.

Definition. (Relation) . A relation R between two sets X,Y is a subset of the Cartesian product X×Y, R⊆X×Y.

Associating an output to an input is also useful, leading to the definition of an inverse relation as R⁻¹⊆Y×X, R⁻¹={(y,x) | (x,y)∈R}. Note that an inverse exists for any relation, and the inverse of an inverse is the original relation, (R⁻¹)⁻¹=R.
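
A small Julia illustration (a minimal sketch; the sets and the particular relation below are made up for this example): a relation can be stored as a set of ordered pairs, and its inverse obtained by swapping each pair.

X = [1, 2, 3]; Y = ["a", "b"]              # two small sets
XY = [(x, y) for x in X, y in Y]           # Cartesian product X×Y
R = Set([(1, "a"), (2, "a"), (3, "b")])    # a relation R ⊆ X×Y
Rinv = Set([(y, x) for (x, y) in R])       # inverse relation R⁻¹ ⊆ Y×X
Set([(x, y) for (y, x) in Rinv]) == R      # (R⁻¹)⁻¹ = R, returns true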

Homogeneous relations.
Many types of relations are defined in mathematics and encountered in linear algebra. A commonly encountered type of relationship is from a set onto itself, known as a homogeneous relation. For homogeneous relations H⊆A×A, it is common to replace the set membership notation (a,b)∈H to state that a∈A is in relationship H with b∈A, with a binary operator notation aHb. Familiar examples include the equality and less than relationships between reals, E,L⊆ℝ×ℝ, in which (a,b)∈E is replaced by a=b, and (a,b)∈L is replaced by a<b. The equality relationship is its own inverse, and the inverse of the less than relationship is the greater than relation G⊆ℝ×ℝ, G=L⁻¹, a<b ⇔ b>a. Homogeneous relations H⊆A×A are classified according to the following criteria.

Reflexivity

Relation H is reflexive if (a,a)∈H for any a∈A. The equality relation E⊆ℝ×ℝ is reflexive, ∀a∈ℝ, a=a, the less than relation L⊆ℝ×ℝ is not, 1∈ℝ, 1≮1.

Symmetry

Relation H is symmetric if (a,b)∈H implies that (b,a)∈H, (a,b)∈H ⇒ (b,a)∈H. The equality relation E⊆ℝ×ℝ is symmetric, a=b ⇒ b=a, the less than relation L⊆ℝ×ℝ is not, a<b ⇏ b<a.

Anti-symmetry

Relation H is anti-symmetric if (a,b)∈H for a≠b, then (b,a)∉H. The less than relation L⊆ℝ×ℝ is antisymmetric, a<b ⇒ b≮a.

Transitivity

Relation H is transitive if (a,b)∈H and (b,c)∈H implies (a,c)∈H, for any a,b,c∈A. The equality relation E⊆ℝ×ℝ is transitive, a=b ∧ b=c ⇒ a=c, as is the less than relation L⊆ℝ×ℝ, a<b ∧ b<c ⇒ a<c.

Certain combinations of properties often arise. A homogeneous relation that is reflexive, symmetric, and transitive is said to be an equivalence relation. Equivalence relations include equality among the reals, or congruence among triangles. A homogeneous relation that is reflexive, anti-symmetric and transitive is a partial order relation, such as the less than or equal relation between reals. Finally, a homogeneous relation that is anti-symmetric and transitive is an order relation, such as the less than relation between reals.
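
For a finite set the defining properties can be checked exhaustively. The Julia sketch below (the set A and the relation, congruence modulo 3, are chosen only for illustration) confirms reflexivity, symmetry, and transitivity, hence an equivalence relation.

A = 0:5
H = Set([(a, b) for a in A for b in A if mod(a - b, 3) == 0])   # congruence mod 3 on A
reflexive  = all((a, a) in H for a in A)
symmetric  = all((b, a) in H for (a, b) in H)
transitive = all((a, c) in H for (a, b) in H for (b2, c) in H if b2 == b)
(reflexive, symmetric, transitive)    # (true, true, true)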

1.2.Functions

Functions between sets X and Y are a specific type of relation that arises often in science. For a given input x∈X, theories that predict a single possible output y∈Y are of particular scientific interest.

Definition. (Function) . A function from set X to set Y is a relation F⊆X×Y, that associates to x∈X a single y∈Y.

The above intuitive definition can be transcribed in precise mathematical terms as F⊆X×Y is a function if (x,y)∈F and (x,z)∈F implies y=z. Since it's a particular kind of relation, a function is a triplet of sets (X,Y,F), but with a special, common notation to denote the triplet by f:X→Y, with F={(x,f(x)) | x∈X, f(x)∈Y} and the property that (x,y)∈F ⇒ y=f(x). The set X is the domain and the set Y is the codomain of the function f. The value from the domain x∈X is the argument of the function associated with the function value y=f(x). The function value y is said to be returned by evaluation y=f(x).

As seen previously, a Euclidean space E_m=(ℝ^m,ℝ,+,⋅) can be used to suggest properties of more complex spaces such as the vector space of continuous functions 𝒞⁰(ℝ). A construct that will be often used is to interpret a vector within E_m as a function, since 𝒗∈ℝ^m with components 𝒗=[ v1 v2 ⋯ vm ]T also defines a function v:{1,2,…,m}→ℝ, with values v(i)=vi. As the number of components grows the function v can provide better approximations of some continuous function f∈𝒞⁰(ℝ) through the function values vi=v(i)=f(xi) at distinct sample points x1,x2,…,xm.
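
A minimal Julia sketch of this idea (the sample points and the function sin are chosen arbitrarily here): the vector of m samples plays the role of the function v, and increasing m gives a finer description of f.

m = 8
x = range(0, 2π, length=m)      # distinct sample points x1, …, xm
v = sin.(x)                     # v ∈ ℝ^m with components vi = f(xi)
v[3] == sin(x[3])               # evaluating v(3) returns the sample f(x3), true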

The above function examples are all defined on a domain of scalars or naturals and return scalar values. Within linear algebra the particular interest is in functions defined on sets of vectors from some vector space 𝒱=(V,S,+,⋅) that return either scalars, f:V→S, or vectors from some other vector space 𝒲=(W,S,+,⋅), 𝒈:V→W. The codomain of a vector-valued function might be the same set of vectors as its domain, 𝒉:V→V. The fundamental operation within linear algebra is the linear combination a𝒖+b𝒗 with a,b∈S, 𝒖,𝒗∈V. A key aspect is to characterize how a function behaves when given a linear combination as its argument, for instance f(a𝒖+b𝒗) or 𝒈(a𝒖+b𝒗).

1.3.Linear functionals

Consider first the case of a function defined on a set of vectors that returns a scalar value. Such functions can be interpreted as labels attached to a vector, and are very often encountered in applications from natural phenomena or data analysis.

Definition. (Functional) . A functional on vector space 𝒱=(V,S,+,⋅) is a function from the set of vectors V to the set of scalars S of the vector space 𝒱.

Definition. (Linear Functional) . The functional f:V→S on vector space 𝒱=(V,S,+,⋅) is a linear functional if for any two vectors 𝒖,𝒗∈V and any two scalars a,b∈S

f(a𝒖+b𝒗)=af(𝒖)+bf(𝒗). (1)

Many different functionals may be defined on a vector space 𝒱=(V,S,+,⋅), and an insightful alternative description is provided by considering the set of all linear functionals, that will be denoted as V*={f | f:V→S}. These can be organized into another vector space 𝒱*=(V*,S,+,⋅) with vector addition of linear functionals f,g∈V* and scaling by a∈S defined by

(f+g)(𝒖)=f(𝒖)+g(𝒖), (af)(𝒖)=af(𝒖), ∀𝒖∈V. (2)

Definition. (Dual Vector Space) . For some vector space 𝒱, the vector space of linear functionals 𝒱* is called the dual vector space.

As is often the case, the above abstract definition can better be understood by reference to the familiar case of Euclidean space. Consider ℝ₂=(ℝ²,ℝ,+,⋅), the set of vectors in the plane with 𝒙∈ℝ² the position vector from the origin (0,0) to point X in the plane with coordinates (x1,x2). One functional from the dual space ℝ₂* is f2(𝒙)=x2, i.e., taking the second coordinate of the position vector. The linearity property is readily verified. For 𝒙,𝒚∈ℝ², a,b∈ℝ,

f2(a𝒙+b𝒚)=ax2+by2=af2(𝒙)+bf2(𝒚).
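
A quick numerical check of this linearity property in Julia, with arbitrarily chosen vectors and scalars:

f2(x) = x[2]                          # functional returning the second coordinate
x = [1.0, 2.0]; y = [-3.0, 0.5]       # two vectors in ℝ²
a, b = 2.0, -1.5                      # two scalars
f2(a*x + b*y) ≈ a*f2(x) + b*f2(y)     # returns true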

Given some constant value h, the curves within the plane defined by f2(𝒙)=h are called the contour lines or level sets of f2. Several contour lines and position vectors are shown in Figure 1. The utility of functionals and dual spaces can be shown by considering a simple example from physics. Assume that x2 is the height above ground level and a vector 𝒙 is the displacement of a body of mass m in a gravitational field. The mechanical work done to lift the body from ground level to height h is W=mgh with g the gravitational acceleration. The mechanical work is the same for all displacements 𝒙 that satisfy the equation f2(𝒙)=h. The work expressed in units mgΔh can be interpreted as the number of contour lines f2(𝒙)=nΔh intersected by the displacement vector 𝒙. This concept of duality between vectors and scalar-valued functionals arises throughout mathematics, the physical and social sciences and in data science. The term “duality” itself comes from geometry. A point X in ℝ² with coordinates (x1,x2) can be defined either as the end-point of the position vector 𝒙, or as the intersection of the contour lines of two functionals f1(𝒙)=x1 and f2(𝒙)=x2. Either geometric description works equally well in specifying the position of X, so it might seem redundant to have two such procedures. It turns out though that many quantities of interest in applications can be defined through use of both descriptions, as shown in the computation of mechanical work in a gravitational field.

Figure 1. Vectors in E2 and contour lines of the functional f(𝒙)=x2

1.4.Linear mappings

Consider now functions 𝒇:V→W from vector space 𝒱=(V,S,+,⋅) to another vector space 𝒲=(W,T,+,⋅). As before, the action of such functions on linear combinations is of special interest.

Definition. (Linear Mapping) . A function 𝒇:V→W, from vector space 𝒱=(V,S,+,⋅) to vector space 𝒲=(W,S,+,⋅) is called a linear mapping if for any two vectors 𝒖,𝒗∈V and any two scalars a,b∈S

𝒇(a𝒖+b𝒗)=a𝒇(𝒖)+b𝒇(𝒗). (3)

The image of a linear combination a𝒖+b𝒗 through a linear mapping is another linear combination a𝒇(𝒖)+b𝒇(𝒗), and linear mappings are therefore said to preserve the structure of a vector space; they are called homomorphisms in mathematics. The codomain of a linear mapping might be the same as the domain, in which case the mapping is said to be an endomorphism.

Matrix-vector multiplication has been introduced as a concise way to specify a linear combination

𝒇(𝒙)=𝑨𝒙=x1𝒂1+⋯+xn𝒂n,

with 𝒂1,…,𝒂n the columns of the matrix, 𝑨=[ 𝒂1 𝒂2 ⋯ 𝒂n ]. This is a linear mapping between the real spaces ℝ^m, ℝ^n, 𝒇:ℝ^n→ℝ^m, and indeed any linear mapping between real spaces can be given as a matrix-vector product.
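
The column interpretation of the matrix-vector product can be verified directly in Julia (a small sketch with an arbitrarily chosen matrix and vector):

A = [1 3; 2 4; 0 5]                   # A ∈ ℝ^{3×2} with columns a1, a2
x = [2, -1]                           # x ∈ ℝ²
A*x == x[1]*A[:,1] + x[2]*A[:,2]      # Ax = x1 a1 + x2 a2, returns true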

2.Measurements

Vectors within the real space ℝ^m can be completely specified by m real numbers, even though m is large in many realistic applications. A vector within 𝒞⁰(ℝ), i.e., a continuous function defined on the reals, cannot be so specified since it would require an infinite, non-countable listing of function values. In either case, the task of describing the elements of a vector space 𝒱=(V,S,+,⋅) by simpler means arises. Within data science this leads to classification problems in accordance with some relevant criteria.

2.1.Equivalence classes

Many classification criteria are scalars, defined as a scalar-valued function f:V→S on a vector space, 𝒱=(V,S,+,⋅). The most common criteria are inspired by experience with Euclidean space. In a Euclidean-Cartesian model (ℝ²,ℝ,+,⋅) of the geometry of a plane Π, a point O∈Π is arbitrarily chosen to correspond to the zero vector 𝟎=[ 0 0 ]T, along with two preferred vectors 𝒆1,𝒆2 grouped together into the identity matrix 𝑰. The position of a point X∈Π with respect to O is given by the linear combination

𝒙=𝑰𝒙+𝟎=[ 𝒆1 𝒆2 ][ x1 x2 ]=x1𝒆1+x2𝒆2.

Several possible classifications of points in the plane are depicted in Figure 2: lines, squares, circles. Intuitively, each choice separates the plane into subsets, and a given point in the plane belongs to just one in the chosen family of subsets. A more precise characterization is given by the concept of a partition of a set.

Definition. (Partition) . A partition of a set is a grouping of its elements into non-empty subsets such that every element is included in exactly one subset.

In precise mathematical terms, a partition of set S is P={Si | Si⊆S, Si≠∅, i∈I} such that ∀x∈S, ∃!j∈I for which x∈Sj. Since there is only one set (∃! signifies “exists and is unique”) to which some given x∈S belongs, the subsets Si of the partition P are disjoint, i≠j ⇒ Si∩Sj=∅. The subsets Si are labeled by i within some index set I. The index set might be a subset of the naturals, I⊆ℕ, in which case the partition is countable, possibly finite. The partitions of the plane suggested by Figure 2 are however indexed by a real-valued label, i∈I with I⊆ℝ.

A technique which is often used to generate a partition of a vector space 𝒱=(V,S,+,⋅) is to define an equivalence relation between vectors, H⊆V×V. For some element 𝒖∈V, the equivalence class of 𝒖 is defined as all vectors 𝒗 that are equivalent to 𝒖, {𝒗 | (𝒖,𝒗)∈H}. The set of equivalence classes of H is called the quotient set and denoted as V/H, and the quotient set is a partition of V. Figure 2 depicts four different partitions of the plane. These can be interpreted geometrically, such as parallel lines or distance from the origin. With wider implications for linear algebra, the partitions can also be given in terms of classification criteria specified by functions.
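
As a sketch of how such a partition arises in computation (the points and the classification criterion below are made up for illustration), vectors can be grouped into equivalence classes by a scalar label, here the distance from the origin rounded to one digit:

using LinearAlgebra
points = [randn(2) for _ in 1:10]            # some vectors in ℝ²
label(x) = round(norm(x), digits=1)          # classification criterion
classes = Dict{Float64, Vector{Vector{Float64}}}()
for p in points
    push!(get!(classes, label(p), Vector{Float64}[]), p)   # group by equivalence class
end
keys(classes)                                # one key per equivalence class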

Figure 2. Equivalence classes within the plane

2.2.Norms

The partition of ℝ² by circles from Figure 2 is familiar; the equivalence classes are sets of points whose position vector has the same size, {𝒙=[ x1 x2 ]T | (x1²+x2²)^(1/2)=r}, or is at the same distance from the origin. Note that familiarity with Euclidean geometry should not obscure the fact that some other concept of distance might be induced by the data. A simple example is the statement of walking distance in terms of city blocks, in which the distance from a starting point to an address x1=3 blocks east and x2=4 blocks north is x1+x2=7 city blocks, not the Euclidean distance (x1²+x2²)^(1/2)=5 since one cannot walk through the buildings occupying a city block.

The above observations lead to the mathematical concept of a norm as a tool to evaluate vector magnitude. Recall that a vector space is specified by two sets and two operations, 𝒱=(V,S,+,⋅), and the behavior of a norm with respect to each of these components must be defined. The desired behavior includes the following properties and formal definition.

Unique value

The magnitude of a vector 𝒗∈V should be a unique scalar, requiring the definition of a function. The scalar could have irrational values and should allow ordering of vectors by size, so the function should be from V to ℝ, f:V→ℝ. On the real line the point at coordinate x is at distance |x| from the origin, and to mimic this usage the norm of 𝒗∈V is denoted as ||𝒗||, leading to the definition of a function ||⋅||:V→ℝ₊, ℝ₊={a | a∈ℝ, a≥0}.

Null vector case

Provision must be made for the only distinguished element of V, the null vector 𝟎. It is natural to associate the null vector with the null scalar element, ||𝟎||=0. A crucial additional property is also imposed, namely that the null vector is the only vector whose norm is zero, ||𝒗||=0 ⇒ 𝒗=𝟎. From knowledge of a single scalar value, an entire vector can be determined. This property arises at key junctures in linear algebra, notably in providing a link to another branch of mathematics known as analysis, and is needed to establish the fundamental theorem of linear algebra or the singular value decomposition encountered later.

Scaling

Transferring the scaling operation 𝒗=a𝒖 to norms leads to imposing ||𝒗||=||a𝒖||=|a| ||𝒖||. This property ensures commensurability of vectors, meaning that the magnitude of vector 𝒗 can be expressed as a multiple of some standard vector magnitude ||𝒖||.

Vector addition

Position vectors from the origin to coordinates x,y>0 on the real line can be added and |x+y|=|x|+|y|. If however the position vectors point in different directions, x>0, y<0, then |x+y|<|x|+|y|. For a general vector space the analogous property is known as the triangle inequality, ||𝒖+𝒗||≤||𝒖||+||𝒗|| for 𝒖,𝒗∈V.

Definition. (Norm) . A norm on the vector space 𝒱=(V,S,+,⋅) is a function ||⋅||:V→ℝ₊ that for 𝒖,𝒗∈V, a∈S satisfies:

  1. ||𝒗||=0 ⇒ 𝒗=𝟎;

  2. ||a𝒖||=|a|||𝒖||;

  3. ||𝒖+𝒗||≤||𝒖||+||𝒗||.

Note that the norm is a functional, but the triangle inequality implies that it is not generally a linear functional. Returning to Figure 2, consider the functions fi:ℝ²→ℝ₊ defined for 𝒙=[ x1 x2 ]T through values

f1(𝒙)=|x1|, f2(𝒙)=|x2|, f3(𝒙)=|x1|+|x2|, f4(𝒙)=(|x1|²+|x2|²)^(1/2).

Sets of constant value of the above functions are also equivalence classes induced by the equivalence relations Ei for i=1,2,3,4.

  1. f1(𝒙)=c ⇔ |x1|=c, E1={(𝒙,𝒚) | f1(𝒙)=f1(𝒚) ⇔ |x1|=|y1|} ⊂ ℝ²×ℝ²;

  2. f2(𝒙)=c ⇔ |x2|=c, E2={(𝒙,𝒚) | f2(𝒙)=f2(𝒚) ⇔ |x2|=|y2|} ⊂ ℝ²×ℝ²;

  3. f3(𝒙)=c ⇔ |x1|+|x2|=c, E3={(𝒙,𝒚) | f3(𝒙)=f3(𝒚) ⇔ |x1|+|x2|=|y1|+|y2|} ⊂ ℝ²×ℝ²;

  4. f4(𝒙)=c ⇔ (|x1|²+|x2|²)^(1/2)=c, E4={(𝒙,𝒚) | f4(𝒙)=f4(𝒚) ⇔ (|x1|²+|x2|²)^(1/2)=(|y1|²+|y2|²)^(1/2)} ⊂ ℝ²×ℝ².

These equivalence classes correspond to the vertical lines, horizontal lines, squares, and circles of Figure 2. Not all of the functions fi are norms since f1(𝒙) is zero for the non-null vector 𝒙=[ 0 1 ]T, and f2(𝒙) is zero for the non-null vector 𝒙=[ 1 0 ]T. The functions f3 and f4 are indeed norms, and are specific cases of the following general norm.
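
These observations are easy to confirm numerically (a brief sketch; norm from Julia's LinearAlgebra standard library evaluates f3 as norm(x,1) and f4 as norm(x,2)):

using LinearAlgebra
f1(x) = abs(x[1]); f2(x) = abs(x[2])
f1([0, 1]), f2([1, 0])                 # both 0 for non-null vectors: not norms
x = [3, -4]
norm(x, 1), abs(x[1]) + abs(x[2])      # f3 is the 1-norm: (7.0, 7)
norm(x, 2), sqrt(x[1]^2 + x[2]^2)      # f4 is the 2-norm: (5.0, 5.0)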

Definition. (p-Norm in ℝ^m) . The p-norm on the real vector space ℝ_m=(ℝ^m,ℝ,+,⋅) for p≥1 is the function ||⋅||_p:ℝ^m→ℝ₊ with values ||𝒙||_p=(|x1|^p+|x2|^p+⋯+|xm|^p)^(1/p), or

||𝒙||_p = ( ∑_{i=1}^{m} |xi|^p )^(1/p) for 𝒙∈ℝ^m. (4)

Denote by xi the largest component in absolute value of 𝒙∈ℝ^m. As p increases, |xi|^p becomes dominant with respect to all other terms in the sum, suggesting the definition of an inf-norm by

||𝒙||_∞ = max_{1≤i≤m} |xi|.

This also works for vectors with equal components, since the number of components is finite while p→∞, as exemplified for 𝒙=[ a a ⋯ a ]T by ||𝒙||_p=(m|a|^p)^(1/p)=m^(1/p)|a|, with m^(1/p)→1 as p→∞.
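
This limit is easy to observe numerically (a small sketch for a vector of equal components a=2):

using LinearAlgebra
m = 4; x = fill(2.0, m)                                      # x = [a a ⋯ a]T with a = 2
[(p, norm(x, p), m^(1/p)*2.0) for p in (1, 2, 8, 32, 128)]   # p-norms approach |a|
norm(x, Inf)                                                 # the limit value, 2.0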

Note that the Euclidean norm corresponds to p=2, and is often called the 2-norm. The analogy between vectors and functions can be exploited to also define a p-norm for 𝒞⁰[a,b]=(C([a,b]),ℝ,+,⋅), the vector space of continuous functions defined on [a,b].

Definition. (p-Norm in 𝒞⁰[a,b]) . The p-norm on the vector space of continuous functions 𝒞⁰[a,b] for p≥1 is the function ||⋅||_p:C([a,b])→ℝ₊ with values

||f||_p = ( ∫_a^b |f(x)|^p dx )^(1/p), for f∈C([a,b]). (5)

The integration operation ∫_a^b can be intuitively interpreted as the value of the sum ∑_{i=1}^{m} from equation (4) for very large m and very closely spaced evaluation points of the function f(xi), for instance |xi+1−xi|=(b−a)/m. An inf-norm can also be defined for continuous functions by

||f||_∞ = sup_{x∈[a,b]} |f(x)|,

where sup, the supremum operation, can be intuitively understood as the generalization of the max operation over the countable set {1,2,…,m} to the uncountable set [a,b].
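
The link between equations (4) and (5) can be sketched numerically: a Riemann-sum approximation of the integral p-norm (here for the arbitrarily chosen f(x)=x on [0,1] with p=2) approaches the exact value (1/3)^(1/2) as the number of sample points grows.

f(x) = x                                   # a continuous function on [0, 1]
a, b, p = 0.0, 1.0, 2
fnorm(m) = (sum(abs(f(xi))^p for xi in range(a, b, length=m)) * (b - a)/m)^(1/p)
fnorm(10), fnorm(1000), (1/3)^(1/2)        # coarse and fine approximations vs exact value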

Figure 3. Regions within ℝ² for which ||𝒙||_p≤1, for p=1,2,3,∞.

Vector norms arise very often in applications since they can be used to classify data, and are implemented in most software systems as a function norm(x,p) to evaluate the p-norm of a vector 𝒙, with p=2 as the default.

using LinearAlgebra; x=[1 1 1]; [norm(x) sqrt(3.0)]

[ 1.7320508075688772 1.7320508075688772 ] (6)

m=9; x=ones(m); [norm(x) sqrt(m)]

[ 3.0 3.0 ] (7)

m=4; x=ones(m); [norm(x,1) m]

[ 4.0 4.0 ] (8)

[norm(x,1) norm(x,2) norm(x,4) norm(x,8) norm(x,16) norm(x,Inf)]

[ 4.0 2.0 1.414213562373095 1.189207115002721 1.0905077326652577 1.0 ] (9)

2.3.Inner product

Norms are functionals that define what is meant by the size of a vector, but are not linear. Even in the simplest case of the real line, the linearity relation |x+y|=|x|+|y| is not verified for x>0, y<0. Nor do norms characterize the familiar geometric concept of orientation of a vector. A particularly important orientation from Euclidean geometry is orthogonality between two vectors. Another function is required, but before a formal definition some intuitive understanding is sought by considering vectors and functionals in the plane, as depicted in Figure 4. Consider a position vector 𝒙=[ x1 x2 ]T∈ℝ² and the previously-encountered linear functionals

f1,f2:ℝ²→ℝ, f1(𝒙)=x1, f2(𝒙)=x2.

The x1 component of the vector 𝒙 can be thought of as the number of level sets of f1 that it crosses; similarly for the x2 component. A convenient labeling of level sets is by their normal vectors. The level sets of f1 have normal 𝒆1T=[ 1 0 ], and those of f2 have normal vector 𝒆2T=[ 0 1 ]. Both of these can be thought of as matrices with two columns, each containing a single component. The products of these matrices with the vector 𝒙 give the values of the functionals f1,f2

𝒆1T𝒙=[ 1 0 ][ x1 x2 ]=1x1+0x2=x1=f1(𝒙),
𝒆2T𝒙=[ 0 1 ][ x1 x2 ]=0x1+1x2=x2=f2(𝒙).

Figure 4. Euclidean space E2 and its dual E2*.

In general, any linear functional f defined on the real space ℝ^m can be labeled by a vector

𝒂T=[ a1 a2 ⋯ am ],

and evaluated through the matrix-vector product f(𝒙)=𝒂T𝒙. This suggests the definition of another function s:ℝ^m×ℝ^m→ℝ,

s(𝒂,𝒙)=𝒂T𝒙.

The function s is called an inner product; it has two vector arguments from which a matrix-vector product is formed and returns a scalar value, hence it is also called a scalar product. The definition from a Euclidean space can be extended to general vector spaces. For now, consider the field of scalars to be the reals, S=ℝ.

Definition. (Inner Product) . An inner product in the vector space 𝒱=(V,ℝ,+,⋅) is a function s:V×V→ℝ with properties

Symmetry

For any 𝒂,𝒙∈V, s(𝒂,𝒙)=s(𝒙,𝒂).

Linearity in second argument

For any 𝒂,𝒙,𝒚∈V, α,β∈ℝ, s(𝒂,α𝒙+β𝒚)=αs(𝒂,𝒙)+βs(𝒂,𝒚).

Positive definiteness

For any 𝒙∈V\{𝟎}, s(𝒙,𝒙)>0.

The inner product s(𝒂,𝒙) returns the number of level sets of the functional labeled by 𝒂 crossed by the vector 𝒙, and this interpretation underlies many applications in the sciences as in the gravitational field example above. Inner products also provide a procedure to evaluate geometrical quantities and relationships.

Vector norm

In ℝ^m the number of level sets of the functional labeled by 𝒙 crossed by 𝒙 itself is identical to the square of the 2-norm

s(𝒙,𝒙)=𝒙T𝒙=||𝒙||_2^2.

In general, the square root of s(𝒙,𝒙) satisfies the properties of a norm, and is called the norm induced by an inner product

||𝒙||=s(𝒙,𝒙)^(1/2).

A real space together with the scalar product s(𝒙,𝒚)=𝒙T𝒚 and induced norm ||𝒙||=s(𝒙,𝒙)^(1/2) defines a Euclidean vector space E_m.
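
A short Julia check that the induced norm coincides with the 2-norm (the vector is arbitrary):

using LinearAlgebra
s(a, x) = a' * x                 # inner product s(a, x) = aTx
x = [3.0, 4.0]
sqrt(s(x, x)), norm(x)           # induced norm vs 2-norm: (5.0, 5.0)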

Orientation

In ℝ² the point specified by polar coordinates (r,θ) has the Cartesian coordinates x1=rcosθ, x2=rsinθ, and position vector 𝒙=[ x1 x2 ]T. The inner product

𝒆1T𝒙=[ 1 0 ][ x1 x2 ]=1x1+0x2=rcosθ,

is seen to contain information on the relative orientation of 𝒙 with respect to 𝒆1. In general, the angle θ between two vectors 𝒙,𝒚 within any vector space with a scalar product can be defined by

cosθ = s(𝒙,𝒚) / [s(𝒙,𝒙) s(𝒚,𝒚)]^(1/2) = s(𝒙,𝒚) / (||𝒙|| ||𝒚||),

which becomes

cosθ = 𝒙T𝒚 / (||𝒙|| ||𝒚||),

in a Euclidean space, 𝒙,𝒚∈ℝ^m.
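
A brief numerical example of the angle formula (the two vectors are chosen arbitrarily):

using LinearAlgebra
x = [1.0, 0.0]; y = [1.0, 1.0]
cosθ = (x' * y) / (norm(x) * norm(y))
acos(cosθ), π/4                    # the angle between x and y is π/4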

Orthogonality

In E2 two vectors are orthogonal if the angle between them is such that cosθ=0, and this can be extended to an arbitrary vector space 𝒱=(V,ℝ,+,⋅) with a scalar product by stating that 𝒙,𝒚∈V are orthogonal if s(𝒙,𝒚)=0. In ℝ^m vectors 𝒙,𝒚∈ℝ^m are orthogonal if 𝒙T𝒚=0.

3.Linear mapping composition

3.1.Matrix-matrix product

From two functions f:A→B and g:B→C, a composite function, h=g∘f, h:A→C is defined by

h(x)=g(f(x)).

Consider linear mappings between Euclidean spaces 𝒇:ℝ^n→ℝ^m, 𝒈:ℝ^m→ℝ^p. Recall that linear mappings between Euclidean spaces are expressed as matrix-vector multiplication

𝒇(𝒙)=𝑨𝒙, 𝒈(𝒚)=𝑩𝒚, 𝑨∈ℝ^{m×n}, 𝑩∈ℝ^{p×m}.

The composite function 𝒉=𝒈∘𝒇 is 𝒉:ℝ^n→ℝ^p, defined by

𝒉(𝒙)=𝒈(𝒇(𝒙))=𝒈(𝑨𝒙)=𝑩𝑨𝒙.

Note that the intermediate vector 𝒖=𝑨𝒙 is subsequently multiplied by the matrix 𝑩. The composite function 𝒉 is itself a linear mapping

𝒉(a𝒙+b𝒚)=𝑩𝑨(a𝒙+b𝒚)=𝑩(a𝑨𝒙+b𝑨𝒚)=𝑩(a𝒖+b𝒗)=a𝑩𝒖+b𝑩𝒗=a𝑩𝑨𝒙+b𝑩𝑨𝒚=a𝒉(𝒙)+b𝒉(𝒚),

so it can also be expressed as a matrix-vector multiplication

𝒉(𝒙)=𝑪𝒙=𝑩𝑨𝒙. (10)

Using the above, 𝑪 is defined as the product of matrix 𝑩 with matrix 𝑨

𝑪=𝑩𝑨.
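
The agreement between the composition 𝒈(𝒇(𝒙))=𝑩(𝑨𝒙) and the single product (𝑩𝑨)𝒙 can be checked directly, using the same small matrices that appear in the session below (the test vector is arbitrary):

A = [1 3; 2 4]; B = [-1 2; 1 -2; 3 3]      # f(x) = Ax, g(y) = By
x = [1.0, -1.0]
B*(A*x) ≈ (B*A)*x                          # h(x) = g(f(x)) = (BA)x, returns true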

The columns of 𝑪 can be determined from those of 𝑨 by considering the action of 𝒉 on the column vectors of the identity matrix 𝑰=[ 𝒆1 𝒆2 ⋯ 𝒆n ]∈ℝ^{n×n}. First, note that

𝑨𝒆1=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 1 0 ⋯ 0 ]T=𝒂1, …, 𝑨𝒆j=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 0 ⋯ 1 ⋯ 0 ]T=𝒂j, …, 𝑨𝒆n=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 0 ⋯ 0 1 ]T=𝒂n. (11)

The above can be repeated for the matrix 𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ] giving

𝒉(𝒆1)=𝑪𝒆1=𝒄1, …, 𝒉(𝒆j)=𝑪𝒆j=𝒄j, …, 𝒉(𝒆n)=𝑪𝒆n=𝒄n. (12)

Combining the above equations leads to 𝒄j=𝑩𝒂j, or

𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ]=𝑩[ 𝒂1 𝒂2 ⋯ 𝒂n ].

From the above the matrix-matrix product 𝑪=𝑩𝑨 is seen to simply be a grouping of all the products of 𝑩 with the column vectors of 𝑨,

𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ]=[ 𝑩𝒂1 𝑩𝒂2 ⋯ 𝑩𝒂n ].

a1=[1; 2]; a2=[3; 4]; A=[a1 a2]

[ 1 3
  2 4 ] (13)

b1=[-1; 1; 3]; b2=[2; -2; 3]; B=[b1 b2]

[ -1 2
  1 -2
  3 3 ] (14)

C=B*A

[ 3 5
  -3 -5
  9 21 ] (15)

c1=B*a1; c2=B*a2; [c1 c2]

[ 3 5
  -3 -5
  9 21 ] (16)

Summary.