Linear Mappings

Synopsis. Vectors have been introduced to represent complicated objects whose description requires m numbers, and the procedure of linear combination allows construction of new vectors. Alternative insights into some object might be obtained by transformation of vectors. Of all possible transformations, those that are compatible with linear combinations are of special interest. It turns out that matrices are not only important in organizing collections of vectors, but also in representing such transformations, referred to as linear mappings.

1.Functions

1.1.Relations

The previous chapter focused on mathematical expression of the concept of quantification, the act of associating human observation with measurements, as a first step of scientific inquiry. Consideration of different types of quantities led to various types of numbers, vectors as groupings of numbers, and matrices as groupings of vectors. Symbols were introduced for these quantities along with some initial rules for manipulating such objects, laying the foundation for an algebra of vectors and matrices. Science seeks to not only observe, but to also explain, which now leads to additional operations for working with vectors and matrices that will define the framework of linear algebra.

Explanations within scientific inquiry are formulated as hypotheses, from which predictions are derived and tested. A widely applied mathematical transcription of this process is to organize hypotheses and predictions as two sets X and Y, and then construct another set R of all of the instances in which an element of X is associated with an element of Y. The set of all possible instances of x∈X and y∈Y is the Cartesian product of X with Y, denoted as X×Y={(x,y) | x∈X, y∈Y}, a construct already encountered in the definition of the real 2-space ℛ₂=(ℝ²,ℝ,+,⋅) where ℝ²=ℝ×ℝ. Typically, not all possible tuples (x,y)∈X×Y are relevant, leading to the following definition.

Definition. (Relation) . A relation R between two sets X,Y is a subset of the Cartesian product X×Y, R⊆X×Y.

The key concept is that of associating an input x∈X with an output y∈Y. Inverting the approach and associating an output to an input is also useful, leading to the definition of an inverse relation as R⁻¹⊆Y×X, R⁻¹={(y,x) | (x,y)∈R}. Note that an inverse exists for any relation, and the inverse of an inverse is the original relation, (R⁻¹)⁻¹=R. From the above, a relation is a triplet (a tuple with three elements), (X,Y,R), that will often be referred to by just its last member R.

Homogeneous relations.
Many types of relations are defined in mathematics and encountered in linear algebra, and establishing properties of specific relations is an important task within data science. A commonly encountered type of relation is one from a set onto itself, known as a homogeneous relation. For homogeneous relations H⊆A×A, it is common to replace the set membership notation (a,b)∈H, stating that a∈A is in relationship H with b∈A, by the binary operator notation a H b. Familiar examples include the equality and less-than relationships between reals, E,L⊆ℝ×ℝ, in which (a,b)∈E is replaced by a=b, and (a,b)∈L is replaced by a<b. The equality relationship is its own inverse, and the inverse of the less-than relationship is the greater-than relation G⊆ℝ×ℝ, G=L⁻¹, a<b ⇔ b>a.

1.2.Functions

Functions between sets X and Y are a specific type of relation, one that arises often in science. For a given input x∈X, theories that predict a single possible output y∈Y are of particular scientific interest.

Definition. (Function) . A function from set X to set Y is a relation F⊆X×Y that associates to x∈X a single y∈Y.

The above intuitive definition can be transcribed in precise mathematical terms as: F⊆X×Y is a function if (x,y)∈F and (x,z)∈F implies y=z. Since it is a particular kind of relation, a function is a triplet of sets (X,Y,F), but with a special, common notation to denote the triplet by f:X→Y, with F={(x,f(x)) | x∈X, f(x)∈Y} and the property that (x,y)∈F ⇔ y=f(x). The set X is the domain and the set Y is the codomain of the function f. The value from the domain x∈X is the argument of the function, associated with the function value y=f(x). The function value y is said to be returned by evaluation y=f(x).

All relations can be inverted, and inversion of a function defines a new relation, but that relation might not itself be a function. For example, the relation S⁻¹={(α,a),(β,a),(γ,a)} is a function, but its inverse (S⁻¹)⁻¹=S={(a,α),(a,β),(a,γ)} is not, since it associates three different outputs with the single input a.

Familiar functions include the trigonometric, exponential, and logarithm functions.

Simple functions such as sin, cos, exp, log are predefined in Julia, and can be applied to each component of a vector argument by broadcasting, denoted by a period in front of the parentheses enclosing the argument.

θ=π; [sin(θ) cos(θ) exp(θ) log(θ)]

[ 0.0 -1.0 23.140692632779267 1.1447298858494002 ] (1)

θ=0:π/6:π; short(x)=round(x,digits=6); short.(sin.(θ))'

[ 0.0 0.5 0.866025 1.0 0.866025 0.5 0.0 ] (2)

short.(log2.(1:8))'

[ 0.0 1.0 1.584963 2.0 2.321928 2.584963 2.807355 3.0 ] (3)

A construct that will often be used is to interpret a vector within Eᵐ as a function, since 𝒗∈ℝᵐ with components 𝒗=[ v1 v2 ⋯ vm ]T also defines a function v:{1,2,…,m}→ℝ, with values v(i)=vi. As the number of components grows, the function v can provide better approximations of some continuous function f∈𝒞⁰(ℝ) through the function values vi=v(i)=f(xi) at distinct sample points x1,x2,…,xm.

The above function examples are all defined on a domain of scalars or naturals and return scalar values. Within linear algebra the particular interest is in functions defined on sets of vectors from some vector space 𝒱=(V,S,+,⋅) that return either scalars, f:V→S, or vectors from some other vector space 𝒲=(W,S,+,⋅), 𝒈:V→W. The codomain of a vector-valued function might be the same set of vectors as its domain, 𝒉:V→V. The fundamental operation within linear algebra is the linear combination a𝒖+b𝒗 with a,b∈S, 𝒖,𝒗∈V. A key aspect is to characterize how a function behaves when given a linear combination as its argument, for instance f(a𝒖+b𝒗) or 𝒈(a𝒖+b𝒗).

1.3.Linear functionals

Consider first the case of a function defined on a set of vectors that returns a scalar value. Such scalars can be interpreted as labels attached to a vector, and are very often encountered in applications, whether involving natural phenomena or data analysis.

Definition. (Functional) . A functional on vector space 𝒱=(V,S,+,⋅) is a function from the set of vectors V to the set of scalars S of the vector space 𝒱.

Definition. (Linear Functional) . The functional f:V→S on vector space 𝒱=(V,S,+,⋅) is a linear functional if for any two vectors 𝒖,𝒗∈V and any two scalars a,b∈S

f(a𝒖+b𝒗)=af(𝒖)+bf(𝒗). (4)
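For instance, the functional on ℝ³ that sums the components of its argument is linear. A brief Julia check of property (4), a minimal sketch using randomly generated vectors and scalars, might read:

f(v) = sum(v)                    # a functional on R^3: sum of components
u = rand(3); v = rand(3); a = rand(); b = rand()
f(a*u + b*v) ≈ a*f(u) + b*f(v)   # returns true, verifying linearity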

1.4.Linear mappings

Consider now functions 𝒇:V→W from vector space 𝒱=(V,S,+,⋅) to another vector space 𝒲=(W,T,+,⋅). As before, the action of such functions on linear combinations is of special interest.

Definition. (Linear Mapping) . A function 𝒇:V→W, from vector space 𝒱=(V,S,+,⋅) to vector space 𝒲=(W,S,+,⋅) is called a linear mapping if for any two vectors 𝒖,𝒗∈V and any two scalars a,b∈S

𝒇(a𝒖+b𝒗)=a𝒇(𝒖)+b𝒇(𝒗). (5)

The image of a linear combination a𝒖+b𝒗 through a linear mapping is another linear combination a𝒇(𝒖)+b𝒇(𝒗), and linear mappings are said to preserve the structure of a vector space.

Note that f:ℝ→ℝ defined as y=f(x)=ax+b represents a line in the (x,y)-plane, but is not a linear mapping for b≠0 since

f(x+z)=a(x+z)+b ≠ ax+b+az+b=f(x)+f(z).

Matrix-vector multiplication has been introduced as a concise way to specify a linear combination

𝒇(𝒙)=𝑨𝒙=x1𝒂1+⋯+xn𝒂n,

with 𝒂1,…,𝒂n the columns of the matrix, 𝑨=[ 𝒂1 𝒂2 ⋯ 𝒂n ]. This is a linear mapping between the real spaces ℝⁿ and ℝᵐ, 𝒇:ℝⁿ→ℝᵐ, and indeed any linear mapping between real spaces can be given as a matrix-vector product. To see this for a linear mapping 𝒇 defined on ℝᵐ, consider some 𝒙∈ℝᵐ

𝒙=[ x1 x2 ⋯ xm ]T=x1[ 1 0 ⋯ 0 ]T+x2[ 0 1 ⋯ 0 ]T+⋯+xm[ 0 0 ⋯ 1 ]T=x1𝒆1+x2𝒆2+⋯+xm𝒆m.

Applying the linear mapping 𝒇 to 𝒙 leads to

𝒇(𝒙)=𝒇(x1𝒆1+x2𝒆2+⋯+xm𝒆m)=x1𝒇(𝒆1)+x2𝒇(𝒆2)+⋯+xm𝒇(𝒆m).

The matrix 𝑨 with columns 𝒂1=𝒇(𝒆1),…,𝒂m=𝒇(𝒆m) now allows finding

𝒇(𝒙)=𝑨𝒙,

through a matrix-vector multiplication for any input vector 𝒙. The matrix 𝑨 thus defined is a representation of the linear mapping 𝒇. As will be shown later, it is not the only possible representation.
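As an illustrative sketch, the matrix representation can be assembled column by column from the images of the unit vectors. The mapping f below is a hypothetical example from ℝ² to ℝ³:

f(x) = [x[1]+2x[2]; 3x[1]; x[1]-x[2]]   # a linear mapping f: R^2 -> R^3
e1 = [1; 0]; e2 = [0; 1]
A = [f(e1) f(e2)]                        # columns are the images of the unit vectors
x = [2; -1]; [A*x f(x)]                  # the two columns coincide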

2.Measurements

Vectors within the real space ℝᵐ can be completely specified by m real numbers, and m is large in many realistic applications. The task of describing the elements of a vector space 𝒱=(V,S,+,⋅) by simpler means arises. Within data science this leads to classification problems in accordance with some relevant criteria, and one of the simplest classifications is to attach a scalar label to a vector. Commonly encountered labels include the magnitude of a vector or its orientation with respect to another vector.

2.1.Norms

The above observations lead to the mathematical concept of a norm as a tool to evaluate vector magnitude. Recall that a vector space is specified by two sets and two operations, 𝒱=(V,S,+,⋅), and the behavior of a norm with respect to each of these components must be defined. The desired behavior includes the following properties, collected in the formal definition below.

Unique value

The magnitude of a vector 𝒗∈V should be a unique scalar, requiring the definition of a function. The scalar could have irrational values and should allow ordering of vectors by size, so the function should be from V to ℝ, f:V→ℝ. On the real line the point at coordinate x is at distance |x| from the origin, and to mimic this usage the norm of 𝒗∈V is denoted as ||𝒗||, leading to the definition of a function ||⋅||:V→ℝ+, ℝ+={a | a∈ℝ, a≥0}.

Null vector case

Provision must be made for the only distinguished element of V, the null vector 𝟎. It is natural to associate the null vector with the null scalar element, ||𝟎||=0. A crucial additional property is also imposed, namely that the null vector is the only vector whose norm is zero, ||𝒗||=0 ⇒ 𝒗=𝟎. From knowledge of a single scalar value, an entire vector can be determined. This property arises at key junctures in linear algebra, notably in providing a link to mathematical analysis, and is needed to establish the fundamental theorem of linear algebra or the singular value decomposition encountered later.

Scaling

Transferring the scaling operation 𝒗=a𝒖 to norms leads to imposing ||𝒗||=|a| ||𝒖||. This property ensures commensurability of vectors, meaning that the magnitude of vector 𝒗 can be expressed as a multiple of some standard vector magnitude ||𝒖||.

Vector addition

Position vectors from the origin to coordinates x,y>0 on the real line can be added and |x+y|=|x|+|y|. If however the position vectors point in different directions, x>0, y<0, then |x+y|<|x|+|y|. For a general vector space the analogous property is known as the triangle inequality, ||𝒖+𝒗||≤||𝒖||+||𝒗|| for 𝒖,𝒗∈V.

Definition. (Norm) . A norm on the vector space 𝒱=(V,S,+,⋅) is a function ||⋅||:V→ℝ+ that for 𝒖,𝒗∈V, a∈S satisfies:

  1. ||𝒗||=0 ⇒ 𝒗=𝟎;

  2. ||a𝒖||=|a|||𝒖||;

  3. ||𝒖+𝒗||||𝒖||+||𝒗||.

A commonly encountered norm of 𝒗∈ℝᵐ is the Euclidean norm

||𝒗||=√(v1^2+⋯+vm^2)=(v1^2+⋯+vm^2)^{1/2}=(∑_{j=1}^{m} vj^2)^{1/2},

useful in many physics applications. The form of the above norm, square root of sum of squares of components, can be generalized to obtain other useful norms.

Definition. (p-Norm in ℝᵐ) . The p-norm on the real vector space ℛₘ=(ℝᵐ,ℝ,+,⋅) for p≥1 is the function ||⋅||p:ℝᵐ→ℝ+ with values ||𝒙||p=(|x1|^p+|x2|^p+⋯+|xm|^p)^{1/p}, or

||𝒙||p=(∑_{i=1}^{m} |xi|^p)^{1/p} for 𝒙∈ℝᵐ. (6)

Note that the Euclidean norm corresponds to p=2, and is often called the 2-norm. Denote by xi the largest component in absolute value of 𝒙∈ℝᵐ. As p increases, |xi|^p becomes dominant with respect to all other terms in the sum, suggesting the definition of the ∞-norm by

||𝒙||∞=max_{1≤i≤m} |xi|.

This also works for vectors with equal components, since the number of components m is finite while p→∞, as exemplified for 𝒙=[ a a ⋯ a ]T, by ||𝒙||p=(m|a|^p)^{1/p}=m^{1/p}|a|, with m^{1/p}→1 as p→∞.
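The approach of ||𝒙||p toward ||𝒙||∞ as p grows can be observed numerically; a brief sketch, assuming the LinearAlgebra package provides the norm function used below:

using LinearAlgebra
x = [1.0; -2.0; 3.0]
[norm(x,1) norm(x,2) norm(x,10) norm(x,100) norm(x,Inf)]   # p-norms approach the largest |x_i| = 3.0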

Figure 1. Regions within ℝ² for which ||𝒙||p≤1, for p=1,2,3,∞.

Vector norms arise very often in applications, especially in data science since they can be used to classify data, and are implemented in software systems such as Julia, in which the norm function (provided by the LinearAlgebra package) with a single argument computes the most commonly encountered norm, the 2-norm. If a second argument p is specified, the p-norm is computed.

x=[1; 1; 1]; [norm(x) sqrt(3)]

[ 1.7320508075688772 1.7320508075688772 ] (7)

m=9; x=ones(m,1); [norm(x) sqrt(m)]

[ 3.0 3.0 ] (8)

m=4; x=ones(m,1); [norm(x,1) m]

[ 4.0 4.0 ] (9)

2.2.Inner product

Norms are functionals that define what is meant by the size of a vector, but they are not linear. Even in the simplest case of the real line, the linearity relation |x+y|=|x|+|y| is not verified for x>0, y<0. Nor do norms characterize the familiar geometric concept of orientation of a vector. A particularly important orientation from Euclidean geometry is orthogonality between two vectors. Another function is required, one that takes two vector arguments to enable characterizing their relative orientation. It would return a scalar, hence s:V×V→S, with S often chosen as the set of real numbers ℝ.

Definition. (Inner Product) . A real inner product in the vector space 𝒱=(V,ℝ,+,⋅) is a function s:V×V→ℝ with properties

Symmetry

For any 𝒂,𝒙∈V, s(𝒂,𝒙)=s(𝒙,𝒂).

Linearity in second argument

For any 𝒂,𝒙,𝒚∈V, α,β∈ℝ, s(𝒂,α𝒙+β𝒚)=αs(𝒂,𝒙)+βs(𝒂,𝒚).

Positive definiteness

For any 𝒙∈V\{𝟎}, s(𝒙,𝒙)>0.

A commonly encountered inner product is the dot product of two vectors 𝒂,𝒙∈ℝᵐ

𝒂⋅𝒙=a1x1+⋯+amxm.

Using the convention of representing 𝒂,𝒙 as column vectors, the dot product is also expressed as

𝒂T𝒙=[ a1 a2 ⋯ am ][ x1 x2 ⋯ xm ]T,

and is therefore a matrix multiplication between 𝒂T∈ℝ^{1×m} and 𝒙∈ℝ^{m×1} resulting in a scalar, also referred to as a scalar product. Inner products also provide a procedure to evaluate geometrical quantities and relationships.

Vector norm

The square of the 2-norm of 𝒙∈ℝᵐ is given as

s(𝒙,𝒙)=𝒙T𝒙=||𝒙||₂².

In general, the square root of s(𝒙,𝒙) satisfies the properties of a norm, and is called the norm induced by an inner product

||𝒙||=s(𝒙,𝒙)^{1/2}.

A real space together with the scalar product s(𝒙,𝒚)=𝒙T𝒚 and induced norm ||𝒙||=s(𝒙,𝒙)^{1/2} defines a Euclidean vector space Eᵐ.

Orientation

In ℝ² the point specified by polar coordinates (r,θ) has the Cartesian coordinates x1=r cosθ, x2=r sinθ, and position vector 𝒙=[ x1 x2 ]T. The inner product

𝒆1T𝒙=[ 1 0 ][ x1 x2 ]T=1⋅x1+0⋅x2=r cosθ,

is seen to contain information on the relative orientation of 𝒙 with respect to 𝒆1. In general, the angle θ between two vectors 𝒙,𝒚 within any vector space with a scalar product can be defined by

cosθ=s(𝒙,𝒚)/[s(𝒙,𝒙)s(𝒚,𝒚)]^{1/2}=s(𝒙,𝒚)/(||𝒙|| ||𝒚||),

which becomes

cosθ=𝒙T𝒚/(||𝒙|| ||𝒚||),

in a Euclidean space, 𝒙,𝒚∈ℝᵐ.

Orthogonality

In ℝ² two vectors are orthogonal if the angle between them is such that cosθ=0, and this can be extended to an arbitrary vector space 𝒱=(V,ℝ,+,⋅) with a scalar product by stating that 𝒙,𝒚∈V are orthogonal if s(𝒙,𝒚)=0. In ℝᵐ vectors 𝒙,𝒚∈ℝᵐ are orthogonal if 𝒙T𝒚=0.
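These quantities are readily evaluated numerically; a minimal sketch, assuming LinearAlgebra is loaded for the norm function:

using LinearAlgebra
x = [1.0; 1.0]; y = [1.0; -1.0]
s = x'*y                        # scalar product, zero for orthogonal vectors
cosθ = s/(norm(x)*norm(y))      # cosine of the angle between x and y
[s cosθ]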

3.Linear mapping matrices

3.1.Common geometric transformations

Several geometric transformations are linear mappings and are widely used in applications.

Stretching.
Different stretch ratios along separate axes in ℝᵐ are described by the linear mapping s:ℝᵐ→ℝᵐ,

s(𝒙)=[ λ1x1 λ2x2 ⋯ λmxm ]T.

The matrix associated with stretching is

𝑺=[ s(𝒆1) s(𝒆2) ⋯ s(𝒆m) ]=[ λ1 0 ⋯ 0; 0 λ2 ⋯ 0; ⋮; 0 0 ⋯ λm ]=diag(λ1,λ2,…,λm),

and has a remarkably simple form known as a diagonal matrix.
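Such a matrix can be formed directly in Julia; a brief sketch using the Diagonal constructor from the LinearAlgebra package:

using LinearAlgebra
λ = [2.0; 3.0; 0.5]
S = Diagonal(λ)            # diagonal stretching matrix
x = [1.0; 1.0; 1.0]
S*x                        # each component scaled by its own ratio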

Projection.
A very important transformation is projection of a vector 𝒗∈ℝᵐ along the direction of another vector 𝒖≠𝟎. It is convenient to define a vector of unit length along the direction of 𝒖 by

𝒒=𝒖/||𝒖||.

The resulting vector 𝒘=p𝒒(𝒗)∈ℝᵐ has the same number of components as 𝒗, and has length ||𝒗|| cosθ along the direction of 𝒒, where θ is the angle between 𝒗 and 𝒒, stated as

𝒘=(||𝒗||cosθ)𝒒.

The matrix associated with projection is

𝑷𝒒=[ p𝒒(𝒆1) p𝒒(𝒆2) ⋯ p𝒒(𝒆m) ].

The jth unit vector 𝒆j is at angle θj with respect to 𝒒,

cosθj=𝒒T𝒆j=[ q1 q2 ⋯ qm ][ 0 ⋯ 1 ⋯ 0 ]T=qj,

and the projection of 𝒆j along 𝒒 is

p𝒒(𝒆j)=(||𝒆j||cosθj)𝒒=qj𝒒.

Gathering projections of all the unit vectors within the identity matrix gives

𝑷𝒒=[ q1𝒒 q2𝒒 ⋯ qm𝒒 ].

Note that 𝑷𝒒 contains m column vectors that are all scalings of the 𝒒 vector, by coefficients q1,q2,…,qm that are the components of 𝒒 itself. Since scaling is a linear combination, the above m linear combinations can be expressed as a matrix-matrix product

𝑷𝒒=𝒒[ q1 q2 ⋯ qm ],

leading to the remarkably simple expression

𝑷𝒒=𝒒𝒒T∈ℝ^{m×m}.

Figure 2. Projection (p𝒒) and reflection (r𝒒) operations in two dimensions
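The expression 𝑷𝒒=𝒒𝒒T is easily checked numerically; a minimal sketch, assuming LinearAlgebra is loaded, also verifies that projecting twice gives the same result as projecting once:

using LinearAlgebra
u = [3.0; 4.0]; q = u/norm(u)   # unit vector along u
P = q*q'                        # projection matrix P_q
v = [1.0; 2.0]
[P*v P*(P*v)]                   # the two columns coincide: P_q P_q = P_q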

Reflection.
Another widely used geometric transformation that is also a linear mapping is the reflection of a vector 𝒗 across another vector 𝒖. As before, introduce a unit vector in the direction of 𝒖, 𝒒=𝒖/||𝒖||, and let 𝒛=𝒓𝒒(𝒗)=𝑹𝒒𝒗 be the reflection of 𝒗 across 𝒒. The reflection matrix can be constructed from the previous projection matrix. Start from the vector addition

𝒘=𝑷𝒒𝒗=(𝒒𝒒T)𝒗=𝒗+𝒚,

that can be interpreted as stating that the projection 𝒘 is obtained from 𝒗 by addition of the vector 𝒚. The reflection of 𝒗 across 𝒒 is obtained by starting at 𝒗, and adding 2𝒚,

𝒛=𝒗+2(𝒘-𝒗)=2𝒘-𝒗=2(𝒒𝒒T)𝒗-𝒗=2(𝒒𝒒T)𝒗-𝑰𝒗=[2(𝒒𝒒T)-𝑰]𝒗=𝑹𝒒𝒗.

The resulting reflection matrix is

𝑹𝒒=2(𝒒𝒒T)-𝑰.
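A quick numerical check, assuming LinearAlgebra is loaded for norm and the identity I, confirms that reflecting twice across the same direction recovers the original vector, 𝑹𝒒𝑹𝒒=𝑰:

using LinearAlgebra
u = [1.0; 2.0]; q = u/norm(u)
R = 2*(q*q') - I          # reflection matrix R_q
v = [3.0; -1.0]
R*(R*v) ≈ v               # returns true: two reflections give back v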

Rotation in ℝ².
The previous geometric transformations are valid in ℝᵐ for arbitrary m. Rotation mappings are not as readily generalizable. In two dimensions, consider rθ(𝒗)=𝑹θ𝒗, with

𝑹θ=[ rθ(𝒆1) rθ(𝒆2) ]=[ cosθ -sinθ; sinθ cosθ ].

Figure 3. Rotation rθ(𝒗)=𝑹θ𝒗 in two dimensions.

Rotation in ℝ³.
The axis of the above two-dimensional rotation is a third direction perpendicular to the x1x2-plane. A vector 𝒗∈ℝ³ would not change its third coordinate under such a transformation rθ,3(𝒗), hence the associated rotation matrix is readily obtained as

𝑹θ=[ rθ,3(𝒆1) rθ,3(𝒆2) rθ,3(𝒆3) ]=[ cosθ -sinθ 0; sinθ cosθ 0; 0 0 1 ].
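Rotation matrices are likewise simple to construct and apply; a brief sketch rotating 𝒆1 by θ=π/2 and checking that the three-dimensional version leaves the third coordinate unchanged:

θ = π/2
Rθ = [cos(θ) -sin(θ); sin(θ) cos(θ)]
Rθ*[1; 0]                                          # approximately [0; 1]
Rθ3 = [cos(θ) -sin(θ) 0; sin(θ) cos(θ) 0; 0 0 1]
Rθ3*[1; 0; 5]                                      # third coordinate remains 5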

3.2.Matrix-matrix product

From two functions f:A→B and g:B→C, a composite function h=g∘f, h:A→C, is defined by

h(x)=g(f(x)).

Consider linear mappings between Euclidean spaces 𝒇:ℝⁿ→ℝᵐ, 𝒈:ℝᵐ→ℝᵖ. Recall that linear mappings are expressed as matrix-vector multiplications

𝒇(𝒙)=𝑨𝒙, 𝒈(𝒚)=𝑩𝒚, 𝑨∈ℝ^{m×n}, 𝑩∈ℝ^{p×m}.

The composite function 𝒉=𝒈∘𝒇 is 𝒉:ℝⁿ→ℝᵖ, defined by

𝒉(𝒙)=𝒈(𝒇(𝒙))=𝒈(𝑨𝒙)=𝑩𝑨𝒙.

Note that the intermediate vector 𝒖=𝑨𝒙 is subsequently multiplied by the matrix 𝑩. The composite function 𝒉 is itself a linear mapping

𝒉(a𝒙+b𝒚)=𝑩𝑨(a𝒙+b𝒚)=𝑩(a𝑨𝒙+b𝑨𝒚)=a𝑩𝑨𝒙+b𝑩𝑨𝒚=a𝒉(𝒙)+b𝒉(𝒚),

so it can also be expressed as a matrix-vector multiplication

𝒉(𝒙)=𝑪𝒙=𝑩𝑨𝒙. (10)

Using the above, 𝑪 is defined as the product of matrix 𝑩 with matrix 𝑨

𝑪=𝑩𝑨.

The columns of 𝑪 can be determined from those of 𝑨 by considering the action of 𝒉 on the column vectors of the identity matrix 𝑰=[ 𝒆1 𝒆2 ⋯ 𝒆n ]∈ℝ^{n×n}. First, note that

𝑨𝒆1=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 1 0 ⋯ 0 ]T=𝒂1, …, 𝑨𝒆j=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 0 ⋯ 1 ⋯ 0 ]T=𝒂j, …, 𝑨𝒆n=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 0 0 ⋯ 1 ]T=𝒂n. (11)

The above can be repeated for the matrix 𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ] giving

𝒉(𝒆1)=𝑪𝒆1=𝒄1, …, 𝒉(𝒆j)=𝑪𝒆j=𝒄j, …, 𝒉(𝒆n)=𝑪𝒆n=𝒄n. (12)

Combining the above equations leads to 𝒄j=𝑩𝒂j, or

𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ]=𝑩[ 𝒂1 𝒂2 ⋯ 𝒂n ].

From the above the matrix-matrix product 𝑪=𝑩𝑨 is seen to simply be a grouping of all the products of 𝑩 with the column vectors of 𝑨,

𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ]=[ 𝑩𝒂1 𝑩𝒂2 ⋯ 𝑩𝒂n ].

The above results can readily be verified computationally.

a1=[1; 2]; a2=[3; 4]; A=[a1 a2]

[ 1 3 2 4 ] (13)

b1=[-1; 1; 3]; b2=[2; -2; 3]; B=[b1 b2]

[ -1 2 1 -2 3 3 ] (14)

C=B*A

[ 3 5 -3 -5 9 21 ] (15)

c1=B*a1; c2=B*a2; [c1 c2]

[ 3 5 -3 -5 9 21 ] (16)

3.3.Properties of the matrix-matrix product

Matrix-matrix products have been seen to arise from composition of linear mappings, and their properties arise from such compositions. Whereas matrix addition is a direct consequence of component-by-component operations, matrix products exhibit some particularities.

Non-commutativity

Consider 𝒈(𝒗)=𝑩𝒗 to be rotation of vectors in ℝ² by angle θ=π/4, and 𝒇(𝒗)=𝑨𝒗 to be reflection across the 𝒆1 direction. The associated matrices are

𝑨=[ 1 0; 0 -1 ], 𝑩=(1/√2)[ 1 -1; 1 1 ].

Rotation followed by reflection of the vector 𝒆1 is

(𝒇∘𝒈)(𝒆1)=𝒇(𝒈(𝒆1))=𝑨𝑩𝒆1=(1/√2)[ 1 -1 ]T,

whereas reflection followed by rotation is

(𝒈∘𝒇)(𝒆1)=𝒈(𝒇(𝒆1))=𝑩𝑨𝒆1=(1/√2)[ 1 1 ]T,

a different result, highlighting that the order of applying linear transformations is important. Indeed, from

𝑨𝑩=[ 1 0; 0 -1 ](1/√2)[ 1 -1; 1 1 ]=(1/√2)[ 1 -1; -1 -1 ], 𝑩𝑨=(1/√2)[ 1 -1; 1 1 ][ 1 0; 0 -1 ]=(1/√2)[ 1 1; 1 -1 ],

it is seen that 𝑨𝑩≠𝑩𝑨, and in general matrix multiplication is not commutative.
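The computation is readily reproduced; a brief sketch with the matrices above:

A = [1 0; 0 -1]; B = [1 -1; 1 1]/sqrt(2)
A*B == B*A            # false: the products differ
[A*B B*A]             # the two products side by side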

Associativity

Consider now three linear mappings 𝒇(𝒙)=𝑨𝒙, 𝒈(𝒙)=𝑩𝒙, 𝒉(𝒙)=𝑪𝒙, and compare the results of (𝒇∘𝒈)∘𝒉 to 𝒇∘(𝒈∘𝒉)

((𝒇∘𝒈)∘𝒉)(𝒙)=(𝒇∘𝒈)(𝒉(𝒙))=(𝑨𝑩)(𝑪𝒙),
(𝒇∘(𝒈∘𝒉))(𝒙)=𝒇((𝒈∘𝒉)(𝒙))=𝑨(𝑩𝑪𝒙).

Since (𝒇∘𝒈)(𝒉(𝒙))=𝒇(𝒈(𝒉(𝒙)))=(𝒇∘(𝒈∘𝒉))(𝒙), it follows that the above two expressions should also be equal for arbitrary 𝒙,

(𝑨𝑩)𝑪=𝑨(𝑩𝑪),

and matrix multiplication is said to be associative.
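Associativity can be checked numerically on randomly generated matrices of compatible sizes; a minimal sketch (the ≈ comparison allows for floating-point roundoff):

A = rand(3,4); B = rand(4,2); C = rand(2,5)
(A*B)*C ≈ A*(B*C)     # returns true up to roundoff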

Product transposition

The transpose of a linear combination is a straightforward reorganization of vector components as rows instead of columns,

(𝑨𝒙)T=(x1𝒂1+⋯+xn𝒂n)T=x1𝒂1T+⋯+xn𝒂nT.

The above can be expressed as a matrix multiplication

(𝑨𝒙)T=[ x1 x2 ⋯ xn ][ 𝒂1T; 𝒂2T; ⋯; 𝒂nT ]=𝒙T𝑨T.

Extending the above to multiple linear combinations gives

(𝑨𝑿)T=([ 𝑨𝒙1 𝑨𝒙2 ⋯ 𝑨𝒙p ])T=[ (𝑨𝒙1)T; (𝑨𝒙2)T; ⋯; (𝑨𝒙p)T ]=[ 𝒙1T𝑨T; 𝒙2T𝑨T; ⋯; 𝒙pT𝑨T ]=[ 𝒙1T; 𝒙2T; ⋯; 𝒙pT ]𝑨T=𝑿T𝑨T,

that is, the transpose of a product is the product of the transposes, taken in reverse order.
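A brief numerical check of the transposition rule on randomly generated matrices:

A = rand(3,4); X = rand(4,2)
(A*X)' ≈ X'*A'        # returns true: transpose of the product equals reversed product of transposes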

3.4.Block matrix operations

In expressions such as

𝑨T=([ 𝒂1 𝒂2 ⋯ 𝒂n ])T=[ 𝒂1T; 𝒂2T; ⋯; 𝒂nT ],

the column vectors 𝒂1,𝒂2,…,𝒂n can be interpreted as blocks of size m×1 within the matrix 𝑨 of size m×n. A pervasive task within linear algebra applications to large data sets is to break the problem down into smaller parts. Consider that a matrix 𝑴 of size m×n can be broken up into blocks

𝑴=[ 𝑼 𝑽; 𝑿 𝒀 ].

The dimensions of the blocks have to be compatible, and if 𝑼 has size p×q and 𝒀 has size r×s, it must hold that

m=p+r,n=q+s,

and matrix 𝑿 has size r×q, 𝑽 has size p×s.

Block transposition

The transpose of 𝑴 is

𝑴T=[ 𝑼T 𝑿T; 𝑽T 𝒀T ].

Block addition

Assuming compatibility of block dimensions, addition is carried out block-by-block.

𝑴+𝑵=[ 𝑼 𝑽; 𝑿 𝒀 ]+[ 𝑷 𝑸; 𝑹 𝑺 ]=[ 𝑼+𝑷 𝑽+𝑸; 𝑿+𝑹 𝒀+𝑺 ].

Block multiplication

The “rows-over-columns” rule carries over to blocks

𝑴𝑵=[ 𝑼 𝑽; 𝑿 𝒀 ][ 𝑷 𝑸; 𝑹 𝑺 ]=[ 𝑼𝑷+𝑽𝑹 𝑼𝑸+𝑽𝑺; 𝑿𝑷+𝒀𝑹 𝑿𝑸+𝒀𝑺 ],

where non-commutativity of matrix multiplication has to be respected.
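Block operations can be verified by comparing the blockwise formula with the full product; a minimal sketch with randomly generated blocks of compatible sizes:

U = rand(2,3); V = rand(2,2); X = rand(1,3); Y = rand(1,2)   # blocks of M
P = rand(3,2); Q = rand(3,4); R = rand(2,2); S = rand(2,4)   # blocks of N
M = [U V; X Y]; N = [P Q; R S]
M*N ≈ [U*P+V*R U*Q+V*S; X*P+Y*R X*Q+Y*S]                     # blockwise product matches full product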