Lecture 5: Linear Functionals and Mappings

1.Functions

1.1.Relations

A general procedure to relate input values from set X to output values from set Y is to first construct the set of all possible instances of x∈X and y∈Y, which is the Cartesian product of X with Y, denoted as X×Y={(x,y) | x∈X, y∈Y}. Usually only some associations of inputs to outputs are of interest leading to the following definition.

Definition. (Relation) . A relation R between two sets X,Y is a subset of the Cartesian product X×Y, R⊆X×Y.

Associating an output to an input is also useful, leading to the definition of an inverse relation as R⁻¹⊆Y×X, R⁻¹={(y,x) | (x,y)∈R}. Note that an inverse exists for any relation, and the inverse of an inverse is the original relation, (R⁻¹)⁻¹=R.
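
A small Julia illustration (a minimal sketch; the sets and the particular relation below are made up for this example): a relation can be stored as a set of ordered pairs, and its inverse obtained by swapping each pair.

X = [1, 2, 3]; Y = ["a", "b"]              # two small sets
XY = [(x, y) for x in X, y in Y]           # Cartesian product X×Y
R = Set([(1, "a"), (2, "a"), (3, "b")])    # a relation R ⊆ X×Y
Rinv = Set([(y, x) for (x, y) in R])       # inverse relation R⁻¹ ⊆ Y×X
Set([(x, y) for (y, x) in Rinv]) == R      # (R⁻¹)⁻¹ = R, returns true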

Homogeneous relations.
Many types of relations are defined in mathematics and encountered in linear algebra. A commonly encountered type of relationship is from a set onto itself, known as a homogeneous relation. For homogeneous relations H⊆A×A, it is common to replace the set membership notation (a,b)∈H to state that a∈A is in relationship H with b∈A, with a binary operator notation aHb. Familiar examples include the equality and less than relationships between reals, E,L⊆ℝ×ℝ, in which (a,b)∈E is replaced by a=b, and (a,b)∈L is replaced by a<b. The equality relationship is its own inverse, and the inverse of the less than relationship is the greater than relation G⊆ℝ×ℝ, G=L⁻¹, a<b ⇔ b>a. Homogeneous relations H⊆A×A are classified according to the following criteria.

Reflexivity

Relation H is reflexive if (a,a)∈H for any a∈A. The equality relation E⊆ℝ×ℝ is reflexive, ∀a∈ℝ, a=a, the less than relation L⊆ℝ×ℝ is not, 1∈ℝ, 1≮1.

Symmetry

Relation H is symmetric if (a,b)∈H implies that (b,a)∈H, (a,b)∈H ⇒ (b,a)∈H. The equality relation E⊆ℝ×ℝ is symmetric, a=b ⇒ b=a, the less than relation L⊆ℝ×ℝ is not, a<b ⇏ b<a.

Anti-symmetry

Relation H is anti-symmetric if (a,b)∈H for a≠b, then (b,a)∉H. The less than relation L⊆ℝ×ℝ is antisymmetric, a<b ⇒ b≮a.

Transitivity

Relation H is transitive if (a,b)∈H and (b,c)∈H implies (a,c)∈H, for any a,b,c∈A. The equality relation E⊆ℝ×ℝ is transitive, a=b ∧ b=c ⇒ a=c, as is the less than relation L⊆ℝ×ℝ, a<b ∧ b<c ⇒ a<c.

Certain combinations of properties often arise. A homogeneous relation that is reflexive, symmetric, and transitive is said to be an equivalence relation. Equivalence relations include equality among the reals, or congruence among triangles. A homogeneous relation that is reflexive, anti-symmetric and transitive is a partial order relation, such as the less than or equal relation between reals. Finally, a homogeneous relation that is anti-symmetric and transitive is an order relation, such as the less than relation between reals.
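
For a finite set the defining properties can be checked exhaustively. The Julia sketch below (the set A and the relation, congruence modulo 3, are chosen only for illustration) confirms reflexivity, symmetry, and transitivity, hence an equivalence relation.

A = 0:5
H = Set([(a, b) for a in A for b in A if mod(a - b, 3) == 0])   # congruence mod 3 on A
reflexive  = all((a, a) in H for a in A)
symmetric  = all((b, a) in H for (a, b) in H)
transitive = all((a, c) in H for (a, b) in H for (b2, c) in H if b2 == b)
(reflexive, symmetric, transitive)    # (true, true, true)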

1.2.Functions

Functions between sets X and Y are a specific type of relation that arises often in science. For a given input x∈X, theories that predict a single possible output y∈Y are of particular scientific interest.

Definition. (Function) . A function from set X to set Y is a relation F⊆X×Y, that associates to x∈X a single y∈Y.

The above intuitive definition can be transcribed in precise mathematical terms as F⊆X×Y is a function if (x,y)∈F and (x,z)∈F implies y=z. Since it's a particular kind of relation, a function is a triplet of sets (X,Y,F), but with a special, common notation to denote the triplet by f:X→Y, with F={(x,f(x)) | x∈X, f(x)∈Y} and the property that (x,y)∈F ⇒ y=f(x). The set X is the domain and the set Y is the codomain of the function f. The value from the domain x∈X is the argument of the function associated with the function value y=f(x). The function value y is said to be returned by evaluation y=f(x).

As seen previously, a Euclidean space E_m=(ℝ^m,ℝ,+,⋅) can be used to suggest properties of more complex spaces such as the vector space of continuous functions 𝒞⁰(ℝ). A construct that will be often used is to interpret a vector within E_m as a function, since 𝒗∈ℝ^m with components 𝒗=[ v1 v2 ⋯ vm ]T also defines a function v:{1,2,…,m}→ℝ, with values v(i)=vi. As the number of components grows the function v can provide better approximations of some continuous function f∈𝒞⁰(ℝ) through the function values vi=v(i)=f(xi) at distinct sample points x1,x2,…,xm.
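
A minimal Julia sketch of this idea (the sample points and the function sin are chosen arbitrarily here): the vector of m samples plays the role of the function v, and increasing m gives a finer description of f.

m = 8
x = range(0, 2π, length=m)      # distinct sample points x1, …, xm
v = sin.(x)                     # v ∈ ℝ^m with components vi = f(xi)
v[3] == sin(x[3])               # evaluating v(3) returns the sample f(x3), true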

The above function examples are all defined on a domain of scalars or naturals and return scalar values. Within linear algebra the particular interest is in functions defined on sets of vectors from some vector space 𝒱=(V,S,+,⋅) that return either scalars, f:V→S, or vectors from some other vector space 𝒲=(W,S,+,⋅), 𝒈:V→W. The codomain of a vector-valued function might be the same set of vectors as its domain, 𝒉:V→V. The fundamental operation within linear algebra is the linear combination a𝒖+b𝒗 with a,b∈S, 𝒖,𝒗∈V. A key aspect is to characterize how a function behaves when given a linear combination as its argument, for instance f(a𝒖+b𝒗) or 𝒈(a𝒖+b𝒗).

1.3.Linear functionals

Consider first the case of a function defined on a set of vectors that returns a scalar value. Such functions can be interpreted as labels attached to a vector, and are very often encountered in applications from natural phenomena or data analysis.

Definition. (Functional) . A functional on vector space 𝒱=(V,S,+,⋅) is a function from the set of vectors V to the set of scalars S of the vector space 𝒱.

Definition. (Linear Functional) . The functional f:V→S on vector space 𝒱=(V,S,+,⋅) is a linear functional if for any two vectors 𝒖,𝒗∈V and any two scalars a,b∈S

f(a𝒖+b𝒗)=af(𝒖)+bf(𝒗). (1)

Many different functionals may be defined on a vector space 𝒱=(V,S,+,⋅), and an insightful alternative description is provided by considering the set of all linear functionals, that will be denoted as V*={f | f:V→S}. These can be organized into another vector space 𝒱*=(V*,S,+,⋅) with vector addition of linear functionals f,g∈V* and scaling by a∈S defined by

(f+g)(𝒖)=f(𝒖)+g(𝒖), (af)(𝒖)=af(𝒖), ∀𝒖∈V. (2)

Definition. (Dual Vector Space) . For some vector space 𝒱, the vector space of linear functionals 𝒱* is called the dual vector space.

As is often the case, the above abstract definition can better be understood by reference to the familiar case of Euclidean space. Consider ℝ₂=(ℝ²,ℝ,+,⋅), the set of vectors in the plane with 𝒙∈ℝ² the position vector from the origin (0,0) to point X in the plane with coordinates (x1,x2). One functional from the dual space ℝ₂* is f2(𝒙)=x2, i.e., taking the second coordinate of the position vector. The linearity property is readily verified. For 𝒙,𝒚∈ℝ², a,b∈ℝ,

f2(a𝒙+b𝒚)=ax2+by2=af2(𝒙)+bf2(𝒚).
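
A quick numerical check of this linearity property in Julia, with arbitrarily chosen vectors and scalars:

f2(x) = x[2]                          # functional returning the second coordinate
x = [1.0, 2.0]; y = [-3.0, 0.5]       # two vectors in ℝ²
a, b = 2.0, -1.5                      # two scalars
f2(a*x + b*y) ≈ a*f2(x) + b*f2(y)     # returns true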

Given some constant value h, the curves within the plane defined by f2(𝒙)=h are called the contour lines or level sets of f2. Several contour lines and position vectors are shown in Figure 1. The utility of functionals and dual spaces can be shown by considering a simple example from physics. Assume that x2 is the height above ground level and a vector 𝒙 is the displacement of a body of mass m in a gravitational field. The mechanical work done to lift the body from ground level to height h is W=mgh with g the gravitational acceleration. The mechanical work is the same for all displacements 𝒙 that satisfy the equation f2(𝒙)=h. The work expressed in units mgΔh can be interpreted as the number of contour lines f2(𝒙)=nΔh intersected by the displacement vector 𝒙. This concept of duality between vectors and scalar-valued functionals arises throughout mathematics, the physical and social sciences and in data science. The term “duality” itself comes from geometry. A point X in ℝ² with coordinates (x1,x2) can be defined either as the end-point of the position vector 𝒙, or as the intersection of the contour lines of two functionals f1(𝒙)=x1 and f2(𝒙)=x2. Either geometric description works equally well in specifying the position of X, so it might seem redundant to have two such procedures. It turns out though that many quantities of interest in applications can be defined through use of both descriptions, as shown in the computation of mechanical work in a gravitational field.

Figure 1. Vectors in E2 and contour lines of the functional f(𝒙)=x2

1.4.Linear mappings

Consider now functions 𝒇:V→W from vector space 𝒱=(V,S,+,⋅) to another vector space 𝒲=(W,T,+,⋅). As before, the action of such functions on linear combinations is of special interest.

Definition. (Linear Mapping) . A function 𝒇:V→W, from vector space 𝒱=(V,S,+,⋅) to vector space 𝒲=(W,S,+,⋅) is called a linear mapping if for any two vectors 𝒖,𝒗∈V and any two scalars a,b∈S

𝒇(a𝒖+b𝒗)=a𝒇(𝒖)+b𝒇(𝒗). (3)

The image of a linear combination a𝒖+b𝒗 through a linear mapping is another linear combination a𝒇(𝒖)+b𝒇(𝒗), and linear mappings are therefore said to preserve the structure of a vector space; they are called homomorphisms in mathematics. The codomain of a linear mapping might be the same as the domain, in which case the mapping is said to be an endomorphism.

Matrix-vector multiplication has been introduced as a concise way to specify a linear combination

𝒇(𝒙)=𝑨𝒙=x1𝒂1+⋯+xn𝒂n,

with 𝒂1,…,𝒂n the columns of the matrix, 𝑨=[ 𝒂1 𝒂2 ⋯ 𝒂n ]. This is a linear mapping between the real spaces ℝ^m, ℝ^n, 𝒇:ℝ^n→ℝ^m, and indeed any linear mapping between real spaces can be given as a matrix-vector product.
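
The column interpretation of the matrix-vector product can be verified directly in Julia (a small sketch with an arbitrarily chosen matrix and vector):

A = [1 3; 2 4; 0 5]                   # A ∈ ℝ^{3×2} with columns a1, a2
x = [2, -1]                           # x ∈ ℝ²
A*x == x[1]*A[:,1] + x[2]*A[:,2]      # Ax = x1 a1 + x2 a2, returns true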

2.Measurements

Vectors within the real space ℝ^m can be completely specified by m real numbers, even though m is large in many realistic applications. A vector within 𝒞⁰(ℝ), i.e., a continuous function defined on the reals, cannot be so specified since it would require an infinite, non-countable listing of function values. In either case, the task of describing the elements of a vector space 𝒱=(V,S,+,⋅) by simpler means arises. Within data science this leads to classification problems in accordance with some relevant criteria.

2.1.Equivalence classes

Many classification criteria are scalars, defined as a scalar-valued function f:V→S on a vector space, 𝒱=(V,S,+,⋅). The most common criteria are inspired by experience with Euclidean space. In a Euclidean-Cartesian model (ℝ²,ℝ,+,⋅) of the geometry of a plane Π, a point O∈Π is arbitrarily chosen to correspond to the zero vector 𝟎=[ 0 0 ]T, along with two preferred vectors 𝒆1,𝒆2 grouped together into the identity matrix 𝑰. The position of a point X∈Π with respect to O is given by the linear combination

𝒙=𝑰𝒙+𝟎=[ 𝒆1 𝒆2 ][ x1 x2 ]=x1𝒆1+x2𝒆2.

Several possible classifications of points in the plane are depicted in Figure 2: lines, squares, circles. Intuitively, each choice separates the plane into subsets, and a given point in the plane belongs to just one in the chosen family of subsets. A more precise characterization is given by the concept of a partition of a set.

Definition. (Partition) . A partition of a set is a grouping of its elements into non-empty subsets such that every element is included in exactly one subset.

In precise mathematical terms, a partition of set S is P={Si | Si⊆S, Si≠∅, i∈I} such that ∀x∈S, ∃!j∈I for which x∈Sj. Since there is only one set (∃! signifies “exists and is unique”) to which some given x∈S belongs, the subsets Si of the partition P are disjoint, i≠j ⇒ Si∩Sj=∅. The subsets Si are labeled by i within some index set I. The index set might be a subset of the naturals, I⊆ℕ, in which case the partition is countable, possibly finite. The partitions of the plane suggested by Figure 2 are however indexed by a real-valued label, i∈I with I⊆ℝ.

A technique which is often used to generate a partition of a vector space 𝒱=(V,S,+,⋅) is to define an equivalence relation between vectors, H⊆V×V. For some element 𝒖∈V, the equivalence class of 𝒖 is defined as all vectors 𝒗 that are equivalent to 𝒖, {𝒗 | (𝒖,𝒗)∈H}. The set of equivalence classes of H is called the quotient set and denoted as V/H, and the quotient set is a partition of V. Figure 2 depicts four different partitions of the plane. These can be interpreted geometrically, such as parallel lines or distance from the origin. With wider implications for linear algebra, the partitions can also be given in terms of classification criteria specified by functions.
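
As a sketch of how such a partition arises in computation (the points and the classification criterion below are made up for illustration), vectors can be grouped into equivalence classes by a scalar label, here the distance from the origin rounded to one digit:

using LinearAlgebra
points = [randn(2) for _ in 1:10]            # some vectors in ℝ²
label(x) = round(norm(x), digits=1)          # classification criterion
classes = Dict{Float64, Vector{Vector{Float64}}}()
for p in points
    push!(get!(classes, label(p), Vector{Float64}[]), p)   # group by equivalence class
end
keys(classes)                                # one key per equivalence class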

Figure 2. Equivalence classes within the plane

2.2.Norms

The partition of ℝ² by circles from Figure 2 is familiar; the equivalence classes are sets of points whose position vector has the same size, {𝒙=[ x1 x2 ]T | (x1²+x2²)^(1/2)=r}, or is at the same distance from the origin. Note that familiarity with Euclidean geometry should not obscure the fact that some other concept of distance might be induced by the data. A simple example is the statement of walking distance in terms of city blocks, in which the distance from a starting point to an address x1=3 blocks east and x2=4 blocks north is x1+x2=7 city blocks, not the Euclidean distance (x1²+x2²)^(1/2)=5 since one cannot walk through the buildings occupying a city block.

The above observations lead to the mathematical concept of a norm as a tool to evaluate vector magnitude. Recall that a vector space is specified by two sets and two operations, 𝒱=(V,S,+,⋅), and the behavior of a norm with respect to each of these components must be defined. The desired behavior includes the following properties and formal definition.

Unique value

The magnitude of a vector 𝒗∈V should be a unique scalar, requiring the definition of a function. The scalar could have irrational values and should allow ordering of vectors by size, so the function should be from V to ℝ, f:V→ℝ. On the real line the point at coordinate x is at distance |x| from the origin, and to mimic this usage the norm of 𝒗∈V is denoted as ||𝒗||, leading to the definition of a function ||⋅||:V→ℝ₊, ℝ₊={a | a∈ℝ, a≥0}.

Null vector case

Provision must be made for the only distinguished element of V, the null vector 𝟎. It is natural to associate the null vector with the null scalar element, ||𝟎||=0. A crucial additional property is also imposed, namely that the null vector is the only vector whose norm is zero, ||𝒗||=0 ⇒ 𝒗=𝟎. From knowledge of a single scalar value, an entire vector can be determined. This property arises at key junctures in linear algebra, notably in providing a link to another branch of mathematics known as analysis, and is needed to establish the fundamental theorem of linear algebra or the singular value decomposition encountered later.

Scaling

Transferring the scaling operation 𝒗=a𝒖 to norms leads to imposing ||𝒗||=||a𝒖||=|a| ||𝒖||. This property ensures commensurability of vectors, meaning that the magnitude of vector 𝒗 can be expressed as a multiple of some standard vector magnitude ||𝒖||.

Vector addition

Position vectors from the origin to coordinates x,y>0 on the real line can be added and |x+y|=|x|+|y|. If however the position vectors point in different directions, x>0, y<0, then |x+y|<|x|+|y|. For a general vector space the analogous property is known as the triangle inequality, ||𝒖+𝒗||≤||𝒖||+||𝒗|| for 𝒖,𝒗∈V.

Definition. (Norm) . A norm on the vector space 𝒱=(V,S,+,⋅) is a function ||⋅||:V→ℝ₊ that for 𝒖,𝒗∈V, a∈S satisfies:

  1. ||𝒗||=0 ⇒ 𝒗=𝟎;

  2. ||a𝒖||=|a|||𝒖||;

  3. ||𝒖+𝒗||≤||𝒖||+||𝒗||.

Note that the norm is a functional, but the triangle inequality implies that it is not generally a linear functional. Returning to Figure 2, consider the functions fi:ℝ²→ℝ₊ defined for 𝒙=[ x1 x2 ]T through values

f1(𝒙)=|x1|, f2(𝒙)=|x2|, f3(𝒙)=|x1|+|x2|, f4(𝒙)=(|x1|²+|x2|²)^(1/2).

Sets of constant value of the above functions are also equivalence classes induced by the equivalence relations Ei for i=1,2,3,4.

  1. f1(𝒙)=c ⇔ |x1|=c, E1={(𝒙,𝒚) | f1(𝒙)=f1(𝒚) ⇔ |x1|=|y1|} ⊂ ℝ²×ℝ²;

  2. f2(𝒙)=c ⇔ |x2|=c, E2={(𝒙,𝒚) | f2(𝒙)=f2(𝒚) ⇔ |x2|=|y2|} ⊂ ℝ²×ℝ²;

  3. f3(𝒙)=c ⇔ |x1|+|x2|=c, E3={(𝒙,𝒚) | f3(𝒙)=f3(𝒚) ⇔ |x1|+|x2|=|y1|+|y2|} ⊂ ℝ²×ℝ²;

  4. f4(𝒙)=c ⇔ (|x1|²+|x2|²)^(1/2)=c, E4={(𝒙,𝒚) | f4(𝒙)=f4(𝒚) ⇔ (|x1|²+|x2|²)^(1/2)=(|y1|²+|y2|²)^(1/2)} ⊂ ℝ²×ℝ².

These equivalence classes correspond to the vertical lines, horizontal lines, squares, and circles of Figure 2. Not all of the functions fi are norms since f1(𝒙) is zero for the non-null vector 𝒙=[ 0 1 ]T, and f2(𝒙) is zero for the non-null vector 𝒙=[ 1 0 ]T. The functions f3 and f4 are indeed norms, and are specific cases of the following general norm.
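
These observations are easy to confirm numerically (a brief sketch; norm from Julia's LinearAlgebra standard library evaluates f3 as norm(x,1) and f4 as norm(x,2)):

using LinearAlgebra
f1(x) = abs(x[1]); f2(x) = abs(x[2])
f1([0, 1]), f2([1, 0])                 # both 0 for non-null vectors: not norms
x = [3, -4]
norm(x, 1), abs(x[1]) + abs(x[2])      # f3 is the 1-norm: (7.0, 7)
norm(x, 2), sqrt(x[1]^2 + x[2]^2)      # f4 is the 2-norm: (5.0, 5.0)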

Definition. (p-Norm in ℝ^m) . The p-norm on the real vector space ℝ_m=(ℝ^m,ℝ,+,⋅) for p≥1 is the function ||⋅||_p:ℝ^m→ℝ₊ with values ||𝒙||_p=(|x1|^p+|x2|^p+⋯+|xm|^p)^(1/p), or

||𝒙||_p = ( ∑_{i=1}^{m} |xi|^p )^(1/p) for 𝒙∈ℝ^m. (4)

Denote by xi the largest component in absolute value of 𝒙∈ℝ^m. As p increases, |xi|^p becomes dominant with respect to all other terms in the sum, suggesting the definition of an inf-norm by

||𝒙||_∞ = max_{1≤i≤m} |xi|.

This also works for vectors with equal components, since the number of components is finite while p→∞, as exemplified for 𝒙=[ a a ⋯ a ]T by ||𝒙||_p=(m|a|^p)^(1/p)=m^(1/p)|a|, with m^(1/p)→1 as p→∞.
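
This limit is easy to observe numerically (a small sketch for a vector of equal components a=2):

using LinearAlgebra
m = 4; x = fill(2.0, m)                                      # x = [a a ⋯ a]T with a = 2
[(p, norm(x, p), m^(1/p)*2.0) for p in (1, 2, 8, 32, 128)]   # p-norms approach |a|
norm(x, Inf)                                                 # the limit value, 2.0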

Note that the Euclidean norm corresponds to p=2, and is often called the 2-norm. The analogy between vectors and functions can be exploited to also define a p-norm for 𝒞⁰[a,b]=(C([a,b]),ℝ,+,⋅), the vector space of continuous functions defined on [a,b].

Definition. (p-Norm in 𝒞⁰[a,b]) . The p-norm on the vector space of continuous functions 𝒞⁰[a,b] for p≥1 is the function ||⋅||_p:C([a,b])→ℝ₊ with values

||f||_p = ( ∫_a^b |f(x)|^p dx )^(1/p), for f∈C([a,b]). (5)

The integration operation ∫_a^b can be intuitively interpreted as the value of the sum ∑_{i=1}^{m} from equation (4) for very large m and very closely spaced evaluation points of the function f(xi), for instance |xi+1−xi|=(b−a)/m. An inf-norm can also be defined for continuous functions by

||f||_∞ = sup_{x∈[a,b]} |f(x)|,

where sup, the supremum operation, can be intuitively understood as the generalization of the max operation over the countable set {1,2,…,m} to the uncountable set [a,b].
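
The link between equations (4) and (5) can be sketched numerically: a Riemann-sum approximation of the integral p-norm (here for the arbitrarily chosen f(x)=x on [0,1] with p=2) approaches the exact value (1/3)^(1/2) as the number of sample points grows.

f(x) = x                                   # a continuous function on [0, 1]
a, b, p = 0.0, 1.0, 2
fnorm(m) = (sum(abs(f(xi))^p for xi in range(a, b, length=m)) * (b - a)/m)^(1/p)
fnorm(10), fnorm(1000), (1/3)^(1/2)        # coarse and fine approximations vs exact value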

Figure 3. Regions within ℝ² for which ||𝒙||_p≤1, for p=1,2,3,∞.

Vector norms arise very often in applications since they can be used to classify data, and are implemented in most software systems as a function norm(x,p) to evaluate the p-norm of a vector 𝒙, with p=2 as the default.

using LinearAlgebra; x=[1 1 1]; [norm(x) sqrt(3.0)]

[ 1.7320508075688772 1.7320508075688772 ] (6)

m=9; x=ones(m); [norm(x) sqrt(m)]

[ 3.0 3.0 ] (7)

m=4; x=ones(m); [norm(x,1) m]

[ 4.0 4.0 ] (8)

[norm(x,1) norm(x,2) norm(x,4) norm(x,8) norm(x,16) norm(x,Inf)]

[ 4.0 2.0 1.414213562373095 1.189207115002721 1.0905077326652577 1.0 ] (9)

2.3.Inner product

Norms are functionals that define what is meant by the size of a vector, but are not linear. Even in the simplest case of the real line, the linearity relation |x+y|=|x|+|y| is not verified for x>0, y<0. Nor do norms characterize the familiar geometric concept of orientation of a vector. A particularly important orientation from Euclidean geometry is orthogonality between two vectors. Another function is required, but before a formal definition some intuitive understanding is sought by considering vectors and functionals in the plane, as depicted in Figure 4. Consider a position vector 𝒙=[ x1 x2 ]T∈ℝ² and the previously-encountered linear functionals

f1,f2:ℝ²→ℝ, f1(𝒙)=x1, f2(𝒙)=x2.

The x1 component of the vector 𝒙 can be thought of as the number of level sets of f1 that it crosses; similarly for the x2 component. A convenient labeling of level sets is by their normal vectors. The level sets of f1 have normal 𝒆1T=[ 1 0 ], and those of f2 have normal vector 𝒆2T=[ 0 1 ]. Both of these can be thought of as matrices with two columns, each containing a single component. The products of these matrices with the vector 𝒙 give the values of the functionals f1,f2

𝒆1T𝒙=[ 1 0 ][ x1 x2 ]=1x1+0x2=x1=f1(𝒙),
𝒆2T𝒙=[ 0 1 ][ x1 x2 ]=0x1+1x2=x2=f2(𝒙).

Figure 4. Euclidean space E2 and its dual E2*.

In general, any linear functional f defined on the real space ℝ^m can be labeled by a vector

𝒂T=[ a1 a2 ⋯ am ],

and evaluated through the matrix-vector product f(𝒙)=𝒂T𝒙. This suggests the definition of another function s:ℝ^m×ℝ^m→ℝ,

s(𝒂,𝒙)=𝒂T𝒙.

The function s is called an inner product; it has two vector arguments from which a matrix-vector product is formed and returns a scalar value, hence it is also called a scalar product. The definition from a Euclidean space can be extended to general vector spaces. For now, consider the field of scalars to be the reals, S=ℝ.

Definition. (Inner Product) . An inner product in the vector space 𝒱=(V,ℝ,+,⋅) is a function s:V×V→ℝ with properties

Symmetry

For any 𝒂,𝒙∈V, s(𝒂,𝒙)=s(𝒙,𝒂).

Linearity in second argument

For any 𝒂,𝒙,𝒚∈V, α,β∈ℝ, s(𝒂,α𝒙+β𝒚)=αs(𝒂,𝒙)+βs(𝒂,𝒚).

Positive definiteness

For any 𝒙∈V\{𝟎}, s(𝒙,𝒙)>0.

The inner product s(𝒂,𝒙) returns the number of level sets of the functional labeled by 𝒂 crossed by the vector 𝒙, and this interpretation underlies many applications in the sciences as in the gravitational field example above. Inner products also provide a procedure to evaluate geometrical quantities and relationships.

Vector norm

In ℝ^m the number of level sets of the functional labeled by 𝒙 crossed by 𝒙 itself is identical to the square of the 2-norm

s(𝒙,𝒙)=𝒙T𝒙=||𝒙||_2^2.

In general, the square root of s(𝒙,𝒙) satisfies the properties of a norm, and is called the norm induced by an inner product

||𝒙||=s(𝒙,𝒙)^(1/2).

A real space together with the scalar product s(𝒙,𝒚)=𝒙T𝒚 and induced norm ||𝒙||=s(𝒙,𝒙)^(1/2) defines a Euclidean vector space E_m.
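
A short Julia check that the induced norm coincides with the 2-norm (the vector is arbitrary):

using LinearAlgebra
s(a, x) = a' * x                 # inner product s(a, x) = aTx
x = [3.0, 4.0]
sqrt(s(x, x)), norm(x)           # induced norm vs 2-norm: (5.0, 5.0)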

Orientation

In ℝ² the point specified by polar coordinates (r,θ) has the Cartesian coordinates x1=rcosθ, x2=rsinθ, and position vector 𝒙=[ x1 x2 ]T. The inner product

𝒆1T𝒙=[ 1 0 ][ x1 x2 ]=1x1+0x2=rcosθ,

is seen to contain information on the relative orientation of 𝒙 with respect to 𝒆1. In general, the angle θ between two vectors 𝒙,𝒚 within any vector space with a scalar product can be defined by

cosθ = s(𝒙,𝒚) / [s(𝒙,𝒙) s(𝒚,𝒚)]^(1/2) = s(𝒙,𝒚) / (||𝒙|| ||𝒚||),

which becomes

cosθ = 𝒙T𝒚 / (||𝒙|| ||𝒚||),

in a Euclidean space, 𝒙,𝒚∈ℝ^m.
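
A brief numerical example of the angle formula (the two vectors are chosen arbitrarily):

using LinearAlgebra
x = [1.0, 0.0]; y = [1.0, 1.0]
cosθ = (x' * y) / (norm(x) * norm(y))
acos(cosθ), π/4                    # the angle between x and y is π/4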

Orthogonality

In E2 two vectors are orthogonal if the angle between them is such that cosθ=0, and this can be extended to an arbitrary vector space 𝒱=(V,ℝ,+,⋅) with a scalar product by stating that 𝒙,𝒚∈V are orthogonal if s(𝒙,𝒚)=0. In ℝ^m vectors 𝒙,𝒚∈ℝ^m are orthogonal if 𝒙T𝒚=0.

3.Linear mapping composition

3.1.Matrix-matrix product

From two functions f:A→B and g:B→C, a composite function, h=g∘f, h:A→C is defined by

h(x)=g(f(x)).

Consider linear mappings between Euclidean spaces 𝒇:ℝ^n→ℝ^m, 𝒈:ℝ^m→ℝ^p. Recall that linear mappings between Euclidean spaces are expressed as matrix-vector multiplication

𝒇(𝒙)=𝑨𝒙, 𝒈(𝒚)=𝑩𝒚, 𝑨∈ℝ^{m×n}, 𝑩∈ℝ^{p×m}.

The composite function 𝒉=𝒈∘𝒇 is 𝒉:ℝ^n→ℝ^p, defined by

𝒉(𝒙)=𝒈(𝒇(𝒙))=𝒈(𝑨𝒙)=𝑩𝑨𝒙.

Note that the intermediate vector 𝒖=𝑨𝒙 is subsequently multiplied by the matrix 𝑩. The composite function 𝒉 is itself a linear mapping

𝒉(a𝒙+b𝒚)=𝑩𝑨(a𝒙+b𝒚)=𝑩(a𝑨𝒙+b𝑨𝒚)=𝑩(a𝒖+b𝒗)=a𝑩𝒖+b𝑩𝒗=a𝑩𝑨𝒙+b𝑩𝑨𝒚=a𝒉(𝒙)+b𝒉(𝒚),

so it can also be expressed as a matrix-vector multiplication

𝒉(𝒙)=𝑪𝒙=𝑩𝑨𝒙. (10)

Using the above, 𝑪 is defined as the product of matrix 𝑩 with matrix 𝑨

𝑪=𝑩𝑨.
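
The agreement between the composition 𝒈(𝒇(𝒙))=𝑩(𝑨𝒙) and the single product (𝑩𝑨)𝒙 can be checked directly, using the same small matrices that appear in the session below (the test vector is arbitrary):

A = [1 3; 2 4]; B = [-1 2; 1 -2; 3 3]      # f(x) = Ax, g(y) = By
x = [1.0, -1.0]
B*(A*x) ≈ (B*A)*x                          # h(x) = g(f(x)) = (BA)x, returns true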

The columns of 𝑪 can be determined from those of 𝑨 by considering the action of 𝒉 on the column vectors of the identity matrix 𝑰=[ 𝒆1 𝒆2 ⋯ 𝒆n ]∈ℝ^{n×n}. First, note that

𝑨𝒆1=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 1 0 ⋯ 0 ]T=𝒂1, …, 𝑨𝒆j=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 0 ⋯ 1 ⋯ 0 ]T=𝒂j, …, 𝑨𝒆n=[ 𝒂1 𝒂2 ⋯ 𝒂n ][ 0 ⋯ 0 1 ]T=𝒂n. (11)

The above can be repeated for the matrix 𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ] giving

𝒉(𝒆1)=𝑪𝒆1=𝒄1, …, 𝒉(𝒆j)=𝑪𝒆j=𝒄j, …, 𝒉(𝒆n)=𝑪𝒆n=𝒄n. (12)

Combining the above equations leads to 𝒄j=𝑩𝒂j, or

𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ]=𝑩[ 𝒂1 𝒂2 ⋯ 𝒂n ].

From the above the matrix-matrix product 𝑪=𝑩𝑨 is seen to simply be a grouping of all the products of 𝑩 with the column vectors of 𝑨,

𝑪=[ 𝒄1 𝒄2 ⋯ 𝒄n ]=[ 𝑩𝒂1 𝑩𝒂2 ⋯ 𝑩𝒂n ].

a1=[1; 2]; a2=[3; 4]; A=[a1 a2]

[ 1 3
  2 4 ] (13)

b1=[-1; 1; 3]; b2=[2; -2; 3]; B=[b1 b2]

[ -1 2
  1 -2
  3 3 ] (14)

C=B*A

[ 3 5
  -3 -5
  9 21 ] (15)

c1=B*a1; c2=B*a2; [c1 c2]

[ 3 5
  -3 -5
  9 21 ] (16)

Summary.