"Maymester MATH547 Linear Algebra for Applications in Data Science"

1.Quantities

1.1.Numbers

Most scientific disciplines introduce an idea of the amount of some entity or property of interest. Furthermore, the amount is usually combined with the concept of a number, an abstraction of the observation that the two sets $A =$ {Mary, Jane, Tom} and $B =$ {apple, plum, cherry} seem quite different, but we can match one distinct person to one distinct fruit as in {Mary $\to$ plum, Jane $\to$ apple, Tom $\to$ cherry}. In contrast we cannot do the same matching of distinct persons to a distinct color from the set {red, green}, and one of the colors must be shared between two persons. Formal definition of the concept of a number from the above observations is surprisingly difficult since it would be self-referential due to the apperance of the numbers “one” and “two”. Leaving this aside, the key concept is that of quantity of some property of interest that is expressed through a number. Several types of numbers have been introduced in mathematics to express different types of quantities, and the following will be used throughout this text:

$ℕ$: The set of natural numbers, $ℕ = {0, 1, 2, 3, \dots}$ , infinite and countable, $ℕ_{+} = {1, 2, 3, \dots}$ ;
$ℤ$: The set of integers, $ℤ = {0, \pm 1, \pm 2, \pm 3, \dots}$ , infinite and countable;
$ℚ$: The set of rational numbers $ℚ = {p / q, p \in ℤ, q \in ℕ_{+}}$ , infinite and countable;
$ℝ$: The set of real numbers, infinite, not countable, can be ordered;
$ℂ$: The set of complex numbers, $ℂ = {x + i y, x, y \in ℝ}$ , infinite, not countable, cannot be ordered.

A computer has a finite amount of memory, hence cannot represent all numbers, but rather subsets of the above sets. Furthermore, computers internally use binary numbers composed of binary digits, or bits. Many computer number types are defined for specific purposes, and are often encountered in applications such as image representation or digital data acquisition. Here are the main types.

Subsets of $ℕ$

The number types uint8, uint16, uint32, uint64 represent subsets of the natural numbers (unsigned integers) using 8, 16, 32, 64 bits respectively. An unsigned integer with $b$ bits can store a natural number in the range from $0$ to $2^{b} - 1$ . Two arbitrary natural numbers, written as $\forall i, j \in ℕ$ can be added and will give another natural number, $k = i + j \in ℕ$ . In contrast, addition of computer unsigned integers is only defined within the specific range $0$ to $2^{b} - 1$ .

octave]

i=uint8(15); j=uint8(10); k=i+j

k = 25

octave]

i=uint8(150); j=uint8(200); k=i+j

k = 255

octave]

k=i-j

k = 0

octave]

Subsets of $ℤ$

The number types int8, int16, int32, int64 represent subsets of the integers. One bit is used to store the sign of the number, so the subset of $ℤ$ that can be represented is from $1 - 2^{b - 1}$ to $2^{b - 1} - 1$

octave]

i=int8(100); j=int8(101); k=i+j

k = 127

octave]

k=i-j

k = -1

octave]

Subsets of $ℚ, ℝ, ℂ$

Computers approximate the real numbers through the set $𝔽$ of floating point numbers. Floating point numbers that use $b = 32$ bits are known as single precision, while those that use $b = 64$ are double precision. A floating point number $x \in 𝔽$ is stored internally as $x = \pm . B_{1} B_{2} \dots B_{m} \times 2^{\pm b_{1} b_{2} \dots b_{e}}$ where $B_{i}$ , $i = 1, \dots, m$ are bits within the mantissa of length $m$ , and $b_{j}$ , $j = 1, \dots, e$ are bits within the exponent, along with signs $\pm$ for each. The default number type is usually double precision, more concisely referred to double. Common constants such as $e$ , $π$ are predefined as double, can be truncated to single, and the number of displayed decimal digits is controlled by format. The function disp(x) displays its argument x.

octave]

format long; disp([e pi])

2.718281828459045 3.141592653589793

octave]

disp([single(e) single(pi)])

2.7182817 3.1415927

octave]

The approximation of the reals $ℝ$ by the floats $𝔽$ is characterized by: realmax, the largest float, realmin the smallest positive float, and eps known as machine epsilon. Machine epsilon highlights the differences between floating point and real numbers since it is defined as the largest number $ϵ \in 𝔽$ that satisfies $1 + ϵ = 1$ . If $ε \in ℝ$ of course $1 + ε = 1$ implies $ε = 0$ , but floating points exhibit “granularity”, in the sense that over a unit interval there are small steps that are indistinguishable from zero due to the finite number of bits available for a float. Machine epsilon is small, and floating point errors can usually be kept under control. Keep in mind that perfect accuracy is a mathematical abstraction, not encountered in nature. In fields as sociology or psychology 3 digits of accuracy are excellent, in mechanical engineering this might increase to 6 digits, or in electronic engineering to 8 digits. The most precisely known physical constant is the Rydberg constant known to 12 digits. The granularity of double precision expressed by machine epsilon is sufficient to represent natural phenomena.

octave]

format short; disp([realmin realmax eps 1+eps])

2.2251e-308 1.7977e+308 2.2204e-16 1.0000e+00

octave]

Within the reals certain operations are undefined such as $1 / 0$ . Special float constants are defined to handle such situations: Inf is a float meant to represent infinity, and NaN (“not a number”) is meant to represent an undefinable result of an arithmetic operation.

octave]

warning("off"); disp([Inf 1/0 2*realmax NaN Inf-Inf Inf/Inf])

Inf Inf Inf NaN NaN NaN

octave]

Complex numbers $z \in ℂ$ are specified by two reals, in Cartesian form as $z = x + i y$ , $x, y \in ℝ$ or in polar form as $z = ρ e^{i θ}$ , $ρ, θ \in ℝ$ , $ρ ⩾ 0$ . The computer type complex is similarly defined from two floats and the additional constant I is defined to represent $\sqrt{- 1} = i = e^{i π / 2}$ . Functions are available to obtain the real and imaginary parts within the Cartesian form, or the absolute value and argument of the polar form.

octave]

z1=complex(1,1); z2=complex(1,-1); disp([z1+z2 z1/z2])

2 + 0i 0 + 1i

octave]

disp([real(z1) real(z2) real(z1+z2) real(z1/z2)])

1 1 2 0

octave]

disp([imag(z1) imag(z2) imag(z1+z2) imag(z1/z2)])

1 -1 0 1

octave]

disp([abs(z1) abs(z2) abs(z1+z2) abs(z1/z2)])

1.4142 1.4142 2.0000 1.0000

octave]

disp([arg(z1) arg(z2) arg(z1+z2) arg(z1/z2)])

0.78540 -0.78540 0.00000 1.57080

octave]

I-sqrt(-1)

ans = 0

octave]

Care should be exercised about the cummulative effect of many floating point errors. For instance, in an “irrational” numerical investigation of Zeno's paradox, one might want to compare the distance $S_{N}$ traversed by step sizes that are scaled by $1 / π$ starting from one to $T_{N}$ , traversed by step sizes scaled by $π$ starting from $π^{- N}$

S_{N} = 1 + \frac{1}{π} + \frac{1}{π^{2}} + \dots + \frac{1}{π^{N}}, T_{N} = \frac{1}{π^{N}} + \frac{1}{π^{N - 1}} + \dots + 1 .

In the reals the above two expressions are equal, $S_{N} = T_{N}$ , but this is not verfied for all $N$ when using floating point numbers. Lists of the values $π^{j}$ , for the two orderings $j = 0, \dots, N$ , and $j = N, \dots, 0$ , can be generated and summed.

octave]

N=10; S=pi.^(0:-1:-N); T=pi.^(-N:1:0); sum(S)==sum(T)

ans = 1

octave]

N=15; S=pi.^(0:-1:-N); T=pi.^(-N:1:0); sum(S)==sum(T)

ans = 0

octave]

In the above numerical experiment a==b expresses an equality relationship which might evaluate as true denoted by 1, or false denoted by 0.

octave]

disp([1==1 1==2])

1 0

octave]

The above was called an “irrational” investigation since in Zeno's original paradox the scaling factor was 2 rather than $π$ , and due to the binary representation used by floats equality always holds.

octave]

N=30; S=2.^(0:-1:-N); T=2.^(-N:1:0); sum(S)==sum(T)

ans = 1

octave]

1.2.Quantities described by a single number

The above numbers and their computer approximations are sufficient to describe many quantities encountered in applications. Typical examples include:

the position $x \in ℝ$ of a point on the unit line segment $[0, 1]$ , approximated by the floating point number $\tilde{x} \in 𝔽$ , to within machine epsilon precision, $| x - \tilde{x} | ⩽ ϵ$ ;
the measure of resistance to change of the rate of motion known as mass, $m \in ℝ$ , $m > 0$ ;
the population of a large community expressed as a float $p \in 𝔽$ , even though for a community of individuals the population is a natural number, as in “the population of the United States is $p = 328.2 E6$ , i.e., 328.2 million”.

In most disciplines, there is a particular interest in comparison of two quantities, and to facilitate such comparison a common reference is used known as a standard unit. For measurement of a length $L$ , the meter $ℓ = 1 m$ is a standard unit, as in the statement $L = 10 m$ , that states that $L$ is obtained by taking the standard unit ten times, $L = 10 ℓ$ . The rules for carrying out such comparisons are part of the definition of real and rational numbers. These rules are formalized in the mathematical definition of a field $(F, +, \times)$ presented in the next chapter. Quantities that obey such rules, i.e., belong to a field, can be used in changes of scale and are called scalars. Not all numbers are scalars in this sense. For instance, the integers would not allow a scaling of 1:2 (halving the scale) even though 1,2 are integers.

1.3.Quantities described by multiple numbers

Other quantities require more than a single number. The distribution of population in the year 2000 among the alphabetically-ordered South American countries (Argentina, Bolivia,..,Venezuela) requires 12 numbers. These are placed together in a list known in mathematics as a tuple, in this case a 12-tuple $P = (p_{1}, p_{2}, \dots, p_{12})$ , with $p_{1}$ the population of Argentina, $p_{2}$ that of Bolivia, and so on. An analogous 12-tuple can be formed from the South American populations in the year 2020, say $Q = (q_{1}, q_{2}, \dots, q_{12})$ . Note that it is difficult to ascribe meaning to apparently plausible expressions such as $P + Q$ since, for instance, some people in the 2000 population are also in the 2020 population, and would be counted twice.

2.Vectors

2.1.Vector spaces

In contrast to the population 12-tuple example above, combining multiple numbers is well defined in operations such as specifying a position within a three-dimensional Cartesian grid, or determining the resultant of two forces in space. Both of these lead to the consideration of 3-tuples or triples such as the force $(f_{1}, f_{2}, f_{3})$ . When combined with another force $(g_{1}, g_{2}, g_{3})$ the resultant is $(f_{1} + g_{1}, f_{2} + g_{2}, f_{3} + g_{3})$ . If the force $(f_{1}, f_{2}, f_{3})$ is amplified by the scalar $α$ and the force $(g_{1}, g_{2}, g_{3})$ is similarly scaled by $β$ , the resultant becomes

α (f_{1}, f_{2}, f_{3}) + β (g_{1}, g_{2}, g_{3}) = (α f_{1}, α f_{2}, α f_{3}) + (β g_{1}, β g_{2}, β g_{3}) = (α f_{1} + β g_{1}, α f_{2} + β g_{2}, α f_{3} + β g_{3}) .

It is useful to distinguish tuples for which scaling and addition is well defined from simple lists of numbers. In fact, since the essential difference is the behavior with respect to scaling and addition, the focus should be on these operations rather than the elements of the tuple.

The above observations underlie the definition of a vector space $𝒱$ by a set $V$ whose elements satisfy certain scaling and addition properties, denoted all together by the 4-tuple $𝒱 = (V, S, +, \cdot)$ . The first element of the 4-tuple is a set whose elements are called vectors. The second element is a set of scalars, and the third is the vector addition operation. The last is the scaling operation, seen as multiplication of a vector by a scalar. The vector addition and scaling operations must satisfy rules suggested by positions or forces in three-dimensional space, which are listed in Table 1. In particular, a vector space requires definition of two distinguished elements: the zero vector $𝟎 \in V$ , and the identity scalar element $1 \in S$ .

Addition rules for	$\forall 𝒂, 𝒃, 𝒄 \in V$
$𝒂 + 𝒃 \in V$	Closure
$𝒂 + (𝒃 + 𝒄) = (𝒂 + 𝒃) + 𝒄$	Associativity
$𝒂 + 𝒃 = 𝒃 + 𝒂$	Commutativity
$𝟎 + 𝒂 = 𝒂$	Zero vector
$𝒂 + (- 𝒂_{}) = 𝟎$	Additive inverse
Scaling rules for	$\forall 𝒂, 𝒃 \in V$ , $\forall x, y \in S$
$x 𝒂 \in V$	Closure
$x (𝒂 + 𝒃) = x 𝒂 + x 𝒃$	Distributivity
$(x + y) 𝒂 = x 𝒂 + y 𝒂$	Distributivity
$x (y 𝒂) = (x y) 𝒂$	Composition
$1 𝒂 = 𝒂$	Scalar identity

Table 1. Vector space $𝒱 = (V, S, +, \cdot)$ properties for arbitrary $𝒂, 𝒃, 𝒄 \in V$

The definition of a vector space reflects everyday experience with vectors in Euclidean geometry, and it is common to refer to such vectors by descriptions in a Cartesian coordinate system. For example, a position vector $𝒓$ within the plane can be referred through the pair of coordinates $(x, y)$ . This intuitive understanding can be made precise through the definition of a vector space $ℛ_{2} = (ℝ^{2}, ℝ, +, \cdot)$ , called the real 2-space. Vectors within $ℛ_{2}$ are elements of $ℝ^{2} = ℝ \times ℝ = {(x, y) | . x, y \in ℝ}$ , meaning that a vector is specified through two real numbers, $𝒓 \leftrightarrow (x, y)$ . Addition of two vectors, $𝒒 \leftrightarrow (s, t)$ , $𝒓 \leftrightarrow (x, y)$ is defined by addition of coordinates $𝒒 + 𝒓 = (s + x, t + v)$ . Scaling $𝒓 \leftrightarrow (x, y)$ by scalar $a$ is defined by $a 𝒓 \leftrightarrow (a x, a y)$ . Similarly, consideration of position vectors in three-dimensional space leads to the definition of the $ℛ_{3} = (ℝ^{3}, ℝ, +, \cdot)$ , or more generally a real $m$ -space $ℛ_{m} = (ℝ^{m}, ℝ, +, \cdot)$ , $m \in ℕ$ , $m > 0$ .

Note however that there is no mention of coordinates in the definition of a vector space as can be seen from the list of properties in Table 1. The intent of such a definition is to highlight that besides position vectors, many other mathematical objects follow the same rules. As an example, consider the set of all continuous functions $C (ℝ) = {f | . f : ℝ \to ℝ}$ , with function addition defined by the sum at each argument $t,$ $(f + g) (t) = f (t) + g (t)$ , and scaling by $a \in ℝ$ defined as $(a f) (t) = a f (t)$ . Read this as: “given two continuous functions $f$ and $g$ , the function $f + g$ is defined by stating that its value for argument $x$ is the sum of the two real numbers $f (t)$ and $g (t)$ ”. Similarly: “given a continuous function $f$ , the function $a f$ is defined by stating that its value for argument $t$ is the product of the real numbers $a$ and $f (t)$ ”. Under such definitions $𝒞^{0} = (C (ℝ), ℝ, +, \cdot)$ is a vector space, but quite different from $ℛ_{m}$ . Nonetheless, the fact that both $𝒞^{0}$ and $ℛ_{m}$ are vector spaces can be used to obtain insight into the behavior of continuous functions from Euclidean vectors, and vice versa.

2.2.Real vector space $ℛ_{m}$

Column vectors.

Since the real spaces

ℛ_{m} = (ℝ^{m}, ℝ, +, \cdot)

play such an important role in themselves and as a guide to other vector spaces, familiarity with vector operations in

ℛ_{m}

is necessary to fully appreciate the utility of linear algebra to a wide range of applications. Following the usage in geometry and physics, the

m

real numbers that specify a vector

𝒖 \in ℝ^{m}

are called the components of

𝒖

. The one-to-one correspondence between a vector and its components

𝒖 \leftrightarrow (u_{1}, \dots, u_{m})

, is by convention taken to define an equality relationship,

𝒖 = [\begin{array}{l} u_{1} \\ ⋮ \\ u_{m} \end{array}],

(1)

with the components arranged vertically and enclosed in square brackets. Given two vectors $𝒖, 𝒗 \in ℝ^{m}$ , and a scalar $a \in ℝ$ , vector addition and scaling are defined in $ℛ_{m}$ by real number addition and multiplication of components

𝒖 + 𝒗 = [\begin{array}{l} u_{1} \\ ⋮ \\ u_{m} \end{array}] + [\begin{array}{l} v_{1} \\ ⋮ \\ v_{m} \end{array}] = [\begin{array}{l} u_{1} + v_{1} \\ ⋮ \\ u_{m} + v_{m} \end{array}], a 𝒖 = a [\begin{array}{l} u_{1} \\ ⋮ \\ u_{m} \end{array}] = [\begin{array}{l} a u_{1} \\ ⋮ \\ a u_{m} \end{array}] .

(2)

The vector space $ℛ_{m}$ is defined using the real numbers as the set of scalars, and constructing vectors by grouping together $m$ scalars, but this approach can be extended to any set of scalars $S$ , leading to the definition of the vector spaces $𝒮_{n} = (S^{n}, S, +, \cdot)$ . These will often be referred to as $n$ -vector space of scalars, signifying that the set of vectors is $V = S^{n}$ .

To aid in visual recognition of vectors, the following notation conventions are introduced:

vectors are denoted by lower-case bold Latin letters: $𝒖, 𝒗$ ;

scalars are denoted by normal face Latin or Greek letters: $a, b, α, β$ ;

the components of a vector are denoted by the corresponding normal face with subscripts as in equation (1);

related sets of vectors are denoted by indexed bold Latin letters: $𝒖_{1}, 𝒖_{2}, \dots, 𝒖_{n}$ .

In Octave, successive components placed vertically are separated by a semicolon.

octave]

[1; 2; -1; 2]

ans = 1 2 -1 2

octave]

The equal sign in mathematics signifies a particular equivalence relationship. In computer systems such as Octave the equal sign has the different meaning of assignment, that is defining the label on the left side of the equal sign to be the expression on the right side. Subsequent invocation of the label returns the assigned object. Components of a vector are obtained by enclosing the index in parantheses.

octave]

u=[1; 2; -1; 2]; u

u = 1 2 -1 2

octave]

u(3)

ans = -1

octave]

Row vectors.

Instead of the vertical placement or components into one column, the components of could have been placed horizontally in one row

[\begin{array}{lll} u_{1} & \dots & u_{m} \end{array}]

, that contains the same data, differently organized. By convention vertical placement of vector components is the preferred organization, and

𝒖

shall denote a column vector henceforth. A transpose operation denoted by a

T

superscript is introduced to relate the two representations

𝒖^{T} = [\begin{array}{lll} u_{1} & \dots & u_{m} \end{array}],

and $𝒖^{T}$ is the notation used to denote a row vector. In Octave, horizontal placement of successive components in a row is denoted by a space.

octave]

uT=transpose(u)

uT = 1 2 -1 2

octave]

[1 2 -1 2]

ans = 1 2 -1 2

octave]

uT(4)

ans = 2

octave]

Compatible vectors.

Addition of real vectors

𝒖, 𝒗 \in ℝ^{m}

defines another vector

𝒘 = 𝒖 + 𝒗 \in ℝ^{m}

. The components of

𝒘

are the sums of the corresponding components of

𝒖

and

𝒗

w_{i} = u_{i} + v_{i}

, for

i = 1, 2, \dots, m

. Addition of vectors with different number of components is not defined, and attempting to add such vectors produces an error. Such vectors with different number of components are called incompatible, while vectors with the same number of components are said to be compatible. Scaling of

𝒖

a

defines a vector

𝒛 = a 𝒖

, whose components are

z_{i} = a u_{i}

, for

i = 1, 2, \dots, m

. Vector addition and scaling in Octave are defined using the

+

and

*

operators.

octave]

uT=[1 0 1 2]; vT=[2 1 3 -1]; wT=uT+vT; disp(wT)

3 1 4 1

octave]

rT=[1 2]; uT+rT

operator +: nonconformant arguments (op1 is 1x4, op2 is 1x2)

octave]

a=3; zT=a*uT; disp(zT)

3 6 -3 6

octave]

2.3.Working with vectors

Ranges.

The vectors used in applications usually have a large number of components,

m ≫ 1

, and it is important to become proficient in their manipulation. Previous examples defined vectors by explicit listing of their

m

components. This is impractical for large

m

, and support is provided for automated generation for often-encountered situations. First, observe that Table 1 mentions one distinguished vector, the zero element that is a member of any vector space

𝟎 \in V

. The zero vector of a real vector space

ℛ_{m}

is a column vector with

m

components, all of which are zero, and a mathematical convention for specifying this vector is

𝟎^{T} = [\begin{array}{llll} 0 & 0 & \dots & 0 \end{array}] \in ℝ^{m}

. This notation specifies that transpose of the zero vector is the row vector with

m

zero components, also written through explicit indexing of each component as

𝟎_{i} = 0

, for

i = 1, \dots, m

. Keep in mind that the zero vector

𝟎

and the zero scalar

0

are different mathematical objects. The ellipsis symbol in the mathematical notation is transcribed in Octave by the notion of a range, with 1:m denoting all the integers starting from

1

m

, organized as a row vector. The notation is extended to allow for strides different from one, and the mathematical ellipsis

i = m, m - 1, \dots, 1

is denoted as m:-1:1. In general r:s:t denotes the set of numbers

{r, r + s, \dots, r + n s}

with

r + n s ⩽ t

, and

r, s, t

real numbers and

n

a natural number,

r, s, t \in ℝ

n \in ℕ

. If there is no natural number

n

such that

r + n s ⩽ t

, an empty vector with no components is returned.

octave]

m=4; disp(1:m)

1 2 3 4

octave]

disp(m:-1:2)

4 3 2

octave]

r=0; s=0.2; t=1; disp(r:s:t)

0.00000 0.20000 0.40000 0.60000 0.80000 1.00000

octave]

r=0; s=0.3; t=1; disp(r:s:t)

0.00000 0.30000 0.60000 0.90000

octave]

r=0; s= -0.2; t=1; disp(r:s:t)

[](1x0)

octave]

An efficient, expressive feature of many software systems including Octave is to use ranges as indices to a vector, as shown below for the definition of $𝟎 \in ℝ^{4}$ . Note that the index range i is organized as a row, and a transpose operation must be applied to obtain z as a column vector.

octave]

m=4; i=1:m; z(i)=i.^i; z=transpose(z); disp(z)

1 4 27 256

octave]

i = 1 2 3 4

octave]

disp(transpose(z))

0 0 0 0

octave]

Visualization.

A component-by-component display of a vector becomes increasingly unwieldy as the number of components

m

becomes large. For example, the numbers below seem an inefficient way to describe the sine function.

octave]

 t=0:0.1:1.5; disp(sin(t))

Columns 1 through 8: 0.00000 0.09983 0.19867 0.29552 0.38942 0.47943 0.56464 0.64422 Columns 9 through 16: 0.71736 0.78333 0.84147 0.89121 0.93204 0.96356 0.98545 0.99749

octave]

Indeed, such a piece-by-piece approach is not the way humans organize large amounts of information, preferring to conceptualize the data as some other entity: an image, a sound excerpt, a smell, a taste, a touch, a sense of balance, or relative position. All seven of these human senses will be shown to allow representation by linear algebra concepts, including representation by vectors.

As a first example consider visualization, the process of transforming data into a sight perception. A familiar example is constructing a plot of the graph of a function. Recall that in mathematics the graph of a function $f : X \to Y$ relating elements of the domain $X$ to those in codomain $Y$ is a set of ordered pairs $G_{f} = {(x, y) | y = f (x), x \in X .}$ . For a commonly encountered function such as $\sin : [0, 2 π) \to [- 1, 1]$ , the graph $G_{\sin} = {(x, \sin (x)) | x \in [0, 2 π .)}$ contains an uncountably infinite number of elements, and obviously cannot be explicitly listed. The sine function is continuous, meaning that no matter how small an open interval $(c, d)$ within the function codomain $[- 1, 1]$ one considers, there exists an interval $(a, b)$ in the function domain $[0, 2 π]$ whose image by the sine function is contained in $(c, d)$ . In mathematical “ $δ - ε$ ” notation this is stated as: $\forall ε > 0, \exists δ_{ε}, | x_{1} - x_{0} | < δ_{ε} \Rightarrow | \sin (x_{1}) - \sin (x_{0}) | < ε$ . This mathematical notation is concise and precise, but perceptive mainly to the professional mathematician. A more intuitive visualization of continuity is obtained by approximating the graph of a function $f : X \to Y$ by a finite set of samples, $G_{f}^{m} = {(x_{i}, y_{i}) | x_{i} \in X, y_{i} = f (x_{i}), i = 1, \dots, m, m \in ℕ .}$ . Strictly speaking, the sampled graph $G_{f}^{m}$ would indicate jumps interpretable as discontinuities, but when plotting the points human sight perception conveys a sense of continuity for large sample sizes, $m ≫ 1$ . For the sine function example, consider sampling the domain $[0, 2 π)$ with a step size $h = 2 π / m$ , $m ≫ 1$ . To obtain a visual representation of the sampled sine function the Octave plot function can be used to produce a figure that will appear in another window, interactively investigated, and subsequently closed. For large $m$ one cannot visually distinguish the points in the graph sample, though this is apparent for smaller sample sizes. This is shown below by displaying a subrange of the sampled points with stride $s$ . This example also shows the procedure to save a permanent copy of the displayed figure through the Octave print -deps command that places the currently displayed plot into an Encapsulated Postscript file. The generated figure file can be linked to a document as shown here in Figure 1, in which both plots render samples of the graph of the sine function, but the one with large $m$ is perceived as being continuous.

octave]

m=1000; h=2*pi/m; x=(0:m-1)*h;

octave]

y=sin(x); plot(x,y);

octave]

close;

octave]

s=50; i=1:s:m; xs=x(i); ys=y(i);

octave]

plot(x,y,'b',xs,ys,'bo');

octave]

print -depsc L01Fig01.eps;

octave]

close;

octave]

Figure 1. Visualization of vectors of sampled function graphs.

3.Matrices

3.1.Forming matrices

The real numbers themselves form the vector space $ℛ_{1} = (ℝ, ℝ, +, \cdot)$ , as does any field of scalars, $𝒮_{1} = (S, S, +, \cdot)$ . Juxtaposition of $m$ real numbers has been seen to define the new vector space $ℛ_{m}$ . This process of juxtaposition can be continued to form additional mathematical objects. A matrix is defined as a juxtaposition of compatible vectors. As an example, consider $n$ vectors $𝒂_{1}, 𝒂_{2}, \dots, 𝒂_{n} \in V$ within some vector space $𝒱 = (V, S, +, \cdot)$ . Form a matrix by placing the vectors into a row,

𝑨 = [\begin{array}{llll} 𝒂_{1} & 𝒂_{2} & \dots & 𝒂_{n} \end{array}] .

(3)

To aid in visual recognition of a matrix, upper-case bold Latin letters will be used to denote matrices. The columns of a matrix will be denoted by the corresponding lower-case bold letter with a subscripted index as in equation (3). Note that the number of columns in a matrix can be different from the number of components in each column, as would be the case for matrix $𝑨$ from equation (3) when choosing vectors from, say, the real space $ℛ_{m}$ , $𝒂_{1}, 𝒂_{2}, \dots, 𝒂_{n} \in ℝ^{m}$ .

Vectors were seen to be useful juxtapositions of scalars that could describe quantities a single scalar could not: a position in space, a force in physics, or a sampled function graph. The crucial utility of matrices is their central role in providing a description of new vectors other then their column vectors, and is suggested by experience with Euclidean spaces.

3.2.Identity matrix

Consider first $ℛ_{1}$ , the vector space of real numbers. A position vector $𝒓 \in ℛ_{1}$ on the real axis is specified by a single scalar component, $𝒓 = [x]$ , $x \in ℝ$ . Read this to mean that the position $𝒓$ is obtained by traveling $x$ units from the origin at position vector $𝟎 = [0]$ . Look closely at what is meant by “unit” in this context. Since $x$ is a scalar, the mathematical expression $𝒓 = 𝟎 + x$ has no meaning, as addition of a vector to a scalar has not been defined. Recall that scalars were introduced to capture the concept of scaling of a vector, so in the context of vector spaces they always appear as multiplying some vector. The correct mathematical description is $𝒓 = 𝟎 + x 𝒆$ , where $𝒆$ is the unit vector $𝒆 = [1]$ . Taking the components leads to $r_{1} = 0_{1} + x e_{1}$ , where $r_{1}, 0_{1}, e_{1}$ are the first (and in this case only) components of the $𝒓, 𝟎, 𝒆$ vectors. Since $r_{1} = x$ , $0_{1} = 0$ , $e_{1} = 1$ , one obtains the identity $x = 0 + x \cdot 1$ .

Now consider $ℛ_{2}$ , the vector space of positions in the plane. Repeating the above train of thought leads to the identification of two direction vectors $𝒆_{1}$ and $𝒆_{2}$

𝒓 = [\begin{array}{l} x \\ y \end{array}] = x [\begin{array}{l} 1 \\ 0 \end{array}] + y [\begin{array}{l} 0 \\ 1 \end{array}] = x 𝒆_{1} + y 𝒆_{2}, 𝒆_{1} = [\begin{array}{l} 1 \\ 0 \end{array}], 𝒆_{2} = [\begin{array}{l} 0 \\ 1 \end{array}] .

octave]

x=2; y=4; e1=[1; 0]; e2=[0; 1]; r=x*e1+y*e2

r = 2 4

octave]

Continuing the procees to $ℛ_{m}$ gives

𝒙 = [\begin{array}{l} x_{1} \\ x_{2} \\ ⋮ \\ x_{m} \end{array}] = x_{1} 𝒆_{1} + x_{2} 𝒆_{2} + \dots + x_{m} 𝒆_{m}, 𝒆_{1} = [\begin{array}{l} 1 \\ 0 \\ ⋮ \\ 0 \\ 0 \end{array}], 𝒆_{2} = [\begin{array}{l} 0 \\ 1 \\ ⋮ \\ 0 \\ 0 \end{array}], \dots, 𝒆_{m} = [\begin{array}{l} 0 \\ 0 \\ ⋮ \\ 0 \\ 1 \end{array}] .

For arbitrary $m$ , the components are now $x_{1}, x_{2}, \dots, x_{m}$ rather than the alphabetically ordered letters common for $m = 2$ or $m = 3$ . It is then consistent with the adopted notation convention to use $𝒙 \in ℛ_{m}$ to denote the position vector whose components are $(x_{1}, \dots, x_{m})$ . The basic idea is the same as in the previous cases: to obtain a position vector scale direction $𝒆_{1}$ by $x_{1}$ , $𝒆_{2}$ by $x_{2}$ ,..., $𝒆_{m}$ by $x_{m}$ , and add the resulting vectors.

Juxtaposition of the vectors $𝒆_{1}, 𝒆_{2}, \dots, 𝒆_{m}$ leads to the formation of a matrix of special utility known as the identity matrix

𝑰 = [\begin{array}{llll} 𝒆_{1} & 𝒆_{2} & \dots & 𝒆_{m} \end{array}] .

The identity matrix is an example of a matrix in which the number of column vectors $n$ is equal to the number of components in each column vector $m = n$ . Such matrices with equal number of columns and rows are said to be square. Due to entrenched practice an exception to the notation convention is made and the identity matrix is denoted by $𝑰$ , but its columns are denoted the indexed bold-face of a different lower-case letter, $𝒆_{1}, \dots, 𝒆_{m}$ . If it becomes necessary to explicitly state the number of columns in $𝑰$ , the notation $𝑰_{m}$ is used to denote the identity matrix with $m$ columns, each with $m$ components.

4.Linear combinations

4.1.Linear combination as a matrix-vector product

The expression

𝒙 = x_{1} 𝒆_{1} + x_{2} 𝒆_{2} + \dots + x_{m} 𝒆_{m}

expresses the idea of scaling vectors within a set and subsequent addition to form a new vector

𝒙

. The matrix

𝑰 = [\begin{array}{llll} 𝒆_{1} & 𝒆_{2} & \dots & 𝒆_{m} \end{array}]

groups these vectors together in a single entity, and the scaling factors are the components of the vector

𝒙

. To bring all these concepts together it is natural to consider the notation

as a generalization of the scalar expression

x = 1 \cdot x

. It is clear what the operation

𝑰 𝒙

should signify: it should capture the vector scaling and subsequent vector addition

x_{1} 𝒆_{1} + x_{2} 𝒆_{2} + \dots + x_{m} 𝒆_{m}

. A specific meaning is now ascribed to

𝑰 𝒙

by identifying two definitions to one another.

Linear combination.

Definition. (Linear Combination) . The linear combination of vectors $𝒂_{1}, 𝒂_{2}, \dots, 𝒂_{n} \in V$ with scalars $x_{1}, x_{2}, \dots, x_{n} \in S$ in vector space $(V, S, +, \cdot)$ is the vector $𝒃 = x_{1} 𝒂_{1} + x_{2} 𝒂_{2} + \dots x_{n} 𝒂_{n} .$

Matrix-vector product.

Definition. (Matrix-Vector Product) . In the vector space $(V, S, +, \cdot)$ , the product of matrix $𝑨 = [\begin{array}{llll} 𝒂_{1} & 𝒂_{2} & \dots & 𝒂_{n} \end{array}]$ composed of columns $𝒂_{1}, 𝒂_{2}, \dots, 𝒂_{n} \in V$ with the vector $𝒙 \in S_{n}$ whose components are scalars $x_{1}, x_{2}, \dots, x_{n} \in S$ is the linear combination $𝒃 = x_{1} 𝒂_{1} + x_{2} 𝒂_{2} + \dots x_{n} 𝒂_{n} = 𝑨 𝒙 \in V .$

4.2.Linear algebra problem examples

Linear combinations in

E_{2}

and can be combined into a matrix

𝑨 = [\begin{array}{ll} 𝒕 & 𝒏 \end{array}]

. The value of the components

(x_{t}, x_{n})

are the scaling factors and can be combined into a vector

𝒙 = {[\begin{array}{ll} x_{t} & x_{n} \end{array}]}^{T}

. The same force must result irrespective of whether its components are given along the Cartesian axes or the inclined plane directions leading to the equality

Interpret equation (4) to state that the vector

𝒃

could be obtained either as a linear combination of

𝑰

𝒃 = 𝑰 𝒃

, or as a linear combination of the columns of

𝑨

𝒃 = 𝑨 𝒙

. Of course the simpler description seems to be

𝑰 𝒃

for which the components are already known. But this is only due to an arbitrary choice made by a human observer to define the force in terms of horizontal and vertical components. The problem itself suggests that the tangential and normal components are more relevant; for instance a friction force would be evaluated as a scaling of the normal force. The components in this more natural reference system are not known, but can be determined by solving the vector equality

𝑨 𝒙 = 𝒃

, known as a linear system of equations. Procedures to carry this out will be studied in more detail later, but Octave provides an instruction for this common problem, the backslash operator, as in x=A\b.

Linear combinations in

ℛ_{m}

and

𝒞^{0} [0, 2 π)

Define now another periodic function

b (t + 2 π) = b (t)

by repeating the values

b (t) = t (π - t) (2 π - t)

from the interval

[0, 2 π)

on all intervals

[2 p π, 2 (p + 1) π]

, for

p \in ℤ

. The function

b

is not given in terms of the “naturally” periodic functions

\sin (k t)

\cos (k t)

, but could it thus be expressed? This can be stated as seeking a linear combination

b (t) = \sum_{k = 1}^{\infty} x_{k} \sin (k t),

as studied in Fourier analysis. The coefficients

x_{k}

could be determined from an analytical formula involving calculus operations

x_{k} = \frac{1}{π} \int_{0}^{2 π} b (t) \sin (k t) d t,

but we'll seek an approximation using a linear combination of

n

terms

The idea is to sample the column vectors of

A (𝒕)

at the components of the vector

𝒕 = {[\begin{array}{llll} t_{1} & t_{2} & \dots & t_{m} \end{array}]}^{T} \in ℝ^{m}

t_{j} = (j - 1) h

j = 1, 2, \dots, m

h = π / m

. Let

𝒃 = b (𝒕)

, and

𝑨 = A (𝒕)

, denote the so-sampled

b, A

functions leading to the definition of a vector

𝒃 \in ℝ^{m}

and a matrix

𝑨 \in ℝ^{m \times n}

. There are

n

coefficients available to scale the column vectors of

𝑨

, and

𝒃

has

m

components. For

m > n

it is generally not possible to find

𝒙

such that

𝑨 𝒙

would exactly equal

𝒃

, but as seen later the condition to be as close as possible to

𝒃

leads to a well defined solution procedure. This is known as a least squares problem and is automatically applied in the Octave x=A\b instruction when the matrix A is not square. As seen in the following numerical experiment and Figure 3, the approximation is excellent even though the information conveyed by

m = 1000

samples of

b (t)

is now much more efficiently stored in the form chosen for the columns of

𝑨

and the

n = 11

scaling coefficients that are the components of

𝒙

5.Vectors and matrice in data science

The above examples highlight some essential aspects of linear algebra in the context of data science applications.

Data science seeks to extract regularity directly from available data, not necessarily invoking any additional hypotheses. The typical scenario is that immense amounts of data are available, with limited capability of human analysis. In this context it is apparent that the least squares problem is of greater interest than solving a linear system with a square matrix. It should also be clear that while computation by hand of small examples is useful to solidify theroretical concepts, it is essential to become proficient in the use of software that can deal with large data sets, such as Octave.

Vectors and Matrices

1.Quantities

1.1.Numbers

1.2.Quantities described by a single number

1.3.Quantities described by multiple numbers

2.Vectors

2.1.Vector spaces

2.2.Real vector space $ℛ_{m}$

Column vectors.

Row vectors.

Compatible vectors.

2.3.Working with vectors

Ranges.

Visualization.

3.Matrices

3.1.Forming matrices

3.2.Identity matrix

4.Linear combinations

4.1.Linear combination as a matrix-vector product

Linear combination.

Matrix-vector product.

4.2.Linear algebra problem examples

Linear combinations in $E_{2}$ .

Linear combinations in $ℛ_{m}$ and $𝒞^{0} [0, 2 π)$ .

5.Vectors and matrice in data science

Vectors and Matrices

1.Quantities

1.1.Numbers

1.2.Quantities described by a single number

1.3.Quantities described by multiple numbers

2.Vectors

2.1.Vector spaces

2.2.Real vector space ℛm

Column vectors.

Row vectors.

Compatible vectors.

2.3.Working with vectors

Ranges.

Visualization.

3.Matrices

3.1.Forming matrices

3.2.Identity matrix

4.Linear combinations

4.1.Linear combination as a matrix-vector product

Linear combination.

Matrix-vector product.

4.2.Linear algebra problem examples

Linear combinations in E2.

Linear combinations in ℛm and 𝒞0[0,2π).

5.Vectors and matrice in data science

2.2.Real vector space $ℛ_{m}$

Linear combinations in $E_{2}$ .

Linear combinations in $ℛ_{m}$ and $𝒞^{0} [0, 2 π)$ .