Vectors and Matrices

Synopsis. Data science arises from the need to organize massive amounts of data into meaningful insights into some natural or social process. There are many ways to do so such as lists, trees, clusters. Linear algebra concentrates on grouping quantifiable data into sets of fixed length, known as vectors. Multiple vectors are subsequently grouped as matrices, and a formalism is established to work with vectors and matrices.

1.Numbers

1.1.Number sets

Several types of numbers have been introduced in mathematics to express different types of quantities, and the following will be used throughout this text:

The set of natural numbers, ={0,1,2,3,}, infinite and countable, +={1,2,3,};

The set of integers, ={0,±1,±2,±3,}, infinite and countable;

The set of rational numbers ={p/q,p,q+}, infinite and countable;

The set of real numbers, infinite, not countable, can be ordered;

The set of complex numbers, ={x+iy,x,y}, infinite, not countable, cannot be ordered.

These sets of numbers form a hierarchy, with . The size of a set of numbers is an important aspect of its utility in describing natural phenomena. The set S={Mary,Jane,Tom} has three elements, and its size is defined by the cardinal number, |S|=3. The sets ,,,, have an infinite number of elements, but the relation

z={ -n/2 forneven (n+1)/2 fornodd .

defines a one-to-one correspondence between n and z, so these sets are of the same size denoted by the transfinite number 0 (aleph-zero). The rationals can also be placed into a one-to-one correspondence with , hence

||=||=||=0.

In contrast there is no one-to-one mapping of the reals to the naturals, and the cardinality of the reals is ||=𝔠 (Fraktur-script c). Intuitively, there are exponentially more reals than naturals, formally stated in set theory as 𝔠=20.

1.2.Quantities described by a single number

The above numbers and their computer approximations are sufficient to describe many quantities encountered in applications. Typical examples include:

In most disciplines, there is a particular interest in comparison of two quantities, and to facilitate such comparison a common reference is used known as a standard unit. For measurement of a length L, the meter =1m is a standard unit, as in the statement L=10m, that states that L is obtained by taking the standard unit ten times, L=10. The rules for carrying out such comparisons are part of the definition of real and rational numbers.

1.3.Quantities described by multiple numbers

Other quantities require more than a single number. The distribution of population in the year 2000 among the alphabetically-ordered South American countries (Argentina, Bolivia,..,Venezuela) requires 12 numbers. These are placed together in a list known in mathematics as a tuple, in this case a 12-tuple P=(p1,p2,,p12), with p1 the population of Argentina, p2 that of Bolivia, and so on. An analogous 12-tuple can be formed from the South American populations in the year 2020, say Q=(q1,q2,,q12). Note that it is difficult to ascribe meaning to apparently plausible expressions such as P+Q since, for instance, some people in the 2000 population are also in the 2020 population, and would be counted twice.

2.Vectors

2.1.Vector spaces

In contrast to the population 12-tuple example above, combining multiple numbers is well defined in operations such as specifying a position within a three-dimensional Cartesian grid, or determining the resultant of two forces in space. Both of these lead to the consideration of 3-tuples or triples such as the force (f1,f2,f3). When combined with another force (g1,g2,g3) the resultant is (f1+g1,f2+g2,f3+g3). If the force (f1,f2,f3) is amplified by the scalar α and the force (g1,g2,g3) is similarly scaled by β, the resultant becomes

α(f1,f2,f3)+β(g1,g2,g3)=(αf1,αf2,αf3)+(βg1,βg2,βg3)=(αf1+βg1,αf2+βg2,αf3+βg3).

It is useful to distinguish tuples for which scaling and addition is well defined from simple lists of numbers. In fact, since the essential difference is the behavior with respect to scaling and addition, the focus should be on these operations rather than the elements of the tuple.

The above observations underlie the definition of a vector space 𝒱 by a set V whose elements satisfy scaling and addition properties, denoted all together by the 4-tuple 𝒱=(V,S,+,). The first element of the 4-tuple is a set whose elements are called vectors. The second element is a set of scalars, and the third is the vector addition operation. The last is the scaling operation, seen as multiplication of a vector by a scalar. Vector addition and scaling operations must satisfy rules suggested by positions or forces in three-dimensional space, which are listed in Table 1. In particular, a vector space requires definition of two distinguished elements: the zero vector 𝟎V, and the identity scalar element 1S.

Addition rules for 𝒂,𝒃,𝒄V
𝒂+𝒃V Closure
𝒂+(𝒃+𝒄)=(𝒂+𝒃)+𝒄 Associativity
𝒂+𝒃=𝒃+𝒂 Commutativity
𝟎+𝒂=𝒂 Zero vector
𝒂+(-𝒂)=𝟎 Additive inverse
Scaling rules for 𝒂,𝒃V, x,yS
x𝒂V Closure
x(𝒂+𝒃)=x𝒂+x𝒃 Distributivity
(x+y)𝒂=x𝒂+y𝒂 Distributivity
x(y𝒂)=(xy)𝒂 Composition
1𝒂=𝒂 Scalar identity

Table 1. Vector space 𝒱=(V,S,+,) properties.

The definition of a vector space reflects everyday experience with vectors in Euclidean geometry, and it is common to refer to such vectors by descriptions in a Cartesian coordinate system. For example, a position vector 𝒓 within the plane can be referred through the pair of coordinates (x,y). This intuitive understanding can be made precise through the definition of a vector space 2=(2,,+,), called the real 2-space. Vectors within 2 are elements of 2=×={(x,y)|.x,y}, meaning that a vector is specified through two real numbers, 𝒓(x,y). Addition of two vectors, 𝒒(s,t), 𝒓(x,y) is defined by addition of coordinates 𝒒+𝒓=(s+x,t+v). Scaling 𝒓(x,y) by scalar a is defined by a𝒓(ax,ay). Similarly, consideration of position vectors in three-dimensional space leads to the definition of the 3=(3,,+,), or more generally a real m-space m=(m,,+,), m, m>0.

2.2.Real vector space m

Column vectors.
Since the real spaces m=(m,,+,) play such an important role in themselves and as a guide to other vector spaces, familiarity with vector operations in m is necessary to fully appreciate the utility of linear algebra to a wide range of applications. Following the usage in geometry and physics, the m real numbers that specify a vector 𝒖m are called the components of 𝒖. The one-to-one correspondence between a vector and its components 𝒖(u1,,um), is by convention taken to define an equality relationship,

𝒖=[ u1 um ], (1)

with the components arranged vertically and enclosed in square brackets. Given two vectors 𝒖,𝒗m, and a scalar a, vector addition and scaling are defined in m by real number addition and multiplication of components

𝒖+𝒗=[ u1 um ]+[ v1 vm ]=[ u1+v1 um+vm ],a𝒖=a[ u1 um ]=[ au1 aum ]. (2)

The vector space m is defined using the real numbers as the set of scalars, and constructing vectors by grouping together m scalars, but this approach can be extended to any set of scalars S, leading to the definition of the vector spaces 𝒮n=(Sn,S,+,). These will often be referred to as n-vector space of scalars, signifying that the set of vectors is V=Sn. Note that a vector with a single component, 𝒗1, is identical to a scalar

𝒗=[v1]v1,

and the real line =(,,+,) is a simple example of a vector space.

To aid in visual recognition of vectors, the following notation conventions are introduced:

In Julia, successive components placed vertically are separated by a semicolon.

[1; 2; -1; 2]

[ 1 2 -1 2 ] (3)

The equal sign in mathematics signifies a particular equivalence relationship. In computer systems such as Julia the equal sign has the different meaning of assignment, that is defining the label on the left side of the equal sign to be the expression on the right side. Subsequent invocation of the label returns the assigned object. Components of a vector are obtained by enclosing the index in parantheses.

u=[1; 2; -1; 2]; u

[ 1 2 -1 2 ] (4)

Row vectors.
Instead of the vertical placement or components into one column, the components of could have been placed horizontally in one row [ u1 um ], that contains the same data, differently organized. By convention vertical placement of vector components is the preferred organization, and 𝒖 shall denote a column vector henceforth. A transpose operation denoted by a T superscript is introduced to relate the two representations

𝒖T=[ u1 um ],

and 𝒖T is the notation used to denote a row vector. In Julia, horizontal placement of successive components in a row is denoted by a space.

uT=transpose(u)

[ 1 2 -1 2 ] (5)

Compatible vectors.
Addition of real vectors 𝒖,𝒗m defines another vector 𝒘=𝒖+𝒗m. The components of 𝒘 are the sums of the corresponding components of 𝒖 and 𝒗, wi=ui+vi, for i=1,2,,m. Addition of vectors with different number of components is not defined, and attempting to add such vectors produces an error. Such vectors with different number of components are called incompatible, while vectors with the same number of components are said to be compatible. Scaling of 𝒖 by a defines a vector 𝒛=a𝒖, whose components are zi=aui, for i=1,2,,m. Vector addition and scaling in Julia are defined using the + and operators.

uT=[1 0 1 2]; vT=[2 1 3 -1]; wT=uT+vT

[ 3 1 4 1 ] (6)

rT=[1 2]; uT+rT

DimensionMismatch

2.3.Working with vectors

Ranges.
The vectors used in applications usually have a large number of components, m1, and it is important to become proficient in their manipulation. Previous examples defined vectors by explicit listing of their m components. This is impractical for large m, and support is provided for automated generation for often-encountered situations. First, observe that Table 1 mentions one distinguished vector, the zero element that is a member of any vector space 𝟎V. The zero vector of a real vector space m is a column vector with m components, all of which are zero, and a mathematical convention for specifying this vector is 𝟎T=[ 0 0 0 ]m. This notation specifies that transpose of the zero vector is the row vector with m zero components, also written through explicit indexing of each component as 𝟎i=0, for i=1,,m. Keep in mind that the zero vector 𝟎 and the zero scalar 0 are different mathematical objects. The ellipsis symbol in the mathematical notation is transcribed in Julia by the notion of a range, with 1:m denoting all the integers starting from 1 to m, organized as a row vector. The notation is extended to allow for strides different from one, and the mathematical ellipsis i=m,m-1,,1 is denoted as m:-1:1. In general r:s:t denotes the set of numbers {r,r+s,,r+ns} with r+nst, and r,s,t real numbers and n a natural number, r,s,t, n. If there is no natural number n such that r+nst, an empty vector with no components is returned.

m=4; 1:m

[ 1 2 3 4 ] (7)

m:-1:2

[ 4 3 2 ] (8)

r=0; s=0.2; t=1; (r:s:t)'

[ 0.0 0.2 0.4 0.6 0.8 1.0 ] (9)

r=0; s=0.3; t=1; (r:s:t)'

[ 0.0 0.3 0.6 0.9 ] (10)

r=0; s= -0.2; t=1; (r:s:t)'

[ ] (11)

Visualization.
A component-by-component display of a vector becomes increasingly unwieldy as the number of components m becomes large. For example, the numbers below seem an inefficient way to describe the sine function.

t=LinRange(0,1.5,10); y=sin.(t)

[ 0.0 0.16589613269341502 0.3271946967961522 0.479425538604203 0.618369803069737 0.7401768531960371 0.8414709848078965 0.9194449792537551 0.9719379013633127 0.9974949866040544 ] (12)

Indeed, such a piece-by-piece approach is not the way humans organize large amounts of information, preferring to conceptualize the data as some other entity: an image, a sound excerpt, a smell, a taste, a touch, a sense of balance, or relative position. All seven of these human senses will be shown to allow representation by linear algebra concepts, including representation by vectors.

clf(); t=LinRange(0,2*π,180); y=sin.(t);
plot(t,y); grid("on"); xlabel("t"); ylabel("y");
cd(homedir()*"/courses/MATH347DS/images");
savefig("L01Fig01.eps");

Figure 1. Sample plot

3.Matrices

3.1.Forming matrices

The real numbers themselves form the vector space 1=(,,+,), as does any field of scalars, 𝒮1=(S,S,+,). Juxtaposition of m real numbers has been seen to define the new vector space m. This process of juxtaposition can be continued to form additional mathematical objects. A matrix is defined as a juxtaposition of compatible vectors. As an example, consider n vectors 𝒂1,𝒂2,,𝒂nV within some vector space 𝒱=(V,S,+,). Form a matrix by placing the vectors into a row,

𝑨=[ 𝒂1 𝒂2 𝒂n ]. (13)

To aid in visual recognition of a matrix, upper-case bold Latin letters will be used to denote matrices. The columns of a matrix will be denoted by the corresponding lower-case bold letter with a subscripted index as in equation (13). Note that the number of columns in a matrix can be different from the number of components in each column, as would be the case for matrix 𝑨 from equation (13) when choosing vectors from, say, the real space m, 𝒂1,𝒂2,,𝒂nm.

A vector is identical to a matrix with a single column vector

𝑨=[𝒂1]𝒂1,

and a scalar is identical to a mtrix with a single column vector that has just one component

𝑨=[a11]a11.

3.2.Addition, scaling, transposition of matrices

Since matrices are simply juxtaposition of column vectors, the addition and scaling rules for vectors are carried over for matrices 𝑨,𝑩m×n

𝑨+𝑩=[ 𝒂1 𝒂2 𝒂n ]+[ 𝒃1 𝒃2 𝒃n ], (14)
α𝑨=α[ 𝒂1 𝒂2 𝒂n ]=[ α𝒂1 α𝒂2 α𝒂n ]. (15)

Similarly, the transpose of a matrix turns column vectors into row vectors

𝑨=[ 𝒂1 𝒂2 𝒂n ]m×n,𝑨T=[ 𝒂1T 𝒂2T 𝒂nT ]n×m. (16)

In terms of components

𝑨=[ a1,1 a1,2 a1,n a2,1 a2,2 a2,n am,1 am,2 am,n ]m×n,𝑨T=[ a1,1 a2,1 am,1 a1,2 a2,2 am,2 a1,n a2,n am,n ]n×m.

It is readily verified that 𝑨+𝑩=𝑩+𝑨, (𝑨+𝑩)+𝑪=𝑨+(𝑩+𝑪), (𝑨+𝑩)T=𝑨T+𝑩T.

3.3.Identity matrix

Consider first 1, the vector space of real numbers. A position vector 𝒓1 on the real axis is specified by a single scalar component, 𝒓=[x], x. Read this to mean that the position 𝒓 is obtained by traveling x units from the origin at position vector 𝟎=[0]. Look closely at what is meant by “unit” in this context. Since x is a scalar, the mathematical expression 𝒓=𝟎+x has no meaning, as addition of a vector to a scalar has not been defined. Recall that scalars were introduced to capture the concept of scaling of a vector, so in the context of vector spaces they always appear as multiplying some vector. The correct mathematical description is 𝒓=𝟎+x𝒆, where 𝒆 is the unit vector 𝒆=[1]. Taking the components leads to r1=01+xe1, where r1,01,e1 are the first (and in this case only) components of the 𝒓,𝟎,𝒆 vectors. Since r1=x, 01=0, e1=1, one obtains the identity x=0+x1.

Now consider 2, the vector space of positions in the plane. Repeating the above train of thought leads to the identification of two direction vectors 𝒆1 and 𝒆2

𝒓=[ x y ]=x[ 1 0 ]+y[ 0 1 ]=x𝒆1+y𝒆2,𝒆1=[ 1 0 ],𝒆2=[ 0 1 ].
x=2; y=4; e1=[1; 0]; e2=[0; 1]; r=x*e1+y*e2

[ 2 4 ] (17)

Continuing the procees to m gives

𝒙=[ x1 x2 xm ]=x1𝒆1+x2𝒆2++xm𝒆m,𝒆1=[ 1 0 0 0 ],𝒆2=[ 0 1 0 0 ],,𝒆m=[ 0 0 0 1 ].

For arbitrary m, the components are now x1,x2,,xm rather than the alphabetically ordered letters common for m=2 or m=3. It is then consistent with the adopted notation convention to use 𝒙m to denote the position vector whose components are (x1,,xm). The basic idea is the same as in the previous cases: to obtain a position vector scale direction 𝒆1 by x1, 𝒆2 by x2,..., 𝒆m by xm, and add the resulting vectors.

Juxtaposition of the vectors 𝒆1,𝒆2,,𝒆m leads to the formation of a matrix of special utility known as the identity matrix

𝑰=[ 𝒆1 𝒆2 𝒆m ].

The identity matrix is an example of a matrix in which the number of column vectors n is equal to the number of components in each column vector m=n. Such matrices with equal number of columns and rows are said to be square. Due to entrenched practice an exception to the notation convention is made and the identity matrix is denoted by 𝑰, but its columns are denoted the indexed bold-face of a different lower-case letter, 𝒆1,,𝒆m. If it becomes necessary to explicitly state the number of columns in 𝑰, the notation 𝑰m is used to denote the identity matrix with m columns, each with m components.

4.Linear combinations

4.1.Linear combination as a matrix-vector product

The expression 𝒙=x1𝒆1+x2𝒆2++xm𝒆m expresses the idea of scaling vectors within a set and subsequent addition to form a new vector 𝒙. The matrix 𝑰=[ 𝒆1 𝒆2 𝒆m ] groups these vectors together in a single entity, and the scaling factors are the components of the vector 𝒙. To bring all these concepts together it is natural to consider the notation

𝒙=𝑰𝒙,

as a generalization of the scalar expression x=1x. It is clear what the operation 𝑰𝒙 should signify: it should capture the vector scaling and subsequent vector addition x1𝒆1+x2𝒆2++xm𝒆m. A specific meaning is now ascribed to 𝑰𝒙 by identifying two definitions to one another.

Linear combination.
Repeateadly stating “vector scaling and subsequent vector addition” is unwieldy, so a special term is introduced for some given set of vectors {𝒂1,,𝒂n}.

Definition. (Linear Combination) . The linear combination of vectors 𝒂1,𝒂2,,𝒂nV with scalars x1,x2,,xnS in vector space (V,S,+,) is the vector 𝒃=x1𝒂1+x2𝒂2+xn𝒂n.

Matrix-vector product.
Similar to the grouping of unit vectors 𝒆1,,𝒆m into the identity matrix 𝑰, a more concise way of referring to arbitrary vectors 𝒂1,,𝒂n from the same vector space is the matrix 𝑨=[ 𝒂1 𝒂2 𝒂n ]. Combining these observations leads to the definition of a matrix-vector product.

Definition. (Matrix-Vector Product) . In the vector space (V,S,+,), the product of matrix 𝑨=[ 𝒂1 𝒂2 𝒂n ] composed of columns 𝒂1,𝒂2,,𝒂nV with the vector 𝒙Sn whose components are scalars x1,x2,,xnS is the linear combination 𝒃=x1𝒂1+x2𝒂2+xn𝒂n=𝑨𝒙V.

4.2.Linear algebra problem examples

Linear combinations in E2.
Consider a simple example that leads to a common linear algebra problem: decomposition of forces in the plane along two directions. Suppose a force is given in terms of components along the Cartesian x,y-axes, 𝒃=bx𝒆x+by𝒆y, as expressed by the matrix-vector multiplication 𝒃=𝑰𝒃. Note that the same force could be obtained by linear combination of other vectors, for instance the normal and tangential components of the force applied on an inclined plane with angle θ, 𝒃=xt𝒆t+xn𝒆n, as in Figure 2. This defines an alternate reference system for the problem. The unit vectors along these directions are

𝒕=[ cosθ sinθ ],𝒏=[ -sinθ cosθ ],

and can be combined into a matrix 𝑨=[ 𝒕 𝒏 ]. The value of the components (xt,xn) are the scaling factors and can be combined into a vector 𝒙=[ xt xn ]T. The same force must result irrespective of whether its components are given along the Cartesian axes or the inclined plane directions leading to the equality

𝑰𝒃=𝒃=𝑨𝒙. (18)

Interpret equation (18) to state that the vector 𝒃 could be obtained either as a linear combination of 𝑰, 𝒃=𝑰𝒃, or as a linear combination of the columns of 𝑨, 𝒃=𝑨𝒙. Of course the simpler description seems to be 𝑰𝒃 for which the components are already known. But this is only due to an arbitrary choice made by a human observer to define the force in terms of horizontal and vertical components. The problem itself suggests that the tangential and normal components are more relevant; for instance a friction force would be evaluated as a scaling of the normal force. The components in this more natural reference system are not known, but can be determined by solving the vector equality 𝑨𝒙=𝒃, known as a linear system of equations. Procedures to carry this out will be studied in more detail later, but Julia provides an instruction for this common problem, the backslash operator, as in x=A\b.

ex=[1; 0]; ey=[0; 1]; b=[0.2; 0.4]; I=[ex ey]; I*b

[ 0.2 0.4 ] (19)

θ=π/6; c=cos(θ); s=sin(θ); tvec=[c; s]; nvec=[-s; c];
A=[tvec nvec]

[ 0.8660254037844387 -0.49999999999999994 0.49999999999999994 0.8660254037844387 ] (20)

x=A\b

[ 0.37320508075688774 0.2464101615137755 ] (21)

x[1]*tvec+x[2]*nvec

[ 0.2 0.4 ] (22)

Figure 2. Alternative decompositions of force on inclined plane.

Note from the above calculations how the same vector is obtained by two different linear combinations

𝒃=[ 0.2 0.4 ]=0.2[ 1 0 ]+0.4[ 0 1 ]=0.373[ 0.866 0.500 ]+0.246[ -0.500 0.866 ].

The general problem of determining what description is more insightful is a key question arising in linear algebra applications.

4.3.Multiple linear combinations: matrix products

The expression 𝒃=𝑨𝒙 with 𝑨m×n gives 𝒃 as a linear combination of the n columns of 𝑨. It is often the case that multiple linear combinations are sought, say p linear combinations 𝒃1,,𝒃p of the vectors 𝒂1,𝒂2,,𝒂n with scaling coefficients given by the vectors 𝒙1,,𝒙p . Instead of explicitly writing out each linear combination

𝒃1=𝑨𝒙1,,𝒃p=𝑨𝒙p,

it is convenient to group all the scaling coefficient vectors into a matrix 𝑿=[ 𝒙1 𝒙p ]n×p, and all the resulting linear combinations into another matrix 𝑩=[ 𝒃1 𝒃p ]m×p, and state that

𝑩=[ 𝒃1 𝒃p ]=[ 𝑨𝒙1 𝑨𝒙p ]=𝑨𝑿,

thus defining matrix-matrix multiplication as the juxtaposition of several matrix-vector multiplications. Note that 𝑨m×n must have as many columns as 𝑿n×p has rows, and the resulting matrix 𝑩m×p has as many rows as 𝑨 and as many columns as 𝑿.

The above definition of the matrix products can also be described in terms of components as

bi,j=ai,1x1,j+ai,2x2,j++ai,nxn,j=k=1nai,kxk,j,i=1,2,,m;j=1,2,,p,

often memorized as the “rows over columns rule”

𝑩=𝑨𝑿=[ a1,1 a1,2 a1,n a2,1 a2,2 a2,n am,1 am,2 am,n ][ x1,1 x1,2 x1,p x2,1 x2,2 x2,p xn,1 xn,2 xn,p ],b2,1=a2,1x1,1+a2,2x2,1++a2,nxn,1.

5.Vectors and matrices in data science

The above examples highlight some essential aspects of linear algebra in the context of data science applications.

Data science seeks to extract regularity directly from available data, not necessarily invoking any additional hypotheses. The typical scenario is that immense amounts of data are available, with limited capability of human analysis. In this context it is apparent that the least squares problem is of greater interest than solving a linear system with a square matrix. It should also be clear that while computation by hand of small examples is useful to solidify theroretical concepts, it is essential to become proficient in the use of software that can deal with large data sets, such as Julia.