Stabilized Orthogonal Factorizations

1.Conditioning of linear algebra problems

Recall that the relative condition number of a mathematical problem f:XY characterizes the amplification by f of perturbations in its argument

κ=limε0sup||δx||ε(||f(x+δx)-f(x)||||f(x)||/||δx||||x||).

Linear combination. The basic operation of linear combination 𝑨𝒙, 𝑨m×n, seen as the problem n𝒇m has the condition number

κ=supδx(||𝑨δ𝒙||||𝑨𝒙||/||δ𝒙||||𝒙||)=supδx(||𝑨δ𝒙||||δ𝒙||)||𝒙||||𝑨𝒙||=||𝑨||||𝒙||||𝑨𝒙||.

The matrix norm property ||𝑨𝒚||||𝑨||||𝒚|| can be used to obtain

||𝒙||=||𝑰n𝒙||=||𝑨+𝑨𝒙||||𝑨+||||𝑨𝒙||||𝒙||||𝑨𝒙||||𝑨+||

leading to

κ||𝑨||||𝑨+||=κ(𝑨),

where κ(𝑨) is the condition number of the matrix 𝑨. If 𝑨 is of full rank with m>n, the 2-norm condition number is given by the ratio of largest to smallest singular values.

||𝑨||=σ1,||𝑨+||=1/σnκ(𝑨)=σ1/σn1.

By convention, if 𝑨 is singular, the condition number κ(𝑨)=.

Coordinate transformation. For 𝑨m×m of full rank, the coordinates of vector 𝒃m expressed in the 𝑰 basis can be transformed its coordinates 𝒙m in the 𝑨 basis by solving the linear system 𝑨𝒙=𝑰𝒃, with the solution 𝒙=𝑨-1𝒃 (so written formally, even though the inverse is almost never explicitly computed). This is simply another linear combination of the columns of 𝑨-1, hence the problem 𝒇:mCm, 𝒇(𝒃)=𝑨-1𝒃 again has a condition number bounded by the condition number of the matrix 𝑨.

κ||𝑨-1||||𝑨||=κ(𝑨)=κ(𝑨-1).

Operator perturbation. Instead of changing the input data as above, the linear mapping represented by the matrix 𝑨m×n might itself be perturbed. Two mathematical problems may now be formulated:

  1. For fixed 𝒃m,𝒇:m×nn,𝒇(𝑨,𝒃)=𝑨+𝒃=𝒙. Perturbation of the input 𝑨 induces perturbation of 𝒙 in order for 𝒃 to be kept fixed

    (𝑨+δ𝑨)(𝒙+δ𝒙)=𝒃.

    Using 𝑨𝒙=𝒃, and assuming that δ𝑨δ𝒙 is negligible gives

    𝑨δ𝒙+δ𝑨𝒙=𝟎δ𝒙=-𝑨+δ𝑨𝒙,

    hence the relative condition number is

    κ=||𝑨+δ𝑨𝒙||||𝒙||||𝑨||||δ𝑨||||𝑨+||||δ𝑨𝒙||||𝒙||||𝑨||||δ𝑨||||𝑨+||||δ𝑨||||𝒙||||𝒙||||𝑨||||δ𝑨||=κ(𝑨).

For all above linear algebra problems the condition number is bounded by the associated matrix condition number. Unitary matrices 𝑸m×m have κ(𝑸)=1, and define an orthonormal basis for m. A rank-deficient matrix 𝒁m×m has κ(𝒁)=, and corresponds to a linearly dependent vector set {𝒛1,,𝒛m}. The behavior of many numerical approximation procedures based upon linear combinations is determined by condition number of the basis set.

Monomial basis with uniform sampling. Sampling the monomial basis on interval [a,b] at ti=ih+a,i=0,m, h=(b-a)/(m-1) leads to the Vandermonde matrix

𝑽=[ 𝟏 𝒕 𝒕m ],

an extremely ill-conditioned matrix (Fig. ). This can readily be surmised from the example a=0, b=1, in which case for large m the last columns of 𝑽 become ever more colinear to the same 𝒆m vector. Series expansions based on the monomials such as the Taylor series

f(t)=f(0)+f'(0)t++f(n)(0)n!tn+

are highly sensitive to pertubations, small changes in f(t) lead to large changes in the coordinates {f(0),f'(0),}.

function Vandermonde(a,b,m)
  t=LinRange(a,b,m); v=ones(m,1); V=copy(v)
  for j=2:m
    v = v .* t; V=[V v]
  end
  return V
end;

Monomial basis with Chebyshev sampling. Changing the sampling so that points are clustered towards the interval endpoints reduces the condition number at fixed number of sampling points m, but the same limiting behavior for large m is obtained.

function VandermondeC(m)
  δ=π/(2*m); ϴ=LinRange(δ,π-δ,m)
  t=cos.(ϴ)
  v=ones(m,1); V=copy(v)
  for j=2:m
    v = v .* t; V=[V v]
  end
  return V
end;

Triangular basis with uniform sampling. LU-factorization of the monomial basis leads to a different family of polynomials, known as a triangular basis

{1,t-x1,(t-x1)(t-x2),,(t-x1)(t-xm-1)},

where {x1,,xm} are known as the nodes of the system. These can be chosen to uniformly sample an interval. As to be expected, applying a sequence of non-unitary linear transformations onto an ill-conditioned basis yields even worse conditioning.

function Triangular(a,b,m)
  x=LinRange(a,b,m); T=ones(m,1); Tj=copy(T); t=copy(x)
  for j=2:m
    Tj = Tj .* (t .- x[j-1]); T=[T Tj]
  end
  return T
end;

Triangular basis with Chebyshev sampling. Adopting Chebyshev sampling ameliorates the conditioning, but only marginally.

Figure 1. Monomial basis with: (o) uniform sampling, (x) Chebyshev sampling. Triangular basis with: (+) uniform sampling, (*) Chebyshev sampling.

2.Orthogonal factorization through Householder reflectors

The Gram-Schmidt procedure constructs an orthogonal factorization by linear combinations of the column vectors of 𝑨m×n, mn, rank(𝑨)=n

𝑨𝑹1𝑹2𝑹n=𝑸𝑨=𝑸𝑹,𝑹=𝑹n-1𝑹1-1.

In exact arithmetic C(𝑸)=C(𝑨) by construction, and κ(𝑸)=1, but the sequence of multiplications with 𝑹1,,𝑹n might amplify perturbations in 𝑨 (due for example to floating point representation errors or inherent uncertainty in knowledge of 𝑨). The problem 𝒇:m×nCm×n×n×n, 𝑨𝒇𝑸,𝑹 has condition number

κ=||δ𝑸||||𝑸||||𝑨||||δ𝑨||+||δ𝑹||||𝑹||||𝑨||||δ𝑨||,

and numerical experimentation (Fig. 2) readily exhibits large condition numbers.

An alternative approach is to obtain an orthogonal factorization through unitary transformations

𝑸n𝑸1𝑨=𝑹𝑨=𝑸𝑹,𝑸=𝑸1𝑸n.

Unitary transformations do not modify vector 2-norms or relative orientations

||𝑸𝒙||2=𝒙𝑸𝑸𝒙=||𝒙||2,(𝑸𝒚)(𝑸𝒙)=𝒚𝒙,

and are hence said to be isometric. In Euclidean space reflections and rotations are isometric.

Figure 2. QR-conditioning: (o) modified Gram-Schmidt, (x) Householder.

Construction of an isometric reflection transformation suitable for a QR factorization is represented in Fig. 3. Let vector 𝒙m+1-k represent the portion of the kth column from the diagonal downwards in stage k of reduction of 𝑨m×n to upper triangular form

𝑸k-1𝑸1𝑨=[ 𝑹 𝑪 𝟎 𝑩 ],𝑩=[ 𝒙 𝒃2 𝒃n-k ].

The next stage of in reduction to upper triangular form is the isometric transformation of 𝒙 into ±||𝒙||𝒆1, with 𝒆1m+1-k the unit vector along the first direction. With 𝒗=±||𝒙𝒆1||-𝒙, 𝒒=𝒗/||𝒗||, the projection of 𝒙 onto the span of 𝒗, C(𝒗) is

𝒚=𝑷𝒗𝒙=𝒒𝒒𝒙,

and its complementary projector onto N(𝒗) is

𝒛=𝑷𝒗=(𝑰-𝒒𝒒)𝒙.

The reflector transforming 𝒙 into ±||𝒙||𝒆1 is obtained by doubling the above displacements, and is known as a Householder reflector

𝑯=𝑰-2𝒒𝒒.

Of the two possibilities ±||𝒙||𝒆1, the choice

𝒗=-sign(x1)||𝒙||𝒆1-𝒙,

avoids loss of floating accuracy 𝒙||𝒙||𝒆1. For 𝒙m+1-k, sign(x1)=exp(arg(x1)).

Figure 3. Geometry of Householder reflector

The resulting Householder QR-factorization is given

Input: 𝑨m×n

𝑸=𝟎m,n

for k=1:n

𝒙=𝑨[k:m,k]

𝒗=sign(x1)||𝒙||+𝒙

𝒒=𝒗/||𝒗||; 𝑸[k:m,k]=𝒒

for j=k:n

𝑨[k:m,j]=𝑨[k:m,j]-2𝒒(𝒒𝑨[k:m,j])

function HouseholderQR(A)
  m,n=size(A)
  Q=zeros(m,n); R=copy(A)
  for k=1:n
    x=R[k:m,k]
    e1=zeros(size(x)); e1[1]=1
    v=sign(x[1])*norm(x)*e1+x
    q=v/norm(v); Q[k:m,k]=q
    for j=k:n
      aj=R[k:m,j]; c=2*q'*aj
      R[k:m,j]=aj.-c*q
    end
  end
  return Q,R
end;

Note that the above implementation does not return the 𝑸 matrix, but rather the 𝑸1,,𝑸n reflectors from which 𝑸 can be reconstructed if needed. Usually though, the 𝑸 matrix itself is not required, but rather the product 𝑸𝒖 which can readily be evaluated as 𝑸n𝑸1𝒖. The Householder reflector algorithm is typically the default procedure in QR-factorizations implemented in software systems, and as seen in (Fig. 2), leads to much better conditioning.

3.Orthogonal factorization through Given rotators

An alternative approach to orthogonal factorization utilizes isometric rotation transformations of the form

𝑹(i,k,θ)=𝑰+(cosθ-1)(𝒆i𝒆i+𝒆k𝒆k)-sinθ(𝒆i𝒆k-𝒆k𝒆i),

with the rotation angle θ chosen to nullify the subdiagonal element (i,k), i>k

(𝑹(i,k,θ)𝑨)ik=akksinθ+aikcosθ=0θik=arctan(-aikakk).

Composition of repeated rotations 𝑸ik=𝑹(i,k,θik) can be organized to lead to an upper triangular matrix

𝑸mn𝑸32𝑸m1𝑸31𝑸21𝑨=𝑹.

Whereas Householder reflectors work on entire vectors, Givens rotators nullify individual subdiagonal elements. For full matrices Householder reflectors typically require fewer floating point operations, but the special structure of a sparse matrix is better exploited by use of Givens rotators.

Input: 𝑨m×n

𝑸=𝟎m,n

for k=1:n

for i=k+1:m

θ=arctan(-aik/akk)

c=cos(θ); s=sin(θ)

for j=k:n

u=akj; v=aij

akj=cu-sv

aij=su+cv

function GivensQR(A)
  m,n=size(A)
  Q=zeros(m,n); R=copy(A)
  for k=1:n
    for i=k+1:m
      θ = atan(-R[i,k],R[k,k]); Q[i,k]=
      c = cos(θ); s = sin(θ)
      for j=k:n
        u = R[k,j]; v = R[i,j]
        R[k,j]=c*u-s*v
        R[i,j]=s*u+c*v
      end
    end
  end
  return Q,R
end;

As in the Householder implementation the above implementation returns data to reconstruct 𝑸 if needed.