Square root of a matrix (Cleve Moler)

Index Home About Blog
Newsgroups: sci.math.num-analysis,comp.soft-sys.matlab
Subject: Re: sqrtm() of a matrix.
From: moler@mathworks.com (Cleve Moler)
Date: 14 Mar 1997 12:20:28 -0500

Abed M. Hammoud, D.sc <abed@oregano.wustl.edu> started this discussion with:

> I am trying to decide on the best algorithm to use to calculate the square
> root of a matrix. My first shot at the problem I tried the following:
>
> for a positive semidefinte matrix (A):
>
> 1- [E D] = eig(A); sqrtm(A) = E * sqrt(D) * E' where D is a diagonal matrix.
> sqrt(D) is formed by taking the square root of the diagonal entries in D.
>
> Another equivilant way is to use:
> 2- [U S V] = svd(A); sqrtm(A) = U * sqrt(S) * V'
>
> 3- Then I looked at the book by Golub and Van Loan, they suggest under the
> matrix square root the following method:
> Use cholesky decomposition to write A as A = G G'
> then use SVD on G, i.e, [u s v] = svd(G), then the square root of
> A is equal to sqrtm(A) = u * s * u'
>
> So my question is which is the most accurate and best way of doing
> the sqrtm operation. ...

I responded with:

> If your matrix is symmetric and positive definite, then your method 1,
> based on eigenvalues and eigenvectors, is the best choice.  Your method 2
> is the same as method 1 for symmetric, positive definite A because V = U
> in this case.  But use of "svd", instead of "eig", does not take advantage
> of symmetry and is more work.  I've never seen your method 3, can't
> find it in Golub and Van Loan, and don't believe it will work.
> ...

Several other people joined the discussion, including my long time
friend and colleague, G. W. (Pete) Stewart <stewart@cs.umd.edu>, who wrote:

> For positive semidefinite matrices, all three methods work in
> principal, the first being the most efficient (howevr, you have
> to set to zero any negative eigenvalues that rounding error might
> produce).
> It is worth noting  many applications can be recast to use
> a Cholesky decomposition instead of the square root.
>
> Cleve was refering to indefinite and nonsymmetric matrices.
> Even matlab (at home, at the office, and even on the beach) can
> have problems.  But I did find something funny.  When I run
> the script
>
> small = 1.e-5;
> for n=2:5
>    a = small*diag(randn(n,1)) + diag(ones(n-1,1),1);
>    b = sqrtm(a);
>    c = a^.5;
>    disp([norm(b*b-a), norm(c*c-a)])
> end
>
> I get the following output (error messages deleted)
>
>    2.2204e-16   5.5511e-17
>    1.5172e-02   2.2555e-10
>    1.4740e+04   4.3826e-06
>    1.2471e+06   8.3380e+01
>
> The exponential function, which I believe is based on the
> eigendecompostion, gives the accuracy one deserves.  But sqrtm blows
> it.
>
> Pete Stewart

Here are some additional thoughts on the overall problem, and then
a response to Pete's "funny" results.

First of all, what is a matrix square root?  If A is any square matrix,
then MATLAB's notion of a square root of A is any matrix S which
satisfies

   S*S = S^2 = A

There are no transposes involved in this definition.  But, as has been
pointed out, there are other matrices which act like a square root.
Any matrix X which satisfies X'*X = A, or X*X' = A might also be thought
of as a square root.  If such an X exists, and A is real, then A must be
symmetric and, in fact, positive definite.  Such matrices have a Cholesky
decomposition.

The classic Cholesky decomposition is lower triangular (see Householder,
The Theory of Matrices in Numerical Analysis)

   L*L' = A

But, it was Pete's idea with LINPACK to use an upper triangular decomposition
so that the Cholesky algorithm would be column oriented.

   R'*R = A

MATLAB inherited this convention; our Cholesky decomposition is upper
triangular.

So, we have at least three different candidates for square root -- S, L and R.

If A is diagonal, with positive diagonal entries, we can take the positive
scalar square roots of the diagonal entries.  The resulting diagonal
matrix could be called S, L or R -- all three are the same in this case.
But we could also put minus signs in front of any of the diagonal entries
and obtain other matrices which are square roots of A.

If A is symmetric and positive definite, but not diagonal, then S, L and R
are not the same.  It is possible, and reasonable, to also make S symmetric
and positive definite.  But L and R certainly aren't symmetric -- they
are triangular.  Here's an example.

    >> A = hilb(4)

    A =
        1.0000    0.5000    0.3333    0.2500
        0.5000    0.3333    0.2500    0.2000
        0.3333    0.2500    0.2000    0.1667
        0.2500    0.2000    0.1667    0.1429

    >> S = sqrtm(A)

    S =
        0.9115    0.3390    0.1927    0.1310
        0.3390    0.3529    0.2475    0.1807
        0.1927    0.2475    0.2411    0.2086
        0.1310    0.1807    0.2086    0.2226

    >> R = chol(A)

    R =
        1.0000    0.5000    0.3333    0.2500
             0    0.2887    0.2887    0.2598
             0         0    0.0745    0.1118
             0         0         0    0.0189

    >> L = R'

    L =
        1.0000         0         0         0
        0.5000    0.2887         0         0
        0.3333    0.2887    0.0745         0
        0.2500    0.2598    0.1118    0.0189

Each of S, L and R satisfy their defining equation to within roundoff error.

For symmetric, positive definite A, MATLAB computes sqrtm(A) with the
first algorithm that Hammond mentions.

    [V,D] = eig(A)
    S = V*diag(sqrt(diag(D)))*V'

Hammond's second algorithm is a more expensive way of computing the same
thing.  I was unsure about Hammond's third algorithm for three reasons.
I looked in the second, not the third, edition of Golub and Van Loan.
There is a confusion between the lower and upper Cholesky decompositions.
And, the algorithm doesn't generalize to nonsymmetric or indefinite matries.

Now let's look at Pete's "funny" results.  His matrices have some random
entries, but one of them might be

    A =
       -1.441e-05            1            0            0
                0   5.7115e-06            1            0
                0            0  -3.9989e-06            1
                0            0            0      6.9e-06

This matrix is not symmetric, certainly not positive definite, and is
dangerously close to:

    A =
                0            1            0            0
                0            0            1            0
                0            0            0            1
                0            0            0            0

This is an instance of sqrtm's worst nightmare.  Its square root does not
exist.  Complex numbers and infinites don't help.  There simply is no
matrix S for which S*S is equal to A.

As Pete's parameter "small" gets even smaller, his matrices approach
ones for which sqrtm must fail.  The results don't exist.  Even when
small = 1.e-5, sqrtm knows it's in trouble and prints out warning
messages which Pete has omitted.

The sqrtm function is based on a clever observation known as Parlett's
algorithm.  It does involve eigenvalues.  In contrast, the expoential
function, expm, does not use eigenvalues or Parlett's algorithm.
You can see the details with

    type sqrtm
    type expm1

It is true that sqrtm could be improved by using a block form of Parlett's
algorithm.  But I'm not sure if that would make Pete happy.  It's
impossible to compute things which don't exist.  It's difficult to
compute things which almost don't exist.

  -- Cleve Moler
  moler@mathworks.com
Index Home About Blog