"eig did not converge" in prox_trace

SB7 · June 29, 2015, 7:56pm

I am running a MATLAB simulation in which I use TFOCS to solve a couple of convex programs including a trace norm minimization on PSD matrices. While in most of the trials the code works just fine, occasionally it breaks and I receive the error message: “EIG did not converge” (or something like that). It refers to line 69 of prox_trace where an eigendecomposition is performed.

I suspect there is some inherent problem with eig function. I would appreciate it if someone tells me what might be causing this issue and perhaps how to fix it.

SB7 · June 29, 2015, 10:09pm

Hi @Stephen_Becker and @mcg

I would like to suggest to avoid using eig for eigendecomposition of Hermitian matrices (e.g., in lines 67 and 69 in prox_trace.m). Instead of eig you can use svd and then multiply each singular value by the sign of the inner product of its associated left and right singular vectors.

Thanks.

Mark_L_Stone · June 30, 2015, 2:03am

Eig is reliable in my experience. Are you sure it’s not eigs which is the culprit?

–> EDIT: However, googling shows me that this error message does occasionally occur with eig. <—

eig is called in line 67 or 69 of TFOCS version 1.3.1 of prox_trace.m if largescale is false. If largescale is true and K <= min(M,N)/2, eigs is called in line 100, which is:

 [V,D] = eigs( X, K, SIGMA, opts );

eigs is very unreliable in my experience. You can increase opts.,maxit or opts.tol, but it still can not be relied on to converge at all points in the middle of an optimization. If you look at the code, you can see some finagling with opts.tol, but opts.maxit is left at its default value of 300.

You can increase opts.maxit, but my suggestion is to call eig instead of eigs. I.e., my recommendation is to call eig rather than eigs regardless of the value of largescale.

I occasionally use eigs interactively. I learned my lesson and stopped using it for evaluation of objective function, gradient, Hessian, constraints, Jacobiian, etc. within an optimization.

If indeed the error message refers to line 69, and it is this same version of prox_trace.m, that does sound damning of eig rather than eigs, but my advice to avoid eigs still stands. If eig is failing, perhaps you could capture one or more matrices on which it is failing and post them in a mat file? I’m willing to change my mind about the reliability of eig, and if it fails, perhaps that should be brought to the attention of the MATLAB developers, although there are apparently some examples of this happening, but I have never seen the matrices in question.

As far as I know, eig is deterministic, i.e., will produce exactly the same solution given the exact same input, and this is true even if parallel threads are employed. If the starting vector opts.vo is not provided, eigs is NOT deterministic, i.e…, the solution depends on random numbers which are drawn.

Stephen_Becker · June 30, 2015, 3:08pm

Mark’s answer has good information. I wouldn’t necessarily go with eig over eigs in all cases though, since I don’t view them as competing except in very small “overlap” regime (matrices about 500 x 500 to 2000 x 2000, say). Larger than that, and eig is just too slow; smaller, and eigs is less accurate and slower. Both implementations that Matlab uses are standard, using LAPACK and ARPACK. Finding good tolerances in eigs is not easy; you have to re-orthogonalize appropriately. We use propack to do our iterative svd’s, and you could use this to do eigenvalues (asymptotically, the cost is no different, but you’d probably be 4x slower).

If the error is actually from eigs, then I’d say ignore it, because if it is occasional, we can tolerate the small errors.

If it is from eig, then the returned solution could be catastrophically different than the true solution, and that’d be an issue, so I would follow Mark’s suggestion and try to record the matrix when that happens, and then we can help troubleshoot. In that case, I don’t think it’s a TFOCS issue but some deep issue, and maybe people at stackexchange will have an idea.

SB7 · June 30, 2015, 3:44pm

@Mark_L_Stone @Stephen_Becker
It is not eigs. The error occurs with eig as used in lines 67 and 69 prox_trace.m MATLAB function. As eigendecomposition involves matrix inverses, I can imagine that it can be unstable for some matrices. However, the eigendecomposition in prox_trace is for a Hermitian matrix and not for any arbitrary matrix. As I suggested above, for these matrices we can use svd to get the eigendecomposition without encountering the unreliability of eig.

It’s not a TFOCS issue in the sense that eig doesn’t work as it is supposed to work ideally. However, it is a TFOCS issue in the sense that you don’t need to use eig and you can use svd in your implementation for reliability.

Just to clarify things, for a Hermitian matrix X instead of [V,D] = eig(X); you can use:
[V,D,W] = svd(X); D = D.*sign(diag(real(dot(V,W,1)));

which is more stable.

Mark_L_Stone · June 30, 2015, 4:05pm

Can you capture and post matrices on which eig fails? it’s not that I don’t believe you, but i would like to see and play around with them for the sake of learning.

What is the basis for your assertion that the SVD calculation you propose is more reliable than eig? If it is just that on examples in which eig fails, the SVD approach succeeds, that wouldn’t convince me in general, because there may be cases in which the SVD approach fails but eig would succeed.

Regarding eigs: Back in the “olden” days, I used to use eigs, when for instance doing something with spectral gap in the objective function or a constraint, so only needing the 2 largest eigenvalues of a matrix. When I ran a nonlinear optimization, such as some type of Quasi-Newton, even when starting in a “nice section of town”, and with an optimum which might be in the nicest section of town, it was quite frequently the case that the algorithm, if not venturing to the wrong side of the tracks, would at least get real close, on its way to the nicest section of town. Even raising opt.maxit to 3000 and decreasing opt.tol to 1e6 or greater times the default value, eigs often failed at some point. I switched to using eig, and the algorithm made it through the sketchy parts of town with no problem, and always arrived safely at the nicest section of town.

Stephen_Becker · June 30, 2015, 6:06pm

eig should not involve matrix inverses to the best of my knowledge. It may do the equivalent of inverting a tri-diagonal matrix, but that is stable. I thought that eig is as or more stable than any svd method. I’d be very interested to see the matrix that caused it to fail. Is the matrix dense or sparse?

you don’t need to use eig and you can use svd in your implementation for reliability.

Usually, I thought it was the other way around: people use svd when they could use eig, as eig is better in theory and practice. On my laptop, if I have a 2000 x 2000 Hermitian matrix, eig takes 1.4 seconds and svd takes 4.4 seconds. I’m willing to believe that an implementation svd could be more stable than an implementation of eig but it’s not obvious to my why it should be so.

SB7 · June 30, 2015, 6:37pm

I don’t know how they compute the eigendecomposition under the hood, so I might be wrong and the error might be merely due to bad implementation of eig in MATLAB. Maybe Google can tell us how often people face with the exceptions/errors that eig and svd commands throw and that might be a hint on which one is more reliable. Anyways, I managed to get my Monte-Carlo simulations done without errors using the SVD equivalent in prox_trace.

I have saved a matrix for which eig fails. Here is the link. Surprisingly, eig without requiring the eigenvectors works, but when you ask for both the eigenvectors and the eigenvalues it fails.

Mark_L_Stone · June 30, 2015, 7:45pm

Using R2014A WIN64, I duplicated your behavior of no error for eig without eigenvector and error when also asking for eignevector. However, both worked without error using R2013A WIN32.

Stephen_Becker · July 1, 2015, 9:11pm

I can confirm on Matlab R2014a in 64-bit mac. Very odd. It does converge if we perturb it slightly: [V,D] = eig( BM - 1e-6 ); and [V,D] = eig( BM + 1e-6 ); and [V,D] = eig( fliplr(BM) ) all work just fine. It works fine if we just as for eigenvalues. Condition number is fine, magnitude of entries is fine.

Given that it worked for Mark in an earlier version of Matlab, it could be due to Matlab changing their libraries. I think you should file a report with Mathworks.

What versions of BLAS and LAPACK give the error? e.g., Mark, for your two versions of Matlab, is there a different LAPACK version?

For me, version -lapack gives:

Intel(R) Math Kernel Library Version 11.0.5 Product Build 20130613 for Intel(R) 64 architecture applications
Linear Algebra PACKage Version 3.4.1

LAPACK has an eigenvalue solver, but I think MKL has its own too, so maybe Matlab changed which library they use.

Mark_L_Stone · July 1, 2015, 9:52pm

Stephen,

Note that my two MATLAB versions are R2014A WIN64 and R2013A WIN 32, so there’s also the difference between 64 bit and 32 bit.

For R2014A Win64, version -lapack gives

Intel(R) Math Kernel Library Version 11.0.5 Product Build 20130612 for Intel(R) 64 architecture applications
Linear Algebra PACKage Version 3.4.1

For R2013A Win32, version -lapack gives

Intel(R) Math Kernel Library Version 10.3.11 Product Build 20120606 for 32-bit applications
Linear Algebra PACKage Version 3.4.1

Maybe a change in the least significant bit or so is making the difference between eig failing or not?

SB7 · July 1, 2015, 10:01pm

I’m using MATLAB 2014b 64-bit as well. For version -lapack I get
Intel® Math Kernel Library Version 11.1.1 Product Build 20131010 for Intel® 64 architecture applications Linear Algebra PACKage Version 3.4.1

The version -blas also returns the MKL info as above.

Mark_L_Stone · July 2, 2015, 11:07pm

Perhaps Demmel and Kahan 1990 “Accurate Singular Values of Bidiagonal Matrices” http://www.netlib.org/lapack/lawnspdf/lawn03.pdf will shed at least some murky light on the matter of why under WIN64 with the BM matrix, eig converged when eigenvalues only were requested, but not when eigenvectors were also requested. Singular vectors converge more slowly than singular values. So I suspect the same is true for eigenvalues and eigenvectors. Related to that, a different algorithm might be employed, and in any event, different convergence criteria.

Does eig use a different algorithm depending on whether or not eigenvectors are requested? In any event, I would think that eig is waiting until eigenvectors converge to terminate if eigenvectors are requested. So if eig uses the same algorithm with and without eigenvectors being requested, it may have converged for the no eignenvectors requested case before getting to the point at which it had trouble trying to get eigenvectors to converge.

Now that I think of it, I can tell you one difference in eig between MATLAB R2014A and R2013A. I recently discovered (before this thread was started, however), much to my surprise, that the option to compute both right and left eigenvectors in eig is not in R2013A and earlier (don’t know about R2013B), and is in R2014A and later. I hadn’t used left eigenvectors until recently, and had always thought that the ability to compute left and right eigenvectors in eig was there since time immemorial (i.e., the Fortran version of MATLAB, which I first used in early '82 when taking a Statistical Computing course taught by Gene Golub). So now I am wondering whether the difference in eig performance on BM is due to 64 vs. 32 bit (and Intel Math Kernel Library) or to a difference in eig. I wonder whether someone can run this (BM) in R2013A WIN64. R2013A WIN32 and R2014A WIN64 both say “Linear Algebra PACKage Version 3.4.1”, but the left eigenvector capability is not in R2013A, and is in R2014A.

Stephen_Becker · July 7, 2015, 4:12pm

I tried on 64-bit linux with R2013b and

Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130123 for Intel(R) 64 architecture applications
Linear Algebra PACKage Version 3.4.1

So far, it seems that MKL version 10.3 worked and MKL versions 11.02, 11.05 and 11.1.1 have not worked (i.e., they give the non-convergence message).

I’m very certain the algorithm changes if you do not request the eigenvectors. Many years ago I noticed an incredible speed difference depending on whether you request the eigenvectors or not (same for SVD).

Interesting about the left eigenvectors! I often use symmetric matrices and never put any thought into that.

I do recall hearing something about MKL using the “FEAST” eigenvector algorithm in very recent releases, but I thought this was a competitor with Lanczos and not with direct solvers. As far as I know, FEAST is a fast and promising algorithm but from a very different motivation and still a bit novel.

Maybe this is a fun question for NA-Digest?

Mark_L_Stone · July 7, 2015, 6:35pm

Stephen,

By all means go ahead and submit something to NA-Digest. That sounds like a more likely venue to get authoritative answers than from the few of us here.

Stephen_Becker · July 10, 2015, 10:55pm

Bug seems to have been identified by Julien Langou as most likely bug 115 or 121 in Lapack (see http://www.netlib.org/lapack/bug_list.html).

Thanks to Julien’s investigation, he found out that we see it now because Mathworks started calling DSYEVD instead of DSYEV, which uses a different algorithm and is generally faster, but had the bug. The bug in LAPACK has been around since at least 2000, and Tim Mitchell found a case of the bug in 2013.

Osni Marques has fixed the LAPACK version as of October 2014, and this version has no error when tried on the test matrix. The very latest MKL has the fix (MKL 11.2 Update 4 and 11.3 Update 1). Not yet fixed in Matlab but Mathworks knows about it and plans to incorporate the fixes in upcoming releases.

In addition to using the SVD to calculate the eigenvalue decomposition, Pat Quillen points out that you can ask for the generalized eigenvalue decomposition, with B=eye(n), and it uses a bug-free algorithm that gives the right answer, e.g.,

 [V,D] = eig(A, eye(size(A)));

will work.

I will add this robustness into TFOCS’ routines, e.g.,

try
    [V,D] = eig(A);
catch ME
    if (strcmpi(ME.identifier,'MATLAB:eig:NoConvergence'))
        [V,D] = eig(A, eye(size(A)));
    else
        rethrow(ME);
    end
end

SB7 · July 10, 2015, 11:16pm

@Stephen_Becker @Mark_L_Stone Thanks a lot for following this issue.

Mark_L_Stone · July 11, 2015, 10:26am

Stephen, I guess you were wise to only use the generalized eigenvalue decomposition with B = eye(size(A)) when you need it. It’s not a freebie.

In limited experimentation with some 5000 by 5000 matrices, with or without requesting eigenvectors, the generalized eigenvalue approach with B = eye(size(A)) took 30 to 50 times longer than use of eig without B. I was rather surprised by the magnitude of the computation time penalty.

Stephen_Becker · July 14, 2015, 1:36am

Wow, that’s slow! I switched it now to use the SVD approach, and spread the fix to all files that use eigenvalues (they now call private/safe_eig which runs the try/catch loop and runs the SVD version if necessary)

Mark_L_Stone · July 17, 2015, 5:26am

On a related note, the following is in the July 16 (Vol 15 #28) NA-Digest:

From: Mike Sussman sussmanm@math.pitt.edu
Date: July 09, 2015
Subject: MATLAB inv() much less stable in new releases than old

Does anyone know why the folks at the Mathworks have chosen to change the matrix algorithm for the inv function so that it is much less stable than before? Between MATLAB releases R2012a and R2014a the following sequence of instructions produces drastically different
results:

A=hilb(10);
Ainv=inv(A);
x=ones(10,1);
b=Ax;
norm(x-Ainvb)/norm(x)

Using release R2012a, this produces 2.9743e-4, somewhat better than would be expected from the condition number of 1.6025e+13. If, instead, Ainv is computed as Ainv=A\eye(10), the result is 0.0051, larger than with inv(A) but still consistent with the condition number. In contrast, using release R2014a (and more recent versions) one gets 1.4290e+5, substantially larger than one would expect from the condition number.

If I recall correctly, this change occurred in or before R2013a, but I can no longer verify that. In any case I have been unable to find any mention of a change in the release notes or on the internet.

Can anyone help me understand why such a change was made? Is the new algorithm better in some way for some class of problem?

Of course, we all know it’s naughty to use inv, but sometimes one gives into temptation, or it really, truly is necessary to explicitly compute the inverse.

By the way, congratulations to Stephen R. Becker, Emmanuel J. Candès, Michael C. Grant (mcg) for winning the 2015 Beale-Orchard-Hays Prize for TFOCS. This makes two in a row for mcg, who, along with Stephen Boyd, won the Beale-Orchard-Hays Prize in 2012 (when it was msot recently previously awarded) for CVX.