How to improve the numerical stability for a large SDP using MOSEK

Dear all,

I am running an optimization for a large SDP with only one linear matrix inequality, i.e., LMI (\sum_i x_i M_i \geq 0 where x_i's are variables and M_i's are given matrices) and a few other simple linear constraints (one equality and two inequalities). From my project, I can tune the size of the problem, namely the number of variables and the dimension of the matrices involved in the LMI. For a small problem size, I got perfect convergence using MOSEK within a short amount of time. However, my goal is to optimize the problem with a very large size (dimensions will be shown in the logs). When I increase the problem size, weird numerical instability happens.

Logs for a small size optimization:

MOSEK Version 10.1.24 (Build date: 2024-1-31 14:16:39)
Copyright (c) MOSEK ApS, Denmark WWW: mosek.com
Platform: Windows/64-X86

Problem
  Name                   :                 
  Objective sense        : minimize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 91              
  Affine conic cons.     : 0               
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 808             
  Matrix variables       : 1 (scalarized: 182710)
  Integer variables      : 0               

Optimizer started.
Presolve started.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 0                 time                   : 0.00            
Lin. dep.  - primal attempts        : 0                 successes              : 0               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 0.02    
Optimizer  - threads                : 24              
Optimizer  - solved problem         : the primal      
Optimizer  - Constraints            : 91              
Optimizer  - Cones                  : 1               
Optimizer  - Scalar variables       : 512               conic                  : 2               
Optimizer  - Semi-definite variables: 1                 scalarized             : 182710          
Factor     - setup time             : 0.55            
Factor     - dense det. time        : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 4186              after factor           : 4186            
Factor     - dense dim.             : 0                 flops                  : 1.23e+10        
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+00  1.7e+02  3.9e+00  0.00e+00   2.871200000e+00   0.000000000e+00   1.0e+00  1.92  
1   7.7e-01  1.3e+02  3.4e+00  -9.80e-01  2.573816957e+00   6.748739849e-03   7.7e-01  2.64  
2   2.6e-01  4.4e+01  1.9e+00  -9.79e-01  7.676299896e-02   -5.457652094e-02  2.6e-01  3.23  
3   6.9e-02  1.2e+01  8.4e-01  -8.85e-01  -7.271562859e+00  -4.542790227e-01  6.9e-02  3.70  
4   6.1e-03  1.0e+00  9.5e-02  -4.66e-01  -1.747946026e+01  -1.673677885e+00  6.1e-03  4.09  
5   2.0e-04  3.3e-02  7.8e-04  6.46e-01   -3.075600824e+00  -2.078285646e+00  2.0e-04  4.53  
6   5.7e-05  9.6e-03  6.0e-05  1.79e+00   -2.138607712e+00  -2.071415464e+00  5.7e-05  5.09  
7   9.7e-06  1.6e-03  3.9e-06  1.63e+00   -2.030347811e+00  -2.020488163e+00  9.7e-06  5.99  
8   3.0e-06  5.1e-04  6.5e-07  1.07e+00   -1.856845561e+00  -1.853985548e+00  3.0e-06  6.41  
9   1.3e-06  2.3e-04  1.8e-07  1.02e+00   -1.796512021e+00  -1.795382449e+00  1.3e-06  7.05  
10  1.1e-06  1.8e-04  1.3e-07  1.01e+00   -1.800555849e+00  -1.799674331e+00  1.1e-06  7.66  
11  4.5e-07  7.7e-05  3.3e-08  1.01e+00   -1.787807594e+00  -1.787481775e+00  4.5e-07  8.09  
12  2.2e-07  3.7e-05  1.0e-08  1.00e+00   -1.785963380e+00  -1.785828892e+00  2.2e-07  8.44  
13  8.1e-08  1.4e-05  2.2e-09  1.00e+00   -1.784642747e+00  -1.784597154e+00  8.1e-08  8.80  
14  2.3e-08  3.9e-06  3.1e-10  1.00e+00   -1.784729669e+00  -1.784719375e+00  2.3e-08  9.14  
15  2.3e-09  3.8e-07  9.4e-12  1.00e+00   -1.784655877e+00  -1.784654879e+00  2.3e-09  9.48  
16  2.3e-10  3.9e-08  3.0e-13  1.00e+00   -1.784652828e+00  -1.784652728e+00  2.3e-10  11.67 
17  7.6e-10  3.7e-08  2.8e-13  1.00e+00   -1.784652804e+00  -1.784652710e+00  2.2e-10  13.47 
18  9.9e-10  3.6e-08  2.7e-13  1.00e+00   -1.784652799e+00  -1.784652707e+00  2.1e-10  14.36 
19  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  15.33 
20  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  16.66 
21  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  17.91 
22  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  19.16 
23  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  21.73 
24  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  24.53 
25  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  27.34 
26  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  30.42 
27  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  33.58 
28  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  35.77 
29  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  37.50 
30  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  39.67 
31  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  41.27 
32  1.1e-09  3.6e-08  2.7e-13  1.00e+00   -1.784652798e+00  -1.784652706e+00  2.1e-10  44.38 
Optimizer terminated. Time: 47.58   


Interior-point solution summary
  Problem status  : PRIMAL_AND_DUAL_FEASIBLE
  Solution status : OPTIMAL
  Primal.  obj: -1.7846527976e+00   nrm: 9e+00    Viol.  con: 2e-08    var: 3e-09    barvar: 0e+00  
  Dual.    obj: -1.7846527056e+00   nrm: 2e+02    Viol.  con: 0e+00    var: 7e-07    barvar: 2e+53  
Optimizer summary
  Optimizer                 -                        time: 47.58   
    Interior-point          - iterations : 33        time: 47.56   
      Basis identification  -                        time: 0.00    
        Primal              - iterations : 0         time: 0.00    
        Dual                - iterations : 0         time: 0.00    
        Clean primal        - iterations : 0         time: 0.00    
        Clean dual          - iterations : 0         time: 0.00    
    Simplex                 -                        time: 0.00    
      Primal simplex        - iterations : 0         time: 0.00    
      Dual simplex          - iterations : 0         time: 0.00    
    Mixed integer           - relaxations: 0         time: 0.00   

Logs for a very-large-size optimization

MOSEK Version 10.1.27 (Build date: 2024-2-27 09:09:28)
Copyright (c) MOSEK ApS, Denmark WWW: mosek.com
Platform: Linux/64-X86

Problem
  Name                   :                 
  Objective sense        : minimize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 1275            
  Affine conic cons.     : 0               
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 12353           
  Matrix variables       : 1 (scalarized: 47108071)
  Integer variables      : 0               

Optimizer started.
Presolve started.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 0                 time                   : 0.00            
Lin. dep.  - primal attempts        : 0                 successes              : 0               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 2.24    
GP based matrix reordering started.
GP based matrix reordering terminated.
Optimizer  - threads                : 112             
Optimizer  - solved problem         : the primal      
Optimizer  - Constraints            : 1275            
Optimizer  - Cones                  : 0               
Optimizer  - Scalar variables       : 12330             conic                  : 0               
Optimizer  - Semi-definite variables: 1                 scalarized             : 47108071        
Factor     - setup time             : 2987.78         
Factor     - dense det. time        : 0.00              GP order time          : 0.04            
Factor     - nonzeros before factor : 8.13e+05          after factor           : 8.13e+05        
Factor     - dense dim.             : 0                 flops                  : 6.06e+13        
Factor     - GP saved nzs           : 0                 GP saved flops         : 1.59e+10       
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+00  2.5e+03  4.9e+01  0.00e+00   4.801030303e+01   0.000000000e+00   1.0e+00  10678.67
1   9.6e-01  2.4e+03  4.8e+01  -9.99e-01  4.796718877e+01   2.903752932e-03   9.6e-01  12037.49
2   8.7e-01  2.2e+03  4.6e+01  -9.99e-01  4.785951174e+01   2.192297316e-03   8.7e-01  13418.61
3   6.2e-01  1.6e+03  3.9e+01  -9.99e-01  4.739082074e+01   8.803435850e-05   6.2e-01  14822.00
4   2.5e-01  6.2e+02  2.4e+01  -9.98e-01  4.490021409e+01   -1.460535855e-02  2.5e-01  16224.85
5   1.1e-01  2.8e+02  1.6e+01  -9.92e-01  3.987107454e+01   -8.753194756e-02  1.1e-01  17586.76
6   3.6e-02  8.9e+01  8.6e+00  -9.56e-01  2.060897191e+01   -6.088880515e-01  3.6e-02  18978.80
7   3.3e-03  8.2e+00  1.7e+00  -8.26e-01  -9.310912200e+01  -4.191896741e+00  3.3e-03  20392.30
8   1.6e-03  3.9e+00  7.1e-01  3.54e-02   -7.598453767e+01  -5.669704003e+00  1.6e-03  21837.24
9   1.4e-03  3.5e+00  6.2e-01  3.83e-01   -7.443745453e+01  -5.787910719e+00  1.4e-03  23254.69
10  1.3e-03  3.4e+00  6.3e-01  -3.35e-01  -8.205138709e+01  -5.711890520e+00  1.3e-03  24658.32

Sorry that I had to shut down the solver since it was getting unstable and took too much time for one iteration. The weird thing is that the solver was approaching convergence at the 8th iteration but suddenly returned to infeasible after iteration 9. I would believe this is because of the numerical instability.

Then the next step I did was to rescale the PSD constraint by a very large number: \sum_i x_i M_i/C \geq 0 with C = 10000.

MOSEK Version 10.1.27 (Build date: 2024-2-27 09:09:28)
Copyright (c) MOSEK ApS, Denmark WWW: mosek.com
Platform: Linux/64-X86

Problem
  Name                   :                 
  Objective sense        : minimize        
  Type                   : CONIC (conic optimization problem)
  Constraints            : 1275            
  Affine conic cons.     : 0               
  Disjunctive cons.      : 0               
  Cones                  : 0               
  Scalar variables       : 12353           
  Matrix variables       : 1 (scalarized: 47108071)
  Integer variables      : 0               

Optimizer started.
Presolve started.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 0                 time                   : 0.00            
Lin. dep.  - primal attempts        : 0                 successes              : 0               
Lin. dep.  - dual attempts          : 0                 successes              : 0               
Lin. dep.  - primal deps.           : 0                 dual deps.             : 0               
Presolve terminated. Time: 2.01    
GP based matrix reordering started.
GP based matrix reordering terminated.
Optimizer  - threads                : 112             
Optimizer  - solved problem         : the primal      
Optimizer  - Constraints            : 1275            
Optimizer  - Cones                  : 0               
Optimizer  - Scalar variables       : 12329             conic                  : 0               
Optimizer  - Semi-definite variables: 1                 scalarized             : 47108071        
Factor     - setup time             : 2993.52         
Factor     - dense det. time        : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 8.13e+05          after factor           : 8.13e+05        
Factor     - dense dim.             : 0                 flops                  : 5.99e+13        
Factor     - GP saved nzs           : 0                 GP saved flops         : 1.59e+10      
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   1.0e+00  2.5e+03  1.5e+00  0.00e+00   4.801030303e-01   0.000000000e+00   1.0e+00  10774.83
1   9.6e-01  2.4e+03  1.4e+00  -9.99e-01  4.383002018e-01   2.903795251e-03   9.6e-01  12159.70
2   8.7e-01  2.2e+03  1.4e+00  -9.99e-01  3.330426946e-01   2.192097552e-03   8.7e-01  13584.46
3   6.2e-01  1.6e+03  1.2e+00  -9.99e-01  -1.236582975e-01  8.760796023e-05   6.2e-01  15014.48
4   2.5e-01  6.3e+02  7.4e-01  -9.98e-01  -2.523614050e+00  -1.453934461e-02  2.5e-01  16436.65
5   1.1e-01  2.8e+02  4.9e-01  -9.92e-01  -7.166519698e+00  -8.786709180e-02  1.1e-01  17886.38
6   3.6e-02  9.0e+01  2.6e-01  -9.56e-01  -2.313890259e+01  -6.015650037e-01  3.6e-02  19292.97
7   3.4e-03  8.4e+00  5.3e-02  -8.28e-01  -1.154919834e+02  -4.139920395e+00  3.4e-03  20725.48
8   2.9e-05  7.3e-02  3.2e-04  1.93e-02   -6.202874589e+01  -8.198851952e+00  2.9e-05  22115.60
9   1.6e-05  3.9e-02  1.1e-04  9.92e-01   -3.309063995e+01  -8.206290406e+00  1.6e-05  23501.73
10  1.4e-05  3.5e-02  1.0e-04  9.85e-01   -3.151362600e+01  -8.183827478e+00  1.4e-05  24918.53
11  1.4e-05  3.5e-02  1.0e-04  -1.95e-01  -3.223246683e+01  -8.214260772e+00  1.4e-05  26307.87
12  1.3e-05  3.2e-02  1.1e-04  -9.07e-02  -4.150201417e+01  -8.375084758e+00  1.3e-05  27734.81
13  4.9e-06  1.2e-02  5.0e-05  -2.87e-02  -5.681992409e+01  -8.304358582e+00  4.9e-06  29085.59
14  6.5e-07  1.6e-03  1.9e-05  -8.87e-01  -3.861553933e+02  -8.363352237e+00  6.5e-07  30491.00
15  6.9e-08  2.6e-06  7.4e-07  -9.80e-01  -2.366388630e+05  -8.472679413e+00  1.0e-09  31856.87
16  8.1e-09  2.6e-10  2.2e-08  -1.00e+00  -2.366262255e+09  -8.472679666e+00  1.0e-13  33244.64
17  2.8e-14  1.7e-13  9.3e-09  -1.00e+00  -5.139083358e-02  -5.927359780e-16  3.5e-19  34936.94
Optimizer terminated. Time: 34940.37


Interior-point solution summary
  Problem status  : DUAL_INFEASIBLE
  Solution status : DUAL_INFEASIBLE_CER
  Primal.  obj: -5.1390833584e-02   nrm: 7e+01    Viol.  con: 3e-14    var: 0e+00    barvar: 0e+00  
Optimizer summary
  Optimizer                 -                        time: 34940.37
    Interior-point          - iterations : 17        time: 34940.10
      Basis identification  -                        time: 0.00    
        Primal              - iterations : 0         time: 0.00    
        Dual                - iterations : 0         time: 0.00    
        Clean primal        - iterations : 0         time: 0.00    
        Clean dual          - iterations : 0         time: 0.00    
    Simplex                 -                        time: 0.00    
      Primal simplex        - iterations : 0         time: 0.00    
      Dual simplex          - iterations : 0         time: 0.00    
    Mixed integer           - relaxations: 0         time: 0.00   

As one can see, the stability was improved slightly with the PRSTATUS close to one at iterations 9 and 10, but soon became negative again. I even increased the scaling factor all the way up to C=5\times10^6 which I don’t think is reasonable. The convergence got much better, but still after about 13 interactions, the PRSTATUS went back to negative values and the whole optimization became infeasible.

For the SDP itself, I am pretty sure it is feasible which can be proved analytically. I wish I could share the model file but the file size is just too large and I used about 1.5 TB of memory for solving it. So, are there any internal ways I can use to stabilize the solver? For example, some parameters I can tune in MOSEK to make it more stable. Could this be the equality constraint that causes the problem? The equality constraint is implemented as \sum_i k_ix_i = K where k_i's are given and K is a large number (\sim10^4)

Another observation I noticed is that the result from the intermediate step with near-one PRSTATUS is pretty close to the ideal result from the analytical expectations. This means the solver can push the variables towards the correct optimal point. But for some reason, the solver became very unstable after getting to that point.

Thanks

Since the problem in the original form is an LMI cvx choose to feed the dual problem into MOSEK, because this improve the computational efficiency.

Now the solution summary for large problem shows that MOSEK computes a fairly accurate certificate for the problem is dual is infeasible which is same as a concluding the original LMI problem is primal infeasible.

So we can we conclude that your problem is primal infeasible or at least very close to being primal infeasible. And Mosek has a numerical certificate that supports that conclusion which you can check.

Assume your statement

I am pretty sure it is feasible which can be proved analytically.

is true, then that is not enough in general because according to the theory then your problem must be robust feasible in order not to be ill-posed. So my counter statement is

Your problem is exact primal infeasible given a small perturbation in the problem data.

The Mosek log output is consistent with that statement.

So what should you do. One thing for sure there is no magic option change in Mosek that resolve the issue. Rather you have to make your problem robust feasible by using facial reduction or something like that. It may also help to reduce the size of the problem.

Thanks very much for your reply.

I understand what you wrote about the problem being primal infeasible from the numerical certificate. However, my statement about the analytical feasibility comes from the fact that there must be solutions for the optimization variables that satisfy all the constraints (not necessarily the optimal point). Combined with the constraints being convex, the primal problem should be feasible. However, it is very hard to explicitly construct one of such solutions (the existence can be proved analytically).

I do not fully understand your statement about the infeasibility given a small perturbation. Does that mean MOSEK will add a small perturbation near the optimal point to check if it is truly optimal? In that sense, could the equality constraint become problematic? Is there any way that I can tune or turn off such perturbation?

I also googled the facial reduction which seems to be something I need to do before feeding the problem to the solver. The issue is that the problem is very complicated and lives in the complex domain (the PSD matrix is Hermitian). I don’t think it is very possible.

The message from my post is that your problem is either infeasible or close to be primal infeasible.
I am pretty sure the only to get better results is to improve your model. I know that is easier said than done.Good luck.

Thanks. I will try to figure out why the problem becomes infeasible when going for large problem sizes. My hope is that there might be some simple mistakes in the modeling and I will update it here if I find the mistake.

I finally found a way to fix the issue. The number of optimization variables is 1275 (namely the number of x's). Luckily, in my problem, a certain approximation can be made to map the x's to a much smaller variable y with a dimension of 99 by 1. The new variable is linearly related to the original variable by matrix multiplication: x = Ly where L is a sparse matrix having a dimension of 1275 by 99. Thus, the linear inequality matrix becomes \sum_{ij}y_jL_{ij}M_i=\sum_j y_j\mathcal{M}_j where \mathcal{M}_j = \sum_i M_iL_{ij} is the new matrix. With this approximation, the solved optimum will be larger than using x directly since I am doing a minimization. However, the transformed SDP is now much smaller than the original one.

I then did the optimization with new variables and found the optimized point (y_o). This means the original problem must be feasible since x=Ly_o is a valid solution to the original problem. So, I think the infeasibility in the original problem does indeed come from numerical instability.