CVX vs fmincon for sum of exponentials


(Hugh) #1

I have a convex function that would be written in matlab as

f(x) = sum( a.*exp(b.*x) );

where a is a vector of non-negative reals, b is a vector of reals (more often than not negative), and x is a vector of reals. From the composition rules, this is convex, as it is the sum of positive scalars times convex functions.

I would like to minimize f subject to the constraint

c(x) - C <= 0

c(x) = c_0 * sum( exp( x) );

where c_0 and C are specified positive constants. My question is related to syntax, how can I write element wise multiplication like I have for f?

I have been using fmincon for this, but would like to move away from requiring the optimization toolbox, both because I have convexity I should be able to exploit and because I want something open source.

I have a minimal working example, which shows fmincon beating CVX, and given the problem is convex, I don’t think this should be the case, thus I suspect I have gotten something wrong syntactically.

One note: this is a toy version, in practice x is much larger, and I don’t know its length in advance, I only receive a and b which are of the same length and obey the rules I mentioned previously. So for any final solution i could not write something like

sum( a(1)*exp(b(1)*x(1)) + a(2)*exp(b(2)*x(2)) )

CODE:

% CVX

a = 1.0e-04 * [0.9, 0.05, 0.04];
b = [ -11.1 -5.11 -5.05 ];

n = numel(a);
c_0 = 2; C = 2;
cvx_begin
variables x(n)
minimize sum( aexp(bx) )
subject to
c_0 * sum(exp(x)) - C <= 0
cvx_end

x_cvx = x’;

% FMINCON

f = @(x) fM(x,a,b);
c = @(x) cM(x,c_0,C);

[x,fval] = fmincon( f, zeros(size(a)),[],[],[],[],[],[],c);
x_fmincon = x;

f(x_cvx) % This gives 0.153158
f(x_fmincon) % This gives 0.05774

% functions for passing anonymously to fmincon
function [f,fx] = fM(x,a,b)
f = sum( a.* exp(b.*x) );

if nargout >1
fx = a .b . exp( b.*x);
end
end
function [c,ceq] = cM(x,c0,C)
c = c0 * sum( exp( x) ) - C;
ceq = [];
end


(Mark L. Stone) #2

You have mismatched dimensions between x(n0, which is n by 1, and a and b, which are 1 by n.

variable x(n)
minimize(sum( a'.*exp(b'.*x) ))

will work, or

variable x(1,n)
minimize(sum( a.*exp(b.*x) ))

In order to improve execution speed and reliability, you can try installing the open source CVXQUAD https://github.com/hfawzi/cvxquad , including its exponential.m replacement for CVX’s` exponential.m as described at the link, For your problem, an additional modification is needed (after my corrections above are made) in order for CVXQUAD’s Pade approximant to be invoked instead of CVX’s successive approximation method.

This should invoke CVXQUAD;s Pade apprxoimant instead of CVX’s successive approximation method, and should be faster than the successive approximation method.

cvx_begin
variable x(n)
minimize(sum( a'.*exp(b'.*x) ))
    subject to
        sum(exp(x))  <= C/c_0
cvx_end

Please try your full-sized problem before and after installing CVXQUAD, and tell us what the differences in execution time and results are.


(Hugh) #3

So there was an answer here a few moments ago which seems to have disappeared… Either way I made some changes, and it runs now, but is much much slower.

I also ran using the CVXQUAD implementation of exponential, and it got slower still, taking 4s for the solve.

% CVX

a = 1.0e-04 * [0.9, 0.05, 0.04];
b = [ -11.1 -5.11 -5.05 ];

n = numel(a);
c_0 = 2; C = 2;
cvx_expert true
tic
cvx_begin quiet
variables x(n)
minimize sum( a*exp(b’.*x) )
subject to
c_0 * sum(exp(x)) - C <= 0
cvx_end
t_cvx = toc;

x_cvx = x’;

% FMINCON

f = @(x) fM(x,a,b);
c = @(x) cM(x,c_0,C);

options = optimoptions(‘fmincon’,‘Display’,‘off’);
tic
[x,fval] = fmincon( f, zeros(size(a)),[],[],[],[],[],[],c,options);
t_fmincon = toc;
x_fmincon = x;

disp(t_cvx); % gives 1.699506s
disp(t_fmincon); % gives 0.109749s
f(x_cvx) % 0.057743623326151
f(x_fmincon) % 0.057743704468261

% functions for passing anonymously to fmincon
function [f,fx] = fM(x,a,b)
f = sum( a.* exp(b.*x) );

if nargout >1
fx = a .b . exp( b.*x);
end
end
function [c,ceq] = cM(x,c0,C)
c = c0 * sum( exp( x) ) - C;
ceq = [];
end


(Mark L. Stone) #4

Sorry, I realized I incorrectly wrote that no further modifications were needed to invoke CVXQUAD’s Pade approiximant. Therefore, I deleted my post while I figured out the simplest modification to make. I have now done so, and have undeleted and edited my post., Please read it. Thanks.

I suspect you are still invoking CVX’s successive approximation method, so please try my version. Also, in the interests of clarity of what method is actually being used, I recommend you get rid of cvx_expert true until you are ready for production runs.

Edit: I also recommend you don;t use the quiet option until you are ready for production runs, so you can see what’s happening with the solver.

Also, your objective function can be simplified a bit to
minimize(a*exp(b'.*x))


(Hugh) #5

EDIT: I tried switching to SeDuMi and its about twice as fast. 25s compared to 60s, and the answers are identical.

Thank you for the help. Things seems to be working now.

In brief: The answers from CVX are better, at the cost of being significantly more expensive in terms of time.

I suspect this is because fmincon is likely doing some slick C things under the hood. Is there a similar thing for CVX? I was at a talk with Stephen Boyd once where he mentioned having a C level interface that was blazing fast by comparison to the Python/Matlab interface.

A more detailed breakdown:

  1. I am really solving many almost identical optimization problems, just with the values (and lengths) of a and b changing. Typically I will do 30 such solves per iteration. Is there a way to intelligently re-use the problem setup or is this construction/destruction just the cost of doing business in matlab?

  2. CVX gives better final results than using fmincon. The final minimized functional is about ~10 smaller (1e-9 compared to 1e-8, where 0 is perfect), and the reduction is monotone for CVX, whereas for fmincon, it oscillates near the end. This is a huge plus.

  3. CVXQUAD implementation of exponential is approximately 3 times faster than the original CVX implementation (~ 60s for 30 iterations vs ~180s), thank you for pointing me towards this.

  4. I believe it is not calling successive, here is a sample output:

Calling SDPT3 4.0: 2011 variables, 785 equality constraints
For improved efficiency, SDPT3 is solving the dual problem.

num. of constraints = 785
dim. of sdp var = 1176, num. of sdp blk = 588
dim. of socp var = 98, num. of socp blk = 49
dim. of linear var = 149


SDPT3: Infeasible path-following algorithms


version predcorr gam expon scale_data
HKM 1 0.000 1 0
it pstep dstep pinfeas dinfeas gap prim-obj dual-obj cputime

0|0.000|0.000|1.9e+02|8.9e+00|1.6e+05| 6.618678e+03 0.000000e+00| 0:0:00| spchol 1 1
1|0.939|0.355|1.2e+01|5.8e+00|8.3e+04| 6.414670e+03 -1.228776e+02| 0:0:00| spchol 1 1
2|0.985|0.934|1.8e-01|3.9e-01|9.8e+03| 4.887410e+03 -2.713751e+02| 0:0:00| spchol 1 1
3|0.928|0.865|1.3e-02|5.3e-02|1.9e+03| 1.213110e+03 -1.951581e+02| 0:0:00| spchol 1 1
4|0.841|0.864|2.0e-03|9.9e-03|4.7e+02| 3.434653e+02 -4.564116e+01| 0:0:00| spchol 1 1
5|0.993|1.000|1.7e-05|4.2e-04|6.9e+01| 7.412704e+01 6.860132e+00| 0:0:00| spchol 1 1
6|1.000|0.956|1.7e-06|2.3e-05|1.0e+01| 2.691871e+01 1.656662e+01| 0:0:00| spchol 1 1
7|1.000|0.800|1.1e-07|4.9e-06|6.2e+00| 2.531136e+01 1.915579e+01| 0:0:00| spchol 1 1
8|1.000|1.000|1.0e-08|3.1e-08|2.8e+00| 2.311597e+01 2.027370e+01| 0:0:00| spchol 1 1
9|0.906|0.901|8.3e-09|5.9e-09|4.4e-01| 2.164992e+01 2.121459e+01| 0:0:00| spchol 1 1
10|0.992|1.000|6.4e-11|1.7e-09|4.0e-02| 2.141795e+01 2.137779e+01| 0:0:00| spchol 1 1
11|0.987|0.987|8.4e-13|4.5e-11|5.3e-04| 2.139689e+01 2.139636e+01| 0:0:01| spchol 1 1
12|0.998|0.996|9.3e-14|1.2e-12|1.4e-05| 2.139662e+01 2.139660e+01| 0:0:01| spchol 1 1
13|0.627|1.000|1.1e-10|1.0e-12|5.3e-06| 2.139661e+01 2.139661e+01| 0:0:01| spchol 1 1
14|0.635|1.000|5.8e-11|1.5e-12|2.9e-06| 2.139661e+01 2.139661e+01| 0:0:01| spchol 1 1
15|0.636|1.000|2.6e-11|2.2e-12|1.5e-06| 2.139661e+01 2.139661e+01| 0:0:01| spchol 1 1
16|0.637|1.000|1.0e-11|3.4e-12|8.1e-07| 2.139661e+01 2.139661e+01| 0:0:01| spchol 1 1
17|0.637|1.000|3.8e-12|2.0e-12|4.3e-07| 2.139661e+01 2.139661e+01| 0:0:01|
stop: max(relative gap, infeasibilities) < 1.49e-08

number of iterations = 17
primal objective value = 2.13966113e+01
dual objective value = 2.13966109e+01
gap := trace(XZ) = 4.30e-07
relative gap = 9.81e-09
actual relative gap = 9.79e-09
rel. primal infeas (scaled problem) = 3.80e-12
rel. dual " " " = 2.02e-12
rel. primal infeas (unscaled problem) = 0.00e+00
rel. dual " " " = 0.00e+00
norm(X), norm(y), norm(Z) = 2.2e+01, 3.0e+01, 4.2e+01
norm(A), norm(b), norm© = 4.2e+01, 2.0e+00, 4.1e+01
Total CPU time (secs) = 0.79
CPU time per iteration = 0.05
termination code = 0
DIMACS: 3.8e-12 0.0e+00 2.2e-11 0.0e+00 9.8e-09 9.8e-09


Status: Solved
Optimal value (cvx_optval): +5.1e-10


(Mark L. Stone) #6

Yes, that is not using successive approximation method.

Solution accuracy of FMINCON can be controlled by solver tolerances. FMINCON time can be affected by starting point. If you have a good starting point, FMINCON should be fast. Perhaps FMINCON would be faster if you supplied the Hessian, which you are apparently not doing.

CVX was not built for execution speed, rather, for modeling ease and saving human time.

I believe Stephen Boyd was referring to cvxgen https://cvxgen.com/docs/index.html , which only addresses LPs and convex QPs, therefore, not your problem.


(Erling D.Andersen) #7

At MOSEK we are working on handling these exponential terms directly using the exponential cone. See

https://docs.mosek.com/slides/2018/ismp2018/ismp-dahl.pdf

It provides good accuracy results. it is not available in Cvx yet since it is still alpha code. But you can contact MOSEK support if you want to give it a shot.


(Hugh) #8

Thank you for the help. I think use SeDuMi right now is going to be enough for the proof of concept. I might look at MOSEK for later potentially.

Speaking of the exponential cone, would CVX be capable of the generalization of the above model: take a as positive reals, but now B and X are symmetric matrices and then

f(X) = sum_X ( a .* exp( tr( B * X ) ) )

we have in the past used NLOPT on this fairly successfully, but I have on and off thought this looks like it ought to be convex and that this structure ought to be exploitable.


(Mark L. Stone) #9

You may get some speedup using the current version of MOSEK (8.1) rather than SDPT3 or SeDuMi. That is because CVXQUAD converts the exponentials into 2 by 2 LMIs, which I believe are presented by CVX to the solver as second order cone constraints. So you will benefit by using a better SOCP solver (MOSEK 8.1) rather than SDPT3 or SeDuMii.

When MOSEK 9 does become available, I don’t believe you will be able to access MOSEK’s native exponential cone capability from CVX until a CVX version which can exploit it becomes available - I have no idea what plans, if any, there are for doing so. As of now, CVX 3.0beta actually can use the native exponential cone capability of ECOS and SCS, but unfortunately, CVX 3.0beta has many bugs, and is at best a crapshoot.

As for your generalization, you can do the following (symmetry not required on anything, and you can adjust the (now vector) constraint as meaningful for your problem):

n = 3; a = rand(1,n); B = rand(n); C = 2 ; c_0 = 1;
cvx_begin
variable X(n,n) symmetric
minimize(sum(a*exp(trace(B*X))))
    subject to
        sum(exp(X))  <= C/c_0
cvx_end

=====================================
Using Pade approximation for exponential
cone with parameters m=3, k=3

Calling SDPT3 4.0: 174 variables, 69 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
------------------------------------------------------------

 num. of constraints = 69
 dim. of sdp    var  = 108,   num. of sdp  blk  = 54
 dim. of linear var  = 12
*******************************************************************
   SDPT3: Infeasible path-following algorithms
*******************************************************************
 version  predcorr  gam  expon  scale_data
   HKM      1      0.000   1        0    
it pstep dstep pinfeas dinfeas  gap      prim-obj      dual-obj    cputime
-------------------------------------------------------------------
 0|0.000|0.000|1.1e+01|9.3e+00|2.7e+04| 1.146782e+03  0.000000e+00| 0:0:00| chol  1  1 
 1|0.547|0.821|4.9e+00|1.8e+00|1.0e+04| 1.047300e+03  2.476639e+03| 0:0:00| chol  1  1 
 2|0.056|0.691|4.6e+00|5.5e-01|1.5e+04| 1.093062e+03  1.355218e+05| 0:0:00| chol  1  1 
 3|0.001|0.347|4.6e+00|3.6e-01|4.1e+06| 1.098287e+03  6.961621e+08| 0:0:00| chol  1  1 
 4|0.000|0.001|4.6e+00|3.6e-01|1.2e+08| 1.463785e+03  2.271020e+10| 0:0:00| chol  2  2 
 5|0.001|0.000|4.6e+00|3.6e-01|3.6e+08| 2.731265e+04  7.761666e+10| 0:0:00| chol  1  1 
 6|0.000|0.000|4.6e+00|3.6e-01|5.7e+09| 5.416884e+04  1.267585e+12| 0:0:00| chol  1  1 
 7|0.000|0.000|4.6e+00|3.6e-01|2.1e+10| 9.720075e+05  5.219580e+12| 0:0:00| chol  1  1 
 8|0.000|0.000|4.6e+00|3.6e-01|2.9e+11| 2.212709e+06  7.403608e+13| 0:0:00| chol  1  1 
 9|0.000|0.000|4.6e+00|3.6e-01|1.1e+12| 3.681163e+07  2.977169e+14| 0:0:00| chol  1  1 
10|0.000|0.000|4.6e+00|3.6e-01|1.4e+13| 8.621263e+07  4.093203e+15| 0:0:00| chol  1  2 
11|0.000|0.000|4.6e+00|3.6e-01|5.6e+13| 6.802093e+08  1.677517e+16| 0:0:00| chol  1  2 
12|0.000|0.000|4.6e+00|3.6e-01|3.4e+14| 1.220113e+09  1.030532e+17| 0:0:00| chol  1  2 
13|0.000|0.000|4.6e+00|4.5e-01|1.0e+15| 4.237204e+09  3.019876e+17| 0:0:00| chol  1  2 
14|0.000|0.000|4.6e+00|8.8e-01|2.8e+15| 1.276803e+10  8.621951e+17| 0:0:00| chol  1  2 
15|0.000|0.000|4.6e+00|3.7e-01|9.9e+15| 3.644203e+10  3.054953e+18| 0:0:00|
  sqlp stop: primal problem is suspected of being infeasible
-------------------------------------------------------------------
 number of iterations   = 15
 residual of primal infeasibility      
 certificate (y,Z)      = 4.05e-18
 reldist to infeas.    <= 2.06e-18
 Total CPU time (secs)  = 0.22  
 CPU time per iteration = 0.01  
 termination code       =  1
 DIMACS: 9.6e+00  0.0e+00  2.1e+00  0.0e+00  -1.0e+00  3.3e-03
-------------------------------------------------------------------
 
------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): +0

(Hugh) #10

That’s very promising indeed. I wasn’t quite clear enough in my question though, it would be more akin to (in matlab notation). I have to resort to the anon functions as matlab isn’t the best at tensor multiplication.

a = rand(1,n); B = rand(2,2,n); X = zeros(2,2,n);

f = @(x) fM(x,a,B);

function [ f ] = fM(x,A,B)
f = 0;
for i = 1:numel(A)
f = f + A(n) * exp( trace(x(:,:,n) * B(:,:,n) );
end

end


(Mark L. Stone) #11

You can build up CVX expressions in for loops, the same way your anonymous function code does, except, of course, X is declared as a symmetric variable for each of n slices, as shown in mcg’s answer at Cvx sdp mode and cell arrays. I.e., variable X(2,2,n) symmetric. After the for loop, f can be your objective function.