Logistic Regression + linear programming - getting only Nan

MTuk · May 3, 2018, 11:15am

Hello,

I need your help in solving the following problem:
First of all, given the objective function of logistic regression, my aim is to find the farthest point p along a direction q such that g( p ) = 1, which gives the following optimization problem:

cvx_begin
cvx_precision best
%cvx_solver SeDuMi
cvx_expert true
variable p( d );
maximize(f( p ));
subject to
g( p ) <=1;
cvx_end

% d is the dimension

where g( p ) = norm(p(1:end-1),2) + 1/N * sum(log(exp(-Y.*(X*p)) + 1)) and f( p ) = dot(p,q) for some given direction q, usually a unit vector.
The problem is that when I run the optimization on logistic regression, it fails, while on other objective functions such as SVM, it works fine!

The data that I use is randomly sampled from a logistic distribution, and the label of each point is chosen randomly from -1, +1.
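For concreteness, the setup described above can be sketched in plain MATLAB. The sizes N and d, the trailing column of ones in X (treating the last entry of p as a bias), and the way q is drawn are assumptions made for illustration only; they are not taken from the actual data.

```matlab
% Sketch only: N, d, the bias column, and q are assumed values for illustration.
N = 2000;                              % number of points (assumed)
d = 3;                                 % dimension of p, last entry treated as a bias (assumed)

% Logistic-distributed features via the inverse CDF: if U ~ Uniform(0,1),
% then log(U/(1-U)) follows a standard logistic distribution.
U = rand(N, d - 1);
X = [log(U ./ (1 - U)), ones(N, 1)];   % trailing column of ones for the bias (assumed)
Y = 2 * (rand(N, 1) > 0.5) - 1;        % labels drawn uniformly from {-1, +1}

q = randn(d, 1);
q = q / norm(q);                       % unit direction

% g and f as defined above, written as plain MATLAB function handles.
g = @(p) norm(p(1:end-1), 2) + sum(log(exp(-Y .* (X * p)) + 1)) / N;
f = @(p) dot(p, q);

g0 = g(zeros(d, 1));                   % every term of the sum is log(2) at p = 0
```

At p = 0 each term of the sum equals log(2), so g(0) = log(2), which is about 0.693 and below 1. The origin is therefore strictly feasible for the constraint g( p ) <= 1.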

Below I have attached the progress output from the CVX optimization tool:

Successive approximation method to be employed.
For improved efficiency, SDPT3 is solving the dual problem.
SDPT3 will be called several times to refine the solution.
Original size: 6005 variables, 2005 equality constraints
2000 exponentials add 16000 variables, 10000 equality constraints

Cones | Errors |
Mov/Act | Centering Exp cone Poly cone | Status
--------+---------------------------------+---------
2000/2000 | 8.000e+00 1.986e+02 1.184e+02 | Solved
1944/1988 | 8.000e+00 7.283e+01 4.932e+01 | Solved
281/320 | 2.936e+00 2.576e-01 1.621e-01 | Inaccurate/Solved
2/ 44 | 4.957e-05 3.842e-02 3.691e-02 | Inaccurate/Solved
28/133 | 1.838e+00 1.095e+00 1.094e+00 | Solved
118/118 | 8.000e+00 1.231e+01 0.000e+00 | Inaccurate/Solved
96/178 | 8.000e+00 1.650e+01 4.920e-01 | Solved
109/237 | 8.000e+00 2.562e+01 6.209e-01 | Inaccurate/Solved
116/116 | 8.000e+00 3.373e+01 0.000e+00 | Inaccurate/Solved
121/121 | 8.000e+00 4.335e+01 0.000e+00 | Inaccurate/Solved
109/259 | 8.000e+00 4.956e+01 8.309e-01 | Inaccurate/Solved
104/280 | 2.627e+00 5.400e+01 1.321e+00 | Solved
121/121 | 2.387e-05 6.215e+01 0.000e+00 | Inaccurate/Solved
117/117 | 9.014e-06 6.050e+01 0.000e+00 | Solved
121/121 | 9.037e-06 6.217e+01 0.000e+00 | Inaccurate/Solved
86/181 | 1.724e-04 5.566e+01 7.251e-01 | Solved
117/117 | 3.327e-05 6.050e+01 0.000e+00 | Inaccurate/Solved
121/121 | 8.779e-06 6.195e+01 0.000e+00 | Inaccurate/Solved
117/117 | 8.779e-06 6.051e+01 0.000e+00 | Inaccurate/Solved
121/179 | 7.823e-06 5.987e+01 8.343e-02 | Inaccurate/Solved
121/121 | 8.393e-06 6.194e+01 0.000e+00 | Inaccurate/Solved
121/121 | 1.986e-08 6.193e+01 0.000e+00 | Inaccurate/Solved
92/287 | 2.093e-04 5.717e+01 3.017e+00 | Solved
117/280 | 6.313e-05 6.176e+01 7.350e-01 | Inaccurate/Solved
99/142 | 9.051e-05 6.029e+01 1.346e-01 | Solved

Status: Failed
Optimal value (cvx_optval): NaN

All of this was done in the MATLAB environment.

Please advise and thanks in advance.

Mark_L_Stone · May 3, 2018, 11:45am

I haven’t checked to determine whether your program is correct. (Perhaps you’ve already seen http://web.cvxr.com/cvx/examples/cvxbook/Ch07_statistical_estim/html/logistics.html and http://web.cvxr.com/cvx/examples/cvxbook/Ch07_statistical_estim/html/logistics_gp.html )

Presuming your formulation is “good”, you are still subject to the limitations of CVX’s successive approximation method - read the disclaimers at http://cvxr.com/cvx/doc/advanced.html#the-successive-approximation-method . You can avoid CVX’s successive approximation method by installing CVXQUAD https://github.com/hfawzi/cvxquad and the exponential.m replacement for CVX’s version, as discussed at the link. This will invoke CVXQUAD’s Pade Approximant method instead of CVX’s successive approximation method for Geometric Programming mode and for certain log and exp related functions, such as entr, rel_entr, and log_sum_exp, and I think it should automatically be invoked for log(sum(exp(…))) as you have. If it is invoked, you should see the message “Using Pade approximation for exponential cone with parameters m=3, k=3”.

Please remove cvx_expert true from your program. If you see that the successive approximation method was invoked, you may need to reformulate your program in order to get CVXQUAD invoked. If CVXQUAD’s Pade approximation is invoked instead of CVX’s successive approximation method, the solution should be more reliable, and hopefully succeed, unless you have a poorly formulated problem.
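A minimal sketch of what such a reformulation might look like, assuming CVX and CVXQUAD (with its exponential.m replacement) are installed and that X (N-by-d), Y (N-by-1 with entries -1/+1), q (d-by-1), N and d already exist in the workspace. It drops cvx_expert true and writes each log(exp(-z) + 1) term as log_sum_exp([0; -z]), similar to the pattern in the CVX logistics example linked above. Whether this triggers the Pade approximation, or resolves the failure, has not been tested here.

```matlab
% Sketch only: assumes X, Y, q, N and d are already defined in the workspace.
cvx_begin
    variable p(d)
    maximize( q' * p )
    subject to
        % log_sum_exp acts down each column of the 2-by-N matrix, so column i
        % gives log(exp(0) + exp(-Y(i) * X(i,:) * p)) = log(1 + exp(-z_i)).
        norm(p(1:end-1), 2) + sum(log_sum_exp([zeros(1, N); -(Y .* (X * p))'])) / N <= 1;
cvx_end

% Inspect the outcome reported by CVX.
disp(cvx_status);
disp(cvx_optval);
```

If the solver log still begins with “Successive approximation method to be employed”, the Pade approximation was not picked up for this formulation.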

Edit: Note that I fixed the link to CVXQUAD, which is https://github.com/hfawzi/cvxquad .

kiran · May 9, 2018, 10:24pm

Thank you a ton for your quickest replies! I will try this now. Thanks again for your great help
