Failed answer using CVX

(Omar El Sherbeny) #1

Hello, I am using MATLAB CVX for classification. The input data is randomly generated positive and negative numbers as the samples, with labels of -1 for negative numbers and +1 for positive numbers. First I train CVX on some randomly generated training data, and then I test it. In the test I ask CVX to predict the labels, but for some reason it always predicts all the labels to be -1. I tried this with two different classification formulations from the same paper I am reading, “Interaction between financial risk measures and machine learning methods” by Jun-ya Gotoh.

Also, when I increase the number of samples (s) above 1000, the training code doesn’t work.

Here is the code:

clc 
close all 
clear all 
s=1000; %number of samples 
%randomly generated samples 
samplesneg1 = -1 +.07*rand(s,1);
samples1 = 1+.07*rand(s,1);
y = [ zeros(s/2,1) ; ones(s/2,1) ];

for i =1:length(y)
    if(y(i)<1)
        y(i)=-1;
    end
end


trainneg1 = samplesneg1(1:s/2 , :);
train1 = samples1(1:s/2 , :);
x = [ trainneg1 ; train1 ];


m=length(y);

%generating test samples 
xt= [ samplesneg1((s/2 +1):s, :) ; samples1((s/2 +1):s , :) ];
xt = xt (randperm (length (xt)));

n= size(x(1,:)); %size of one sample 


cvx_begin 

    variables w(n)   
        
        minimize ( ((1/2)* norm(w,2)^2));
        
        subject to 
        
        for i=1:m
            ( y(i)*((w*(x(i,:))') -1)) >= 0;
        end
      
    
cvx_end 


disp('after')

 labelsT = [];
for i=1:length(xt)
    if(xt(i)<0)
        labelsT(i)=-1;
        
    else if (xt(i)>0)
            labelsT(i)=1;
        end
    end
end



w1=w;
u=length(xt);

cvx_begin 

    variables yf(s) 
        
        minimize ( ((1/2)* norm(w1,2)^2) );
       
        
        subject to 
        
        for i=1:m
            (yf(i)* ((w1*(xt(i,:))') -1 )) >= 0;
        end
        
cvx_end 



for i=1:length(yf)
    if(yf(i)<=0)
        yf(i)=-1;
    else
        yf(i)=1;
    end
end
    
    c=0;
    L = labelsT;
for i=1:length(L)
    if(L(i)~=yf(i))
       c=c+1;
    end
end

error = (c/length(L))*100
(Mark L. Stone) #2

In your 2nd CVX invocation, CVX is able to analytically determine an optimal solution without calling the solver. Specifically, the only constraints are, for i from 1 to m,
(yf(i)* ((w1*(xt(i,:))') -1 )) >= 0
Clearly, as CVX determines analytically, yf = the all-zeros vector is feasible. w is not declared as a variable (nor is it a CVX expression), so there is nothing to optimize; this is just a feasibility problem, for which yf = the all-zeros vector is feasible, and therefore “optimal”.
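For what it’s worth, once w has been obtained from the first (training) problem, prediction needs no second optimization at all. A minimal sketch, reusing the variable names from the posted code (no bias term, matching the hard-margin formulation there):

```matlab
% Predict labels for the test samples xt with the trained weights w.
% xt is u-by-1 here and w is a scalar, so xt*w is elementwise.
yf = sign(xt * w);   % +1 or -1 for each test sample
yf(yf == 0) = 1;     % break exact ties (score of 0) toward +1
```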

The value of n comes out to [1 1] (the size vector of a 1-by-1 row), which I suspect is not what you want. But that is a MATLAB matter, not a CVX matter, for you to deal with. As a result, in your first CVX invocation,
variables w(n) declares a scalar variable, which I doubt is what you want.
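To illustrate the MATLAB pitfall (a standalone sketch, not part of the posted code): size applied to one row returns the full size vector, not the number of columns:

```matlab
x = rand(10, 1);    % 10 samples, 1 feature, as in the posted code
n = size(x(1,:));   % n is [1 1] -- the size vector of a 1-by-1 row
d = size(x, 2);     % d is 1 -- the feature dimension, likely what was intended
```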

Rather than using (1/2)*norm(w1,2)^2 as the objective function, it would be better to use norm(w1,2), which is equivalent in the sense of producing the same argmin in exact arithmetic, but is more numerically stable and reliable.

If your objective did not produce an error message, you must have been using CVX 3.0beta. I recommend you use CVX 2.1 instead, because CVX 3.0beta has many bugs and may produce a wrong answer without any error or warning messages. If you do use CVX 2.1, I recommend you don’t square the norm; but if you really want to, use square_pos(norm(w1,2)) to comply with CVX 2.1’s DCP rules, which are stricter than what CVX 3.0beta allows.
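Putting this together, a corrected training block might look like the following sketch (CVX 2.1 syntax; X is the m-by-d matrix of training samples and y the m-vector of ±1 labels from the posted code). Note that the standard hard-margin constraint is y_i*(w'*x_i) >= 1, with the label multiplying the whole score, whereas the posted code places the -1 inside the product, which changes the constraint for the -1 class:

```matlab
d = size(X, 2);              % feature dimension (1 in the posted example)
cvx_begin
    variable w(d)
    minimize( norm(w, 2) )   % same argmin as (1/2)*norm(w,2)^2, numerically nicer
    subject to
        y .* (X * w) >= 1;   % vectorized hard-margin constraints
cvx_end
```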

As for the training code “not working” when the number of samples is over 1000: specifically, what happened? I tried the code with s = 2000, and it worked as well (if you want to call it that) as with s = 1000.

I didn’t look very carefully at what you did, so don’t assume everything not mentioned is correct. Nor did I examine at all whether your statistical procedure makes any sense.