Formulating Sparse Group lasso in CVX


I would like to implement sparse group lasso using CVX, but I do n’t know how formulate it.
Does anyone help me?


At top level, this will be a non-convex optimization problem, right? However, if you fix the blocks, do you then have a convex optimization problem which can be written in compliance with CVX’s rules (using for instance 1-norm and 2-norm)? If so, do you have a higher level optimization structure to optimize over block selection within which to embed the convex optimization inner problem resulting from fixing the blocks? if so, you could call CVX to solve the inner problem within a top level loop you write in MATLAB. I am not telling you whether such an approach, even if possible, will be “good”.

If you can figure that out, you may be set. If not, you may need to get help elsewhere.

I suggest you offer us what you have tried, and if we can find issues, we can suggest corrections. But we’re not in the business of offering models from scratch here; you’ll need to learn to use CVX by studying the documentation and the examples.

Hi @mcg ,

Finally, I wrote something and I d’ like to know your opinion. If there is something wrong, please let me know.

Mathematical formulation:

Let Y=[n \times 1] and X=[n \times p] be the response and independent variables, respectively, where n is the number of samples and p is the number of variables. In addition, the set of variables have been broken down into m subgroups, X^{(1)}, X^{(2)}, \ldots, X^{(m)} where each X^{(l)} is an n \times p_l matrices where p_l is the number of covariates in group l.
We aim to find \beta values and Y^{(0)} which minimize the below equation:

\min_{\beta,Y^{0}} \bigg{ \big\lVert Y-\sum_{l=1}^{m} X^{(l)} \beta^{(l)} - Y^{0}\big\rVert_{2}^{2} + \lambda_1 \sum_{l=1}^{m} \sqrt{p_l} \big\lVert \beta^{l} \big\rVert_2 + \lambda_2 \big\lVert \beta^{(l)} \big\rVert_1 \bigg}

subject to \beta^{(l)} \leq 0.

Implementation in MATLAB using CVX:

m = size(unique(groups),1); % Number of groups
p = size(X,2);              % Number of variables
n = size(X,1);              % Number of samples

group_idx   % is a logical matrix [m,p]; Each row is a group, columns are variables
not_group_idx = ~group_idx'  % I use it in CVX statements
sqr_group_sizes % is a vector [m,1] square root of group sizes

    variables beta(p,m) Y0(1)
    minimize( square_pos (norm (Y - sum (X * beta, 2 ) - Y0 *ones (n, 1) ) ) + l1 * norms( beta, 2, 1) * sqr_group_sizes + l2 * sum( abs( beta(:) ) ) )

subject to
        beta <= 0 
        % Zero-Constraint for grouping
        beta(not_group_idx) == 0 

I am looking forward to hearing from you.

I haven’t done a careful check on what you’ve done - I’ll defer to mcg or others more familiar with Sparse Group Lasso for that. I will note that while not incorrect, you don’t need that ones(n, 1) multiplying Y0.

You appear to just be showing us the inner loop in the approach I described above, in which you would also need to implement an outer loop to optimize over the groupings. So that is works which perhaps you still need to perform.

What do you mean by “outer loop to optimize over the groupings”?
I have considered it as constraint:

subject to
     beta(not_group_idx) == 0

I withdraw all my comments in this thread. I was thinking group compositions were optimally determined, not provided as input. I defer to others to determine the correctness or goodness of your approach.

As far as I can tell the CVX model looks like a faithful representation of the mathematical formulation you have presented. I cannot speak to the effectiveness of the model itself. Note that you can also use sum_square instead of square_pos(norm( )).

Thanks for your suggestion.
I did not know about sum_square.

It seems your example doesn’t allow different group sizes (beta(p,m) in cvx variable declaration). However, group LASSO does. You may refer to the following example
(The formula used is Eq.3 in Simon, Noah, and Robert Tibshirani. “Standardization and the group lasso penalty.” Statistica Sinica 22.3 (2012): 983.)

% Refer to Eq. (3) in /Simon, Noah, and Robert Tibshirani. 
%    "Standardization and the group lasso penalty." 
%    Statistica Sinica 22.3 (2012): 983./

% Note that group LASSO allows different group sizes
N = 64; m = 3;   
rho = [2; 4; 6];  % group sizes
n = sum(rho); % num of total parameters
X = rand(N,n);   % X = [X1, X2, ..., X_m]
y = rand(N,1);
lambda = 1;

IndexM = [1, 2; 3, 6; 7, 12];  % indexes of elements in each group
    % w = [beta1'; beta2'; ...; beta_m']
    variable w(n)    
    expression ws(m)
    for i = 1:m
        ws(i) = norm(w(IndexM(i,1):IndexM(i,2)),2);
    minimize( norm(y-X*w, 2) +  lambda*(sqrt(rho)' * ws) )

% get beta_i, i.e. i-th beta corresponding to i-th group
% e.g. 
i = 2;
beta_i = w(IndexM(i,1):IndexM(i,2));

Oh, I am wrong. It supports different group sizes. I overlooked the constraint part.

I know you’ve ever implemented the group lasso using CVX toolbox,I would like to know which algorithm is used to solve the optimization problem?Gradient descent or something else?

Thank you.

The algorithm used depends on which solver you specify in CVX. Fortunately, none of them use gradient descent.

Thanks.I got it.I used the default solver sdpt.

CVX does some pre-processing, then provides the problem to SDPT3, which uses an infeasible primal-dual predictor-corrector path-following method . That is quite different than gradient descent.

Thanks a lot,I will study it.

I am currently studying and implementing Sparse group lasso in a case with 1056 features in input, divided into 176 groups. Is it too big problem for the cvx optimization?
Thanks in advance