Multinomial loss with group lasso optimization problem

Ynot · July 25, 2013, 3:07am

Please help. I am new to CVX.
I am trying to use CVX to solve this problem with respect to $\beta$:
$$\text{minimize}\; -\sum_{i=1}^n\sum_{k=1}^K y_{ik}\log p_{ik} + \lambda\sum_{k=1}^{K-1}\|\beta_k\|_2$$
$$\text{where } p_{ik}=\frac{\exp(x_i^\top\beta_k)}{1+\sum_{j=1}^{K-1}\exp(x_i^\top\beta_j)}$$
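For reference, here is how the objective expands once the probabilities are substituted. This is a sketch assuming one-hot (dummy-coded) labels, so that $\sum_{k=1}^{K}y_{ik}=1$ for every $i$, and taking the reference class to have $p_{iK}=1/\big(1+\sum_{j=1}^{K-1}\exp(x_i^\top\beta_j)\big)$, i.e. $\beta_K=0$:

$$
\begin{aligned}
-\sum_{i=1}^n\sum_{k=1}^K y_{ik}\log p_{ik}
&= -\sum_{i=1}^n\Big[\sum_{k=1}^{K-1} y_{ik}\,x_i^\top\beta_k - \Big(\sum_{k=1}^{K} y_{ik}\Big)\log\Big(1+\sum_{j=1}^{K-1}\exp(x_i^\top\beta_j)\Big)\Big]\\
&= -\sum_{i=1}^n\sum_{k=1}^{K-1} y_{ik}\,x_i^\top\beta_k + \sum_{i=1}^n\log\Big(1+\sum_{j=1}^{K-1}\exp(x_i^\top\beta_j)\Big).
\end{aligned}
$$

The log term appears exactly once per observation, including observations from the reference class, while the linear term only involves the first $K-1$ classes.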

I have tried using for loops with the expression below, but was not successful:

$$ \text{minimize}\; -\sum_{k=1}^{K-1}\Big[y(:,k)^\top X(:,:,k)\,\beta(:,k)-\sum_{i=1}^n \log\Big(1+\sum_{j=1}^{K-1}\exp\big(X(:,:,k)\,\beta(:,k)\big)\Big)\Big] + \lambda\sum_{k=1}^{K-1}\|\beta_k\|_2 $$

My data structure is as follows:

n = number of observations
p = number of variables
K = number of groups (the last group is the reference group)
size(X) = n x p x (K-1)
size(y) = n x (K-1)
size(beta) = p x (K-1)

X and y are replicated to have dummy variables for each category.

How can I restructure my variables to avoid for loops?

Thank you

Bien · July 25, 2013, 6:22am

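Yes, this is possible without any for loops. You can keep X as an $n\times p$ matrix without duplicating it $K-1$ times (at least that's what I think you were doing?). Collect the coefficients into the $p\times(K-1)$ matrix $B=[\beta_1,\dots,\beta_{K-1}]$. The first term in the negative log-likelihood is then simply $-\langle Y,XB\rangle=-\operatorname{tr}(Y^\top XB)$. For the second term, you can use the CVX function log_sum_exp, which, when applied to a matrix $A$, returns a row vector whose $j$th component is $\log(\sum_i\exp A_{ij})$. In your case, the matrix $A$ you want is the transpose of $[XB,\ 0_n]$: the zero column gives the 1 that we need inside each $\log$. The result of this log_sum_exp expression is a vector with one entry per observation, $\log(1+\sum_{j=1}^{K-1}\exp(x_i^\top\beta_j))$. Each entry is weighted by $\sum_{k=1}^{K}y_{ik}$, which is 1 for every observation when the labels are dummy coded (including reference-class observations, whose row of Y is all zeros), so the entries are simply summed. In your expression above, this term sits inside the sum over $k$, so it is counted $K-1$ times instead of once per observation. Finally, the penalty can be expressed compactly as sum(norms(B)), since norms(B) returns the 2-norm of each column of B.

In summary, I think this is a correct CVX translation of your problem that doesn't use any for loops:

```matlab
cvx_begin
    variable B(p,K-1)
    % -<Y,XB> + one log-partition term per observation + group-lasso penalty on the columns of B
    minimize( -trace(Y'*X*B) + sum(log_sum_exp([X*B,zeros(n,1)]')) + lambda*sum(norms(B)) )
cvx_end
```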
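To check the translation term by term, here is how each piece of the CVX expression corresponds to the objective, assuming X is the $n\times p$ matrix with rows $x_i^\top$, $y_k$ denotes the $k$th column of Y, $B=[\beta_1,\dots,\beta_{K-1}]$, and $\mathrm{LSE}$ stands for CVX's column-wise log_sum_exp:

$$
\begin{aligned}
\operatorname{tr}(Y^\top XB) &= \sum_{k=1}^{K-1} y_k^\top X\beta_k = \sum_{i=1}^n\sum_{k=1}^{K-1} y_{ik}\,x_i^\top\beta_k,\\
\big(\mathrm{LSE}([XB,\;0_n]^\top)\big)_i &= \log\Big(e^{0}+\sum_{j=1}^{K-1}e^{x_i^\top\beta_j}\Big) = \log\Big(1+\sum_{j=1}^{K-1}\exp(x_i^\top\beta_j)\Big),\\
\texttt{sum(norms(B))} &= \sum_{k=1}^{K-1}\|\beta_k\|_2 .
\end{aligned}
$$

Because the objective is built only from an affine term, log_sum_exp of an affine expression, and norms of an affine expression, it is convex and follows CVX's disciplined convex programming rules as written.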

Ynot · July 25, 2013, 2:49pm

Thank you! I will try this