# Multinomial loss with group lasso optimization problem

I am trying to use cvx to solve this problem with respect to \beta:
$$\text{minimize}\; -\sum_{i=1}^n\sum_{k=1}^K y_{ik}\log p_{ik} + \lambda\sum_{k=1}^{K-1}\|\beta_k\|_2$$
$$\text{where } p_{ik}=\frac{\exp(x_i^t\beta_k)}{1+\sum_{k'=1}^{K-1}\exp(x_i^t\beta_{k'})}$$

I have tried to use for loops with the expression below, but was not successful:

$$\text{minimize}\; -\sum_{k=1}^{K-1}\left[y(:,k)^{t}X(:,:,k)\beta(:,k)-\sum_{i=1}^n \log\Big(1+\sum_{k'=1}^{K-1}\exp\big(X(i,:,k')\beta(:,k')\big)\Big)\right] + \lambda\sum_{k=1}^{K-1}\|\beta_k\|_2$$

My data structure is as follows:

• n=number of observations
• p=number of variables
• K=number of groups (the last group is the reference group)
• size(X)=n x p x K-1
• size(y)=n x K-1
• size(\beta)=p x K-1

X and y are replicated so that there are dummy variables for each category.
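As a sanity check of the formulas above, here is a small loop-based sketch of the negative log-likelihood in numpy (Python used purely for illustration; the sizes and random data are made up, and the reference category is handled explicitly as $p_{iK}=1/(1+\sum_{k'}\exp(x_i^t\beta_{k'}))$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 50, 4, 3                       # illustrative sizes
X = rng.standard_normal((n, p))          # one shared design matrix
beta = rng.standard_normal((p, K - 1))   # coefficients for non-reference groups
labels = rng.integers(0, K, size=n)      # category of each observation
Y = np.eye(K)[labels][:, :K - 1]         # one-hot indicators, reference column dropped

def nll_loops(X, Y, beta):
    """Loop-based negative log-likelihood, mirroring the formulas above."""
    n, K1 = Y.shape
    total = 0.0
    for i in range(n):
        denom = 1.0 + sum(np.exp(X[i] @ beta[:, k]) for k in range(K1))
        for k in range(K1):
            p_ik = np.exp(X[i] @ beta[:, k]) / denom
            total -= Y[i, k] * np.log(p_ik)
        # reference category: p_iK = 1/denom, y_iK = 1 - sum_k y_ik
        total -= (1.0 - Y[i].sum()) * np.log(1.0 / denom)
    return total
```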

How can I restructure my variables to avoid for loops?

Thank you

Yes, this is possible without any for loops. You can keep X as an n\times p matrix without duplicating it K-1 times (at least, that's what I think you were doing?). The first term in the negative log-likelihood is simply -\langle Y,XB\rangle. For the next term, you can use the CVX function log_sum_exp, which, when applied to a matrix A, gives a row vector whose $j$th component is \log(\sum_i\exp A_{ij}). In your case, the matrix A you want is the transpose of [XB\;\,0_n]; the appended zero column supplies the 1 that we need inside each \log. The result of this log_sum_exp expression is an n-vector, which then needs to be multiplied by the matrix Y (this Y term is missing from your expression above) and summed. Finally, the penalty can be expressed compactly as sum(norms(B)), since norms(B) returns the 2-norm of each column of B.

In summary, I think this is a correct CVX-translation of your problem that doesn’t use any for loops:

```matlab
cvx_begin
    variable B(p,K-1)
    minimize( -trace(Y'*X*B) + sum(log_sum_exp([X*B,zeros(n,1)]')*Y) + lambda*sum(norms(B)) )
cvx_end
```
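Two of the building blocks in that objective, the trace term and the group-lasso penalty, can also be cross-checked in numpy (illustrative Python with made-up data; `sum(norms(B))` in CVX corresponds to the sum of column-wise 2-norms):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 50, 4, 3                      # illustrative sizes
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, K - 1))
Y = np.eye(K)[rng.integers(0, K, size=n)][:, :K - 1]

# First term: sum_i sum_k y_ik * x_i' beta_k equals trace(Y' * X * B).
loops = sum(Y[i, k] * (X[i] @ B[:, k])
            for i in range(n) for k in range(K - 1))
trace_form = np.trace(Y.T @ X @ B)

# Penalty: sum of the 2-norms of the columns of B, as in sum(norms(B)).
penalty = np.linalg.norm(B, axis=0).sum()
```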

Thank you! I will try this.