is a convex problem (see the details in the link above)

I know the special case (K=2, which is logistic regression) can be solved with CVX, and I have seen the formulation. Does anyone know the formulation for K > 2 (general softmax regression)?

Here’s a start. I found this nearby link a bit cleaner to read. We have
$$P(y=k|x) = \frac{e^{x^Tw_k}}{\sum_j e^{x^Tw_j}} = \frac{1}{\sum_j e^{x^T(w_j-w_k)}}$$
The cost function is
$$J(\theta) = -\sum_{i=1}^m \sum_{k=1}^K 1\{y^{(i)}=k\} \log \frac{1}{\sum_j e^{(x^{(i)})^T(w_j-w_k)}}$$
$$J(\theta) = \sum_{i=1}^m \sum_{k=1}^K 1\{y^{(i)}=k\} \log \sum_j e^{(x^{(i)})^T(w_j-w_k)}$$
$$J(\theta) = \sum_{i=1}^m \sum_{k=1}^K 1\{y^{(i)}=k\} \log \left(e^0 + \sum_{j\neq k} e^{(x^{(i)})^T(w_j-w_k)}\right)$$
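One more step may make the structure easier to see. For each sample the indicator is nonzero for exactly one class, $k = y^{(i)}$, so the inner sum over $k$ collapses. Writing $x^{(i)}$ for the $i$-th sample:
$$J(\theta) = \sum_{i=1}^m \log \sum_{j=1}^K e^{(x^{(i)})^T(w_j-w_{y^{(i)}})} = \sum_{i=1}^m \left[\log \sum_{j=1}^K e^{(x^{(i)})^T w_j} - (x^{(i)})^T w_{y^{(i)}}\right]$$
Each term is a log-sum-exp of affine functions of the $w_j$ (minus a linear term in the second form), which is why the whole cost is convex in the weights.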
This is just a sum of `log_sum_exp` functions of affine expressions in the weights, so I don’t see a problem with representing it in CVX, although with most solvers it of course requires the successive approximation method. The ECOS solver has recently added support for the exponential cone, which allows us to avoid the successive approximation method. That support is still a bit rough around the edges, but soon we’ll be able to solve problems like this in CVX more natively.
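If you want to sanity-check the rewriting before handing it to a solver, here is a minimal NumPy sketch (not CVX code) that compares the usual negative log-likelihood with the sum-of-log-sum-exp form on random data. The helper names, the 0-based labels, and the layout of `W` as an n-by-K matrix with one column per class are my own assumptions for illustration.

```python
import numpy as np

def log_sum_exp(z, axis):
    """Numerically stable log(sum(exp(z))) along the given axis."""
    z_max = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(z_max, axis=axis) + np.log(np.sum(np.exp(z - z_max), axis=axis))

def cost_nll(X, y, W):
    """Negative log-likelihood computed from the softmax log-probabilities."""
    scores = X @ W  # shape (m, K); entry (i, k) is x_i^T w_k
    log_probs = scores - log_sum_exp(scores, axis=1)[:, None]
    return -np.sum(log_probs[np.arange(len(y)), y])

def cost_lse_of_differences(X, y, W):
    """Same cost written as a sum of log-sum-exp terms of score differences."""
    scores = X @ W
    own = scores[np.arange(len(y)), y]  # x_i^T w_{y_i}
    diffs = scores - own[:, None]       # entry (i, j) is x_i^T (w_j - w_{y_i})
    return np.sum(log_sum_exp(diffs, axis=1))

# Random test problem; labels are 0-based here (an assumption of this sketch).
rng = np.random.default_rng(0)
m, n, K = 50, 4, 3
X = rng.normal(size=(m, n))
y = rng.integers(0, K, size=m)
W = rng.normal(size=(n, K))

gap = abs(cost_nll(X, y, W) - cost_lse_of_differences(X, y, W))
print("difference between the two forms:", gap)
```

The two values should agree up to floating-point rounding. In a modeling language you would build the second form from the built-in log-sum-exp atom applied to affine expressions of the weights.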