TFOCS for mixed trace and $\ell_1$-norm minimization

SB7 · May 19, 2015, 8:58pm

I would like to solve
[\begin{aligned}
\min_{X\succcurlyeq 0}& &\mathsf{trace}\left(X\right)+\lambda \lVert X \rVert_1\
\text{subject to} & &\lVert \mathcal{A}\left(X\right)-b\rVert_2 \leq \varepsilon
\end{aligned}]
numerically, where \lambda\geq0, \varepsilon\geq 0, b, and the linear operator \mathcal{A} are given. Also \lVert X\rVert_1 denotes the \ell_1-norm of X. I wonder if it’s possible to implement this program in TFOCS.

I tried to apply tfunc_sum to get a mixture of prox_trace and prox_l1 as the objective function, but that doesn’t seem to work. (the error message was saying there was too many input arguments when the objective was called in one of the subroutines.)

I’d appreciate it if you can tell me if and how I can implement the optimization above in TFOCS.

Stephen_Becker · May 19, 2015, 10:07pm

tfunc_sum creates the right object, but it’s not usable because you’d have to know the proximity operator of two functions, which is not generally known. So the standard approach would be to pick one of those functions and dualize it, since then we can handle as many of these terms as we like. In that case, no need to use tfunc_sum, just add it to the cell array of dual variables.

Note that if it was just the l1 norm and the trace, then because the trace is linear, we could do the prox in closed form. However, the PSD constraint interferes.

To see how to work with this, look at my RPCA demo as a starting point. You can keep either trace (and PSD) as primal, and dualize the l1 norm (so use proj_l1), or vice-versa (so keep prox_l1 and use proj_spectral(1,'symm') ).

SB7 · May 20, 2015, 1:29am

Thanks. It seems to be working, although I guess you meant proj_linf rather than proj_l1 to dualize the \ell_1-term. I have also added an affine constraint with identity operator corresponding to the extra dual variable.

SB7 · August 5, 2015, 7:58pm

@Stephen_Becker I have a follow up question. Suppose that we keep the TFOCS’ options (e.g., tolerance, max iterations, etc) and the smoothing parameter \mu the same. Is it reasonable to expect that the solution to the above convex program, as solved by TFOCS, tends to the solution of the convex program

\begin{align*} \min_{X} \ \lVert X \rVert_1 \tag{P$_\infty$}\\ \text{subject to } & \lVert \mathcal{A}\left(X\right)-b\rVert_2 \leq \varepsilon, \end{align*}

when \lambda tends to \infty?

I think that is the ideal behaviour, but it seems that for any \lambda<\infty, TFOCS’ solution is significantly different from that of \mathrm{(P_\infty)}.

UPDATE: I have to clarify that I encountered the issue described above when I keep the trace norm (i.e., prox_trace) in the objective and dualize the \ell_1-norm. When \ell_1-norm is in the objective and I dualize the trace norm via proj_maxEig everything seems reasonable. Is there something special about having objective functions that operate on PSD matrices?

shirleyxq · October 7, 2022, 1:43pm

Hi SB7,

Thanks for your post! My name is Shirley and I’m a PhD student at Cambridge. I’ve been solving the exact same mixed-norm minimization problem. However, unfortunately I don’t have much background in convex optimization, so haven’t figured out how to realize my mixed norm objective function + a PSD constraint. Would you be kind enough to share your source code with me? I’d be grateful if you could either share your code here or email me at xl394@cam.ac.uk.

Thanks so much
Shirley

shirleyxq · October 7, 2022, 2:18pm

Hi Stephen,

I’m an information theory PhD student at Cambridge, and am relatively inexperienced in convex optimization. Could you elaborate what you meant by “dualize” one of the terms in the objective function? Are there some lectures/ lecture notes that I can look into?

Thanks,
Shirley

Mark_L_Stone · October 7, 2022, 2:30pm

@shirleyxq See my post at How to include PSD constraint in TFOCS . Unfortunately, TFOCS developer @Stephen_Becker was last “seen” on this forum Nov 12, 2016, although perhaps your post will trigger a notification to him.

Stephen_Becker · October 10, 2022, 8:45am

Hi Shirley,

By dualize, I meant find the Fenchel-Legendre conjugate. Often you can find this very quickly if you know a few tricks, e.g., the conjugate function of a norm is the indicator function of the dual norm (so, the proximity operator of the l1 norm is the projection operator onto the l_infinity ball, and vice-versa).

Unfortunately I don’t have time to help with individual codes these days, but for more references on the conjugate function, you could try the books by Amir Beck (his '17 book) or Combettes & Bauschke, or try these:

lecture notes from my class; see page 51 at https://github.com/stephenbeckr/convex-optimization-class/blob/master/TypedNotes/lecture_notes.pdf
SIGNAL RECOVERY BY PROXIMAL FORWARD-BACKWARD SPLITTING, https://pcombet.math.ncsu.edu/mms1.pdf
Proximal Splitting Methods in Signal Processing, https://link.springer.com/chapter/10.1007/978-1-4419-9569-8_10

Best,
Stephen

shirleyxq · October 10, 2022, 8:17pm

Hi Stephen,

Thanks so much for your kind reply. I find your lecture notes really useful.

Best wishes,
Shirley

shirleyxq · November 13, 2022, 5:40pm

Hi Stephen,

I hope you’ve been doing well

Further to our messages, I think I have implemented the mixed norm minimization problem correctly in TFOCS, where I kept the trace and PSD as primal and dualized the l1 norm. I compared the optimal solution I obtain from TFOCS with that from CVX, and they indeed match each other.

However, since I only dualized part of my original problem, I’m not sure how to write down the convex program that I implemented. Could you kindly help me with this?

Thanks so much,
Shirley

Stephen_Becker · November 16, 2022, 4:36pm

I can try to help with that, that I can’t guarantee to have too much time to put on this. Can you start by writing down the original problem before the dualization? (I think you can use latex markup here). Also, can you then give me the most relevant snippets of TFOCS code?

shirleyxq · November 16, 2022, 11:18pm

Hi Stephen,

Many thanks for getting back to me.

The problem I was looking at is to estimate a sparse and low-rank matrix X_0 from its noisy linear measurements y=\mathcal{A}(X_0)+z by solving the convex program:
\hat{X}_0 \in \arg\min_{X\succeq 0} \|X\|_* +\lambda \|X\|_1 subject to \|\mathcal{A}(X) - y\|_2\le \varepsilon.

And here are the key lines of code where I use TFOCS:

opts = [];
opts.alg = ‘AT’;
opts.continuation = true;
opts.maxIts = 1000;
opts.printEvery = 0;
opts.stopCrit = 4;
opts.tol = 1e-5;
mu = 1e-5;

% oymak_A and oymak_A_adjoint are the linear operator A and its adjoint A*
A_h = @(X)oymak_A(A, X);
A_adjoint_h = @(y)oymak_A_adjoint(n, A, y);
A_op = linop_handles({[n n], [m 1]}, A_h, A_adjoint_h, ‘R2R’);

prox = {proj_linf, prox_l2( sqrt(noiseVar) )};
[XHat, out_oymak] = tfocs_SCD(prox_trace(1/lambda), …
{1, 0; A_op, -y}, prox, mu, [], [], opts);

How could I write down explicitly the convex program that I implemented?

Thanks so much,
Shirley

Stephen_Becker · December 5, 2022, 12:14pm

Hi Shirley,
Sorry for taking a while to get back to you. If you set opts.continuation = false then it is solving the problem:
\hat{X} \in \arg\min_{X\succeq 0} \|X\|_* +\lambda \|X\|_1 + \frac{\mu}{2}\|X-X_0\|_F^2 subject to \|\mathcal{A}(X) - y\|_2\le \varepsilon,
where X_0 is your initial guess (if not provided, then this is by default the all zero vector or matrix).

With opts.continuation = true then it solves a sequence of the above problems, each time updating X_0 to be the output from the previous run. This is just a special case of the “proximal point” algorithm, and if you run enough iterations, this converges to the solution of your original problem,
\hat{X} \in \arg\min_{X\succeq 0} \|X\|_* +\lambda \|X\|_1 subject to \|\mathcal{A}(X) - y\|_2\le \varepsilon.

So the short answer is that TFOCS is trying to solve your original problem; the longer answer is that both the inner and outer (“continuation”) solve are not run for an infinite number of iterations, so you don’t have the exact solution.

A reasonable thing to do, to double-check everything, is to turn opts.continuation=false and solve that \mu- regularized version (but with a small problem size), and then solve this using CVX, and compare the two answers. (You can also compare TFOCS and CVX on the full problem! CVX should be more accurate but will have a harder time scaling to very large problems)

shirleyxq · December 6, 2022, 11:20am

Hi Stephen,

Thanks so much for getting back to me. I really appreciate your time! Thanks for confirming that my implementation solves my original problem.

Indeed, I observed that CVX gives approximately the same solution as TFOCS when I set opts.continuation = true, but doesn’t scale to larger problems.