Yes, TFOCS can handle this. You have two options:
- include the f(x)=\|Ax-b\|_2^2 term in the smooth part. You express worry that
tfocs_SCD.m
cannot handle gradients with an affine term. It’s true that we don’t include a specific method to generally handle affine terms, but that doesn’t mean you can’t do it (unlike the projections, there’s no benefit for us explicitly handling it). So you can either
– write your own function that computes \nabla f(x), which is very easy since it’s just \nabla f(x) = A^T( Ax-b), or…
– use our pre-built function smooth_quad.m
which does this for you (and handles affine terms). This is my recommendation. To use smooth_quad
, we don’t separate out “A” and “b” the same way we did for projections, rather we use it like:
smooth_op = smooth_quad( A'*A, -2*A'*b, b'*b );
[x , out] = tfocs_SCD(smooth_op, {B, 0}, prox_linf, mu ,[] ,[] , opt)
The smoothing term \mu\|x-x_0\|_2^2 is added to smooth_op
automatically.
- you could use the quadratic term as a proximal term if you wished, since we can compute the proximity operator of a quadratic, but it is likely that this approach will be a bit slower than taking advantage of its smoothness and using derivatives
Update, in response to question 703)
I misspoke earlier: while tfocs.m
takes advantage of smooth terms, tfocs_SCD.m
does not. But we can still solve the problem. As mentioned in the question, we can solve
\min \,\beta\|B x\|_1 \;\mathrm{such\,that}\; \|Ax - b\|_2 \le \epsilon
using solver_sBPDN_W.m
, which in turn calls TFOCS in the following fashion:
tfocs_SCD( [], { A, -b; W, 0 }, prox, mu, x0,...)
where prox = { prox_l2( epsilon ), proj_linf(beta) }
Now, to solve
\min \, \alpha\|Ax - b\|_2^2 + \beta\|B x\|_1, \; x \in R^n
keep almost the same calling sequence, but with AtA = A'*A
and Atb = -2A'*b
, i.e.
tfocs_SCD( [], { AtA, -Atb; W, 0 }, prox, mu, x0,...)
and then change the prox to
prox = { smooth_quad( alpha ), proj_linf(beta) }
For more efficiency, see solver_sBPDN_W.m
for some scaling tricks that make sure A and B have roughly the same scale.
The previous solution I posted works, but it actually relies on the ability of smooth_quad
to compute a proximity operator (rather than taking a gradient). Using that method works, but it will need to solve the linear system of equations I + \tau A^TA every time it is called. When A is a scalar or diagonal, this is trivial, but for large matrices it will be slow. Our new release will make this faster by computing a one-time factorization (a bit slow) to make all subsequent calls very fast. But still, for very large systems, this new approach (in the updated response) is the way to go.