Minimizing trace inverse of grammian :/

perplexabot · November 11, 2015, 3:52am

Hi all,
I am trying to minimize an objective function where one of the components is trace_inv(G.’*G), where .’ denotes transpose (using matlab syntax) and I get the following back
"Only scalar quadratic forms can be specified in CVX"

I understand that CVX only takes specific functions and doesn’t accept other expressions not in DCP ruleset. I have also read the following post Trace of $S^{-2}$ in cvx which is very similar to my question but it doesn’t seem to help me much. I know I have to formulate my current expression into one that CVX can tolerate, but I am not sure how tho.

I was able to show that trace_inv(G.’*G) = ||vec(G^-1)||^2 but this expression is also not CVX friendly due to the inverse. I am not here for a solution to my problem (that would be nice tho ) but guidance to what my next step should be. I have been at this issue for a while and have started to become frustrated.

Help and comments are much appreciated.

EDIT: I should have mentioned G is a matrix.

Thank you,
Perplexabot

mcg · November 11, 2015, 8:10pm

Is it convex?(I don’t think so.) But if it is, I’m afraid I don’t have a solution for you. If you can’t express it according to the DCP rules, CVX can’t solve it, convex or not.

perplexabot · November 11, 2015, 9:25pm

Thank you for your time and reply. I believe ||vec(G^-1)||^2 IS convex. We know (or can show) that the norm squared of a vector is convex. If we assume G to be positive definite, then inv(G) is also positive definite, which means we can vectorize it. So norm squared of a vector is convex, as far as I can tell.

I will meditate some more upon this problem, maybe I can force feed it to CVX

Thanks again!

mcg · November 11, 2015, 9:36pm

I am afraid that is not a valid proof of convexity.

perplexabot · November 11, 2015, 9:43pm

Thank you, I will delve deeper into the abyss.

I can’t believe the cvx king himself is chatting with me
Thanks again, you are awesome!

perplexabot · November 11, 2015, 10:47pm

I found this online: http://math.stackexchange.com/questions/297635/is-the-trace-of-inverse-matrix-convex
That may be a valid proof for convexity, only difference is that my “G.’*G” is taken to be “S”.

I’ll continue to push forward.

mcg · November 11, 2015, 10:59pm

Yes, we support that in CVX; see trace_inv. But that matrix is symmetric; yours is not, and/or it involves a square. So that proof is insufficient for you.

perplexabot · November 12, 2015, 1:26am

Thank you for your time.

I played some more with my function and came to the following (nothing fancy, just using property of trace):
$$tr[(G^TG)^{-1}] = tr[G^{-1}G^{-T}] = tr[AA^T] = \sum_{i,j}A^2_{ij} = \sum_{i,j}G^{-2}_{ij}$$

Once I saw this form, I then said to myself, if we restrict the domain (the choice of elements of G) to be strictly positive ,then, if I am not mistaken, the objective function is convex.

What do you think?

mcg · November 12, 2015, 2:49am

Sorry, no. There simply is no general principle about the convexity of the inverse of the matrix. Even if the matrix is known to be positive definite.

mcg · November 12, 2015, 4:58pm

On reflection, I think it may be possible to adapt the technique offered by Johan Löfberg in this post to your problem. In this post, he converts \mathop{\textrm{Tr}}(S^{-2})\leq t to
$$\begin{bmatrix} X & I \ I & S \end{bmatrix} \succeq 0 \qquad
\begin{bmatrix} Z & X \ X & I \end{bmatrix} \succeq 0 \qquad \mathop{\textrm{Tr}}(Z)\leq t$$
So in your case, this might be the right reformulation:
$$\begin{bmatrix} X & G \G^T & I \end{bmatrix} \succeq 0 \qquad
\begin{bmatrix} Z & X \ X & I \end{bmatrix} \succeq 0 \qquad \mathop{\textrm{Tr}}(Z)\leq t$$
But I just don’t know, and won’t be able to tell without testing. And that I don’t have time to do If you try it, and it works, let us know!

perplexabot · November 12, 2015, 5:39pm

Thank you for the very helpful replies. Now I wonder why you disagree with my previous message. I understand you say it is wrong, and I take your word, but I would like to know why… I will try to figure that out.

I will go back to Johan’s reply and see what I can do. I will post my results if it works.

Thank you,
Perplexabot

mcg · November 12, 2015, 6:02pm

It is wrong because your proof relies on invalid principles. An incorrect proof of a true statement is still invalid.

And again, just proving convexity is not enough to use a function with CVX. You can only use the rules CVX knows about.

perplexabot · November 12, 2015, 7:28pm

I understand just proving convexity is not enough, but if I am able to do that then I can go about and try to mold my function into something acceptable by CVX with a little more confidence.

I still have a feeling it is convex (only when the domain is restricted, specifically as I said earlier, to having all elements of G to be strictly positive). So… when in doubt, take the hessian. Continuing with where I left off earlier with my math:

$$f(G) = tr[(G^TG)^{-1}] = tr[G^{-1}G^{-T}] = tr[AA^T] = \sum_{i,j}A^2_{ij} = \sum_{i,j}G^{-2}{ij}\ \text{where } G{ij} \text{ is the element in the ith row and jth column of G}$$
Taking the jacobian first $$\begin{equation}
\nabla f(G)=\begin{bmatrix}
-2G_{11}^{-3} \
-2G_{12}^{-3} \
\vdots \
-2G_{mn}^{-3}
\end{bmatrix}
\end{equation} \text{, where G is an m x n matrix}$$
Then taking the hessian
$$\nabla^2f(G) = \begin{bmatrix}
6G_{11}^{-4} & 0 & 0 & \dots & 0 \
0 & 6G_{12}^{-4} & 0 & \dots & 0 \
\vdots & & & \ddots & \
0 & 0 & 0 & \dots & 6G_{mn}^{-4}
\end{bmatrix}$$
Now if we use schur’s complement iteratively we can show that \nabla^2f(G) is positive semidefinite if we choose all elements of G to be strictly positive. Or in another way, since the hessian is a diaganol matrix, the elements along the diagonal are the eigenvalues, if we restrict the domain to make these values positive (all elements of G be positive) we see that the hessian is positive definite which means our objective function is convex!

What do you think of this!?

mcg · November 12, 2015, 8:02pm

There’s no way that the Hessian is diagonal. Can’t possibly be the case. Here’s a good reference for matrix calculus: http://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf

perplexabot · November 12, 2015, 8:28pm

Hmmmm, I may have taken my jacobian and hessian with respect to vec(G) rather than G (without realizing). If I took the the jacobian and hessian with respect to G I would end up with a m x n x q dimensional matrix. I wonder if taking the derivative with respect to vec(G) instead of G can be done somehow to prove convexity.

Thanks for the matrix cookbook reference. I have that pdf already saved on my computer, I love that pdf : )

mcg · November 12, 2015, 8:58pm

Using vec or not should not change the result in any way except the dimension. The real problem seems to be an improper handling of the matrix inverse.

perplexabot · November 13, 2015, 12:38am

Ok… So first of all, I realize what you mean about not handling the inverse correctly, how dumb of me. Just embarrassing. Thank you for pointing that out.

So I have been trying to prove convexity (assuming that is even the case) in other ways and have been struck down every time. I’m completely drained for now. I will be back at it very soon tho. I am starting to wonder if I should take the fact that this post is already in the “nonconvex” catagory as a hint that it indeed is not convex/concave. I have a couple of methods on my mind that I will try soon. I wish there was a standard method to prove nonconvexity (other than by example, or by case).

mcg · November 13, 2015, 1:24am

Oh, let me clarify something. If the technique I referred to above, from that Stack Exchange post, works for your problem, then your problem is convex! And that is actually a proof. There are lots of ways to prove convexity. In fact, I think the likelihood that it is indeed convex is high enough that I am going to remove the nonconvex tag.

perplexabot · November 13, 2015, 3:31am

Thank you, thank you, thank you!
I will now devote my time to trying the above method.
I will report back when I have something!

perplexabot · November 16, 2015, 9:09pm

Hey again! So I got intimate with Schur complements over the weekend and have come to the following conclusions.

First, for the sake of clarity and aid, these are the important inequalities for the shur matrices in the \mathop{\textrm{Tr}}(S^{-2})\leq t case (only final answer is shown):

for the first matrix (the matrix with the S in the bottom right), the important inequality is X \geq S^{-1}
for the other matrix, the important inequality is Z \geq X^2

Note1: Seems understandable to me!
Note2: There is a statement in the original link that also confirms my findings.

Second, the case you provided, for \mathop{\textrm{Tr}}((G^TG)^{-1})\leq t:

for the first matrix (the matrix with G in it), the important inequality is X \geq GG^T
for the other matrix, the important inequality is Z \geq X^2

Note3: It seems to me that the first matrix needs G and G^T switched.
Note4: It seems to me that the second matrix achieves an inequality that is not needed in this case.

Third, is a case where I made the matrices according to how I see fit (learning from the first two cases), here is what I have:

$$\begin{bmatrix} X & G^T \G & I \end{bmatrix} \succeq 0 \qquad \begin{bmatrix} Z & I \ I & X \end{bmatrix} \succeq 0 \qquad \mathop{\textrm{Tr}}(Z)\leq t$$

for the first matrix (the matrix with G in it), the important inequality is X \geq G^TG
for the other matrix, the important inequality is Z \geq X^{-1}

Note5: I’m not so sure about this, still thinking about it…

So, I am currently not with my desktop computer to stick this in CVX and see what happens, I will be doing that later today. In the mean time, I am trying to familiarize myself with the math. Any pointers, comments, and remarks are greatly appreciated.

Thank you for reading : )

Minimizing trace inverse of grammian :/

Note3: It seems to me that the first matrix needs G and G^T switched. Note4: It seems to me that the second matrix achieves an inequality that is not needed in this case.

Note3: It seems to me that the first matrix needs G and G^T switched.
Note4: It seems to me that the second matrix achieves an inequality that is not needed in this case.