I am looking for a way to solve millions of small SDP problems in an extremely fast manner.
Each SDP problem is independent of the others and all problems are identical (except the data it self of course).
These SDP problems are quite simple but I guess that they are not simple enough to be solved analytically. They have exactly 6 variables each.

M is a symmetric 3x3 matrix that I would like to find. Due to symmetry there are 6 variables in this problem.
M0 is the data. It is another symmetric matrix (with constants).

I would like to find M that is symmetric positive definite matrix while minimizing the convex quadratic energy |M-M0|^2.

Using a for loop with many small CVX problems is not going to do it.
Using a large CVX system with many constraints might work better (even though it’s a waste not to tell the solver that the constraints are independent). But all these are unacceptable.
I need something way faster that can run on native C++ and even utilize parallelism.
As the problems are independent, achieving parallelism is quite trivial.

Moreover, all the problems have the same structure and same memory access pattern so this is even perfect for a GPU.

The best would be to solve this analytically but I am not sure this is possible and even so, it might be hard to implement and can eventually end out to be slower than a decent iterative solver.

The closest thing I can think of is CVXGEN which allows you to create custom C++ code solvers for small problems. It is fantastic. However it only supports Quadratic Programming (no SOCP or SDP).

BTW, accuracy is not that important to me at this point.

How about using MOSEK Fusion API for C++? Or if the overhead is too high doing that, then the MOSEK kow-level C API. Neither of these approaches would involve CVX, so this forum would not be appropriate for providing further implementation assistance.