A partial solution. First, we are separable, thankfully, so wlog let x be a scalar. The prox of x \log x is known, and it can be done using the Lambert W function (e.g. see Pesquet and Combettes’ review article which has a nice table of prox functions). I tried to use the Lambert W function for x \log(x) + (1-x)\log(1-x) using but without success. However, I think we can evaluate the function implicitly following the techniques used for Lambert (e.g., see Lambert W fcn evaluation ). This is probably much better than a Taylor series.
If this works well, we can incorporate it into a standard TFOCS atom.