I also found that splitting interrupts between the two cores helps with latency. But even when one core handles only a single interrupt, that interrupt's latency is higher than on a single-core system with a single interrupt. I suspect this is at least partly because they only put a single fetch pipe between the instruction cache and the crossbar.
Playing with the C3 showed me that you have to use much larger buffers for things like I2S to make it work without glitching.
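To put rough numbers on "much larger buffers", here is a minimal sketch of the arithmetic, assuming a DMA ring made of `desc_num` buffers of `frame_num` frames each. The function names and the figures in the usage note are my own illustration, not measurements from the C3 and not part of any vendor API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: microseconds of audio held by a DMA ring of
 * `desc_num` buffers, each holding `frame_num` frames, at `sample_rate_hz`.
 * This is how long the CPU can be late before the output underruns. */
static uint64_t dma_buffered_us(uint32_t desc_num, uint32_t frame_num,
                                uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return ((uint64_t)desc_num * frame_num * 1000000ULL) / sample_rate_hz;
}

/* Hypothetical helper: smallest number of buffered frames that covers a
 * worst-case stall of `stall_us` microseconds (rounded up). */
static uint64_t min_frames_for_stall(uint64_t stall_us, uint32_t sample_rate_hz)
{
    return (stall_us * sample_rate_hz + 999999ULL) / 1000000ULL;
}
```

For example, `dma_buffered_us(6, 240, 48000)` gives 30000, i.e. 30 ms of slack, and `min_frames_for_stall(2000, 48000)` gives 96 frames. If the worst-case interrupt latency goes up, the second number is the one that has to grow with it.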