
1. Anukari runs up to 16 entire copies of the physics model for polyphony, so the workload is 16 voices * 1024 objects * 48,000 samples/sec (I should update the blog post)

2. Users can arbitrarily connect objects to one another, so each object has to read its connections and do processing for N other entities

3. Using the full CPU requires synchronization across cores at each physics step, which is slow

4. Processing per object is relatively heavy: lots of transcendentals (approximations are OK), but also just a lot of features; every parameter can be modulated, everything needs to be NaN-proof, and so on

5. Users want to run multiple copies of Anukari in parallel for multiple tracks, effects, etc
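Point 2 above means each object's update is a gather over its neighbors' state. A minimal sketch of that access pattern, assuming a fixed-size connection table (the 4-connection limit, the state layout, and the toy spring force are illustrative, not Anukari's actual data model):

```python
import numpy as np

OBJECTS, CONNECTIONS = 1024, 4

rng = np.random.default_rng(0)

# Each object lists the indices of the objects it is connected to.
conn = rng.integers(0, OBJECTS, size=(OBJECTS, CONNECTIONS))
pos = rng.standard_normal(OBJECTS).astype(np.float32)

# Per-object gather: read every connected neighbor's state, then
# accumulate a (toy) spring force pulling toward each neighbor.
neighbor_pos = pos[conn]                           # shape (1024, 4)
force = (neighbor_pos - pos[:, None]).sum(axis=1)  # shape (1024,)
```

The gather (`pos[conn]`) is the part that makes per-object cost scale with N connections, which is what blows the cycle budget below.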

Another way to look at it is: 4 GHz / (16 voices * 1024 objects * 4 connections * 48,000 samples/sec) ≈ 1.3 cycles per thing
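Making that back-of-envelope arithmetic concrete (the 4 GHz clock and 4 connections per object are the figures from above):

```python
# Cycle budget per "thing": one connection of one object,
# in one voice, for one audio sample.
clock_hz = 4e9            # ~4 GHz CPU core
voices = 16               # polyphonic copies of the physics model
objects = 1024            # objects per model
connections = 4           # connections processed per object
sample_rate = 48_000      # audio samples per second

things_per_second = voices * objects * connections * sample_rate
cycles_per_thing = clock_hz / things_per_second
print(round(cycles_per_thing, 1))  # → 1.3
```

About 3.1 billion "things" per second, so a bit over one cycle each: far below the cost of even a single cache miss, let alone a transcendental.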

The GPU eats this workload alive; it's absolutely perfect for it. All 16 voices * 1024 objects can be processed fully in parallel, with trivial synchronization at each step and user-managed L1 cache.
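The data-parallel shape of that can be sketched with NumPy standing in for the GPU (the spring constant, state layout, and update rule are hypothetical, not Anukari's actual model; the point is that every (voice, object) pair advances independently, with only a barrier between steps):

```python
import numpy as np

VOICES, OBJECTS = 16, 1024

# Hypothetical per-(voice, object) state: position and velocity.
rng = np.random.default_rng(0)
pos = rng.standard_normal((VOICES, OBJECTS)).astype(np.float32)
vel = np.zeros((VOICES, OBJECTS), dtype=np.float32)

def physics_step(pos, vel, k=0.01, dt=1.0 / 48_000):
    # All 16 * 1024 objects update in parallel. On a GPU each
    # (voice, object) pair would be one thread, keeping its working
    # set in user-managed L1, and the barrier between steps is the
    # only synchronization needed.
    force = -k * pos           # toy spring pulling toward the origin
    vel = vel + force * dt
    pos = pos + vel * dt
    return pos, vel

for _ in range(4):                         # a few audio samples
    pos, vel = physics_step(pos, vel)      # barrier between steps
```

On the CPU the equivalent barrier means cross-core synchronization 48,000 times a second (point 3 above); on the GPU it falls out of the execution model.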


