If you have 2 unicorn workers on a dyno and you happen to get 3 slow requests routed to it, you are still screwed, right? Seems to me like requests will still queue on that dyno.
That's exactly what happened to us - switching to unicorn bought us a little time and a bit of performance, but we hit the exact same problems again after a couple more weeks of growth.
Yeah, the only real question is whether it's true that they no longer do intelligent routing. If that is the case, then regardless of anything else the problem exists once you pass a certain scale/request cost. It won't matter if that one dyno can handle hundreds of requests at once; it will still queue stupidly.
This is true - unicorn masks the symptoms for a period of time but does not solve the underlying problem in the way a global request queue would.
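To make the difference concrete, here is a toy simulation, not a model of Heroku's actual router. It assumes each dyno handles one request at a time, and every number in it (arrival rate, 100 ms fast requests, 3 s slow requests, 1% slow) is made up for illustration. It compares random routing into per-dyno queues against a single shared queue that hands work to whichever dyno frees up first.

```python
import heapq
import random


def make_workload(n_requests, rng, mean_gap=0.025, fast=0.1, slow=3.0, slow_fraction=0.01):
    """Random arrivals; most requests are fast, a few are slow (all numbers are made up)."""
    arrivals, durations, now = [], [], 0.0
    for _ in range(n_requests):
        now += rng.expovariate(1.0 / mean_gap)  # random gap until the next request
        arrivals.append(now)
        durations.append(slow if rng.random() < slow_fraction else fast)
    return arrivals, durations


def simulate_random_routing(arrivals, durations, n_dynos, rng):
    """Each request goes to a randomly chosen dyno and waits in that dyno's own queue."""
    free_at = [0.0] * n_dynos  # time at which each dyno has cleared its backlog
    total_wait = 0.0
    for arrive, duration in zip(arrivals, durations):
        dyno = rng.randrange(n_dynos)
        start = max(arrive, free_at[dyno])
        total_wait += start - arrive
        free_at[dyno] = start + duration
    return total_wait / len(arrivals)


def simulate_global_queue(arrivals, durations, n_dynos):
    """Requests wait in one FIFO queue and go to whichever dyno becomes free first."""
    free_at = [0.0] * n_dynos
    heapq.heapify(free_at)
    total_wait = 0.0
    for arrive, duration in zip(arrivals, durations):
        earliest = heapq.heappop(free_at)  # the dyno that frees up soonest
        start = max(arrive, earliest)
        total_wait += start - arrive
        heapq.heappush(free_at, start + duration)
    return total_wait / len(arrivals)


if __name__ == "__main__":
    arrivals, durations = make_workload(20000, random.Random(1))
    print("random routing, mean wait (s):",
          simulate_random_routing(arrivals, durations, 10, random.Random(2)))
    print("global queue,   mean wait (s):",
          simulate_global_queue(arrivals, durations, 10))
```

Both runs see exactly the same requests; only the routing differs. Under random routing, a fast request can end up behind a slow one while other dynos sit idle, which the shared queue avoids.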
Also, if the unicorn process is doing something CPU-intensive (vs. waiting on a third-party service, I/O, etc.), then it won't serve 3 requests simultaneously as fast as single processes would.
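A back-of-the-envelope version of that point, under an idealized assumption: CPU-bound requests on one dyno share its cores equally, with no other overhead. The function name and the core counts are hypothetical and say nothing about what a real dyno provides.

```python
def finish_time(cpu_seconds, concurrent_requests, cores):
    """Idealized fair sharing: each request gets an equal slice of the available cores."""
    # A request can never use more than one full core in this model.
    share = min(1.0, cores / concurrent_requests)
    return cpu_seconds / share


if __name__ == "__main__":
    # Three CPU-bound 100 ms requests sharing a single (assumed) core.
    print(finish_time(0.1, 3, 1))
    # The same request running alone on its own core.
    print(finish_time(0.1, 1, 1))
```

In this model, concurrency only helps when requests spend their time waiting on something other than the CPU.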
One of the hidden costs of Unicorn is spin-up time. Unicorn takes a long time to start, then fork. We would get a ton of request timeouts during this period. Switching back to Thin, we never got timeouts during deploys - even under very heavy load.
Maybe this is a stupid question, but unicorn forks worker processes, so a dyno can process multiple requests at the same time. Previously it seems that only one request could be handled by the dyno so requests had to queue on the dynamic routing layer but with multiple request support with unicorn or whatever, wouldn't it be more efficient to dump all the requests to dynos? Followup question, also how would intelligent routing work if it just previously checked to see which dyno had no requests? That seems like an easy thing to do, now you would have to check CPU/IO whatever and route based on load. Not specifically targeted at you but to everyone reading the thread.
> Previously it seems that only one request could be handled by the dyno so requests had to queue on the dynamic routing layer but with multiple request support with unicorn or whatever, wouldn't it be more efficient to dump all the requests to dynos?
It would be if all requests were equal. If all your requests always take 100ms, spreading them equally would work fine.
But consider if one of them takes longer. Doesn't have to be much, but the effect will be much more severe if you e.g. have a request that grinds the disk for a few seconds.
Even if each dyno can handle more than one request, those requests share resources. If some of them slow down because of a long-running request, response times for the other requests are likely to increase; as response times increase, the dyno's queue is likely to grow, and it becomes more likely to pile up more long-running requests.
> Followup question, also how would intelligent routing work if it just previously checked to see which dyno had no requests? That seems like an easy thing to do, now you would have to check CPU/IO whatever and route based on load. Not specifically targeted at you but to everyone reading the thread.
There is no perfect answer. Just routing by least connections is one option. In high-load situations it will hurt some requests that end up piled up on servers processing a heavy request, but pretty soon any heavily loaded servers will have enough connections all the time that most new requests will go to more lightly loaded servers.
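For what it's worth, least-connections routing needs very little state. A minimal sketch, assuming the router is told when each request finishes; the class and method names are hypothetical, not any real router's API:

```python
class LeastConnectionsRouter:
    """Send each new request to the dyno with the fewest requests in flight."""

    def __init__(self, dyno_names):
        self.in_flight = {name: 0 for name in dyno_names}

    def pick(self):
        # min() returns the first dyno with the lowest count,
        # so ties go to the dyno listed earliest.
        name = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[name] += 1
        return name

    def done(self, name):
        # Called when a request on this dyno has finished.
        self.in_flight[name] -= 1


if __name__ == "__main__":
    router = LeastConnectionsRouter(["dyno.1", "dyno.2"])
    first = router.pick()
    second = router.pick()
    router.done(second)
    print(first, second, router.pick())
```

Note that it only counts connections. It knows nothing about how heavy each request is, which is why a request can still land behind a slow one.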
Adding "buckets" of servers for different types of requests is another option to improve it, if you can easily tell by URL which requests will be slow.
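A sketch of the bucket idea, assuming slow requests can be recognized by URL prefix. The prefixes, dyno names, and the round-robin choice inside each bucket are all placeholders for illustration:

```python
from itertools import cycle


class BucketRouter:
    """Keep known-slow URLs on their own pool so they cannot block fast requests."""

    def __init__(self, fast_dynos, slow_dynos, slow_prefixes):
        self.slow_prefixes = tuple(slow_prefixes)
        # Round-robin within each pool; any other per-pool policy would work too.
        self.pools = {"fast": cycle(fast_dynos), "slow": cycle(slow_dynos)}

    def pool_for(self, path):
        # str.startswith accepts a tuple and matches any of the prefixes.
        return "slow" if path.startswith(self.slow_prefixes) else "fast"

    def route(self, path):
        return next(self.pools[self.pool_for(path)])


if __name__ == "__main__":
    router = BucketRouter(
        fast_dynos=["web.1", "web.2"],
        slow_dynos=["slow.1"],
        slow_prefixes=["/reports", "/export"],  # hypothetical slow endpoints
    )
    for path in ["/", "/reports/monthly", "/users/42", "/export.csv"]:
        print(path, "->", router.route(path))
```

The slow pool can still back up, but only requests already expected to be slow wait behind each other.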
That gets pretty unlikely, especially if you have many dynos and a low frequency of slow requests. The main reason unicorn can drastically reduce queue times here is that it does not use random routing internally.
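A rough way to put numbers on "pretty unlikely", assuming the claim is about one dyno receiving more slow requests than it has unicorn workers, and assuming the router picks dynos uniformly at random. The function name and the example figures are made up:

```python
from math import comb


def prob_dyno_overloaded(slow_in_flight, n_dynos, workers_per_dyno):
    """P(a given dyno holds more slow requests than it has workers) under random routing."""
    p = 1.0 / n_dynos  # chance that any one slow request lands on this dyno
    # Probability that at most `workers_per_dyno` slow requests landed here (binomial).
    p_ok = sum(
        comb(slow_in_flight, k) * p**k * (1 - p) ** (slow_in_flight - k)
        for k in range(min(workers_per_dyno, slow_in_flight) + 1)
    )
    return 1.0 - p_ok


if __name__ == "__main__":
    # 10 slow requests in flight, 2 workers per dyno, varying dyno counts.
    for n_dynos in (5, 20, 100):
        print(n_dynos, "dynos:", prob_dyno_overloaded(10, n_dynos, 2))
```

This is the chance for one particular dyno at one moment. With many dynos and sustained traffic, the chance that it happens somewhere at some point is higher.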
Oh, so the server process hosting Rails is itself queueing? Is that what they refer to as "dyno queueing"? I thought perhaps there was another server between the router and your app's server process.