When uploading large images in a Rails app on Heroku, mashing the upload button caused responses to stop coming back entirely. Checking New Relic, I found that about 50 seconds were being spent in Request queueing. To find a fix, I dug into the documentation.
Dynos and requests
A single dyno can serve thousands of requests per second, but performance depends greatly on the language and framework you use.
A single-threaded, non-concurrent web framework (like Rails 3 in its default configuration) can process one request at a time. For an app that takes 100ms on average to process each request, this translates to about 10 requests per second per dyno, which is not optimal.
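The 10 requests/second figure follows directly from the average service time; a quick sketch of the arithmetic (illustrative numbers only):

```ruby
# Back-of-the-envelope throughput for a single-threaded, non-concurrent dyno:
# it can start the next request only after the current one finishes.
avg_request_seconds = 0.100              # 100 ms average service time
requests_per_second = 1 / avg_request_seconds

puts requests_per_second                 # => 10.0 requests/second per dyno
```

Real throughput varies with the request mix, which is why the document recommends load testing rather than relying on this kind of estimate.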
Single-threaded backends are not recommended for production applications because they handle concurrent requests inefficiently. Choose a concurrent backend whenever developing and running a production service.
Multi-threaded or event-driven environments like Java, Unicorn, EventMachine, and Node.js can handle many concurrent requests. Load testing your app is the only realistic way to determine request throughput.
Each router maintains an internal per-app request queue. For Cedar apps, routers limit the number of active requests per dyno to 50 and queue additional requests. There is no coordination between routers however, so this request limit is per router. The request queue on each router has a maximum backlog size of 50n (n = the number of web dynos your app has running). If the request queue on a particular router fills up, subsequent requests to that router will immediately return an H11 (Backlog too deep) response.
The herokuapp.com routing stack allows many concurrent connections to web dynos. For production apps, you should always choose an embedded webserver that allows multiple concurrent connections to maximize the responsiveness of your app. You can also take advantage of concurrent connections for long-polling requests.
Almost all modern web frameworks and embeddable webservers support multiple concurrent connections. Examples of webservers that allow concurrent request processing in the dyno include Unicorn (Ruby), Goliath (Ruby), Puma (JRuby), Gunicorn (Python), and Jetty (Java).
Request queueing efficiency
Request timeouts can also be caused by queueing of TCP connections inside the dyno. Some languages and frameworks process only one connection at a time, but it’s possible for the routers to send more than one request to a dyno concurrently. In this case, requests will queue behind the one being processed by the app, causing those subsequent requests to take longer than they would on their own. You can get some visibility into request queue times with the New Relic add-on. This problem can be ameliorated by the following techniques, in order of typical effectiveness:
- Run more processes per dyno, with correspondingly fewer dynos. If you’re using a framework that can handle only one request at a time (for example, Ruby on Rails), try a tool like Unicorn with more worker processes. This keeps your app’s total concurrency the same, but dramatically improves request queueing efficiency by sharing each dyno queue among more processes.
- Make slow requests faster by optimizing app code. To do this effectively, focus on the 99th percentile and maximum service time for your app. This decreases the amount of time requests will spend waiting behind other, slower requests.
- Run more dynos, thus increasing total concurrency. This slightly decreases the probability that any given request will get stuck behind another one.
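For the first technique, a minimal Unicorn setup for a Rails app on Heroku might look like the sketch below. The `WEB_CONCURRENCY` variable name and the worker count of 3 are assumptions to tune for your dyno size and memory footprint; the ActiveRecord hooks assume a standard Rails database setup.

```ruby
# config/unicorn.rb -- a minimal sketch of running multiple worker
# processes per dyno. Each worker handles one request at a time, so
# 3 workers let one dyno process 3 requests concurrently.
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3)
timeout 15
preload_app true

before_fork do |server, worker|
  # Shared connections must be closed before forking so each
  # worker opens its own database connection.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
```

The dyno would then boot Unicorn instead of the default server, e.g. with a Procfile entry such as `web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb`. The key point from the text: total concurrency stays the same as running more single-process dynos, but requests queue against a pool of workers per dyno instead of a single process, so a slow request blocks far fewer of its neighbors.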