At Exvo we were experiencing problems while trying to send an email to all of our 150K+ users. This led to a careful investigation of what exactly happens during this process. So we set up our loggers (heroku logs -t | tee output.log), ran heroku consoles, scaled up to 20 dynos and 10 workers, and began the sending process.
It all started with a POST request to send those emails:
app[web.10]: Started POST ...
Which timed out and there was no:
app[web.10]: Completed 200 OK ...
or similar log entry later on, just silence.
This is Heroku’s killing machine in action. It silently kills every process that runs for longer than 30 seconds (we’re using the Bamboo stack). This is more or less how Heroku deals with long-running requests. I don’t blame them for this behavior, even though I very much dislike it. But I digress.
When Heroku kills such a process, it does so in a very nice way, i.e. it lets it finish its business first. In other words, the process will continue in the background and, after finishing, will just die/stop/restart. Our email collecting job (selecting 150K users from the database…) ran uninterrupted for over 40 minutes. So this is the good part.
The bad part is that while this killed process is still running in the background, the dyno it is running on will stop serving new requests until the process finally dies, but Heroku will keep sending new requests to it! This shows up as the infamous H12 timeouts:
heroku[router]: Error H12 (Request timeout) -> GET auth.exvo.com/users/sign_in dyno=web.10 queue= wait= service=30000ms status=503 bytes=0
Also notice the 503 status code (Service Unavailable), which is returned by the Heroku router.
So this is really bad as all sorts of different/random web requests will just keep failing without you knowing what’s going on.
So while I greatly appreciate that Heroku lets such processes still run in the background, I really don’t like that it still keeps sending traffic their way.
PS: Yes, sending emails should be done as a background action by the Heroku workers. I know. But it’s not. Yet.
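For illustration, here is a minimal sketch of the "enqueue now, send later" shape, using only Ruby's standard library. It is not our actual code: on Heroku the worker would be a separate worker process fed by a job library, whereas here a thread and an in-memory Queue stand in for both. The names deliver_email and handle_send_request are hypothetical.

```ruby
require "thread"

# Hypothetical stand-in for real email delivery: it only records the address.
def deliver_email(address, sent)
  sent << address
end

# Hypothetical request handler: it only enqueues work and returns right away,
# so the web request finishes long before any 30 second limit.
def handle_send_request(addresses, jobs)
  addresses.each { |address| jobs << address }
  "queued #{addresses.size} emails"
end

jobs = Queue.new # assumption: an in-memory queue instead of a real job backend
sent = Queue.new

# The "worker": drains the queue until it sees the :stop marker.
worker = Thread.new do
  while (address = jobs.pop) != :stop
    deliver_email(address, sent)
  end
end

response = handle_send_request(["a@example.com", "b@example.com"], jobs)
jobs << :stop
worker.join

puts response
puts "delivered: #{sent.size}"
```

The point of the design is that the slow part (collecting and sending) never runs inside the request, so the dyno stays free to serve other traffic.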
FWIW: Things don’t work the same on Cedar. The request is actually killed. To simulate this behavior on Bamboo, wrap requests in a Timeout block or use Rack::Timeout.
And yes, don’t do stuff like that in a request, or have the request block pending completion. Workers FTW.
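A rough sketch of the Timeout approach mentioned above, using Ruby's standard Timeout module. The helper name with_request_timeout and the limits are made up for the example; in a Rack app the Rack::Timeout middleware plays a similar role for every request, but its configuration is not shown here.

```ruby
require "timeout"

# Hypothetical helper: runs the block, but gives up after `limit` seconds
# instead of letting the work continue in the background.
def with_request_timeout(limit)
  Timeout.timeout(limit) { yield }
rescue Timeout::Error
  :timed_out
end

# Finishes well within the limit, so the block's value is returned.
fast = with_request_timeout(1) { :done }

# Takes longer than the limit, so the block is interrupted.
slow = with_request_timeout(0.1) { sleep 1; :done }

puts fast.inspect
puts slow.inspect
```

Note that Timeout interrupts the block wherever it happens to be, so anything that must be cleaned up (open files, transactions) should be protected with ensure.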
I’ve suspected things might be different on Cedar. Not really cool if the actual request is being killed (instead of letting it finish in the background). The effects are worse than what we currently have on Bamboo.
I spent my entire summer porting to Heroku. I know that’s surprising, but when you have a large system, and it’s just one person writing it, and you were an early Rails adopter, the modernization it took was daunting.
To port to Heroku, I had to:
* Switch from SVN to GIT
* Upgrade from Rails 2.2.2 to 2.3.14 (I wouldn’t dare go to 3)
* Convert from full frozen GEMS to Bundler
* Because I used Bundler, I had to upgrade my PDF library (prawn)
* Because I upgraded my PDF library, I had to literally re-write all my PDF generation pages because ruby/rails people seem to not think backwards compatibility is necessary
* Migrate from file-system uploaded user content to Amazon S3
* Migrate my entire DB to Amazon RDS
* Migrate all my data from my old host to Amazon RDS
* Then I had to learn how you worked, dealt with SSL, dealt with custom domains, email, etc
NOW, I have a request that takes 173 seconds, which is my admin PDF creation, which is the ENTIRE POINT of my system. Naturally, I didn’t think to check whether the platform allowed this. And I’m getting Error H12 (Request timeout).
So the advantage of Heroku (fast to market, supposedly easy)... I don’t see it. Marc Andreessen should pull out his investment. I honestly have had it up to HERE with Heroku. The only advantage is avoiding deployment chef scripts and maybe not dealing with SSL, but whoopdedoo. Deploying directly to Amazon will take me 1 week of nights and weekends, which pales in comparison with the investment it took to get to this point.
Hi
Is there any solution for this timeout issue? I have a similar problem with file uploads from Heroku to S3 using CarrierWave. Please help.