At Exvo we were experiencing problems while trying to send an email to all of our 150K+ users. This lead to a careful investigation of what is exactly happening during this process. So we setup our loggers (
heroku logs -t | tee output.log), run heroku consoles, increased to 20 dynos and 10 workers and begun the sending process.
It all started with a POST request to send those emails:
app[web.10]: Started POST ...
Which timed out and there was no:
app[web.10]: Completed 200 OK ...
or similar log entry later on, just silence.
This is Heroku’s killing machine in action. It silently kills every process, which runs for longer than 30 seconds (we’re using the Bamboo stack). This is more or less how Heroku deals with long running requests. I don’t blame them for this behavior, even though I very much dislike it. But I digress.
When Heroku kills such process it does so in a very nice way, i.e. it lets it finish its business first. In other words the process will continue in the background and after finishing it will just die/stop/restart. Our email collecting job (selecting 150K users from the database…) run uninterrupted for over 40 minutes. So this is the good part.
The bad part is that when this killed process is still running in the background the dyno which it was connected to will stop serving new requests until this process finally dies, but Heroku will keep sending new requests to it! This is observed by the infamous H12 timeouts:
heroku[router]: Error H12 (Request timeout) -> GET auth.exvo.com/users/sign_in dyno=web.10 queue= wait= service=30000ms status=503 bytes=0
Also notice the 503 error code (
Unavailable), which is returned by the Heroku router.
So this is really bad as all sorts of different/random web requests will just keep failing without you knowing what’s going on.
So while I greatly appreciate that Heroku lets such processes still run in the background, I really don’t like that it still keeps sending traffic their way.
PS: Yes, sending emails should be done as a background action by the Heroku workers. I know. But it’s not. Yet.