Heroku memory quota killing machine

When I was migrating a suite of Exvo apps to Heroku‘s Cedar stack I’ve noticed a very strange behaviour. For two apps during first “run” the main process spun out of control, trying to consume way too much memory. After doing a manual restart everything went back to normal and the problem was basically gone (we haven’t seen it reoccuring).

Here is the log of such an out of control worker:

2012-04-02T11:52:32+00:00 heroku[worker.1]: Process running mem=550M(107.5%)
2012-04-02T11:52:32+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:52:52+00:00 heroku[worker.1]: Process running mem=811M(158.5%)
2012-04-02T11:52:52+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:53:12+00:00 heroku[worker.1]: Process running mem=858M(167.7%)
2012-04-02T11:53:12+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:53:32+00:00 heroku[worker.1]: Process running mem=1099M(214.8%)
2012-04-02T11:53:32+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:53:52+00:00 heroku[worker.1]: Process running mem=1539M(300.6%)
2012-04-02T11:53:52+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:54:12+00:00 heroku[worker.1]: Process running mem=1580M(308.7%)
2012-04-02T11:54:12+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:54:32+00:00 heroku[worker.1]: Process running mem=1709M(333.8%)
2012-04-02T11:54:32+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:54:53+00:00 heroku[worker.1]: Process running mem=2115M(413.1%)
2012-04-02T11:54:53+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:55:13+00:00 heroku[worker.1]: Process running mem=2301M(449.6%)
2012-04-02T11:55:13+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:55:33+00:00 heroku[worker.1]: Process running mem=2302M(449.6%)
2012-04-02T11:55:33+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:55:53+00:00 heroku[worker.1]: Process running mem=2747M(536.6%)
2012-04-02T11:55:53+00:00 heroku[worker.1]: Error R15 (Memory quota vastly exceeded)
2012-04-02T11:55:53+00:00 heroku[worker.1]: Stopping process with SIGKILL
2012-04-02T11:55:54+00:00 heroku[worker.1]: Process exited with status 137

This is just a worker, but I’ve observed the same for web processes as well. Be aware that as soon as your application starts serving R14/R15 errors it stops processing any new requests (they simply time out).

Previously: Introduction to Heroku H12 timeouts.

Compromised

This blog has been. But no more! Now restored from the dead (i.e. fresh installation and customization of the Kubrick template).

It seems that running and outdated WordPress installation is not a very bright idea. Who would have thought?

Oh WordPress and PHP…
how much I loathe thee!

Introduction to Heroku H12 timeouts

At Exvo we were experiencing problems while trying to send an email to all of our 150K+ users. This lead to a careful investigation of what is exactly happening during this process. So we setup our loggers (heroku logs -t | tee output.log), run heroku consoles, increased to 20 dynos and 10 workers and begun the sending process.

It all started with a POST request to send those emails:

app[web.10]: Started POST ...

Which timed out and there was no:

app[web.10]: Completed 200 OK ...

or similar log entry later on, just silence.

This is Heroku’s killing machine in action. It silently kills every process, which runs for longer than 30 seconds (we’re using the Bamboo stack). This is more or less how Heroku deals with long running requests. I don’t blame them for this behavior, even though I very much dislike it. But I digress.

When Heroku kills such process it does so in a very nice way, i.e. it lets it finish its business first. In other words the process will continue in the background and after finishing it will just die/stop/restart. Our email collecting job (selecting 150K users from the database…) run uninterrupted for over 40 minutes. So this is the good part.

The bad part is that when this killed process is still running in the background the dyno which it was connected to will stop serving new requests until this process finally dies, but Heroku will keep sending new requests to it! This is observed by the infamous H12 timeouts:

heroku[router]: Error H12 (Request timeout) -> GET auth.exvo.com/users/sign_in dyno=web.10 queue= wait= service=30000ms status=503 bytes=0

Also notice the 503 error code (Unavailable), which is returned by the Heroku router.

So this is really bad as all sorts of different/random web requests will just keep failing without you knowing what’s going on.

So while I greatly appreciate that Heroku lets such processes still run in the background, I really don’t like that it still keeps sending traffic their way.

PS: Yes, sending emails should be done as a background action by the Heroku workers. I know. But it’s not. Yet.

Ruby 1.9.2 via rvm installation woes

I’ve just had to upgrade Ruby to 1.9.2 (from 1.8.7) on the EC2 instance, but ran into a weird error while running rvm install 1.9.2:

Compiling yaml in /home/ubuntu/.rvm/src/yaml-0.1.4.
ERROR: Error running 'make ', please read /home/ubuntu/.rvm/log/ruby-1.9.2-p290/yaml/make.log
Installing yaml to /home/ubuntu/.rvm/usr
ERROR: Error running 'make install', please read /home/ubuntu/.rvm/log/ruby-1.9.2-p290/yaml/make.install.log

and in the log:

src/Makefile.am:2: Libtool library used but `LIBTOOL' is undefined
src/Makefile.am:2:   The usual way to define `LIBTOOL' is to add `AC_PROG_LIBTOOL'
src/Makefile.am:2:   to `configure.ac' and run `aclocal' and `autoconf' again.
src/Makefile.am:2:   If `AC_PROG_LIBTOOL' is in `configure.ac', make sure
src/Makefile.am:2:   its definition is in aclocal's search path.

Fortunately the solution was rather simple, although Google was not really that helpful this time, hence this post:

sudo aptitude install libtool

Basically the libtool library was not installed.

Painful ruby 1.9.2-p180 to 1.9.2-p290 upgrade

I did the recommended upgrade of my current p180 to the new p290 using rvm:

$ rvm upgrade ruby-1.9.2-p180 ruby-1.9.2-p290

First annoyance – moving gems from one gemset to the new one took over 40 minutes. Have really no idea why.

But after it was done, whenever I tried to run ‘gem’ or ‘rake’ or ‘bundle’ I got:

Invalid gemspec in [/home/jkl/.rvm/gems/ruby-1.9.2-p290/specifications/guard-0.8.1.gemspec]: invalid date format in specification: "2011-09-29 
00:00:00.000000000Z"
Invalid gemspec in [/home/jkl/.rvm/gems/ruby-1.9.2-p290/specifications/json-1.6.1.gemspec]: invalid date format in specification: "2011-09-18 0
0:00:00.000000000Z"
Invalid gemspec in [/home/jkl/.rvm/gems/ruby-1.9.2-p290/specifications/heroku-2.8.4.gemspec]: invalid date format in specification: "2011-09-23
 00:00:00.000000000Z"
Invalid gemspec in [/home/jkl/.rvm/gems/ruby-1.9.2-p290/specifications/guard-0.8.4.gemspec]: invalid date format in specification: "2011-10-03 
00:00:00.000000000Z"
Invalid gemspec in [/home/jkl/.rvm/gems/ruby-1.9.2-p290/specifications/multi_xml-0.4.0.gemspec]: invalid date format in specification: "2011-09
-06 00:00:00.000000000Z"
Invalid gemspec in [/home/jkl/.rvm/gems/ruby-1.9.2-p290/specifications/heroku-2.8.1.gemspec]: invalid date format in specification: "2011-09-21
 00:00:00.000000000Z"
Invalid gemspec in [/home/jkl/.rvm/gems/ruby-1.9.2-p290/specifications/metrical-0.0.7.gemspec]: invalid date format in specification: "2011-09-
11 00:00:00.000000000Z"
...
and so on...

The above is another manifestation of the YAML engine switch from Syck to Psych and all of the incompatibilities it has brought. The problem is that now you have to reinstall all of your gems, because all installed gems have wrong gemspec specification. D’oh.

I fixed it by running:

$ rvm gemset empty

And then bundling in each project…

$ bundle

Some more reading material.

Bug of the day

Completely bad code follows, beware.

Silent error in Ruby 1.8.7:

x = [:a, :b]
=> [:a, :b]

x.slice!(:a)
=> nil

x
=> [:a, :b]

Explicit error (resulting in a failing test) in Ruby 1.9.2:

x = [:a, :b]
=> [:a, :b]

x.slice!(:a)
TypeError: can't convert Symbol into Integer

Just yet another incompatibility, but for the better!