Heroku memory quota killing machine

When I was migrating a suite of Exvo apps to Heroku‘s Cedar stack I’ve noticed a very strange behaviour. For two apps during first “run” the main process spun out of control, trying to consume way too much memory. After doing a manual restart everything went back to normal and the problem was basically gone (we haven’t seen it reoccuring).

Here is the log of such an out of control worker:

2012-04-02T11:52:32+00:00 heroku[worker.1]: Process running mem=550M(107.5%)
2012-04-02T11:52:32+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:52:52+00:00 heroku[worker.1]: Process running mem=811M(158.5%)
2012-04-02T11:52:52+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:53:12+00:00 heroku[worker.1]: Process running mem=858M(167.7%)
2012-04-02T11:53:12+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:53:32+00:00 heroku[worker.1]: Process running mem=1099M(214.8%)
2012-04-02T11:53:32+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:53:52+00:00 heroku[worker.1]: Process running mem=1539M(300.6%)
2012-04-02T11:53:52+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:54:12+00:00 heroku[worker.1]: Process running mem=1580M(308.7%)
2012-04-02T11:54:12+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:54:32+00:00 heroku[worker.1]: Process running mem=1709M(333.8%)
2012-04-02T11:54:32+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:54:53+00:00 heroku[worker.1]: Process running mem=2115M(413.1%)
2012-04-02T11:54:53+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:55:13+00:00 heroku[worker.1]: Process running mem=2301M(449.6%)
2012-04-02T11:55:13+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:55:33+00:00 heroku[worker.1]: Process running mem=2302M(449.6%)
2012-04-02T11:55:33+00:00 heroku[worker.1]: Error R14 (Memory quota exceeded)
2012-04-02T11:55:53+00:00 heroku[worker.1]: Process running mem=2747M(536.6%)
2012-04-02T11:55:53+00:00 heroku[worker.1]: Error R15 (Memory quota vastly exceeded)
2012-04-02T11:55:53+00:00 heroku[worker.1]: Stopping process with SIGKILL
2012-04-02T11:55:54+00:00 heroku[worker.1]: Process exited with status 137

This is just a worker, but I’ve observed the same for web processes as well. Be aware that as soon as your application starts serving R14/R15 errors it stops processing any new requests (they simply time out).

Previously: Introduction to Heroku H12 timeouts.

Introduction to Heroku H12 timeouts

At Exvo we were experiencing problems while trying to send an email to all of our 150K+ users. This lead to a careful investigation of what is exactly happening during this process. So we setup our loggers (heroku logs -t | tee output.log), run heroku consoles, increased to 20 dynos and 10 workers and begun the sending process.

It all started with a POST request to send those emails:

app[web.10]: Started POST ...

Which timed out and there was no:

app[web.10]: Completed 200 OK ...

or similar log entry later on, just silence.

This is Heroku’s killing machine in action. It silently kills every process, which runs for longer than 30 seconds (we’re using the Bamboo stack). This is more or less how Heroku deals with long running requests. I don’t blame them for this behavior, even though I very much dislike it. But I digress.

When Heroku kills such process it does so in a very nice way, i.e. it lets it finish its business first. In other words the process will continue in the background and after finishing it will just die/stop/restart. Our email collecting job (selecting 150K users from the database…) run uninterrupted for over 40 minutes. So this is the good part.

The bad part is that when this killed process is still running in the background the dyno which it was connected to will stop serving new requests until this process finally dies, but Heroku will keep sending new requests to it! This is observed by the infamous H12 timeouts:

heroku[router]: Error H12 (Request timeout) -> GET auth.exvo.com/users/sign_in dyno=web.10 queue= wait= service=30000ms status=503 bytes=0

Also notice the 503 error code (Unavailable), which is returned by the Heroku router.

So this is really bad as all sorts of different/random web requests will just keep failing without you knowing what’s going on.

So while I greatly appreciate that Heroku lets such processes still run in the background, I really don’t like that it still keeps sending traffic their way.

PS: Yes, sending emails should be done as a background action by the Heroku workers. I know. But it’s not. Yet.

On Heroku

While Heroku is nice and all, it suffers from the same disease as every other hosted-for-you solution:

$ git push heroku
(...)
-----> Gemfile detected, running Bundler version 1.0.7
(...)

$ bundle --version
Bundler version 1.0.10

It’s a whole three patch versions behind! Now I don’t really know (and maybe don’t even really want to know) what has changed between .7 and .10, but I can easily imagine things braking precisely because those minor inconsistencies in Bundler versions.

And sure, I could downgrade to .7, but to do that just because of Heroku? Thank you, but no. I want to ride the latest.

The downsides of Virtuozzo when used with mongrel

The scenario goes more or less like this. You have your Virtuozzo powered VPS. You have your RAM limit. You have your mongrels behind nginx plus mysql taking on average 60-70% of your available RAM (quite sensible limit). Now imagine the server gets hammered (not necessairly your VPS). Load goes over 10 (or even over 50). Mongrels stop responding. Queue builds up (inside mongrels), mongrels consume more and more of memory. Load still over 10. Memory limit reached. One of the mongrels is killed by Virtuozzo (too bad if it was the only one). Load still high. At some point other mongrels stop accepting new requests (queue limit reached?) and when load goes down they are unable to process the built up queue. Effect: your website is returning a 500 error code and you have to manually restart the mongrels (they are hanged up). I’ve been seeing similar behaviour too often lately…

Possible solutions: xen (does not kill your children), passenger (spawns new children whenever needed), haproxy (prolongs the life of your children), god (brings dead children back to life).

I’m going with passenger for now, thinking about moving to xen in the future.

On being tired

And deploying code which let anyone using any login and password (and by any I mean really any combination, even asdf/asdf worked) authenticate. And have access to the administration panel. No fun. At least at first, when I shook my head with disbelief over the deployed code. How could I not check it… How could I not write even the simplest unit test… Quick fix and few minutes later the site was fixed. After that I’ve simply burst in laughter over my stupidity.

Thankfully hardly anyone ever tries to login to this particular site (login page has both no-index and no-follow so it does not attract google scripters) so despite the fact that this bug has been live for a little over 12 hours no one broke in.

Shuffle your Dreamhost rewards page

Back at the day Jesse Ruderman wrote a script putting the block containing “You currently have $X.XX of Rewards payments built up!” at the top of the Dreamhost rewards page. Now some people might find it useful, but there are also other people (like me), without any payments built up (yet…), who would like to see the “Referral Reports” block at the top. Here is the greasemonkey script that does exactly that: shuffle.your.dreamhost.rewards.page.user.js

While considering the code of those two similar scripts, with all due respect to Jesse, I think that my solution is more general and simpler at the same time (always KISS), meaning that it allows you to reorder all of the blocks in any way you want (and it’s done in a painless way too).

There’s also a word of advice to myself (and maybe to others as well) – or in other words, a reminder. When creating greasemonkey scripts, the simpler hack you use, the better your script will be. It’s really futile trying to protect your script from the HTML changes. You just can’t predict every single one of them. All of those scripts are just hacks, that can be broken with just a slight change in the HTML code. So you shouldn’t worry about your script not being robust enough. That’s ok. You will have to fix it at some point in the future anyway.

Have fun!

Trace your referrers in real-time

One of the magic things about Dreamhost is the ability to login to your account using ssh. Apart from standard things that you can do with it, like compile and install any program that you like (excluding those, which require root access, of course), you can also experience a little bit of magic if you run this command:

tail -f current-httpd-accesslog

There is really something special about tracking your website’s visitors as they come. And, of course, the most interesting thing is actually knowing where they come from (also known as referrers). The problem with this command is that it prints lots of garbage on the screen (like timestamps, response codes, sizes, etc.). The problem is that you cannot grep your real-time tailed log. Why? Don’t ask me, I’m not a Linux guru. You just can’t and that’s it. You can, however, write a script, which deals with that in its own way. And this is what I’ve been writing for the whole day. It was both fun and painful to learn for the n-th time all those shell hacks and quirks. Was it worth it? Sure it was! As a result I came up with this little bash script to trace your referrers in real-time. You can download it or view it below. I must warn you, though. It is highly addictive. Really…

#!/bin/bash

# ========================================================================
# REAL-TIME WEBSITE REFERERS TRACER
# ========================================================================
#
# What?
#   Real-time website referrers tracer is a shell script that lets you
#   trace your visitors as they come. Script works in an ultra compact
#   four-columns view :)
#
# Why?
#   Because you cannot do 'tail -f access_log | grep something' and you
#   really want to grep out most of the stuff that your httpd puts in
#   the logs.
#
# Requirements:
#   - website (with not too low and not too high traffic),
#   - shell account on the server where your website is hosted,
#   - access to httpd logs that use the COMBINED format.
#
# Installation:
#   - copy anywhere in your home directory,
#   - edit the script and set the 'log' variable so it actually
#     points to your current httpd log,
#   - make sure the script has execute rights (chmod +x trace-referers).
#
# Running:
#   - just run the script and watch the screen.
#
#
# Version: 0.2 (2005-06-18)
# Author: Paweł Gościcki, http://pawelgoscicki.com
#
# No copyright rights. You can do whatever you want with this. You may even
# claim this scrip has been written by you from the very beginning ;)
#
# If you, however, improve it, send me a copy (paul_AT_pawelgoscicki.com).
#
# Based heavily on the tgrep script by Ed Morton (morton_at_lsupcaemnt.com):
# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-01/0818.html


# CONFIGURATION
# =============

# Where your httpd log file is
log="current-httpd-accesslog"

# What files to exclude (request for those files won't be shown, regexp syntax)
exclude="\.gif|\.jpg|\.png|\.ico|\.css|\.js"

# Width of request and referrer columns (set it to match your terminal's width)
col_width=35


# MAIN SCRIPT
# ===========

# Check if log file actually exists (and is readable)
if [ ! -r "${log}" ]; then
echo "Cannot access log file: $log"
exit 0
fi

# After startup we will output few lines
start=`wc -l < "${log}"`
start=$(( $start - 30 ))
if (( ${start} < 0 ))
then start=$((0))
fi

# Main loop
while :
do
  end=`wc -l < "${log}"`
  end="${end##* }"
  if (( ${end} > ${start} ))
  then
    start=$(( $start + 1 ))
    sed -n "${start},${end}p" "${log}" | egrep -v "${exclude}" | \
    awk -v col_width=$col_width '{

      # we are only interested in GET/POST requests
      if ( match($0, /\"(GET|POST).*?\"/) > 0 )
      {
        split($0, fields, "\"")

        # IP_ADDRESS
        tmp = $1
        while ( length(tmp) < 15 ) tmp = tmp " "
        printf "%s", tmp " "
    
        # HTTP_REQUEST (GET/POST)
        tmp = substr(fields[2], 0, index(fields[2], "HTTP/") - 1 )
        tmp = substr(tmp, index(tmp, " ") + 1, col_width)
        while ( length(tmp) < col_width ) tmp = tmp " "
        printf "%s", tmp " "
    
        # REFERER (the juice)
        tmp = fields[4]
        while ( length(tmp) < col_width ) tmp = tmp " "
        printf "%s", tmp " "
    
        # USER_AGENT
        printf "%s", fields[6]
    
        # new line at the end
        printf "\n"
      }
    }'

    start=${end}
  fi

  # this is an endless loop executed every second
  sleep 1
done

Your current hosting provider does not support ssh access? You might then want to read my other post about hosting with dreamhost for as little as 9$/year. Have fun!

Rolling with Ruby on Rails on Dreamhost

As of yet it is unofficial, but Dreamhost has indeed added support for Ruby on Rails (v0.12.1) together with FastCGI!

I have just tested it briefly and it works as it is supposed to.

If you want to turn FastCGI on, you must log in into your Dreamhost panel and go to Domains -> Web. Single checkbox and FastCGI is enabled. Couldn’t be simpler.

Dreamhost, regarded by many as the hosting company is indeed staying on the cutting edge, being the first major player to support Rails.

UPDATE: It’s now official.

(via)