The magic of Ubuntu

Ubuntu Linux

After three years (has it been that long?) the time has finally come to replace my much-beloved Red Hat 8.0 with something more up to date. The choice was fairly simple. Yes, Ubuntu. Praised by everyone, loved by many, and arguably the most popular Linux distro around.

I got to know a little bit of Debian magic by installing apt-get on my Red Hat box. It was of very limited use there, as the rpm repositories were rarely updated. Only on a Debian-based system can you experience its full potential. And yes, Ubuntu is based on Debian.

The installation was fairly simple, although I missed the “select packages” screen (I suppose there is no option to choose the installed packages before the installation). Next stop: configuration. Turning off graphical login, disabling useless services (cups, alsa, ppp and so on…) – all the usual stuff. Hour after hour my system was shaping up.

At some point I noticed that I did not have identd running. Without any hesitation I executed apt-get install oidentd. One minute later it was up and running.

And now for the magic part. The installation automatically discovered the non-routable IP address (in the 192.168.x.x range) assigned to the eth0 interface (which also carried the default route), together with the IP address of the gateway. Based on those two IP addresses it worked out that my machine was sitting behind a NAT, so it automatically added the -A gateway_ip option to the oidentd command line (needed for oidentd to work behind a NAT). That’s just pure magic.
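
I have not read the actual package scripts, so the following is only my guess at the kind of check involved, not what the oidentd package really does. It is a minimal bash sketch: the function names is_private_ipv4 and oidentd_nat_options are made up for illustration, and the two addresses are passed in by hand rather than detected.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if $1 looks like a private (RFC 1918)
# IPv4 address, i.e. 10.x.x.x, 172.16-31.x.x or 192.168.x.x.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: given the interface address ($1) and the gateway
# address ($2), print the extra oidentd option mentioned in the post
# when the interface address is private. Prints nothing otherwise.
oidentd_nat_options() {
  local iface_ip="$1" gateway_ip="$2"
  if is_private_ipv4 "$iface_ip" && [ -n "$gateway_ip" ]; then
    printf '%s\n' "-A $gateway_ip"
  fi
}
```

For example, oidentd_nat_options 192.168.1.10 192.168.1.1 prints -A 192.168.1.1, while a public address on the interface prints nothing.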

I know it’s just a relatively simple installation script, but still, it’s those simple things which make all the difference. And, sometimes, also make my jaw drop.

10 points for Ubuntu.

Trace your referrers in real-time

One of the magic things about Dreamhost is the ability to log in to your account using ssh. Apart from the standard things you can do with it, like compiling and installing any program you like (excluding those that require root access, of course), you can also experience a little bit of magic if you run this command:

tail -f current-httpd-accesslog

There is really something special about tracking your website’s visitors as they come. And, of course, the most interesting thing is actually knowing where they come from (also known as referrers). The first problem with this command is that it prints lots of garbage on the screen (like timestamps, response codes, sizes, etc.). The second problem is that filtering a real-time tailed log is not as simple as piping it through grep: once grep writes into another pipe instead of the terminal, it buffers its output, so matches do not show up as they happen. I’m not a Linux guru, so rather than fight that I wrote a script which deals with it in its own way. That is what I spent the whole day writing. It was both fun and painful to learn, for the n-th time, all those shell hacks and quirks. Was it worth it? Sure it was! The result is this little bash script to trace your referrers in real-time. You can download it or view it below. I must warn you, though. It is highly addictive. Really…
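
To make the "garbage" concrete before the script itself: a line in the combined log format wraps the request, the referrer and the user agent in double quotes, so splitting on the quote character is enough to pull them out. A small sketch, using a made-up log line and a hypothetical helper name (request_and_referrer):

```shell
#!/bin/bash

# Made-up sample line in the "combined" log format (not from a real log).
sample='192.0.2.7 - - [18/Jun/2005:12:34:56 +0200] "GET /blog/ HTTP/1.1" 200 5120 "http://example.com/links.html" "Mozilla/5.0"'

# Hypothetical helper: reads combined-format lines on stdin and prints
# only the request and the referrer. With the double quote as the field
# separator, field 2 is the request and field 4 is the referrer.
request_and_referrer() {
  awk -F'"' '{ print $2 " <- " $4 }'
}

printf '%s\n' "$sample" | request_and_referrer
# prints: GET /blog/ HTTP/1.1 <- http://example.com/links.html
```

The timestamp, response code and size all end up in the odd-numbered fields, which is why they are so easy to throw away.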

#!/bin/bash

# ========================================================================
# REAL-TIME WEBSITE REFERERS TRACER
# ========================================================================
#
# What?
#   Real-time website referrers tracer is a shell script that lets you
#   trace your visitors as they come. The script works in an ultra-compact
#   four-column view :)
#
# Why?
#   Because a longer 'tail -f access_log | grep something | ...' pipeline
#   gets tripped up by output buffering, and you really want to grep out
#   most of the stuff that your httpd puts in the logs.
#
# Requirements:
#   - website (with not too low and not too high traffic),
#   - shell account on the server where your website is hosted,
#   - access to httpd logs that use the COMBINED format.
#
# Installation:
#   - copy anywhere in your home directory,
#   - edit the script and set the 'log' variable so it actually
#     points to your current httpd log,
#   - make sure the script has execute rights (chmod +x trace-referers).
#
# Running:
#   - just run the script and watch the screen.
#
#
# Version: 0.2 (2005-06-18)
# Author: Paweł Gościcki, http://pawelgoscicki.com
#
# No copyright rights. You can do whatever you want with this. You may even
# claim this script has been written by you from the very beginning ;)
#
# If you, however, improve it, send me a copy (paul_AT_pawelgoscicki.com).
#
# Based heavily on the tgrep script by Ed Morton (morton_at_lsupcaemnt.com):
# http://unix.derkeiler.com/Newsgroups/comp.unix.shell/2004-01/0818.html


# CONFIGURATION
# =============

# Where your httpd log file is
log="current-httpd-accesslog"

# What files to exclude (requests for those files won't be shown, regexp syntax)
exclude="\.gif|\.jpg|\.png|\.ico|\.css|\.js"

# Width of request and referrer columns (set it to match your terminal's width)
col_width=35


# MAIN SCRIPT
# ===========

# Check if log file actually exists (and is readable)
if [ ! -r "${log}" ]; then
  # report the problem on stderr and signal failure to the caller
  echo "Cannot access log file: $log" >&2
  exit 1
fi

# On startup, show up to the last 30 lines of the log (before filtering)
start=$(wc -l < "${log}")
start=$(( start - 30 ))
if (( start < 0 )); then
  start=0
fi

# Main loop
while :
do
  end=$(wc -l < "${log}")
  end="${end##* }"
  if (( ${end} > ${start} ))
  then
    start=$(( $start + 1 ))
    sed -n "${start},${end}p" "${log}" | grep -Ev "${exclude}" | \
    awk -v col_width="$col_width" '{

      # we are only interested in GET/POST requests
      if ( match($0, /"(GET|POST) /) > 0 )
      {
        split($0, fields, "\"")

        # IP_ADDRESS
        tmp = $1
        while ( length(tmp) < 15 ) tmp = tmp " "
        printf "%s", tmp " "
    
        # HTTP_REQUEST (GET/POST): cut off the trailing " HTTP/x.y" part
        tmp = substr(fields[2], 1, index(fields[2], "HTTP/") - 2 )
        tmp = substr(tmp, index(tmp, " ") + 1, col_width)
        while ( length(tmp) < col_width ) tmp = tmp " "
        printf "%s", tmp " "
    
        # REFERER (the juice)
        tmp = fields[4]
        while ( length(tmp) < col_width ) tmp = tmp " "
        printf "%s", tmp " "
    
        # USER_AGENT
        printf "%s", fields[6]
    
        # new line at the end
        printf "\n"
      }
    }'

    start=${end}
  fi

  # this is an endless loop executed every second
  sleep 1
done
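
Stripped of the formatting, the main loop above boils down to remembering how many lines have been seen and printing only what was appended since. A minimal sketch of that idea on its own, with hypothetical helper names (line_count, print_lines_after) and a temporary file standing in for the real log:

```shell
#!/bin/bash

# Hypothetical helper: number of lines in file $1, without the
# whitespace padding that some wc implementations add.
line_count() {
  local n
  n=$(wc -l < "$1")
  echo $(( n ))
}

# Hypothetical helper: print lines (from+1)..to of file $1.
# Prints nothing when no new lines have arrived.
print_lines_after() {
  local file="$1" from="$2" to="$3"
  if (( to > from )); then
    sed -n "$(( from + 1 )),${to}p" "$file"
  fi
}

# Demo on a throwaway file standing in for the httpd log
demo_log=$(mktemp)
printf 'one\ntwo\n' > "$demo_log"
seen=$(line_count "$demo_log")

printf 'three\nfour\n' >> "$demo_log"
now=$(line_count "$demo_log")

print_lines_after "$demo_log" "$seen" "$now"   # prints "three" and "four"
rm -f "$demo_log"
```

Counting the lines outside of any pipeline matters here: by default bash runs each part of a pipeline in a subshell, so a counter updated inside one would be lost on the next pass through the loop.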

Does your current hosting provider not support ssh access? You might then want to read my other post about hosting with Dreamhost for as little as $9/year. Have fun!