Running Puppeteer on AWS Lambda in a Docker container

The aim of this guide is to provide a working solution to generating a PDF version of a webpage using Puppeteer running in a Docker container as a Lambda function. The Docker container approach is used to bypass the 50MB Lambda code size limit. The other option is to use something like chrome-aws-lambda.

We’ll start with the Dockerfile, which assumes Lambda function with a node.js v16 engine called index.js with a named handler export:

FROM public.ecr.aws/lambda/nodejs:16

# Required for puppeteer to run
RUN yum install -y amazon-linux-extras
RUN amazon-linux-extras install epel -y

# Chromium dependencies
RUN yum install -y \
  GConf2.x86_64 \
  alsa-lib.x86_64 \
  atk.x86_64 \
  cups-libs.x86_64 \
  gtk3.x86_64 \
  ipa-gothic-fonts \
  libXScrnSaver.x86_64 \
  libXcomposite.x86_64 \
  libXcursor.x86_64 \
  libXdamage.x86_64 \
  libXext.x86_64 \
  libXi.x86_64 \
  libXrandr.x86_64 \
  libXtst.x86_64 \
  pango.x86_64 \
  xorg-x11-fonts-100dpi \
  xorg-x11-fonts-75dpi \
  xorg-x11-fonts-Type1 \
  xorg-x11-fonts-cyrillic \
  xorg-x11-fonts-misc \
  xorg-x11-utils

RUN yum update -y nss

# Chromium needs to be installed as a system dependency, not via npm; otherwise there will be an error about missing libatk-1.0
RUN yum install -y chromium

COPY index.js package.json package-lock.json ${LAMBDA_TASK_ROOT}

RUN npm ci --omit=dev

CMD [ "index.handler" ]

The above Dockerfile assures all required dependencies are in place. The next step is to setup the Puppeteer’s launch. Here is the relevant snippet from the Lambda function code:

import puppeteer from 'puppeteer'

const viewportOptions = {
  args: [
    // Flags for running in Docker on AWS Lambda
    // https://www.howtogeek.com/devops/how-to-run-puppeteer-and-headless-chrome-in-a-docker-container
    // https://github.com/alixaxel/chrome-aws-lambda/blob/f9d5a9ff0282ef8e172a29d6d077efc468ca3c76/source/index.ts#L95-L118
    // https://github.com/Sparticuz/chrome-aws-lambda/blob/master/source/index.ts#L95-L123
    '--allow-running-insecure-content',
    '--autoplay-policy=user-gesture-required',
    '--disable-background-timer-throttling',
    '--disable-component-update',
    '--disable-dev-shm-usage',
    '--disable-domain-reliability',
    '--disable-features=AudioServiceOutOfProcess,IsolateOrigins,site-per-process',
    '--disable-ipc-flooding-protection',
    '--disable-print-preview',
    '--disable-setuid-sandbox',
    '--disable-site-isolation-trials',
    '--disable-speech-api',
    '--disable-web-security',
    '--disk-cache-size=33554432',
    '--enable-features=SharedArrayBuffer',
    '--hide-scrollbars',
    '--ignore-gpu-blocklist',
    '--in-process-gpu',
    '--mute-audio',
    '--no-default-browser-check',
    '--no-first-run',
    '--no-pings',
    '--no-sandbox',
    '--no-zygote',
    '--single-process',
    '--use-angle=swiftshader',
    '--use-gl=swiftshader',
    '--window-size=1920,1080',
  ],
  defaultViewport: null,
  headless: true,
}

const browser = await puppeteer.launch(viewportOptions)

try {
  const page = await browser.newPage()

  const url = 'https://...'

  await page.goto(url, { waitUntil: ['domcontentloaded', 'networkidle0'] })
  await page.emulateMediaType('print')

  const pdf = await page.pdf({})
} catch (error) {
  ...
}

Testing useNavigate() / navigate() from react-router v6

Testing navigate() is slightly more problematic with the latest v6 (as of writing this post) react-router than just asserting on history.push() as it was the case in the previous versions. Let’s say we have this ButtonHome component:

import { useNavigate } from 'react-router-dom'

const ButtonHome = () => {
  const navigate = useNavigate()

  const onClick = () => navigate('/home')

  return (
    <button onClick={onClick}>
      Home
    </button>
  )
}

I would write a test for this component using the react-testing-library in the following way:

import * as router from 'react-router'
import { render } from '@testing-library/react'
import userEvent from '@testing-library/user-event'

import ButtonHome from './ButtonHome'

describe('ButtonHome', () => {
  const ui = userEvent.setup()
  const navigate = jest.fn()

  beforeEach(() => {
    jest.spyOn(router, 'useNavigate').mockImplementation(() => navigate)
  })

  it('renders the button and navigates to /home upon click', async () => {
    render(withRouter(<ButtonHome />))
    
    await ui.click(screen.queryByText('Home'))

    expect(navigate).toHaveBeenCalledWith('/home')
  })
})

The relevant bits just for testing the router are as follows:

import * as router from 'react-router'

const navigate = jest.fn()

beforeEach(() => {
  jest.spyOn(router, 'useNavigate').mockImplementation(() => navigate)
})

it('...', () => {
  expect(navigate).toHaveBeenCalledWith('/path')
})

The test also requires the following withRouter() helper, which I have in jest.setup.js:

import { Route, Router, Routes } from 'react-router-dom'
import { createBrowserHistory } from 'history'

const history = createBrowserHistory()

const withRouter = (children, opts = {}) => {
  const { path, route } = opts

  if (path) {
    history.push(path)
  }

  return (
    <Router location={history.location} navigator={history}>
      <Routes>
        <Route
          path={route || path || '/'}
          element={children}
        />
      </Routes>
    </Router>
  )
}

global.withRouter = withRouter

Debugging Safari web application on iOS devices

The idea here is to be able to use Safari’s developer tools connected to the web page/application running on a real iOS device. You can achieve similar effect by using XCode’s Simulator app, but certain things don’t work there (looking at you, Apple Pay).

Enable USB internet sharing

  • Open System Preferences > Sharing
  • Select Internet Sharing
  • If the checkbox next to Internet Sharing is enabled, uncheck it
  • Check iPhone USB on the right side
  • Check Internet Sharing on the left side. You will have to confirm it
  • Find message saying: “Computers on your local network can access your computer at: xxxx.local”. The xxxx part is the host. You will need it later.

Make your application available on the new host

Now it’s necessary to make your application available on the new xxxx.local host. In nginx it is a matter of adding xxxx.local to the server_name directive:

server_name existing.server xxxx.local;

For rails server it’s necessary to bind to all local interfaces:

rails server -b 0.0.0.0

Connect your iOS device

  • on iOS go to Settings -> Safari -> Advanced and toggle “Web Inspector”
  • on Mac open Safari and go to Preferences -> Advanced and check “Show Develop menu in menu bar”
  • connect iPhone via USB cable
  • on Mac restart Safari
  • on iPhone open http://xxxx.local
  • on Mac open Safari and go to the Develop menu. You will now see the iOS device you connected to your Mac. Click on it to start debugging.

Troubleshooting

Sometimes Safari won’t show the connected iOS device. If that is the case, you can start with restarting Safari both on the iOS device and Mac. If that doesn’t work, restart the iOS device and then the Mac.

Creating a self-signed, wildcard SSL certificate for Chrome 58+

Chrome 58+ requires Subject Alternative Name to be present in the SSL certificate for the domain name you want to secure. This is supposed to be a replacement for Common Name, which has some security holes (like being able to define a certificate for *.co.uk, which is not possible with SAN).

I’ll be using MacOS and OpenSSL v1.1.1d installed via brew.

Recent OpenSSL versions add the basicConstraints=critical,CA:TRUE x509v3 SAN extension by default, which prevents such generated certificate to work in Chrome 58+. We need to disable that first.

Edit /usr/local/etc/openssl@1.1/openssl.cnf (your path may vary) and comment the following line:

[ req ]
# x509_extensions = v3_ca # The extensions to add to the self signed cert

And then you are off to generate the certificate. I’ll be using the *.example.net domain name here.

/usr/local/Cellar/openssl@1.1/1.1.1d/bin/openssl req \
  -x509 \
  -newkey rsa:4096 \
  -sha256 \
  -days 7000 \
  -nodes \
  -out cert.pem \
  -keyout key.pem \
  -subj "/C=US/O=Org/CN=*.example.net" \
  -addext "basicConstraints=critical,CA:FALSE" \
  -addext "authorityKeyIdentifier=keyid,issuer" \
  -addext "keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment" \
  -addext "subjectAltName=DNS:example.net,DNS:*.example.net"

This will generate two files. key.pem, which is the private key, without passphrase and cert.pem, which is the actual certificate.

Verify that the actual certificate has required x509v3 SAN extensions:

$ openssl x509 -in cert.pem -noout -text

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            70:4c:28:...
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, O=Org, CN=*.example.net
        Validity
            Not Before: Oct  2 15:48:10 2019 GMT
            Not After : Dec  1 15:48:10 2038 GMT
        Subject: C=US, O=Org, CN=*.example.net
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (4096 bit)
                Modulus:
                    00:a7:b5:01...
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Authority Key Identifier:
                DirName:/C=US/O=Org/CN=*.example.net
                serial:70:4C:...

            X509v3 Key Usage:
                Digital Signature, Non Repudiation, Key Encipherment, Data Encipherment
            X509v3 Subject Alternative Name:
                DNS:example.net, DNS:*.example.net
    Signature Algorithm: sha256WithRSAEncryption
         59:1d:96:...

The last step is to import the certificate (cert.pem) into the keychain (I’m using the login keychain) and trust it.

So easy. So hard.

Ruby date is (ir)rational

Consider the following:

[DEV] main:0> Date.today
Fri, 05 Apr 2019

[DEV] main:0> (Date.today + 0.5)
Fri, 05 Apr 2019

[DEV] main:0> (Date.today + 0.5) == Date.today
false

The above was executed in a Rails application and was the reason for a quite long wtf moment. Things become more clear when I executed this:

[DEV] main:0> Date.today + 0.5 - Date.today
1/2

Turns out that the Date holds a Rational offset inside of it. This becomes more evident when you execute the above in irb:

irb(main):003:0> Date.today
=> #<date: 2019-04-05 ((2458579j,0s,0n),+0s,2299161j)>

irb(main):004:0> Date.today + 0.5
=> #<date: 2019-04-05 ((2458579j,43200s,0n),+0s,2299161j)>

LEFT OUTER JOIN in ActiveRecord

I always forget how to construct those queries in ActiveRecord, so here it goes.

Assuming we have the following structure:

class User < ActiveRecord::Base
  has_many :authentications
end

class Authentication < ActiveRecord::Base
  belongs_to :user
end

We can generate the LEFT OUTER JOIN SQL query in the following way (to see, for example, if we have dangling user references in authentications):

Authentication.joins('LEFT OUTER JOIN users ON authentications.user_id = users.id').where('users.id IS NULL').where('authentications.user_id IS NOT NULL')

Will generate the following SQL:

SELECT `authentications`.* FROM `authentications` LEFT OUTER JOIN users ON authentications.user_id = users.id WHERE (users.id IS NULL) AND (authentications.user_id IS NOT NULL)

i.e. it will select all authentications with incorrect (dangling) user_id references.

Setting up mocha with sinon and chai

I was unable to quickly find a solution for this, so here’s a little guide on how to set it up together in a proper way.

First, install the libraries:

npm install mocha --save-dev
npm install sinon --save-dev
npm install chai --save-dev

I come from the Ruby world, so I expect to have a spec command, spec_helper.js file and specs living inside spec/ directory (with a nested structure).

Inside package.json file define the spec command:

"scripts" : {
  "spec": "mocha --opts spec/mocha.opts"
}

We will be using BDD style (expect().to()) of chai. Inside spec/mocha.opts add:

--recursive **/*_spec.js
--require spec/spec_helper.js
--ui bdd

Create spec/spec_helper.js, which will require chai and sinon and we will require `spec_helper.js` inside all specs (similarly to how RSpec in Ruby world works).

const sinon = require('sinon')
const expect = require('chai').expect

global.sinon = sinon
global.expect = expect

And now create your spec file (spec/module_spec.js). You should not be required to include any libraries there. Now you can run your specs:

npm run spec

References:

Writing NullObject in Ruby to please Rubocop gods

Lets get right to it. Here’s how one can write a NullObject in a modern, Rubocop-friendly manner:

class NullObject
  def method_missing method, *args, &block
    if respond_to? method
      nil
    else
      super
    end
  end

  def respond_to_missing? _name, _include_private = false
    true
  end
end

One of the things you should support is the fallback to super. I achieve that by the if/else block. In practice super is never reached, so this feels a little bit like a waste.

The other thing is to declare whether your object (NullObject) should respond to methods it doesn’t have. A true null object should respond to all methods, so I set respond_to_missing? to true.

Just for posterity, here’s how you could write specs for it:

describe NullObject do
  it 'returns nil for any method call' do
    null = NullObject.new

    expect(null.missing_method).to be_nil
    expect(null.some_other_missing_method(1, 2, 3)).to be_nil
  end

  it 'responds to missing methods' do
    null = NullObject.new

    expect(null.respond_to?(:missing_method)).to be true
  end
end

Additional reading about the NullObject pattern:

  • http://wiki.c2.com/?NullObject
  • http://www.virtuouscode.com/2011/05/30/null-objects-and-falsiness/

Transproc is awesome!

I’m a big fan of the Elixir programming language, so when I saw at one of the Trójmiasto Ruby Users Group meetings that its pipe operator (|>) can be imitated in Ruby, I was immediately hooked. In Elixir you use it like this:

1..100_000
  |> Enum.map(&(&1 * 3))
  |> Enum.filter(odd?)
  |> Enum.sum

Which means that the result of the preceding expression/function will be passed as the first argument to the next function. This allows writing code, which looks beautiful. Sure, it’s just a syntactic sugar, but what a beautiful one.

Now in Ruby, you can achieve a very similar thing if you use the transproc gem by our own Piotr Solnica. It’s a little bit more cumbersome as this is not built in into Ruby.

Let’s say you have a Hash, which you would like to transform to deep symbolise the keys and rename one key. Here’s how you do it:

require 'transproc/all'

# create your own local registry for transformation functions
module Functions
  extend Transproc::Registry

  import Transproc::HashTransformations
end

class SomeClass
  def transform hash
    transformation.call hash
  end

  private

  def transformation
    t(:deep_symbolize_keys) >> t(:rename_keys, user_name: :name)
  end

  def t *args
    Functions[*args]
  end
end

And if you now run it you’d get this as a result:

irb(main):022:0> hash = { 'user_name' => 'Paweł', 'country' => 'PL' }
=> { "user_name" => "Paweł", "country" => "PL" }

irb(main):023:0> SomeClass.new.transform hash
=> { :country => "PL", :name=> "Paweł" }

Neat, isn’t it?

Run Rails 2.3 application using Ruby 2.1

There is a way to have your old Rails 2.3.something application running using latest Ruby from the 2.1 branch. However, it is moderately complex and requires quite a few hacks. Not everything works perfect with this setup, though. Common exceptions are performance tests and some of the generators, but I regard those as minor annoyances.

I’m using 2-3-stable branch of Rails, which means version 2.3.18 plus some additional unreleased patches on top of it and a recently released Ruby 2.1.8.

This guide assumes that you are using Bundler to manage your gems. If not, please follow this guide first.

Gemfile

gem 'rails', :github => 'rails/rails', :branch => '2-3-stable'
gem 'rake'
gem 'json'
gem 'iconv'

group :test do
  gem 'test-unit', '1.2.3'
end

And then:

$ bundle update rails rake json iconv test-unit

Rakefile

Remove the following line:

require 'rake/rdoctask'

config/boot.rb

Change the following block after rescue Gem::LoadError => load_error to look like this:

if load_error.message =~ /Could not find RubyGem rails/
  STDERR.puts %(Missing the Rails #{version} gem. Please `gem install -v=#{version} rails`, update your RAILS_GEM_VERSION setting in config/environment.rb for the Rails version you do have installed, or comment out RAILS_GEM_VERSION to use the latest version installed.)
  exit 1
else
  raise
end

config/deploy.rb

Update your Ruby version used for Capistrano deployments. Only if you’re using Capistrano with RVM.

set :rvm_ruby_string, '2.1.8'

config/environment.rb

Change hardcoded Rails version:

RAILS_GEM_VERSION = '2.3.18' unless defined? RAILS_GEM_VERSION

Before Rails::Initializer.run block add the following:

# Rails 2.3 and Ruby 2.0+ compatibility hack
# http://blog.lucascaton.com.br/index.php/2014/02/28/have-a-rails-2-app-you-can-run-it-on-the-newest-ruby/
if RUBY_VERSION >= '2.0.0'
  module Gem
    def self.source_index
      sources
    end

    def self.cache
      sources
    end

    SourceIndex = Specification

    class SourceList
      # If you want vendor gems, this is where to start writing code.
      def search(*args); []; end
      def each(&block); end
      include Enumerable
    end
  end
end

Additional initializers

config/initializers/active_record_callbacks_ruby2.rb:

# ActiveRecord::Callbacks compatibility fix
# http://blog.lucascaton.com.br/index.php/2014/02/28/have-a-rails-2-app-you-can-run-it-on-the-newest-ruby/

module ActiveRecord
  module Callbacks

    private

    def callback(method)
      result = run_callbacks(method) { |result, object| false == result }

      # The difference is here, the respond_to must check protected methods.
      if result != false && respond_to_without_attributes?(method, true)
 result = send(method)
      end

      notify(method)

      return result
    end
  end
end

config/initializers/backport_3937.rb:

# Encoding issues with Ruby 2+
# backport #3937 to Rails 2.3.8
# https://github.com/rails/rails/pull/3937/files

class ERB
  module Util
    def html_escape(s)
      s = s.to_s
      if s.html_safe?
        s
      else
        silence_warnings { s.gsub(/[&"'><]/n) { |special| HTML_ESCAPE[special] }.html_safe }
      end
    end
  end
end

config/initializers/i18n_ruby2.rb:

# Patch to make i18n work with Ruby 2+
# http://blog.lucascaton.com.br/index.php/2014/02/28/have-a-rails-2-app-you-can-run-it-on-the-newest-ruby/

module I18n
  module Backend
    module Base
      def load_file(filename)
        type = File.extname(filename).tr('.', '').downcase

        raise UnknownFileType.new(type, filename) unless respond_to?(:"load_#{type}", true)

        data = send(:"load_#{type}", filename) # TODO raise a meaningful exception if this does not yield a Hash

        data.each { |locale, d| store_translations(locale, d) }
      end
    end
  end
end

config/initializers/rails_generators.rb:

# This is a very important monkey patch to make Rails 2.3.18 to work with Ruby 2+
# If you're thinking to remove it, really, don't, unless you know what you're doing.
# http://blog.lucascaton.com.br/index.php/2014/02/28/have-a-rails-2-app-you-can-run-it-on-the-newest-ruby/

if Rails::VERSION::MAJOR == 2 && RUBY_VERSION >= '2.0.0'
  require 'rails_generator'
  require 'rails_generator/scripts/generate'

  Rails::Generator::Commands::Create.class_eval do
    def template(relative_source, relative_destination, template_options = {})
      file(relative_source, relative_destination, template_options) do |file|
        # Evaluate any assignments in a temporary, throwaway binding
        vars = template_options[:assigns] || {}
        b = template_options[:binding] || binding
        # this no longer works, eval throws "undefined local variable or method `vars'"
        # vars.each { |k, v| eval "#{k} = vars[:#{k}] || vars['#{k}']", b }
        vars.each { |k, v| b.local_variable_set(:"#{k}", v) }

        # Render the source file with the temporary binding
        ERB.new(file.read, nil, '-').result(b)
      end
    end
  end
end

config/initializers/ruby2.rb:

# This is a very important monkey patch to make Rails 2.3.18 to work with Ruby 2+
# If you're thinking to remove it, really, don't, unless you know what you're doing.
# http://blog.lucascaton.com.br/index.php/2014/02/28/have-a-rails-2-app-you-can-run-it-on-the-newest-ruby/

if Rails::VERSION::MAJOR == 2 && RUBY_VERSION >= '2.0.0'
  module ActiveRecord
    module Associations
      class AssociationProxy
        def send(method, *args)
          if proxy_respond_to?(method, true)
            super
          else
            load_target
            @target.send(method, *args)
          end
        end
      end
    end
  end
end

config/initializers/string_encodings.rb:

# Set default encoding for everything coming in and out of the app
# TODO this could/should be removed when upgrading to Rails 3+
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

config/initializers/utf8_params.rb

# convert all params into UTF-8 (from ASCII-8BIT)
# http://jasoncodes.com/posts/ruby19-rails2-encodings

raise "Check if this is still needed on " + Rails.version unless Rails.version == '2.3.18'

class ActionController::Base
  def force_utf8_params
    traverse = lambda do |object, block|
      if object.kind_of?(Hash)
        object.each_value { |o| traverse.call(o, block) }
      elsif object.kind_of?(Array)
        object.each { |o| traverse.call(o, block) }
      else
        block.call(object)
      end
      object
    end
    force_encoding = lambda do |o|
      o.force_encoding(Encoding::UTF_8) if o.respond_to?(:force_encoding)
    end
    traverse.call(params, force_encoding)
  end

  before_filter :force_utf8_params
end

References: