docker – Paweł Gościcki

The aim of this guide is to provide a working solution to generating a PDF version of a webpage using Puppeteer running in a Docker container as a Lambda function. The Docker container approach is used to bypass the 50MB Lambda code size limit. The other option is to use something like chrome-aws-lambda.

We’ll start with the Dockerfile, which assumes Lambda function with a node.js v16 engine called index.js with a named handler export:

FROM public.ecr.aws/lambda/nodejs:16

# Required for puppeteer to run
RUN yum install -y amazon-linux-extras
RUN amazon-linux-extras install epel -y

# Chromium dependencies
RUN yum install -y \
  GConf2.x86_64 \
  alsa-lib.x86_64 \
  atk.x86_64 \
  cups-libs.x86_64 \
  gtk3.x86_64 \
  ipa-gothic-fonts \
  libXScrnSaver.x86_64 \
  libXcomposite.x86_64 \
  libXcursor.x86_64 \
  libXdamage.x86_64 \
  libXext.x86_64 \
  libXi.x86_64 \
  libXrandr.x86_64 \
  libXtst.x86_64 \
  pango.x86_64 \
  xorg-x11-fonts-100dpi \
  xorg-x11-fonts-75dpi \
  xorg-x11-fonts-Type1 \
  xorg-x11-fonts-cyrillic \
  xorg-x11-fonts-misc \
  xorg-x11-utils

RUN yum update -y nss

# Chromium needs to be installed as a system dependency, not via npm; otherwise there will be an error about missing libatk-1.0
RUN yum install -y chromium

COPY index.js package.json package-lock.json ${LAMBDA_TASK_ROOT}

RUN npm ci --omit=dev

CMD [ "index.handler" ]

The above Dockerfile assures all required dependencies are in place. The next step is to setup the Puppeteer’s launch. Here is the relevant snippet from the Lambda function code:

import puppeteer from 'puppeteer'

const viewportOptions = {
  args: [
    // Flags for running in Docker on AWS Lambda
    // https://www.howtogeek.com/devops/how-to-run-puppeteer-and-headless-chrome-in-a-docker-container
    // https://github.com/alixaxel/chrome-aws-lambda/blob/f9d5a9ff0282ef8e172a29d6d077efc468ca3c76/source/index.ts#L95-L118
    // https://github.com/Sparticuz/chrome-aws-lambda/blob/master/source/index.ts#L95-L123
    '--allow-running-insecure-content',
    '--autoplay-policy=user-gesture-required',
    '--disable-background-timer-throttling',
    '--disable-component-update',
    '--disable-dev-shm-usage',
    '--disable-domain-reliability',
    '--disable-features=AudioServiceOutOfProcess,IsolateOrigins,site-per-process',
    '--disable-ipc-flooding-protection',
    '--disable-print-preview',
    '--disable-setuid-sandbox',
    '--disable-site-isolation-trials',
    '--disable-speech-api',
    '--disable-web-security',
    '--disk-cache-size=33554432',
    '--enable-features=SharedArrayBuffer',
    '--hide-scrollbars',
    '--ignore-gpu-blocklist',
    '--in-process-gpu',
    '--mute-audio',
    '--no-default-browser-check',
    '--no-first-run',
    '--no-pings',
    '--no-sandbox',
    '--no-zygote',
    '--single-process',
    '--use-angle=swiftshader',
    '--use-gl=swiftshader',
    '--window-size=1920,1080',
  ],
  defaultViewport: null,
  headless: true,
}

const browser = await puppeteer.launch(viewportOptions)

try {
  const page = await browser.newPage()

  const url = 'https://...'

  await page.goto(url, { waitUntil: ['domcontentloaded', 'networkidle0'] })
  await page.emulateMediaType('print')

  const pdf = await page.pdf({})
} catch (error) {
  ...
}

Tag: docker

Running Puppeteer on AWS Lambda in a Docker container