Running Puppeteer under Docker

Puppeteer is an API enabling browser automation via Chrome/Chromium. Quoting its homepage:

“Puppeteer is a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full (non-headless) Chrome/Chromium.”

I’ve already shown it in action a couple of years ago in my article, “Gathering Net Salary Data with Puppeteer“, where I used it for web scraping. Browser automation is also very common for automated testing of web applications, and may also be used for a lot of other things.

As with any other piece of software, it is sometimes convenient to package a Puppeteer script in a Docker container. However, since deploying a browser is fundamentally more complicated than your average API, Puppeteer and Docker are a little tricky to get working together. In this article, we’ll see why this combination is problematic and how to solve it.

A Minimal Puppeteer Example

Before we embark upon our Docker journey, we need a simple Puppeteer program we can test with. The easiest thing we can do is use Puppeteer to open a webpage and take a screenshot of it. We can do this quite easily as follows.

First, we need to create a folder, install prerequisites, and create a file in which to put our JavaScript code. The following bash script takes care of all this, assuming you already have Node.js installed. Please note that I am working on Linux Kubuntu 22.04, so if you’re using a radically different operating system, the steps may vary a little.

mkdir pupdock
cd pupdock
npm install puppeteer
touch main.js

Next, open main.js with your favourite text editor or IDE and use the following code (adapted from “Gathering Net Salary Data with Puppeteer“):

const puppeteer = require('puppeteer');
 
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.programmersranch.com');
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();

To see that it works, run this script using the following command:

node main.js

A file called screenshot.png should be saved in the same folder:

A screenshot of my older tech blog, Programmer’s Ranch, complete with Blogger’s stupid cookie banner.

An Initial Attempt with Docker

Now that we know that the script works, let’s try and make a Docker image out of it. Add a Dockerfile with the following contents:

FROM node:19-alpine3.16
WORKDIR /puppeteer
COPY main.js package.json package-lock.json ./
RUN npm install
CMD ["node", "main.js"]

What we’re doing here is starting from a recent (at the time of writing this article) Node Docker image. Then we set the current working directory to a folder called /puppeteer. We copy the script along with the list of package dependencies (Puppeteer, basically) into the image, install those dependencies, and set up the image to execute node main.js when it is run.

We can build the Docker image using the following command, giving it the tag of “pupdock” so that we can easily find it later. The sudo command is only necessary if you’re running on Linux.

sudo docker build -t pupdock .

Once the image is built, we can run a container based on this image using the following command:

sudo docker run pupdock

Unfortunately, we hit a brick wall right away:

/puppeteer/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserRunner.js:300
            reject(new Error([
                   ^

Error: Failed to launch the browser process! spawn /root/.cache/puppeteer/chrome/linux-1108766/chrome-linux/chrome ENOENT


TROUBLESHOOTING: https://pptr.dev/troubleshooting

    at onClose (/puppeteer/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserRunner.js:300:20)
    at ChildProcess.<anonymous> (/puppeteer/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserRunner.js:294:24)
    at ChildProcess.emit (node:events:512:28)
    at ChildProcess._handle.onexit (node:internal/child_process:291:12)
    at onErrorNT (node:internal/child_process:483:16)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

Node.js v19.8.1

The error and stack trace are both pretty cryptic, and it’s not clear why this failed. There isn’t much you can do other than beg Stack Overflow for help.

Fortunately, I’ve done this already myself, and I can tell you that it fails because Chrome (which Puppeteer is trying to launch) needs more security permissions than Docker provides by default. We’ll need to learn a bit more about Docker security to make this work.

Dealing with Security Restrictions

So how do we give Chrome the permissions it needs, without compromising the security of the system it is running on? We have a few options we could try.

  • Disable the sandbox. Chrome uses a sandbox to isolate potentially harmful web content and prevent it from gaining access to the underlying operating system (see Sandbox and Linux Sandboxing docs to learn more). Many Stack Overflow answers suggest getting around errors by disabling this entirely. Unless you know what you’re doing, this is probably a terrible idea. It’s far better to relax security a little to allow exactly the permissions you need than to disable it entirely.
  • Use the Puppeteer Docker image. Puppeteer’s documentation on Docker explains how to use a Puppeteer’s own Docker images (available on the GitHub Container Registry) to run arbitrary Puppeteer scripts. While there’s not much info on how to work with these (e.g. which folder to mount as a volume in order to grab the generated screenshot), what stands out is that this approach requires the SYS_ADMIN capability, which exposes more permissions than we need.
  • Build your own Docker image. Puppeteer’s Troubleshooting documentation (also available under Chrome Developers docs and Puppeteer docs) has a section on running Puppeteer in Docker. We’ll follow this method, as it is the one that worked best for me.

A Second Attempt Based on the Docs

Our initial attempt with a simple Dockerfile didn’t go very well, but now we have a number of other Dockerfiles we could start with, including the Puppeteer Docker image’s Dockerfile, a couple in the aforementioned doc section on running Puppeteer in Docker (one built on a Debian-based Node image, and another based on an Alpine image), and several other random ones scattered across Stack Overflow answers.

My preference is the Alpine one, not only because Alpine images tend to be smaller than their Debian counterparts, but also because I had more luck getting it to work across Linux, Windows Subsystem for Linux and an M1 Mac than I did with the Debian one. So let’s replace our Dockerfile with the one in the Running on Alpine section, with a few additions at the end:

FROM alpine

# Installs latest Chromium (100) package.
RUN apk add --no-cache \
      chromium \
      nss \
      freetype \
      harfbuzz \
      ca-certificates \
      ttf-freefont \
      nodejs \
      yarn

# Tell Puppeteer to skip installing Chrome. We'll be using the installed package.
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

# Puppeteer v13.5.0 works with Chromium 100.
RUN yarn add puppeteer@13.5.0

# Add user so we don't need --no-sandbox.
RUN addgroup -S pptruser && adduser -S -G pptruser pptruser \
    && mkdir -p /home/pptruser/Downloads /app \
    && chown -R pptruser:pptruser /home/pptruser \
    && chown -R pptruser:pptruser /app

# Run everything after as non-privileged user.
USER pptruser

WORKDIR /puppeteer
COPY main.js ./
CMD ["node", "main.js"]

Because the given Dockerfile already installs the puppeteer dependency and that’s all we need here, I didn’t even bother to do the usual npm install here, although a more complex script might possibly have additional dependencies to install.

At this point, we can build the Dockerfile and run the resulting image as before:

sudo docker build -t pupdock .
sudo docker run pupdock

The result is that it still doesn’t work, but the error is something that is more easily Googled than the one we have before:

/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:237
            reject(new Error([
                   ^

Error: Failed to launch the browser process!
Failed to move to new namespace: PID namespaces supported, Network namespace supported, but failed: errno = Operation not permitted


TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md

    at onClose (/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:237:20)
    at ChildProcess.<anonymous> (/node_modules/puppeteer/lib/cjs/puppeteer/node/BrowserRunner.js:228:79)
    at ChildProcess.emit (node:events:525:35)
    at ChildProcess._handle.onexit (node:internal/child_process:291:12)

Node.js v18.14.2

A Little Security Lesson

Googling that error about PID namespaces is what led me to the solution, but it still took a while, because I had to piece together many clues scattered in several places, including:

  • Answer by usethe4ce suggests downloading some chrome.json file and passing it to Docker, but it’s not immediately clear what this is/does.
  • Answer by Riccardo Manzan, based on the one by usethe4ce, provides an example Dockerfile based on Node (not Alpine) and also shows how to pass the chrome.json file both in docker run and docker-compose.
  • GitHub Issue by WhisperingChaos explains what that chrome.json is about.
  • Answer by hidev lists the five system calls that Chrome needs.

To understand all this confusion, we first need to take a step back and understand something about Docker security. Like the Chrome sandbox, Docker has its ways of restricting the extent to which a running Docker container can interact with the host.

Like any Linux process, a container requests whatever it needs from the operating system kernel using system calls. However, a container could be used to wreak havoc on the host if it is allowed to run whatever system calls it wants and then is successfully breached by an attacker. Fortunately, Linux provides a feature called seccomp that can restrict system calls to only the ones that are required, minimising the attack surface.

In Docker, this restriction is applied by means of a seccomp profile, basically a JSON file whitelisting the system calls to be allowed. Docker’s default seccomp profile restricts access to system calls enough that it prevents many known exploits, but this also prevents more complex applications that need additional system calls – such as Chrome – from working under Docker.

That chrome.json I mentioned earlier is a custom seccomp profile painstakingly created by one Jess Frazelle, intended to allow the system calls that Chrome needs but no more than necessary. This should be more secure than disabling the sandbox or running Chrome with the SYS_ADMIN capability.

A Third Attempt with chrome.json

Let’s give it a try. Download chrome.json and place it in the same folder as main.js, the Dockerfile and everything else. Then, run the container as follows:

sudo docker run --security-opt seccomp=path/to/chrome.json pupdock

This time, there’s no output at all – which is good, because it means the errors are gone. To ensure that it really worked, we’ll grab the screenshot from inside the stopped container. First, get the container ID by running:

sudo docker container ls -a

Then, copy the screenshot from the container to the current working directory as follows, taking care to replace 7af0d705a751 with the actual ID of the container:

sudo docker cp 7af0d705a751:/puppeteer/screenshot.png ./screenshot.png
Puppeteer worked: the screenshot contains the full length of the page.

Additional Notes

I omitted a few things to avoid breaking the flow of this article, so I’ll mention them here briefly:

  • That SYS_ADMIN capability we saw mentioned earlier belongs to another Linux security feature: capabilities, which are also related to seccomp profiles.
  • If you need to pass a custom seccomp profile in a docker-compose.yaml file, see Riccardo Manzan’s Stack Overflow answer for an example.
  • If you run into other issues, use the dumpio setting to get more verbose output. It’s a little hard to separate the real errors from the noise, but it does help. Be sure to run in headless mode (it doesn’t make sense to run the GUI browser under Docker), and disable the GPU (--disable-gpu) if you see related errors.

Conclusion

Running Puppeteer under Docker might sound like an unusual requirement, perhaps overkill, but it does open up a window onto the interesting world of Docker security. Chrome’s complexity requires that it be granted more permissions than the typical Docker container.

It is unfortunate that the intricacies around getting Puppeteer to work under Docker are so poorly documented. However, once we learn a little about Docker security – and the Linux security features that it builds on – the solution of using a custom seccomp profile begins to make sense.

One thought on “Running Puppeteer under Docker”

  1. Following this tutorial today (October 18th, 2023) and didn’t download the chrome.json file.

    The second Dockerfile and docker run pupdock just works.

    Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *