Issue 2022-09-01: Unnecessarily Large Docker Images

What causes large images and how to reduce their size

Picture this. You are building a Docker image. You create a Dockerfile. You set it up to do the following:

  • use the latest NodeJS image as the base

  • copy in the source code

  • install some OS dependencies

  • install the project dependencies

  • copy in test data to test against

  • configure and run tests

  • set the entrypoint to start a NodeJS server

You go to build it and check its size. Holy crap, it’s over one gigabyte! Oopsie. How though? Let’s see why and what you can do to keep the image size as small as possible.

Learnings

Every file added to the image adds to its size. The first and simplest thing to do is to avoid installing or copying unnecessary files or libraries into the Docker image. If the running container doesn’t require it, don’t include it.
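As a sketch, here’s what trimming OS dependencies can look like in a Debian-based NodeJS image (the base tag and package name are just illustrative):

```dockerfile
# Assumes a Debian-based node image; ca-certificates stands in for
# whatever OS package your runtime actually needs.
FROM node:18-slim

# --no-install-recommends skips "recommended" extras you didn't ask for,
# and removing the apt lists keeps cache metadata out of the layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```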

Leverage a .dockerignore file. It works much like .gitignore: specify files you don’t want copied into the image. This way you can keep the structure of your source tree as it is and be confident Docker will safely skip files you don’t want in the image when it processes your COPY instructions.
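For instance, a hypothetical .dockerignore for a NodeJS project might look like this (the entries are just common examples, not a prescription):

```
# .dockerignore — one pattern per line
node_modules
.git
*.log
coverage
README.md
```

With this in place, a broad `COPY . .` won’t drag `node_modules` or the Git history into the image.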

If for some reason you need an image for testing and development with debugging tools like `jq` and `curl`, build a separate image for that. You can do this by leveraging multi-stage builds. More on that later.

Build containers that have a single responsibility. A container running a single service produces a smaller image than one packed with many services. If you have an API and a DB service, put them into separate containers. If other processes need to run alongside your service, consider whether they’re better served as a side-car, a container that runs next to it. Single-responsibility containers are also easier to scale, debug, and update independently.

Know how layers and caching work. RUN, COPY, and ADD instructions each create what is called a layer, and each layer adds to the size of the image. If you’ve got many of these instructions, try to distill them into as few as possible. Can multiple RUN commands be combined with the shell operator `&&`? Can you perform the COPY once? Keep in mind that files deleted in a later layer still take up space in the earlier layer that created them, so clean up inside the same RUN instruction.
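To illustrate, here’s a sketch of the same dependency install written as three layers versus one (`curl` is just a placeholder package):

```dockerfile
# Before: three layers. The apt cache written in the first two layers
# stays in the image even though the third layer deletes it.
# RUN apt-get update
# RUN apt-get install -y curl
# RUN rm -rf /var/lib/apt/lists/*

# After: one layer. The deleted files never make it into the image.
RUN apt-get update \
    && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
```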

Use multi-stage builds. I won’t go into depth explaining this. It deserves its own issue, and you can read more about it in Docker’s documentation. However, I will explain the gist of it. Your Dockerfile will contain more than one `FROM` instruction. The content between each `FROM` looks like any other Dockerfile, and each section is a stage. You can build up to a specific stage or, by default, to the last stage in the file. You can also copy files from previous stages into the current one. The interesting point is that you can treat each stage differently and build up to any stage independently. For example:

  • One stage could be set up to install dependencies. This stage might require many additional dependencies related to compiling source, fetching libraries over HTTP, expanding archives, etc. A later stage could simply copy the core dependencies from that stage, leaving the undesirable dependencies behind.

  • One stage could be used for running tests. Set it up to allow volume bind mounting for test data. You can then build up to this stage and use that in a test pipeline. When you build the image you want to run in production, configure the final stage to not include this test stage.

  • Another stage could be responsible for compiling the source code to a binary. The last stage could simply be responsible for using a small base image, like `scratch`, `busybox`, or `alpine`, and copying the binary onto it.
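Putting those ideas together, a hypothetical multi-stage Dockerfile for a NodeJS project might look like this (the stage names, the `build` script, and `dist/server.js` are assumptions, not a fixed convention):

```dockerfile
# Stage 1: install all dependencies, including dev ones.
FROM node:18 AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci

# Stage 2: run tests on top of the deps stage.
FROM deps AS test
COPY . .
RUN npm test

# Stage 3: compile/bundle the source.
FROM deps AS build
COPY . .
RUN npm run build

# Final stage: small base image with only production artifacts.
FROM node:18-slim
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
ENTRYPOINT ["node", "dist/server.js"]
```

Here `docker build --target test .` builds only up to the test stage for a CI pipeline, while a plain `docker build .` runs through to the final, slim image.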

Recent Oopsies

And some other interesting things.

Thank you for reading! Got an oopsie you’d like to share? Hit me up on Twitter (link below).