20 December 2019
Simple Dockerfile Performance Improvements (Part 1)
Tags: Docker - Dockerfile - Kubernetes - Oracle JET
As a developer, I find that one of the easiest ways to ship an application is to create a Docker image via a custom Dockerfile, usually so it can be deployed to Kubernetes on AWS, Azure or Oracle Cloud (EKS, AKS, OKE). Recently I’ve spent some time looking into the Docker image that gets created, trying to find ways to shrink the final image size and to make the build quicker by caching Docker image layers. This post is Part 1 of my approaches and their outcomes. This particular Dockerfile builds an Oracle JET UI application.
A lot of people don’t seem to be aware that the docker build process caches the result of each command in the Dockerfile. By default Docker will reuse the cache where possible but, and this is the key thing, once one command misses the cache (because the command itself changed or, for COPY and ADD, because the files being copied changed), no further commands will use the cache. So if your first or second command is invalidated, the rest of your Dockerfile won’t use the cache, meaning the image is essentially built from scratch each time.
To visualize this, let’s start with a basic Dockerfile that people might use to build a JET application:
FROM node:alpine
# Copy source of UI into container
RUN mkdir -p /usr/src/ui
COPY . /usr/src/ui
# Set working dir
WORKDIR /usr/src/ui
############ Install dependencies
RUN npm -g install @oracle/ojet-cli
RUN npm install
RUN ojet build --release
EXPOSE 8080
CMD [ "ojet", "serve", "--release", "--server-port=8080"]
Starting from the top, Line 1 specifies the base Docker image we are using for our custom image. Note that the tag specified here is alpine, which doesn’t include a particular Node version, meaning Docker will use whatever the latest Node release on Alpine happens to be when the image is pulled. Based on what we learned earlier, this means that if the image behind node:alpine changes, the rest of the Dockerfile won’t use the cache for any command.
Tip: Always use a specific version for your base Docker image, e.g. node:14.1-alpine
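As a minimal sketch of that tip, the only change is on the FROM line. The tag node:14.1-alpine is the one suggested above; substitute whichever version your application has actually been tested against.

```dockerfile
# Floating tag: resolves to the newest Node-on-Alpine image at pull time,
# so the base layers (and every cached layer above them) can change unexpectedly.
# FROM node:alpine

# Pinned tag: the base only changes when you edit this line.
FROM node:14.1-alpine
```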
Skipping ahead to the COPY command, we can see that it copies the entire current directory (.) into the image. This should start ringing alarm bells: all we need to copy into the Docker image is the bare minimum to build/deploy the application. What about the files/folders generated from local testing, your Dockerfile, or even the node_modules folder? These additional files/folders bloat your image for no reason.
Tip: Only copy into the container the bare minimum required.
The docker history <IMAGE_ID> command is useful for seeing the impact each layer has on your image size. The Dockerfile shown above results in the following:
IMAGE CREATED CREATED BY SIZE
83c891fbc056 44 minutes ago /bin/sh -c #(nop) CMD ["ojet" "serve" "--re… 0B
8048dfe85969 44 minutes ago /bin/sh -c #(nop) EXPOSE 8080 0B
ef3f8312b489 44 minutes ago /bin/sh -c ojet build --release 32.8MB
95a3cd0c00db 44 minutes ago /bin/sh -c npm install 405B
73675b2dfeee 44 minutes ago /bin/sh -c npm -g install @oracle/ojet-cli 12.6MB
bd32bde943d8 44 minutes ago /bin/sh -c #(nop) WORKDIR /usr/src/ui 0B
67027668583e 44 minutes ago /bin/sh -c #(nop) COPY dir:2a70c154d910cb66c… 172MB
044280031d1e About an hour ago /bin/sh -c mkdir -p /usr/src/ui 0B
b850b4746cd9 2 days ago /bin/sh -c #(nop) CMD ["node"] 0B
<missing> 2 days ago /bin/sh -c #(nop) ENTRYPOINT ["docker-entry… 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY file:238737301d473041… 116B
<missing> 2 days ago /bin/sh -c apk add --no-cache --virtual .bui… 5.35MB
<missing> 2 days ago /bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 0B
<missing> 2 days ago /bin/sh -c addgroup -g 1000 node && addu… 99.8MB
<missing> 2 days ago /bin/sh -c #(nop) ENV NODE_VERSION=13.4.0 0B
<missing> 9 days ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 9 days ago /bin/sh -c #(nop) ADD file:fe1f09249227e2da2… 5.55MB
The items that have <missing> in the IMAGE column are usually the layers that came from the base Docker image. Anything without an image ID is out of your control and cannot be changed. If you see that the base image itself is huge, it’s often worth searching for a smaller image that meets your needs; these are often suffixed with alpine or slim.
Looking at the result of our Dockerfile, we can see that on Line 8 the SIZE value is 172MB, which means our COPY command added 172MB to the final image size! Following the guidance above on only copying what we need, let’s change the command to copy the src folder along with any other relevant files required.
FROM node:alpine
# Copy source of UI into container
RUN mkdir -p /usr/src/ui
COPY src /usr/src/ui/src
COPY ./*.json /usr/src/ui/
COPY scripts /usr/src/ui/scripts
# Set working dir
WORKDIR /usr/src/ui
############ Install dependencies
RUN npm -g install @oracle/ojet-cli
RUN npm install
RUN ojet build --release
EXPOSE 8080
CMD [ "ojet", "serve", "--release", "--server-port=8080"]
IMAGE CREATED CREATED BY SIZE
e5d74bc87e30 About a minute ago /bin/sh -c #(nop) CMD ["ojet" "serve" "--re… 0B
3200be4b6b61 About a minute ago /bin/sh -c #(nop) EXPOSE 8080 0B
24277fdb7930 About a minute ago /bin/sh -c ojet build --release 32.9MB
adf8fb740f9f About a minute ago /bin/sh -c npm install 239MB
db30d785dd47 About a minute ago /bin/sh -c npm -g install @oracle/ojet-cli 12.6MB
08f3ddead178 About a minute ago /bin/sh -c #(nop) WORKDIR /usr/src/ui 0B
b28c7bfe122b About a minute ago /bin/sh -c #(nop) COPY dir:faaafe40f98dfa51b… 14.8kB
f05c5bde59dc About a minute ago /bin/sh -c #(nop) COPY multi:35ce946aad6a189… 113kB
6bfddd7accbc 4 minutes ago /bin/sh -c #(nop) COPY dir:dbbeaee68b9308dfc… 133kB
044280031d1e 2 hours ago /bin/sh -c mkdir -p /usr/src/ui 0B
b850b4746cd9 2 days ago /bin/sh -c #(nop) CMD ["node"] 0B
<missing> 2 days ago /bin/sh -c #(nop) ENTRYPOINT ["docker-entry… 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY file:238737301d473041… 116B
<missing> 2 days ago /bin/sh -c apk add --no-cache --virtual .bui… 5.35MB
<missing> 2 days ago /bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 0B
<missing> 2 days ago /bin/sh -c addgroup -g 1000 node && addu… 99.8MB
<missing> 2 days ago /bin/sh -c #(nop) ENV NODE_VERSION=13.4.0 0B
<missing> 9 days ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 9 days ago /bin/sh -c #(nop) ADD file:fe1f09249227e2da2… 5.55MB
Two things to point out. Firstly, there are now 3 COPY commands, meaning that we have 3 COPY layers in the image instead of 1. Secondly, the final image size has increased because npm install now installs the various node modules inside the image (more on this in Part 2).
Leaving aside the additional layers created by the 3 COPY commands, we can see that the sizes of those layers are 133kB, 113kB and 14.8kB, a huge decrease compared to the original 172MB of data copied over.
Docker also has the concept of a .dockerignore file that can be used to explicitly specify which files should NOT be copied over into the image.
web/
themes/
staged-themes/
node_modules/
.gitignore
.dockerignore
Dockerfile
Using the original Dockerfile together with the .dockerignore file results in the following layers:
IMAGE CREATED CREATED BY SIZE
fc8fae68fcb9 About a minute ago /bin/sh -c #(nop) CMD ["ojet" "serve" "--re… 0B
05da8f40e26d About a minute ago /bin/sh -c #(nop) EXPOSE 8080 0B
58cb66ce921e About a minute ago /bin/sh -c ojet build --release 32.9MB
94b52e7a2e23 About a minute ago /bin/sh -c npm install 239MB
d2098f27406c About a minute ago /bin/sh -c npm -g install @oracle/ojet-cli 12.6MB
fc6df7370b28 About a minute ago /bin/sh -c #(nop) WORKDIR /usr/src/ui 0B
6ec3fc0a109a About a minute ago /bin/sh -c #(nop) COPY dir:fe19b872548fc3591… 261kB
044280031d1e 2 hours ago /bin/sh -c mkdir -p /usr/src/ui 0B
b850b4746cd9 2 days ago /bin/sh -c #(nop) CMD ["node"] 0B
<missing> 2 days ago /bin/sh -c #(nop) ENTRYPOINT ["docker-entry… 0B
<missing> 2 days ago /bin/sh -c #(nop) COPY file:238737301d473041… 116B
<missing> 2 days ago /bin/sh -c apk add --no-cache --virtual .bui… 5.35MB
<missing> 2 days ago /bin/sh -c #(nop) ENV YARN_VERSION=1.21.1 0B
<missing> 2 days ago /bin/sh -c addgroup -g 1000 node && addu… 99.8MB
<missing> 2 days ago /bin/sh -c #(nop) ENV NODE_VERSION=13.4.0 0B
<missing> 9 days ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 9 days ago /bin/sh -c #(nop) ADD file:fe1f09249227e2da2… 5.55MB
Now we can see we only have the 1 COPY layer, and it copied over 261kB vs 172MB originally. You might be thinking, well, what’s the point in having separate COPY commands if I can just use the .dockerignore file? It all comes down to the first point raised in this article: once one command misses the cache, none of the commands after it use the cache. So I always advise placing the COPY commands whose files change slowly towards the top, and the COPY commands whose files change frequently towards the end.
For example, how often does your package-lock.json file change compared to your actual source code? Using this example, the package-lock.json would be copied towards the top of the Dockerfile and the source code copied at the end.
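Here is a rough sketch of that ordering applied to the Dockerfile from this post. It assumes the project has both a package.json and a package-lock.json in its root, and that npm install needs nothing beyond those two files; adjust the paths to match your own project.

```dockerfile
FROM node:14.1-alpine

# WORKDIR creates the directory if it does not already exist
WORKDIR /usr/src/ui

# Depends only on the base image, so it stays cached between builds
RUN npm -g install @oracle/ojet-cli

# Slowly changing: the dependency manifests (assumed to exist in the project root)
COPY package.json package-lock.json ./
RUN npm install

# Frequently changing: config, scripts and source go last, so an edit here
# only invalidates the layers from this point down
COPY ./*.json ./
COPY scripts ./scripts
COPY src ./src

RUN ojet build --release
EXPOSE 8080
CMD [ "ojet", "serve", "--release", "--server-port=8080"]
```

With this layout, a change under src only re-runs the last COPY and the ojet build, while the npm install layer is reused for as long as the two package files are unchanged.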
One final thought: I have seen examples where people have copied over a local tar archive, then installed some form of tar package, and then unpacked the copied archive. Stop. Doing. This. It is a prime example of not reading the documentation. Docker provides two commands to transfer data into the image, ADD and COPY, and an often overlooked difference between the two is that ADD will automatically unpack a local tar archive into a directory. This means you don’t need to install any tar packages or manually unpack the archive, as Docker can do it for you!
Tip: Using ADD automatically unpacks a local tar archive
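A small sketch of what this looks like in practice. The archive name app.tar.gz and the target path /opt/app/ are made up for illustration, and the base image is just an example.

```dockerfile
FROM alpine:3.11

# app.tar.gz is a hypothetical archive sitting in the build context.
# ADD extracts a local tar archive (plain, gzip, bzip2 or xz) into the
# destination directory, so no tar package or RUN step is needed.
ADD app.tar.gz /opt/app/
```

Note that this only applies to archives in the local build context; an archive fetched by ADD from a remote URL is not unpacked.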
But wait, there’s more! This post identified small changes you can make to your Dockerfile that result in dramatic changes in image size and build time. The next post covers multi-stage Docker builds, which build on the lessons learned here to produce an even smaller image and an even quicker build.
Check out Part 2 here
TL;DR: Ordering your commands from the slowest-changing inputs to the most frequently changing inputs has a huge impact on build time. Explicitly specifying the files to copy, instead of copying everything, reduces the image size. Finally, using ADD for local tar archives means Docker will automatically unpack the archive for you, so no more manual unpacking! The above might sound like common sense, but these points are often overlooked, even in production-ready applications.
Useful Links: