22 January 2020

Simple Dockerfile Performance Improvements (Part 2)

Tags: Docker - Dockerfile - Kubernetes - Oracle JET

In Part 1 we learned that the order of your Dockerfile commands matters, along with copying only what you need. Building on these principles, this post dives deeper into a Dockerfile and introduces a great strategy called multi-stage builds, which provides immense power when used correctly.

Note: Docker 17.05 or higher is required to use multi-stage builds.

This post uses the same Oracle JET Dockerfile from Part 1. It’s worth noting that the techniques in Part 1 and Part 2 can be applied to any Dockerfile, not just Oracle JET applications.

FROM node:13.4-alpine

# Copy source of UI into container
RUN mkdir -p /usr/src/ui

COPY src /usr/src/ui/src

COPY ./*.json /usr/src/ui/

COPY scripts /usr/src/ui/scripts

# Set working dir
WORKDIR /usr/src/ui

############ Install dependencies
RUN npm -g install @oracle/ojet-cli
RUN npm install

RUN ojet build --release

EXPOSE 8080

CMD [ "ojet", "serve", "--release", "--server-port=8080"]

Breaking down the Dockerfile, the only thing really needed in the final image is the result of the command ojet build --release, which takes the src folder and runs various processing steps before creating a web folder used by the ojet serve command. Almost everything up to and including that build command can be classified as “build” or “preparation”, in the sense that it is not required for the image to actually run. The one exception is the ojet-cli install, which the ojet serve command still needs.

For example, the output of the ojet build command creates a web folder, so why do we need the src folder in the image anymore?
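One tempting answer is to simply delete src once the build has finished. As a minimal sketch (this is not part of the original Dockerfile), here is roughly what that would look like and why it doesn't help: each instruction creates its own layer, so a later RUN rm only hides the files while the earlier COPY layer still carries them.

```dockerfile
FROM node:13.4-alpine

# This layer stores the full contents of src.
COPY src /usr/src/ui/src

# This creates a new layer that marks the files as deleted.
# The COPY layer above is still part of the image, so the
# overall image size does not go down.
RUN rm -rf /usr/src/ui/src
```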

Instead of deleting the files and folders from the final image, we can utilise something called Multi-Stage builds which allows us greater control of what’s in the final image. A stage can be classified as starting at a FROM command and finishing before the next FROM command or the end of the file, whichever comes first.

Tip: Each stage can have a different base image
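To make the stage boundaries concrete, here is a bare-bones sketch. The alpine:3.11 image and the echo commands are placeholders I've picked purely for illustration; they aren't part of the Oracle JET example.

```dockerfile
# Stage 0: runs from this FROM up to (but not including) the next FROM.
FROM node:13.4-alpine
WORKDIR /usr/src/ui
RUN echo "build steps would run here"

# Stage 1: starts at the second FROM and runs to the end of the file.
# It uses a different base image to the stage above.
FROM alpine:3.11
CMD ["echo", "only this last stage ends up in the final image"]
```

By default, the image you get from docker build is the result of the last stage in the file.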

A good practice I follow when using multi-stage builds is to identify what is 100% required in the final image and what is optional. Using the above Dockerfile, the following items are required for the image to start:

A base docker image to host the application i.e. node:alpine
ojet-cli installed for the ojet serve command to execute successfully
The output of ojet build --release so that we have our application to run
An exposed port (EXPOSE 8080)
ojet serve acting as our command to start the container (CMD)

Tip: Identify and separate what is used to build the application vs what is required to RUN the application

Based on the above, only 5 of the 11 commands in the Dockerfile are required to run the application in our final Docker image. This is a great start, and it means the Dockerfile can be split into the following two stages (one to build and one to run):

# Stage 0 Starts Here
FROM node:13.4-alpine

# Copy source of UI into container
RUN mkdir -p /usr/src/ui

COPY src /usr/src/ui/src

COPY ./*.json /usr/src/ui/

COPY scripts /usr/src/ui/scripts

# Set working dir
WORKDIR /usr/src/ui

############ Install dependencies
RUN npm -g install @oracle/ojet-cli
RUN npm install

RUN ojet build --release
# Stage 0 Ends Here

# Stage 1 Starts Here
FROM node:13.4-alpine

RUN npm -g install @oracle/ojet-cli

# ?????? the output of `ojet build --release` ??????

EXPOSE 8080

CMD [ "ojet", "serve", "--release", "--server-port=8080"]
# Stage 1 Ends Here

Ah, we need to copy data from an earlier stage of the Dockerfile, but how do we do this?

We can achieve this by using an integer to represent each stage, i.e. 0 for the first stage, 1 for the second, and so on. This is okay, but you can also use strings to actually name a stage (phew), which is way more useful!

If you wish to name a stage, it can be done as follows:

FROM node:13.4-alpine AS build-container

Tip: Please name your stages; a few extra characters for greater readability wins every time

This allows you to reference the stage via the name build-container, meaning we can copy data from that stage using the normal COPY command together with the --from flag.

COPY --from=build-container /path/to/source/name /path/to/target/name
# Or if you decided to not use a name for your stage
COPY --from=0 /path/to/source/name /path/to/target/name
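One reason I prefer names: integer indexes depend on the position of the stage in the file, so they shift when a stage is added or moved. The sketch below illustrates this with a hypothetical extra stage called tools and a made-up target path /srv/web; neither appears in the Oracle JET example.

```dockerfile
# A hypothetical stage added at the top of the file; it is now stage 0.
FROM node:13.4-alpine AS tools
RUN echo "some shared tooling would be installed here"

# This stage used to be stage 0 and has become stage 1.
FROM node:13.4-alpine AS build-container
WORKDIR /usr/src/ui
# Stand-in for the real build output.
RUN mkdir web && echo "built" > web/index.html

FROM node:13.4-alpine
# Still correct: the name follows the stage wherever it moves.
COPY --from=build-container /usr/src/ui/web /srv/web
# Would now point at the "tools" stage, where this path does not
# exist, so the build would fail if this line were uncommented:
# COPY --from=0 /usr/src/ui/web /srv/web
```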

In the Dockerfile example, the output of ojet build --release needs to be copied over to the final stage. This can be done as follows:

# Stage 0 Starts Here
FROM node:13.4-alpine AS build-container

# Copy source of UI into container
RUN mkdir -p /usr/src/ui

COPY src /usr/src/ui/src

COPY ./*.json /usr/src/ui/

COPY scripts /usr/src/ui/scripts

# Set working dir
WORKDIR /usr/src/ui

############ Install dependencies
RUN npm -g install @oracle/ojet-cli
RUN npm install

RUN ojet build --release
# Stage 0 Ends Here

# Stage 1 Starts Here
FROM node:13.4-alpine

RUN npm -g install @oracle/ojet-cli

# Copy the compiled application from the previous stage
COPY --from=build-container /usr/src/ui/web .

EXPOSE 8080

CMD [ "ojet", "serve", "--release", "--server-port=8080"]
# Stage 1 Ends Here

If we execute docker history <IMAGE ID> against the image built from our new Dockerfile, we see the following:

IMAGE               CREATED              CREATED BY                                      SIZE
7ea9587765de        About a minute ago   /bin/sh -c #(nop)  CMD ["ojet" "serve" "--re…   0B
6d5f9fe58419        About a minute ago   /bin/sh -c #(nop)  EXPOSE 8080                  0B
f79e131f215b        About a minute ago   /bin/sh -c #(nop) COPY dir:ce5976f3086596de8…   25.7MB
4b3abb27fb85        About a minute ago   /bin/sh -c npm -g install @oracle/ojet-cli      12.6MB
b850b4746cd9        4 months ago         /bin/sh -c #(nop)  CMD ["node"]                 0B
<missing>           4 months ago         /bin/sh -c #(nop)  ENTRYPOINT ["docker-entry…   0B
<missing>           4 months ago         /bin/sh -c #(nop) COPY file:238737301d473041…   116B
<missing>           4 months ago         /bin/sh -c apk add --no-cache --virtual .bui…   5.35MB
<missing>           4 months ago         /bin/sh -c #(nop)  ENV YARN_VERSION=1.21.1      0B
<missing>           4 months ago         /bin/sh -c addgroup -g 1000 node     && addu…   99.8MB
<missing>           4 months ago         /bin/sh -c #(nop)  ENV NODE_VERSION=13.4.0      0B
<missing>           6 months ago         /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>           6 months ago         /bin/sh -c #(nop) ADD file:fe1f09249227e2da2…   5.55MB

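Wow, our final image is built from only the 5 commands in the final stage (the base image plus the four entries on top of it in the history above), and look at the size! Because only the output of the build/compilation process is copied, just 25.7MB has been transferred to the image, as opposed to 200MB+ in Part 1.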
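Because each stage can have a different base image, the final stage doesn't have to be a Node image at all. The following is only a sketch of that idea, not something I have measured here. It assumes that the web folder produced by the release build is plain static content that any web server can host, and it picks nginx:1.17-alpine as an example image; both are assumptions on my part rather than anything the Oracle JET tooling requires.

```dockerfile
# Build stage: same commands as the build stage above.
FROM node:13.4-alpine AS build-container
RUN mkdir -p /usr/src/ui
COPY src /usr/src/ui/src
COPY ./*.json /usr/src/ui/
COPY scripts /usr/src/ui/scripts
WORKDIR /usr/src/ui
RUN npm -g install @oracle/ojet-cli
RUN npm install
RUN ojet build --release

# Run stage (assumption): serve the static output with nginx
# instead of installing ojet-cli and running ojet serve.
FROM nginx:1.17-alpine
# /usr/share/nginx/html is the default content directory of the nginx image.
COPY --from=build-container /usr/src/ui/web /usr/share/nginx/html
# The nginx image listens on port 80 by default and ships its own CMD.
EXPOSE 80
```

If your application relies on anything that ojet serve provides at runtime, stick with the final stage shown earlier.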

It’s worth highlighting that whilst there are fewer layers in the final image, docker build still processes every command in your Dockerfile, meaning the build itself could potentially take longer. However, there is a solution to this!

Check out Part 3, where stage inheritance is looked at in greater detail, to see how you can use the layer cache to quickly build a multi-stage Docker image.

TL;DR: Whilst slightly more advanced, using multi-stage builds is often the ultimate Dockerfile improvement. It allows you to run intermediate containers to perform processing whilst keeping only the end result or compiled artifacts in your final Docker image, resulting in a significantly smaller image.

Useful Links:

Dockerfile Documentation

Multi-Stage Docker Builds Documentation

Oracle JET