/ docker

Docker images and files chown

I never though before about the Docker containers and the results the chown (change the ownership of a file) command can have on the final image.
I think the chown should be forbidden (as mush as possible) because it will produce bigger images than necessary.

But lets review together how Docker works to understand what I just said.

Docker image layers

As you surely know, any Docker image consists of overlapping layers
In the Docker official documentation you can find

A Docker image is built up from a series of layers. Each layer represents an instruction in the image’s Dockerfile. Each layer except the very last one is read-only.
Each layer is only a set of differences from the layer before it.

For any instruction contained in your Dockerfile (and in all the FROM images) you will result in an added layer.
You can check the status of a specific image just executing the history command on it:

docker history mmornati/docker-ghostblog:1.16.0
/bin/sh -c #(nop)  CMD ["/bin/sh" "/ghost/...   0B
/bin/sh -c #(nop)  VOLUME [/ghost-override]     0B
/bin/sh -c #(nop)  HEALTHCHECK &{["CMD-SHE...   0B
/bin/sh -c #(nop)  EXPOSE 2368                  0B
/bin/sh -c #(nop)  ENV NODE_ENV=production      0B
/bin/sh -c #(nop) WORKDIR /ghost                0B
/bin/sh -c #(nop)  ENV HOME=/ghost              0B
/bin/sh -c #(nop)  USER ghost                   0B
/bin/sh -c chown -R ghost:ghost /ghost && ...   108MB
/bin/sh -c #(nop) COPY dir:ec27a30893731c0...   108MB
/bin/sh -c addgroup -S -g 1276 ghost &&   ...   5.13kB
/bin/sh -c #(nop)  LABEL maintainer=Marco ...   0B
/bin/sh -c #(nop)  CMD ["node"]                 0B
/bin/sh -c apk add --no-cache --virtual .b...   3.91MB
/bin/sh -c #(nop)  ENV YARN_VERSION=0.27.5      0B
/bin/sh -c addgroup -g 1000 node     && ad...   45.6MB
/bin/sh -c #(nop)  ENV NODE_VERSION=6.11.3      0B
/bin/sh -c #(nop)  ENV NPM_CONFIG_LOGLEVEL...   0B
/bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
/bin/sh -c #(nop) ADD file:89e72bfc19e8162...   4.81MB

In the output you can see the list of commands and the space disk consumed by it. The result should be read bottom-up: the lower layer is the last one.

Docker copy-on-write (CoW) strategy

To manage the files in the multiple layers, Docker is using the copy on write strategy.

Copy-on-write is a strategy of sharing and copying files for maximum efficiency. If a file or directory exists in a lower layer within the image, and another layer (including the writable layer) needs read access to it, it just uses the existing file. The first time another layer needs to modify the file (when building the image or running the container), the file is copied into that layer and modified. This minimizes I/O and the size of each of the subsequent layers. These advantages are explained in more depth below.

So simply, when you need to change a file contained in another layer, Docker copies the file in the layer where you are modifying it, with the result that in your final image the file is present 2 times and is taken 2 time the disk space (one per layer).

What about if you need to delete a file present in another layer?
Following what we just see, the file is "hidden" in the final layer because you delete it, but the disk space is taken because the file is still present in the previous layer! So it is "useless" to remove a file.

This is the reason why, for example, the clean all in the following command is useless if your target is to prevend waste of disk space:

RUN yum -y install php
RUN yum clean all

This other one instead will produce a single layer with the php package installed and with the yum cache correctly cleaned.

RUN yum -y install php && yum clean all

Docker ChOwn

What about the chown? It is following the same rules: applying a change ownership to a file, for Docker means copy that file in the new layer and change the ownership. Anytime you are using it you are taking more space than you need.
In the history I pasted before there is the example about the image I was using for this blog:

/bin/sh -c chown -R ghost:ghost /ghost && ...   108MB
/bin/sh -c #(nop) COPY dir:ec27a30893731c0...   108MB

The first line (reading bottom top) COPY a folder into the container (1 layer) the second one is changing the ownership of the file (1 layer). Result: 108Mb the copy + 108Mb to chown.
I was wasting 108Mb of space because of the chown!!

How to fix it?

To prevent waste of disk space we have to always think how we can reduce the number of layer.
Specifically for the chown problem, in the recent versions of Docker, you can add the files ownership as parameter of the COPY command (--chown=-user-):

COPY --from=plugin-builder --chown=node /builder/cloudinary-store

which is producing the copy and the ownership in the same command and so in a single layer.

The result refactoring this blog image (~270Mb-108Mb):

In the end, when you are building images for production it is important to think to the disk space and to keep in mind that you goal is to reduce the number of layer.