Elasticsearch: Reduce the surface area of the Elasticsearch Docker image

Created on 30 Jan 2020  ·  17 Comments  ·  Source: elastic/elasticsearch

Today our Docker image is based on the centos:7 base image. This leaves a large surface area of binaries and libraries that we don't need, and exposes us to noisy vulnerability scans (with issues that don't actually impact the security of the image). We haven't made much of an effort to slim this surface area down.

One reason we chose this image over others (e.g., Ubuntu-derived images) is perceived better support of the JDK, because Red Hat has long been heavily involved in OpenJDK. This reason is a non-factor now that we use the bundled JDK in the images. There was also a desire to have consistency with other images in the stack. I'm less convinced of the value of this compared to other factors, but it is something to keep in mind.

Note that reducing the physical size of the image is a non-goal for this issue. A smaller image is worth considering, and will likely result from reducing the surface area, but it is a separate concern.

:DeliverPackaging Delivery

All 17 comments

Pinging @elastic/es-core-infra (:Core/Infra/Packaging)

Would using Alpine Linux make sense? A tiny distro with security in mind.

Here are some comments about performance:
https://nickjanetakis.com/blog/benchmarking-debian-vs-alpine-as-a-base-docker-image

Our images used to be based on the official Alpine Linux images. We had significant challenges with JVM bugs on the musl C library, and some other issues, so we moved away. As of JDK 13 (the latest major release of the JDK), there is no official support for the JVM on Alpine Linux.

docker-slim looks promising here. It analyzes a running image to see what files are actually used, and removes everything else. Unfortunately, https://github.com/docker-slim/docker-slim/issues/26 gets in the way here. I'm attempting to work around it, because the reduction in surface area (and therefore image size) is significant.

> docker-slim looks promising here.

If it literally looks at the loaded dependencies of the running processes at a moment in time then please make sure an ML job is opened at that moment in time, otherwise it may well remove dependencies that ML needs.
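To illustrate the concern, here is a minimal sketch of what a point-in-time dependency snapshot could look like on Linux. It is not how docker-slim is known to work; it only assumes that a tool might inspect `/proc/<pid>/maps`, and the function names `mapped_libraries` and `snapshot` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def mapped_libraries(maps_text: str) -> set[str]:
    """Return the shared-object paths listed in /proc/<pid>/maps content."""
    libs = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, device, inode, [path].
        fields = line.split(None, 5)
        # Only file-backed mappings have a sixth field; ".so" is a crude filter.
        if len(fields) == 6 and ".so" in fields[5]:
            libs.add(fields[5].strip())
    return libs


def snapshot(pid: int) -> set[str]:
    """Read the libraries one process has mapped right now (Linux only)."""
    return mapped_libraries(Path(f"/proc/{pid}/maps").read_text())
```

A library that is only loaded when an ML job starts would be absent from a snapshot taken before that job is opened, which is why the timing of the analysis matters.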

Yeah, I am keeping ML in mind. I'm just exploring options at the moment.

I also found GoogleContainerTools/distroless, which is interesting but probably unsuitable for us.

  • It's Debian-based
  • It's super minimal - you don't even get a shell or a package manager

Yeah, distroless has been raised for discussion before, but we need a Bash shell.

Have we considered using distroless and installing bash in it? It should still be lighter than a full OS. Or maybe even BusyBox would suffice? It uses ash, so unless we have something bash-specific we should be fine.

Well, you can run gcr.io/distroless/base-debian10:debug which adds busybox. Might be worth an experiment.

OK, I managed to hack the code enough to build an image, but it was pretty awful, and in that environment Java wouldn't start due to a missing libz.so.1. So I think we can rule that out for now.

I did try their Java version, but it seems unusable unless you have a fat jar ready to run.

The maintainer(s) of docker-slim are keen to make it work for us, so I'll see how that goes.

I did try hacking something up (a) with a custom Rust / inotify tool, and then (b) with a bunch of bash and the inotifywait tool, but I couldn't get the resulting tar of accessed files to load into an image. I suspect docker-slim is doing something more clever here.
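For readers who want to picture the second approach, the following is a rough sketch of turning a list of accessed paths into a tar archive. It is not the code used in this thread: the function name `archive_accessed` and its parameters are hypothetical, and it assumes the list of paths has already been collected by some watcher.

```python
import os
import tarfile


def archive_accessed(paths, out_tar, root="/"):
    """Write each existing regular file in `paths` to `out_tar`.

    Member names are stored relative to `root`. Returns the names added.
    """
    added = []
    with tarfile.open(out_tar, "w") as tar:
        for path in sorted(set(paths)):
            if not os.path.isfile(path):
                continue  # skip directories, vanished files, sockets, etc.
            arcname = os.path.relpath(path, root)
            # Note: a symlink is stored as a link, so its target must also
            # appear in `paths` for the link to resolve in the new image.
            tar.add(path, arcname=arcname)
            added.append(arcname)
    return added
```

A sketch like this glosses over details such as symlink targets, directory permissions, and files that are opened before the watcher starts, any of which could plausibly leave the resulting filesystem incomplete.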

I just realised that we also need to factor in the UBI Docker builds - I'm honestly not sure what the impact would be right now. Possibly little to none, since it largely just swapped the base image, but it's something to be aware of.

> I just realised that we also need to factor in the UBI Docker builds - I'm honestly not sure what the impact would be right now. Possibly little to none, since it largely just swapped the base image, but it's something to be aware of.

I think that we can ignore that for now. We have removed those images from our builds exactly to avoid it having an impact on maintenance/being a constraint while the needs for it sit in an unclear state.

I happened to find this issue while debugging https://github.com/openzipkin/zipkin/pull/3044

Just in case it helps, Zipkin has a stripped-down base image built on distroless that we run elasticsearch (and others like zipkin itself, cassandra, and kafka) on:

https://github.com/openzipkin/docker-jre-full/blob/master/Dockerfile
https://github.com/openzipkin/zipkin/blob/master/docker/storage/elasticsearch7/Dockerfile

The key point is using jlink to create a JRE bundle - since elasticsearch supports plugins it would probably want to link in all modules instead of our opinionated whitelist, but otherwise the pattern for extracting dependencies like libz / ca-certificates out of the distroless-java image would probably apply similarly.
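As a rough illustration of the jlink step, the sketch below assembles a jlink invocation that links every module on the module path rather than a hand-picked list. It is not taken from the Zipkin Dockerfiles; the helper names `jlink_command` and `build_runtime` and the directory layout (`bin/jlink` and `jmods` under the JDK home) are assumptions for the example.

```python
import subprocess


def jlink_command(jdk_home, output_dir, modules="ALL-MODULE-PATH"):
    """Build the argv for a jlink run that produces a trimmed runtime image.

    `modules` defaults to ALL-MODULE-PATH so that plugins can rely on any
    JDK module; pass a comma-separated list to link a smaller set instead.
    """
    return [
        f"{jdk_home}/bin/jlink",
        "--module-path", f"{jdk_home}/jmods",
        "--add-modules", modules,
        "--strip-debug",
        "--no-header-files",
        "--no-man-pages",
        "--output", output_dir,
    ]


def build_runtime(jdk_home, output_dir):
    """Run jlink; raises CalledProcessError if jlink exits non-zero."""
    subprocess.run(jlink_command(jdk_home, output_dir), check=True)
```

Linking all modules gives up most of the size benefit of jlink, so the main gain in that configuration would be dropping debug symbols, headers, and man pages.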

I notice #52519 taking an approach using centos with busybox. Unfortunately, #49612 has caused the startup scripts to fail to parse on busybox - it'd be nice if busybox became officially supported by the startup scripts, but I'm kind of guessing that may not happen due to the tradeoff vs development speed.

Anyways, just wanted to share in case it's helpful - good luck on slimming down the image!

We've decided to proceed with creating a custom base image. The final image size is pretty similar, and we feel that it gives us the most control over how the image is built. I'll keep updating this issue as this work progresses.

After discussions with Docker, I'm wondering again about swapping the base image to something more "typical", specifically Debian. Debian have started publishing their own "slim" images, and are typically better at publishing security updates quickly (according to Docker, anyway).

Regarding size and glibc, debian:buster-slim is 69.2MB (although adding curl bumps it to 96.5MB) and ships with glibc 2.28-10, versus centos:7, which is 203MB with glibc 2.17-307.el7.1.

If we went down this path, we could still opt to construct a bespoke image at a later point?
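To put the sizes quoted above in relative terms, here is a small calculation using only the figures from this comment. Image size is at best a rough proxy for surface area, so treat the percentages as context rather than as a measure of the goal of this issue; the helper name `percent_smaller` is just for illustration.

```python
def percent_smaller(candidate_mb, baseline_mb):
    """Percentage by which the candidate image is smaller than the baseline."""
    return round((1 - candidate_mb / baseline_mb) * 100, 1)


CENTOS_7_MB = 203.0          # centos:7
BUSTER_SLIM_MB = 69.2        # debian:buster-slim
BUSTER_SLIM_CURL_MB = 96.5   # debian:buster-slim with curl added

print(percent_smaller(BUSTER_SLIM_MB, CENTOS_7_MB))       # 65.9
print(percent_smaller(BUSTER_SLIM_CURL_MB, CENTOS_7_MB))  # 52.5
```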
