Rasa: weightless docker image

Created on 12 Jul 2019  ·  12 Comments  ·  Source: RasaHQ/rasa

Description of Problem: The rasa/rasa image is big (~1.3 GB). I'm trying to shrink it. Would you be interested if the size difference turns out to be significant?

Overview of the Solution: Build the requirements from scratch on an Alpine image and remove unnecessary runtime system dependencies.

Examples (if relevant):

Blockers (if relevant):

Definition of Done:

  • [ ] install scipy / openblas https://github.com/scipy/scipy/issues/9481#issuecomment-510540522
  • [ ] install tensorflow / bazel
  • [ ] clean up unnecessary runtime dependencies once built

    All 12 comments

    @tormath1 sure, that would be interesting to see. Not sure how many of the requirements you can actually get rid of. Is this something you want to test out yourself?

    Yes, totally. I would be glad to share the result with you!

    @tormath1 Thanks for your offer! Could you please also check https://github.com/RasaHQ/rasa/pull/3332 ? In this PR I'm trying to achieve the same :-) Let me know if you have suggestions!

    @tormath1 One more question:
    Building tensorflow from scratch in an alpine image is quite tedious. Don't you think the resulting image definition would be quite complex / hard to maintain?

    @wochinge, well, I guess it won't be harder than any other dependency. Actually, tf should release an Alpine wheel. :sweat_smile:
    Building dependencies from source could also yield better-optimized wheels, since we are compiling them ourselves with GCC. I'll let you know how it's going.
    Good work on your slim image. :+1:

    This is my first stage:

    FROM python:3.7.3-alpine AS builder
    
    RUN apk add --update \
            musl-dev \
            libffi-dev \
            freetype-dev \
            alpine-sdk \
            gfortran \
            perl \
            openjdk8 \
            zip \
            bash \
            git
    
    # install openblas: numpy requirements
    RUN wget https://github.com/xianyi/OpenBLAS/archive/v0.3.6.tar.gz \
            && tar -xf v0.3.6.tar.gz \
            && cd OpenBLAS-0.3.6/ \
            && make BINARY=64 FC=$(which gfortran) USE_THREAD=1 \
            && make PREFIX=/usr/lib/openblas install
    
    # install bazel: tensorflow requirements
    RUN wget https://github.com/bazelbuild/bazel/releases/download/0.28.0/bazel-0.28.0-dist.zip \
            && unzip bazel-0.28.0-dist.zip -d bazel \
            && cd bazel \
            && JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh \
            && mv output/bazel /usr/local/bin/
    
    # patch numpy to use openblas
    RUN pip download numpy==1.16.4 \
            && unzip numpy-1.16.4.zip \
            && cd numpy-1.16.4 \
            && echo -e '[DEFAULT]\n\
    library_dirs = /usr/lib/openblas/lib\n\
    include_dirs = /usr/lib/openblas/include\n\n\
    [atlas]\n\
    atlas_libs = openblas\n\
    libraries = openblas\n\n\
    [openblas]\n\
    libraries = openblas\n\
    library_dirs = /usr/lib/openblas/lib\n\
    include_dirs = /usr/lib/openblas/include' \
            >> site.cfg \
            && python setup.py build --fcompiler=gnu95 \
            && python setup.py install
    
    # install tensorflow
    RUN git clone https://github.com/tensorflow/tensorflow.git \
            && cd tensorflow \
            && git checkout v1.13.1
    # ./configure steps and install are missing
    
    # fetch runtime dependencies
    # /venv is assumed to hold the built Python packages (it is not created above)
    RUN scanelf --needed --nobanner --recursive /venv \
            | awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
            | sort -u \
            | xargs -r apk info --installed \
            | sort -u \
            >> runtime.txt
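
    To make the awk step of that last RUN easier to follow, here is a minimal sketch of the transform on its own. The sample lines and library names are made up for illustration, and the helper name to_so_names is hypothetical. It assumes scanelf prints the comma-separated NEEDED list as the second column, and it emits so:<name> with no space after the colon, so that xargs hands each entry to apk as a single argument.

```shell
#!/bin/sh
# Turn the second column (comma-separated shared-library names) into
# one "so:<name>" entry per line, then de-duplicate.
to_so_names() {
    awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' | sort -u
}

# Hypothetical stand-in for `scanelf --needed --nobanner --recursive` output:
# <ELF type> <needed libraries> <file>
printf '%s\n' \
    'ET_DYN libopenblas.so.0,libc.musl-x86_64.so.1 /venv/lib/a.so' \
    'ET_DYN libgfortran.so.5,libc.musl-x86_64.so.1 /venv/lib/b.so' \
    | to_so_names
```

    Each resulting so: name is what `apk info --installed` is then asked to resolve to the package that provides it.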
    


    Do you have some estimates regarding the image size?

    Hi guys, I have already built an Alpine version, and the estimated size is a bit under 2 GB decompressed, as opposed to 2.7 GB for rasa/rasa:latest-full. That is not a huge reduction, and definitely not worth the build time if it's just for space optimization: it takes about an hour to build on CI/CD runners. However, there are other arguments in favor of Alpine-based images, particularly for enterprises. In our case, our private registry's vulnerability scan complains about anything but Alpine and from-scratch images. Too paranoid? Maybe.
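
    For a rough sense of scale, the two sizes quoted above work out to about a quarter saved. This is only a back-of-the-envelope sketch: it treats "a bit under 2G" as exactly 2.0, which is an assumption, and reduction_pct is a hypothetical helper name.

```shell
#!/bin/sh
# Percentage saved when going from a `full` size to a `slim` size (same unit).
reduction_pct() {
    awk -v full="$1" -v slim="$2" \
        'BEGIN { printf "%.0f\n", (full - slim) / full * 100 }'
}

# 2.7 GB (rasa/rasa:latest-full) versus roughly 2.0 GB (Alpine build)
reduction_pct 2.7 2.0
```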

    @kronos-cm that seems to be about the same space reduction I got in https://github.com/RasaHQ/rasa/pull/3332


    @kronos-cm: We are running into the same vulnerability scan issues, and most of them concern the Debian base OS. I am trying to build an Alpine version. It seems you were able to successfully build bot servers on Alpine. Could you please share your Dockerfile? Thank you.

    Hi @HariTeneti, yes, we were able to do it by compiling tf==1.14 from scratch with bazel, with rasa==1.2.3. Unfortunately, per my contract, I am not able to share any files publicly, though I would love to give you a hand if possible.

    This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

    This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.
