Redash: Improve/optimize current Dockerfile and split containers

Created on 8 Mar 2016 · 1 Comment · Source: getredash/redash

Is there any reason the Redash Docker container runs both the app and Celery?

Seems like it'd be better to break Celery out into its own container, especially since there's an official Docker Celery image: https://hub.docker.com/_/celery/

Tech Debt


jeffwidman

👍1

Most helpful comment

It's done this way right now because it was the shortest path to success. It should change, but the fix is separate containers rather than separate images: the workers and the web server can still share the same image and just run a different command at startup.

Also, the Dockerfile should be reorganized to improve caching. For the hosted version I'm using the following Dockerfile:

FROM ubuntu:trusty

EXPOSE 5000

RUN useradd --system --comment " " --create-home redash

# Ubuntu packages
RUN apt-get update && \
    apt-get install -y python-pip python-dev curl build-essential libffi-dev sudo wget \
    # Postgres client
    libpq-dev \
    # Additional packages required for data sources:
    libssl-dev libmysqlclient-dev freetds-dev && \
    # Cleanup
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN pip install -U setuptools

# Set the WORKDIR to /app so all following commands run in /app
WORKDIR /app

COPY requirements.txt requirements_dev.txt requirements_all_ds.txt ./
RUN pip install -r requirements.txt -r requirements_dev.txt -r requirements_all_ds.txt

# Adding the whole repository to the container
COPY . ./
RUN chown -R redash /app

ENTRYPOINT ["/app/bin/start"]

For example, COPYing the requirements files before copying the rest of the repository ensures that, as long as we don't change dependencies, the cached pip install layer is reused.
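To make the caching point a bit more concrete, here is a rough sketch that mimics the "did the copied files change?" check with a checksum. It is only an illustration and not how Docker computes its cache keys internally; deps_key is a made-up helper name.

```shell
#!/bin/bash
# Illustration only: approximate the cache check for the requirements COPY step.
# deps_key is a hypothetical helper, not part of Docker or Redash.
deps_key() {
  # Hash the contents of every requirements*.txt file in directory $1.
  cat "$1"/requirements*.txt | sha256sum | cut -d' ' -f1
}

# Print the key for the current directory when a requirements file is present.
if [ -f requirements.txt ]; then
  echo "deps key: $(deps_key .)"
fi
```

Editing application code leaves the key unchanged, so the layers up to and including the pip install can be reused. Editing any requirements file changes it, and everything from that COPY onward is rebuilt.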

It's also simpler than the existing one because it doesn't install the frontend (Node.js) dependencies. That step should probably be kept in the repository's Dockerfile, though, for people who want to build locally.

The start script is something like the following:

#!/bin/bash
set -e

get_config() {
  ENV_NAME=${ENV_NAME:-production}

  if [ "$ENV_NAME" = "production" ]
  then
    # redacted...
    : # no-op so the branch stays valid shell while the body is redacted
  fi
}

worker() {
  WORKERS_COUNT=${WORKERS_COUNT:-2}
  QUEUES=${QUEUES:-queries,scheduled_queries,celery}

  echo "Starting $WORKERS_COUNT workers for queues: $QUEUES..."
  exec sudo -E -u redash /usr/local/bin/celery worker --app=redash.worker -c$WORKERS_COUNT -Q$QUEUES -linfo --maxtasksperchild=10 -Ofair
}

scheduler() {
  WORKERS_COUNT=${WORKERS_COUNT:-1}
  QUEUES=${QUEUES:-celery}

  echo "Starting scheduler and $WORKERS_COUNT workers for queues: $QUEUES..."

  exec sudo -E -u redash /usr/local/bin/celery worker --app=redash.worker --beat -c$WORKERS_COUNT -Q$QUEUES -linfo --maxtasksperchild=10 -Ofair
}

api() {
  exec sudo -E -u redash /usr/local/bin/gunicorn -b 0.0.0.0:5000 -k gevent --name redash -w4 redash.wsgi:app
}


help() {
  echo "Usage: "
  echo "$(basename "$0") {worker, scheduler, api}"
}

case "$1" in
  worker)
    get_config
    shift
    worker
    ;;
  api)
    get_config
    shift
    api
    ;;
  scheduler)
    get_config
    shift
    scheduler
    ;;
  *)
    help
    ;;
esac

And then usage is something like: docker run redash/redash worker or docker run redash/redash api.
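Putting that together, a minimal sketch of running the three roles as separate containers from one image could look like the following. The run_role helper and the redash_<role> container names are made up for illustration, and the connection settings that the redacted get_config would supply are left out. It prints the commands by default instead of running them.

```shell
#!/bin/bash
# Sketch: start one container per role from the same image.
# run_role and the redash_<role> names are hypothetical, for illustration only.
IMAGE=${IMAGE:-redash/redash}   # image name taken from the usage line above
DOCKER=${DOCKER:-echo docker}   # dry run by default; set DOCKER=docker to really run

run_role() {
  local role=$1
  shift
  # Extra arguments (ports, env vars) go before the image name;
  # the role is passed as the command and ends up as the start script's argument.
  $DOCKER run -d --name "redash_$role" "$@" "$IMAGE" "$role"
}

run_role api -p 5000:5000
run_role worker -e WORKERS_COUNT=2 -e QUEUES=queries,scheduled_queries,celery
run_role scheduler
```

Since WORKERS_COUNT and QUEUES are read from the environment by the start script, each worker container can be tuned with -e without touching the image.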

arikfr on 8 Mar 2016

👍4
