Is there any reason the redash docker container runs both the app and celery?
Seems like it'd be better to break Celery out into its own container, especially since there's an official Docker Celery image: https://hub.docker.com/_/celery/
It's done this way right now because it was the shortest path to success. It should be changed, though not to separate images but to separate containers. The workers and web server can still share the same image and just run a different command at startup.
Also, the Dockerfile should be reorganized to improve caching. For the hosted version I'm using the following Dockerfile:
FROM ubuntu:trusty
EXPOSE 5000
RUN useradd --system --comment " " --create-home redash
# Ubuntu packages
RUN apt-get update && \
    apt-get install -y python-pip python-dev curl build-essential libffi-dev sudo wget \
    # Postgres client
    libpq-dev \
    # Additional packages required for data sources:
    libssl-dev libmysqlclient-dev freetds-dev && \
    # Cleanup
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN pip install -U setuptools
# Set the WORKDIR to /app so all following commands run in /app
WORKDIR /app
COPY requirements.txt requirements_dev.txt requirements_all_ds.txt ./
RUN pip install -r requirements.txt -r requirements_dev.txt -r requirements_all_ds.txt
# Adding the whole repository to the container
COPY . ./
RUN chown -R redash /app
ENTRYPOINT ["/app/bin/start"]
For example, COPYing the requirement files before copying the rest ensures that, as long as we don't change dependencies, we can reuse the cached layer.
This Dockerfile is also simpler than the existing one because it doesn't install the frontend (Node.js) dependencies. That step should probably be kept, though, for people who want to build locally.
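The caching point can be illustrated with a small conceptual model: the `pip install` layer only has to be rebuilt when the copied requirement files change, not when application code changes. The sketch below is not Docker's actual cache implementation; `deps_fingerprint` is a hypothetical helper, the package names are placeholders, and it assumes `sha256sum` is available.

```shell
#!/bin/bash
# Illustration only: fingerprint the files the early COPY instruction depends on.
# deps_fingerprint is a hypothetical helper, not part of Docker or Redash.
deps_fingerprint() {
    # $1: directory containing the three requirement files
    (cd "$1" && cat requirements.txt requirements_dev.txt requirements_all_ds.txt \
        | sha256sum | cut -d' ' -f1)
}

# Build a throwaway directory with placeholder requirement files.
demo_dir=$(mktemp -d)
printf 'flask\n'   > "$demo_dir/requirements.txt"
printf 'nose\n'    > "$demo_dir/requirements_dev.txt"
printf 'pymysql\n' > "$demo_dir/requirements_all_ds.txt"
before=$(deps_fingerprint "$demo_dir")

# Editing application code leaves the fingerprint unchanged ...
echo 'print("hello")' > "$demo_dir/app.py"
after_code_change=$(deps_fingerprint "$demo_dir")

# ... while editing a requirement file changes it.
echo 'requests' >> "$demo_dir/requirements.txt"
after_deps_change=$(deps_fingerprint "$demo_dir")

[ "$before" = "$after_code_change" ] && echo "code change: dependency layer reusable"
[ "$before" != "$after_deps_change" ] && echo "deps change: dependency layer rebuilt"
rm -rf "$demo_dir"
```

With the Dockerfile above, the same reasoning applies to the `COPY requirements.txt ...` and `RUN pip install ...` pair: only the later `COPY . ./` and `chown` steps need to rerun after a code-only change.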
The start script is something like the following:
#!/bin/bash
set -e

get_config() {
    ENV_NAME=${ENV_NAME:-production}
    if [ "$ENV_NAME" = "production" ]
    then
        # redacted...
        :  # no-op so the elided branch still parses
    fi
}

worker() {
    WORKERS_COUNT=${WORKERS_COUNT:-2}
    QUEUES=${QUEUES:-queries,scheduled_queries,celery}
    echo "Starting $WORKERS_COUNT workers for queues: $QUEUES..."
    exec sudo -E -u redash /usr/local/bin/celery worker --app=redash.worker -c"$WORKERS_COUNT" -Q"$QUEUES" -linfo --maxtasksperchild=10 -Ofair
}

scheduler() {
    WORKERS_COUNT=${WORKERS_COUNT:-1}
    QUEUES=${QUEUES:-celery}
    echo "Starting scheduler and $WORKERS_COUNT workers for queues: $QUEUES..."
    exec sudo -E -u redash /usr/local/bin/celery worker --app=redash.worker --beat -c"$WORKERS_COUNT" -Q"$QUEUES" -linfo --maxtasksperchild=10 -Ofair
}

api() {
    exec sudo -E -u redash /usr/local/bin/gunicorn -b 0.0.0.0:5000 -k gevent --name redash -w4 redash.wsgi:app
}

help() {
    echo "Usage: "
    echo "$(basename "$0") {worker, scheduler, api}"
}

# Dispatch on the first argument only, so extra arguments don't break the match.
case "$1" in
    worker)
        get_config
        shift
        worker
        ;;
    api)
        get_config
        shift
        api
        ;;
    scheduler)
        get_config
        shift
        scheduler
        ;;
    *)
        help
        ;;
esac
And then usage is something like: docker run redash/redash worker or docker run redash/redash api.
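A minimal sketch of what running the three roles from one image might look like, using the `WORKERS_COUNT` and `QUEUES` variables the start script reads. The container names, the port mapping, and the values passed are assumptions for illustration, and the connection settings hidden behind the redacted `get_config` are not shown. The helper prints the commands by default; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/bash
# Sketch: one image, three containers, each started with a different command.
IMAGE=${IMAGE:-redash/redash}

# Print the command by default; set DRY_RUN=0 to actually run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

start_all() {
    # Web server (gunicorn listens on port 5000 inside the container).
    run docker run -d --name redash_api -p 5000:5000 "$IMAGE" api
    # Query workers; count and queues are overridden via environment variables.
    run docker run -d --name redash_worker \
        -e WORKERS_COUNT=4 -e QUEUES=queries,scheduled_queries "$IMAGE" worker
    # Scheduler (Celery beat plus a worker for the default queue).
    run docker run -d --name redash_scheduler "$IMAGE" scheduler
}

start_all
```

Worker containers can be scaled out by starting more of them, whereas the scheduler embeds Celery beat, so presumably only one scheduler container should run at a time to avoid duplicate scheduled tasks.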