Gunicorn: Gunicorn + Flask + TensorFlow in Docker container doesn't work

Created on 3 Oct 2019 · 23 Comments · Source: benoitc/gunicorn

Hello

I have a TensorFlow 2.0 project that has a tiny Flask API in front of it so I can make requests to the model through HTTP calls, with data preprocessing already done in the API. I chose Gunicorn to run my Flask/TensorFlow application in a Docker container. Sadly, though, the worker process that Gunicorn creates hangs in the container until it's killed by Gunicorn. The server never comes up and I cannot make requests to it. Moreover, the same Gunicorn setup works flawlessly outside Docker, on my host machine.

Docker logs (it just hangs there and prints a timeout error after a long time):

[2019-10-03 18:03:05 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2019-10-03 18:03:05 +0000] [1] [INFO] Listening at: http://127.0.0.1:8000 (1)
[2019-10-03 18:03:05 +0000] [1] [INFO] Using worker: sync
[2019-10-03 18:03:05 +0000] [8] [INFO] Booting worker with pid: 8
2019-10-03 18:03:08.126584: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-03 18:03:08.130017: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3392000000 Hz
2019-10-03 18:03:08.130306: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55fbb23fb2d0 executing computations on platform Host. Devices:
2019-10-03 18:03:08.130365: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version

dockerfile:

FROM python

RUN pip install gunicorn

WORKDIR /usr/src/app

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD [ "gunicorn", "--chdir", "src", "api:app" ]

api.py:

from flask import Flask, request
import inference

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def predict():
    if request.method == 'GET':
        return 'POST a json payload of {"imageBase64": "base64base64base64"} to this address to predict.'
    try:
        result = inference.run(request.json['imageBase64'])
        return result
    except Exception as e:
        return {'error': str(e)}, 500

if __name__ == "__main__":
    app.run()
else:
    print('\n * Server ready!')

inference.py:

# Import packages
from __future__ import absolute_import, division, print_function, unicode_literals

import os
import tensorflow as tf
from tensorflow import keras
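import PIL.Image  # import the submodule explicitly; a bare "import PIL" does not guarantee PIL.Image is loaded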
import numpy as np
from io import BytesIO
import base64
import json

print("TensorFlow version is ", tf.__version__)

# Set variables
##########################################################################################
##########################################################################################

model_name = 'catsdogs'

base_dir = os.path.join(os.path.dirname(__file__), '..')
model_dir = os.path.join(base_dir, 'models')

##########################################################################################
##########################################################################################

# Load model
model = keras.models.load_model(os.path.join(model_dir, model_name + '.h5'))

# Load metadata
with open(os.path.join(model_dir, model_name + '_metadata.json')) as metadataFile:
    metadata = json.load(metadataFile)

# Split metadata
labels = metadata['training_labels']
image_size = metadata['image_size']

# Exported function for inference
def run(imgBase64):
    # Decode the base64 string
    image = PIL.Image.open(BytesIO(base64.b64decode(imgBase64)))

    # Prepare image
    image = image.resize((image_size, image_size), resample=PIL.Image.BILINEAR)
    image = image.convert("RGB")

    # Run prediction
    tensor = tf.cast(np.array(image), tf.float32) / 255.
    tensor = tf.expand_dims(tensor, 0, name=None)
    result = model.predict(tensor, steps=1)

    # Combine result with labels
    labeledResult = {}
    for i, label in enumerate(labels):
        labeledResult[label] = float(result[0][labels[label]])

    return labeledResult

I've searched for a solution to this for ages and haven't managed to come up with anything. Any help would be hugely appreciated.

Thanks!


All 23 comments

Is your Docker setup limiting the maximum memory available to the container?

Experiencing the same. I don't think Gunicorn is to blame though. I get the same error when running python3 api.py from a bash shell in the container.

@tlaanemaa can you confirm what @mackdelany says?

Hey. Sorry for disappearing like that.

My setup is limiting Docker's RAM a little bit but the same thing happened even when I removed the limitation.

I will try running the api file without gunicorn and report back.

Thanks!

@tlaanemaa any news about it?

@benoitc Heya
Sorry, I've been carried away with other stuff and haven't had time to go further with this.
I'll try to poke at this today and get back to you.

So I tried running the app without gunicorn in the container and that worked.
Below is the CMD bit of my Dockerfile

Works:

CMD [ "python", "src/api.py" ]

Logs:

2019-12-02 11:40:45.649503: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-02 11:40:45.653496: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-12-02 11:40:45.653999: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f969cf6a40 executing computations on platform Host. Devices:
2019-12-02 11:40:45.654045: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
TensorFlow version is  2.0.0
 * Serving Flask app "api" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Doesn't work:

CMD [ "gunicorn", "--chdir", "src", "api:app" ]

Logs:

[2019-12-02 11:39:22 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2019-12-02 11:39:22 +0000] [1] [INFO] Listening at: http://127.0.0.1:8000 (1)
[2019-12-02 11:39:22 +0000] [1] [INFO] Using worker: sync
[2019-12-02 11:39:22 +0000] [9] [INFO] Booting worker with pid: 9
2019-12-02 11:39:24.041188: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-02 11:39:24.046495: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-12-02 11:39:24.047129: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5623e18b5200 executing computations on platform Host. Devices:
2019-12-02 11:39:24.047183: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version

Also, I've made the repository open so you can poke around if you want.
Could be helpful

https://gitlab.com/tlaanemaa/image-classifier

Listening at: http://127.0.0.1:8000 (1)

Could the problem be that gunicorn is listening to localhost inside a container so it cannot be reached from the outside?

I don't think so because the Flask app was doing the same and that was working.
Also, the gunicorn version doesn't log the TensorFlow version, which suggests that the problem happens before that log line in the code. When running without gunicorn, with just Flask, it does log that.
TensorFlow version is 2.0.0

what does it say on debug level?

@tlaanemaa how is your Docker daemon networking configured? Per comment from @CaselIT it seems likely that your client isn't able to reach the Gunicorn port over the Docker network.

Can you try starting Gunicorn with the arg -b 0.0.0.0:8000?
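Some background on why the bind address matters: inside a container, 127.0.0.1 is the container's own loopback interface, so with Docker's default bridge networking a port published via `docker run -p` cannot reach a server bound there. The snippet below is only a rough sketch for checking a bind string; `binds_to_loopback` is a hypothetical helper written for illustration and is not part of Gunicorn.

```python
import ipaddress


def binds_to_loopback(bind: str) -> bool:
    """Return True if a HOST:PORT bind string targets a loopback address.

    Hypothetical helper for illustration only; it does not resolve hostnames.
    """
    # Split on the last colon so bracketed IPv6 literals like "[::1]:8000" work.
    host, _, _port = bind.rpartition(":")
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host.strip("[]")).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name); treated as non-loopback here.
        return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1:8000", "0.0.0.0:8000"):
        print(candidate, "loopback only:", binds_to_loopback(candidate))
```

A server bound to 0.0.0.0 listens on all of the container's interfaces, which is what a published port needs.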

I don't think the problem lies in the network because it seems, from the logs at least, that the server is not starting at all, since it never hits the log lines that come after the TensorFlow import.

Nevertheless, I tried your suggestion but it gives me an error:

CMD [ "gunicorn", "-b", "0.0.0.0:8000", "--chdir", "src", "api:app" ]

_Log_

usage: gunicorn [OPTIONS] [APP_MODULE]
gunicorn: error: unrecognized arguments: -d

If you want to try it yourself, the container image is available at registry.gitlab.com/tlaanemaa/image-classifier

@tlaanemaa can you repost your updated Dockerfile, image build command and container run command?

@javabrett Sure

_Dockerfile at the time of posting:_

FROM python:3.7

RUN pip install gunicorn

WORKDIR /usr/src/app

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8000

CMD [ "gunicorn", "-b", "0.0.0.0:8000", "--chdir", "src", "api:app" ]

what's the full log of docker, can you paste the command line it is finally using?

Unless it is doing something that can't be foregone during debug of this issue, can you run it without Portainer for now?

This works for me, Docker Desktop for Mac 2.1.0.5:

docker build -t tlaanemaa/image-classifier .
docker run -it --rm -p 8000:8000 tlaanemaa/image-classifier

Accepts POST requests.

Please run and post full output and result.

I tried it and it works now.
Could it have been the -b flag that fixed it?

Thanks a lot!

What's interesting now, though, is that when I do POST requests those are fast but GET requests are super slow. After a while of doing GET requests these get fast, but then POST gets super slow and the worker times out. Once it responds to that POST, POSTs are fast again and GETs are slow. It seems as if it can do one fast and it takes time for it to switch :D

These are the logs when GET is fast and POST is slow because the worker times out:

[2020-01-10 09:34:46 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:72)
[2020-01-10 09:34:46 +0000] [72] [INFO] Worker exiting (pid: 72)
[2020-01-10 09:34:47 +0000] [131] [INFO] Booting worker with pid: 131
TensorFlow version is  2.0.0
2020-01-10 09:34:48.946351: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-10 09:34:48.951124: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2020-01-10 09:34:48.951612: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56481dbabd80 executing computations on platform Host. Devices:
2020-01-10 09:34:48.951665: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version

 * Server ready!

Also, in some situations the * Server ready! log doesn't seem to come through in the Docker logs. That could have been misleading too. Not sure what's causing that, though.
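One possible cause, stated here as an assumption that this thread does not confirm: when stdout is not a terminal, as is usually the case for a container's log stream, Python block-buffers it, so a `print()` may not show up until the buffer fills or the process exits. A minimal sketch that flushes explicitly; `announce_ready` is a hypothetical name, not something from the project:

```python
import sys


def announce_ready(stream=None):
    """Print the readiness banner and flush it so it is not held in a buffer."""
    # Resolve the stream at call time so a redirected sys.stdout is respected.
    out = sys.stdout if stream is None else stream
    print("\n * Server ready!", file=out, flush=True)


if __name__ == "__main__":
    announce_ready()
```

Setting the environment variable PYTHONUNBUFFERED=1 in the container should have a similar effect without touching the code.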

The current server in your Docker container is configured with a single sync worker, which is trivial to make busy/blocking, so that is likely what you are seeing. Try adding some args like --workers=2 --threads=4 --worker-class=gthread.
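The same flags can also be kept in a Gunicorn configuration file, which is plain Python. The following is a sketch under assumptions: the file name `gunicorn.conf.py` and its location are chosen for illustration, and the values simply mirror the flags suggested above plus the bind address used earlier in the thread.

```python
# gunicorn.conf.py (file name and location are assumptions for this sketch)

# Listen on all interfaces so a published container port can reach the server.
bind = "0.0.0.0:8000"

# Two worker processes, each handling requests on a pool of four threads.
workers = 2
threads = 4
worker_class = "gthread"

# Seconds a worker may stay silent before the master kills and restarts it.
# 30 is Gunicorn's default; raising it may be worth trying if model loading
# or inference is slow, but that is a guess rather than a confirmed fix.
timeout = 30
```

It could then be passed with something like `gunicorn -c gunicorn.conf.py --chdir src api:app`. Note that each worker process imports the application separately, so the model is loaded once per worker.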

Thanks @javabrett
That fixed it!

Had the same issue. As far as I can guess from my own logs, it looks like tensorflow is using gevent, and you cannot use gevent at the same time in gunicorn. The --workers and --threads flags don't make any difference for me, but changing from --worker-class=gevent to --worker-class=gthread fixed the issue for me. Thanks @javabrett

Hi! As both the maintainer of gevent and a contributor to this project I can categorically state that gevent and gunicorn work well together. Various libraries may interfere, but that's not the fault of either gunicorn or gevent. Please open a new issue if that's not the case for you. Thanks!
