Hello
I have a TensorFlow 2.0 project with a tiny Flask API in front of it, so I can make requests to the model through HTTP calls, with data preprocessing already done in the API. I chose Gunicorn to run my Flask/TensorFlow application in a Docker container. Sadly, though, the worker process that Gunicorn creates hangs in the container until it is killed by Gunicorn. The server never comes up and I cannot make requests to it. Moreover, the same Gunicorn setup works flawlessly outside Docker, on my host machine.
Docker logs (it just hangs there and prints a timeout error after a long time):
[2019-10-03 18:03:05 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2019-10-03 18:03:05 +0000] [1] [INFO] Listening at: http://127.0.0.1:8000 (1)
[2019-10-03 18:03:05 +0000] [1] [INFO] Using worker: sync
[2019-10-03 18:03:05 +0000] [8] [INFO] Booting worker with pid: 8
2019-10-03 18:03:08.126584: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-03 18:03:08.130017: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3392000000 Hz
2019-10-03 18:03:08.130306: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55fbb23fb2d0 executing computations on platform Host. Devices:
2019-10-03 18:03:08.130365: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
Dockerfile:
FROM python
RUN pip install gunicorn
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD [ "gunicorn", "--chdir", "src", "api:app" ]
api.py:
from flask import Flask, request

import inference

app = Flask(__name__)


@app.route('/', methods=['GET', 'POST'])
def predict():
    if request.method == 'GET':
        return 'POST a json payload of {"imageBase64": "base64base64base64"} to this address to predict.'
    try:
        result = inference.run(request.json['imageBase64'])
        return result
    except Exception as e:
        return {'error': str(e)}, 500


if __name__ == "__main__":
    app.run()
else:
    print('\n * Server ready!')
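(For reference, a call to this endpoint might look like the sketch below; the host/port and the image file name are assumptions, not part of the project.)

# Minimal client sketch for the endpoint above.
# Assumes the server is reachable on localhost:8000 and that cat.jpg exists.
import base64

import requests

with open('cat.jpg', 'rb') as f:
    payload = {'imageBase64': base64.b64encode(f.read()).decode('ascii')}

response = requests.post('http://localhost:8000/', json=payload)
print(response.json())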
inference.py:
# Import packages
from __future__ import absolute_import, division, print_function, unicode_literals

import os
import tensorflow as tf
from tensorflow import keras
import PIL
import numpy as np
from io import BytesIO
import base64
import json

print("TensorFlow version is ", tf.__version__)

# Set variables
##########################################################################################
##########################################################################################
model_name = 'catsdogs'

base_dir = os.path.join(os.path.dirname(__file__), '..')
model_dir = os.path.join(base_dir, 'models')
##########################################################################################
##########################################################################################

# Load model
model = keras.models.load_model(os.path.join(model_dir, model_name + '.h5'))

# Load metadata
with open(os.path.join(model_dir, model_name + '_metadata.json')) as metadataFile:
    metadata = json.load(metadataFile)

# Split metadata
labels = metadata['training_labels']
image_size = metadata['image_size']


# Exported function for inference
def run(imgBase64):
    # Decode the base64 string
    image = PIL.Image.open(BytesIO(base64.b64decode(imgBase64)))

    # Prepare image
    image = image.resize((image_size, image_size), resample=PIL.Image.BILINEAR)
    image = image.convert("RGB")

    # Run prediction
    tensor = tf.cast(np.array(image), tf.float32) / 255.
    tensor = tf.expand_dims(tensor, 0, name=None)
    result = model.predict(tensor, steps=1)

    # Combine result with labels
    labeledResult = {}
    for i, label in enumerate(labels):
        labeledResult[label] = float(result[0][labels[label]])

    return labeledResult
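(Side note: the inference path can be smoke-tested in isolation, without Flask or Gunicorn in the loop, with a sketch like the one below; the test image name is an assumption.)

# Smoke-test inference.run without any HTTP layer.
# Assumes test.jpg exists next to this script.
import base64

import inference

with open('test.jpg', 'rb') as f:
    encoded = base64.b64encode(f.read()).decode('ascii')

print(inference.run(encoded))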
I've searched for a solution to this for ages and haven't managed to come up with anything; any help would be hugely appreciated.
Thanks!
Is your Docker setup limiting the maximum memory available to the container?
Experiencing the same. I don't think Gunicorn is to blame though. I get the same error when running python3 api.py from a bash shell in the container.
@tlaanemaa can you confirm what @mackdelany says?
Hey. Sorry for disappearing like that.
My setup is limiting Docker's RAM a little bit but the same thing happened even when I removed the limitation.
I will try running the api file without Gunicorn and report back.
Thanks!
@tlaanemaa any news about it?
@benoitc Heya
Sorry, I've been carried away with other stuff and haven't had time to go further with this.
I'll try to poke this today and get back to you
So I tried running the app without gunicorn in the container and that worked.
Below is the CMD bit of my Dockerfile
Works:
CMD [ "python", "src/api.py" ]
Logs:
2019-12-02 11:40:45.649503: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-02 11:40:45.653496: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-12-02 11:40:45.653999: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f969cf6a40 executing computations on platform Host. Devices:
2019-12-02 11:40:45.654045: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
TensorFlow version is 2.0.0
* Serving Flask app "api" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
Doesn't work:
CMD [ "gunicorn", "--chdir", "src", "api:app" ]
Logs:
[2019-12-02 11:39:22 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2019-12-02 11:39:22 +0000] [1] [INFO] Listening at: http://127.0.0.1:8000 (1)
[2019-12-02 11:39:22 +0000] [1] [INFO] Using worker: sync
[2019-12-02 11:39:22 +0000] [9] [INFO] Booting worker with pid: 9
2019-12-02 11:39:24.041188: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-02 11:39:24.046495: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-12-02 11:39:24.047129: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5623e18b5200 executing computations on platform Host. Devices:
2019-12-02 11:39:24.047183: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
Also, I've made the repository open so you can poke around if you want.
Could be helpful:
Listening at: http://127.0.0.1:8000 (1)
Could the problem be that gunicorn is listening to localhost inside a container so it cannot be reached from the outside?
I don't think so, because the Flask app was doing the same and that was working.
Also, the Gunicorn version doesn't log the TensorFlow version, which suggests that the problem happens before that log line in the code. When running without Gunicorn, just Flask, it logs that:
TensorFlow version is 2.0.0
What does it say at debug level?
@tlaanemaa how is your Docker daemon networking configured? Per comment from @CaselIT it seems likely that your client isn't able to reach the Gunicorn port over the Docker network.
Can you try starting Gunicorn with the arg -b 0.0.0.0:8000?
I don't think the problem lies in the network because, from the logs at least, it seems the server is not starting at all: it never hits the log lines that come after the TensorFlow import.
Nevertheless, I tried your suggestion, but it gives me an error.
CMD [ "gunicorn", "-b", "0.0.0.0:8000", "--chdir", "src", "api:app" ]
_Log_
usage: gunicorn [OPTIONS] [APP_MODULE]
gunicorn: error: unrecognized arguments: -d
If you want to try it yourself, the container image is available at registry.gitlab.com/tlaanemaa/image-classifier
@tlaanemaa can you repost your updated Dockerfile, image build command, and container run command?
@javabrett Sure
docker build -t tlaanemaa/image-classifier .
_Dockerfile at the time of posting:_
FROM python:3.7
RUN pip install gunicorn
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD [ "gunicorn", "-b", "0.0.0.0:8000", "--chdir", "src", "api:app" ]
What's the full Docker log? Can you paste the command line it is finally using?
Unless it is doing something that can't be foregone during debug of this issue, can you run it without Portainer for now?
This works for me, Docker Desktop for Mac 2.1.0.5:
docker build -t tlaanemaa/image-classifier .
docker run -it --rm -p 8000:8000 tlaanemaa/image-classifier
Accepts POST requests.
Please run and post full output and result.
I tried it and it works now.
Could it have been the -b flag that fixed it?
Thanks a lot!
What's interesting now, though, is that when I do POST requests, those are fast, but GET requests are super slow. After a while of doing GET requests, those get fast, but then POST gets super slow and the worker times out. Once it responds to that POST, POSTs are fast again and GETs are slow. It seems as if it can only do one fast at a time, and it takes time for it to switch :D
These are the logs when GET is fast and POST is slow because the worker times out:
[2020-01-10 09:34:46 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:72)
[2020-01-10 09:34:46 +0000] [72] [INFO] Worker exiting (pid: 72)
[2020-01-10 09:34:47 +0000] [131] [INFO] Booting worker with pid: 131
TensorFlow version is 2.0.0
2020-01-10 09:34:48.946351: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-10 09:34:48.951124: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2020-01-10 09:34:48.951612: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56481dbabd80 executing computations on platform Host. Devices:
2020-01-10 09:34:48.951665: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
* Server ready!
Also, in some situations the * Server ready! log doesn't seem to come through in docker logs. That could have been misleading too. Not sure what's causing that, though.
The current server in your Docker will be configured single/sync threaded, which will be trivial to make busy/blocking, so it is likely you are seeing that. Try adding some args like --workers=2 --threads=4 --worker-class=gthread.
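(For anyone who prefers a config file over CLI flags, the same settings can be expressed in a gunicorn.conf.py; this is a minimal sketch of the values suggested above, with counts that are illustrative rather than tuned.)

# gunicorn.conf.py — sketch of the settings suggested above.
# Gunicorn picks this file up from the working directory by default,
# or pass it explicitly with -c gunicorn.conf.py.
bind = '0.0.0.0:8000'     # reachable from outside the container
workers = 2               # illustrative count; tune per CPU
threads = 4               # per-worker threads; requires a threaded worker class
worker_class = 'gthread'  # threaded sync worker
timeout = 120             # assumption: extra headroom for slow model loading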
Thanks @javabrett
That fixed it!
Had the same issue. As far as I can guess from my own logs, it looks like tensorflow is using gevent, and you cannot use gevent at the same time in gunicorn. The --workers and --threads flags don't make any difference for me, but changing from --worker-class=gevent to --worker-class=gthread fixed the issue for me. Thanks @javabrett
Hi! As both the maintainer of gevent and a contributor to this project I can categorically state that gevent and gunicorn work well together. Various libraries may interfere, but that's not the fault of either gunicorn or gevent. Please open a new issue if that's not the case for you. Thanks!