I am trying to deploy a logistic regression model with SageMaker's scikit-learn estimator. When I train on 1/10 of the data, I can deploy without problems using the commands below. When I train on all the data, training succeeds and my model is around 800 MB, but the deployment fails with these errors:
In the Jupyter notebook:
ValueError: Error hosting endpoint sagemaker-scikit-learn-2019-01-17-12-59-16-371: Failed Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.
In the CloudWatch console:
2019/01/17 14:29:00 [error] 25#25: *47 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.32.0.2, server: , request: "GET /ping HTTP/1.1", upstream: "http://unix:/tmp/gunicorn.sock/ping", host: "model.aws.local:8080"
from sagemaker.sklearn.estimator import SKLearn

# role, sagemaker_session and data_location are defined earlier in the notebook.
script_path = 'sklearn_sentiment.py'
sklearn_preprocessor = SKLearn(
    entry_point=script_path,
    role=role,
    train_instance_type="ml.m4.4xlarge",
    sagemaker_session=sagemaker_session)
sklearn_preprocessor.fit({'train': data_location})
predictor = sklearn_preprocessor.deploy(initial_instance_count=1, instance_type="ml.c5.4xlarge")
Hello,
Sorry for the delay in response. Is the problem you are experiencing reproducible? Do you happen to have data/entry point script that we can use to reproduce the problem?
In my experience, timeout issues are usually related to out-of-memory (OOM) problems, but given the size of the model (800 MB) and the amount of memory on the chosen instance, it may be due to something else.
Hello,
I have a similar problem.
I tried different instance types, but it does not seem to be an OOM problem. I suspect it is related to the loading time of the model.
The important question (I was not able to find an answer yet):
Is it possible to load models "asynchronously" after passing the health check but before invoking the endpoint? I'm having problems loading large models into memory fast enough. This seems to be an absurd limitation.
Maybe I got something wrong but I was not able to find any information related to this.
Any thoughts on that?
Thanks a lot :)
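For illustration, here is a generic sketch of what "asynchronous" loading could look like in plain Python: the load starts in a background thread and callers block only when they first need the model. This is not a confirmed SageMaker feature. Whether the scikit-learn serving container answers /ping before model_fn returns is not established anywhere in this thread, and LazyModel and slow_loader are hypothetical names invented for the example.

```python
import threading
import time


class LazyModel:
    """Start loading in a background thread; block only on first use."""

    def __init__(self, loader):
        self._loader = loader
        self._model = None
        self._error = None
        self._ready = threading.Event()
        threading.Thread(target=self._load, daemon=True).start()

    def _load(self):
        try:
            self._model = self._loader()
        except Exception as exc:  # re-raised to the caller in get()
            self._error = exc
        finally:
            self._ready.set()

    def is_ready(self):
        """Return True once loading has finished (successfully or not)."""
        return self._ready.is_set()

    def get(self, timeout=None):
        """Wait for the load to finish and return the model."""
        if not self._ready.wait(timeout):
            raise TimeoutError("model is still loading")
        if self._error is not None:
            raise self._error
        return self._model


def slow_loader():
    # Hypothetical stand-in for joblib.load(...) on a large file.
    time.sleep(0.2)
    return {"name": "dummy-model"}
```

Under these assumptions, model_fn could return LazyModel(lambda: joblib.load(path)) immediately, and the prediction code would call .get() before using the model. The first request would then pay the loading cost instead of the health check.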
Hi,
I got the same issue with a smaller model of ~288 MB when trying to deploy it on an ml.m4.2xlarge. Note that my entry_point also downloads some nltk resources (stopwords and wordnet) at "the root" of the .py file. But when I deployed it on an ml.m5.12xlarge, it worked.
According to the logs (from a deployment that fails with the timeout), the failure happens while loading the model. Note that import, hyperameters, download nltk resources, loading model... are log messages I added:
...
2020-08-11 16:26:05.458,"2020-08-11 16:26:04,910 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)"
2020-08-11 16:26:05.459,import
2020-08-11 16:26:06.485,hyperameters
2020-08-11 16:26:06.485,download nltk resources
2020-08-11 16:26:09.581,create ntlk instances
2020-08-11 16:26:09.581,loading model...
2020-08-11 16:26:10.585,"2020/08/11 16:26:09 [error] 22#22: *193 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.32.0.2, server: , request: ""GET /ping HTTP/1.1"", upstream: ""http://unix:/tmp/gunicorn.sock/ping"", host: ""model.aws.local:8080"""
2020-08-11 16:26:10.585,"10.32.0.2 - - [11/Aug/2020:16:26:09 +0000] ""GET /ping HTTP/1.1"" 504 192 ""-"" ""AHC/2.0"""
2020-08-11 16:26:15.609,"2020/08/11 16:26:14 [error] 22#22: *195 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.32.0.2, server: , request: ""GET /ping HTTP/1.1"", upstream: ""http://unix:/tmp/gunicorn.sock/ping"", host: ""model.aws.local:8080"""
2020-08-11 16:26:15.609,"10.32.0.2 - - [11/Aug/2020:16:26:14 +0000] ""GET /ping HTTP/1.1"" 504 192 ""-"" ""AHC/2.0"""
2020-08-11 16:26:20.621,"2020/08/11 16:26:19 [error] 22#22: *197 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.32.0.2, server: , request: ""GET /ping HTTP/1.1"", upstream: ""http://unix:/tmp/gunicorn.sock/ping"", host: ""model.aws.local:8080"""
2020-08-11 16:26:20.621,"10.32.0.2 - - [11/Aug/2020:16:26:19 +0000] ""GET /ping HTTP/1.1"" 504 192 ""-"" ""AHC/2.0"""
2020-08-11 16:26:25.641,"2020/08/11 16:26:24 [error] 22#22: *199 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.32.0.2, server: , request: ""GET /ping HTTP/1.1"", upstream: ""http://unix:/tmp/gunicorn.sock/ping"", host: ""model.aws.local:8080"""
2020-08-11 16:26:25.641,"10.32.0.2 - - [11/Aug/2020:16:26:24 +0000] ""GET /ping HTTP/1.1"" 504 192 ""-"" ""AHC/2.0"""
2020-08-11 16:26:30.701,"2020/08/11 16:26:29 [error] 22#22: *201 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.32.0.2, server: , request: ""GET /ping HTTP/1.1"", upstream: ""http://unix:/tmp/gunicorn.sock/ping"", host: ""model.aws.local:8080"""
2020-08-11 16:26:30.701,"10.32.0.2 - - [11/Aug/2020:16:26:29 +0000] ""GET /ping HTTP/1.1"" 504 192 ""-"" ""AHC/2.0"""
2020-08-11 16:26:31.857,[2020-08-11 16:26:31 +0000] [50] [CRITICAL] WORKER TIMEOUT (pid:688)
2020-08-11 16:26:32.861,"2020-08-11 16:26:31,865 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)"
2020-08-11 16:26:32.861,loading model...
2020-08-11 16:26:32.861,[2020-08-11 16:26:32 +0000] [765] [INFO] Booting worker with pid: 765
2020-08-11 16:26:33.865,"2020-08-11 16:26:33,112 INFO - sagemaker-containers - No GPUs detected (normal if no gpus installed)"
2020-08-11 16:26:33.865,import
2020-08-11 16:26:34.869,hyperameters
2020-08-11 16:26:34.869,download nltk resources
....
Note that the log message loading model... appears, but ...model loaded is never reached:
import os

import joblib


def model_fn(model_dir):
    """Load the model for deployment.

    model_dir: (string) location of the saved model
    """
    print("loading model...", model_dir)
    model = joblib.load(os.path.join(model_dir, "model.joblib"))
    print("...model loaded")
    return model
The fact that it works on an ml.m5.12xlarge but not on a smaller instance makes me think that there is a "race condition": the health check starts, but the health check endpoint is not ready yet because the container is still loading the model.
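One way to test the loading-time hypothesis is to log how long the load actually takes on each instance type. The helper below is a minimal sketch using only the standard library; the name timed is made up for this example and is not part of SageMaker or joblib.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=print):
    """Log when the wrapped block starts and how long it took."""
    start = time.perf_counter()
    log(f"{label}: started")
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions raised inside the block.
        elapsed = time.perf_counter() - start
        log(f"{label}: finished after {elapsed:.2f}s")


# Possible use inside model_fn (assumes os and joblib are imported):
# with timed("loading model"):
#     model = joblib.load(os.path.join(model_dir, "model.joblib"))
```

If the worker is killed before the load completes, the "finished" line would presumably never appear, which matches the symptom in the logs above. Comparing the elapsed time on an instance where deployment succeeds against the interval between "loading model..." and WORKER TIMEOUT on one where it fails should show whether loading time is the cause.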
I hope this helps.
=====
Additional information: I realized that .deploy() accepts model_server_workers, which defaults to the number of vCPUs (the number of gunicorn workers, I assume). I was able to deploy my model (size ~280 MB) using model_server_workers=2 on an ml.m5.4xlarge.
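A rough way to pick a worker count is sketched below. It assumes each worker loads its own full copy of the model, which the repeated import and loading model... lines in the logs suggest but do not prove. The function name, the reserve_gib headroom and all numbers in the example are hypothetical; the in-memory footprint of a model can be much larger than its file on disk, so per_worker_gib should be measured rather than guessed.

```python
import math


def max_workers_for_memory(instance_mem_gib, per_worker_gib, vcpus, reserve_gib=2.0):
    """Largest worker count whose combined footprint fits in memory.

    Assumes every worker holds its own full copy of the model.
    The result is capped at the vCPU count and is never below 1.
    """
    if per_worker_gib <= 0:
        raise ValueError("per_worker_gib must be positive")
    usable = max(instance_mem_gib - reserve_gib, 0.0)
    fit = math.floor(usable / per_worker_gib)
    return max(1, min(vcpus, fit))


# Illustrative numbers only: 64 GiB instance, 16 vCPUs, 20 GiB per worker.
workers = max_workers_for_memory(64, 20, 16)

# The value could then be passed as reported above, e.g.:
# predictor = estimator.deploy(initial_instance_count=1,
#                              instance_type="ml.m5.4xlarge",
#                              model_server_workers=workers)
```

Memory is only one possible bottleneck. Many workers loading the same file at once may also compete for CPU and disk, so a lower value than this estimate can still be the safer starting point.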