Sagemaker-python-sdk: Increasing the timeout for InvokeEndpoint

Created on 11 Nov 2019 · 4 Comments · Source: aws/sagemaker-python-sdk

The current timeout for InvokeEndpoint is 60 seconds as specified here: https://docs.aws.amazon.com/en_pv/sagemaker/latest/dg/API_runtime_InvokeEndpoint.html

Is there any way to increase this limit to, say, 120 seconds?

*EDIT

Just to be clear, I was able to keep the process on the server running by passing an environment variable in the Model definition, like so:

 model = MXNetModel(..., env={'SAGEMAKER_MODEL_SERVER_TIMEOUT': '300'})

Through CloudWatch, I was able to confirm that the task keeps running after 60 seconds. (For my use case, I am processing a video frame by frame.) My question, however, is about the client side, where I receive this error because of the timeout:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/nightingale-pose-estimation in account 552571371228 for more information.: ModelError
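
On the client, a failure like the one above normally surfaces as a botocore `ClientError` with the error code `ModelError`. The following is a minimal sketch of telling this timeout apart from other model errors. It assumes the exception carries the usual `response['Error']` dictionary; the helper name `is_invocation_timeout` is hypothetical, and the matched substring is taken from the error text above rather than from a documented contract, so treat it as a heuristic.

```python
def is_invocation_timeout(exc):
    """Return True if exc looks like the InvokeEndpoint timeout shown above.

    Assumes exc exposes a botocore-style `response` dict. Exceptions
    without one are treated as "not a timeout".
    """
    error = (getattr(exc, "response", None) or {}).get("Error", {})
    code = error.get("Code", "")
    message = error.get("Message", "")
    # Heuristic: the substring comes from the error message in this thread
    # and may change between service versions.
    return code == "ModelError" and "timed out" in message
```

This would typically be called inside an `except botocore.exceptions.ClientError as exc:` block around the `invoke_endpoint` call, for example to decide whether to report the timeout or re-raise the error.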

feature request

velociraptor111

👍7

All 4 comments

SageMaker model hosting engineer here. Thanks for your interest in our product!

With respect to your question: currently, it is not possible to increase the 60-second timeout.

wenzhaoAtAws on 11 Nov 2019

@wenzhaoAtAws Are there any plans in the future to allow customers to increase inference timeout?

Also, I noticed something in the response object of this call:

response = sagemaker_client.invoke_endpoint(EndpointName='pose-estimation', Body=request_body)
print(response)

This is the output:

{'ResponseMetadata': {'RequestId': 'a0343e2a-5390-4af0-a7fd-ef63d576ca45', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'a0343e2a-5390-4af0-a7fd-ef63d576ca45', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Tue, 12 Nov 2019 01:30:32 GMT', 'content-type': 'application/json', 'content-length': '18'}, 'RetryAttempts': 2}, 'ContentType': 'application/json', 'InvokedProductionVariant': 'AllTraffic', 'Body': <botocore.response.StreamingBody object at 0x10c60e358>}

Is it possible to set RetryAttempts to 0?
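
The retry count in the output above lives in botocore's response metadata. Here is a small sketch of reading it programmatically, which can help confirm whether a retry setting took effect. The helper name `retry_attempts` is hypothetical, and the example dictionary is a trimmed copy of the printed response.

```python
def retry_attempts(response):
    """Return the number of retries recorded in a boto3 response dict.

    Falls back to 0 when the metadata is missing.
    """
    return response.get("ResponseMetadata", {}).get("RetryAttempts", 0)


# Trimmed copy of the response printed above.
example_response = {
    "ResponseMetadata": {"HTTPStatusCode": 200, "RetryAttempts": 2},
    "ContentType": "application/json",
    "InvokedProductionVariant": "AllTraffic",
}

print(retry_attempts(example_response))  # prints 2
```

As far as I understand, the value counts retries after the initial attempt, so 2 would mean the request was sent three times in total.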

velociraptor111 on 12 Nov 2019

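To disable retries, you should be able to do something like the following (please note that I haven't tested this code myself; it serves as a reference):

import boto3
from botocore.config import Config
from sagemaker.session import Session

# Client-side settings: wait up to 80 seconds for a response and never retry.
config = Config(
    read_timeout=80,
    retries={
        'max_attempts': 0
    }
)
# Runtime client used for InvokeEndpoint calls, built with the config above.
sagemaker_runtime_client = boto3.client('sagemaker-runtime', config=config)
# Pass the client to a SageMaker Session so that SDK calls reuse these settings.
sagemaker_session = Session(sagemaker_runtime_client=sagemaker_runtime_client)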

See:

Regarding the feature request, one option is to use SageMaker's batch transform (https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html). It may not fit your use case, though.

sanjams2 on 12 Dec 2019

👍2

@ajaykarpur Any update on this? Switching to batch transform doesn't seem to be a practical option for video input.