The current timeout for InvokeEndpoint is 60 seconds as specified here: https://docs.aws.amazon.com/en_pv/sagemaker/latest/dg/API_runtime_InvokeEndpoint.html
Is there any way we can increase this limit, to say 120 seconds?
*EDIT
Just to be clear, I was able to keep the process on the server running by passing an environment variable in the Model definition, like so:
model = MXNetModel(..., env = {'SAGEMAKER_MODEL_SERVER_TIMEOUT' : '300' })
Through CloudWatch, I was able to confirm that the task keeps running even after 60 seconds. (For my use case, I am processing a video frame by frame.) My question, however, is about the client side, where I receive this error because of the timeout:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/nightingale-pose-estimation in account 552571371228 for more information.: ModelError
Sagemaker model hosting engineer here. Thanks for your interest in our product!
With respect to your question: currently, it is not possible to increase the 60-second timeout.
@wenzhaoAtAws Are there any plans in the future to allow customers to increase inference timeout?
Also, I noticed the following in the response object of this call:
response = sagemaker_client.invoke_endpoint(EndpointName='pose-estimation', Body=request_body)
print(response)
This is the output:
{'ResponseMetadata': {'RequestId': 'a0343e2a-5390-4af0-a7fd-ef63d576ca45', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'a0343e2a-5390-4af0-a7fd-ef63d576ca45', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Tue, 12 Nov 2019 01:30:32 GMT', 'content-type': 'application/json', 'content-length': '18'}, 'RetryAttempts': 2}, 'ContentType': 'application/json', 'InvokedProductionVariant': 'AllTraffic', 'Body': <botocore.response.StreamingBody object at 0x10c60e358>}
Is it possible for me to set RetryAttempts to 0?
To disable retries, you should be able to do something like the following (please note I haven't tested this code myself; it serves as a reference):
import boto3
from botocore.config import Config
from sagemaker.session import Session

config = Config(
    read_timeout=80,
    retries={
        'max_attempts': 0
    }
)
sagemaker_runtime_client = boto3.client('sagemaker-runtime', config=config)
sagemaker_client = Session(sagemaker_runtime_client=sagemaker_runtime_client)
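To check whether the setting took effect, a small helper along these lines could read RetryAttempts back out of the response metadata shown earlier in this thread. This is only a sketch and has not been run against a live endpoint: `invoke_and_report` is a hypothetical name, and `client` is assumed to be any object that exposes boto3's `invoke_endpoint(EndpointName=..., Body=...)` call and returns a response shaped like the one printed above.

```python
def invoke_and_report(client, endpoint_name, request_body):
    """Call the endpoint once and return (payload_bytes, retry_attempts).

    Hypothetical helper: `client` is assumed to behave like a boto3
    'sagemaker-runtime' client.
    """
    response = client.invoke_endpoint(EndpointName=endpoint_name, Body=request_body)
    # 'Body' is a streaming object; read() returns the raw response bytes.
    payload = response['Body'].read()
    # Number of retries recorded for this call in the response metadata.
    retry_attempts = response['ResponseMetadata'].get('RetryAttempts', 0)
    return payload, retry_attempts
```

Passing the runtime client created above as `client` would be the intended use; if retries are really disabled, the second return value should come back as 0.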
See:
With regard to the feature request, one option is to use SageMaker's batch transform feature (https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html). It may not fit your use case, though...
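If batch transform is not a fit, another possibility that stays under the 60-second limit is to split the video into smaller groups of frames and call the endpoint once per group. The following is a minimal sketch under assumptions, not something taken from the SageMaker SDK: `chunk_frames` and `process_video` are hypothetical names, `invoke` is a hypothetical callable that sends one group of frames to the endpoint and returns one result per frame, and the default of 30 frames per request is an arbitrary placeholder that would need tuning so each call finishes in time.

```python
def chunk_frames(frames, frames_per_request):
    """Yield consecutive slices of `frames`, each at most `frames_per_request` long."""
    if frames_per_request < 1:
        raise ValueError("frames_per_request must be at least 1")
    for start in range(0, len(frames), frames_per_request):
        yield frames[start:start + frames_per_request]


def process_video(frames, invoke, frames_per_request=30):
    """Send the frames in several short requests and collect results in order.

    `invoke` is a hypothetical callable: it takes a list of frames, sends them
    to the endpoint in one request, and returns a list with one result per frame.
    """
    results = []
    for batch in chunk_frames(frames, frames_per_request):
        results.extend(invoke(batch))
    return results
```

This only helps if the model can handle frames independently, or if the container is written to accept partial input; whether that holds for pose estimation on your video is something only you can judge.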
@ajaykarpur Any update on this? Switching to batch transforms doesn't seem to be interesting for video input.