aws-sdk-net: AmazonLambdaClient throws "A task was canceled."

Created on 22 Jan 2018 · 14 Comments · Source: aws/aws-sdk-net

I am using AmazonLambdaClient from AWSSDK.Lambda 3.3.10 inside a .NET Lambda function, as follows:

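using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Amazon;
using Amazon.Lambda;
using Amazon.Lambda.Core;
using Amazon.Lambda.KinesisEvents;
using Amazon.Lambda.Model;

public class MyHandler
{
    private readonly AmazonLambdaClient _lambdaClient;
    public MyHandler()
    {
        // AmazonLambdaClient takes a RegionEndpoint, not a region-name string
        _lambdaClient = new AmazonLambdaClient(RegionEndpoint.USWest2);
    }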

    public Task HandleAsync(KinesisEvent kinesisEvent, ILambdaContext context)
    {
        return Task.WhenAll(kinesisEvent.Records.Select(r => ProcessStreamAsync(r.Kinesis.Data)));
    }


    private async Task ProcessStreamAsync(Stream stream)
    {
        //deserialize payload
        var envelope = JsonUtil.DeserializeFromStream<Envelope>(stream);

        try
        {
            switch (envelope.Name)
            {
                case MessageNames.MyFunctionName:
                    await _lambdaClient.InvokeAsync(new InvokeRequest {FunctionName = "MyFunctionName", Payload = envelope.JsonData});
                    break;
            }
        }
        catch (Exception ex)
        {
            //log exception
        }
    }
}

In the catch block above I log every exception, and occasionally a batch of "A task was canceled." exceptions comes through. By a batch I mean 30 to 70 occurrences in my logs within a 5-minute period; then the error disappears for several hours or days. The stack trace is below. CloudWatch shows no throttles or timeouts when this happens. From reading the documentation, I assumed I was doing a "fire and forget" invocation of my Lambda function, so the calling function would not wait for a response. This started around the first of the year; in 2017 I didn't see any of these errors, and I haven't changed how I invoke Lambda functions.

Not sure why my task is being canceled or what I should do about it. Any advice here would be appreciated.

StackTrace:   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Net.Http.HttpClient.<FinishSendAsync>d__58.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.HttpWebRequestMessage.<GetResponseAsync>d__20.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.HttpHandler`1.<InvokeAsync>d__9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.Unmarshaller.<InvokeAsync>d__3`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.ErrorHandler.<InvokeAsync>d__5`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CallbackHandler.<InvokeAsync>d__9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CredentialsRetriever.<InvokeAsync>d__7`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.RetryHandler.<InvokeAsync>d__10`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Amazon.Runtime.Internal.RetryHandler.<InvokeAsync>d__10`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CallbackHandler.<InvokeAsync>d__9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.CallbackHandler.<InvokeAsync>d__9`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.ErrorCallbackHandler.<InvokeAsync>d__5`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Amazon.Runtime.Internal.MetricsHandler.<InvokeAsync>d__1`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at MyNamespace.MyHandler.<ProcessStreamAsync>d__4.MoveNext()
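
The post assumes that InvokeAsync performs a "fire and forget" invocation. As far as I know, the Async suffix only marks the .NET task-based method; the Lambda invocation type is set separately through InvokeRequest.InvocationType, which defaults to RequestResponse (synchronous), and the code above does not set it. A minimal sketch of requesting an asynchronous (Event) invocation follows. EventInvoker, BuildEventRequest and InvokeEventAsync are hypothetical names, and nothing in this thread confirms that switching the invocation type removes the cancellations.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Amazon.Lambda;
using Amazon.Lambda.Model;

public static class EventInvoker
{
    // Hypothetical helper: builds a request that Lambda queues and runs asynchronously.
    // With InvocationType.Event the service responds once the event is queued,
    // instead of waiting for the target function to finish.
    public static InvokeRequest BuildEventRequest(string functionName, string jsonPayload)
    {
        return new InvokeRequest
        {
            FunctionName = functionName,
            InvocationType = InvocationType.Event,
            Payload = jsonPayload
        };
    }

    // Hypothetical helper: sends the event and reports whether Lambda accepted it.
    // Lambda answers an accepted Event invocation with HTTP 202.
    public static async Task<bool> InvokeEventAsync(
        IAmazonLambda client,
        string functionName,
        string jsonPayload,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        var response = await client.InvokeAsync(
            BuildEventRequest(functionName, jsonPayload), cancellationToken);
        return response.StatusCode == 202;
    }
}
```

In the handler above, the switch case would then call something like `EventInvoker.InvokeEventAsync(_lambdaClient, "MyFunctionName", envelope.JsonData)`.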

This seems to be related to https://github.com/dotnet/corefx/issues/25800 and https://github.com/aws/aws-sdk-net/issues/796. We are still investigating this class of problems, which occurs in Linux environments.

Thanks for the update. Instead of "A task was canceled." I also sometimes get the error below. Could this be related?

"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details."

Is there anything I can do on my end to fix this? Should I just retry the task?
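
The thread never settles whether retrying is the right fix, but if you do retry, it helps to retry only when the cancellation did not come from your own token. A minimal, SDK-agnostic sketch with exponential backoff follows; TimeoutRetry, RetryOnTimeoutAsync, the attempt count and the delays are illustrative assumptions. The SDK also runs its own retry pipeline (RetryHandler appears in the stack trace above), and a retried synchronous invoke can run the target function twice, so this is only safe for idempotent work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutRetry
{
    // Retries `operation` when it fails with a TaskCanceledException that was NOT
    // caused by the caller's own token (for example an internal HTTP timeout).
    // maxAttempts and baseDelay are illustrative defaults, not SDK settings.
    public static async Task<T> RetryOnTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxAttempts = 3,
        TimeSpan? baseDelay = null,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        var delay = baseDelay ?? TimeSpan.FromMilliseconds(200);
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken);
            }
            catch (TaskCanceledException) when (
                !cancellationToken.IsCancellationRequested && attempt < maxAttempts)
            {
                // Exponential backoff: delay, 2 * delay, 4 * delay, ...
                var wait = TimeSpan.FromTicks(delay.Ticks * (1L << (attempt - 1)));
                await Task.Delay(wait, cancellationToken);
            }
        }
    }
}
```

On the last attempt the exception filter no longer matches, so the original TaskCanceledException propagates to the caller's catch block.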

I discussed this issue with my team. Could we get some more context? I am not sure it's actually related to the corefx issue I linked above. We think the problem may be caused by the use of Task.WhenAll, which can open too many simultaneous HTTP connections. Can you try logging how many records each KinesisEvent you receive contains?
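
If the theory about too many simultaneous HTTP connections holds, one common mitigation is to cap how many invocations run at once instead of starting them all through an unbounded Task.WhenAll. A minimal sketch using SemaphoreSlim follows; Bounded, ForEachBoundedAsync and maxConcurrency are hypothetical names, and in the original handler the per-item work would be `r => ProcessStreamAsync(r.Kinesis.Data)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Bounded
{
    // Runs `work` for every item, but never more than `maxConcurrency` at a time.
    public static async Task ForEachBoundedAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList() starts every wrapper task; each waits for a free slot.
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();
                try
                {
                    await work(item);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```

The handler could then return `Bounded.ForEachBoundedAsync(kinesisEvent.Records, 4, r => ProcessStreamAsync(r.Kinesis.Data))`; the limit of 4 is an arbitrary example value.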

Sorry for the delayed reply; I was on holiday. My Lambda handler is set up to ingest at most 10 Kinesis records at a time, so at most 10 invocations would have been running concurrently. The last few times I got "A task was canceled." in the logs, only 2 or 3 records were being processed.

Every time this function runs I log the record count, so it is easy to match each "A task was canceled." error in the logs with the record count logged to CloudWatch at that time.

Not sure if this is helpful, but the function uses 512 MB of memory. It runs in the default VPC with one subnet, which I have configured for outbound internet access through a NAT gateway.

I'm happy to help debug or troubleshoot this; I get this error a few times each day.

I am guessing this is running on the Lambda .NET Core 1.0 runtime? Also, if I am reading the code correctly, a parent Lambda function is invoking a number of child Lambda functions. Is the exception happening in the parent function?

Yes, I am using .NET Core 1.0; I didn't realize 2.0 was out. Any chance 2.0 will fix this? I have been waiting to upgrade for quite some time, so I will probably do it this week.

Yes, the error is caught where the comment says //log exception, and it is thrown in the parent Lambda function. The parent is invoked from a Kinesis stream; it handles some messages itself and hands longer-running ones off to another Lambda function via async invocation. That async invocation is where the "A task was canceled." error occurs. Sometimes, instead of the task-canceled exception, I get "The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details."

I upgraded to .NET Core 2.0 and still see this issue intermittently. It has been happening a lot today on one of my handlers: it started around April 25th 2018, 19:58:28.054 UTC and is still ongoing as of April 26th 2018, 00:59:19.849 UTC.

Any ideas or anything else I could be doing differently?

I'm also experiencing this issue intermittently. I'm calling InvokeAsync from within a policy handler that returns a Task.

I'm also facing this issue and am not sure about the root cause.

Do you have an update on the resolution?

No. This seems to be an intermittent issue, and we are having difficulty narrowing it down. A consistent repro would help.

Also, if you could capture the request IDs for the invocation, we could take that to the service team to try to see what's going on service-side.
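
One way to capture request IDs is to log them from both successful responses and service exceptions. A minimal sketch follows; InvocationLog and its Describe overloads are hypothetical names. It relies on ResponseMetadata.RequestId on the response and RequestId on AmazonServiceException. A TaskCanceledException carries no service request ID, which is worth knowing when deciding what to hand to the service team.

```csharp
using System;
using Amazon.Lambda.Model;
using Amazon.Runtime;

public static class InvocationLog
{
    // Hypothetical formatter for a completed invoke call.
    public static string Describe(InvokeResponse response)
    {
        return $"RequestId={response.ResponseMetadata?.RequestId} " +
               $"Status={response.StatusCode} " +
               $"FunctionError={response.FunctionError ?? "none"}";
    }

    // Hypothetical formatter for failures: service errors carry a request ID,
    // client-side failures such as TaskCanceledException do not.
    public static string Describe(Exception ex)
    {
        var serviceError = ex as AmazonServiceException;
        if (serviceError != null)
        {
            return $"{serviceError.GetType().Name} RequestId={serviceError.RequestId} " +
                   $"Status={(int)serviceError.StatusCode}: {serviceError.Message}";
        }
        return $"{ex.GetType().Name} (no service request ID available): {ex.Message}";
    }
}
```

In the original catch block, `//log exception` could become a log call with `InvocationLog.Describe(ex)`.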

I'm seeing this issue when invoking a Lambda function from a .NET Standard 2.0 library that is referenced and executed from a .NET Core 2.0 project. On .NET Core I get a TaskCanceledException, but the same code in a .NET Framework 4.7 library, referenced and executed from a .NET Framework 4.7 project, produces a detailed exception explaining that the Lambda function timed out.

Because the timings match the function's execution time limit, I believe the .NET Core TaskCanceledException is thrown when the Lambda function times out.

My code for .NET Core and .NET Framework is as follows:

public async Task<ExtractImageResponse> ExtractImageAsync(ExtractImageRequest request, CancellationToken cancellationToken = default(CancellationToken))
{
    var invokeRequest = new InvokeRequest() { FunctionName = FUNCTION_NAME };
    invokeRequest.Payload = JsonConvert.SerializeObject(request, _serializerSettings);

    try
    {
        var response = await _lambdaClient.InvokeAsync(invokeRequest, cancellationToken);
        return JsonConvert.DeserializeObject<ExtractImageResponse>(ReadPayload(response), _serializerSettings);
    }
    catch (AmazonLambdaException e)
    {
        throw new ExtractImageException($"Unable to extract image: {e.Message}", e);
    }
}
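
The `catch (AmazonLambdaException e)` above lets a TaskCanceledException escape untranslated. On the .NET Core 2.x runtimes discussed here, HttpClient reports its own timeout as a TaskCanceledException, and the HttpClient.FinishSendAsync frame in the first stack trace is consistent with that. One way to make the failure self-describing is to check whether the caller's token was actually canceled. A minimal, SDK-agnostic sketch follows; TimeoutTranslation.RunAsync is a hypothetical name, and wrapping in TimeoutException is a design choice, not SDK behavior.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutTranslation
{
    // Runs `operation`; if it is canceled although the caller's token was not,
    // the cancellation most likely came from an internal timeout (for example
    // HttpClient.Timeout), so surface it as a TimeoutException instead.
    public static async Task<T> RunAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        try
        {
            return await operation(cancellationToken);
        }
        catch (OperationCanceledException ex) when (!cancellationToken.IsCancellationRequested)
        {
            throw new TimeoutException(
                "The operation was canceled without the caller requesting it; " +
                "this usually indicates a client-side timeout.", ex);
        }
    }
}
```

In ExtractImageAsync the call would become `await TimeoutTranslation.RunAsync(ct => _lambdaClient.InvokeAsync(invokeRequest, ct), cancellationToken)`, with a matching `catch (TimeoutException e)` next to the existing AmazonLambdaException handler. Genuine caller cancellations still propagate as OperationCanceledException.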

AWSSDK.Lambda version in .NET Core project - 3.3.19.4
AWSSDK version in .NET Framework project - 2.3.55.2
.NET Core project runs in Docker using the microsoft/dotnet:2.0-runtime base image.

Edit: I've updated to 3.3.19.9 and lowered my timeout to 10 seconds on .NET Core, and now I get {"errorMessage":"2019-01-09T04:39:04.174Z 77996878-13c8-11e9-a3b5-0d9f71941609 Task timed out after 10.01 seconds"}. I don't know whether that is because the update fixes the issue or because of the shorter timeout. Normally my timeout is 2 minutes because I'm using Lambda to extract a thumbnail from a video, and it's at the 2-minute mark that I see the TaskCanceledException on .NET Core. Unfortunately, the issue occurs sporadically, so I can't be sure the update fixes it until I've waited a while to see whether it recurs.

I'm closing this out due to inactivity. There have been some updates to retry logic and many other areas of the codebase that may be helpful in resolving your issues. If you are still experiencing this frequently and think you can provide more guidance on reproduction steps, please open a new issue.

Also note that Lambda max timeouts have been increased since this issue was created. It seems some of your functions might just be long-running.
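
For long-running synchronous (RequestResponse) invocations, the client's HTTP timeout has to outlast the target function's configured timeout, or the caller gives up first and sees a cancellation instead of Lambda's "Task timed out" result. A minimal sketch follows, assuming ClientConfig.Timeout is applied to the underlying HttpClient in your SDK version (worth verifying for the version you use). LambdaClientFactory is a hypothetical name, and the one-minute margin is an arbitrary example value. Lambda's current maximum function timeout is 15 minutes.

```csharp
using System;
using Amazon;
using Amazon.Lambda;

public static class LambdaClientFactory
{
    // Hypothetical factory: a config whose HTTP timeout outlasts the longest
    // function this client will call synchronously.
    public static AmazonLambdaConfig CreateConfig(TimeSpan longestFunctionTimeout)
    {
        return new AmazonLambdaConfig
        {
            RegionEndpoint = RegionEndpoint.USWest2,
            // Margin so Lambda's own timeout result reaches the caller
            // before the client-side timeout cancels the request.
            Timeout = longestFunctionTimeout + TimeSpan.FromMinutes(1)
        };
    }

    public static AmazonLambdaClient Create(TimeSpan longestFunctionTimeout)
    {
        return new AmazonLambdaClient(CreateConfig(longestFunctionTimeout));
    }
}
```

For the thumbnail function with a 2-minute timeout, `LambdaClientFactory.Create(TimeSpan.FromMinutes(2))` would give the client a 3-minute HTTP timeout.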

Similar errors can also be caused by a missing VPC endpoint, but if that were the problem I wouldn't expect the calls to succeed some of the time.
