aws-lambda-dotnet: Debugging Invocation Errors that don't appear in the logs

Created on 16 Mar 2018 · 24 comments · Source: aws/aws-lambda-dotnet

Does anyone have ideas on how to debug invocation errors that don't appear in CloudWatch logs?

I am seeing cases where a Lambda function, seemingly at random, fails to invoke 1 to 7 times, incrementing the CloudWatch Lambda error count, but no invocation (START, END, or REPORT) appears in the CloudWatch logs for the function. Nothing appears in the dead-letter queue either.

I have 40ish similar Lambda functions, and 8 of them showed this same behavior at very similar times. These failures happen very infrequently, but when I do see them it is always in a similar pattern: multiple functions, CloudWatch error counters > 0, nothing in the CloudWatch logs.

I don't think this is a permissions issue with the function's ability to write to the logs, because at other times it invokes correctly and START, END, REPORT, etc. do appear in the CloudWatch logs.

I assume it must be some issue with setting up the environment (the work that happens before the invoke). How can I get to the bottom of this?

These functions all run on .NET Core 1.0.
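One way to tell an invocation that never started apart from one that crashed part-way is to pair START and REPORT lines by request ID. The following is a minimal Python sketch, assuming the log lines have already been exported as strings in the standard Lambda format (`START RequestId: <id> Version: ...`, `REPORT RequestId: <id> Duration: ...`). It cannot recover invocations that left no log lines at all, which is the case described here, but it does rule out the "started and then died" case.

```python
"""Sketch: find invocations whose START line has no matching REPORT line.

Assumes log lines in the standard Lambda format, e.g.
"START RequestId: <id> Version: $LATEST" and "REPORT RequestId: <id> Duration: ...".
"""
import re

# Matches the runtime's own marker lines and captures the kind and request ID.
MARKER = re.compile(r"^(START|END|REPORT) RequestId: (\S+)")


def incomplete_invocations(lines):
    """Return request IDs that have a START line but no REPORT line, in first-seen order."""
    started, reported = [], set()
    for line in lines:
        match = MARKER.match(line)
        if not match:
            continue  # application log line, not a runtime marker
        kind, request_id = match.groups()
        if kind == "START":
            started.append(request_id)
        elif kind == "REPORT":
            reported.add(request_id)
    return [rid for rid in started if rid not in reported]
```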

Labels: closed-for-staleness, guidance, module/lambda-client-lib, response-requested


PaulColeman



All 24 comments

Would it be possible to narrow down the time window when these happen and provide the account ID used?

normj on 17 Mar 2018

Thanks Norm. On one occasion it happened between 2018-03-15 19:56:00 and 2018-03-15 20:13:00 UTC. I think it all happened clustered within seconds, but I'm not exactly sure where in that range it happened.

688458520130 is the account ID.

PaulColeman on 17 Mar 2018
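To narrow a window like this down to the minute, the `AWS/Lambda` `Errors` metric can be pulled at 60-second resolution. A minimal Python sketch, assuming boto3 is installed and credentials are configured; the function name in the usage comment is a placeholder. CloudWatch keeps 60-second datapoints for only 15 days, so this works only for recent incidents.

```python
"""Sketch: list the minutes in which a Lambda function reported errors.

Assumes boto3 and configured AWS credentials; the function name is a placeholder.
"""


def minutes_with_errors(datapoints):
    """Return the sorted timestamps of datapoints whose error Sum is greater than zero."""
    return sorted(dp["Timestamp"] for dp in datapoints if dp.get("Sum", 0) > 0)


def fetch_error_datapoints(function_name, start, end):
    """Query the AWS/Lambda Errors metric (Sum) at 60-second resolution."""
    import boto3  # imported lazily so minutes_with_errors works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=["Sum"],
    )
    return response["Datapoints"]


# Usage (hypothetical function name, timezone-aware datetimes):
# points = fetch_error_datapoints("my-function", start, end)
# for ts in minutes_with_errors(points):
#     print(ts.isoformat())
```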

Any update on this? Thanks for investigating.


PaulColeman on 29 Mar 2018

Any updates on this issue? I'm also struggling with exactly the same issue. Invocation error reported, but nothing in the CloudWatch logs.

paul-zah on 30 Apr 2018

@PaulColeman and @paul-zah
Can you provide minimal code repros for your functions that are dying?
If you're able to isolate the code that's causing the issue it would help a lot.

vellozzi on 30 Apr 2018

I'm experiencing the same issue: when load testing my function, the error count is quite high, but I cannot find any exceptions in my Lambda logs.

mbp on 18 May 2018

Just a thought: it might be worth moving the logging up to the LambdaEntryPoint (i.e. Program.cs) to catch any errors thrown while bootstrapping the WebHost. I do this with Serilog in non-Lambda services, following their guidance here (try/catch logging around Program.cs): https://github.com/serilog/serilog-aspnetcore/blob/dev/samples/SimpleWebSample/Program.cs#L13

wv-jtowers on 25 May 2018
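The same catch-log-rethrow idea can be sketched in Python, which several later commenters use. This is an analogue of the Serilog pattern, not the .NET code itself; `build_app` and `handler` are hypothetical names. Re-raising after logging keeps the failure visible to Lambda (and to the Errors metric) while guaranteeing a traceback reaches the logs.

```python
"""Sketch: log exceptions from both cold-start setup and the handler, then re-raise.

A Python analogue of try/catch logging around Program.cs; build_app and
handler are hypothetical names, not part of any AWS library.
"""
import functools
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def log_and_reraise(func):
    """Log the full traceback of any exception, then re-raise it unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            logger.exception("Unhandled exception in %s", func.__name__)
            raise
    return wrapper


@log_and_reraise
def build_app():
    """Hypothetical expensive setup (config, clients) run once per container."""
    return {"ready": True}


# Module-level code runs during initialisation; wrapping it means a failure
# here is logged with a traceback before the error propagates.
APP = build_app()


@log_and_reraise
def handler(event, context):
    """Hypothetical Lambda handler."""
    if not APP["ready"]:
        raise RuntimeError("app not initialised")
    return {"statusCode": 200}
```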

I'm noticing the same issue. All invocations from CloudWatch are failing, which causes a spike in invocation errors on the Monitoring tab, but the errors do not appear in the logs.

Grif-fin on 29 May 2018

I am seeing a similar issue where the invocation count increases infrequently and the CloudWatch logs don't contain any error logs. Has the root cause been found?

SejalChauhan on 30 May 2018

I am facing the same issue. The logs do not show any error, and the function seems to be working as expected, but I can see the CloudWatch alarm for errors triggering when the Lambda is invoked.

VineeC on 27 Jun 2018

Hi Guys. I am facing the same issue. Any updates on this?

facundovs on 1 Nov 2018

This may be an issue with Lambda itself; I've been noticing this behavior for months now. I don't use .NET; I use Node.js and the aws-sdk.

You will see errors in the Lambda Monitoring dashboard, but clicking through the logs for that time range you will see no trace of the error. In my opinion this is one of Lambda's internal "quirks", similar to the idempotency issue in AWS Lambda (you can't guarantee your function will run exactly once; it can run multiple times, seconds or minutes apart, even when no error is detected). As with idempotency, you need some defensive coding in your app to account for internal errors: make sure you do proper error handling in your code and catch/throw errors with proper log tracing.

If you then see "internal errors" that seem to happen outside your error handling, you should be able to discount them as anomalies or false positives, since you are confident in your error-handling coverage. (Not ideal, but it's one of the quirks of serverless computing: the issue is on someone else's server :)

newbreedofgeek on 4 Mar 2019
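A minimal Python sketch of that defensive coding, assuming each event carries a unique `id` field (a hypothetical field name) and using a hypothetical `process_order` function. The in-memory set only deduplicates within one warm container; a real implementation would need a durable store with a conditional write, which is not shown here.

```python
"""Sketch: idempotent handler plus traceable error logging.

The in-memory set stands in for a durable store; process_order and the
"id" field are hypothetical.
"""
import logging

logger = logging.getLogger(__name__)


class IdempotencyGuard:
    """Remembers which event IDs have already been processed successfully."""

    def __init__(self):
        self._seen = set()  # replace with a durable store in real code

    def run_once(self, event_id, func, *args):
        """Run func(*args) unless event_id was already processed; return None for duplicates."""
        if event_id in self._seen:
            logger.info("Skipping duplicate event %s", event_id)
            return None
        result = func(*args)
        # Mark as done only after success, so a failed attempt can be retried.
        self._seen.add(event_id)
        return result


def process_order(event):
    """Hypothetical business logic."""
    return event["amount"] * 2


guard = IdempotencyGuard()


def handler(event, context):
    try:
        return guard.run_once(event["id"], process_order, event)
    except Exception:
        # Include the event ID so the failure can be traced in the logs.
        logger.exception("Failed to process event %s", event.get("id"))
        raise
```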

I am seeing the same thing on Node 10 Lambdas. The alarms are triggered when errors cross a threshold, but there are no errors in the logs. I end up wasting quite a lot of time checking false positives, and I can't see a reason why this would be the desired behaviour, so it would be great if the Lambda team could improve this area.

matthewdenobrega on 8 Aug 2019


This happened to me as well, using Python.
My Lambda is triggered every second, so until now I was sure I couldn't find the errors because I have so many logs and wasn't using the right filter; I never thought there were simply no logs.
However, I don't think it is random: it usually happens when there are problems in the DB.

aya-givati on 29 Aug 2019

This happened to me as well. I have Java SDK Lambdas in two separate regions, and both of them generated error metrics from 6:45 to 7:10 AM CDT, but there are no ERROR logs in CloudWatch.

nidhi7 on 2 Oct 2019

Other things to look at:

- API Gateway logs
- make sure the Lambda payload (request and response) does not exceed the limit
- the limit on concurrent Lambda executions
- etc.

adrai on 2 Oct 2019
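For the payload point above, a quick local check is to measure the serialised size before invoking. A Python sketch; the limits are assumptions based on the Lambda quotas documented around the time of this thread (6 MB for synchronous request/response, 256 KB for asynchronous events) and should be checked against the current quotas page. `json.dumps` output only approximates the bytes a given SDK actually sends.

```python
"""Sketch: check a JSON payload against an assumed Lambda payload size limit."""
import json

# Assumed limits; verify against the current AWS Lambda quotas.
SYNC_LIMIT_BYTES = 6 * 1024 * 1024  # synchronous request/response
ASYNC_LIMIT_BYTES = 256 * 1024      # asynchronous event


def payload_size(payload):
    """Size in bytes of the UTF-8 JSON encoding of payload (approximate wire size)."""
    return len(json.dumps(payload).encode("utf-8"))


def fits(payload, limit=SYNC_LIMIT_BYTES):
    """True if the serialised payload is within the given byte limit."""
    return payload_size(payload) <= limit
```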

This was happening to me as well in Python 3.8.

Use the following query in CloudWatch Logs Insights:
fields @timestamp, @message
| filter @message like "Process exited before completing request"
| sort @timestamp asc
| limit 20

It might be a memory problem causing the error. A timeout can also cause an error in Lambda, and you have to use a different query to find it.

marcioemiranda on 29 Jul 2020
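The query from the comment above can be run from a script and reused for timeouts and memory. A Python sketch assuming boto3 and configured credentials; the log group in the usage comment is a placeholder (Lambda's default is `/aws/lambda/<function-name>`). The timeout pattern assumes Lambda's "Task timed out after ... seconds" message, and the memory query assumes the Logs Insights fields `@type`, `@maxMemoryUsed` and `@memorySize` discovered from Lambda REPORT lines.

```python
"""Sketch: run CloudWatch Logs Insights queries for crash, timeout and memory signals.

Assumes boto3 and configured credentials; the log group name is a placeholder.
"""
import time

CRASH_PATTERN = "Process exited before completing request"
TIMEOUT_PATTERN = "Task timed out"

# Peak memory used vs. memory allocated, per 5-minute bin, from REPORT lines.
MEMORY_QUERY = (
    'filter @type = "REPORT"\n'
    "| stats max(@maxMemoryUsed) as peakUsed, max(@memorySize) as allocated by bin(5m)"
)


def build_query(pattern, limit=20):
    """Build the query from the comment above for an arbitrary message pattern."""
    return (
        "fields @timestamp, @message\n"
        f'| filter @message like "{pattern}"\n'
        "| sort @timestamp asc\n"
        f"| limit {limit}"
    )


def run_query(log_group, query, start_epoch, end_epoch, poll_seconds=1.0):
    """Start a Logs Insights query and poll until it completes; return the result rows."""
    import boto3  # lazy import so build_query works without boto3

    logs = boto3.client("logs")
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start_epoch,  # seconds since the epoch
        endTime=end_epoch,
        queryString=query,
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            return response["results"]
        if status not in ("Scheduled", "Running"):
            raise RuntimeError(f"Query ended with status {status}")
        time.sleep(poll_seconds)


# Usage (hypothetical log group, epoch-second bounds):
# for q in (build_query(CRASH_PATTERN), build_query(TIMEOUT_PATTERN), MEMORY_QUERY):
#     rows = run_query("/aws/lambda/my-function", q, start, end)
```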


Hi @PaulColeman,

Good morning.

I was going through the issue backlog and came across this guidance question. Please let me know if this is still an issue, or whether it can be closed.

Thanks,
Ashish

ashishdhingra on 14 Sep 2020

This issue has not received a response in 2 weeks. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

github-actions[bot] on 29 Sep 2020

still an issue...

aya-givati on 29 Sep 2020


Hi @PaulColeman @aya-givati,

Please have a look at the article "How do I troubleshoot Lambda function failures?" and let me know if it helps.

Thanks,
Ashish

ashishdhingra on 30 Sep 2020

Hi @ashishdhingra,
Thank you for your response.
Unfortunately it did NOT help me.
My problem is that my "Error" metric alert is triggered and I can't find the lines in the log that explain why.

aya-givati on 1 Oct 2020

@aya-givati I'm not sure what to recommend here, since the invocation errors occur outside the .NET SDK. As explained in the documentation link I shared, CloudWatch is the place to look for code-related errors; for invocation errors, CloudTrail could be an option. I would suggest contacting CloudWatch support for more troubleshooting details. I will try to see if I can find any guidance, but this doesn't appear to be a .NET SDK issue.

I see that you are using the Python SDK, so this issue appears to be service-specific rather than SDK-specific. Were you able to get any guidance from the Python SDK team that might be helpful?

ashishdhingra on 1 Oct 2020

This issue has not received a response in 2 weeks. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

github-actions[bot] on 16 Oct 2020
