Aws-lambda-dotnet: Debugging Invocation Errors that don't appear in the logs

Created on 16 Mar 2018 · 24 Comments · Source: aws/aws-lambda-dotnet

Does anyone have ideas how to debug invocation errors that don't appear in CloudWatch logs?

I am seeing cases where a lambda, seemingly at random, fails to invoke 1 to 7 times. Each failure increments the CloudWatch lambda error count, but no invocation (START, END, or REPORT) appears in the CloudWatch logs for the lambda. Nothing appears in the dead-letter queue either.

I have 40ish similar lambdas and 8 of them showed this same behavior at very similar times. These failures happen very infrequently, but when I do see them it is always in a similar pattern: multiple lambdas, CloudWatch error counters > 0, nothing in the CloudWatch logs.

I don't think this is a permissions issue with the lambda's ability to write to the logs, since at other times it invokes correctly and START, END, REPORT, etc. do appear in the CloudWatch logs.

I assume it must be some issue setting up the environment -- the stuff that happens before invoke. How can I get to the bottom of this?

These lambdas are all .NET Core 1.0.

closed-for-staleness guidance module/lambda-client-lib response-requested

All 24 comments

Would it be possible to narrow down the time window when these happen and provide the account ID used?

Thanks Norm. On one occasion it happened between 2018-03-15 19:56:00 and 2018-03-15 20:13:00 UTC. I think the failures were all clustered within seconds, but I'm not exactly sure where in that range they happened.

688458520130 is the id.
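One way to pin down where in such a range the failures fall is to pull the per-minute `Errors` metric for the function. This is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names are hypothetical and not part of the thread.

```python
def minutes_with_errors(datapoints):
    """Return (timestamp, error_count) pairs for periods whose Sum is above zero, oldest first."""
    hits = [(dp["Timestamp"], dp["Sum"]) for dp in datapoints if dp.get("Sum", 0) > 0]
    return sorted(hits, key=lambda pair: pair[0])


def fetch_error_datapoints(function_name, start, end, period_seconds=60):
    """Fetch the per-period Sum of the AWS/Lambda Errors metric for one function.

    `start` and `end` are datetime objects bounding the window to inspect.
    """
    import boto3  # imported lazily so minutes_with_errors works without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Errors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=start,
        EndTime=end,
        Period=period_seconds,
        Statistics=["Sum"],
    )
    return response["Datapoints"]
```

Calling `minutes_with_errors(fetch_error_datapoints(name, start, end))` for each affected function should show whether the errors really cluster within the same minute across functions.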

Any update on this? Thanks for investigating.

Any updates on this issue? I'm also struggling with exactly the same issue. Invocation error reported, but nothing in the CloudWatch logs.

@PaulColeman and @paul-zah
Can you provide minimal code repros for your functions that are dying?
If you're able to isolate the code that's causing the issue it would help a lot.

I'm experiencing the same issue. While load testing my function, I get quite a high error count, but cannot find any exceptions in my Lambda logs.

Just a thought - It might be worth trying to move the logging up to the LambdaEntryPoint (i.e. Program.cs) to handle any errors that might be thrown during bootstrapping the WebHost. I do this with serilog in non-lambda services following their guidance here (try / catch logging around program.cs): https://github.com/serilog/serilog-aspnetcore/blob/dev/samples/SimpleWebSample/Program.cs#L13

Noticing the same issue. All invocations from CloudWatch are failing, which causes a spike in invocation errors on the monitoring tab, but the errors do not appear in the logs.

I am seeing a similar issue where the invocation count is increasing infrequently and the CloudWatch logs don't have any error logs. Has this been root-caused?

I am facing the same issue. The logs do not show any error and the function seems to work as expected, but I can see the CloudWatch alarm for errors triggering when the lambda is invoked.

Hi Guys. I am facing the same issue. Any updates on this?

This may be an issue with Lambda itself - I've been noticing this behavior for months now. I don't use dotnet; I use node and aws-sdk.

You will see errors in the lambda Monitoring dashboard, but clicking through to the logs for that time range you will see no trace of the error. In my opinion this is one of the internal lambda "quirks", similar to the idempotency issues in aws lambda (where you can't guarantee your function will run exactly once... it can run multiple times, seconds/mins apart, even when there is no error detected). As with the idempotency issue, you will need to do some defensive coding in your app to account for internal errors: make sure you do proper error handling in your code and catch/throw errors with proper log tracing.

If you then see "internal errors" that seem to happen outside your error handling, you should be able to discount them as anomalies or false positives, since you are confident in your error handling coverage. (not ideal, but one of the quirks of serverless computing: the issue is on someone else's server :)
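The "catch/throw errors with proper log tracing" advice could look roughly like this in Python. It is only a sketch: the decorator and the sample handler are hypothetical, and it can only log errors raised inside handler code, not failures that happen before the handler runs.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def log_unhandled(handler):
    """Wrap a Lambda handler so any exception is logged with a traceback, then re-raised."""

    @functools.wraps(handler)
    def wrapper(event, context):
        # aws_request_id is set on the real Lambda context; fall back when testing locally.
        request_id = getattr(context, "aws_request_id", "unknown")
        try:
            return handler(event, context)
        except Exception:
            # logger.exception records the stack trace; re-raising keeps the
            # invocation marked as failed so the Errors metric still counts it.
            logger.exception("Unhandled error in request %s", request_id)
            raise

    return wrapper


@log_unhandled
def handler(event, context):  # hypothetical handler, for illustration only
    if "order_id" not in event:
        raise ValueError("event is missing order_id")
    return {"status": "ok", "order_id": event["order_id"]}
```

If the Errors metric still rises while this wrapper logs nothing, that at least points to a failure outside the handler body.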

I am seeing the same thing - on node 10 lambdas. The alarms are triggered on errors crossing a threshold, but there are no errors in the logs. I end up wasting quite a lot of time checking false positives, and I can't see a reason why this could be the desired behaviour, so would be great if the lambda team could improve this area.

Happened to me as well...
using python.
My lambda is triggered every second, so up until now I was sure I couldn't find it because I have so many logs and wasn't using the right filter... I never thought that there were simply no logs...
However, I don't think it is random. It usually happens when there are problems in the DB...

This happened to me as well. I have java sdk lambdas in 2 separate regions and both of them generated error metrics from 6:45 - 7:10 AM CDT but there are no ERROR logs in cloudwatch.

Other stuff to look at:

  • API-GW logs
  • make sure the lambda payload (request and response) does not exceed the limit
  • limit of concurrent lambda executions
  • etc...
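For the payload item, a rough size check can be done before invoking or returning. This sketch assumes the 6 MB limit for synchronous request and response payloads; check the current Lambda quotas for your invocation type, since asynchronous invocations have a smaller limit. The helper names are hypothetical.

```python
import json

# Assumed limit: 6 MB for synchronous request and response payloads.
# Verify against the current Lambda quotas before relying on this number.
SYNC_PAYLOAD_LIMIT_BYTES = 6 * 1024 * 1024


def payload_size_bytes(payload):
    """Approximate size of the payload once serialized to UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def exceeds_limit(payload, limit_bytes=SYNC_PAYLOAD_LIMIT_BYTES):
    """True when the serialized payload is larger than the given limit."""
    return payload_size_bytes(payload) > limit_bytes
```

The size is approximate because the service may serialize the payload differently from `json.dumps`, so it is safer to leave some headroom below the limit.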

This was happening to me as well in python 3.8.

Use the following query in Log Insights:
fields @timestamp, @message
| filter @message like "Process exited before completing request"
| sort @timestamp asc
| limit 20

It might be a memory problem causing the error. A timeout can also cause an error in lambda, and you have to use a different query to find it.
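The comment does not give the timeout query. A sketch of how both could be generated and run from Python follows, assuming that timed-out invocations log a line containing "Task timed out" and that boto3 is available; the function names are hypothetical.

```python
import time


def build_query(needle, limit=20):
    """Build a Logs Insights query for messages containing `needle` (no double quotes allowed)."""
    if '"' in needle:
        raise ValueError("needle must not contain double quotes")
    return (
        "fields @timestamp, @message\n"
        f'| filter @message like "{needle}"\n'
        "| sort @timestamp asc\n"
        f"| limit {limit}"
    )


# The first string comes from the comment above; "Task timed out" is the
# assumed signature of a timeout and should be checked against your own logs.
CRASH_QUERY = build_query("Process exited before completing request")
TIMEOUT_QUERY = build_query("Task timed out")


def run_query(log_group, query, start_epoch, end_epoch, poll_seconds=1.0):
    """Run a query and poll until it leaves the Scheduled/Running states.

    Returns (status, results) so the caller can tell a failed query from an empty one.
    """
    import boto3  # imported lazily so build_query works without AWS access

    logs = boto3.client("logs")
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start_epoch,
        endTime=end_epoch,
        queryString=query,
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            return response["status"], response["results"]
        time.sleep(poll_seconds)
```

`start_epoch` and `end_epoch` are Unix timestamps in seconds, and `log_group` would typically be the function's `/aws/lambda/<function-name>` group.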

Hi @PaulColeman,

Good morning.

I was going through the issue backlog and came across this guidance question. Please let me know if this is still an issue or else if this could be closed.

Thanks,
Ashish

This issue has not received a response in 2 weeks. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

still an issue...

Hi @PaulColeman @aya-givati,

Please have a look at the article "How do I troubleshoot Lambda function failures?" and let me know if it helps.

Thanks,
Ashish

Hi @ashishdhingra,
Thank you for your response.
Unfortunately it did NOT help me.
My problem is that my "Error" metric alert is on and I cannot find the lines in the log that explain why.

@aya-givati I'm not sure what to recommend here, since the invocation errors occur outside of the .NET SDK. As explained in the documentation link I shared, CloudWatch is the option for any code-related errors. For invocation errors, however, CloudTrail could be the option. I would suggest contacting CloudWatch support for more details on troubleshooting. I will try to see if I can find any guidance, but this doesn't appear to be a .NET SDK issue.

I do see that you are using the Python SDK, so this issue appears to be service specific, not specific to one SDK. Were you able to get guidance from the Python SDK team that might be helpful?

This issue has not received a response in 2 weeks. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.
