Firebase-functions: Node 8 - 'connection error'

Created on 26 Mar 2019 · 33 Comments · Source: firebase/firebase-functions


Hi,
After switching our functions to Node 8, we are seeing "random" function failures with the message 'connection error'.
It always happens right before the resource ID is replaced. No additional info is provided.

("MSSQL pool create: null" in the screenshot means that the connection was established without error.)
[screenshot]

Related issues

https://groups.google.com/forum/#!topic/firebase-talk/21_uiLDav34
https://status.firebase.google.com/incident/Functions/18046
https://status.firebase.google.com/incident/Functions/17010
https://groups.google.com/forum/#!searchin/firebase-talk/%22connection$20error%22%7Csort:date/firebase-talk/pwLvgWmQPBQ/g94xJD0sAgAJ

[REQUIRED] Version info

node: 8.14.0
firebase-functions: 2.2.1
firebase-tools: 6.5.0
firebase-admin:
[REQUIRED] Test case

[REQUIRED] Steps to reproduce

[REQUIRED] Expected behavior

[REQUIRED] Actual behavior

'connection error' fail message with severity DEBUG without additional information

{
   "textPayload": "Function execution took 48 ms, finished with status: 'connection error'",
   "insertId": "000000-2739f8e3-4c81-4586-8931-a89020d19659",
   "resource": {
     "type": "cloud_function",
     "labels": {
       "project_id": " *** ",
       "region": "us-central1",
       "function_name": " *** "
     }
   },
   "timestamp": "2019-03-26T10:46:27.464788799Z",
   "severity": "DEBUG",
   "labels": {
     "execution_id": "vopi12sigsg0"
   },
   "logName": "projects/ *** /logs/cloudfunctions.googleapis.com%2Fcloud-functions",
   "trace": "projects/ *** /traces/40116ed33fcca88d605290469693fba0",
   "receiveTimestamp": "2019-03-26T10:46:32.776101066Z"
}
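
Because the failure is logged at severity DEBUG, it is easy to miss when filtering for errors. The following is a minimal sketch, not part of any Firebase API, of how exported log entries could be scanned for this status. The helper names `parseExecutionSummary` and `isConnectionError` are hypothetical, and the sketch assumes the `textPayload` has exactly the shape shown in the entry above.

```typescript
// Hypothetical helpers for scanning exported Cloud Functions log entries.
// They assume the summary line format seen in this issue:
//   "Function execution took 48 ms, finished with status: 'connection error'"

interface ExecutionSummary {
  durationMs: number;
  status: string;
}

// Returns the duration and final status, or null if the line is not an
// execution summary in the assumed format.
function parseExecutionSummary(textPayload: string): ExecutionSummary | null {
  const match = /^Function execution took (\d+) ms, finished with status: '([^']*)'$/.exec(
    textPayload
  );
  if (match === null) {
    return null;
  }
  return { durationMs: Number(match[1]), status: match[2] };
}

// True only for summary lines whose status is exactly 'connection error'.
function isConnectionError(textPayload: string): boolean {
  const summary = parseExecutionSummary(textPayload);
  return summary !== null && summary.status === "connection error";
}
```

Running `isConnectionError` over the `textPayload` field of each entry gives a rough failure count that does not depend on the severity level.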

Were you able to successfully deploy your functions?

yes

infrastructure tracked internally

All 33 comments

I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight.

Hi @m4recek thank you for reporting this. Unfortunately we have been seeing several reports of this "connection error" happening with Node8 runtimes. There is work to track this down in the Google Cloud Functions infrastructure (internal bug reference: 124230993) and I will ping that thread to get it prioritized since this seems to be reoccurring.

Do you always get this error or only transiently? Has your function been able to run at any point with Node8? I assume it was working fine with Node6?

For now, as a workaround, can you try setting the retry option on your function in the Google Cloud console? It should help with these kinds of transient errors; see https://firebase.google.com/docs/functions/retries#use_retry_to_handle_transient_errors. Do make sure that you test your code thoroughly before you enable retries, so that a bug in your code does not make your function execute many times (you pay for every invocation, including retries).
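
To limit the cost risk mentioned above, one common guard is to stop retrying once an event is older than some threshold. The sketch below is illustrative only: `isEventTooOld` and `handleWithRetryGuard` are hypothetical names, the event timestamp is assumed to be an ISO 8601 string, and the sketch assumes that a rejected promise is what signals the platform to retry a background function.

```typescript
// Hypothetical retry guard for a background function with retries enabled.

// True when the event is older than maxAgeMs. An unparseable timestamp
// yields NaN, which compares false, so such events are not dropped.
function isEventTooOld(
  eventTimestamp: string,
  maxAgeMs: number,
  nowMs: number = Date.now()
): boolean {
  const eventAgeMs = nowMs - Date.parse(eventTimestamp);
  return eventAgeMs > maxAgeMs;
}

// Runs `work` unless the event is too old. Resolving normally ends the
// retry cycle; letting `work` reject leaves the decision to the platform.
async function handleWithRetryGuard(
  eventTimestamp: string,
  maxAgeMs: number,
  work: () => Promise<void>
): Promise<"done" | "dropped"> {
  if (isEventTooOld(eventTimestamp, maxAgeMs)) {
    return "dropped";
  }
  await work();
  return "done";
}
```

In a real function, `eventTimestamp` would come from the event context and `work` would contain the actual processing, which should be idempotent because it may run more than once.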

Hi, thanks for quick reply!

Do you always get this error or only transiently?

  • always

Has your function been able to run at any point with Node8?

  • functions respond until this happens for one request; afterwards the underlying resource is replaced (see screenshot) and we run fine until the next error. It's best to test this with a slow recurring request (ours runs every 10s)

I assume it was working fine with Node6?

  • yes, no problem there

Workaround

  • we see this on our DEV environment, and I cannot guarantee bug-free code there :)

I think in this case it's an infrastructure issue, try the workaround and let me know if it helps. Regarding bug-free code that was just my way of warning that you do get charged for retries same as normal invocations ;)


Hey @m4recek. We need more information to resolve this issue but there hasn't been an update in 7 days. I'm marking the issue as stale and if there are no new updates in the next 3 days I will close it automatically.

If you have more information that will help us get to the bottom of this, just add a comment!

We are also seeing this issue, and also only since upgrading from Node 6 to Node 8 in March. I'm happy to provide any assistance in getting it resolved, as it is hitting our production servers.

The retry workaround does not help, as we have HTTP endpoints that various client apps use, and adding retry logic to these, particularly to writes, is not desirable.

Thanks for chiming in @shaneosullivan and thanks for the kind offer. Unfortunately this issue cannot be resolved in the public facing functions SDK (this repo) as it looks like a Google Cloud Functions infrastructure issue with the Node 8 runtime. I'll check in on the internal bug we have to track this and post any updates here.


This error happens randomly in our cloud functions too. The app uses firestore. For now we need to downgrade to Node 6.

We have also been seeing this since updating to Node 8. It got so bad that we were sometimes seeing a 50% failure rate on a function that served a GraphQL application. We're downgrading our entire fleet to Node 6 because the situation is simply untenable; our product is seriously degraded as a result. Moving the initial set of servers to Node 6 has completely fixed the issue, and we expect that to be the case for the rest of our Firebase projects.

Hi @shaneosullivan, thanks for the additional info. After following up on the internal bugs tracking this, it looks like a similar issue to https://buganizer.corp.google.com/issues/125425924#comment18. The Cloud Functions team is actively investigating this. Please feel free to drop a line in that bug and share your project number (publicly or privately) to help with the debugging. Additional internal bug reference: 126199277


@thechenky that seems to be an internal Google site, unfortunately I can't access it, as I don't work at Google.

Hi, is there any update?

Being on Node 6 is a bit of a challenge nowadays. Today we can't release because gaxios, which is a dependency of Google's pubsub, just dropped support for Node 6.

Current state is:
Node 6 - will reach end-of-life in 4 days, more and more packages are dropping support
Node 8 - has this bug for at least a month now
Node 10 - is in beta

What is the expected upgrade path? Is there any ETA for this to be resolved?

We've also had to force our servers to use an old version of gaxios just today. This is getting crazy. We simply cannot upgrade to Node 8 again, as the failure rate is beyond unacceptable (up to 30% on some calls). When will this be fixed? If there is not a fire drill/war room situation inside the Google Cloud team right now, there should be.

Hi @m4recek, @shaneosullivan I'm sorry you're still struggling with this issue. I've checked on the internal bug and the GCF team is working hard to resolve this. I have pinged the bugs again to make sure your concerns have been passed on. Will post more updates as I have them. Apologies for this frustrating experience!


Removing the no-recent-activity label so this doesn't get automatically closed

I have some updates! The GCF team has identified a workaround that prevents the connection error issue from happening, and they are in the process of rolling this out to production in the next week. I'll update this thread when the fix is live. Thanks again for everyone's patience!

Hi @thechenky , any updates on this? Have the fixes been deployed and verified?

The error stopped happening in our project as of yesterday, May 14, 2019.

Hi all! We are in the middle of rolling out the fix to production :) so glad to see its effects are already showing! I will update this thread when the rollout completes. Thanks everyone!

Really glad to hear this is being worked on. We also experienced this, and it was causing serious data inconsistency, which hurts all the more because keeping denormalised data in sync via cloud functions is the recommended approach. Can't wait to be able to use Node 8 reliably, thanks.

Update: the issue has now been resolved for almost all of the cases that we have seen thus far.

Public tracker here: https://issuetracker.google.com/125425924. Please post on the tracker if you are still having issues. Thanks everyone! I'll mark this bug closed now.

Any updates on this issue? I get this error on just one function, and I don't know why or how to work around it.
Sorry, I can't join the public tracker (permission denied, don't know why), so I'm posting here.

Thank you

So, I just ran into this issue with a cloud function that processes ~10,000 records from Firestore. I use promises (via async/await, transpiled via TypeScript) to process all of the records in a collection in batches of a configurable size. The function was regularly failing with the error code 'connection error', which, sadly, doesn't show up as an error in any of the Firebase function logs or dashboards.

This same code is used in other situations/cloud functions with great success and has been working well for weeks (or maybe even months).

I've tried changing the batch size (from 500 to values as low as 100 and as high as 1000) and adding a timeout between batches to ensure I wasn't creating too many connections too quickly, all with no success. The function fails well before the timeout value and it still has plenty of memory. I have the timeout set to 540s and the memory at 1GB. Max memory usage appears to be just below 250MB.
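
For readers unfamiliar with the pattern described above, here is a minimal sketch of sequential batch processing with a configurable batch size and an optional pause between batches. It is not the commenter's code: `chunk`, `sleep`, `processInBatches`, and the `handleBatch` callback are hypothetical names, and whatever `handleBatch` does (for example Firestore reads or writes) is left to the caller.

```typescript
// Splits items into consecutive chunks of at most batchSize elements.
function chunk<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Resolves after roughly ms milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Processes batches one at a time, awaiting each before starting the next,
// and pauses pauseMs between batches. Resolves with the number of items
// handled, so the caller can return this promise from the function.
async function processInBatches<T>(
  items: readonly T[],
  batchSize: number,
  handleBatch: (batch: T[]) => Promise<void>,
  pauseMs: number = 0
): Promise<number> {
  const batches = chunk(items, batchSize);
  let processed = 0;
  for (let i = 0; i < batches.length; i++) {
    await handleBatch(batches[i]);
    processed += batches[i].length;
    if (pauseMs > 0 && i < batches.length - 1) {
      await sleep(pauseMs);
    }
  }
  return processed;
}
```

Awaiting every batch matters here: the returned promise only settles once all work is finished, which is the "returning promises correctly" requirement mentioned elsewhere in this thread.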

Because this thread made it seem related to the Node 8 runtime, I just tried changing my function to use the Node 10 (Beta) runtime. My function succeeded on the first try after making this runtime change.

Random Note: I noticed recently that the graphs in the Functions dashboard (https://console.cloud.google.com/functions/details/us-central1/xxxx) hadn't been working until recently, right around the time I started having trouble with my functions failing with "connection error".

Yes @Neilpoulin I have the same issue that just showed up this week.
My cloud function worked fine on node 8 until about 3 days ago when it started failing.
I'm also using typescript, so I'm sure I'm returning promises correctly (i've also checked and double checked to be sure), and I'm only querying/updating 46 documents - so it should be an easy operation.

Finally after finding this thread - I was able to fix it by updating to node 10.

Side note:
I was on node 10 about a month ago, but the console logs are so much harder to navigate (see this thread: https://twitter.com/JustinNoelDev/status/1207360381041156096)
I guess for now I'll keep it at node 10, and file a separate issue for that.

Update: fix for console.log objects showing up clean in node 10 here: https://github.com/firebase/firebase-functions/issues/612#issuecomment-646652675

I got an update from firebase support - the issue has been resolved again:

The last update from the team is that the issue in the node8 engine has been solved. In order to grab the fix, please re-deploy any previously affected Functions with the node8 engine.
Regarding the GCP status page [...] As this was not an outage, is not listed as such, but there was a message visible at the time I sent my previous response; it read something like "Recent deployments with node8 and python3 engines are returning 'connection error' when triggered with Firestore. This is being investigated and more information will be provided". It looks like, as this was not an outage, the message was just temporary.

Updating Node from 8 to 10 resolved this error for me.

I've been using cloud functions with node 8 a lot for the last year and a bit and never saw this issue. Over the last few weeks I've been seeing it constantly. We were planning to update to node 10 but it sucks to have our cloud functions suddenly failing due to an internal Google issue? Anyone have tips other than update to node 10 asap?

Since yesterday I have been struggling with this problem. It occurs randomly and very often, for Node 8 functions written in TypeScript.

I have been seeing this issue recently as well. Guess they're trying to force me to update to Node 10.

This error was not related to the Node 8 deprecation. But it is worth mentioning that Node 8 has been completely sunset and is no longer supported.

What happens if a Node.js 8 function is left running after the removal of Node.js 8 support? Will it work indefinitely?

Why is Node 8 being deprecated?

How do I upgrade to Node 10?

Firebase Functions billing FAQ
