First: is this still the best place to submit feature requests for the Firebase CLI? The issue template told me otherwise :D I could submit this via GCP support if that's better.
Deployments with many functions tend to fail often; it's recommended to split them into batches, which we did, and it works better now.
However, sometimes one function in a batch fails to deploy, and the Firebase CLI tells me which one and even hints at the command I should run to redeploy only that function:
To try redeploying those functions, run:
firebase deploy --only functions:functionName
However, in CI systems this will not work. We would have to scrape the CLI output and extract this command.
A much better and easier option would be to tell the Firebase CLI to automatically retry those failed deployments.
I am thinking of firebase deploy --only functions:function1,functions:function2,functions:function3 --retry-failed
This would work like this:
i functions: preparing functions directory for uploading...
i functions: packaged functions (18.65 MB) for uploading
✔ functions: functions folder uploaded successfully
i functions: uploading functions in project: function1(us-central1), function2(us-central1), function3(us-central1)
i functions: updating Node.js 8 function function1(us-central1)...
i functions: updating Node.js 8 function function2(us-central1)...
i functions: updating Node.js 8 function function3(us-central1)...
✔ scheduler: all necessary APIs are enabled
✔ functions[function1(us-central1)]: Successful update operation.
✔ functions[function2(us-central1)]: Successful update operation.
⚠ functions[function3(us-central1)]: Deployment error.
Build failed: {"cacheStats": [{"status": "MISS", "hash": "askdjhaskjdhf", "type": "docker_layer_cache", "level": "global"}, {"status": "HIT", "hash": "adkhasdjhfi", "type": "docker_layer_cache", "level": "project"}]}
Functions deploy had errors with the following functions:
	function3
Automatically retrying deployment
i functions: preparing functions directory for uploading...
i functions: packaged functions (18.65 MB) for uploading
✔ functions: functions folder uploaded successfully
i functions: uploading functions in project: function3(us-central1)
✔ scheduler: all necessary APIs are enabled
✔ functions[function3(us-central1)]: Successful update operation.
✔ Deploy complete!
Of course, there would be a limit on the number of retries, and errors that have no chance of succeeding on retry should not be retried at all.
Maybe including an exponential backoff.
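To make the retry-limit-plus-backoff idea concrete, here is a minimal sketch of a capped exponential backoff schedule. The base delay and cap values are illustrative assumptions, not anything the Firebase CLI defines.

```python
def backoff_schedule(max_retries: int, base: float = 5.0, cap: float = 60.0) -> list[float]:
    """Delays in seconds to wait before each retry attempt.

    Doubles the delay on every attempt (capped exponential backoff).
    base=5.0 and cap=60.0 are example values, not CLI defaults.
    """
    return [min(base * 2 ** attempt, cap) for attempt in range(max_retries)]
```

For example, `backoff_schedule(5)` yields delays of 5, 10, 20, 40, and 60 seconds before the five retries.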
This would improve the deployment process: we as developers would not be forced to build retries of whole batches into our CI/CD pipelines to improve stability, and only the deployments that actually failed would be retried.
I've seen @laurenzlong's comments in issue #728 about using partial deploys, and I get that; we already do this, but we still get errors sometimes, and we would like to make our deployments smarter and less resource-intensive.
I think this change would improve things already, as the Firebase CLI has the most information about the deployment and we should be able to use that to make deployments more reliable.
This is why I think a --retry-failed or a --retry flag would be really useful.
The flag itself is also very generic so it could be used for several other things as well.
There's already a PR for it, but unfortunately it seems not to be progressing: https://github.com/firebase/firebase-tools/pull/1977.
I don't really like the retry approach proposed in #1977. IMO it's an issue that needs to be fixed rather than adding features to compensate for it...
I agree. But we don't know when and if it's going to be fixed, so having a workaround would still be useful.
@tspoke There will always be quota limits. Don't you want failures due to quota limits to be retried automatically with exponential backoff, so you don't have to worry so much?
Related GCP issue: https://issuetracker.google.com/issues/154260223.
@IchordeDionysos I didn't say that the retry feature is useless; for quota limits it's fine. I was talking about the issue we have been facing recently, which is not a quota limit error; using a retry there is just a workaround that hides a problem :)
My issues are also not about quotas. I have a custom script that deploys functions in batches. A percentage of functions always fails to deploy, not due to quota issues, but due to multiple "random" failures: "cannot load source code xxx" etc. Retrying always helps. I reported these issues 1.5-2 years ago, still not fixed. At this point I'd be really happy with an auto-retry mechanism, as I don't believe the root cause will be fixed soon.
Until the core “Build failed” issue is fixed and #1977 is approved, someone suggests a temporary workaround in the form of a self-made script: https://issuetracker.google.com/issues/154260223#comment26
Yeah, I'll chime in with @merlinnot here; I have the same experience. I'm not sure how to solve it. It's not too bad for local deploys, but for CI I'd like it to wait with a much longer timeout or something.
Struggled for a couple of hours with this issue today. Was appalled to learn that Google did not let me deploy due to a payment issue: my project is on the Blaze plan and an auto debit had failed. Logging this here so any future readers facing the same issue keep this angle in mind as well. Google could do a better job of communicating this through their console, emails, etc. Wasted a lot of my time today.