firebase-tools 3.18.6
(I have seen this issue for as long as I can remember -- since I started using Firebase about a year ago)
Windows
I do not have exact steps to reproduce this issue, but it seems related to deploying "too many" functions at a time.
I have two separate projects, one with 6 functions and the other with about 30. When deploying either project, at least two functions will usually fail with the error `Server Error. connect ETIMEDOUT <ip>:443`
All functions deploy correctly
OR
Failed functions retry until they deploy correctly
At least two functions fail, seemingly at random, with the error `Server Error. connect ETIMEDOUT <ip>:443`. It isn't always the same functions that fail; often it's different ones.
Result of `firebase deploy --only functions --debug` from the 30-function project:
https://gist.github.com/SidneyNemzer/622243b987cc3529fcdfc0d835d7f3cf
I found two other issues with the same error message, but they're related to deploying from behind a proxy, which I am not doing.
Thanks for filing! You're right that the error is most likely caused by deploying too many functions at once, which is why we recommend deploying individual functions or sub-groups of functions rather than all of a project's functions at once. Learn more at: https://firebase.google.com/docs/cli/#partial_deploys
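For example, to deploy only a subset of your functions (the function names below are placeholders):

```sh
# Deploy two specific functions instead of the whole project
firebase deploy --only functions:myFuncA,functions:myFuncB
```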
Thanks as well for the comment about the logs containing an access token -- that token is actually used by the Firebase CLI for all projects, so it isn't unique to you or your project and isn't really secret after all.
I appreciate the response. So far, my workaround has been manually batch-deploying a few functions at a time. But since the solution appears to be simply retrying, could the CLI automatically retry the failed functions? I've seen past references to the CLI retrying failures, but I'm not sure what's currently implemented. (I do know that for this specific error, the CLI reports a failure even though some of the functions actually deployed!)
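Concretely, the workaround looks something like this (the function names are placeholders for my real ones):

```sh
# Deploy the project's functions in small batches so fewer
# deployments are in flight at once.
firebase deploy --only functions:funcA,functions:funcB,functions:funcC
firebase deploy --only functions:funcD,functions:funcE,functions:funcF
# ...repeat for the remaining functions
```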
And thanks for the clarification on the access token; I've removed that note from the original issue :)
It actually already retries once for each function; otherwise the failure rate would be way higher ;)
Really!? Wow, that's good to know. Why exactly does deploying so many functions at a time cause a problem? Are function deployments throttled on Google Cloud's end?
Anyway, could the CLI keep retrying failed functions? If it doesn't, that's what has to happen manually anyway.
@SidneyNemzer There is a daily quota for deployment as an anti-abuse measure. So having the Firebase CLI retry indefinitely would blow through that quota. You can learn more about quotas at: https://cloud.google.com/functions/quotas
That page is very helpful, thank you. It makes sense that deploys are throttled, although the CLI error doesn't make it clear that a limit was hit rather than a random connection issue (which would be safe to retry).
I've found partial deploys really useful for speeding things up and for incremental updates.
Sometimes, though, we're setting up a new coworker who needs to deploy all of the functions, and fairly often one or two will time out. We sometimes get timeouts in our CI workflows as well.
It would be nice if the tools were a bit more reliable in these scenarios. If it's a quota issue, perhaps the CLI could trade off some deploy speed for more reliability. The normal scenario after a failure is that people attempt to deploy again right away; throttling the deploy so they don't have to reattempt would still be faster in that situation.
Because this issue is largely caused by backend / quota management problems, I'm going to close it on this repo -- there's no real action to take in the CLI, since retries would only exacerbate the problem in some cases. We are tracking this kind of thing internally and hope to make infrastructure improvements to address it in the future.
Actually, as @Crazometer was suggesting, I opened this issue because I was hoping the CLI could be a bit smarter about managing the quota, even if that meant longer deploys. For now I can manually retry failures.
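In case it helps anyone else, here's a rough sketch of that manual retry as a shell script (the attempt count and delay are arbitrary choices):

```sh
#!/usr/bin/env bash
# Re-run the deploy until it succeeds, up to 5 attempts.
# Note: this redeploys everything, so it eats into the daily quota.
for attempt in 1 2 3 4 5; do
  firebase deploy --only functions && break
  echo "Deploy attempt $attempt failed; retrying in 30 seconds..."
  sleep 30
done
```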
I just ran into this issue deploying a _single_ function. I had to redeploy like 6 times before it was successful. Annoying.