React-native-code-push: CodePush.updateCheck returns 503

Created on 21 Sep 2020 · 64 Comments · Source: microsoft/react-native-code-push

Hi guys,
When I request the update check interface, it returns 503.
Who can help me?

availability bug investigating

All 64 comments

Same here

Service Unavailable

HTTP Error 503. The service is unavailable.

Same here. This is what I'm seeing on Sentry:

POST https://codepush.azurewebsites.net/reportStatus/deploy [503]
GET https://codepush.azurewebsites.net/updateCheck [503]

Same here.
502 and 503 errors.
Can someone look into this? It's affecting production apps.

Hello everybody, and sorry for the late response.

Recently we had an incident on the Azure side that led to 503 errors when requesting the CodePush server. Are you still experiencing this problem?

@ruslan-bikkinin It's still happening for us today (I have sent the curl call to anvesh kasani from the App Center support team).
See the screenshot below. This is the same XHR that the CodePush Cordova plugin sends, and it randomly gets back a 503.

image

Thanks for your comment. Looking further, it may be related not only to the underlying Azure issue but also to some of the backend services being recycled. Our team is actively looking for the root cause and possible mitigation steps.

We're getting this issue now across all our apps. It happens way too often. We may have to think about moving away from code push or investigate ways to host our own code-push server. It's very frustrating :(

Apologies for the inconvenience this issue is causing. Unfortunately, no root cause has been found yet, but we are continuing our investigation.

A couple of hours ago we deployed a change that may mitigate (though unfortunately not fully fix) the impact by reducing how frequently this happens. We need to gather some statistics before we can tell whether it helped.

For now, I recommend that you stay tuned to this issue. We will definitely post an update here when we expect full remediation, and we will ask you to help us confirm it.

@snowpardx absolutely. I'll let you know once it's resolved on my end. Are you guys seeing the issue on your end? And is it under investigation?

Yes, we are still investigating this. One change that may mitigate it was deployed recently, and we are also investigating the possible root cause (additional telemetry changes that could help us confirm one of the suspected causes should be ready for deployment, probably tomorrow CET).

Unfortunately, right now we don't understand why this is happening, so we have no ETA. Once we learn more, we will definitely come back to this issue.

@snowpardx we've been having issues with code-push throughout the day. We've checked recently and seems like all the issues are gone now. The code-push is now working as expected for us. We're happy that everything is resolved, would be good to know the root cause to avoid it in the future. Thanks!

@snowpardx and Andrei, we are still badly affected by this code push issue and can't send builds to our production application. Is there any workaround so that the builds can be sent?
Please suggest.

Service Unavailable

HTTP Error 503. The service is unavailable.

Unfortunately, we don't see any indication that the deployed change helped much. I can confirm that the error rate is about the same as before, so the issue persists for some portion of requests.
image
However, it appears for only a small portion of requests, so subsequent retries will probably succeed.

Right now we don't understand the root cause of the issue (it simply started happening at some point, without any directly related code changes). So there is no workaround available at the moment, other than retrying after some time.

Thanks for bearing with us while we investigate what's going on.
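
Since retrying is the only workaround mentioned so far, here is a minimal sketch of a bounded retry with exponential backoff and jitter. It is not part of react-native-code-push: `backoffCeilingMs`, `isTransientStatus` and `withRetry` are hypothetical names, and the 500 ms base, 30 s cap and 5 attempts are assumed defaults rather than values recommended by the CodePush team.

```typescript
// Hypothetical helpers for illustration; none of these names come from react-native-code-push.

/** Upper bound for the wait before retry number `attempt` (0-based): base * 2^attempt, capped. */
function backoffCeilingMs(attempt: number, baseMs: number = 500, capMs: number = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

/** Gateway-style statuses like the ones reported in this thread are worth retrying. */
function isTransientStatus(status: number): boolean {
  return status === 502 || status === 503 || status === 504;
}

/** Runs `operation` up to `maxAttempts` times, sleeping with full jitter between attempts. */
async function withRetry<T>(
  operation: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts: number = 5,
): Promise<T> {
  let lastError: unknown = new Error("withRetry: maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const isLastAttempt = attempt === maxAttempts - 1;
      if (isLastAttempt || !shouldRetry(error)) break;
      // Full jitter: wait a random time between 0 and the exponential ceiling.
      const waitMs = Math.random() * backoffCeilingMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
  throw lastError;
}
```

Capping the number of attempts matters: retries only help when failures are intermittent, and the jitter keeps many clients from retrying at the same instant against a service that is already struggling.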

Thank you @snowpardx , I understand, just hope the team is actively working on this one. Facing too much heat from customers about the failing code updates.
Will wait for an update from you. 👍

@snowpardx @VrajSolanki I can confirm that the issue is back this morning. It was working well last night, which makes me wonder whether it's simply a high-load issue.

Are you guys able to scale out code-push services?

At the moment it's an 80-90% failure rate.
Maybe try restarting the blades on Azure? If that fixes it, maybe set up Azure Autoheal to detect 503s and restart the instance?

We are running two apps in production. However, we see the issue in only one of them; the other works just fine. Our app depends heavily on the update check code, which makes the whole app unusable. We just deployed a new version of the application that can run despite the 503 error. Hope the issue gets fixed soon. Let me know if you need more details about packages or settings.

@tedkimzikto I am currently sending a code push release to check. The same issue still persists:

  1. The 503 error.
  2. The update check comes up too late for the app because the API is not responding.

I would say it has the same failure rate even now.

@VrajSolanki yes, it's not working at all. It's hurting our business badly, especially considering it's been down for a couple of days now. Customers are seeing an older app version because updates are not coming in...

Deploy a new version to the Play Store and App Store for now, and hope for the best :0

@tedkimzikto :) easy to say, we have about 100 apps that we'd need to deploy.

https://status.appcenter.ms went back to normal. Is it related to the issue?

@tedkimzikto it was green all day yesterday. It doesn't reflect the real status of the service as far as I can see

@tedkimzikto @andrei-m-code even if we push the app to the app store, it will take a couple of days for the apps to go live, and until then all the applications would be broken. Can we have a hotfix or workaround, so that we can ship a fix and then freeze our use of code push until it is fixed?
@tedkimzikto you mentioned that one of your apps is working; wouldn't the same setup solve it for the other one too?
@snowpardx I'd like to know your thoughts here.

@VrajSolanki to be honest, I don't think it is going to be fixed in a short time frame like a couple of days, given that we don't understand the underlying issue. So there is no ETA so far.

The interesting thing here is that with the URL you provided I'm able to reproduce the issue far more frequently than the "around 5% of total requests" we are seeing in the telemetry, while it doesn't reproduce at all for the app I created on my own.

@snowpardx can we get the details of your test app? Maybe we missed something in the setup on our end?
That could be a good starting point for us to try to fix it.

Same issue here. Testing using what @decadX suggested. Any progress from the App Center team?

@snowpardx The issue is clearly periodic. Right now code-push works very well for me. If the issues start to happen again tomorrow morning, it would support this theory. Therefore I believe the periodicity of the issue happening could only depend on the load, unless you guys do something funky with the services every day during business hours :) ?

Possible solution: scale the services out and see if it fixes the issue. As you're hosted on Azure, you must have AppInsights connected to this app service? What is memory and CPU use? Is the service running hot between 8am-6pm EST? I'm sure you guys have already checked it. Regardless I'd still try to scale the services and see if it fixes the issue.

We heavily rely on code-push and really need it to be fixed. I'm happy to jump on a call and show you what we experience or run some tests from my app as needed. Please feel free to reach out.

Hi,
Same issue here. All our production updatechecks are failing and no one is able to install the build.
Does anyone have any update on when the issue will be resolved?

@snowpardx
We rely heavily on code push and need this to be fixed. How can we help here?

Same issue here, but not for all of our apps; some apps are still working normally.

We found that the issue happens only with the old SDK, which uses the https://codepush.azurewebsites.net/ service. The new SDK (which uses https://codepush.appcenter.ms) has no issues, at least at the moment. The backend services themselves are healthy and able to process the traffic.

Right now we are looking into the possible root cause of https://codepush.azurewebsites.net/ being unable to work with the backend services, which unfortunately means that we still have no ETA for a fix.

For now, I advise those who are experiencing the issue and can deploy a new version to the store to switch to the most recent SDK, which works with the new gateway and API and is not affected.

That said, I would also like to clarify that this doesn't mean we are going to stop supporting previous SDKs right now (they will be retired at some point, as normally happens while software evolves). Our plan now is to get https://codepush.azurewebsites.net/ back.

@snowpardx if we're using [email protected], does this mean we're on the latest SDK? If we are, we are continuing to experience issues. Specifically, we had about 21k+ errors yesterday and 39k+ errors today. Please let us know if there's anything we can do about this; we're losing a lot of users because of it.

@sanjaypojo thanks for the additional info. Starting with react-native-code-push v5.7.0, requests should go to another URL, which means both frontends are now affected 😞

hello,

I'm also seeing this for the new url as well, as you can see in the screenshot below:
image

libraries used:

"appcenter": "3.1.1",
"appcenter-analytics": "3.1.1",
"appcenter-crashes": "3.1.1",
"react-native-code-push": "6.3.0"

Note that this does not happen as often as the failures mentioned above, which target the older URL, but it still occurs about once in every 4-5 requests. Unfortunately, that severely impacts our users as well. I also think that there are certain periods when it occurs more often, probably when there is heavier traffic on the servers.

Also getting long running requests that take either more than a minute or end up being timed out:
image

@snowpardx we've implemented retries and we're getting users retrying the API up to 70 times and receiving 503 errors every time.

This is independent of users who timeout (we have a 30 second timeout on the request). There has to be something you can do about this outage? And since we're getting responses rapidly from MS, I'm assuming that this has to be logged somewhere server side? We're really struggling here and don't have a workaround right now.

The status page says that the issue is only with the legacy API. Could this be updated so that it's confirmed that the outage is on the latest API too? And please let me know if you want any specific details, we're happy to share and help you solve this problem ASAP.

https://status.appcenter.ms

Hey folks, any ETA on this one? Or any possible action on our side to help fixing this?

Hi folks. Sincerest apologies for the delay here. After a lot of investigation, I think we've narrowed down our issue to a recent infrastructure change. I believe this has been resolved by a revert of that change in the last hour or so. Please verify this is the case for you, and let us know if the problem persists.

Hallelujah! @jphenow @snowpardx thank you guys! It seems to be working well for us now. A couple questions:

  • Would this "legacy" endpoint be available? No plans to shut it down just yet?
  • Is there any SLA associated with code-push endpoints? Any service level difference between free and paid accounts?

Again, thanks for keeping us posted and fingers crossed it will all work well from now on :).

Seems to be working for us as well 😄

And great point @andrei-m-code. I'm also interested in hearing about this! It would give us a lot of confidence in using AppCenter as stability is critical for us:

  • Is there any SLA associated with code-push endpoints? Any service level difference between free and paid accounts?

Still getting error 503

Now still getting error 503

Yep, 503 again.
@snowpardx @jphenow

Hey folks, sorry about this. We're working on getting better alarms on this kind of thing since we've been investigating this. I'm tuning some things that should resolve the worst of this right now, though we're monitoring to be sure of that.

One thing I meant to note earlier is if folks reach out via support ("?" in top right of appcenter.ms, then "contact support") we tend to be more able to triage and find a pattern of issues on our end.

To your questions earlier:

Would this "legacy" endpoint be available? No plans to shut it down just yet?

I'm not aware of any near-term plans to shut this down, no. I'm working to double check that though. If there are plans while this conversation continues, I will correct this.

It being a legacy API does mean we tend to have less attention on it, and sadly tends to take us more time to investigate issues like this. I would really encourage folks to update to newer APIs whenever possible.

Is there any SLA associated with code-push endpoints? Any service level difference between free and paid accounts?

We don't have a paid plan for CodePush. We do have SLOs (Objectives) which is a general aim for 99.9%.
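
To put the 99.9% objective in perspective, the following sketch converts an availability target into an error budget. The function names are hypothetical, the 30-day window is an assumption (the thread does not state the measurement window), and the 5% figure is the request failure rate quoted from telemetry earlier in this thread.

```typescript
/** Minutes of full unavailability that an availability objective tolerates over a window. */
function allowedDowntimeMinutes(sloPercent: number, windowDays: number): number {
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - sloPercent / 100);
}

/** How many times faster than the objective allows the error budget is being consumed. */
function errorBudgetBurnRate(observedErrorRate: number, sloPercent: number): number {
  const allowedErrorRate = 1 - sloPercent / 100;
  return observedErrorRate / allowedErrorRate;
}

// Assumed 30-day window: 99.9% leaves roughly 43 minutes of downtime per month.
console.log(allowedDowntimeMinutes(99.9, 30).toFixed(1));
// A sustained 5% request failure rate consumes that budget about 50 times too fast.
console.log(errorBudgetBurnRate(0.05, 99.9).toFixed(0));
```

An objective is a target rather than a contractual guarantee, so these numbers describe what the service aims for, not what it owes.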

Which API version can I use to solve this problem? I'm using react-native-code-push version 5.4.0.

@tackanoway35 you may try the latest plugin. Starting with v5.7.0, requests go to a different URL.
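
As a quick way to tell which side of that boundary an installed plugin version falls on, here is a small sketch. `parseVersion` and `usesNewEndpoint` are hypothetical helpers, and the 5.7.0 threshold is taken only from the comments in this thread; note that the newer URL was also reported to return 503s at times.

```typescript
/** Parses "5.4.0" or "v6.4.0" into [major, minor, patch]; pre-release suffixes are ignored. */
function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version.trim());
  if (!match) throw new Error("Unrecognized version: " + version);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

/** True when `installed` is at or above the version that (per this thread) switched URLs. */
function usesNewEndpoint(installed: string, threshold: string = "5.7.0"): boolean {
  const [major, minor, patch] = parseVersion(installed);
  const [tMajor, tMinor, tPatch] = parseVersion(threshold);
  if (major !== tMajor) return major > tMajor;
  if (minor !== tMinor) return minor > tMinor;
  return patch >= tPatch;
}

// Versions mentioned by commenters in this thread.
console.log(usesNewEndpoint("5.4.0"));
console.log(usesNewEndpoint("6.3.0"));
```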

Thanks!
I will try this

We're still continuing to experience about 3k+ errors per day :(

Any news on this issue?
I'd be happy to update to the latest version but we need the legacy/older version to work, to propagate the update 😄

OK, I've re-deployed a new CodePush release and it seems that it's working fine so far. Fingers crossed!

Thank you folks for your help!!

We continued to have 15k+ errors last week. @jphenow @snowpardx would you know why this is happening?
Specifically:

  • codePush.checkForUpdate has errored out 10k+ times
  • codePush.checkForUpdate has timed out 7k+ times. We're using a 10 second timeout to speed up boot times since this function is failing so often.

Would really appreciate help here. We've been struggling with this issue for over a month now and really want to rely on CodePush but this makes it extremely hard.
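
A timeout like the 10 second one described above can be sketched as a wrapper that resolves to a fallback value when the update check is slow or fails, so that app startup is never blocked. `withTimeout` is a hypothetical helper, not part of react-native-code-push, and treating errors the same as "no update" is an assumption about the desired behavior.

```typescript
/**
 * Resolves with the result of `work`, or with `fallback` if `work` rejects
 * or takes longer than `ms` milliseconds. The underlying request is not cancelled.
 */
function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        // Treat a failed check (for example a 503) as "no update available".
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}

// Possible use in an app (assumes react-native-code-push is imported as codePush):
//   const update = await withTimeout(codePush.checkForUpdate(), 10000, null);
//   if (update) { /* download and install */ }
```

Because the fallback hides the error, it is worth logging the rejection separately if failure counts like the ones above need to be tracked.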

I recently stepped away from Microsoft so sadly I don’t have information for you really. To my knowledge, there was still work being done to resolve this. Perhaps @snowpardx can update the thread. If you have open support conversations with the App Center team that will likely be the best way to get up-to-date information about the situation.

@sanjaypojo @jphenow sorry, I haven't worked for Microsoft for some time either, so I don't have exact information on what's going on right now. From my somewhat outdated knowledge, the team is looking for a way to resolve the issue, but that may not be possible for the legacy endpoint (where the issue was caused by a significant increase in traffic last month).

So I personally highly recommend updating to the new SDK, where the backend architecture allows better scaling.

For further updates, please reach out to @alexandergoncharov

We are also seeing SSL or connectivity issues. On Firefox/Chrome, trying to open https://codepush.azurewebsites.net/updateCheck results in a PR_CONNECT_RESET_ERROR. To replicate this, you need to try a few times (maybe open and close an incognito window to avoid reusing an established connection).

It's no longer a 503 error.

Same error with curl

* About to connect() to codepush.azurewebsites.net port 443 (#0)
*   Trying 40.112.243.12...
* Connected to codepush.azurewebsites.net (40.112.243.12) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* NSS error -5961 (PR_CONNECT_RESET_ERROR)
* TCP connection reset by peer
* Closing connection 0
curl: (35) TCP connection reset by peer

Thanks for writing in @jphenow / @snowpardx, didn't realize you were no longer with Microsoft!

@alexandergoncharov could you please help with this? It's a huge problem for us right now.

We continued to have 15k+ errors last week. Would you know why this is happening?
Specifically:

  • codePush.checkForUpdate has errored out 10k+ times
  • codePush.checkForUpdate has timed out 7k+ times. We're using a 10 second timeout to speed up boot times since this function is failing so often.

Would really appreciate help here. We've been struggling with this issue for over a month now and really want to rely on CodePush but this makes it extremely hard.

I'm currently using react-native-code-push 5.6.1: I'm wondering if updating to the latest version (v6.4.0) would require some changes to our codebase?

Hi all,

It looks like the error rate has dropped significantly. Could you please let us know whether you still see 503 errors or whether the issue has been resolved for you?

We're seeing a few random 503 from time to time, but updates are now mostly
visible and downloadable in our app.
Thanks a lot folks!

On Thu, 19 Nov 2020 at 10:51, Alexander Goncharov notifications@github.com wrote:

Hi all,

It looks like the error rate has dropped significantly. Could you please let us know whether you still see 503 errors or whether the issue has been resolved for you?


OK, my bad, just noticed this afternoon that the API replies are very random today again. I was using the live version of our app and had our "Update Box" appear very randomly... Still some issues with the servers I guess!

Hi @alexandergoncharov it wasn't resolved for us and we saw about 1000 errors this morning. There seems to be no reduction compared to the previous trend for us. Is it possible that these errors occur in some specific geographies?

Hi @sanjaypojo,

That's a good question. It's possible, but I'm not sure at the moment. Could you please share your current status?

@alexandergoncharov we continue to see 25k+ errors per week, with no change in the trend. We've had this issue for over 3 months now. Is there any ongoing work to fix this? If not, we are forced to try and self-host our CodePush bundle due to this issue.
