Cwa-app-android: 9002: Timeout waiting for 180000 ms

Created on 20 Sep 2020  ·  76Comments  ·  Source: corona-warn-app/cwa-app-android

Avoid duplicates

  • [x] Bug is not mentioned in the FAQ
  • [x] Bug is specific for Android only, for general issues / questions that apply to iOS and Android please raise them in the documentation repository
  • [x] Bug is not already reported in another issue

Describe the bug

Timeout bei Prüfung und daher keine aktuelle Risiko-Ermittlung

Ich habe ein ziemlich altes und etwas langsames Handy (Fairphone 2 mit Android 7) und ich habe die App da letzte Woche installiert und sie hat die ersten paar Tage auch gut funktioniert.
Nach fast einer Woche dauert die Risiko-Überprüfung allerdings so lange, dass sie abbricht.
Wenn ich die App öffne, sehe ich den "Prüfen"-Bildschirm. Dort steht, dass der Vorgang mehrere Minuten dauern kann.

Nach ca. einer Minute bricht der Vorgang aber ab mit der folgenden Fehlermeldung:

photo_2020-09-20_11-36-17
photo_2020-09-20_11-36-18

Einen Tag später war der Fehler immer noch da (er ist vollständig reproduzierbar bei mir) und die Risikoeinschätzung ist von grün auf unbekannt gesprungen:

photo_2020-09-20_11-36-19

Expected behaviour

Dass die Überprüfung lange dauert, ist wohl leider so. Mein Gerät ist halt langsam.
Aber ich würde mir wünschen, dass sie funktioniert.

Steps to reproduce the issue

Ich weiß nicht, ob sich das einfach reproduzieren lässt. Vielleicht:

  1. Die App auf einem alten Handy installieren.
  2. In einen Supermarkt gehen.
  3. Status sollte grün sein.
  4. An zwei Tagen in Folge durch einen vollen Hauptbahnhof laufen.
  5. Prüfung läuft in einen Timeout.

Technical details

  • Mobile device: Fairphone 2
  • Android version: Android 7.1.2
  • App version: 1.3.0

Possible Fix

Den Timeout verlängerne – vielleicht auf so zehn Minuten?

Additional context

Das könnte das gleiche Problem sein wie #809, aber das war mit einer älteren Version der App.

Ich benutze Fairphone Open OS mit microG statt Fairphone OS mit den originalen Google Apps.


Internal Tracking ID: EXPOSUREAPP-2863
Internal Tracking ID: EXPOSUREAPP-2893

bug hot topic 🔥 mirrored-to-jira

Most helpful comment

So on a 100 Mbps connection downloading took a significant portion of the 60s timeout even though packages were quite a bit smaller back then… so maybe it's not the internet connection speed from the wall which is limiting but something else in the pipeline? I have no clue tbh shrug.

Connection speed can only be as fast as the device can write away the data.
Storage I/O is one of the biggest bottlenecks on Android devices, especially on budget devices.
Even worse on budget devices that can replace the smaller (but faster) internal storage, with a bigger (but slower) external sdcard.
Lot's of manufacturers skimp on good hardware there as it can't be used as effectively as screen specs or "the number of cores" when advertising a device :wink:.
It's often not even the raw file to storage speed, but parallel storage access that slows everything down to a crawl.

All 76 comments

Hi @YtvwlD , es gibt schon eine Code-Änderung, die den Timeout von 1 Minute auf 3 Minuten verlängert #1138 . Leider kommt die Änderung erst mit der App-Version 1.4.0, es wird also noch eine Weile dauern.
Halte uns hier doch bitte auf dem Laufenden, ob sich nach dem Update auf die Version 1.4.0 das Problem bei dir erledigt. Vielen Dank und Grüße

Auf den Screenshots ist der Flugmodus aktiviert. Könnte es daran liegen?
Ich vermute, dass dann die Kommunikation mit dem Server nicht klappt.
Dass auch Bluetooth deaktiviert ist, sollte an dieser Stelle nicht stören.

Nee, das hat damit nichts zu tun. Ich hab auch schon probiert, den aus zu machen. Und WLAN ist ja da.

Hey @YtvwlD ,

have you had a look at https://github.com/corona-warn-app/cwa-app-android/issues/998? The error message is slightly different, but maybe the issues are related. Other have posted workarounds there that might work for your problem too.

As @vaubaehn already pointed out, the timeout problem is addressed in #1138 and should hopefully be fixed in a future update.


Corona-Warn-App Open Source Team

Hey @heinezen,
yes, I've looked at this issue. But I don't think they are related. The other issue is about failing to download the data in time. I don't have a firewall or something like that and I tried it on multiple occasions and both on mobile data and on wifi. Also, it worked in the past.

@YtvwlD could you share your ENF log? I'd like to see whether the keys for today have been downloaded successfully.

Do you also experience "CAUSE: 3 Something went wrong. ApiException: 39508" like the user here when you open CWA afterwards?

@daimpi How do I get the ENF log? Is that just inside Android's normal logcat?

I can't remember this happening before but after opening the app today I got "URSACHE 3: Etwas ist schiefgelaufen. Fehler bei der Kommunikation mit der Google-Schnittstelle (8)" (after clicking on "details" the stacktrace begins with "com.google.android.gms.common.api.ApiException: 8")

(At this point it might be relevant to remind you that I'm using microG instead of Google's com.google.android.gms.)

@YtvwlD

(At this point it might be relevant to remind you that I'm using microG instead of Google's com.google.android.gms.)

That's a good point. Usually the ENF log can be found under Settings -> Google -> Covid-19 Notifications -> Click on "checks performed in the last 14 days" -> burger menu in the upper right corner -> export checks.

You can upload it here by either packing the file into a .zip archive or renaming the ending from .json to .txt.

I can't remember this happening before but after opening the app today I got "URSACHE 3: Etwas ist schiefgelaufen.

Interesting, thanks for the info 🙂.
You might want to check out my and @vaubaehn's comments here and here for further details.
(@heinezen you might also wana check out those comments as they're directly related to the reason why the "9002: Timeout waiting for 60000 ms" error is happening)

After we Figure out that my phone has the same Problem, what can I do? Just wait until CWA 1.4 is available?
Shall I try to empty the RAM (daily use is 627 MB of 940 MB)? Could this help to speed up the performance?

@dukleinekkmaus that's a good question. I'm not entirely sure what can be done from the user side at this point, b/c in your test you already provided basically optimal conditions for CWA but it failed nevertheless.
Maybe @vaubaehn & @kbobrowski have an idea?

One thing the CWA-Team could do to spot such problems in the future, is to add low power devices like your Blackview A5 to their Telekom testing center.

Not sure what can be done. @YtvwlD regarding "ApiException: 8" maybe you can ask at https://github.com/microg/android_packages_apps_GmsCore, since you are using alternative ENF implementation this error may have different meaning than in the original Google implementation, maybe microG developer can say something more about it

Looks like this error is different from #998 then.

@YtvwlD @dukleinekkmaus

I've mirrored your report to an internal Jira ticket (EXPOSUREAPP-2863), so the developers will take care of it now. If new information on the bug is available, it should get posted here.

Regards,
CH


Corona-Warn-App Open Source Team

Hi @YtvwlD , es gibt schon eine Code-Änderung, die den Timeout von 1 Minute auf 3 Minuten verlängert #1138 . Leider kommt die Änderung erst mit der App-Version 1.4.0, es wird also noch eine Weile dauern.
Halte uns hier doch bitte auf dem Laufenden, ob sich nach dem Update auf die Version 1.4.0 das Problem bei dir erledigt. Vielen Dank und Grüße

Gibt es die Möglichkeit, diesen Patch auf 1.3.1 zurückzuportieren, damit er schneller ausgerollt wird? Ich habe bei meinem (ebenfalls langsamen und alten) Telefon dasselbe Problem.

Gibt es die Möglichkeit, diesen Patch auf 1.3.1 zurückzuportieren, damit er schneller ausgerollt wird? Ich habe bei meinem (ebenfalls langsamen und alten) Telefon dasselbe Problem.

Ich befürchte dafür ist es mittlerweile zu spät…
1.3.1 wurde bereits gestern released: https://github.com/corona-warn-app/cwa-app-android/releases/tag/1.3.1

Ah schade, eine Zeitmaschine wollte ich jetzt wirklich nicht verlangen. ;)

There is a separate Topic on our end regarding the 9002 Timeout Issue. I will bring this up in our developer meeting today again and ask for an update. I will keep you updated.

These are the currently released versions:
1.3.1 Android
1.3.2 iOS

1.4.0 will hopefully be released within the next week. Unfortunately, I can not give exact dates for the release.

The developers confirmed that the timeout will be increased with release 1.4 and they will continue working on the timeouts in the following release 1.5.

This is the (kind of) corresponding issue in the MicroG repository:

https://github.com/microg/android_packages_apps_GmsCore/issues/1171

Not sure what can be done. @YtvwlD regarding "ApiException: 8" maybe you can ask at https://github.com/microg/android_packages_apps_GmsCore, since you are using alternative ENF implementation this error may have different meaning than in the original Google implementation, maybe microG developer can say something more about it

According to the MicroG developer, this error code is most likely emitted by the client library, thus it shouldn't matter which service is used. 8 means INTERNAL_ERROR.

The Update solve the Problem, but it takes more then 2 minutes to check the risk. I am afraid to run into the timeout again if there is more to compare ...

Android 6
LG G4C Smartphone.
Last update of Corona data was Okt. 13th

The error was with Version 1.3.x

Ursache: 9002 
Etwas ist schief gelaufen
Timed out waiting for 60000 ms

Now with Version 1.5:
After the update the app started from the scratch with asking all the questions for a new installation
The Error is now:

Ursache: 9002 
Etwas ist schief gelaufen
timeout

Screenshot_2020-10-20-08-25-40

I tried to activate all previously deactivated google apps - no change

@etron770 The update should have mitigated the issue, I am sorry to hear that the issue still persists for you.

I have collected your feedback and sent it to the developers.

Additionally:
doesn't matter the quality of the Wlan or LTE data connection. Because there is a poor connection rate at the screen print.

@svengabr
As I have a second unused G4C i did
a factory reset
update all apps and system
installed the corona warn app

same error

feel free to ask for more details, I am using Linux and have the Android debug bridge installed

Hi @etron770 and @svengabr

The error was with Version 1.3.x

Ursache: 9002 
Etwas ist schief gelaufen
Timed out waiting for 60000 ms

Now with Version 1.5:
After the update the app started from the scratch with asking all the questions for a new installation
The Error is now:

Ursache: 9002 
Etwas ist schief gelaufen
timeout

They are two different errors with different root causes. First error indicates, that exposure checking takes too long. The second error indicates, that there is a problem with the internet connection.

Additionally:
doesn't matter the quality of the Wlan or LTE data connection. Because there is a poor connection rate at the screen print.

Related to the stacktrace and having bad wifi AND having bad LTE, I would assume that the phone jumped between wifi and LTE while downloading the key files for exposure checking from the backend server. Nevertheless I'm surprised, because a) the timeout for the transaction to download files should be long enough, and b) even after a factory reset (and likely changed network conditions?) the error is reproducable.

@svengabr : is it possible to cut out the above comments from @etron770 and move it to issue #998 where it belongs to?

@dukleinekkmaus thanks for the update 🙂.

The Update solve the Problem, but it takes more then 2 minutes to check the risk. I am afraid to run into the timeout again if there is more to compare ...

That's good to know. One thing which was also changed is that the padding multiplier on the server side has also been dropped to 1, i.e. no unnecessary fake keys anymore 🎉. This will help with decreasing the load on the whole infrastructure (from smartphones over networks to servers), so the number of keys will go down quite significantly (by a factor of 5) but otoh we have federation roaming now which might increase the load again.

There will be an even further extension of this timeout to 8 min coming with CWA 1.6 though 🙂.

@ daimpi and @svengabr

Related to the stacktrace and having bad wifi AND having bad LTE,...I

The wlan and/or LTE qualitiy does not change anything.
At the test cellphone (factory reset-updates- installed corona app) is not sim card therefor not LTE availabel, and I tested it nearby the router

Lan:  Fiber to the Home  100Mb/25Mb
Fritzbox 7490
Wlan: 
Band 2,4 GHz
Download rate  ↓72   
upload rate: ↑56  
proberties: n/Wi-Fi 4, 20 MHz, WPA2, 1 x 1, 11k, 11v

The same error

Hi @d4rken ,
just to fill up some informational gap for me:
https://github.com/corona-warn-app/cwa-app-android/blob/0f12f4ddbd814ac368901858363f7835bb30b9a8/Corona-Warn-App/src/main/java/de/rki/coronawarnapp/transaction/RetrieveDiagnosisKeysTransaction.kt#L311-L324

At what point in time success is returned by ENF API? Immediately after calling/handing over keyFiles, configuration and token? Or after ENF finished exposure checking? (I'd expect the first, as ENF broadcasts intents when finished, but I'm not sure about it)

Thanks in advance!

Immediately after calling/handing over keyFiles, configuration and token?

Yes, shortly after. Handing over the files is quick (few seconds on most devices). The calculation inside the ENF happens asynchronously, so does not count towards the timeout as our transaction ends after handing over the files. When the ENF is done we are notified via broadcast and start a new transaction with its own timeout for the risk calculation.

Just another hint:
it seems that there is only a lack of data.
The test cellphone, switched off after installing and testing is still having this error.
The cellphone in use is working fine after one day.
The test cellphone is still showing the grey information field "Unbekanntes Risiko .... " and after "Prüfung läuft" the error will be schown
The cellphone in use ist showing the green information message "Niedriges Risiko .... 2 der letzten 14 Tage aktiv ...."

But ... after reading the error the user is assuming that the app is not working. Maybe he will remove the app.
That seems to be real problem during the first period of collecting data.

@d4rken

Yes, shortly after. Handing over the files is quick (few seconds on most devices). The calculation inside the ENF happens asynchronously, so does not count towards the timeout as our transaction ends after handing over the files. When the ENF is done we are notified via broadcast and start a new transaction with its own timeout for the risk calculation.

Thanks for this explanation. This is quite interesting b/c it kind of speaks against my theory that the process of expansion of DKs and matching them with RPIs is what is responsible for reaching this timeout on lower powered devices.

But if the matching and DK expansion isn't actually something which counts towards the timeout limit, what is causing the timeouts particularly on low powered devices like @dukleinekkmaus's ?

Hi @daimpi , if it's ok, I'll jump in for @d4rken to answer.

Thanks for this explanation. This is quite interesting b/c it kind of speaks against my theory that the process of expansion of DKs and matching them with RPIs is what is responsible for reaching this timeout on lower powered devices.

But if the matching and DK expansion isn't actually something which counts towards the timeout limit, what is causing the timeouts particularly on low powered devices like @dukleinekkmaus's ?

Actually, your theory is not wrong, and it _was_ contributing towards the timeouts, but in an indirect way.
Do you remember the kindergardener? The problem before CWA 1.5 was, that CWA was kind of DoS'ing the ENF (how @kbobrowski described it elsewhere) by trying to provide 14 keyfiles of DKs in 14 batches. After ENF received one, it already started its process of DK expansion and matching, which took quite much processor time. With the second call to the API it took more time, until ENF could respond. With every API call the ENF got more unresponsive, ending up in the timeout.

Now CWA 1.5 hands out a list of all DK files in just one single call, and it's up to the ENF to work down on the list (expansion of DKs and matching) 'in the background'. The ENF can now respond quite fast to CWA, that it had taken over the keyfiles list, before the timeout is reached. After ENF returned 'success' to CWA on taking over all files for matching, the matching process is starting. You can easily verify it, when comparing the 'update times' in CWA and ENF history: CWA's timestamp is set when the keyfiles had been handed over to ENF. ENF's timestamp is set when matching finished. On my slow phone, difference is now 2 minutes.
Did this clarify up your questions?

(@d4rken , please correct if I'm wrong - and thanks for answering to my question above!)

@vaubaehn ah yes, that explains it, thank you 😊.

One thing I'm still wondering about: in which cases will a new check be triggered by CWA? B/c if we look at this log, there were 20 checks performed even with CWA 1.5 and even on the log the day later (where the person didn't experience 39508) there were still two checks performed instead of one 🤔.

@daimpi

One thing I'm still wondering about: in which cases will a new check be triggered by CWA?

Afaik, checks are triggered (but not nescesseraily performed) when a) the main process of CWA is created, which happens after installation, update, reboot and opening CWA after force stop, b) a scheduled background job to check is due, which is 24 hours after the last successful check, or a certain time after the last unsuccessful check. A check should only be performed, if there was no successful check for the current day already. So, if you reboot your phone, and you already had a susccessful check that day, CWA won't do another then. But if there was no successful check, the reboot would trigger one (this is why we had also 9002: timed out while waiting 60000 ms in the past with less DKs, because the re-initialisation of Android on reboot consumes quite much processor time). If there is a scheduled job, it could be, that CWA can not perform it, especially when Andorid energy saving restrictions apply. E.g., if CWA is in doze mode, the scheduled job will be delayed until the device is not in doze mode anymore. You will wake up the phone from doze mode, when you plug in the power charge. But at that moment not only CWA's scheduled job is trying to run, but also scheduled jobs of other apps... All are fighting for processor time now at the same moment. This is, why even plugging your phone to the charge could have caused a (silent) time out in past versions of CWA.

But tbh, at the moment I don't have any clue, what had caused that log. Obviously CWA couldn't handover the keyfiles successfully, which is why I thought there might be an API 10 problem - 'no public key found'. Extremly wild guessing: Maybe there was a problem with the (cached) keyfiles, that they not had been stored properly before or got somehow corrupted? :thinking:

@daimpi
Ok, I re-checked the logs of that person again, and I think, a timeout is the most plausible cause.
And if you look to all past entries, something strange is going on with that phone:

  • Many days no check at all
  • Many days 20 checks (points to time out)
  • Some (pre-CWA1.5) days less than 14 checks, e.g. just 9 checks or just one check, no further try for that day -> ???
  • Post-CWA1.5 also 2 checks needed for getting a result -> ???
    Strange...

(@d4rken , please correct if I'm wrong - and thanks for answering to my question above!)

That sounds about right.

I've checked it the ENF submission time on a dev build (theoretically slower than prod) and it's about 600ms on a Moto G3 (LOS 7.1.2) for 15 EUR packages:

2020-11-05 16:07:31.960 32180-32197/de.rki.coronawarnapp.test D/RetrieveDiagnosisKeysTransaction: Attempting submission to ENF
2020-11-05 16:07:32.472 32180-32220/de.rki.coronawarnapp.test D/RetrieveDiagnosisKeysTransaction: Diagnosis Keys provided (success=true, token=39955808-a8b4-47d0-b9d4-db8ca93f7545)

Not really comparable with the prod environment, but as relative comparison, the ENF processed the test files in ~14 seconds:

2020-11-05 16:16:26.948 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: PrefixIdMap loaded 19 scan records
2020-11-05 16:16:26.948 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: matchingNative get 14 files
2020-11-05 16:16:26.948 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: Matching with /data/user/0/com.google.android.gms/app_en_diagnosis_keys/1604589386402/export0.bin
(...)
2020-11-05 16:16:40.171 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: Created iterator for /data/user/0/com.google.android.gms/app_en_diagnosis_keys/1604589386402/export13.bin (process_revised_keys = 1)
2020-11-05 16:16:40.208 13411-32255/com.google.android.gms.persistent I/ExposureNotificationJni: Matching done, total 139758 keys, no key matches

And ~550ms for 48 EUR hour packages, so it doesn't really go up with the amount of files we provide:

2020-11-05 16:12:48.596 600-904/de.rki.coronawarnapp.test D/RetrieveDiagnosisKeysTransaction: Attempting submission to ENF
2020-11-05 16:12:49.047 600-903/de.rki.coronawarnapp.test D/RetrieveDiagnosisKeysTransaction: Diagnosis Keys provided (success=true, token=5c4977e5-5611-4e8b-912e-8d2244fa7bb3)

As the ENF calculation does not happen synchronous and the submission is so quick, the timeout is most likely caused by the download, isn't it? So with the hotfix we increased the overall transaction timeout to 8 minutes, we can't go much higher because we only get a max of 10 minutes background processing time for our worker. So if the 8 minutes don't solve it for some devices, we could add some kind of mechanism that we do fewer downloads in parallel and also abort the downloads early enough such that we can continue with what we have :thinking:.

The 10 minutes should only count though if we are in the background, so maybe for problematic devices there could be a mechanism that removes the timeout when in foreground :thinking:.

Just another hint:
it seems that there is only a lack of data.
The test cellphone, switched off after installing and testing is still having this error.

And it is reproducable:
The app was working now at the test cellphone, reporting 2 days of collecting data.
to reproduce.

  1. remove the app
  2. install the app
  3. start the app
    the error is there again

With the update to 1.5.0 (and a microG update, also), the app has now been working for a few days for me now. And it also seems to have gotten much faster.

@vaubaehn just a small question:

You can easily verify it, when comparing the 'update times' in CWA and ENF history: CWA's timestamp is set when the keyfiles had been handed over to ENF. ENF's timestamp is set when matching finished. On my slow phone, difference is now 2 minutes.

Are you sure it's not the other way around?

My CWA status today:

My EN log entry today:
{"timestamp":"22. Oktober 2020, 03:02","keyCount":307304,"matchesCount":0,"appName":"Corona-Warn","hash":"oTbVuBbNBBnFzm25gcZH9lTOjOFRPEViypUZH7h9JzA="}

I.e. my CWA timestamp is one minute after my EN timestamp.

@daimpi

Hmm, that's interesting. Either you have been accelerated to more than speed of light while exposure matching was performed, or your phone's clock goes backwards in time for a different reason?
Here's mine:
Screenshot_2020-10-22-17-50-38
Screenshot_2020-10-22-17-50-23

@vaubaehn
Ah yes that was it! After dropping out of hyperspace time is correct now 😄.

Kidding aside: I just remembered that there was a GH issue on time discrepancy a while back and voila: https://github.com/corona-warn-app/cwa-app-android/issues/948#issuecomment-670088844

Maybe in my case it's a rounding error by ENF in the other direction? As long as the time difference doesn't exceed 1 min I won't be able to tell for sure I guess but it's also not that important tbh ^^

@d4rken thanks for providing this data.

Just for my understanding, when you say

I've checked it the ENF submission time on a dev build (theoretically slower than prod) and it's about 600ms on a Moto G3 (LOS 7.1.2) for 15 EUR packages:

you're using the CWA 1.5 approach which submits all DKs in one batch, right?

Regarding

As the ENF calculation does not happen synchronous and the submission is so quick, the timeout is most likely caused by the download, isn't it?

Does this mean it would maybe help to also increase the http timeout further, or is this something else? Iirc this timeout was increased from 10s to 20s in 206b181 (#1138) but maybe this could be increased further, would this maybe help?

you're using the CWA 1.5 approach which submits all DKs in one batch, right?

Yes.

Does this mean it would maybe help to also increase the http timeout further, or is this something else? Iirc this timeout was increased from 10s to 20s in 206b181 (#1138) but maybe this could be increased further, would this maybe help?

The timeout only comes into effect if the download actually times out. If we would run into that I'd say we wouldn't have an issue. If the download is not timing out though, so only very slow, then the task/transaction will just wait for the downloads to finish and if that takes 3 (or 8 minutes), eventually hit the top-level timeout and fail the whole thing.

@d4rken thanks for the explanation 🙂

So regarding the downloads: I tried to find a really big package and the one timestamped 2020-10-16 with almost 44k DKs was the largest I could find. But even this package only weights 750kB, so if we round this up to 1MB for overhead and multiply by 14, we only have 14MB. Even with a slow 1 Mbit/s connection this would take less than 2min to download and those numbers are already chosen very generously…

Are there really connections that are that much slower that would need more than 3 min?

But otoh we have this conversation between @kbobrowski and me from back in August:

One possibly important thing here is that CWA already operates with very thin timeout margin - without introducing any artificial delay and on the fast wifi network (100 Mbps) the whole process of fetching 14 diagnosis keys and submitting all of them to the API takes about 45 seconds on a slow Android 6 device, which is 75% of available timeout.

Wow… that's much longer than I'd have thought this process takes… I've always assumed that this is done in a few seconds max. I guess the limiting factor in this case is not the bandwidth but the (cryptography-) performance of the SoC, right?

What would speak for this hypothesis is the fact that the Galaxy A6 mentioned above is certainly not the most potent device… On which device did you observe the 45s sync time?

To which @kbobrowski replied:

@daimpi I guess slow device (Motorola G2) plays a role here, but it seems that just downloading keys takes significant portion of this timeout limit (if it has to download all 14 of them).

Similar issue reported in #1030 , if it is the same pattern as in my case (all 14 keys successfully submitted, but then CWA retried the same day), then I think timeout hypothesis would be less likely

So on a 100 Mbps connection downloading took a significant portion of the 60s timeout even though packages were quite a bit smaller back then… so maybe it's not the internet connection speed from the wall which is limiting but something else in the pipeline? I have no clue tbh 🤷.

So on a 100 Mbps connection downloading took a significant portion of the 60s timeout even though packages were quite a bit smaller back then… so maybe it's not the internet connection speed from the wall which is limiting but something else in the pipeline? I have no clue tbh shrug.

Connection speed can only be as fast as the device can write away the data.
Storage I/O is one of the biggest bottlenecks on Android devices, especially on budget devices.
Even worse on budget devices that can replace the smaller (but faster) internal storage, with a bigger (but slower) external sdcard.
Lot's of manufacturers skimp on good hardware there as it can't be used as effectively as screen specs or "the number of cores" when advertising a device :wink:.
It's often not even the raw file to storage speed, but parallel storage access that slows everything down to a crawl.

@d4rken I see, thanks.

It's often not even the raw file to storage speed, but parallel storage access that slows everything down to a crawl.

But we're talking about less than 14 MB here, can storage really become so slow that this could lead to a timeout? Oo

It's not even only download + write to storage, but also provideDiagnosisKeys call (which is extracting files, verifying signatures and headers and copying them to internal GMS database), not sure how long each of these steps take though

Do we have some way of benchmarking this, to narrow down the space of possibilities?

It would be possible to benchmark this with https://github.com/kbobrowski/en-i13n

For us could be possible, if devs allow timber to log also in production (even a lot of puppies will die then).

Today I run into the timeout....
Edit:
Here are the Screenshots and the exposure file
all-exposure-checks.txt
Screenshot_20201024-083756
Screenshot_20201024-083801
Today it was the first day that I open the APP later then 7 o'clock (the And I think I enable everything for the hintergrundaktualisierung but for the check I always has to open the APP and do it manually.

Edit 2:
Ehen I now Start the risk check, I run Into the 38905 error.

Maybe the title of this issue could be changed to accommodate the new timeout limit of 180s such that ppl with this error message can find it more easily?

One new report of this issue (under CWA 1.5) has recently been filed here: https://github.com/corona-warn-app/cwa-app-android/issues/998#issuecomment-715775412

@daimpi I've changed the title to reflect the increased timeout limits.


Corona-Warn-App Open Source Team

I don't und erstand it. Today there was no Problem with the risk check. Tomorrow I purpose to open the APP later and check, if the risk check run into the timeout again.
Uploading all-exposure-checks2610.txt…

@maethes does the problem reported in https://github.com/corona-warn-app/cwa-app-android/issues/998#issuecomment-715775412 still happen for you?

Hi @daimpi, yes, i am still receiving timeouts. I can provide the ENF log if necessary.
I realized that each check contains about 270.000 to 314.000 keys. Not sure if this is related to the fact that there are several people nearby using the app.
Please let me know if there is anything I can help with.
Thanks, Matthias

Also heute hat die APP selbstständig den Risikostatus aktualisiert, allerdings erst kurz vor 9:00 Uhr. Gestern und vor dem Wochenende habe ich den Risikostatus manuell aktualisiert zwischen 6:30 und 7:00 Uhr das hat auch Funktioniert. Nur am Wochenende bin ich bei dem Versuch den Risikostatus zu aktualisieren am timeout gescheitert, an beiden Tagen hatte ich nach 9:00 Uhr keine Internetverbindung (höchstens gedrosseltes mobiles Netz) mehr. An beiden Tagen bekam ich abends den Fehler 38905.
Warum ich am Wochenende den Timeout hatte, kann ich nicht verstehen.
Hier der Screenshot von heute, als sich der Risikostatus selbstständig aktualisiert hat:
Screenshot_20201027-092949
Und hier noch einmal die aktuellen ENF:
all-exposure-checks2710.txt

v 1.5.1 is available with increased timeout

Hi, after updating the app to 1.5.1 I received the 39508 error message. I will check again tomorrow morning and let you know the result.
Thanks, Matthias

Hi @maethes ,
can you do us a favor? Before you sleep tonight, plug your phone to the charger. Then, inside CWA, deactivate exposure logging. Have a good sleep ;)
Tomorrow morning, leave your phone connected to the charger. Open CWA and activate exposure logging.
The moment exposure logging is active again, CWA should begin to download data and after start exposure checking.
If there is still any problem (left), then an error message should appear (which would not be shown, if CWA would check for exposures automatically in the background during the night).
If there is any error, could you take a screenshot and post it here tomorrow? If there is no error, could you shortly report your success? Thanks in advance!

Hi @vaubaehn, yes, sure.
Thanks ;-), Matthias

@dukleinekkmaus

Thanks for the log. The problem with timeouts seems still to be present on your phone even on days where it ultimately didn't present an error. Just looking at the checks per day since CWA 1.5:

Date | No. of checks performed
-- | --
27. Okt | 14
26. Okt | 12
25. Okt | 20
24. Okt | 20
23. Okt | 12
22. Okt | 13
21. Okt | 12
20. Okt | 10

This should have been 1 check per day if everything was working correctly not 10+ ^^.

Maybe the even further increased timeout will help, could you update to CWA 1.5.1 and share your EN log with us again tomorrow?

Oh okay, that explains a lot. Yesterday I updated the APP. Today it was only one ENF and the risk check worked automaticly in the night ^^.
all-exposure-checks2810.txt

Good Morning! Huray, the app was able to update the logs succesfully today! ;-)

image

I activated exposure logging at about 6:34. The update has been completed at 6:41. I will check the next days and let you know if the problem occurs again.

Thank you so much for investigating! Please let me know if there is any additional information I can share with you.

Hi @maethes

I activated exposure logging at about 6:34. The update has been completed at 6:41. I will check the next days and let you know if the problem occurs again.

When you write, "update has been completed at 6:41", you mean the timestamp in ENF?

I mean, that's fine at first sight, and it looks like, in general your phone will be able to start exposure checks without problems, and that's fantastic!

but...

I activated exposure logging at about 6:34. The update has been completed at 6:41

That's 7 minutes! ... and increasing infection rate in Germany/Europe means increasing amount of exposure checks, means increasing amount of time, until the exposure check result can be transmitted from ENF to CWA.

@svengabr could you discuss with the devs, if there can be a problem, if one day the exposure check takes more than 10 minutes? If the Android system sets CWA to idle after 10 minutes, will CWA still be able to receive the ACTION_EXPOSURE_STATE_UPDATED and ACTION_EXPOSURE_NOT_FOUND intents, and be able to send a notification in case of increased risk? If not, we may run into trouble in some weeks for the older devices, if CWA is not opened regularly once per day...

Anyway, thanks for your report @maethes !

Hi @vaubaehn,
with "completed by 6:41" I meant the time when CWA was switching from "Updating..." to "Low risk". Anyway the timestamp is also 6:41.
Attached please find the screenshot of the google settings:

image

corresponding entry in all-exposure-checks.json:
{"timestamp":"28. Oktober 2020, 06:41","keyCount":268412,"matchesCount":0,"appName":"Corona-Warn","hash":"mToAiAZR7\/fWbZmTufwXarfS+Mooh2+ssaY1Qs1hNtc="}

@dukleinekkmaus thanks for the feedback. Looks like it's working fine now with the extended 8min timeout. Maybe you could keep checking your log from time to time and if multiple entries reappear for no apparent reason report back to us? 🙂

@maethes can you give us some details about your device?

@thomasaugsten see his original post here:

Device: Huawei Honor 6
Android 6
Corona-Warn Version 1.5.0

Hi @thomasaugsten,

Sure:
Device: Huawei Honor 6
Android 6.0 / EMUI 4.0
Corona-Warn Version 1.5.1 (since yesterday)

Prior to the timeout issue, I received mainly the 39508 error message. I changed following settings which finally made the status update of CWA possible.

  • Activated mobile data, WLAN and background update for CWA app
  • allowed background activity for the app
  • protected the app in power saving settings
  • activated prioritized background activity
  • changed power savings of my phone to full power
  • freed internal disk space
  • disabled Huawei Mobile Services in "apps having access to usage data" ("Apps mit Nutzungsdatenzugriff")
  • uninstalled system apps using following commands:
    adb shell pm uninstall -k --user 0 com.huawei.powergenie
    adb shell pm uninstall -k --user 0 com.huawei.android.hwaps

Let me know if you need further details.
Thanks, Matthias

@daimpi yes i will check the enf log every day. If there is something odd I will report it Here. Today the difference between the timestamps of the APP and the enf was 4 Minutes....

Happy to see that the timeout increase improved the overall situation, we will still continue to monitor this issue.

Dear community, do we have any new reports on the 9002 timeout issue?

Best wishes,
DS

Corona-Warn-App Open Source Team

Hi,

since update 5.0.1 the app is working fine on my device. In most cases the update is being executed overnight without a timeout, which I can see because there is only one entry in ENF log.

On some days I can see 2 entries in ENF log (for example at 01:34 AM and 01:50 AM). I think this should be fine because CWA shows a current timestamp and tells me everything is good.

I have exactly one day (November 14) showing 19 checks in ENF log from 01:42 AM to 08:50 AM. I think the last checks have been executed after starting the app showing the old update-date. After waiting a while I recognized it was able to update because it showed a current timestamp in CWA. The next days were working fine again.

In conclusion, I think there are still timeouts, however after automatically retrying the update works fine.

Thanks for your investigation,

Matthias

Hi @maethes,

Thank you for your feedback and good news. We still keep monitoring this issue though.

Best wishes,
DS


Corona-Warn-App Open Source Team

Dear community,

This issue will be closed now. Please re-open if any new problems related to this issue should occur. Thanks for contributing!

Best wishes,
DS


Corona-Warn-App Open Source Team

Was this page helpful?
0 / 5 - 0 ratings

Related issues

marceljay picture marceljay  ·  3Comments

michaelwingender picture michaelwingender  ·  3Comments

Diapolo picture Diapolo  ·  3Comments

Alestrix picture Alestrix  ·  3Comments

crcsn picture crcsn  ·  3Comments