Cwa-app-ios: Timeout for risk check seems to behave inconsistently (simulator at v1.11.0)

Created on 8 Feb 2021 · 5 comments · Source: corona-warn-app/cwa-app-ios

Avoid duplicates

[ ] Question is not already answered in the FAQ
[ ] Question has not already been asked in another issue

Your Question

On the simulator, at tag v1.11.0, I have observed some behavior that seems unexpected at first glance.
I am currently investigating in more detail. I plan to update this ticket later with more details and more specific questions.

Note: I have split this ticket from #1918.

Technical details

Device name: Xcode Simulator (for iPhone 8)
iOS Version: Xcode Simulator (for iOS 14.3)
App Version: at tag v1.11.0

Summary

I have checked out the cwa-app-ios code at tag "v1.11.0", compiled it for ENACommunity, and now run it on an Xcode Simulator for iPhone 8 / iOS 14.3.
I have set up several breakpoints and am watching the simulator screen and the debug messages.
I see the following:

(Scenario A) When I execute the risk check without artificial delays, it downloads the files, then starts the exposure detection in the MockExposureManager. After 8 minutes the timeout triggers and the check aborts with a failure.
(Scenario B) Now I set up a breakpoint and delay part of the download. The timeout triggers. Nevertheless, the exposure detection in the MockExposureManager still starts afterwards (and is then covered by no timeout, so it runs forever).

Steps to reproduce / Scenario (A)

Check out the code at tag "v1.11.0"
compile for ENACommunity
set up breakpoints as follows:
- (a) MockExposureManager line 63 (func detectExposures)
- (b) ExposureDetectionExecutor line 74 (the completion handler)
- (c) RiskProvider line 169 (the timeout handler)
run the simulation and watch breakpoints:
- hit breakpoint (a) at MockExposureManager.detectExposures
- continue execution
- see simulation screen with message "Check is running ..."
- wait 8 minutes
- hit breakpoint (c) at timeout handler
- continue execution
- see simulation screen with message "Exposure check failed"

Steps to reproduce / Scenario (B)

as above: check out the code at tag "v1.11.0" and compile for ENACommunity
set up breakpoints (a), (b), (c) as above, plus:
- (d) RiskProvider line 188 (during package download)
run the simulation and watch breakpoints:
- hit breakpoint (d) during package download
- wait more than 8 minutes at the breakpoint, then continue execution
- hit breakpoint (c) at timeout handler
- continue execution
- hit breakpoint (a) at MockExposureManager.detectExposures
- continue execution
- see simulation screen with message "Check is running ..."
- wait forever, no timeout

Notes

The different timeout behaviour in scenarios (A) and (B) is unexpected at first glance, but there may be good reasons for it.
I am currently investigating it in more detail, and I plan to update this ticket later with more specific questions.

Internal tracking ID: EXPOSUREAPP-5053

mirrored-to-jira question

Source

ndegendogo

All 5 comments

@ndegendogo
Since we are moving fast in this project, with deep changes in nearly every release, I would recommend that you focus on the current version (1.13 or 1.14 at the time of writing).

marcussc on 10 Feb 2021

@marcussc sure.

I am currently analysing it in more detail on an intermediate 1.13.x version (a few days old), although I must confess that during this analysis I don't update my code base daily :-)
I will probably rewrite the text of this ticket completely as soon as I have a clearer picture. (My current feeling is that the behaviour is not so bad, but the implementation is quite hard to read, understand, and maintain, and has some unexpected aspects ... and that I am not the first and only person who is, well, surprised ...)

Note: If you prefer that I close this ticket meanwhile and open a new ticket later - no problem.

ndegendogo on 10 Feb 2021

@ndegendogo

I am currently investigating it in more detail, and I plan to update this ticket later with more specific questions.

The developers are asking whether there are any updates from your side.

dsarkar on 2 Mar 2021

@dsarkar thanks for the reminder!

I am quite busy these days: we have an upcoming release for an important customer, and you can imagine that these are crazy days... Still, I think it is a good idea to share my insights, questions, and thoughts on this so far.
Note that my findings may be outdated; I analysed at some intermediate version on branch 1.12.x / 1.13.x and did not find time to compare my results with your latest source ...

I found some surprising behaviour in a timeout scenario during the risk procedure. In this scenario the timeout occurs during the downloading of key files from the server, but the server response is not missing completely, it is just late.

You have already implemented a unit test for this scenario. And this comment indicates that I am not the first and only one who is surprised 😅

The "normal" transition path for the activity state is: idle => riskRequested => downloading => detecting => idle
But in this timeout scenario the path is: idle => riskRequested => downloading => idle => detecting => idle which is a bit confusing and unexpected.
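To make the difference between the two transition paths concrete, here is a minimal Swift sketch of an explicit transition table. The `ActivityState` enum, the `allowedTransitions` table, and `firstIllegalTransition(in:)` are hypothetical illustrations, not the app's real types; the forward steps come from the "normal" path above, and the abort transitions back to `idle` are an assumption. Under these rules the timeout path is flagged at the `idle => detecting` step.

```swift
// Hypothetical model of the activity states named in this issue;
// the app's own type and transition rules may differ.
enum ActivityState {
    case idle, riskRequested, downloading, detecting
}

// Forward steps of the "normal" path, plus an assumed abort from any
// busy state back to idle (e.g. after a timeout).
let allowedTransitions: [ActivityState: Set<ActivityState>] = [
    .idle: [.riskRequested],
    .riskRequested: [.downloading, .idle],
    .downloading: [.detecting, .idle],
    .detecting: [.idle],
]

/// Returns the first step in `path` that the table does not allow, or nil.
func firstIllegalTransition(in path: [ActivityState]) -> (from: ActivityState, to: ActivityState)? {
    for (from, to) in zip(path, path.dropFirst()) where allowedTransitions[from]?.contains(to) != true {
        return (from, to)
    }
    return nil
}

let normalPath: [ActivityState] = [.idle, .riskRequested, .downloading, .detecting, .idle]
let timeoutPath: [ActivityState] = [.idle, .riskRequested, .downloading, .idle, .detecting, .idle]
let illegal = firstIllegalTransition(in: timeoutPath)
print(illegal.map { "\($0.from) => \($0.to)" } ?? "all transitions legal")
```

A debug assertion on such a table would surface the unexpected `idle => detecting` step as soon as it happens, instead of it only becoming visible in the UI.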

The implementation is a deeply nested sequence of completion handlers. The timeout triggers the error path, but does not abort this sequence.
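One common way to make a timeout actually end such a chain is to give each run a "finished" flag that every later completion handler checks before starting the next step. The following Swift sketch is a hypothetical, simplified model (`RiskCheckRun` and its methods are illustrative names, not the app's API). It calls the handlers in a fixed order to replay scenario (B) deterministically; in real code the calls arrive on different queues and would have to be serialized, for example on one `DispatchQueue`.

```swift
// Hypothetical model of one risk-check run; not the app's actual types.
final class RiskCheckRun {
    private(set) var isFinished = false
    private(set) var events: [String] = []

    /// Called by the timeout handler.
    func timeOut() {
        finish(with: "failed: timeout")
    }

    /// Called from the download completion handler. Starts detection only
    /// while the run is still active, so a late download cannot revive it.
    func downloadDidComplete(startDetection: () -> Void) {
        guard !isFinished else {
            events.append("late download ignored")
            return
        }
        events.append("detection started")
        startDetection()
    }

    /// Called from the detection completion handler.
    func detectionDidComplete() {
        finish(with: "succeeded")
    }

    /// Records the outcome exactly once; later outcomes are dropped.
    private func finish(with outcome: String) {
        guard !isFinished else { return }
        isFinished = true
        events.append(outcome)
    }
}

// Scenario (B): the timeout fires while the download is still pending.
var detectionStarted = false
let lateRun = RiskCheckRun()
lateRun.timeOut()
lateRun.downloadDidComplete { detectionStarted = true }

// Normal path: detection finishes first, so a later timeout is ignored.
let normalRun = RiskCheckRun()
normalRun.downloadDidComplete {}
normalRun.detectionDidComplete()
normalRun.timeOut()
print(lateRun.events, normalRun.events)
```

With this guard, the late download in scenario (B) no longer starts an untimed detection, and the UI sees exactly one outcome per run.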

Questions / discussion of error strategy:

1) Definitely the communication with the server needs a timeout.
But what is the intended behaviour / strategy after timeout?

download of hourly files: I think here you have already implemented a fault-tolerant strategy (use whatever files are there).
download of daily files: what about a similar fault-tolerant strategy here?
Or would a check of an incomplete set of daily files lead to a wrong result in the risk check?

An alternative strategy could be:
Let the timeout span only the communication with the server.
After all files are received (or after the timeout) perform the risk check; provided there is at least one new file that was not checked before.
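The alternative strategy can be sketched in a few lines of Swift. Everything here is hypothetical: `KeyPackage`, `Arrival`, `receivedInTime`, and `shouldRunRiskCheck` are illustrative names, and the 480-second limit simply mirrors the 8 minutes observed in the scenarios above. The server timeout bounds only the download phase; whatever arrived in time is kept (fault-tolerant), and the risk check runs only if at least one received package was not checked before.

```swift
// Hypothetical package model; the app's real types differ.
struct KeyPackage: Hashable {
    let day: String   // e.g. "2021-03-02"
    let hour: Int?    // nil for a daily package, 0...23 for an hourly one
}

// Hypothetical record of when a package finished downloading.
struct Arrival {
    let package: KeyPackage
    let secondsAfterStart: Double
}

/// The server timeout bounds only the download: keep whatever arrived in time.
func receivedInTime(_ arrivals: [Arrival], serverTimeout: Double) -> Set<KeyPackage> {
    Set(arrivals.filter { $0.secondsAfterStart <= serverTimeout }.map(\.package))
}

/// Run the risk check only if at least one received package is new.
func shouldRunRiskCheck(received: Set<KeyPackage>, alreadyChecked: Set<KeyPackage>) -> Bool {
    !received.subtracting(alreadyChecked).isEmpty
}

let checked: Set<KeyPackage> = [KeyPackage(day: "2021-03-01", hour: nil)]
let arrivals = [
    Arrival(package: KeyPackage(day: "2021-03-01", hour: nil), secondsAfterStart: 2),
    Arrival(package: KeyPackage(day: "2021-03-02", hour: 10), secondsAfterStart: 5),
    Arrival(package: KeyPackage(day: "2021-03-02", hour: 11), secondsAfterStart: 600), // too late
]
let received = receivedInTime(arrivals, serverTimeout: 480)
print(received.count, shouldRunRiskCheck(received: received, alreadyChecked: checked))
```

Whether an incomplete set of daily packages can still give a correct risk result is exactly the open question above; the sketch only shows the control flow, not an answer to it.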

2) Do the calls to ENManager API need a timeout?
I did not see such a requirement in the API documentation. There is a completion handler for error handling; maybe it is also used for timeouts. I have seen ENF log files with odd timing patterns, like: several files in quick succession, then a gap of 4 hours, then it continued to check, then again a gap ... it looked like an aggressive power-saving mode.
But of course, you tested a lot, you have experienced the behaviour of several iOS versions, and you had many discussions with Apple; so I guess you have reasons to implement a timeout here.
Still, the timeout for the ENF API could be separated from the timeout towards the server.
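The effect of separating the two timeouts can be shown with a small Swift sketch. It is an illustration under assumptions: `Phase`, `RiskCheckTimeouts`, and both `phaseTimedOut` overloads are hypothetical, the shared 480-second budget mirrors the 8 minutes observed in the scenarios above, and the separate limits are made-up values. With one shared budget, a slow download eats into the time left for detection; with separate budgets, each phase is judged only against its own limit.

```swift
enum Phase { case download, detection }

// Hypothetical timeout configuration; the values used below are illustrative.
struct RiskCheckTimeouts {
    let server: Double     // seconds allowed for the server communication
    let detection: Double  // seconds allowed for the ENManager detection call
}

/// One budget for the whole check: download time reduces the detection budget.
func phaseTimedOut(download: Double, detection: Double, sharedBudget: Double) -> Phase? {
    if download > sharedBudget { return .download }
    if download + detection > sharedBudget { return .detection }
    return nil
}

/// Separate budgets: each phase is checked only against its own limit.
func phaseTimedOut(download: Double, detection: Double, timeouts: RiskCheckTimeouts) -> Phase? {
    if download > timeouts.server { return .download }
    if detection > timeouts.detection { return .detection }
    return nil
}

// A slow 5-minute download followed by a 4-minute detection run.
let shared = phaseTimedOut(download: 300, detection: 240, sharedBudget: 480)
let separate = phaseTimedOut(download: 300, detection: 240,
                             timeouts: RiskCheckTimeouts(server: 360, detection: 480))
print(String(describing: shared), String(describing: separate))
```

In this example the shared budget fails the check during detection even though both phases finished in reasonable time, while the separate budgets accept it.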

ndegendogo on 3 Mar 2021

@ndegendogo Thank you, will forward the info to the dev team!

dsarkar on 3 Mar 2021
