Remark: Due to lack of time, I will keep this short and link to external references (Twitter, write-up on GitHub).
Over the last few days, I publicly discussed some technical issues which affect the privacy and reliability of the CWA, but which are mostly related to the Google-Apple EN API and the vendors' BLE stack implementations.
A written summary was published here: https://github.com/mame82/misc/blob/master/corona.pdf
Related Twitter threads (for reference):
There is no written document for the observations regarding Apple devices, but a short summary was posted in the following Twitter thread: https://twitter.com/mame82/status/1275079124898308101
My observations are based on a limited test set of devices (Android) and multiple public BLE scans (for Apple/Android devices).
The fact that most smartphones use a combined stack for Bluetooth LE and BR/EDR increases the chance of capturing valuable data (which could be correlated to EN advertisements based on RSSI curvature) from not only BLE advertising channels, but also from BR/EDR traffic.
One root cause of the issues is that the TX power for ENs is not randomized, which is likely caused by hardware limitations or by limits of the respective APIs provided by the OS. Also, I am pretty sure that the TX power for EN advertisements is set by the Google-Apple EN API, not by the application (I was not able to verify this in the CWA source).
Internal Tracking ID: EXPOSUREAPP-1917
I shared a PoC video which demonstrates how more sensitive data advertised by a device with CWA installed could be correlated to RPIs sent by the same device in an automated fashion, just by passively listening for BLE advertisements (using a different device).
The video is only available on Twitter: https://twitter.com/mame82/status/1276164514275360771
Hi @mame82 ,
Thanks a lot for your detailed description.
We will get back to you as soon as possible.
Best
MS
Hi @mame82,
Regarding your observation 2 and your statements that you couldn't find detailed information: Google has now released details about that, e.g. how they set the (fixed-per-device-type) TX Power value in AEM, based on calibration: https://developers.google.com/android/exposure-notifications/ble-attenuation-overview
Regarding your observation 1 (resolvable random private address) -
Do I understand correctly that an attacker would need the IRK to de-anonymize the device?
Regarding your observation 3 (observing additional BLE advertisements from same device) -
a) Do I understand correctly that an attacker needs to do frequent measurements near the device to "follow" a device across RPI changes?
b) Did you get personally identifiable data from the other device advertisements (those which could be correlated based on the RSSI curves)?
Dear @mame82 ,
first of all, thank you for compiling these issues. Let me give some remarks from the developer side.
As developers of the Corona-Warn-App, we are in constant exchange with the developers of the technologies that the app is based on - that, of course, includes security of the solution.
These technologies are amongst others the Exposure Notification API or Bluetooth and have been implemented by the respective vendors in their mobile operating systems Android and iOS.
Based on this exchange, for example with Google, we can say that the points addressed in your issue do not change Google's assessment that the Exposure Notification API is more secure than alternative solutions and that they have considered many sophisticated, hypothetically possible cyber attacks in its design.
Besides technical feasibility, other factors such as likelihood of success or required effort, need to be taken into account, as well. We'd like to point you to this Twitter thread, that focuses on exactly these factors.
We ask the community to report potential vulnerabilities in the underlying technologies, such as the Exposure Notification API, directly to the respective vendors to guarantee a direct flow of information and allow for timely fixes of potential issues.
Best regards,
TK
Corona-Warn-App Open Source Team
Regarding your observation 1 (resolvable random private address) -
Do I understand correctly that an attacker would need the IRK to de-anonymize the device?
The IRK would be required for a third party to verify whether a "resolvable random address" of a broadcaster was generated from this exact IRK (== generated by the device which handed out the IRK). The IRK is exchanged in pairing phase 3 (Key Distribution) and stored persistently by the involved peers (bonding). There are some remarkable facts about IRKs:
Do I understand correctly that an attacker needs to do frequent measurements near the device to "follow" a device across RPI changes?
I try to avoid the term "follow", as it implies that an attacker physically moves in relation to a device which should be tracked.
The idea is more about an attacker being able to reliably confirm that two adjacent RPIs originated from the same device, even if the random address has changed along with the RPI. To better explain this, let me make some assumptions:
- RPI-A is advertised with a random address called Addr-A
- The RPI then changes to RPI-B; at the same time, the random address is changed from Addr-A to Addr-B

A simplistic (and unreliable) way would be to track the RSSI values of RPI-A (sent from Addr-A) continuously until no more advertisements are received from Addr-A. If beacons from Addr-B now start to arrive at the same RSSI level where the beacons from Addr-A ended (without much delay), there is a high probability that both beacon sequences originated from the same device and thus that RPI-A and RPI-B are related to the same device.
Besides being unreliable, such an approach would require listening for advertisements for a long period ("frequent measurements").
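The simplistic check described above could be sketched roughly as follows. This is only an illustration, not the tooling used for the PoC: the function name and the thresholds (`max_gap_s`, `max_rssi_delta_db`, `tail`) are hypothetical values chosen for the example.

```python
def plausible_handover(trace_old, trace_new,
                       max_gap_s=2.0, max_rssi_delta_db=6.0, tail=5):
    """Naive check whether trace_new could continue trace_old.

    Each trace is a time-sorted list of (timestamp_seconds, rssi_dbm)
    samples captured for one advertiser address. All thresholds are
    assumptions for illustration only.
    """
    if not trace_old or not trace_new:
        return False

    # The new address must show up shortly after the old one went silent.
    gap = trace_new[0][0] - trace_old[-1][0]
    if gap < 0 or gap > max_gap_s:
        return False

    # Compare the RSSI level where the old trace ended with the level
    # where the new trace started (averaged over a few samples).
    end_level = sum(r for _, r in trace_old[-tail:]) / len(trace_old[-tail:])
    start_level = sum(r for _, r in trace_new[:tail]) / len(trace_new[:tail])
    return abs(end_level - start_level) <= max_rssi_delta_db
```

With many broadcasters on air, several addresses can pass such a check at the same moment, which is one reason why this approach is unreliable.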
So let me add in some additional assumptions:
- The device also advertises some additional data, called META here, and the address used to advertise it is Addr-C
- If Addr-C is chosen to be random, it uses a rotation interval with an offset (compared to the interval for Addr-A/Addr-B) or a longer interval. In other words: while Addr-A rotates to Addr-B, Addr-C remains unchanged
- META advertised from Addr-C remains constant (and unique), even if Addr-C is rotated to a new one

With these assumptions, an attacker has to record advertisements up to the point where transmissions from Addr-A can reliably be correlated to transmissions from Addr-C, based on the curve progression of the respective RSSI graphs (only shape, not amplitude, as the transmission power used for Addr-A could differ from the transmission power used for Addr-C). The "capture duration" required to reliably conclude that two RSSI traces are related or unrelated depends on various factors (number of active broadcasters, beacon interval of the individual advertisements, presence of meaningful variation in the RSSI traces etc.). For my tests (fewer than 50 broadcasters on air), a measurement duration of 15 to 30 seconds was sufficient to correlate RSSI traces (the "RSSI trace" screenshot in the writeup covers a scan interval of about 40 seconds, for example).
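To illustrate what "only shape, not amplitude" could mean in practice, here is a minimal sketch that compares two RSSI traces with a Pearson correlation coefficient, which ignores a constant offset (such as a different TX power) between the traces. The choice of Pearson correlation, the 1-second binning and the `min_bins` threshold are assumptions for this example, not a description of the PoC.

```python
from collections import defaultdict
from math import sqrt


def bin_trace(samples, bin_s=1.0):
    """Average (timestamp_seconds, rssi_dbm) samples into fixed time bins."""
    buckets = defaultdict(list)
    for t, rssi in samples:
        buckets[int(t // bin_s)].append(rssi)
    return {k: sum(v) / len(v) for k, v in buckets.items()}


def shape_correlation(trace_a, trace_b, bin_s=1.0, min_bins=10):
    """Pearson correlation over the time bins present in both traces.

    Returns None if the traces overlap for fewer than min_bins bins or
    if one of them shows no variation at all (nothing to correlate).
    """
    a, b = bin_trace(trace_a, bin_s), bin_trace(trace_b, bin_s)
    common = sorted(a.keys() & b.keys())
    if len(common) < min_bins:
        return None
    xs = [a[k] for k in common]
    ys = [b[k] for k in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)
```

A value close to 1 would suggest that both traces follow the same curve. A flat trace yields `None`, which matches the remark that meaningful variation in the RSSI traces is needed.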
So at this point the attacker has associated Addr-A to Addr-C and thus RPI-A to META.
The attacker repeats this whole process, for example in intervals of 5 minutes, till she hits an interval where Addr-B gets reliably associated to Addr-C. Ultimately the attacker could conclude that Addr-A was rotated over to Addr-B (on the same device), because both addresses have been correlated to Addr-C. This also means RPI-A, RPI-B and META originated from the same device.
Even if Addr-C had changed to Addr-D, as long as this did not happen at the same time as the change from Addr-A to Addr-B, the related RSSI traces would overlap and allow continuous association of the transmissions to a single device (given that the attacker does RSSI correlation measurements in a continuous interval, which would have been 5 minutes in this example).
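The chaining argument can be sketched as a grouping problem: every measurement window yields pairs of addresses whose RSSI traces were correlated, and addresses which are linked through any chain of such pairs are attributed to one device. The following is a minimal sketch using a union-find structure; the function name and the input format are hypothetical.

```python
def group_addresses(correlated_pairs):
    """Group addresses that are linked through chains of correlated pairs.

    correlated_pairs: iterable of (addr_x, addr_y) tuples, one per pair of
    addresses whose RSSI traces were correlated in some measurement window.
    Returns a list of sets, one set of addresses per presumed device.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in correlated_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for addr in list(parent):
        groups.setdefault(find(addr), set()).add(addr)
    return list(groups.values())


# Example from the text: Addr-A and Addr-B were both correlated to Addr-C,
# and later Addr-B was correlated to Addr-D (after Addr-C rotated).
pairs = [("Addr-A", "Addr-C"), ("Addr-B", "Addr-C"), ("Addr-B", "Addr-D")]
print(group_addresses(pairs))
```

All four addresses end up in one group, so the RPIs sent from Addr-A and Addr-B and the META data sent from Addr-C and Addr-D would be attributed to the same device.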
If the advertised data META would really be constant and unique or if it would not be advertised using a private address, an attacker would be able to correlate RPIs to a single device, even if it is not monitored continuously.
A serious privacy issue would arise if TEKs/Diagnosis Keys are released for RPIs which have been collected by an attacker and have been associated to META, and if META contains personally identifiable data - which brings me to the final question:
b) Did you get personally identifiable data from the other device advertisements (those which could be correlated based on the RSSI curves)?
Keeping the example terminology up, the question could be rewritten as:
"Did you get personally identifiable data from META?"
The answer would be: It depends on what the user (or device without the user knowing about it) advertises in addition!
My test capabilities are too limited to give a non-speculative answer. Also, I would like to avoid constructing unrealistic scenarios about the possible content of advertisements. So let me summarize some facts:
In short: I'm not able to provide a profound answer to this question without doing large-scale scans. Anyway, using an unresolvable address for Exposure Notifications and trying to randomize the TX power (the TX power has to be changed for each advertising interval) mitigates the covered issues.
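For reference, whether a captured random address is resolvable can be read from its two most significant bits, as defined in the Bluetooth Core Specification (0b00 non-resolvable private, 0b01 resolvable private, 0b11 static). A minimal sketch, assuming the address was advertised as a random (not public) address and is printed with the most significant octet first, as most scan tools do; the function name is hypothetical:

```python
def classify_random_address(addr: str) -> str:
    """Classify a BLE random device address by its two most significant bits.

    addr: colon-separated hex string, most significant octet first
          (assumed print order), e.g. "4F:12:34:56:78:9A".
    Only meaningful if the advertisement marked the address as random.
    """
    most_significant_octet = int(addr.split(":")[0], 16)
    top_bits = most_significant_octet >> 6
    return {
        0b00: "non-resolvable private",
        0b01: "resolvable private",
        0b10: "reserved",
        0b11: "static random",
    }[top_bits]


print(classify_random_address("4F:12:34:56:78:9A"))  # resolvable private
```

Applied to a scan, such a check would show whether Exposure Notification advertisements are sent from a resolvable or from an unresolvable address.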
@mame82 Thanks a lot for your detailed explanation.
I tend to use the term "follow" on purpose, because receiving BLE beacons does require physical proximity. Of course it could also be achieved by multiple separate receivers.
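When I try to make up my mind, how problematic this could be, I see mainly these two aspects:
- If my device transmits personally identifiable data via META, I would attribute this problem to transmitting META, not to GAEN.
- Warning unknown other people about being infected includes making a trade-off: I want to warn them, but they might be somehow able to identify me, by various means, including the techniques you mentioned, or e.g. when I was the only person in front of a camera where beacons were recorded. I don't think it's possible to warn unknown other people, and get 100% guarantee of anonymity at the same time. There will always be scenarios how I could potentially be identified. I think I would still do it.

I agree with most points mentioned in the comments.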
There are reasons why I opted for headings with the term "observation", instead of talking about (weighted) risks, severity, fixes or really naming root causes.
I did not even make a recommendation about informing the users, because this could lead to unnecessary mistrust in an application which was built with privacy in mind (and IMO a decent job was done here).
Yet, I like to express my personal thoughts in technical terms:
If I were Google and did a risk assessment of my own API (whose specs clearly state that only unresolvable addresses shall be used for broadcasts), and this assessment had a positive result because of well-written specs, but I did not enforce those specs on devices running my own OS ... I would at least call this lying to myself.
If I were Apple and used unresolvable addresses in alignment with my own specs, but constantly broadcast data with a resolvable address which could easily be associated ... I would call it the same.
@mame82 Thanks a lot for your detailed explanation.
I tend to use the term "follow" on purpose, because receiving BLE beacons does require physical proximity. Of course it could also be achieved by multiple separate receivers.
When I try to make up my mind, how problematic this could be, I see mainly these two aspects:
- If my device transmits personally identifiable data via META, I would attribute this problem to transmitting META, not to GAEN.
- Warning unknown other people about being infected includes making a trade-off: I want to warn them, but they might be somehow able to identify me, by various means, including the techniques you mentioned, or e.g. when I was the only person in front of a camera where beacons were recorded. I don't think it's possible to warn unknown other people, and get 100% guarantee of anonymity at the same time. There will always be scenarios how I could potentially be identified. I think I would still do it.

Agreed, but: If transmitted META could be associated to data transmitted by GAEN, because of the way transmissions are done (no random TX power) AND the G or the A of the abbreviation GAEN are in control of both (TX power for sending META and for sending EN), I would attribute it to GAEN.
When it comes to the assessment of a system like GAEN, which states in its own specs that unresolvable addresses shall be used for advertisements, but whose operational implementation disregards those specs ... well, the assessment has either been done based on the design goals only OR there are two different results (a high grade of privacy in the theoretical design, a lower grade of privacy in real-world applications).