Cwa-documentation: [EN-Apple, EN-Google] Summary of technical issues affecting privacy and reliability (CWA for Android and iOS)

Created on 25 Jun 2020 · 8 comments · Source: corona-warn-app/cwa-documentation

Remark: Due to lack of time, I will keep this short and put in external references (Twitter, writeup on GitHub)

Throughout the last days, I publicly discussed some technical issues which affect the privacy and reliability of the CWA, but are mostly related to the Google-Apple EN API and vendor BLE stack implementations.

1. Issue summary Android

  • weak TX power for Exposure Notifications (EN) on Android raises concerns about reliability (it is also not consistent with Apple devices, which use high-power transmissions for EN advertisements)
  • usage of resolvable random addresses for EN beacons (instead of unresolvable random private addresses)
  • if the device in use advertises additional beacons, RSSI correlation allows matching this additional information to advertised Exposure Notifications. If both pieces of information are tracked, this could allow device de-anonymization (in the worst case, user de-anonymization). The effect would get even worse if "diagnosis keys" were released for tracked RPIs which have been "enriched" with correlated device/user data.
  • RSSI correlation also allows matching EN advertisements sent before an RPI and BLE address change to the ones sent after the change (continuous tracking of RPIs originating from a single device)
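The difference between resolvable and non-resolvable random private addresses mentioned above is visible to any passive observer in the two most significant bits of the 48-bit address (Bluetooth Core Spec, Vol 6, Part B, Sec 1.3.2). A minimal classifier sketch (the example address is made up):

```python
def random_address_type(addr: str) -> str:
    """Classify a BLE random device address by the two most significant
    bits of its most significant byte (Bluetooth Core Spec, Vol 6,
    Part B, Sec 1.3.2)."""
    msb = int(addr.split(":")[0], 16)
    return {
        0b00: "non-resolvable private",
        0b01: "resolvable private",
        0b11: "static random",
    }.get(msb >> 6, "reserved")

# Top bits of 0x4A are 0b01 -> resolvable private
print(random_address_type("4A:13:37:00:11:22"))
```

So whether a broadcaster uses resolvable addresses at all can be determined directly from captured advertisements, without any key material.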

A written summary was published here: https://github.com/mame82/misc/blob/master/corona.pdf

Related Twitter threads (for reference):

2. Issue summary Apple

  • the issue of resolvable addresses was not observed on EN-advertising Apple devices; yet additional advertising frames from the same device (e.g. beacons with manufacturer data or status beacons for nearby features) could use resolvable addresses. The probability of spotting additional advertising frames from Apple devices (besides EN beacons) is high compared to Android.
  • TX power for EN beacons on Apple devices is high, which makes RSSI-based correlation with additional device advertisements even easier. On the other hand, the "high signal strength" approach for EN seems to be more reliable (compared to Android, where I had issues capturing EN beacons from a distance > 1.5 meters LoS with no obstacles between broadcaster and observer).
  • continuous tracking of devices after an RPI and address change should again be possible based on RSSI correlation (no tests done yet).

There is no written document for the observations regarding Apple devices, but a short summary was posted in the following Twitter thread: https://twitter.com/mame82/status/1275079124898308101

3. Additional remarks

My observations are based on a limited test set of devices (Android) and multiple public BLE scans (for Apple/Android devices).
The fact that most smartphones use a combined stack for Bluetooth LE and BR/EDR increases the chance of capturing valuable data (which could be correlated to EN advertisements based on RSSI curvature) from not only BLE advertising channels, but also from BR/EDR traffic.
One root cause of the issues is that the TX power for ENs is not randomized, which is likely caused by hardware limitations or limits of the respective APIs provided by the OS. Also, I am pretty sure that the TX power for EN advertisements is set by the Google-Apple EN API, not by the application (I was not able to verify this in the CWA source).


Internal Tracking ID: EXPOSUREAPP-1917

mirrored-to-jira question

All 8 comments

I shared a PoC video which demonstrates how more sensitive data advertised by a device with CWA installed could be correlated, in an automated fashion, to RPIs sent by the same device, just by passively listening for BLE advertisements (using a different device).

The video is only available on Twitter: https://twitter.com/mame82/status/1276164514275360771

Hi @mame82 ,
Thanks a lot for your detailed description.
We will get back to you as soon as possible.
Best
MS

Hi @mame82,
Regarding your observation 2 and your statement that you couldn't find detailed information: Google has now released details about that, e.g. how they set the (fixed-per-device-type) TX power value in the AEM, based on calibration: https://developers.google.com/android/exposure-notifications/ble-attenuation-overview

Regarding your observation 1 (resolvable random private address) -
Do I understand correctly that an attacker would need the IRK to de-anonymize the device?

Regarding your observation 3 (observing additional BLE advertisements from same device) -
a) Do I understand correctly that an attacker needs to do frequent measurements near the device to "follow" a device across RPI changes?
b) Did you get personally identifiable data from the other device advertisements (those which could be correlated based on the RSSI curves)?

Dear @mame82 ,

first of all, thank you for compiling these issues. Let me give some remarks from the developer side.

As developers of the Corona-Warn-App, we are in constant exchange with the developers of the technologies that the app is based on - that, of course, includes the security of the solution.
These technologies include, amongst others, the Exposure Notification API and Bluetooth, and have been implemented by the respective vendors in their mobile operating systems Android and iOS.

Based on this exchange, for example with Google, we can say that the points addressed in your issue do not change Google's assessment that the Exposure Notification API is more secure than alternative solutions, and that many sophisticated, hypothetically possible cyber attacks were considered in its design.

Besides technical feasibility, other factors, such as the likelihood of success or the required effort, need to be taken into account as well. We'd like to point you to this Twitter thread, which focuses on exactly these factors.

We ask the community to report potential vulnerabilities in the underlying technologies, such as the Exposure Notification API, directly to the respective vendors to guarantee a direct flow of information and allow for timely fixes of potential issues.

Best regards,
TK
Corona-Warn-App Open Source Team

Regarding your observation 1 (resolvable random private address) -
Do I understand correctly that an attacker would need the IRK to de-anonymize the device?

The IRK would be required for a third party to verify whether a "resolvable random address" of a broadcaster was generated from this exact IRK (i.e. generated by the device which handed out the IRK). The IRK is exchanged in pairing phase 3 (Key Distribution)
and stored persistently by the involved peers (bonding). There are some remarkable facts about IRKs:

  1. The IRK is unique per device. This means that even if CWA Exposure Notification advertisements are "not connectable", the addresses in use could still be resolved by other devices if a trust relationship was established with a remote peer (e.g. for another service which was/is connectable and also used a resolvable address). This is because all "resolvable random addresses" generated by the device are derived from the same IRK.
  2. Even if the trust relationship which was established in the past (and involved the exchange of the IRK) gets deleted on the device running CWA, this does not mean that the formerly trusted peer deletes the retrieved IRK. The formerly trusted device would still be able to resolve all addresses generated from the known IRK (i.e. associate them with the device which generated them).
  3. Along with the IRK, the Identity Address of a device could be exchanged in pairing phase 3. The Identity Address typically equals the unique hardware address used by a device (either a "public address" or a "random static address"). On a mobile phone, the Bluetooth hardware is typically a combined solution which supports not only BLE but also Bluetooth BR/EDR. Because of this, chances are high that an exchanged Identity Address also equals the address used for Bluetooth BR/EDR. If this is the case, a resolvable random address which could be matched to an IRK known by a third party would not only uniquely identify the device which generated the address - the third party would also be able to derive the Bluetooth address in use for BR/EDR. The BR/EDR address, again, could be used to actively track a device (for example by sending L2CAP echo requests and reading connection RSSIs).
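The resolution procedure described above can be sketched as follows. Note the loudly hedged stand-in: the real `ah()` function is AES-128 based (Bluetooth Core Spec, Vol 3, Part H, Sec 2.2.2); a truncated HMAC-SHA256 is used here only to keep the sketch dependency-free. The 24-bit prand / 24-bit hash split and the comparison step are as in the spec.

```python
import hashlib
import hmac
import os

def ah(irk: bytes, prand: bytes) -> bytes:
    """Stand-in for the Bluetooth 'ah' function (really AES-128 based);
    HMAC-SHA256 truncated to 24 bits keeps this sketch stdlib-only.
    The resolution *procedure* below is unaffected by this swap."""
    return hmac.new(irk, prand, hashlib.sha256).digest()[:3]

def make_rpa(irk: bytes) -> bytes:
    """Build a resolvable private address: prand (24 bits, top bits
    forced to 0b01) followed by hash = ah(IRK, prand) (24 bits)."""
    prand = bytearray(os.urandom(3))
    prand[0] = (prand[0] & 0x3F) | 0x40
    return bytes(prand) + ah(irk, bytes(prand))

def resolve(irk: bytes, rpa: bytes) -> bool:
    """A peer that knows the IRK recomputes the hash from the plaintext
    prand half and compares: a match identifies the generating device."""
    prand, h = rpa[:3], rpa[3:]
    return ah(irk, prand) == h

irk = os.urandom(16)
rpa = make_rpa(irk)
print(resolve(irk, rpa))             # True: the known IRK resolves the address
print(resolve(os.urandom(16), rpa))  # almost certainly False (24-bit hash)
```

Since every RPA the device generates is checked against the same IRK, one exchanged IRK resolves all of them, which is exactly the concern in point 1.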

Do I understand correctly that an attacker needs to do frequent measurements near the device to "follow" a device across RPI changes?

I try to avoid the term "follow", as it implies that an attacker physically moves in relation to the device which should be tracked.
The idea is more about an attacker being able to reliably confirm that two adjacent RPIs originated from the same device, even if the random address has changed along with the RPI. To better explain this, let me make some assumptions:

  • the device advertises exposure notifications called RPI-A with a random address called Addr-A
  • after a while the device changes RPIs to RPI-B, at the same time the random address is changed from Addr-A to Addr-B

A simplistic (and unreliable) way would be to track the RSSI values of RPI-A (sent from Addr-A) continuously until no more advertisements are received from Addr-A. If beacons from Addr-B now start to arrive at the same RSSI level where the beacons from Addr-A ended (without much delay), there is a high probability that both beacon sequences originated from the same device and thus that RPI-A and RPI-B are related to the same device.
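This simplistic handoff heuristic can be sketched like this (the addresses, thresholds, and observations are made up for illustration; real captures are far noisier, which is exactly why the approach is unreliable):

```python
def link_rotations(obs, rssi_tol=3.0, max_gap=2.0):
    """Guess address rotations from a time-ordered observation list.

    obs: list of (timestamp_s, address, rssi). When a previously seen
    address goes quiet and a new address appears shortly afterwards at
    a similar RSSI level, link both to the same device. Deliberately
    simplistic, as described above.
    """
    last_seen = {}  # address -> (last_timestamp, last_rssi)
    links = []
    for t, addr, rssi in obs:
        if addr not in last_seen:
            for old, (t_old, rssi_old) in list(last_seen.items()):
                if t - t_old <= max_gap and abs(rssi - rssi_old) <= rssi_tol:
                    links.append((old, addr))
                    del last_seen[old]  # old address has rotated away
        last_seen[addr] = (t, rssi)
    return links

obs = [
    (0.0, "Addr-A", -62), (0.3, "Addr-A", -63), (0.6, "Addr-A", -61),
    (1.2, "Addr-B", -62),                  # appears right where Addr-A ended
    (1.3, "Addr-Z", -80),                  # unrelated device, different level
    (1.5, "Addr-B", -60),
]
print(link_rotations(obs))  # [('Addr-A', 'Addr-B')]
```

Any second device that rotates at a similar RSSI level within the gap window defeats this heuristic, which motivates the curve-shape correlation described next.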

Besides being unreliable, such an approach would require listening for advertisements over a long period ("frequent measurements").
So let me add in some additional assumptions:

  • the same device continuously sends additional advertisements for a different service (e.g. manufacturer data on Apple devices); let's call this data META and the address used to advertise it Addr-C
  • even if Addr-C is chosen to be random, its rotation interval is offset from (or longer than) the interval for Addr-A/Addr-B. In other words: while Addr-A rotates to Addr-B, Addr-C remains unchanged
  • the payload META advertised from Addr-C remains constant (and unique), even if Addr-C is rotated to a new address

With these assumptions, an attacker has to record advertisements up to the point where transmissions from Addr-A can reliably be correlated to transmissions from Addr-C based on the curve progression of the respective RSSI graphs (shape only, not amplitude, as the transmission power used for Addr-A could differ from the transmission power used for Addr-C). The capture duration required to reliably conclude that two RSSI traces are related or unrelated depends on various factors (number of active broadcasters, beacon interval of the individual advertisements, presence of meaningful variation in the RSSI traces, etc.). For my tests (fewer than 50 broadcasters on air), a measurement duration of 15 to 30 seconds was sufficient to correlate RSSI traces (the "RSSI trace" screenshot in the writeup covers a scan interval of about 40 seconds, for example).
So at this point the attacker has associated Addr-A to Addr-C and thus RPI-A to META.
The attacker repeats this whole process, for example at intervals of 5 minutes, until she hits an interval where Addr-B gets reliably associated with Addr-C. Ultimately the attacker can conclude that Addr-A was rotated over to Addr-B (on the same device), because both addresses have been correlated to Addr-C. This also means that RPI-A, RPI-B, and META originated from the same device.
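The shape-only comparison of two RSSI traces can be sketched as a Pearson correlation over mean-centered samples; centering removes the amplitude offset, so differing TX powers for Addr-A and Addr-C do not matter. The toy traces below are made up (equal sampling of both traces is assumed):

```python
from statistics import mean

def shape_correlation(trace_a, trace_c):
    """Pearson correlation of two equally sampled RSSI traces.

    Mean-centering removes the amplitude (TX power) offset, so only
    the curve shape is compared, as described above."""
    assert len(trace_a) == len(trace_c)
    ma, mc = mean(trace_a), mean(trace_c)
    da = [x - ma for x in trace_a]
    dc = [x - mc for x in trace_c]
    num = sum(a * c for a, c in zip(da, dc))
    den = (sum(a * a for a in da) * sum(c * c for c in dc)) ** 0.5
    return num / den if den else 0.0

# Addr-A and Addr-C from the same (moving) device share the curve
# shape at different amplitudes; Addr-X is an unrelated broadcaster.
rssi_addr_a = [-60, -62, -65, -70, -68, -63, -61]
rssi_addr_c = [-50, -52, -55, -60, -58, -53, -51]  # same shape, +10 dB
rssi_addr_x = [-75, -74, -76, -75, -74, -76, -75]

print(shape_correlation(rssi_addr_a, rssi_addr_c))  # ~1.0: same device
print(shape_correlation(rssi_addr_a, rssi_addr_x))  # near 0: unrelated
```

The "meaningful variation" requirement mentioned above shows up here directly: a flat trace centers to near-zero deviations and correlates with nothing.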

Even if Addr-C had changed to Addr-D: as long as this did not happen at the same time as the change from Addr-A to Addr-B, the related RSSI traces would overlap and allow continuous association of the transmissions with a single device (given that the attacker performs the RSSI correlation measurements at a continuous interval, which would have been 5 minutes in this example).

If the advertised data META were truly constant and unique, or if it were not advertised using a private address, an attacker would be able to correlate RPIs to a single device even without monitoring it continuously.

A serious privacy issue would arise if TEKs/Diagnosis Keys were released for RPIs which had been collected by an attacker and associated with META, if META contains personally identifiable data - which brings me to the final question:

b) Did you get personally identifiable data from the other device advertisements (those which could be correlated based on the RSSI curves)?

Keeping the example terminology, the question could be rewritten as:

"Did you get personally identifiable data from META?"

The answer would be: it depends on what the user (or the device, without the user knowing about it) advertises in addition!

My test capabilities are too limited to give a non-speculative answer. Also, I'd like to avoid constructing unrealistic scenarios about the possible content of advertisements. So let me summarize some facts:

  • Apple devices mostly seem to advertise manufacturer data in addition, which obviously allows some information to be determined
  • Apple devices could advertise additional status data, which could help to mount AWDL-based attacks (see here for an example of additional advertisements: https://twitter.com/mame82/status/1275082514470559747, and here for related scanner/PoC attacks: https://github.com/hexway/apple_bleee)
  • Additional advertisements usually contain an unencrypted "TX power" value. If they are received at the exact same RSSI level as the Exposure Notifications, it can be assumed that the TX power for the Exposure Notifications is the same (a value which is otherwise only transmitted in the Associated Encrypted Metadata).
  • If custom beacons contain a field for the full or partial device name, this could reveal additional information (e.g. "Jon Doe's iPhone")
  • If Exposure Notifications can be correlated to a public or random static address, this could allow attacks directed at the BR/EDR stack
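The TX power inference from the third bullet can be sketched in a few lines (the frames, RSSI values, and tolerance are made up; this only illustrates the reasoning, not a measured attack):

```python
def infer_en_tx_power(en_rssi, other_rssi, other_tx_power, tol=2):
    """If an EN frame and a frame carrying a plaintext TX power field
    arrive from the same device at (nearly) the same RSSI, both travel
    the same path, so the EN TX power (otherwise only in the encrypted
    AEM) can be assumed to equal the plaintext value."""
    if abs(en_rssi - other_rssi) <= tol:
        return other_tx_power
    return None  # RSSI levels differ: no shared-path assumption

print(infer_en_tx_power(-67, -66, 12))  # 12: same path loss assumed
print(infer_en_tx_power(-67, -55, 12))  # None
```

This assumes the two frames were already attributed to the same device, e.g. via the RSSI curve correlation above.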

In short: I'm not able to provide a profound answer to this question without doing large-scale scans. Anyway, using an unresolvable address for Exposure Notifications and trying to randomize the TX power (the TX power would have to be changed for each advertising interval) would mitigate the covered issues.

@mame82 Thanks a lot for your detailed explanation.
I tend to use the term "follow" on purpose, because receiving BLE beacons does require physical proximity. Of course this could also be achieved with multiple separate receivers.

When I try to make up my mind, how problematic this could be, I see mainly these two aspects:

  • If my device transmits personally identifiable data via META, I would attribute this problem to transmitting META, not to GAEN.
  • Warning unknown other people about being infected involves a trade-off: I want to warn them, but they might somehow be able to identify me, by various means, including the techniques you mentioned, or e.g. when I was the only person in front of a camera where the beacons were recorded. I don't think it's possible to warn unknown other people and get a 100% guarantee of anonymity at the same time. There will always be scenarios in which I could potentially be identified. I think I would still do it.

I agree in most points mentioned in the comments.

There are reasons why I opted for headings with the term "observation", instead of talking about (weighted) risks, severity, fixes, or really naming root causes.

I did not even make a recommendation about informing the users, because this could lead to unnecessary mistrust in an application which was built with privacy in mind (and IMO a decent job was done here).

Yet, I'd like to express my personal thoughts in technical terms:

If I were Google and did a risk assessment of my own API (whose specs clearly state that only unresolvable addresses shall be used for broadcasts), and if this assessment came out positive because of the well-written specs, but I did not enforce those specs on devices running my own OS ... I would at least call this lying to myself.

If I were Apple and used unresolvable addresses in alignment with my own specs, but constantly broadcast other data with a resolvable address which can easily be associated ... I would call it the same.

@mame82 Thanks a lot for your detailed explanation.
I tend to use the term "follow" on purpose, because receiving BLE beacons does require physical proximity. Of course this could also be achieved with multiple separate receivers.

When I try to make up my mind, how problematic this could be, I see mainly these two aspects:

  • If my device transmits personally identifiable data via META, I would attribute this problem to transmitting META, not to GAEN.
  • Warning unknown other people about being infected involves a trade-off: I want to warn them, but they might somehow be able to identify me, by various means, including the techniques you mentioned, or e.g. when I was the only person in front of a camera where the beacons were recorded. I don't think it's possible to warn unknown other people and get a 100% guarantee of anonymity at the same time. There will always be scenarios in which I could potentially be identified. I think I would still do it.

Agreed, but: if the transmitted META can be associated with data transmitted by GAEN because of the way the transmissions are done (no random TX power), AND the G or the A of the abbreviation GAEN is in control of both (the TX power for sending META and for sending EN), I would attribute it to GAEN.

When it comes to the assessment of a system like GAEN, which states in its own specs that unresolvable addresses shall be used for advertisements, while the operational implementation disregards those specs ... well, the assessment has either been done based on the design goals only, OR there are two different results (a high grade of privacy in the theoretical design, a lower grade of privacy in real-world applications).
