Element-web: "Unable to decrypt: The sender's device has not sent us the keys for this message." (The UISI bug)

Created on 19 Jan 2017 · 13 Comments · Source: vector-im/element-web

This message (or, less often, the closely related "OLM.UNKNOWN_MESSAGE_INDEX") can be caused by a number of things. This bug serves as a reference to the reasons we know about.

You logged into your device, or joined the room, after the message was sent. This is sort-of by design, though #2258, #2286 and #2713 are relevant.
The sender has blacklisted your device. This is definitely by design, though see #3831 for improving visibility on this.
The keys haven't arrived yet. Patience you must have.

Client-specific bugs:

~~#2325~~ [olm is racy when riot-web is used in multiple tabs]
#3309 (also on android/ios) [We can still throw away one-time keys which have messages in-flight]
#2783 (also on android/ios) [prioritise established olm sessions over half-open ones]
#3231 (also on android/ios) [different devices with the same curve25519 key confuse the olm layer]
https://github.com/vector-im/riot-web/issues/4216 / https://github.com/vector-im/riot-android/issues/1289 / https://github.com/vector-im/riot-ios/issues/1256 [We throw away all our one-time-keys and replace them with new ones]
~~#4983~~ / matrix-org/matrix-ios-sdk/issues/340 / vector-im/riot-android#1603 [yet another device list sync bug]
~~#5001~~ (fixed riot-web 0.12.5) [race in decryption]
~~#2672~~ / ~~#2305~~ / ~~#3796~~ / ~~vector-im/riot-ios#955~~ (fixed riot-ios 0.5.0) / ~~vector-im/riot-android#863~~ (fixed riot-android 0.6.10) [racy device list sync]
~~#2782~~ [process incoming messages before uploading new keys]
~~#2273~~
~~#2562~~
~~vector-im/riot-android#799~~ (fixed, but you may still have broken Olm sessions because of it)
~~vector-im/riot-android#1209~~ (fixed in riot-android 0.6.10, but you may still have broken Olm sessions because of it)

Protocol/server things:

#3187
matrix-org/synapse#2165
#3754
#3822
#3868
#3846 [sometimes the sending device doesn't seem to know about some users in the room] (hopefully will be fixed by state resets)?
~~vector-im/riot-android#1208~~ (fixed in synapse v0.21.0-rc3)
~~matrix-org/synapse#2418 (joins can seed from servers with stale state)~~
#6989 Device lists can get out of sync

Unexplained:

~~#2711~~

bug meta p1 critical e2e

richvdh

👍28

Most helpful comment

For anyone reading this looking for a workaround, the best advice currently is that if you suddenly find yourself unable to decrypt messages from someone, ask them to open up your contact details on their client. This should force their Riot to resync its copy of your device list, increasing the chance that the next message sent will actually be encrypted for your current device.

If this doesn't work, you have no choice but to create a new room (or, worst case, export your session keys, log out, log in and import your session keys), until the workaround in #3553 is implemented (or this meta-bug is finally closed up).

ara4n on 2 Apr 2017

👍3

All 13 comments

Making forward progress on this requires either @richvdh to hop back into it, or for him to hand over to somebody else.

lampholder on 3 Apr 2017

I just spent a while reviewing all of the known remaining UISI causes with @richvdh. UISI bugs fall into two broad categories:

Wedged olm sessions:

#3309: Olm sessions can get wedged if we throw away OTKs for messages in-flight
#2783: We can reduce the risk of #3309 by prioritising established olm sessions over half-open ones
~~#2325: Olm sessions corrupt & wedge if you open the same Riot in multiple tabs~~
#3231: If you reuse node-localstorage and share curve25519 keys between different device IDs, Olm wedges <-- NOT A BUG FOR RIOT<->RIOT.
#3822: If users restore a device backup or tab whilst the existing device is still around and active, Olm will wedge.
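
The prioritisation suggested in #2783 can be illustrated with a small sketch. This is not the js-sdk's code: the `OlmSessionInfo` shape and the `chooseSession` helper are hypothetical names invented for this example. It assumes each session records whether a message has ever been received on it (established rather than half-open) and when it was last used.

```typescript
// Hypothetical model of an Olm session record; not the matrix-js-sdk's types.
interface OlmSessionInfo {
  sessionId: string;
  // True once we have decrypted a message on this session, i.e. both
  // sides are known to hold it ("established" rather than "half-open").
  hasReceivedMessage: boolean;
  lastUsedTs: number;
}

// Prefer established sessions over half-open ones; break ties by recency,
// then by session ID so that the choice is deterministic.
function chooseSession(sessions: OlmSessionInfo[]): OlmSessionInfo | undefined {
  const sorted = sessions.slice().sort((a, b) => {
    if (a.hasReceivedMessage !== b.hasReceivedMessage) {
      return a.hasReceivedMessage ? -1 : 1;
    }
    if (a.lastUsedTs !== b.lastUsedTs) {
      return b.lastUsedTs - a.lastUsedTs;
    }
    return a.sessionId < b.sessionId ? -1 : a.sessionId > b.sessionId ? 1 : 0;
  });
  return sorted[0];
}
```

Under these assumptions, a newer half-open session never displaces an older session that the other side has already proven it can use.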

Missing megolm keys:

matrix-org/synapse#2165: Caused if federation is broken when you login, so remote HS don't get told about your new device.
- There is no solution to this; if alice doesn't know bob's new device exists when encrypting messages for him, he'll never be able to decrypt them.
#3754: Caused by your HS federation being broken when someone starts a new megolm session to you, but you then receive messages from other HSes before you receive your megolm keys
- We could query the origin HS (if it's available again) for the keys, to speed up the recovery process.
#3187: You ran out of Olm OTKs, so can't start a new Olm session, and thus share Megolm keys (or do anything else)
~~#3796~~: If Alice adds a new device whilst we're downloading her old device list, we may not spot the new device.
#3825: If Alice and Bob only ever use Riot in short-lived incognito windows, they may never successfully exchange megolm keys
...possibly other bugs where the act of loading the target's MemberInfo kicks the sender into refreshing their device list?
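
Most of the cases in this list come down to the sender's view of the device list lagging behind reality. As a rough illustration (a toy model, not real client code; `DeviceLists` and `devicesWithoutKey` are hypothetical names), the devices that will show a UISI are those that exist but were unknown to the sender when the megolm session was shared:

```typescript
// Hypothetical model: userId -> list of device IDs.
type DeviceLists = { [userId: string]: string[] };

// Returns "userId/deviceId" for every device that really exists but that
// the sender did not know about, and so never received the megolm key.
function devicesWithoutKey(actual: DeviceLists, knownToSender: DeviceLists): string[] {
  const missing: string[] = [];
  for (const userId of Object.keys(actual)) {
    const known = knownToSender[userId] || [];
    for (const deviceId of actual[userId]) {
      if (known.indexOf(deviceId) === -1) {
        missing.push(userId + "/" + deviceId);
      }
    }
  }
  return missing;
}
```

For example, if Bob logs in on a new device while federation is broken, Alice's stale list omits it, and that device appears in the result.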

Rich estimates that UISIs are split roughly 50/50 between the two categories.

However, all of the 'missing megolm key' class of bugs can be worked around by giving users a way of recovering missing megolm keys - and in some cases (broken federation; matrix-org/synapse#2165) this is the only plausible solution. In turn, if we had a way of recovering missing megolm keys, we'd also have a way to share history to new devices - the infamous 'share history' bug (#2286).

Therefore the suggestion is to focus entirely[1] on solving the problem of sharing megolm keys, given the value of solving both the 'missing megolm key' bugs as well as the 'sharing history' feature is greater than the value of solving the individual bitty 'wedged olm session' bugs (which all have different solutions). This means setting aside UISI hell and forging ahead and solving history sharing (#2286) or at least a subset of it. This could well include improving the UX for sharing history by supporting cross-signed keys (#2714).

[1] We can probably progress the 'multiple tabs' problem (#2325) in parallel. And the plan is to finish the devicelist race #3796 first.

ara4n on 4 May 2017

> #3231: If you reuse node-localstorage and share curve25519 keys between different device IDs, Olm wedges <-- NOT A BUG FOR RIOT.

Well, it is a bug for riot, in that if anyone uses the js-sdk in the obvious manner, riot fails to talk e2e with the resultant client. It's arguable whose fault that is - ideally both ends would be fixed. But it's not a bug for riot inasmuch as it doesn't affect riot<->riot comms.

> ...possibly other bugs where the act of loading the target's MemberInfo kicks the sender into refreshing their device list?

AFAIK the only thing that loading the MemberInfo would solve these days is #3796.

richvdh on 5 May 2017

Writing https://github.com/vector-im/riot-web/issues/2286#issuecomment-299605919 made me realise that perhaps we can also improve the experience here in general with better error messages. For instance, do we have any way of detecting when an Olm session has got wedged, such that we can complain about that (and perhaps reset it?) rather than just whine about missing megolm keys?

ara4n on 6 May 2017

Yes, we can certainly improve this. We can give the user feedback about failing to decrypt to_device messages (though they tend to get replayed at initial sync, so we'd have to think about how to avoid false positives). Incidentally, https://github.com/vector-im/riot-android/issues/800 covers that. If we can get it reliable, we can start a new Olm session to try and unwedge things. We can also consider giving better feedback from the sender's end (https://github.com/vector-im/riot-web/issues/2494).

In general it's hard to tell the cause of any particular UISI, because you can't correlate it to a to_device you couldn't decrypt.
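
One way to picture the "start a new Olm session, but avoid false positives" idea is a per-sender rate limit, so that a burst of replayed undecryptable to_device messages triggers at most one reset attempt. The sketch below is only an assumption about how this might be done; `UnwedgeTracker` and its method are hypothetical and not part of any SDK.

```typescript
// Hypothetical helper: decides whether a failed to_device decryption
// should trigger a fresh Olm session with the sending device.
class UnwedgeTracker {
  // Curve25519 sender key -> time (ms) a replacement session was last started.
  private lastResetMs: { [senderKey: string]: number } = {};
  private minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true at most once per interval per sender key.
  shouldStartNewSession(senderKey: string, nowMs: number): boolean {
    const last = this.lastResetMs[senderKey];
    if (last !== undefined && nowMs - last < this.minIntervalMs) {
      return false;
    }
    this.lastResetMs[senderKey] = nowMs;
    return true;
  }
}
```

A caller would consult `shouldStartNewSession` each time a to_device message from a given sender key fails to decrypt, and only create a new session when it returns true.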

richvdh on 6 May 2017

> If users restore a device backup or tab whilst the existing device is still around and active, Olm will wedge.
> If you reuse node-localstorage and share curve25519 keys between different device IDs, Olm wedges <-- NOT A BUG FOR RIOT<->RIOT.

While it is possible at the moment, I don't really like the idea of the same keys on multiple devices (this is surely a security risk). Notably, device_keys changing while device_id remains the same is another way to wedge olm sessions (the client sharing the megolm session key thinks it sent keys to the associated device, but that device has new keys and cannot decrypt the olm session).

pik on 6 May 2017

@pik: please take your questions about #3231 and #3822 to the relevant bugs.

Another thing we should consider on this bug is a way for the recipient to distinguish "the sender failed to send to you" vs "the sender chose not to send to you" (either because they blocked you or your device explicitly, or because they had the 'Never send encrypted messages to unverified devices from this device' setting (https://github.com/matrix-org/matrix-js-sdk/pull/336) checked).

Basically, we should be able to distinguish between "it went a bit wrong" and "expected behaviour".

Of course that would probably mean the sender sending a "you're blocked" notification to the recipient, but that wouldn't be hard.
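
To make the "blocked vs failure" distinction concrete, here is a minimal sketch of how a recipient might pick an error message if senders emitted such a notification. Everything in it is hypothetical: `KeyWithheldNotice`, the reason strings and `describeFailure` are invented for illustration and are not a defined Matrix event type or SDK API.

```typescript
// Hypothetical reasons a sender might deliberately withhold keys.
type WithheldReason = "device_blocked" | "device_unverified";

// Hypothetical notice sent by the sender instead of the megolm key.
interface KeyWithheldNotice {
  sessionId: string;
  reason: WithheldReason;
}

// Chooses a message for an undecryptable event in the given megolm session.
function describeFailure(sessionId: string, notices: KeyWithheldNotice[]): string {
  for (const notice of notices) {
    if (notice.sessionId !== sessionId) {
      continue;
    }
    if (notice.reason === "device_blocked") {
      return "The sender has blocked this device.";
    }
    return "The sender does not send keys to unverified devices.";
  }
  // No notice received: something went wrong rather than expected behaviour.
  return "The sender's device has not sent us the keys for this message.";
}
```

With no matching notice the function falls back to the existing UISI text, which then only ever means "it went a bit wrong".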

richvdh on 8 May 2017

👍2

Added #3845 for the "blocked vs failure" sidebar

richvdh on 8 May 2017

The workaround 'opening the contact details' did not work for us. I can read all messages from the mobile of my colleague but not from Linux Riot. We found a workaround: my colleague closed Riot, deleted the config folder ~/.config/Riot and started the app again; now his device showed up as unverified but I could read the messages again; after verification, we get the green lock and everything works as expected.

ubmarco on 24 Apr 2018

I had problems with my phone's key after restoring a TWRP backup.
I couldn't read messages from my phone, on the same account, in my web client on the desktop. I got exactly this error: "Unable to decrypt: The sender's device has not sent us the keys for this message."
After filing a bug report, I got the following diagnosis and solution:
Fault: "Looks like by restoring your phone from backup its crypto state got completely hosed and out of sync with the server."

Solution that worked: exporting my room keys, logging out and back in, and re-importing them on the phone.

The phone gets a new key after logging out and logging in again.

Twitter Thread:
https://twitter.com/Th3PeKo/status/988821611313758208

Hetti on 24 Apr 2018

> For anyone reading this looking for a workaround, the best advice currently is that if you suddenly find yourself unable to decrypt messages from someone, ask them to open up your contact details on their client. This should force their Riot to resync its copy of your device list, increasing the chance that the next message sent will actually be encrypted for your current device.
>
> If this doesn't work, you have no choice but to create a new room (or worst case, export your session keys, logout, login and import your session keys), until the workaround in #3553 is implemented (or this meta-bug is finally closed up)

None of this works for me :( Creating a new room, deleting the user from the device and signing in again, trusting all the users of the room from scratch, etc. does not work. And I have this issue with only one user that I interact with. The other user is using the Android (F-Droid) and Linux versions.

I am wondering if anyone else was able to get rid of this issue completely without deleting the user from Synapse.