Synapse: /sync API does not tell clients when state changes due to state resolution cause their view to diverge from the HS's

Created on 13 Dec 2017 · 12Comments · Source: matrix-org/synapse

If the state graph forks and then merges again, some state events may be reset if they are in conflict. For instance:

alice sets topic1
 |
 |---------------------------.
 |                           |
 v                           v
alice sets topic2       bob kicks alice
 |                           |
 |     .---------------------'
 v     v
bob says something

The fact the bob kicked alice on one branch of the DAG may arrive late, causing alice's topic2 to be considered as reverted for the room's state going forwards. We currently do not tell clients via /sync that this has happened however, so the client's view of the room state will drift until they entirely resync (e.g. by flushing your cache in Riot).

Instead, we need a way of re-issuing the 'topic1' event into the state block in the /sync response to tell clients about the reset. This is potentially problematic as state changes like this may happen at any point in the timeline, whereas we only snapshot state at the beginning of a sync response. (However, if we include inlined state changes in the timeline at the 'right' point, perhaps this is enough?)

bug

Source

ara4n

👍1

All 12 comments

You're not using the term "reset" in the same way as we have been doing elsewhere.

We've been using "state reset" to refer to https://github.com/matrix-org/synapse/issues/1953, which is normally where an attempt to resolve a recent room state against a quite old one causes a whole cascade of changes to the current room state with the net effect that much (but not all!) of the room state ends up back where it was a few months ago.

So a state reset is one reason that clients can get out of sync, but in general any state conflict resolution (such as the one in your diagram above) can result in a change in room state which we don't currently tell clients about.

richvdh on 13 Dec 2017

this is a spec issue really; moving it over.

richvdh on 29 Jan 2019

oh bother. It's really not.

richvdh on 29 Jan 2019

why not? i don't think the spec says how to handle this?

ara4n on 12 Jan 2020

The API for /sync provides a mechanism for resolving this, as it segregates state and timeline. Things in timeline appear in the timeline and things in state are room state deltas.

AFAIK it should be possible for a server to say "whoops, this state is wrong, and this state is right" and put it under the state key of the room and have it work transparently to the client. Assuming this is the approach we are taking, this is a synapse bug and not a spec bug, though definitely should be clarified somewhere (do we even have any places where we explain this issue anywhere in the spec?).

https://matrix.org/docs/spec/client_server/r0.6.0#get-matrix-client-r0-sync

Kegsay on 30 Mar 2020

This is overall pretty catastrophic from a UX perspective when this happens, and makes me feel Matrix is super flakey.

I encountered this from a state reset in a v1 room, which meant I was kicked but it was still present as joined in riot-web, and I couldn't leave it because I wasn't in the room :/ but this issue is applicable for any kind of fork in room state and so we will likely keep hitting edge cases manifesting as crap UX until we fix this.

Kegsay on 30 Mar 2020

👍1

@Kegsay Related, in that case: #1597

joepie91 on 30 Mar 2020

AFAIK it should be possible for a server to say "whoops, this state is wrong, and this state is right" and put it under the state key of the room and have it work transparently to the client.

Except that entries in state are supposed to be real events, with real event ids. If you're trying to tell a client that you're backing out an earlier bit of state (in the example in the description, we need to back out the topic change), you'd have to do so by making up events. That might not be a crazy thing to do, but it's certainly not an obvious thing to do.

(do we even have any places where we explain this issue anywhere in the spec?)

not that I'm aware of, no.

richvdh on 30 Mar 2020

@joepie91 thanks for this. It really depends on how we wish to communicate state conflicts to clients. We probably don't want to flood the timeline with state changes because the state res algo decided to kick in and pretend a bunch of previously sent state is no longer valid. This is why I'm suggesting it should reside in state.

1597 is mainly complaining that the spec says this should be impossible. It might be worth killilng two birds with one stone and saying "hey, under these circumstances, you may get them to appear twice (byte-for-byte identical)" so clients have some way of realising that something weird is going on.

Kegsay on 30 Mar 2020

you'd have to do so by making up events.

I would propose we just used the previous state which we already track under unsigned.prev_content. In the event there is no previous state, a redacted content {} for that state. That is to say, if there is a previous event use it, else yes, we'd need to make one up with a redacted content.

Kegsay on 30 Mar 2020

the proposal sounds reasonable, but it's obscure enough that I think if we're going to do it, it needs to end up in the spec.

richvdh on 30 Mar 2020

👍1

incidentally, there is another glitch here. The timeline can consist of up to ten events; the state reset can occur anywhere in that series of events; by sticking it in state we imply that the state changes at the start of the timeline, so the timeline events may not be shown correctly.

It may be close enough, or we may need to work around it by (say) forcing a gappy sync.

richvdh on 30 Mar 2020

Was this page helpful?

0 / 5 - 0 ratings