Synapse: state resolution failing with `KeyError: '$S38_aRG4Qz8eIdrTyzJuQdH3Ieus1Y1dhczoFKnOB7M'`

Created on 30 Dec 2019  路  8Comments  路  Source: matrix-org/synapse

Description

My client is getting continuous 500's from my synchrotrons due to a KeyError from _get_event, and thus I'm unable to sync. Restarting does not help:

2019-12-30 17:33:07,540 - synapse.http.server - 109 - ERROR - GET-18029- Failed handle request via 'SyncRestServlet': <SynapseRequest at 0x7f272183c5f8 method='GET' uri='/_matrix/client/r0/sync?filter=7&timeout=...
Traceback (most recent call last):
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration: {}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration: {}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration: {}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration: []

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration: {}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/http/server.py", line 77, in wrapped_request_handler
    await h(self, request)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/http/server.py", line 326, in _async_render
    callback_return = await callback_return
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/rest/client/v2_alpha/sync.py", line 178, in on_GET
    full_state=full_state,
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/handlers/sync.py", line 263, in wait_for_sync_for_user
    full_state,
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/handlers/sync.py", line 285, in _wait_for_sync_for_user
    sync_config, since_token, full_state=full_state
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/handlers/sync.py", line 956, in generate_sync_result
    sync_result_builder, account_data_by_room
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/handlers/sync.py", line 1367, in _generate_sync_entry_for_rooms
    await concurrently_execute(handle_room_entries, room_entries, 10)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/util/async_helpers.py", line 155, in _concurrently_execute_inner
    await maybe_awaitable(func(next(it)))
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/handlers/sync.py", line 1762, in _generate_room_entry
    newly_joined_room=newly_joined,
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/handlers/sync.py", line 413, in _load_filtered_recents
    current_state_ids = await self.state.get_current_state_ids(room_id)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/state/__init__.py", line 171, in get_current_state_ids
    ret = yield self.resolve_state_groups_for_events(room_id, latest_event_ids)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/util/metrics.py", line 82, in measured_func
    r = yield func(self, *args, **kwargs)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/state/__init__.py", line 402, in resolve_state_groups_for_events
    state_res_store=StateResolutionStore(self.store),
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/state/__init__.py", line 525, in resolve_state_groups
    state_res_store=state_res_store,
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/state/v2.py", line 116, in resolve_events_with_store
    room_id, power_events, event_map, state_res_store, full_conflicted_set
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/state/v2.py", line 364, in _reverse_topological_power_sort
    room_id, event_id, event_map, state_res_store
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/state/v2.py", line 186, in _get_power_level_for_sender
    aev = yield _get_event(room_id, aid, event_map, state_res_store)
  File "/home/synapse/env/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/synapse/env/lib/python3.6/site-packages/synapse/state/v2.py", line 535, in _get_event
    event = event_map[event_id]
KeyError: '$S38_aRG4Qz8eIdrTyzJuQdH3Ieus1Y1dhczoFKnOB7M'
2019-12-30 17:33:07,544 - synapse.access.http.9183 - 302 - INFO - GET-18029- 127.0.0.1 - 9183 - {@andrewm:amorgan.xyz} Processed request: 16.942sec/0.001sec (6.428sec, 0.142sec) (9.334sec/43.300sec/259) 55B 500

The event ID is $S38_aRG4Qz8eIdrTyzJuQdH3Ieus1Y1dhczoFKnOB7M, in room !BAXLHOFjvDKUeLafmO:matrix.org (IRC Matrix bridges). I don't see any event with that ID in my events table, nor can I see it in the content of any event in my db.

Version information

Synapse v1.7.2

Most helpful comment

for those encountering this issue: https://github.com/matrix-org/synapse/commit/b3d344c0dab33f695864c52d8ceee9bd02928d53 is a patch to work around it; I'm continuing to investigate what has happened.

All 8 comments

Likewise for t2bot.io

Riot can no longer sync with my server; the event ID is the same. I'm not sure the room belongs to it, but it does appear in the logs preceding each error:

synapse.state: [GET-1676] Resolving state for !BAXLHOFjvDKUeLafmO:matrix.org with 2 groups
synapse.state: [GET-1676] Resolving conflicted state for '!BAXLHOFjvDKUeLafmO:matrix.org'

@heftig does your server use workers at all or is it one giant synapse?

No workers here.

for those encountering this issue: https://github.com/matrix-org/synapse/commit/b3d344c0dab33f695864c52d8ceee9bd02928d53 is a patch to work around it; I'm continuing to investigate what has happened.

conclusions so far:

$vxKZZEW3pJmuKI4V320ZzWOr0d2O9MqJ+/WMPsUIEqM was an event sent (to the IRC bridges room !BAXLHOFjvDKUeLafmO:matrix.org) back in August; it (along with a few other events around that time) referenced a non-existent event in its auth_events. It therefore should have been rejected (as indeed it seems to have been on many servers); however, matrix.org appears to have incorrectly accepted it. The event in question was a join event.

Today, the IRC bridge kicked all users deemed to have been idle for 30 days or more; this included the user who joined in $vxK.... The kick event ($iFQmiui69oTYNtGPMQilJEfe5U6HYWQooMFicYRhXBI) referenced the broken event in its auth events.

That means that $vxK... is now getting involved in the state-resolution algorithm, which apparently explodes when it fails to find the auth event referenced by $vxK.

another odd thing is that $iFQmiui69oTYNtGPMQilJEfe5U6HYWQooMFicYRhXBI comes up as a forward extremity, even though its successor ($8l7/U97QjDXLjc+Ym9qzyBaO0tFF9KeYUPgH5x/vdTA) has been correctly received and persisted.

This appears to have happened because $8I7... was initially persisted as an outlier: see #6607.

the fact that some servers accepted $vxKZZEW3pJmuKI4V320ZzWOr0d2O9MqJ+/WMPsUIEqM whereas others rejected it appears to be because different code paths use different values of auth_events when performing event auth: some use a locally-calculated set of events, whereas others use the auth events provided by the event itself.

The spec says:

The receiving server must ensure that the event:
...

  • Passes authorization rules based on the event's auth events, otherwise it is rejected.
  • Passes authorization rules based on the state at the event, otherwise it is rejected.

I'm far from convinced that both of these are happening on all code paths.

Was this page helpful?
0 / 5 - 0 ratings