Synapse: Error when loading particular communities: Failed handle request via 'GroupRoomServlet'

Created on 19 Mar 2019 · 19 Comments · Source: matrix-org/synapse

Description

Whenever I try to load any group/community, I see the following in logs:

Mar 19 12:25:22 benp-synapse docker[11604]: 2019-03-19 12:25:22,453 - synapse.http.server - 112 - ERROR - GET-694 - Failed handle request via 'GroupRoomServlet': <XForwardedForRequest at 0x7f1bbd565be0 method='GET' uri='/_matrix/client/r0/groups/%2Bsdks%3Abpulse.org/rooms' clientproto='HTTP/1.0' site=8008>
Mar 19 12:25:22 benp-synapse docker[11604]: Traceback (most recent call last):
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
Mar 19 12:25:22 benp-synapse docker[11604]:     result = g.send(result)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/synapse/storage/_base.py", line 460, in runWithConnection
Mar 19 12:25:22 benp-synapse docker[11604]:     defer.returnValue(result)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1362, in returnValue
Mar 19 12:25:22 benp-synapse docker[11604]:     raise _DefGen_Return(val)
Mar 19 12:25:22 benp-synapse docker[11604]: twisted.internet.defer._DefGen_Return: {}
Mar 19 12:25:22 benp-synapse docker[11604]: During handling of the above exception, another exception occurred:
Mar 19 12:25:22 benp-synapse docker[11604]: Traceback (most recent call last):
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
Mar 19 12:25:22 benp-synapse docker[11604]:     result = g.send(result)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/synapse/storage/_base.py", line 418, in runInteraction
Mar 19 12:25:22 benp-synapse docker[11604]:     defer.returnValue(result)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1362, in returnValue
Mar 19 12:25:22 benp-synapse docker[11604]:     raise _DefGen_Return(val)
Mar 19 12:25:22 benp-synapse docker[11604]: twisted.internet.defer._DefGen_Return: {}
Mar 19 12:25:22 benp-synapse docker[11604]: During handling of the above exception, another exception occurred:
Mar 19 12:25:22 benp-synapse docker[11604]: Traceback (most recent call last):
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/synapse/http/server.py", line 81, in wrapped_request_handler
Mar 19 12:25:22 benp-synapse docker[11604]:     yield h(self, request)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
Mar 19 12:25:22 benp-synapse docker[11604]:     result = result.throwExceptionIntoGenerator(g)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
Mar 19 12:25:22 benp-synapse docker[11604]:     return g.throw(self.type, self.value, self.tb)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/synapse/http/server.py", line 316, in _async_render
Mar 19 12:25:22 benp-synapse docker[11604]:     callback_return = yield callback(request, **kwargs)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
Mar 19 12:25:22 benp-synapse docker[11604]:     result = result.throwExceptionIntoGenerator(g)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
Mar 19 12:25:22 benp-synapse docker[11604]:     return g.throw(self.type, self.value, self.tb)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/synapse/rest/client/v2_alpha/groups.py", line 355, in on_GET
Mar 19 12:25:22 benp-synapse docker[11604]:     result = yield self.groups_handler.get_rooms_in_group(group_id, requester_user_id)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
Mar 19 12:25:22 benp-synapse docker[11604]:     result = result.throwExceptionIntoGenerator(g)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
Mar 19 12:25:22 benp-synapse docker[11604]:     return g.throw(self.type, self.value, self.tb)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/synapse/groups/groups_server.py", line 546, in get_rooms_in_group
Mar 19 12:25:22 benp-synapse docker[11604]:     room_id, len(joined_users), with_alias=False, allow_private=True,
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
Mar 19 12:25:22 benp-synapse docker[11604]:     result = g.send(result)
Mar 19 12:25:22 benp-synapse docker[11604]:   File "/usr/local/lib/python3.6/site-packages/synapse/handlers/room_list.py", line 393, in generate_room_entry
Mar 19 12:25:22 benp-synapse docker[11604]:     result["m.federate"] = create_event.content.get("m.federate", True)
Mar 19 12:25:22 benp-synapse docker[11604]: AttributeError: 'NoneType' object has no attribute 'content'
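
The last frame shows the direct cause: create_event is None at the point where generate_room_entry reads its .content. The sketch below reproduces that failure mode in isolation and shows one defensive way around it, skipping the affected room rather than failing the whole request. Event, room_entry and rooms_in_group are hypothetical stand-ins written for illustration; this is not Synapse's actual code or its eventual fix.

```python
from typing import Optional


class Event:
    """Hypothetical stand-in for a state event; only `content` matters here."""

    def __init__(self, content: dict):
        self.content = content


def room_entry(room_id: str, create_event: Optional[Event]) -> Optional[dict]:
    """Build a room-list entry, or return None if the create event is missing."""
    if create_event is None:
        # Without this guard, create_event.content raises
        # AttributeError: 'NoneType' object has no attribute 'content'.
        return None
    return {
        "room_id": room_id,
        "m.federate": create_event.content.get("m.federate", True),
    }


def rooms_in_group(rooms: dict) -> list:
    """Skip rooms whose entry could not be built instead of failing the request."""
    entries = (room_entry(room_id, event) for room_id, event in rooms.items())
    return [entry for entry in entries if entry is not None]
```

With this shape, one room without a create event only drops out of the list; the other rooms in the community are still returned.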

Version information

0.99.2, matrix.bpulse.org

Possibly duplicates #4786, but more detail here

bug communities p2

All 19 comments

@benparsons I have the same problem, described in #5824. Did your group recover, or does it still show the error? Can you look up the contents of the group_attestations_remote table in the database with the query SELECT * FROM group_attestations_remote WHERE group_id = '{GROUP_ID}'? Maybe we can find some problems there.

I cleared out the group_attestations_remote and group_attestations_renewals SQL tables, but this doesn't solve the problem :(

I have the same issue here.
synapse: 1.10.1

log:
https://envs.sh/uj.txt

Seeing the 500 error in the browser network inspector for the +containers:mozilla.org community (#6974).

Any idea how to clean it up manually?

Due to the (inter)national state of affairs, a lot of people have started to use Matrix. Communities would be the way to separate rooms into local communities and local languages.
Right now _all_ communities are _completely broken_ on my server, and I don't know what's broken or how to hack, fix or prevent it. People cannot use communities and can't find rooms. They come, then leave Matrix.

Please, shed some light on _what's broken_ and _how to hack it_ to work, or how to periodically fix it. We desperately need communities working _now_, not 6 months in the future when room communities will be implemented. Please? Thanks!

Looks like someone has figured it out in #7097:

I removed references to this room from group_rooms to solve it

delete from group_rooms where room_id='!XxXxXxXxXxXxXxIFYD:matrix.example.org';

(I understand your frustration, but sorry: communities aren't our top priority as a team right now, compared to more pressing concerns like performance. In the meantime, the joy of open-source is that solutions don't have to come from the core dev team. Thanks to @airblag for taking the time to investigate and propose a workaround!)

https://github.com/matrix-org/synapse/pull/7070 should fix missing room_version values in the rooms table.
When 1.11 releases, I am going to:

(On a side note, how do I find out more about a room just with its ID? I have no idea what these two rooms were, there aren't even any aliases in room_aliases for those IDs...)

In my case the room looked perfectly valid and had a version, but trying to join it resulted in various errors on the Synapse side (for reasons not relevant here).

As a result, retrieving the room list failed with a server error, so the room list was never returned, which in turn caused an infinite wait on the client side.

Removing the bad room from group_rooms indeed fixed the problem, but there was no indication of the culprit anywhere, and the call never returned.

I fixed it here manually:

$ sudo -i -u postgres psql

postgres=# \c matrix
You are now connected to database "matrix" as user "postgres".
matrix=# SELECT * from group_rooms, rooms where group_rooms.group_id='+groupname:example.com' and rooms.room_id=group_rooms.room_id;

-- delete each of the group's room IDs, one statement per room
matrix=# DELETE FROM group_rooms WHERE room_id='!roomid:example.com';
...

After this step I used my Riot client to add my rooms back to the community group.
It works fine now.
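
The manual fix above boils down to deleting a room's rows from group_rooms. A cautious way to do that is inside a transaction that first records what is about to be removed. The sketch below illustrates the pattern in Python against an in-memory SQLite table standing in for the real PostgreSQL database. The two-column table layout and the remove_room helper are assumptions made for illustration; the real group_rooms table may have more columns.

```python
import sqlite3


def remove_room(conn: sqlite3.Connection, room_id: str) -> list:
    """Delete one room's rows from group_rooms; return the rows that were removed."""
    with conn:  # commits on success, rolls back if an exception is raised
        doomed = conn.execute(
            "SELECT group_id, room_id FROM group_rooms WHERE room_id = ?",
            (room_id,),
        ).fetchall()
        conn.execute("DELETE FROM group_rooms WHERE room_id = ?", (room_id,))
    return doomed


# Toy data: an assumed, simplified stand-in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE group_rooms (group_id TEXT, room_id TEXT)")
conn.executemany(
    "INSERT INTO group_rooms VALUES (?, ?)",
    [
        ("+groupname:example.com", "!good:example.com"),
        ("+groupname:example.com", "!bad:example.com"),
    ],
)

removed = remove_room(conn, "!bad:example.com")
remaining = conn.execute("SELECT room_id FROM group_rooms").fetchall()
```

Returning the removed rows gives a record of which group each room belonged to, which helps when re-adding the rooms to the community afterwards.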

For me, on synapse:v1.12.0, running DELETE FROM group_rooms WHERE room_version = ''; fixed 8 out of my 10 broken communities. The remaining 2 still had the same failure mode.

I identified the remaining bad rooms in those 2 using @grinapo's clue above ("trying to join it resulted in errors"): I ran "/join !..." in Riot-web, and those attempts produced the logs described in #7108 (matching this regex: "WARNING.*Request failed: GET matrix://!.*:.*/_matrix/federation/v1/make_join/%21.*%3A.*invalid literal for int() with base 10.*").

Running DELETE FROM group_rooms WHERE room_id='<room_id>'; for those rooms immediately restored the communities to working.

I suspect those rooms are obsolete in some way, but have not found more info about them except their room_versions (which are 4 and 5).

I am running a Synapse server on which I am the only user. I have found that when I leave a channel that was in a community, this breaks the community until I remove the relevant room ID from the database table group_rooms.

It seems that communities break when they refer to a room that has no local participants.

This query fixed the two broken communities on my unfederated server:

DELETE FROM group_rooms WHERE room_id IN ( select group_rooms.room_id from group_rooms left join room_stats_current on group_rooms.room_id = room_stats_current.room_id where joined_members = 0 );
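
Before running a DELETE like the one above, it may be worth running its subquery on its own to see which rooms it would match. One detail of the join: because the filter joined_members = 0 is applied after the LEFT JOIN, a room with no row at all in room_stats_current yields NULL and is not matched. The sketch below shows this with an in-memory SQLite database; the simplified two-column tables and the room IDs are made up for illustration, and whether missing stats rows actually occur on a real server is not established in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE group_rooms (group_id TEXT, room_id TEXT);
    CREATE TABLE room_stats_current (room_id TEXT, joined_members INTEGER);
    INSERT INTO group_rooms VALUES
        ('+g:example.com', '!empty:example.com'),
        ('+g:example.com', '!busy:example.com'),
        ('+g:example.com', '!nostats:example.com');
    INSERT INTO room_stats_current VALUES
        ('!empty:example.com', 0),
        ('!busy:example.com', 5);
    """
)

# Same join and filter as the DELETE's subquery, run as a read-only preview.
PREVIEW = """
    SELECT group_rooms.room_id
    FROM group_rooms
    LEFT JOIN room_stats_current
        ON group_rooms.room_id = room_stats_current.room_id
    WHERE joined_members = 0
"""
matched = [row[0] for row in conn.execute(PREVIEW)]

# Variant that also lists rooms with no stats row at all.
WITH_MISSING = (
    PREVIEW
    + " OR room_stats_current.room_id IS NULL ORDER BY group_rooms.room_id"
)
matched_or_missing = [row[0] for row in conn.execute(WITH_MISSING)]
```

Here the first query matches only the room with zero joined members, while the variant also surfaces the room that has no stats row.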

I'm facing this issue as well. Existing Riot clients / sessions are working properly, but when I create a new login, Riot lists only a bunch of unnamed rooms and they never synchronize correctly. Sometimes if I open an empty room it gets the correct room name, but it's unable to fetch the messages or user list. And if I close the Riot client, the room is named "unnamed room" again.

It's the very same kind of situation as here: https://github.com/matrix-org/synapse/issues/5824#issuecomment-526345933

I am running a Synapse server on which I am the only user. I have found that when I leave a channel that was in a community, this breaks the community until I remove the relevant room ID from the database table group_rooms.

It seems that communities break when they refer to a room that has no local participants.

I can confirm this. We are running 2 distinct servers, on domain A and domain B. The room list for a community on server A would not load for a user on server A as long as the community contained a room that no local user was in. The room list for the same community would load fine for a user on server B, though, as server B has at least one user in each of the community's rooms. Rejoining the rooms that no-one from server A was in, with a user from server A, also restored the room list to working condition on server A.

A user was able to reproduce this in https://github.com/matrix-org/synapse/issues/7462 by adding a room to a community, then removing all users from that room. Whenever the community room list is queried, Synapse returns a 500 with OP's traceback.

For me, the issue is not caused by a room in the community that has no members.
I checked that with the SQL command from above:

matrix=# select group_rooms.room_id from group_rooms left join room_stats_current on group_rooms.room_id = room_stats_current.room_id where joined_members = 0;
 room_id 
---------
(0 rows)

So it seems the changes in the pull request do not fix this completely, since the issue can be caused by something else as well.

I could solve the issues by removing a single room from a community. Here are the room stats:

matrix=# select room_stats_current.* from group_rooms left join room_stats_current on group_rooms.room_id = room_stats_current.room_id where group_id='+community:example.com';
       room_id        | current_state_events | joined_members | invited_members | left_members | banned_members | local_users_in_room | completed_delta_stream_id 
----------------------+----------------------+----------------+-----------------+--------------+----------------+---------------------+---------------------------
 !roomid1:example.com |                   73 |             89 |               0 |            7 |              0 |                  54 |                    150246
 !roomid2:example.com |                  117 |            164 |               0 |           12 |              0 |                  92 |                    150300
 !roomid3:example.com |                   81 |            107 |               0 |            4 |              0 |                  66 |                    150312
 !roomid4:example.com |                   55 |             65 |               0 |            2 |              0 |                  43 |                    150365
 !roomid5:example.com |                   46 |             10 |               0 |           67 |              0 |                   0 |                    143200
 !roomid6:example.com |                   39 |             30 |               0 |            2 |              1 |                  28 |                    150019

Removing '!roomid5:example.com' solved the issue.
However, these stats do not seem special to me, so I don't know why this room caused the issue.
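
One column in the table above does differ for that room: local_users_in_room is 0 only for !roomid5, even though joined_members is 10. That would fit the "no local participants" explanation given earlier in the thread, and would also explain why the joined_members = 0 query returned nothing. The short sketch below just re-checks the posted numbers; it is a reading of this one table, not a confirmed diagnosis.

```python
# (room_id, joined_members, local_users_in_room), copied from the table above
stats = [
    ("!roomid1:example.com", 89, 54),
    ("!roomid2:example.com", 164, 92),
    ("!roomid3:example.com", 107, 66),
    ("!roomid4:example.com", 65, 43),
    ("!roomid5:example.com", 10, 0),
    ("!roomid6:example.com", 30, 28),
]

# Rooms that still have members, but none of them on this server.
no_local_users = [room_id for room_id, _joined, local in stats if local == 0]

# Rooms with no joined members at all (what the earlier query looked for).
no_joined_members = [room_id for room_id, joined, _local in stats if joined == 0]
```

If that reading holds, filtering on local_users_in_room = 0 rather than joined_members = 0 may be the more useful check on a federated server.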

Here is the error in the matrix log:

matrix          | 2020-05-27 07:35:21,569 - synapse.http.server - 110 - ERROR - GET-122 - Failed handle request via 'GroupRoomServlet': <SynapseRequest at 0x7f0653421190 method='GET' uri='/_matrix/client/r0/groups/%2Boperations%3Ahive-mind.network/rooms' clientproto='HTTP/1.0' site=8008>
Traceback (most recent call last):
matrix          |   File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
matrix          |     result = g.send(result)
matrix          | StopIteration: [{'room_id': '!roomid2:hive-mind.network', 'is_public': True}, {'room_id': '!roomid1:hive-mind.network', 'is$
matrix          |
matrix          | During handling of the above exception, another exception occurred:
matrix          |
matrix          | Traceback (most recent call last):
matrix          |   File "/usr/local/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
matrix          |     result = g.send(result)
matrix          | StopIteration: [{'room_id': '!roomid2:hive-mind.network', 'is_public': True}, {'room_id': '!roomid1:hive-mind.network', 'is$
matrix          |
matrix          | During handling of the above exception, another exception occurred:
matrix          |
matrix          | Traceback (most recent call last):
matrix          |   File "/usr/local/lib/python3.7/site-packages/synapse/util/caches/descriptors.py", line 462, in _wrapped
matrix          |     cached_result_d = cache.get(cache_key, callback=invalidate_callback)
matrix          |   File "/usr/local/lib/python3.7/site-packages/synapse/util/caches/descriptors.py", line 186, in get
matrix          |     raise KeyError()
matrix          | KeyError
matrix          |
matrix          | During handling of the above exception, another exception occurred:
matrix          |
matrix          | Traceback (most recent call last):
matrix          |   File "/usr/local/lib/python3.7/site-packages/synapse/http/server.py", line 78, in wrapped_request_handler
matrix          |     await h(self, request)
matrix          |   File "/usr/local/lib/python3.7/site-packages/synapse/http/server.py", line 331, in _async_render
matrix          |     callback_return = await callback_return
matrix          |   File "/usr/local/lib/python3.7/site-packages/synapse/rest/client/v2_alpha/groups.py", line 329, in on_GET
matrix          |     group_id, requester_user_id
matrix          | AttributeError: 'NoneType' object has no attribute 'content'

If you need any more information to figure out what caused this, I'll happily provide it.
