Synapse: Ways to decrease size of Synapse database - state_groups_state table is too large

Created on 12 May 2020 · 5 Comments · Source: matrix-org/synapse

Very large Synapse PostgreSQL databases are a common problem, and I can't find any good instructions on how to reduce their size, so I'm filing this support issue to make the answer available to others via search.

Our public homeserver ru-matrix.org has about 20 local users, and after 2 years of operation the total size of its PostgreSQL database has reached 123 GB!

As I understand it, the main tables that store all events are events (8 GB) and event_json (22 GB); all other tables hold data generated from them.

But in our database most of the space is taken by the state_groups_state table, which has 274 million rows and is 51 GB in size!

Is there any way to reduce its size? Are there scripts to compress, deduplicate, regenerate, optimize, or purge old/unused data in it?

I have already tried running synapse-compress-state on the larger rooms on my homeserver (Matrix HQ, Riot-*, KDE, etc.), but the total size of the state_groups_state table is still large, so I don't know what to do next.
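
To decide which rooms are worth compressing, it may help to count state_groups_state rows per room. The following is only a sketch: `largest_state_rooms` is a hypothetical helper, and it assumes a DB-API 2.0 cursor (for example from psycopg2) connected to the Synapse database, and that state_groups_state has a `room_id` column. On a table with hundreds of millions of rows this query can take a long time.

```python
# Hypothetical helper, not part of Synapse.
# Assumes a DB-API 2.0 cursor with the "%s" parameter style (e.g. psycopg2)
# and that state_groups_state has a room_id column.
ROOMS_BY_STATE_ROWS_SQL = """
SELECT room_id, count(*) AS row_count
FROM state_groups_state
GROUP BY room_id
ORDER BY row_count DESC
LIMIT %s
"""


def largest_state_rooms(cursor, limit=20):
    """Return (room_id, row_count) pairs for the rooms with the most state rows."""
    cursor.execute(ROOMS_BY_STATE_ROWS_SQL, (limit,))
    return [(room_id, int(row_count)) for room_id, row_count in cursor.fetchall()]
```

The rooms at the top of the result are the most likely candidates for another synapse-compress-state run.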

Compared with the actual state storage table, state_events, which has 5 million rows, I can't understand why state_groups_state is so large.

Does it store many duplicates of the data in state_events? If so, is there a way to deduplicate it?

If these are not duplicates, can I purge old or rarely used data from the local database and sync it on demand from other servers via federation?

All 5 comments

Here are the table size statistics for our Synapse database:

relation | total_size | rows
-- | -- | --
state_groups_state | 51 GB | 274929152
event_json | 22 GB | 14572037
events | 8853 MB | 14609418
event_edges | 8477 MB | 15573624
event_reference_hashes | 4519 MB | 14520730
stream_ordering_to_exterm | 4340 MB | 2154655
event_auth | 3719 MB | 17286570
event_search | 3637 MB | 8290099
received_transactions | 2815 MB | 9915562
event_to_state_groups | 2555 MB | 11454427
room_memberships | 2102 MB | 5461632
current_state_delta_stream | 1306 MB | 6627053
state_events | 1232 MB | 5625349
current_state_events | 958 MB | 1272631
cache_invalidation_stream | 850 MB | 4414804
receipts_linearized | 794 MB | 249685
presence_stream | 771 MB | 190234
state_groups | 604 MB | 2960779
event_forward_extremities | 347 MB | 2129
state_group_edges | 337 MB | 3225766

Maybe related to #3364.
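
For anyone who wants to produce a similar size table for their own server, here is a minimal sketch. It is not necessarily how the table above was generated: `table_size_report` is a hypothetical helper, it assumes a DB-API 2.0 cursor (for example from psycopg2) and that Synapse's tables live in the `public` schema, and the row counts come from PostgreSQL's planner estimates (`reltuples`), so they are approximate.

```python
# Hypothetical helper, not part of Synapse.
# Assumes a DB-API 2.0 cursor with the "%s" parameter style (e.g. psycopg2)
# and that the Synapse tables are in the "public" schema.
TABLE_SIZES_SQL = """
SELECT c.relname AS relation,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       c.reltuples::bigint AS row_estimate
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r' AND n.nspname = 'public'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT %s
"""


def table_size_report(cursor, limit=20):
    """Return the largest tables as a markdown table (size includes indexes and TOAST)."""
    cursor.execute(TABLE_SIZES_SQL, (limit,))
    lines = ["relation | total_size | rows", "-- | -- | --"]
    for relation, total_size, row_estimate in cursor.fetchall():
        lines.append(f"{relation} | {total_size} | {int(row_estimate)}")
    return "\n".join(lines)
```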

I wrote a load of stuff about how state_groups_state works at https://github.com/matrix-org/synapse/wiki/State-Groups.

Honestly, state_groups_state being only twice the size of event_json doesn't sound so bad to me. I think this is probably just a duplicate of #3364.

> Honestly, state_groups_state being only twice the size of event_json doesn't sound so bad to me.

The event_json table contains the full data of each event, including the message body. So we should compare row counts rather than total size: state_groups_state has about 18 times more rows than event_json, and about 48 times more than state_events. And yes, let's continue investigating this issue in #3364.
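
The row-count comparison can be checked against the figures in the statistics table posted earlier in this thread. This is just a small arithmetic sketch: the numbers are copied from that table, and `row_ratio` is a name made up for illustration.

```python
# Row counts copied from the statistics table earlier in this thread.
ROWS = {
    "state_groups_state": 274929152,
    "event_json": 14572037,
    "state_events": 5625349,
    "state_groups": 2960779,
}


def row_ratio(table, other, rows=ROWS):
    """How many times more rows `table` has than `other`."""
    return rows[table] / rows[other]


for other in ("event_json", "state_events", "state_groups"):
    ratio = row_ratio("state_groups_state", other)
    print(f"state_groups_state / {other}: {ratio:.1f}x")
```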

I have found a good article on ways to compress the Synapse database: https://levans.fr/shrink-synapse-database.html
