Synapse: Purge old cleartext on my homeserver

Created on 9 Mar 2018 · 17 Comments · Source: matrix-org/synapse

User story: I am a system administrator setting up a homeserver for strangers. I want to delete any cleartext messages older than 30 days, so I have limited access to user content.

Is this possible with Synapse? Dendrite?

Related: https://github.com/matrix-org/synapse/issues/2963

All 17 comments

You can redact messages via API, though the db will still contain them.
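
A minimal sketch of the redaction route, assuming the standard Matrix client-server endpoint `PUT /_matrix/client/r0/rooms/{roomId}/redact/{eventId}/{txnId}`. The helper names, the reason text, and the transaction ID format are hypothetical, and `urlencode` assumes ASCII identifiers. As noted above, the database will still contain the redacted events.

```shell
# Sketch only: helper names are illustrative, not part of Synapse.

# Percent-encode a string for use as a URL path segment (ASCII input assumed).
urlencode() (
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s' "$out"
)

# Build the redact URL. Arguments: base_url room_id event_id txn_id
redact_url() {
    printf '%s/_matrix/client/r0/rooms/%s/redact/%s/%s' \
        "$1" "$(urlencode "$2")" "$(urlencode "$3")" "$(urlencode "$4")"
}

# Send the redaction. Arguments: base_url access_token room_id event_id
redact_event() {
    curl --silent --request PUT \
        --header "Authorization: Bearer $2" \
        --header "Content-Type: application/json" \
        --data '{"reason":"retention policy"}' \
        "$(redact_url "$1" "$3" "$4" "redact-$(date +%s)-$$")"
}
```

Room and event IDs contain characters such as `!`, `$` and `:`, which is why the sketch encodes each path segment before building the URL.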

What you probably want is the purge API, which is in master but not in 0.26. I don't have a release date for 0.27, but I can confirm that the purge API will be part of that release.
https://github.com/matrix-org/synapse/blob/develop/docs/admin_api/purge_history_api.rst

Note, if the server federates, then the room data will live equally across all servers that join the room.

FWIW 0.27.0 release candidate should be out later today

Here is my small private maintenance script, using PostgreSQL on localhost, as an example:

#!/bin/bash

logger "$0 started."

HOMEBASE="http://localhost"
ADMIN="@admin:matrix.example.org"

DBNAME="synapse"

TOKEN=$(sudo -u postgres psql -t -A --dbname="$DBNAME"  --command="SELECT token FROM access_tokens WHERE user_id='$ADMIN' ORDER BY id DESC LIMIT 1;" 2>/dev/null)

TIME='30 days ago'
# Unix timestamp in milliseconds
UNIX_TIMESTAMP=$(date +%s%3N --date='TZ="UTC" '"$TIME")
ROOMS=$(sudo -u postgres psql -t -A --dbname="$DBNAME" --command="SELECT room_id FROM rooms;" 2>/dev/null)

echo "### MATRIX MAINTAINS"
echo "### purge history at $TIME:"

date --date='TZ="UTC" '"$TIME"

for ROOM_NAME in $ROOMS; do
    echo "ROOM_ID: $ROOM_NAME"
    curl --silent --header "Content-Type: application/json" --request POST \
    --data '{"purge_up_to_ts":'"$UNIX_TIMESTAMP"',"delete_local_events": true}' \
    "${HOMEBASE}:8008/_matrix/client/r0/admin/purge_history/${ROOM_NAME}?access_token=${TOKEN}"
done

echo "### purge media cache:"
curl --silent --request POST "${HOMEBASE}:8008/_matrix/client/r0/admin/purge_media_cache?before_ts=${UNIX_TIMESTAMP}&access_token=${TOKEN}"

echo "### list rooms:"
sudo -u postgres psql -t -A --dbname="$DBNAME" --command="SELECT room_id, name FROM room_names;" 2>/dev/null

echo "### done."

logger "$0 stopped."

exit 0

# eof
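
The cutoff arithmetic in the script relies on GNU date extensions (`%3N` and relative dates). A sketch of the same calculation with plain shell arithmetic follows; the helper names `cutoff_ms` and `purge_body` are hypothetical, and the JSON fields are the ones the script above already sends.

```shell
# Milliseconds since the epoch for a point DAYS days before NOW_SECONDS.
# Arguments: now_seconds days
cutoff_ms() {
    echo $(( ($1 - $2 * 86400) * 1000 ))
}

# JSON body for the purge_history call, matching the fields used in the script.
# Argument: cutoff timestamp in milliseconds
purge_body() {
    printf '{"purge_up_to_ts":%s,"delete_local_events":true}' "$1"
}

# Fixed example: 30 days before 1520553600 (9 Mar 2018 00:00:00 UTC).
cutoff_ms 1520553600 30    # prints 1517961600000

# With the current time instead (date +%s is widely available, though not strictly POSIX):
# purge_body "$(cutoff_ms "$(date +%s)" 30)"
```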

@ukcb And why postgresql? Default config of matrix uses sqlite3. So how would that work?

Postgresql gives me more options. https://github.com/matrix-org/synapse#using-postgresql

Sorry, I don't use sqlite3 here.
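
For a homeserver still on the default SQLite database, only the two database lookups in the script would need to change, since the purge itself goes through the HTTP API. A sketch, assuming the `sqlite3` command-line tool is installed, the database file is readable by the current user, and the tables match the ones queried in the script above. The function names and the example path are hypothetical.

```shell
# Room IDs, one per line. Argument: path to the SQLite database file
list_rooms_sqlite() {
    sqlite3 "$1" "SELECT room_id FROM rooms;"
}

# Most recent access token for a user. Arguments: db_path user_id
# The user ID is interpolated into the SQL, so pass trusted values only.
latest_token_sqlite() {
    sqlite3 "$1" "SELECT token FROM access_tokens WHERE user_id='$2' ORDER BY id DESC LIMIT 1;"
}

# Usage (path and user ID are examples):
# ROOMS=$(list_rooms_sqlite /path/to/homeserver.db)
# TOKEN=$(latest_token_sqlite /path/to/homeserver.db '@admin:matrix.example.org')
```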

@ukcb But does this even work? https://github.com/matrix-org/synapse/pull/2540 says this API does not remove everything, for whatever reason. I don't get why there is no way to clear all channel data after x days.

The script is only as good as the API. I have noticed myself that the API does not delete everything (see also https://github.com/matrix-org/synapse/issues/3148 and https://github.com/matrix-org/synapse/issues/3189).

@ukcb But how do admins properly purge old data, then? Surely it can't be that the data is stored forever and the server runs low on disk space?

I hope that it will eventually work completely. Nothing is forever. :-)

@ukcb But that doesn't make any sense. Why aren't you writing a proper script, then, one that deletes all database entries older than timestamp x? Doesn't that work? Hasn't someone done that already? I looked for a script but couldn't find any except yours, which just uses the nonsense API.

@makedir The script is not that important; it should only drive the API and not access the database directly. I hope that at some point there will be settings in Matrix that make such scripts superfluous. At the moment, the script does a good job for me, even if it does not delete everything.
Of course, I could delete everything in the database without an API, but that's not the purpose of the script.

We plan to add more administrative functionality to synapse later this year - the idea being that admins can have greater control over data storage etc.

For now the purge API is your best bet.

If you are using your hs for anything other than very light loads, I strongly suggest migrating to postgres.

@neilisfragile There should be some easy way for an admin to do things like that, for example via the Riot client: if you're an admin, just go into the channel settings and click "purge data and media older than 30 days", or set the channel to auto-purge after 90 days, or something like that.

@neilisfragile As a server admin, I would like a central purging option for all rooms. I am concerned with data minimization in the sense of the GDPR.

@makedir - we'd probably want to make it an admin feature of the server itself rather than tie it to any given client.

@ukcb - Nods, this is a popular feature request, though we talked in #1941 about why I don't believe it is a prerequisite for GDPR. As I say, we'll definitely be working towards improved server admin tooling. If you can't wait that long, PRs are always welcome!

So is this not solved by scripting the purge API?

With the script in my pull request https://github.com/matrix-org/synapse/pull/1034 the data really is deleted (unless it is re-federated from another homeserver).
