Synapse: Redact all events older than a certain time

Created on 30 Dec 2016 · 46 Comments · Source: matrix-org/synapse

We could add this to the prune API: when you prune a room, you could additionally redact all of those events, so that their content is removed in federated rooms too.



I'm not happy with this. Pruning is intended to save space, not to erase history. Redaction and pruning are entirely different things: by pruning, you are saving space on your server, not on everyone's.

I would consider it counterproductive to the history-first nature of Matrix to start allowing mass history removal across servers.

It depends on the use case.

The history-first directive isn't always what's needed.

Maybe make it configurable

I want to be clear that what you're asking is also against the use case of the prune API.
Its sole purpose is to free up space on the host; it is not intended for redaction. Redaction is a different concept, designed for removing events across servers.

What you are asking for is a separate API to remove events in bulk, as a redaction. I don't see this happening due to my reasons given before.

TL;DR - The prune API is a local admin api to free space, not to delete history.

I clarified the title and the initial message, so it is clear what I aim for.

There are several use cases where it is desirable to clear old messages for the sake of data minimization and data parsimony.

This was requested before already here: https://github.com/matrix-org/synapse/issues/1480

If you want to delete old messages from your server, that's fine. You have exactly no right--and _no ability to enforce_--that I also delete them.

I don't understand what you are hoping to achieve by redacting rather than pruning. The only difference is that everyone's server gets affected vs. just yours. Redaction will leave some data, and as Erik explained here, you will still lose more or less the same amount of information in both cases. On top of that, deleting events entirely is literally impossible due to Matrix's design, so signatures are always left.

The only pro I can see to your argument is that by redacting, people can't paginate back to get the messages, which causes issues for people like me who want to retain all of their history where they can.

TL;DR

If you want to delete old messages from your server, that's fine. You have exactly no right--and no ability to enforce--that I also delete them.

EDIT: To clarify, #1480 is a bug report that predates pruning and later turned into the pruning feature.

You can't unsend email, or unsend paper mail, or unsay things in general.

that is true, you cannot unsay it, but you could globally flag it as redacted

You can demand people treat it as unsaid, but you can't actually unsay it.

yes, that is what I meant.

And additionally you _could_ prevent an email from being sent if it is still in your outbox. In Matrix terms: you could prevent a redacted message from being federated to other servers, if no other servers are connected to your room yet. This is actually often wanted in private rooms by people who are trying to avoid excessive data collection on the internet.

> And additionally you could prevent an email from being sent, if it is still in your outbox.

Well, yes. Things that haven't happened yet are generally easy to undo. However, as soon as that message, matrix or email, touches another server, you've lost control of it. Period. No take-backs. So you have to run that bulk redact before anyone from another server enters the room. After that point the window for redaction is, uh, tiny.

I am perfectly fine with that.

What I aim for here is an option to automatically redact all messages older than a certain timeframe: not deleted from the database (that's what #1621 is about), but redacted, the way redaction already works today.

So if that option is turned on, old messages are no longer shown in clients (Riot), although they theoretically still exist as "redacted" in the database, so normal users cannot scroll back in history further than this time.

If the room is federated, this redaction flag should be federated too, so the admin has full control over the history of the room.

A script to redact the history would start like this:

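#!/bin/bash

# this script will redact all messages of a given room older than a definable age
# requirements: sqlite3, GNU date, curl and jq; run it next to a SQLite homeserver.db

DOMAIN=yourserver.tld
# this user needs a power level in the room that allows redacting other users' events:
ADMIN="@username:$DOMAIN"
# placeholder: an access token for $ADMIN
ACCESS_TOKEN='<access token>'

# choose the room to redact old messages from
ROOM='!cURbafjkfsMDVwdRDQ:matrix.org' # for example: "Matrix HQ"

# choose a time before which the messages should be redacted:
# TIME='2016-08-31 23:59:59'
TIME='3 months ago'

# creates a timestamp in milliseconds (the unit of received_ts) from the given time string:
UNIX_TIMESTAMP=$(date +%s%3N --date='TZ="UTC+2" '"$TIME")

# ".timeout" waits up to 20 s for a locked database; unlike "pragma busy_timeout=...",
# it does not add the timeout value to the query output
BUFFER=$(sqlite3 -cmd ".timeout 20000" homeserver.db "select event_id from events where type='m.room.message' and received_ts<$UNIX_TIMESTAMP and room_id='$ROOM' order by received_ts;")

# percent-encode a URL path segment (room and event IDs may contain '!', ':' or '/')
urlencode() { jq -rn --arg v "$1" '$v|@uri'; }

for EVENT_ID in $BUFFER; do
  # use the client-server API to redact the event; the last path segment is a unique transaction ID
  # (assumes the client API is reachable at https://$DOMAIN)
  curl -s -X PUT \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{"reason": "history cleanup"}' \
    "https://$DOMAIN/_matrix/client/v3/rooms/$(urlencode "$ROOM")/redact/$(urlencode "$EVENT_ID")/redact-$(date +%s%N)"
done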

> If the room is federated, this redaction flag should be federated too, so the admin has full control over the history of the room.

And then you hit a server run by someone like me that's been patched to ignore the flag, or never implemented it to begin with. Oops.

The admin does not, and cannot, have that kind of control. This is a _fundamental property_ of any distributed system whose nodes are owned by unrelated entities. Imagine going up to Google and demanding they remove from the inboxes of their users every message older than X days. That's basically what you're asking for here.

And the point of redaction is that stuff is deleted from the database, leaving only a tombstone whose sole function is to prevent the room from becoming broken.

I don't understand what the problem is with flagging a message as "redacted by ...". And yes, every federated server can choose how to handle that flag, which is fine.

The problem isn't a "redacted by" thing, it's that having an auto-redact state entry creates a false sense of security, and an even falser sense of control.

The "sense of security" wouldn't be false if the history length were visible in the header of the room.

Look at Telegram: there are rooms that delete everything after a few minutes, and this is very visible to the user.

And moderated rooms are a fine option in chat systems like Slack and Matrix. There just has to be a fine-grained configuration option for who is allowed to delete messages, or whether it is allowed at all.

And it has to be transparent.

> The "sense of security" wouldn't be false if the history length were visible in the header of the room.

It would be entirely a lie if any server in the room ignored the history length. Which they will. So the only non-wishful history length it would be valid to display is "messages might be retained forever".

To put it another way, I can put Delete-after: 2d in my emails, and write a client that advertises the option to set that header, but that means _absolutely nothing_ if your mail system doesn't honour it. It just means people will incorrectly think the messages will self-destruct.

Telegram can do this because it's a closed system where one party controls all the servers, and using a third-party client is difficult. Neither of those applies to Matrix, except in a _strictly_ non-federated context.

https://github.com/matrix-org/synapse/issues/1621#issuecomment-269859232 by @kythyria

> If and only if the room is completely unfederated, and the server honours the relevant messages, will redaction do what @rubo77 seems to think it does.

So is this all true?

  1. redacting does "flag" a message in the database, so it _should_ not be shown in clients (but still _could_ be shown anyway)
  2. this "flag" is federated to other servers too
  3. if every client honoured the flag and stopped showing redacted messages, they would not be visible anywhere anymore (Riot does honour this)
  4. if clients don't honour it, they can still show the content of messages that were redacted
  5. if a server is not federated with other servers, completely deleting the content of a message could be implemented in the future

I also would like the ability to clear history in a room on my homeserver.

In the interim I just run these three SQL queries on my homeserver.db...

delete from events where room_id = '...';
delete from event_json where room_id = '...';
delete from event_push_actions where room_id = '...';

If I were to go a step further, I would parse each event type and, if it's a media message, delete the appropriate resources from the content repository, and then expose this feature in the UI to admins. But for now this plus clearing the caches in Riot is sufficient for my needs.

@kfatehi I think you are causing havoc in your database like this. Many more tables are affected, and federation completely breaks if you just delete stuff directly.

Please use the implemented prune functionality for this.

@rubo77 Thanks for the comment. I am not familiar with prune -- reading the thread above it sounds like it doesn't actually delete messages, and for that I'd have to redact. A script that goes through and redacts everything might be good, but I'm not sure how effective redaction is in a situation like seizure of a homeserver. I'd need to audit these mechanisms and find out for sure.

Had this room been anything but a private direct-chat without federation, I'd have been more cautious!

Keeping an eye on https://github.com/vector-im/riot-web/issues/3104 -- thanks for creating these.

There is a purge feature that really deletes the messages: #911

This request was another idea: instead of pruning, redact.

@rubo77

> On top of that, deleting events entirely is literally impossible due to Matrix's design, so signatures are always left.

Care to elaborate on this?

The problem is the following: in some rooms, the history simply needs to be deleted after a certain time. Really deleting the messages is not possible if the room is federated, because you can only delete them on your own homeserver, and they will be federated back from other homeservers.

The only solution at the moment is to redact all old posts; the redactions will then be federated. (I am aware that some homeservers could be modified not to obey the redaction flag, but the solution would be "best effort".)

It would be easy to write a script that redacts all posts older than a certain time, so this would be a nice feature to have directly in the room configuration.

Such an option should be completely transparent to all members, so that you can see that anything you write in that room will only last that long.
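A rolling history window like this reduces to one cutoff timestamp that is recomputed on every run; everything older than it gets redacted. A minimal sketch, assuming the window is configured in days and that timestamps are in milliseconds (as Synapse's `received_ts` and `origin_server_ts` are); `cutoff_ms` and `is_expired` are hypothetical helper names, not part of Synapse:

```shell
#!/bin/bash
# Hypothetical helpers for a rolling history window (not part of Synapse).

MS_PER_DAY=86400000

# cutoff_ms NOW_MS DAYS: print the timestamp (ms) before which events are outside the window
cutoff_ms() {
  local now_ms=$1 days=$2
  echo $(( now_ms - days * MS_PER_DAY ))
}

# is_expired EVENT_TS_MS CUTOFF_MS: exit status 0 if the event is older than the cutoff
is_expired() {
  (( $1 < $2 ))
}

# Example: a 90-day window, evaluated now
NOW_MS=$(( $(date +%s) * 1000 ))
echo "redact events with received_ts < $(cutoff_ms "$NOW_MS" 90)"
```

Running a redaction script with such a cutoff periodically (for example from cron) would give the "only lasts that long" behaviour on this homeserver; servers that ignore redactions still keep their copies.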

@rubo77 What about "signatures are always left"? I don't understand this part.

@Half-Shot said:

> signatures are always left.

I can only guess what he meant: if you redact messages, some residue is left in the database, for example the date of the posts and who posted them, but "signature" is not the correct term for these "relics".

@rubo77 so metadata?

I don't think metadata should be left on servers forever, that's a privacy nightmare and there's no reason for that.

After redacting, can we keep only the signature on servers, without the metadata (message text content, etc.)? As I understand it, servers use the signature to validate message content; but if a message is redacted, can we skip validation and accept the cleaned-up message with a 'redact' flag and the retained signature?

Redacted messages contain a copy of the redaction message, the id, timestamp, and sender, as far as I can tell. The content is gone (this is all assuming that redaction is correctly implemented, which of course there are _zero_ guarantees about).

The signature validation is designed so that this works (and the redaction message isn't part of the signature, nor could it be). Matrix relies on the signatures chaining together in order for a room to stay coherent, so there needs to be enough for the validation to work.
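What survives is decided by the redaction algorithm: an event is stripped to a fixed set of top-level keys plus a few type-specific `content` keys, and its signatures are computed over that stripped form (which still includes the content hash), so a redacted copy still verifies. A rough sketch of the stripping step, assuming jq is installed and using the key lists for the original room version; newer room versions keep slightly different keys, so check the Matrix spec before relying on this. `redact_event` is a hypothetical helper name:

```shell
#!/bin/bash
# redact_event: read one event (JSON) on stdin and print its redacted form.
# Key lists follow the Matrix redaction algorithm for room version 1 (newer versions differ).
redact_event() {
  jq -c '
    # top-level keys that survive redaction
    def keep_top: ["event_id", "type", "room_id", "sender", "state_key", "content",
                   "hashes", "signatures", "depth", "prev_events", "prev_state",
                   "auth_events", "origin", "origin_server_ts", "membership"];
    # content keys that survive, per event type; all other content is dropped
    def keep_content: .type as $t
      | {"m.room.member": ["membership"],
         "m.room.create": ["creator"],
         "m.room.join_rules": ["join_rule"],
         "m.room.power_levels": ["ban", "events", "events_default", "kick", "redact",
                                 "state_default", "users", "users_default"],
         "m.room.aliases": ["aliases"],
         "m.room.history_visibility": ["history_visibility"]}
      | .[$t] // [];
    keep_content as $ck
    | with_entries(select(.key as $k | any(keep_top[]; . == $k)))
    | .content = ((.content // {}) | with_entries(select(.key as $k | any($ck[]; . == $k))))
  '
}
```

Feeding it an `m.room.message` event keeps `event_id`, `type`, `room_id`, `sender` and the other structural keys but leaves `content` empty, which lines up with the description above of what a redacted message retains.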

@kythyria thanks for the description!

I want to say more about the privacy problem with deletion. Matrix developers very often warn about privacy issues on feature requests about deleting rooms and messages: with federation we can't control other servers and can't be sure that they remove messages and rooms, so they don't want to implement deletion (fully removing rooms, self-destructing messages, etc.) in the Matrix protocol. But most rooms on a server are usually not federated and can be successfully cleaned up on one homeserver with full guarantees. Users miss this feature, even when the room is not federated.

So a good approach to deletion would be to check whether the room is federated and, when a user tries to clean something up, show a "large red warning" on the client side explaining that the room's data is removed only on this homeserver and may be kept on other federated servers. And add a per-room option "Disable federation".

This is better than ignoring all deletion feature requests from users with "this is insecure so it will not be implemented".
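The "check if room is federated" step can be approximated by looking at the server part of each member's user ID: if every member is on the local server, a cleanup on this homeserver has nothing to chase elsewhere. A minimal sketch, assuming the caller already has the list of user IDs (for example from the client-server `/joined_members` endpoint); `is_federated` is a hypothetical helper, and it only sees current members, not servers whose users have already left and may still hold copies of the history:

```shell
#!/bin/bash
# is_federated LOCAL_SERVER USER_ID...
# Exit status 0 if at least one member belongs to another server.
# Matrix user IDs look like @localpart:server.name[:port]; the localpart cannot
# contain ':', so everything after the first ':' is the server name.
is_federated() {
  local local_server=$1 user
  shift
  for user in "$@"; do
    if [[ ${user#*:} != "$local_server" ]]; then
      return 0
    fi
  done
  return 1
}

# Example: show the "large red warning" before a local-only cleanup
if is_federated "yourserver.tld" "@alice:yourserver.tld" "@bob:matrix.org"; then
  echo "WARNING: this room is federated; deleting here does not delete the copies on other servers."
fi
```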

@MurzNN what you describe is basically what i described here: https://github.com/matrix-org/synapse/issues/1621#issuecomment-269857023

Yes, great conclusion! So someone please implement this behaviour.

What can we do to help accelerate the development in this direction, so we get these options?

We can already implement this feature now via a bot; here is the issue: https://github.com/turt2live/matrix-wishlist/issues/82
This is not much work, so if anybody has free time or programming resources, they could build the bot, based on Go-NEB for example.

It seems there is now an admin command in Synapse for purging room history: https://github.com/matrix-org/synapse/blob/master/docs/admin_api/purge_history_api.rst

@MurzNN this API doesn't delete events, but just some state related stuff, AFAIK

see https://github.com/matrix-org/synapse/blob/master/synapse/storage/events.py#L2014

Any news here? An optional per-room auto-deletion feature is strongly needed!

I added a script to the contrib section, that you can use: https://github.com/matrix-org/synapse/tree/develop/contrib/purge_api
This script only purges the history, so if the rooms are federated, the messages are not gone (unless purged everywhere)
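For the local-only route, the purge history admin API linked above takes a room ID and a cutoff. A minimal sketch, assuming a current Synapse where the endpoint is `POST /_synapse/admin/v1/purge_history/<room_id>` with a millisecond `purge_up_to_ts` in the JSON body, a server admin's token in `ACCESS_TOKEN`, the homeserver's base URL in `HOMESERVER`, and jq for URL-encoding; `purge_body` and `purge_room_history` are hypothetical helper names:

```shell
#!/bin/bash
# Hypothetical helpers around Synapse's purge history admin API.
# Assumes: HOMESERVER (base URL) and ACCESS_TOKEN (a server admin's token) are set.

# purge_body CUTOFF_MS DELETE_LOCAL: print the JSON body for the purge request
purge_body() {
  local cutoff_ms=$1 delete_local=$2   # delete_local: true or false
  printf '{"delete_local_events": %s, "purge_up_to_ts": %d}' "$delete_local" "$cutoff_ms"
}

# purge_room_history ROOM_ID CUTOFF_MS [DELETE_LOCAL]: start a purge (DELETE_LOCAL defaults to false)
purge_room_history() {
  local room_enc
  room_enc=$(jq -rn --arg v "$1" '$v|@uri')   # room IDs contain '!' and ':'
  curl -s -X POST \
    -H "Authorization: Bearer $ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(purge_body "$2" "${3:-false}")" \
    "$HOMESERVER/_synapse/admin/v1/purge_history/$room_enc"
}
```

For example, `purge_room_history '!cURbafjkfsMDVwdRDQ:matrix.org' 1483228800000 true` would ask Synapse to purge that room's history before 1 January 2017 (UTC), including events sent by local users; the reply carries a `purge_id` whose progress can be polled via `GET /_synapse/admin/v1/purge_history_status/<purge_id>`. This only touches the local database: other servers in the room keep their copies.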

@rubo77 thanks. What is the best way to discuss it if the script does not work for me?

If you have enhancements to the script then create a pull request here.

Or contact me in https://riot.im/app/#/room/#synapse-admins:yuhu.ddns.net as user rubo77

@rubo77: I created a simple Python script that can remove messages after a predefined timeout.
https://github.com/matrix-org/synapse/pull/4206

I think it unlikely this is a feature we will add to synapse.

But MSC2228 (Self destructing events) is in proposed-final-comment-period; isn't it related to this feature?

that's about events which get redacted after a certain period (eg '1 hour') which is different to an API which redacts all events older than a certain point in time (eg '06:00 today')
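The distinction in the reply above comes down to what an event's timestamp is compared against. With per-event self-destruction each event carries its own deadline (assumed here to be its timestamp plus a lifetime; where the period actually starts is up to the proposal), whereas the API requested in this issue uses a single absolute cutoff and every older event goes. A minimal sketch with hypothetical helper names, millisecond timestamps, and GNU `date` for parsing the absolute time:

```shell
#!/bin/bash
# Two ways to decide whether an event should be redacted (hypothetical helpers).

# Self-destructing events: each event expires LIFETIME_MS after its timestamp (assumption).
# self_destruct_due EVENT_TS_MS LIFETIME_MS NOW_MS -> exit status 0 if the deadline has passed
self_destruct_due() {
  (( $1 + $2 <= $3 ))
}

# This issue: one absolute point in time, e.g. "06:00 today"; everything before it goes.
# absolute_cutoff_ms "TIME STRING" -> milliseconds since the epoch (GNU date)
absolute_cutoff_ms() {
  date --date="$1" +%s%3N
}

# older_than_cutoff EVENT_TS_MS CUTOFF_MS -> exit status 0 if the event predates the cutoff
older_than_cutoff() {
  (( $1 < $2 ))
}
```

With `absolute_cutoff_ms 'today 06:00'` every run shares one cutoff, so a bulk redaction can select all matching events in a single query; the self-destruct variant instead needs a per-event check (or timer).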

Please reopen.

I plan to create a contribution, that adds this as an external script
