Synapse: Documentation for media_storage_providers

Created on 25 Mar 2020 · 8 Comments · Source: matrix-org/synapse

I want to use s3_storage_provider.S3StorageProviderBackend.

I configured it, and when images are sent through my Synapse server they are also stored in the S3 bucket.
Here is how I configured it:

media_storage_providers:
- module: s3_storage_provider.S3StorageProviderBackend
  store_local: False
  store_remote: True
  store_synchronous: True
  config:
    bucket: synapse1
    endpoint_url: $HIDDEN_ENDPOINT
    access_key_id: $HIDDEN_KEY
    secret_access_key: $HIDDEN_KEY

But the images are still also stored locally on disk at /var/lib/matrix-synapse/media, and if I remove that folder the images are no longer shown, even though they are still in the S3 bucket. So it seems to me that s3_storage_provider.S3StorageProviderBackend only stores the data as an additional backup copy?

I also could not find any documentation about media_storage_providers in this repo.
Can someone please explain what they are and how they are supposed to work?

My use case is that I want to use only the S3 bucket and no local storage. Is this possible with media_storage_providers?

Labels: docs, help wanted, media-repository

All 8 comments

I'm also interested in what the media storage providers are for.

For me too, the file is stored locally in the file system AND in S3.
The strange thing is that with store_local: False the file is only saved in the file system and NOT in S3!

But if I delete the local directory, all files are still available to me.

I use a Minio server as S3 storage.

What I just noticed is that all pre-generated thumbnails are stored ONLY in S3, as expected. :thinking:

Quoting the original post: "I also could not find any documentation about media_storage_providers in this repo."

The documentation is pretty light, but it seems to be at https://github.com/matrix-org/synapse/blob/develop/docs/media_repository.md. The default config could certainly offer a bit more info:

https://github.com/matrix-org/synapse/blob/883ac4b1bb7c520e928e8a42d7700de7f0d56055/docs/sample_config.yaml#L695-L708

Looking at some of the code, it seems that "local" vs. "remote" in those settings refers to whether the media was uploaded directly to this server or received over federation. It does not refer to whether the data is stored "locally" on the server vs. "remotely" on S3, which is how I originally read it. See https://github.com/matrix-org/synapse/blob/9dfcf47e9bb323f0597ebf8f34a1bcc9f14a02a1/synapse/config/repository.py#L46-L48
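To make the "local" vs. "remote" reading above concrete, here is a minimal Python sketch. It is not Synapse's code: FileInfo and provider_should_store are hypothetical names, and the assumption that federated media carries an origin server name while local uploads carry none is an illustration of the interpretation given in this comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileInfo:
    # Hypothetical stand-in: server_name is None for media uploaded directly
    # to this homeserver, and the origin server for media received over federation.
    server_name: Optional[str]
    file_id: str


def provider_should_store(file_info: FileInfo, store_local: bool, store_remote: bool) -> bool:
    """Return True if a provider configured with these flags would receive the file."""
    is_remote_media = file_info.server_name is not None
    if is_remote_media:
        return store_remote
    return store_local


upload = FileInfo(server_name=None, file_id="abc")
federated = FileInfo(server_name="example.org", file_id="def")

# Flags taken from the configuration in the original post.
print(provider_should_store(upload, store_local=False, store_remote=True))
print(provider_should_store(federated, store_local=False, store_remote=True))
```

Under this reading, store_local: False would keep directly uploaded media out of the provider, which would match the observation earlier in the thread that such files did not reach S3.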

What's your configuration for backup_media_store_path? It looks like the file system store might be enabled if backup_media_store_path is set, see https://github.com/matrix-org/synapse/blob/883ac4b1bb7c520e928e8a42d7700de7f0d56055/synapse/config/repository.py#L118-L126

I have not specified backup_media_store_path. I will take a deeper look into this...

The backup_media_store_path is not the problem. It cannot be used together with media_storage_providers:

https://github.com/matrix-org/synapse/blob/883ac4b1bb7c520e928e8a42d7700de7f0d56055/synapse/config/repository.py#L112-L117
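The two comments above describe a rule that can be sketched roughly as follows. This is a paraphrase for illustration, not the linked Synapse source: resolve_storage_providers, the local ConfigError class, and the shape of the generated file system entry are assumptions.

```python
from typing import Any, Dict, List


class ConfigError(Exception):
    """Hypothetical error type for an invalid configuration."""


def resolve_storage_providers(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the effective provider list, rejecting the two options combined."""
    providers = list(config.get("media_storage_providers", []))
    backup_path = config.get("backup_media_store_path")
    if backup_path:
        if providers:
            raise ConfigError(
                "backup_media_store_path cannot be combined with media_storage_providers"
            )
        # Assumed shape: the backup path becomes a single file system provider.
        providers = [
            {
                "module": "file_system",
                "store_local": True,
                "store_remote": True,
                "config": {"directory": backup_path},
            }
        ]
    return providers
```

With only media_storage_providers set, as in the original post, the list is returned unchanged and no file system backup provider is added.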

Okay, I think I understand how the media storage providers are meant to work.
The bad news: it is not possible to replace the local storage (media_store_path) with the remote storage.

Media storage is managed primarily by MediaStorage, not by the StorageProvider.

https://github.com/matrix-org/synapse/blob/9dfcf47e9bb323f0597ebf8f34a1bcc9f14a02a1/synapse/rest/media/v1/media_repository.py#L160
https://github.com/matrix-org/synapse/blob/9dfcf47e9bb323f0597ebf8f34a1bcc9f14a02a1/synapse/rest/media/v1/media_repository.py#L492
https://github.com/matrix-org/synapse/blob/9dfcf47e9bb323f0597ebf8f34a1bcc9f14a02a1/synapse/rest/media/v1/media_repository.py#L539
https://github.com/matrix-org/synapse/blob/9dfcf47e9bb323f0597ebf8f34a1bcc9f14a02a1/synapse/rest/media/v1/media_repository.py#L648

MediaStorage, in turn, always saves the file to the local directory first:

https://github.com/matrix-org/synapse/blob/9dfcf47e9bb323f0597ebf8f34a1bcc9f14a02a1/synapse/rest/media/v1/media_storage.py#L65-L70

And then this locally stored file is passed on to the StorageProviders:

https://github.com/matrix-org/synapse/blob/9dfcf47e9bb323f0597ebf8f34a1bcc9f14a02a1/synapse/rest/media/v1/media_storage.py#L109-L112
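The order described here (local write first, then hand-off to the providers) can be sketched in a few lines of Python. This is a simplified illustration, not the linked Synapse code: store_media and recording_provider are hypothetical names, and providers are modelled as plain callables.

```python
import os
import tempfile
from typing import Callable, List


def store_media(
    local_root: str,
    relative_path: str,
    data: bytes,
    providers: List[Callable[[str, str], None]],
) -> str:
    """Write the file to the local media store, then pass the local file to each provider."""
    local_path = os.path.join(local_root, relative_path)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    with open(local_path, "wb") as f:
        f.write(data)
    # Providers only ever see a file that already exists on local disk.
    for provider in providers:
        provider(relative_path, local_path)
    return local_path


seen = []


def recording_provider(relative_path: str, local_path: str) -> None:
    # Stand-in for an S3 upload: read the local copy and remember it.
    with open(local_path, "rb") as f:
        seen.append((relative_path, f.read()))


media_path = os.path.join("local_content", "ab", "cdef")
with tempfile.TemporaryDirectory() as root:
    stored_at = store_media(root, media_path, b"image bytes", [recording_provider])
    local_exists = os.path.exists(stored_at)
```

In this model the provider is a secondary copy by construction, which is why the local directory fills up even when S3 is configured.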

I'm slowly beginning to understand what the s3_media_upload.py cleanup job is all about.
https://github.com/matrix-org/synapse-s3-storage-provider#regular-cleanup-job
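As a rough idea of what such a cleanup pass does, here is a hedged Python sketch. It is not the s3_media_upload script: prune_local_media is a hypothetical function, file age is judged by modification time as a simplification, and the remote check is an injected callable so that no S3 API is assumed.

```python
import os
import time
from typing import Callable, List


def prune_local_media(
    local_root: str,
    min_age_seconds: float,
    exists_remotely: Callable[[str], bool],
) -> List[str]:
    """Delete local files older than min_age_seconds that the remote store confirms it holds."""
    removed = []
    cutoff = time.time() - min_age_seconds
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative_path = os.path.relpath(full_path, local_root)
            if os.path.getmtime(full_path) > cutoff:
                continue  # Too recent, keep the local copy.
            if not exists_remotely(relative_path):
                continue  # Never delete a file that has no remote copy.
            os.remove(full_path)
            removed.append(relative_path)
    return removed
```

The remote check before deletion is the important design choice: a file is removed locally only once a provider can serve it instead.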

However, it should not be a problem to simply delete the local files from media_store_path at regular intervals. In that case, the file is served by the first StorageProvider that still holds it.

https://github.com/matrix-org/synapse/blob/9dfcf47e9bb323f0597ebf8f34a1bcc9f14a02a1/synapse/rest/media/v1/media_storage.py#L143-L151
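The fallback on read can be sketched like this. Again, this is an illustration of the behaviour described in this comment, not the linked Synapse code: fetch_media is a hypothetical function and each provider is modelled as a callable that returns the bytes or None.

```python
import os
from typing import Callable, List, Optional


def fetch_media(
    local_root: str,
    relative_path: str,
    providers: List[Callable[[str], Optional[bytes]]],
) -> Optional[bytes]:
    """Serve from local disk if present, otherwise from the first provider that has the file."""
    local_path = os.path.join(local_root, relative_path)
    if os.path.exists(local_path):
        with open(local_path, "rb") as f:
            return f.read()
    for provider in providers:
        data = provider(relative_path)
        if data is not None:
            return data
    return None


# Two stand-in providers: the first is empty, the second holds the file.
empty_provider = {}.get
s3_like_provider = {"local_content/ab/cdef": b"image bytes"}.get
```

For example, calling fetch_media with a media directory that no longer contains the file and the provider list [empty_provider, s3_like_provider] returns the bytes held by the second provider.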

I hope I have analyzed this correctly.
Perhaps one of the responsible developers can confirm whether this is right?

Hey @tristanlins, yes this seems correct to me.

Can we add a link to this issue at the relevant place in the sample config file?

Thanks!
