Synapse: Remote dynamic thumbnails are stored at the wrong location

Created on 27 Aug 2019  Â·  17Comments  Â·  Source: matrix-org/synapse

Description

The remote thumbnails path and filename are derived from media_id when they're generated and stored with dynamic_thumbnails set to true. The said thumbnails are looked up by filesystem_id which fails every time. Synapse uses a lot of CPU regenerating the thumbnails all the time. When dynamic_thumbnails option is then turned off Synapse can't find the thumbnails generated when the said option was turned on and just returns 404.

Steps to reproduce

  • set dynamic_thumbnails to true
  • try to load a thumbnail of a remote media
  • now set dynamic_thumbnails to false
  • try to load the same thumbnail of the remote media
  • it returns 404

Local media thumbnails are generated and retrieved just fine. For example, I'm trying to retrieve a media with id zOUYnsGQKLQypwBoQkRpnkXh and corresponding filesystem id WFzEWtlvXlCaZShcaIthgzrG. The media itself is stored at /var/lib/matrix-synapse/media/bc/remote_content/matrix.org/WF/zE/WtlvXlCaZShcaIthgzrG and I can retrieve it from my server. The thumbnail should be located at /var/lib/matrix-synapse/media/bc/remote_thumbnail/matrix.org/WF/zE/WtlvXlCaZShcaIthgzrG/30-30-image-png but it's actually stored at /var/lib/matrix-synapse/media/bc/remote_thumbnail/matrix.org/zO/UY/nsGQKLQypwBoQkRpnkXh/30-30-image-png. Why do we even need two identifiers? It confuses things pretty much (it's hard to find an offending file in the storage if you know its id without looking it up in the database) and doesn't add any security.

Version information

  • Homeserver: own server

  • Version: 1.3.1+buster1
  • Install method: apt repository

  • Platform: Debian stable amd64

bug good first issue

Most helpful comment

hopefully fixed by #5915 then. Thanks all.

All 17 comments

This seems to fix it:
```diff
diff --git a/synapse/rest/media/v1/media_repository.py b/synapse/rest/media/v1/media_repository.py
index cf5759e9a6..28073dec4c 100644
--- a/synapse/rest/media/v1/media_repository.py
+++ b/synapse/rest/media/v1/media_repository.py
@@ -526,7 +526,7 @@ def generate_remote_exact_thumbnail(
try:
file_info = FileInfo(
server_name=server_name,
- file_id=media_id,
+ file_id=file_id,
thumbnail=True,
thumbnail_width=t_width,
thumbnail_height=t_height,

thanks @rkfg. Do you know if this is a recent regression, or if it has always always been a problem?

Would you be able to make a PR for your patch? (A regression test would be very helpful too, though I appreciate that may need more time than you have.)

Eh, no idea if it's recent but I noticed that thumbnails didn't load after I cleaned up the cache and database. Didn't have time to dig deeper back then and enabling dynamic thumbnails "fixed" it. But recently I noticed that avatars take too much time to load like 1-1.5 minutes (if they were not cached) so I decided to take a look. I suspected S3 being slow at first but then noticed a big CPU usage and in the end solved it.

I'd like to make a PR but it needs too much hassle for a one-liner so I'm not really excited to do that. Feel free to make that patch yourself, I don't even need to be mentioned. The fix is too trivial. Hope it will help you reduce the load on matrix.org.

I'd like to make a PR but it needs too much hassle for a one-liner so I'm not really excited to do that.

understood, but that means it has to wait for someone else to have time to do it :(

Why do we even need two identifiers? It confuses things pretty much (it's hard to find an offending file in the storage if you know its id without looking it up in the database) and doesn't add any security.

I think basically because the media_id is made up by the remote server, and it's not necessarily a given that it contains only characters which are valid on the local filesystem. @erikjohnston: this seems to have come from #2767: do you remember the logic?

I think basically because the media_id is made up by the remote server, and it's not necessarily a given that it contains only characters which are valid on the local filesystem. @erikjohnston: this seems to have come from #2767: do you remember the logic?

Oh, no. It's because each piece of media can have several thumbnails, and each one needs its own filesystem_id.

(edit: this is bogus)

Why do we even need two identifiers? It confuses things pretty much (it's hard to find an offending file in the storage if you know its id without looking it up in the database) and doesn't add any security.

I think basically because the media_id is made up by the remote server, and it's not necessarily a given that it contains only characters which are valid on the local filesystem. @erikjohnston: this seems to have come from #2767: do you remember the logic?

I think it was mostly that, possibly also so that we could decide where we wanted it on disk (e.g. dedupe the media, or something)

@rkfg I'm very confused here. I agree that something looks bogus, but I cannot reproduce the issue at all. Any hints?

(I have dynamic_thumbnails set to false, but my server seems quite happy to serve thumbnails of remote media)

As far as I can tell, the file that you suggest patching (generate_remote_exact_thumbnail) is only called when dynamic_thumbnails is turned on, so the fix doesn't seem to match the described symptoms?

@rkfg I'm very confused here. I agree that something looks bogus, but I cannot reproduce the issue at all. Any hints?

Do the thumbnail files appear at the correct location? Maybe there were changes between my version and current develop, I can't really test. Is this happening on 1.3.1?

yes, they appear at the location given by the filesystem_id.

I'm testing on current develop, but I don't think this code has been touched in months.

As far as I can tell, the file that you suggest patching (generate_remote_exact_thumbnail) is only called when dynamic_thumbnails is turned _on_, so the fix doesn't seem to match the described symptoms?

It's called when that option is off for me, I cleared the remote media tables and removed the files after applying the fix and it worked. The thumbnails were stored under the wrong names before, I checked the timestamps to make sure they're fresh. Does this fix break them for you then? Something has to change after applying it!

I'm just reluctant to change code without understanding what we're changing.

generate_remote_exact_thumbnail is only called by ThumbnailResource._select_or_generate_remote_thumbnail (https://github.com/matrix-org/synapse/blob/develop/synapse/rest/media/v1/thumbnail_resource.py#L189), which is only called by _async_render_GET (https://github.com/matrix-org/synapse/blob/develop/synapse/rest/media/v1/thumbnail_resource.py#L73), and only when dynamic_thumbnails is on.

Is it possible that the thumbnails were generated in the wrong place when dynamic_thumbnails was enabled?

It's possible, yes. Now that I think about it I really didn't test that case. When there's a database record already for a certain file it won't be regenerated even if the actual file is missing, correct? Then I already had those incorrectly named thumbnails and nuking the tables and data hid the issue.

I fixed the issue description and steps to reproduce.

hopefully fixed by #5915 then. Thanks all.

Was this page helpful?
0 / 5 - 0 ratings