Synapse: Media in the content repo is not authed

Created on 22 Apr 2017 · 35 Comments · Source: matrix-org/synapse

Example, this was shared in a private 3 person chat, but anyone can view it: https://matrix.org/_matrix/media/v1/download/matrix.org/bSRWdHBFqtVzowZDhwRGbzDq

Most people I've recruited into Matrix are Google Hangouts refugees looking for an open platform. On Hangouts, you cannot view the web URL of an image in this way unless you're authenticated with the server and the user has shared it with you in a chat.

Would it be possible to support moving past security through obscurity at some point? Or, failing that, at least expire the images after a week or so?

This is concerning because it would be rather trivial for someone to write a simple app querying random alphanumeric strings to harvest images people have shared in private conversations.

feature media-repository security

kethinov

👍26

Most helpful comment

Encryption is nice, but if I have your file, I could deploy infinite time and resources to brute force that encryption. What you want is to make it as hard as possible for me to get your file in the first place, then encrypt it on top of that. That's why people are so reticent to hand over their phones or laptops to border patrol even when they use full disk encryption. Physical security matters perhaps even more than encryption.

As such, what concerns me here is it's so easy to gain physical access (in a sense) to random people's files by stumbling on a random file just by guessing a single key, rather than having to match at least two matching pairs. In other image sharing services, there are similar long, unique keys to access the image itself, but in addition to that you need to present valid account credentials and that account has to have been given explicit permission to view that image.

I do think it would be prudent to add those additional layers of security here.

kethinov on 20 Jul 2017

👍8 👎2

All 35 comments

Autre31415 on 9 May 2017

👎1

This is no more security through obscurity than any other key-based authentication mechanism; it is called URL-based authentication. The key in your example, bSRWdHBFqtVzowZDhwRGbzDq, is 24 characters long and uses upper- and lower-case letters, which gives 52^24 possible keys, or more than 128 bits.

But let's work through your concern.

Imagine we write your trivial app and start it running...

Assuming the CDN can store 1 PB (petabyte) and the average image size is 1 KB, that's a trillion images (10^12 or 1,000,000,000,000).

Let's assume that you have a really high-speed Internet link and the CDN will let you do 10^8 (100,000,000) queries per second. tcpdump says that a single query is 4.7 KB, so we're doing 470 GB of traffic every second, and apparently both your link and the server are able to handle 3,760 Gbps.

Let's say that no one notices that the server is getting hit by a Denial of Service attack 6 times larger than anything ever seen before, and they let you keep going for 10 years (60*60*24*365*10).

52^24/10^12/10^8/(60*60*24*365*10)

At this point we can determine that you have a 1 in 4,844,775,310,744 chance of getting a random Cat pic...

Meanwhile you have a better chance of getting struck by lightning... while drowning at 1 in 183 million.
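The arithmetic above can be reproduced with a short script. This is only a sketch of the same back-of-the-envelope estimate, using the figures assumed in this comment (1 PB of 1 KB images, 10^8 queries per second, ten years); the variable names are illustrative and nothing here measures a real server.

```python
import math

# Figures taken from the estimate in this comment; all are assumptions.
KEYSPACE = 52 ** 24                    # 24 characters, each one of A-Z or a-z
STORED_IMAGES = 10 ** 12               # 1 PB of storage / 1 KB per image
QUERIES_PER_SECOND = 10 ** 8
SECONDS = 60 * 60 * 24 * 365 * 10      # ten years

# Size of the media ID expressed in bits.
key_bits = math.log2(KEYSPACE)

# Total guesses made over the whole ten-year run.
total_guesses = QUERIES_PER_SECOND * SECONDS

# Same expression as 52^24/10^12/10^8/(60*60*24*365*10):
# the run as a whole has roughly a 1-in-`odds` chance of hitting any image.
odds = KEYSPACE // (STORED_IMAGES * total_guesses)

print(f"media ID strength: {key_bits:.1f} bits")
print(f"odds of a hit in ten years: 1 in {odds:,}")
```

Integer division keeps the result exact for numbers this large, which floating-point division would not.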

Personally I would be more concerned about someone walking up to the server and stealing it... or the server gets hacked due to a bug somewhere... which is why you should be using encrypted chat...

This is what an image looks like when it is sent to a group using encrypted chat:

https://matrix.org/_matrix/media/v1/download/matrix.org/qctIqdoPymLbqdNpOkWZGtvo

If you grab this file (which was a jpeg of a cat) you will notice that it is encrypted.

mphara8437 on 19 Jul 2017

❤2 😄2

It's still less secure than Hangouts et al though because it only requires correctly guessing one key rather than two or more.

To access a privately shared image via Hangouts, you'd have to gain access to a whole account that has been granted permission to view the image, so you'd have to know both the username and the password, which is much harder to randomly guess.

Moreover, some accounts are configured with 2FA, further increasing the security.

This implementation is far from that, and I think addressing this would be worth doing at some point.

kethinov on 19 Jul 2017

👍7 👎2

My understanding of your concern was that the media IDs generated by Synapse left users of Synapse open to a brute-force keyspace attack using a simple app (an understandable concern).

The Matrix specification does not provide details on media-id keyspace, so the keyspace for the media-id can be easily increased to increase security without issue, if required.

However a keyspace attack against the Synapse content repository API implementation is already infeasible, so no change is necessary.

Synapse is the reference implementation for the Matrix specification and adding user authentication to the content repository API would require a change to the Matrix Specification.

To propose changes to the Matrix Specifications see the following:

https://github.com/matrix-org/matrix-doc/blob/master/CONTRIBUTING.rst

PS If you are concerned about privacy, use encryption.

mphara8437 on 20 Jul 2017

I do think it would be prudent to add those additional layers of security here.

kethinov on 20 Jul 2017

👍8 👎2

I totally agree with kethinov. I can imagine that deploying fail2ban on the server to monitor 404 errors would slow down the attacker, but it still does not solve the main issue.

taurhine on 26 Sep 2017

dup of https://github.com/matrix-org/synapse/issues/1403

uhoreg on 16 Oct 2017

See also https://github.com/matrix-org/matrix-doc/issues/701 for the spec issue here.

richvdh on 16 Oct 2017

It is highly unlikely that someone could guess the media URL; the key in each media link is long enough to prevent guessing. The more likely attack vector would be obtaining the URL directly somehow: perhaps it is accidentally posted into a channel, or someone who already has the link shares it without permission, or your browser has a toolbar that is scraping your URL entries without your knowledge, or some other person in the channel has malware on their machine that is sending away data it collects from a channel they are participating in with you, etc.

benqrn on 16 Oct 2017

👍7

Crossposting for the purposes of visibility (source):

I don't think this has been answered somewhere, so asking here in hopes people have ideas: How would federated media work?

In theory the server could start signing requests to download media, although that doesn't really guarantee that the person making the request is allowed to do so (ie: is in the room). With the upcoming introduction of users being linked to key-like objects, we could possibly use those to sign the requests, however there's nothing to stop a server lying about which user is requesting the media.

Then there's the question of the user potentially wanting specific media being publicly accessible. The primary use case being the IRC bridge which pastebins long messages.

turt2live on 5 Jun 2018

So this comes up on a regular basis, especially from corporate security folks who don't like the idea that a URL leaked in HTTP logs (or proxy logs) etc could then be simply curl'd by any random user to access the content. It's not a matter of the chances of guessing the URL correctly (or the chances of being hit by lightning) but instead whether an attacker who does manage to get the URL automagically gets access to the content too.

One thing we could do is to auth access to the content itself, but this means tracking the event(s) that the content is referenced by and in turn which users have access to those events and so can view the content. This is a potentially nasty leak of metadata for e2e attachments which we don't currently have otherwise. (It's possible we might need this for quotas as per https://github.com/matrix-org/synapse/issues/3339, but hopefully not). It's also quite heavy for the media repo to have to check auth rules for a room for every piece of content that is viewed (and is a bit unfortunate if the media repo is otherwise independent of the room server).

An alternative naive solution could be to just track a random bearer token alongside each mxc:// URL for each piece of content, stored in the event and in the repo. Clients would then submit this bearer token as Authorization: Bearer <secret> whenever they query the repo, meaning that URLs can't be simply copy-pasted around the place unless the auth token is also provided. This might be enough, in practice?
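A minimal sketch of that bearer-token idea, assuming an in-memory store in place of the real media repo. The function names, the token length, and the exception types are hypothetical and are not part of Synapse or the Matrix spec.

```python
import hmac
import secrets
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the media repo: media_id -> (token, content).
_media: Dict[str, Tuple[str, bytes]] = {}


def upload(content: bytes) -> Tuple[str, str]:
    """Store content and return (media_id, bearer_token).

    The token would be stored in the event alongside the mxc:// URL.
    """
    media_id = secrets.token_urlsafe(18)
    token = secrets.token_urlsafe(32)
    _media[media_id] = (token, content)
    return media_id, token


def download(media_id: str, authorization: Optional[str]) -> bytes:
    """Return content only when the Authorization header carries the right token."""
    if media_id not in _media:
        raise KeyError("unknown media")  # a real server would answer 404
    token, content = _media[media_id]
    expected = f"Bearer {token}".encode()
    supplied = (authorization or "").encode()
    # Constant-time comparison, so timing does not leak how much of the token matched.
    if not hmac.compare_digest(supplied, expected):
        raise PermissionError("missing or wrong bearer token")  # 401/403
    return content
```

With this shape, a media ID copied out of a proxy log is not enough on its own; the caller also needs the token that travelled in the event.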

I think there was also another solution involving HMACs (which I think is how we did it pre-Matrix?), but I can't remember how that worked. @erikjohnston any idea?

Edit: we could of course also mandate that the user has a valid access_token for the server too when they are accessing the media repo, although that doesn't lock access to any particular piece of content.

ara4n on 5 Jun 2018

👍6

@turt2live did you have any ideas on how this should/could work?

ara4n on 5 Jun 2018

Not too much beyond the verbose spiel above (which ends with "I have no idea"). In any case, we should consider having a way for users/bridges/bots to say "this is supposed to be unauthed" via the API for things like the IRC bridge.

How insane would it be to always end to end encrypt media regardless of room?

turt2live on 5 Jun 2018

on second thought, encrypting everything doesn't really help. The authorization token probably makes the most sense, although I'm curious as to how the HMAC stuff would work.

turt2live on 5 Jun 2018

For bridges, I suspect that users will end up having to request the file using a URL from the bridge, and the bridge would have to do the auth dance. Maybe we could add an endpoint that will return a time-limited download URL that the bridge can 302 the user to, so that it won't have to proxy the whole file. This would also allow the bridge to check that the original event hasn't been redacted.
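One way a time-limited download URL could work is an HMAC-signed link with an expiry timestamp. The sketch below is an assumption about how such an endpoint might sign and check links; the path, the query parameter names, and the secret are all hypothetical and no such endpoint exists in the spec.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SERVER_SECRET = b"example-only-secret"  # hypothetical; known only to the media repo


def _sign(media_id: str, expires: int) -> str:
    message = f"{media_id}:{expires}".encode()
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def make_link(media_id: str, ttl_seconds: int = 60, now: Optional[float] = None) -> str:
    """Build a link the bridge could 302 the user to (hypothetical path)."""
    expires = int(time.time() if now is None else now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _sign(media_id, expires)})
    return f"/hypothetical/download/{media_id}?{query}"


def verify_link(url: str, now: Optional[float] = None) -> bool:
    """Accept the link only if it is unexpired and the signature matches."""
    parts = urlsplit(url)
    media_id = parts.path.rsplit("/", 1)[-1]
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig.encode(), _sign(media_id, expires).encode())
```

The bridge would do its own checks (for example, that the event has not been redacted) before asking for a link, and a leaked link stops working once `expires` has passed.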

uhoreg on 5 Jun 2018

👍2

Maybe investigate how this is done in Hangouts?

MurzNN on 5 Jun 2018

Alternatively, the bridge could deliberately expose the URL with a ?secret=... querystring rather than an Auth header if it's intended to be accessible by the general public. (In addition, we /could/ track whether a given MXC should be world-readable or not in the media repo DB, or whether it should require an access_token for access (in addition to the secret))

ara4n on 5 Jun 2018

It's worth noting that we probably want to support being able to open media in a separate window, e.g. to view large images or PDFs etc, and I don't think you can make the browser add auth headers in those cases.

erikjohnston on 5 Jun 2018

👍2

there are ways of fixing that - e.g. have the client download the content itself with the right headers and then expose it to the user as a blob URL, which can then be viewed in separate windows/tabs etc.

ara4n on 5 Jun 2018

I think there was also another solution involving HMACs (which I think is how we did it pre-Matrix?), but I can't remember how that worked. @erikjohnston any idea?

Turns out that the way we used to do it was to never send access_tokens in requests at all, but send an HMAC(method, url, access_token) and then use the access_token as a shared secret, so that a leaked URL wouldn't leak an individual user's access_token. I assume we didn't do this for Matrix because calculating that HMAC would be too onerous for trivial HTTP clients, hence passing raw access_tokens around. In practice it doesn't buy us anything in this instance, as the resulting URL could still be passed blindly around anyway; we might as well create a new random secret for each URL and use that instead.
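A sketch of that request-signing scheme, for illustration only. The comment gives just HMAC(method, url, access_token), so the message layout, the hash function, and the function names below are assumptions.

```python
import hashlib
import hmac


def sign_request(method: str, url: str, access_token: str) -> str:
    """Sign (method, url) using the access_token as the shared secret."""
    message = f"{method.upper()} {url}".encode()
    return hmac.new(access_token.encode(), message, hashlib.sha256).hexdigest()


def verify_request(method: str, url: str, signature: str, access_token: str) -> bool:
    """Server side: recompute the HMAC with the stored token and compare."""
    expected = sign_request(method, url, access_token)
    return hmac.compare_digest(signature.encode(), expected.encode())
```

The token itself never appears in the request, so a leaked URL and signature do not reveal it. As noted above, though, the signed URL can still be replayed as-is, which is why a per-URL random secret gives the same protection with less work.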

ara4n on 5 Jun 2018

(cf https://github.com/matrix-org/matrix-doc/issues/1043 for "access tokens suck")

richvdh on 5 Jun 2018

What if each user got their own unique link to the media, or maybe a common link with a personal auth token based on their ID? When accessing media, the server could check that the access token is correct for the user and that the user is authenticated.

user318 on 5 Jun 2018

In reply to @ara4n:

alternatively, when the bridge could deliberately expose the URL with a ?secret=... querystring rather than an Auth header if it's intended to be accessible by the general public.

The reason that I suggested having the bridge do the auth dance, rather than forwarding the secret in the querystring, was so that a file that's redacted Matrix-side would become inaccessible to bridged users.

(In addition, we /could/ track whether a given MXC should be world-readable or not in the media repo DB, or whether it should require an access_token for access (in addition to the secret))

I would just say that a file can be uploaded with a token or without a token. If it's uploaded with a token, then downloads need to be authed; if it's uploaded without a token, then it's a free-for-all.

In reply to @user318

What if each user got their own unique link to the media, or maybe a common link with a personal auth token based on their ID?

That doesn't really work with end-to-end encrypted files, as the server doesn't get to see what file IDs are visible to what users.

uhoreg on 5 Jun 2018

That doesn't really work with end-to-end encrypted files, as the server doesn't get to see what file IDs are visible to what users.

I do not actually know how it works in e2e. I thought that files were embedded there as a base64-encoded message and not stored as media.

user318 on 5 Jun 2018

Messages have a size limit, so you can't store files within the message itself. You also don't want to send the whole file to everyone until they request it. e2e file events are basically just pointers to an encrypted blob in the media store, along with the decryption key.

uhoreg on 6 Jun 2018

I've written a spec proposal for solving this over at https://github.com/matrix-org/matrix-doc/issues/701, review welcome on the googledoc.

ara4n on 7 Jun 2018

Is #1263 going to be taken care of with this change as well? I'm only seeing concerns about GDPR erasure, which I presume means when someone deactivates and deletes their account. Right now it's fairly easy to have a tragedy if an inappropriate attachment link goes out over a bridge.

dr1 on 7 Jun 2018

Is there any news? Has anyone tried to re-implement this API to solve the problem?

cuongnv on 5 Jan 2019

Reading this thread, it appears most people mentioned brute force attacks or someone providing the URL to other people.

What I'm really concerned about is Google or other search engines somehow ending up indexing these images, because they are, after all, public URLs.

If someone posts the URL in public (like the OP of this thread), the image may potentially become indexed.

This Issue is an important one that needs to be resolved, especially on a project that takes Encryption and Privacy with high priority :)

nunoperalta on 19 Feb 2019

Just realised this issue exists. It's quite embarrassing to argue for Matrix on privacy grounds, especially in the advent of GDPR, and then see this issue... If someone can share and post media links wherever they want, it's quite an issue.

Also, the argument about the probability of guessing the file URL is bogus: we have so many examples of unprotected Amazon buckets where the IDs get scanned by security researchers and other people.

Of course one could also guess a token, but brute-force filters on 403 are more common than on 404.

ataraxus on 6 Mar 2019

👍5 👎1

When will this considerable security issue be fixed?

menturion on 5 Apr 2019

👀3 👎1

Is #7009 released?

Let me crosslink to this discussion:

https://mastodon.social/@rzr/104116637044903278

rzr on 6 May 2020

Is #7009 released?

No, it should be released in the next version.

clokep on 6 May 2020

Is #7009 released?

Unfortunately not yet. We're working towards getting Synapse 1.13.0 out of the door as quickly as possible since it's now pretty much overdue.
Note that #7009 will not add authentication to media, which would require a spec change - MSC2461 has been open for that purpose.
What #7009 does is to prevent browsers from leaking media URLs through referrer headers.