For example, this image was shared in a private three-person chat, but anyone can view it: https://matrix.org/_matrix/media/v1/download/matrix.org/bSRWdHBFqtVzowZDhwRGbzDq
Most people I've recruited into Matrix are Google Hangouts refugees looking for an open platform. On Hangouts, you cannot view the web URL of an image in this way unless you're authenticated with the server and the user has shared it with you in a chat.
Would it be possible to support moving past security through obscurity at some point? Or, failing that, at least expire the images after a week or so?
This is concerning because it would be rather trivial for someone to write a simple app querying random alphanumeric strings to harvest images people have shared in private conversations.
+1
This is no more security through obscurity than any other key-based authentication mechanism; it is called URL-based authentication. The key in your example, bSRWdHBFqtVzowZDhwRGbzDq, is 24 characters long and uses upper- and lower-case letters, which gives 52^24 possible values, or more than 128 bits.
But let's work through your concern.
Imagine we write your trivial app and start it running...
Assuming the CDN can store 1 PB (petabyte) and the average image size is 1 KB, that's a trillion images (10^12 or 1,000,000,000,000).
Let's assume that you have a really high-speed Internet link and the CDN will let you do 10^8 (100,000,000) queries per second. tcpdump says that a single query is 4.7 KB, so we're doing 470 GB of traffic every second, and apparently both your link and the server are able to handle 3760 Gbps.
Let's say that no one notices that the server is getting hit by a denial-of-service attack 6 times larger than anything ever seen before, and they let you keep going for 10 years (60*60*24*365*10 seconds):
52^24/10^12/10^8/(60*60*24*365*10)
At this point we can determine that you have a 1 in 4,844,775,310,744 chance of getting a random Cat pic...
Meanwhile you have a better chance of getting struck by lightning... while drowning at 1 in 183 million.
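For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. The constants are just the assumptions stated in this comment (52-letter alphabet, 24-character ID, 10^12 stored images, 10^8 queries per second, ten years); the variable names are mine.

```python
import math

ALPHABET_SIZE = 52                   # upper- and lower-case ASCII letters
ID_LENGTH = 24                       # length of the example media ID
STORED_IMAGES = 10**12               # 1 PB of storage at 1 KB per image
QUERIES_PER_SECOND = 10**8           # assumed attacker query rate
SECONDS = 60 * 60 * 24 * 365 * 10    # ten years of continuous guessing

# Size of the ID space, both as a count and in bits.
keyspace = ALPHABET_SIZE ** ID_LENGTH
keyspace_bits = ID_LENGTH * math.log2(ALPHABET_SIZE)

# Total guesses made, and the resulting "1 in N" odds of hitting any image.
total_guesses = QUERIES_PER_SECOND * SECONDS
odds_denominator = keyspace // (STORED_IMAGES * total_guesses)

print(f"keyspace: about {keyspace_bits:.1f} bits")
print(f"chance of a hit in ten years: about 1 in {odds_denominator:,}")
```

This only models blind guessing of IDs; it says nothing about URLs that leak some other way.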
Personally I would be more concerned about someone walking up to the server and stealing it... or the server gets hacked due to a bug somewhere... which is why you should be using encrypted chat...
This is what an image looks like when it is sent to a group using encrypted chat:
https://matrix.org/_matrix/media/v1/download/matrix.org/qctIqdoPymLbqdNpOkWZGtvo
If you grab this file (which was a JPEG of a cat) you will notice that it is encrypted.
It's still less secure than Hangouts et al though because it only requires correctly guessing one key rather than two or more.
To access a privately shared image via Hangouts, you'd have to gain access to a whole account that has been granted permission to view the image, so you'd have to know both the username and the password, which is much harder to randomly guess.
Moreover, some accounts are configured with 2FA, further increasing the security.
This implementation is far from that, and I think addressing this would be worth doing at some point.
My understanding of your concern was that the media IDs generated by Synapse leave users of Synapse open to a brute-force keyspace attack using a simple app (an understandable concern).
The Matrix specification does not provide details on media-id keyspace, so the keyspace for the media-id can be easily increased to increase security without issue, if required.
However a keyspace attack against the Synapse content repository API implementation is already infeasible, so no change is necessary.
Synapse is the reference implementation for the Matrix specification and adding user authentication to the content repository API would require a change to the Matrix Specification.
To propose changes to the Matrix Specifications see the following:
https://github.com/matrix-org/matrix-doc/blob/master/CONTRIBUTING.rst
PS If you are concerned about privacy, use encryption.
Encryption is nice, but if I have your file, I could deploy infinite time and resources to brute force that encryption. What you want is to make it as hard as possible for me to get your file in the first place, then encrypt it on top of that. That's why people are so reticent to hand over their phones or laptops to border patrol even when they use full disk encryption. Physical security matters perhaps even more than encryption.
As such, what concerns me here is it's so easy to gain physical access (in a sense) to random people's files by stumbling on a random file just by guessing a single key, rather than having to match at least two matching pairs. In other image sharing services, there are similar long, unique keys to access the image itself, but in addition to that you need to present valid account credentials and that account has to have been given explicit permission to view that image.
I do think it would be prudent to add those additional layers of security here.
I totally agree with kethinov. I can imagine deploying fail2ban on the server to monitor 404 errors would slow down the attacker but still does not solve the main issue.
See also https://github.com/matrix-org/matrix-doc/issues/701 for the spec issue here.
It is highly unlikely that someone could guess the media URL; the key in each media link is long enough to prevent guessing. The more likely attack vector is obtaining the URL directly: perhaps it is accidentally posted into a channel, someone who already has the link shares it without permission, a browser toolbar scrapes your URL entries without your knowledge, or another person in the channel has malware on their machine that sends away data it collects from a channel they share with you.
Crossposting for the purposes of visibility (source):
I don't think this has been answered somewhere, so asking here in hopes people have ideas: How would federated media work?
In theory the server could start signing requests to download media, although that doesn't really guarantee that the person making the request is allowed to do so (ie: is in the room). With the upcoming introduction of users being linked to key-like objects, we could possibly use those to sign the requests, however there's nothing to stop a server lying about which user is requesting the media.
Then there's the question of the user potentially wanting specific media being publicly accessible. The primary use case being the IRC bridge which pastebins long messages.
So this comes up on a regular basis, especially from corporate security folks who don't like the idea that a URL leaked in HTTP logs (or proxy logs) etc could then be simply curl'd by any random user to access the content. It's not a matter of the chances of guessing the URL correctly (or the chances of being hit by lightning) but instead whether an attacker who does manage to get the URL automagically gets access to the content too.
One thing we could do is to auth access to the content itself, but this means tracking the event(s) that the content is referenced by and in turn which users have access to those events and so can view the content. This is a potentially nasty leak of metadata for e2e attachments which we don't currently have otherwise. (It's possible we might need this for quotas as per https://github.com/matrix-org/synapse/issues/3339, but hopefully not). It's also quite heavy for the media repo to have to check auth rules for a room for every piece of content that is viewed (and is a bit unfortunate if the media repo is otherwise independent of the room server).
An alternative naive solution could be to just track a random bearer token alongside each mxc:// URL for each piece of content, stored in the event and in the repo. Clients would then submit this bearer token as Authorization: Bearer <secret> whenever they query the repo, meaning that URLs can't be simply copy-pasted around the place unless the auth token is also provided. This might be enough, in practice?
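A rough sketch of the bearer-token idea above, in Python. Everything here is hypothetical: `MediaRepo`, its methods and the status codes are made up for illustration and are not Synapse's or the spec's API. The only point is that the media ID alone is no longer enough to fetch the content.

```python
import hmac
import secrets


class MediaRepo:
    """Toy in-memory media repo (hypothetical, not a real Matrix API)."""

    def __init__(self):
        self._store = {}  # media_id -> (content, bearer token)

    def upload(self, content):
        # Generate an unguessable ID plus a separate random secret.
        media_id = secrets.token_urlsafe(18)
        token = secrets.token_urlsafe(32)
        self._store[media_id] = (content, token)
        return media_id, token

    def download(self, media_id, authorization=None):
        entry = self._store.get(media_id)
        if entry is None:
            return 404, b""
        content, token = entry
        expected = f"Bearer {token}".encode()
        supplied = (authorization or "").encode()
        # Constant-time comparison so the check does not leak the token.
        if not hmac.compare_digest(supplied, expected):
            return 403, b""
        return 200, content


repo = MediaRepo()
media_id, token = repo.upload(b"cat picture bytes")
print(repo.download(media_id)[0])                      # URL alone
print(repo.download(media_id, f"Bearer {token}")[0])   # URL plus secret
```

In this sketch the token would travel inside the event, so anyone who can read the event can still fetch the media; a URL copied out of a log on its own cannot.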
I think there was also another solution involving HMACs (which I think is how we did it pre-Matrix?), but I can't remember how that worked. @erikjohnston any idea?
Edit: we could of course also mandate that the user has a valid access_token for the server too when they are accessing the media repo, although that doesn't lock access to any particular piece of content.
@turt2live did you have any ideas on how this should/could work?
Not too much beyond the verbose spiel above (which ends with "I have no idea"). In any case, we should consider having a way for users/bridges/bots to say "this is supposed to be unauthed" via the API for things like the IRC bridge.
How insane would it be to always end to end encrypt media regardless of room?
On second thought, encrypting everything doesn't really help. The authorization token probably makes the most sense, although I'm curious as to how the HMAC stuff would work.
For bridges, I suspect that users will end up having to request the file using a URL from the bridge, and the bridge would have to do the auth dance. Maybe we could add an endpoint that returns a time-limited download URL that the bridge can 302 the user to, so that it won't have to proxy the whole file. Doing the auth dance on each request would also let the bridge check that the original event hasn't been redacted.
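One way a time-limited download URL could work, sketched in Python. This is an assumption on my part, not a proposal from the spec: the server signs the path and an expiry time with a secret only it knows, and rejects the URL once the expiry has passed. `SERVER_SECRET`, the function names and the query parameter names are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SERVER_SECRET = b"server-side-secret"  # hypothetical; never sent to clients


def _signature(path, expires):
    msg = f"{path}\n{expires}".encode()
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()


def sign_download_url(media_path, ttl_seconds, now=None):
    """Return media_path with an expiry timestamp and signature appended."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(media_path, expires)})
    return f"{media_path}?{query}"


def verify_download_url(url, now=None):
    """Check that the URL is unmodified and has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(parsed.path, expires)
    return hmac.compare_digest(sig.encode(), expected.encode())


url = sign_download_url("/_matrix/media/v1/download/example.org/cat", 60)
print(url, verify_download_url(url))
```

The bridge would only hand out such a URL after doing its own check on the event, so a redacted file stops being reachable once the last issued URL expires.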
Maybe investigate how this is done in Hangouts?
Alternatively, the bridge could deliberately expose the URL with a ?secret=... querystring rather than an Auth header if it's intended to be accessible by the general public. (In addition, we /could/ track whether a given MXC should be world-readable or not in the media repo DB, or whether it should require an access_token for access (in addition to the secret))
It's worth noting that we probably want to support being able to open media in a separate window, e.g. to view large images or PDFs etc, and I don't think you can make the browser add auth headers in those cases.
there are ways of fixing that - e.g. have the client download the content itself with the right headers and then expose it to the user as a blob URL, which can then be viewed in separate windows/tabs etc.
I think there was also another solution involving HMACs (which I think is how we did it pre-Matrix?), but I can't remember how that worked. @erikjohnston any idea?
Turns out that the way we used to do it was to never send access_tokens in requests at all, but send an HMAC(method, url, access_token) and then use the access_token as a shared secret, so that a leaked URL wouldn't leak an individual user's access_token. I assume we didn't do this for Matrix because calculating that HMAC would be too onerous for trivial HTTP clients, hence passing raw access_tokens around. In practice it doesn't buy us anything in this instance, as the resulting URL could still be passed blindly around anyway; we might as well create a new random secret for each URL and use that instead.
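For reference, the HMAC(method, url, access_token) scheme described above could look roughly like this in Python. The hash function (SHA-256) and the way method and URL are joined are my assumptions; the comment above does not say what the original system used.

```python
import hashlib
import hmac


def sign_request(method, url, access_token):
    """HMAC over method and URL, keyed by the access token (shared secret)."""
    msg = f"{method.upper()} {url}".encode()
    return hmac.new(access_token.encode(), msg, hashlib.sha256).hexdigest()


def verify_request(method, url, signature, access_token):
    """Server side: recompute the HMAC with the stored token and compare."""
    expected = sign_request(method, url, access_token)
    return hmac.compare_digest(signature.encode(), expected.encode())


url = "https://example.org/_matrix/media/v1/download/example.org/abc"
sig = sign_request("GET", url, "s3cret-token")
print(verify_request("GET", url, sig, "s3cret-token"))
```

The signature is bound to one method and one URL, so a leaked signed URL does not reveal the token itself, but as noted above the signed URL can still be passed around and replayed as-is.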
(cf https://github.com/matrix-org/matrix-doc/issues/1043 for "access tokens suck")
What if each user got their own unique link to the media, or maybe a common link with a personal auth token based on their ID? When accessing media, the server could check that the access token is correct for the user and that the user is authenticated.
In reply to @ara4n:
Alternatively, the bridge could deliberately expose the URL with a ?secret=... querystring rather than an Auth header if it's intended to be accessible by the general public.
The reason that I suggested having the bridge do the auth dance, rather than forwarding the secret in the querystring, was so that a file that's redacted Matrix-side would become inaccessible to bridged users.
(In addition, we /could/ track whether a given MXC should be world-readable or not in the media repo DB, or whether it should require an access_token for access (in addition to the secret))
I would just say that a file can be uploaded with a token or without a token. If it's uploaded with a token, then downloads need to be authed; if it's uploaded without a token, then it's a free-for-all.
In reply to @user318
What if each user got their own unique link to the media, or maybe a common link with a personal auth token based on their ID?
That doesn't really work with end-to-end encrypted files, as the server doesn't get to see what file IDs are visible to what users.
I do not actually know how it works in e2e. I thought that files are embedded there as a base64-encoded message. And not stored as media.
Messages have a size limit, so you can't store files within the message itself. You also don't want to send the whole file to everyone until they request it. e2e file events are basically just pointers to an encrypted blob in the media store, along with the decryption key.
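To illustrate the "pointer to an encrypted blob" idea, here is a simplified sketch in Python. The event layout, the field names (`file`, `url`, `key`), the `media_store` dict and the 65536-byte limit are all assumptions for illustration; the real event format carries more fields, and no actual encryption is performed here (random bytes stand in for the ciphertext).

```python
import base64
import json
import secrets

MAX_EVENT_BYTES = 65536   # assumed event size limit, for illustration only

media_store = {}          # hypothetical media repo: URL -> opaque bytes


def upload_encrypted(ciphertext):
    """Store an already-encrypted blob and return its media URL."""
    url = f"mxc://example.org/{secrets.token_urlsafe(18)}"
    media_store[url] = ciphertext
    return url


def build_file_event(url, key):
    """Build a small event that points at the blob and carries its key."""
    return {
        "type": "m.room.message",
        "content": {
            "body": "cat.jpg",
            "file": {"url": url, "key": base64.urlsafe_b64encode(key).decode()},
        },
    }


key = secrets.token_bytes(32)
ciphertext = secrets.token_bytes(1_000_000)   # stand-in for the encrypted file
event = build_file_event(upload_encrypted(ciphertext), key)
event_size = len(json.dumps(event).encode())
print(event_size, "bytes in the event;", len(ciphertext), "bytes in the media store")
```

In an encrypted room the event itself is encrypted too, so the server sees the blob and the event separately but cannot tell which users hold the key for which blob.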
I've written a spec proposal for solving this over at https://github.com/matrix-org/matrix-doc/issues/701, review welcome on the googledoc.
Is #1263 going to be taken care of with this change as well? I'm only seeing concerns about GDPR erasure, which I presume means when someone deactivates and deletes their account. Right now it's fairly easy to have a tragedy if an inappropriate attachment link goes out over a bridge.
Is there any news? Has anyone tried to re-implement this API to solve the problem?
Reading this thread, it appears most people mentioned brute force attacks or someone providing the URL to other people.
What I'm really concerned about is Google or other search engines somehow ending up indexing these images, because they are, after all, public URLs.
If someone posts the URL in public (like the OP of this thread), the image may potentially become indexed.
This Issue is an important one that needs to be resolved, especially on a project that takes Encryption and Privacy with high priority :)
Just realised this issue exists. It's quite embarrassing to argue for Matrix on privacy grounds, especially in the advent of GDPR, and then see this issue... If someone can share and post media links wherever they want, it's quite an issue.
Also, the argument about the probability of guessing the file URL is bogus: we have many examples of unprotected Amazon buckets whose IDs get scanned by security researchers and other people.
Of course one could also guess a token, but brute-force filters on 403 are more common than on 404.
When will this considerable security issue be fixed?
Is #7009 released ?
Let me crosslink to this discussion:
Is #7009 released ?
No, it should be released in the next version.
Is #7009 released ?
Unfortunately not yet. We're working towards getting Synapse 1.13.0 out of the door as quickly as possible since it's now pretty much overdue.
Note that #7009 will not add authentication to media, which would require a spec change - MSC2461 has been open for that purpose.
What #7009 does is to prevent browsers from leaking media URLs through referrer headers.
^ MSC701 (https://github.com/matrix-org/matrix-doc/issues/701) is another MSC on the matter with a slightly wider scope.