A really useful privacy feature would be the ability to strip any metadata from images (or audio or video) before uploading them to the server. This could be done on the server itself, but then the server would be able to sniff the metadata, and it wouldn't work with E2E uploads, which are probably the most important ones anyway.
So instead we should probably implement it as a client-side option that mangles the image to strip the metadata out before uploading.
(Given that we already mangle file uploads client-side when we E2E-encrypt them before sending, this isn't impossible, although it could be quite fiddly to get right without causing performance problems.)
http://jsfiddle.net/mowglisanu/frhwm2xe/3/ has a relatively plausible-looking example of stripping EXIF metadata from JPEGs. The right place to insert this is around https://github.com/matrix-org/matrix-react-sdk/blob/443ab1add73390176478fc7ecd1a334aa157e833/src/ContentMessages.js#L293: either replace the file with a new file object pointing to the updated data stream, or change all the downstream code to work on data streams rather than file objects.
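For illustration, here is a minimal sketch of the same idea (not the jsfiddle's code, and not anything that exists in matrix-react-sdk): walk the JPEG's marker segments and drop the ones that carry metadata. `stripJpegMetadata` and the choice of which markers to drop are assumptions. It drops APP1 (Exif and XMP), APP13 (Photoshop/IPTC) and COM (comments), keeps APP2 (where ICC colour profiles live), and copies everything from the first scan (SOS) onwards unchanged, so metadata placed after the image data is not touched. One caveat: the Exif orientation tag lives in APP1, so rotated phone photos may display sideways unless the orientation is applied before stripping.

```typescript
// Hypothetical sketch: remove metadata segments from a JPEG by walking its
// marker segments. Not the jsfiddle's code and not a matrix-react-sdk API.

// APP1 (0xE1) carries Exif and XMP, APP13 (0xED) carries Photoshop/IPTC data,
// COM (0xFE) carries free-text comments. APP2 (0xE2), where ICC colour
// profiles live, is deliberately kept.
const STRIP_MARKERS = new Set<number>([0xe1, 0xed, 0xfe]);

function stripJpegMetadata(input: Uint8Array): Uint8Array {
  if (input.length < 4 || input[0] !== 0xff || input[1] !== 0xd8) {
    throw new Error("not a JPEG: missing SOI marker");
  }
  const kept: Uint8Array[] = [input.subarray(0, 2)]; // SOI
  let pos = 2;
  while (pos < input.length) {
    if (input[pos] !== 0xff) {
      throw new Error(`expected a marker at offset ${pos}`);
    }
    // Skip any 0xFF fill bytes; `code` indexes the marker code byte.
    let code = pos + 1;
    while (code < input.length && input[code] === 0xff) code++;
    if (code >= input.length) throw new Error("truncated JPEG");
    const marker = input[code];

    // Standalone markers (RSTn, TEM, EOI) have no length field.
    if ((marker >= 0xd0 && marker <= 0xd7) || marker === 0x01 || marker === 0xd9) {
      kept.push(input.subarray(code - 1, code + 1));
      pos = code + 1;
      if (marker === 0xd9) break; // EOI: end of image
      continue;
    }

    if (code + 2 >= input.length) throw new Error("truncated segment header");
    // Big-endian length that counts its own two bytes but not the marker.
    const length = (input[code + 1] << 8) | input[code + 2];
    const end = code + 1 + length;
    if (length < 2 || end > input.length) throw new Error("bad segment length");

    if (marker === 0xda) {
      // SOS: compressed image data follows. Copy it and everything after it
      // unchanged (later scans of progressive JPEGs included).
      kept.push(input.subarray(code - 1));
      break;
    }
    if (!STRIP_MARKERS.has(marker)) kept.push(input.subarray(code - 1, end));
    pos = end;
  }

  // Concatenate the kept segments into a fresh buffer.
  const out = new Uint8Array(kept.reduce((n, c) => n + c.length, 0));
  let offset = 0;
  for (const chunk of kept) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

Working at the segment level means the compressed pixels are never decoded or re-encoded, so there is no quality loss and no expensive canvas round-trip.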
Actually, https://github.com/matrix-org/matrix-react-sdk/blob/443ab1add73390176478fc7ecd1a334aa157e833/src/ContentMessages.js#L237 is another option: just before we do the upload, we could check the MIME type for image/jpeg etc., load the file into RAM and mangle it before sending (and then pass the same buffer to the E2E code as needed).
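A rough sketch of that flow, assuming nothing about the real ContentMessages.js internals: `prepareUploadBytes`, `uploadScrubbed`, the `Stripper` type and the `encrypt`/`upload` callbacks are all hypothetical placeholders. The file is read into memory once, scrubbed if a stripper is registered for its MIME type, and that one buffer then feeds either the E2E path or the plain upload.

```typescript
// Hypothetical sketch, not matrix-react-sdk code. A stripper takes the raw
// file bytes and returns scrubbed bytes for one MIME type.
type Stripper = (data: Uint8Array) => Uint8Array;

// Load the whole file into RAM and scrub it if we have a stripper for its
// MIME type; other types pass through untouched.
async function prepareUploadBytes(
  file: Blob,
  strippers: ReadonlyMap<string, Stripper>,
): Promise<Uint8Array> {
  const bytes = new Uint8Array(await file.arrayBuffer());
  const strip = strippers.get(file.type);
  return strip ? strip(bytes) : bytes;
}

// The same scrubbed buffer feeds either the E2E path or the plain upload.
// `encrypt` and `upload` are placeholder callbacks for the real code paths.
async function uploadScrubbed(
  file: Blob,
  strippers: ReadonlyMap<string, Stripper>,
  isEncryptedRoom: boolean,
  encrypt: (data: Uint8Array) => Promise<Uint8Array>,
  upload: (data: Uint8Array) => Promise<string>,
): Promise<string> {
  const bytes = await prepareUploadBytes(file, strippers);
  return upload(isEncryptedRoom ? await encrypt(bytes) : bytes);
}
```

Matching on `file.type` relies on the MIME type the browser reports, which can be empty or wrong; sniffing the file's leading magic bytes would be more robust.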
Another complication: if we strip off EXIF colour profile metadata, we should first re-compress the image to 'bake in' the correct profile. The right solution here is probably to leave colour profile metadata intact.
One concern with loading it into RAM would be issues like https://github.com/vector-im/riot-web/issues/4264
For some reason I couldn't get this out of my head, so I've pushed a totally untested and almost certainly broken proof-of-concept to https://github.com/matrix-org/matrix-react-sdk/commit/a0eea2a2713a6f32f40ad0cf8f6d56b404a60f62. If someone felt like picking it up and testing/finishing it, it'd be hugely appreciated, as I should be focusing on organisational stuff atm rather than writing code, sadly :(
(see also https://github.com/matrix-org/matrix-doc/issues/558)
https://github.com/matrix-org/matrix-react-sdk/pull/1307 ended up being the PR for this, but we couldn't get it to work, and it bitrotted and got closed :( It should still be resurrectable by some kind soul in future though.
Just some 2ct: I want to use matrix as a kind of shitty dropbox for sharing files including photos with perfectly crafted exif metadata. I wouldn't want matrix to mangle the files in any way, they should be bit for bit identical when I download them again.
:arrow_right: so IMHO stripping exif should be configurable, although probably enabled by default.
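One way to express "configurable, enabled by default" is a tri-state setting where "never set" counts as on. This is a hypothetical sketch: the `UploadSettings` shape, `shouldStripMetadata` and the list of scrubbable types are assumptions, not existing Riot/matrix-react-sdk settings.

```typescript
// Hypothetical setting shape; `stripMetadata` stays undefined until the
// user changes it.
interface UploadSettings {
  stripMetadata?: boolean;
}

// MIME types the client actually knows how to scrub (an assumption here).
const SCRUBBABLE_TYPES: ReadonlySet<string> = new Set(["image/jpeg"]);

function shouldStripMetadata(mimeType: string, settings: UploadSettings): boolean {
  // Enabled by default; an explicit `false` opts out so uploads stay
  // bit-for-bit identical to the original files.
  const enabled = settings.stripMetadata ?? true;
  return enabled && SCRUBBABLE_TYPES.has(mimeType);
}
```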
After a brief 3-year hiatus, https://github.com/matrix-org/matrix-react-sdk/pull/1307 now implements this.
Slack is now offering something similar: https://yro.slashdot.org/story/20/05/11/2051209/slack-now-strips-location-data-from-images. Does it make any sense at all to also allow this in synapse, so that not every client has to implement it separately? There are toolkits designed for this: https://0xacab.org/jvoisin/mat2
This is to prevent even your synapse from knowing the metadata.
@t3chguy I get that it's better to remove it on the client than on the server. But as with many things in matrix, there can be different levels of doing stuff (not only e2e, but also unencrypted rooms with ssl encryption between servers), and I think it could help many clients if there were a synapse fallback for clients that don't support removing metadata from all the different media formats.
Oh, and of course, if you use synapse to do it in an encrypted room, you'll leak that media to the server.
Sure, but aren't URL previews also disabled for encrypted rooms? I think public/unencrypted rooms may actually benefit more from having their publicly available media scrubbed of metadata.
Huh? This is for media uploads, not URL previews.
it was a comparison