It would be great if Mastodon used IPFS to load and store images. That way, if a server goes down, the content would stay up as long as someone else has loaded it. It would also reduce bandwidth on the server hosting the original content.
To elaborate: only the backend needs to run IPFS. Users could run it too, but that should be optional.
Related to #477, but with a more limited scope (just media), which sounds more feasible, as alternate media backends already exist (e.g. S3).
Although I'm a big fan of the ideas in IPFS, I would prefer it not to be a dependency of Mastodon.
Mastodon already depends heavily on Ruby and Node.js; adding IPFS would significantly complicate instance creation, because it requires a Golang environment and a rather large set of dependencies.
I'm now adding IPFS media backend support by writing a custom Paperclip storage (mecab/paperclip-ipfs-storage) and integrating it into Mastodon. The storage is built on top of hjoest/ruby-ipfs-api, so it does not depend on Golang. 😊
If you are interested, you can try it from my fork (note that you should use the ipfs branch). It has an issue with unstable connections under Docker, so I will need to fix that before raising a PR.
Here is the screen capture; you can see the image is served via the IPFS gateway.
Anyway, I just posted here to show current progress. Any suggestions and questions are welcome 😄
Possible related issues: #477, #1847
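For the curious, wiring this up on the Mastodon side should look like any other Paperclip storage backend. A minimal sketch, assuming the gem follows Paperclip's usual convention of resolving `storage: :ipfs` to `Paperclip::Storage::Ipfs` (the extra option names here are illustrative placeholders, not necessarily the gem's real API):

```ruby
# app/models/media_attachment.rb (sketch)
class MediaAttachment < ApplicationRecord
  # Paperclip resolves `storage: :ipfs` to Paperclip::Storage::Ipfs
  # by naming convention; ipfs_api_host/ipfs_api_port are illustrative
  # placeholders for pointing the storage at the local IPFS daemon.
  has_attached_file :file,
                    storage: :ipfs,
                    ipfs_api_host: 'localhost',
                    ipfs_api_port: 5001

  validates_attachment_content_type :file, content_type: %r{\Aimage/.*\z}
end
```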
I'm worried about not being able to delete files. What does everyone else think?
I would really like to take this and work on it at some point. I think that running an IPFS server alongside a Mastodon instance as an optional enhancement (instead of using S3) makes a ton of sense: decentralized file storage for decentralized social media.
@Gargron I think that not being able to delete things is probably a red herring as far as our application goes. If the status gets deleted, there will be no references to the media file, and no way to find it except by knowing the hash. Furthermore, it can (and should) also get unpinned by any of the nodes that are pinning it, causing them to garbage-collect it, meaning the only way it would still be accessible is if some non-Mastodon IPFS node had mirrored it (same as status deletion, basically).
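For reference, unpinning is a single call against the node's HTTP API. A minimal Ruby sketch using only the standard library, assuming a local node on the default API port (5001):

```ruby
require 'net/http'

# Ask the local IPFS node to unpin a CID so its blocks become
# eligible for garbage collection. The /api/v0 endpoints expect POST.
def unpin(cid, host: 'localhost', port: 5001)
  uri = URI("http://#{host}:#{port}/api/v0/pin/rm?arg=#{cid}")
  Net::HTTP.post(uri, '').body
end

# unpin('QmSomeMediaHash')  # hypothetical CID recorded at upload time
```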
We should not use the public IPFS gateway, though; instead, Mastodon instances should run their own local gateways, and we should document setting that up. There's some bikeshedding to be had here about how to make these gateways non-public (some nginx magic?).
re: deletion, I'll try asking the IPFS team if they'd consider best-effort deletion as an improvement to the protocol.
EDIT: this may be impractical. I don't know if IPFS nodes currently have any way of tracking the "originator" of content, so there's no real way to authenticate a deletion request. More investigation needed.
@Gargron Yes, I understand that not being able to delete files could be a problem, but I agree with @nightpool that the files will effectively be deleted as the hash is forgotten.
One concern is that some bad actor will spread the media's URL (i.e., with the hash) in other places after the toot has been deleted. However, in that case they could just as easily repost the original content (instead of the URL) if we weren't using IPFS.
Avoiding the public gateway and instead setting up a special gateway that hides the hash, checks the availability of the original toot, and then proxies the content from the IPFS network could solve the problem.
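A hypothetical sketch of such a gateway as a Rails controller action (all names here are illustrative: `ipfs_hash` is an assumed column recorded at upload, and 127.0.0.1:8080 is the node's default local gateway):

```ruby
require 'net/http'

# Hypothetical controller: serve IPFS-backed media only while the
# owning status still exists, without ever exposing the raw hash route.
class MediaProxyController < ApplicationController
  def show
    media = MediaAttachment.find(params[:id])          # local DB lookup
    raise ActiveRecord::RecordNotFound if media.status.nil?

    # Fetch from the instance's *local* gateway, not a public one.
    body = Net::HTTP.get(URI("http://127.0.0.1:8080/ipfs/#{media.ipfs_hash}"))
    send_data body, type: media.file_content_type, disposition: 'inline'
  end
end
```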
You often see screenshots of tweets (in the media), and I download pictures that I like. So you can't really delete something from the internet.
The idea of IPFS is to archive the internet where people care about the content (e.g. by pinning it).
@davidak Yes, that is what I wanted to say. My point is that having no way to delete the media may not be a big problem.
@mecab Does your implementation support unpinning files that should be deleted?
@mecab Other question (sorry for the notification spam): is it possible to migrate an existing Mastodon instance to the IPFS storage, without breaking existing URLs?
Is this branch still compatible with the current state of mastodon? If not, would you be willing to port it @mecab?
@ProgVal Oops, I found your mention now, sorry.
Does your implementation support unpinning files that should be deleted?
Currently not, but I think it is possible to implement.
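Paperclip's storage interface already has the right hook for this: `flush_deletes` runs when an attachment is destroyed. A rough sketch of what unpinning there could look like (illustrative, not the gem's current code; it assumes the queued entries are the IPFS hashes recorded at upload):

```ruby
require 'net/http'

module Paperclip
  module Storage
    module Ipfs
      # Called by Paperclip when attachments are destroyed.
      # @queued_for_delete is maintained by Paperclip itself; here we
      # assume it holds the IPFS hashes recorded at upload time.
      def flush_deletes
        @queued_for_delete.each do |hash|
          uri = URI("http://localhost:5001/api/v0/pin/rm?arg=#{hash}")
          Net::HTTP.post(uri, '')
        end
        @queued_for_delete = []
      end
    end
  end
end
```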
is it possible to migrate an existing Mastodon instance to the IPFS storage, without breaking existing URLs?
It could be, but it would need a script to do it.
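A hypothetical one-off Rake task might walk existing attachments, add each file to the local node, and record the hash, leaving URL compatibility to a rewrite/redirect layer. Everything below (the task name, the `ipfs_hash` column) is illustrative:

```ruby
# lib/tasks/ipfs_migrate.rake (hypothetical)
require 'net/http'
require 'json'

namespace :ipfs do
  desc 'Add existing media files to IPFS and record their hashes'
  task migrate: :environment do
    MediaAttachment.find_each do |media|
      path = media.file.path                      # Paperclip's local path
      next if path.nil? || !File.exist?(path)

      # Multipart upload to the local node's /api/v0/add endpoint.
      uri = URI('http://localhost:5001/api/v0/add')
      req = Net::HTTP::Post.new(uri)
      req.set_form([['file', File.open(path)]], 'multipart/form-data')
      res = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }

      cid = JSON.parse(res.body)['Hash']
      media.update!(ipfs_hash: cid)               # hypothetical column
    end
  end
end
```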
@wxcafe
Is this branch still compatible with the current state of mastodon?
I'm sorry, but I'm not sure about that, since I haven't had enough time to develop recently. If IPFS support is urgent, it is no problem for me if you implement it using pieces of my code, or start from scratch.
I think it would need just a few modifications to adapt to the current state, even if there are conflicts. But please note I still have not resolved this issue: https://github.com/mecab/paperclip-ipfs-storage/issues/1
it's not urgent (last comment before mine was over 3 months ago ^^), just checking. Thanks for the work you've already done on this 👍
I see, thanks!
IPFS should not be used in my opinion.
I want my stuff to be deleted if I want it to be deleted (even if rogue instances or people can copy it).
Never being able to delete pictures should not be the default behavior. (As in, it shouldn't assume that I want it to be archived forever and ever using IPFS)
This could be an option in the settings if possible.
@Lionirdeadman IPFS does not automatically replicate your content. Unless someone manually pins the content, the content will only be cached temporarily by other nodes that access it.
But that still means it's forever on the instance my account is on. Does it not?
(I may have misunderstood the talk above)
@ProgVal
@Lionirdeadman It's forever on that instance, unless the instance unpins it. (search for "unpin" in previous messages in the thread)
If I understood correctly, unpinning only forgets the location of the data and not the data itself.
So my data still exists there and I don't want that.
(Please correct me if I'm wrong)
@ProgVal
unpinning causes your instance to eventually forget the data.
Eventually? How much time would it generally take for it to be completely gone/overwritten/deleted after being unpinned?
I feel that if it depends on instance activity, it's a bad idea because you can't guarantee to the user that the data will be deleted in any kind of timeframe and this could be legal trouble too.
It looks like scheduled garbage collection is disabled by default, but collection after reaching the watermark is not.
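Those knobs live in the node's config: `Datastore.GCPeriod` controls the scheduled sweep (which only runs when the daemon is started with `--enable-gc`), and `Datastore.StorageGCWatermark` is the fill percentage that triggers collection. A quick way to inspect them from Ruby, assuming a local node:

```ruby
require 'net/http'
require 'json'

# Dump the garbage-collection-related datastore settings of the
# local node via its HTTP API.
cfg = JSON.parse(Net::HTTP.post(URI('http://localhost:5001/api/v0/config/show'), '').body)
puts cfg['Datastore'].slice('GCPeriod', 'StorageMax', 'StorageGCWatermark')
```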
A bit on your earlier comment:
IPFS is content-addressed. When you ask the network to find a file, you are giving it the content (or at least a 'fingerprint' of it) and obtaining locations to download from. This is roughly the reverse of more traditional/common location-addressed systems like website URLs.
If you add something to your repo, it's 'private' as long as only you know the hash. As soon as one other person knows, keeping it private relies on trusting them (and this doesn't apply if it's leaked in public, of course).
If others know the hash but the file is unique (meaning only you own it), then they can't get it unless you bring your client online while the file is in your repo. But if the file is small enough that someone could 'guess' it via brute force, then it's also no longer private. That probably isn't likely unless they already have a significant fragment of it and know how you chunked it (if you didn't use the default); if not, then even 1 KiB represents 2^8192 possible files.
Nearly every possible file has a unique hash (or at least the chances of collision are extremely low). As an example, I hashed "lol" (with a linefeed at the end, because echo does that on Unix) while offline on one machine and then (on a different one) queried the DHT for the resulting hash with ipfs dht findprovs QmQsZSD...
and found that it already existed on nine other hosts. You can do this for any possible value, but well-known and/or short values will be much more likely to pop up.
There's more I could say here, but this comment is already pretty long. Hope this helps and that it isn't too pedantic.
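For anyone who wants to repeat that experiment from code rather than the CLI, the same lookup can be done against a local node's HTTP API; a minimal sketch (the hash is left abbreviated, as above):

```ruby
require 'net/http'

# Equivalent of `ipfs dht findprovs <cid>` via the local node's HTTP
# API; the response is a stream of JSON-encoded DHT query events,
# including the peer IDs of providers claiming to have the content.
def find_providers(cid, host: 'localhost', port: 5001)
  uri = URI("http://#{host}:#{port}/api/v0/dht/findprovs?arg=#{cid}")
  Net::HTTP.post(uri, '').body
end

# puts find_providers('QmQsZSD...')  # abbreviated hash from the example above
```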
Eventually? How much time would it generally take for it to be completely gone/overwritten/deleted after being unpinned?
I feel that if it depends on instance activity, it's a bad idea because you can't guarantee to the user that the data will be deleted in any kind of timeframe and this could be legal trouble too.
The instance can trigger a flush of its cache (or only remove your content from the cache)
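Concretely, that flush is one API call; a minimal sketch against a local node on the default API port:

```ruby
require 'net/http'

# Force an immediate garbage-collection pass on the local node so
# unpinned blocks are dropped right away instead of waiting for
# scheduled GC or the storage watermark.
def collect_garbage(host: 'localhost', port: 5001)
  Net::HTTP.post(URI("http://#{host}:#{port}/api/v0/repo/gc"), '').body
end
```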
If you add something to your repo, it's 'private' as long as only you know the hash. As soon as one other person knows, keeping it private relies on trusting them (and this doesn't apply if it's leaked in public, of course).
No. As with most DHT-based systems, IPFS "gossips", which allows other people in the network to learn the hash of the content. See: https://github.com/ipfs/faq/issues/181
Edit: I just realized this line is actually a very strong argument against using IPFS on Mastodon. Too bad :(
Why have this in the first place if other hosts can simply cache the image and content? Why use IPFS if it can leak data (which may or may not be personal in the case of a direct toot)?
I feel there is little benefit and that users should be in control of whether or not they want their data handled this way.
Why have this in the first place if other hosts can simply cache the image and content?
Why have what? Pinning?
Pinning makes sure that even if the original uploader disappears, someone else will still have it, even if the content hasn't been accessed in a while.
Why use IPFS if it can leak data (which may or may not be personal in the case of a direct toot)?
IPFS can deduplicate content across instances, and it prevents an instance from being a single point of failure (see the sketch at the end of this comment).
I feel there is little benefit and that users should be in control of whether or not they want their data handled this way.
Yes. One possibility would be to only push content to IPFS if the toot is public (or unlisted?)
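On the deduplication point above: because IPFS is content-addressed, adding identical bytes on any two instances yields the same CID, so the network stores one logical object. A quick way to see this against a local node:

```ruby
require 'net/http'
require 'json'
require 'stringio'

# Adding identical bytes always yields the same CID, which is what
# gives IPFS its cross-instance deduplication: two instances storing
# the same image hold (and advertise) one logical object.
def add_bytes(data, host: 'localhost', port: 5001)
  uri = URI("http://#{host}:#{port}/api/v0/add")
  req = Net::HTTP::Post.new(uri)
  req.set_form([['file', StringIO.new(data), { filename: 'blob' }]],
               'multipart/form-data')
  res = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(req) }
  JSON.parse(res.body)['Hash']
end

# add_bytes("lol\n") == add_bytes("lol\n")  # => true: same CID both times
```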
Ah, right. That makes sense. I was wondering why there's almost always a trickle of traffic--that'd be the gossip then.
For distributed content replication, something key-based would probably be best. There's at least one standard for this, and ZeroNet is a working example (I think they implement the BEP, or at least something like it).
Yeah, I think pushing it for public would be good but it should still be optional to the user.
As for unlisted, I'm not sure.
@ProgVal
It seems that in many ways the current implementation acts very similarly to what IPFS would offer. IPFS pinning means the original instance is still on the hook for the required storage. One benefit is file-based deduplication, but this is possible without IPFS too. Both with and without IPFS it's obviously not a perfect solution, because the same image can end up with a different hash depending on compression level, dimensions, or format. There is an outstanding issue for deduplication, #2317 (which is unfortunately difficult due to the data migrations required on quite large tables). IPFS and an IPFS gateway would also be an additional deployment dependency. Closing this.
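For context on the "possible without IPFS" point: file-level deduplication only needs content-digest keying, along the lines of this sketch (the storage layout and names are illustrative, not Mastodon's actual scheme):

```ruby
require 'digest'
require 'fileutils'

# File-level deduplication without IPFS: store each upload under the
# SHA-256 of its bytes, so identical files share one stored copy.
def store_deduplicated(path, root: 'media')
  digest = Digest::SHA256.file(path).hexdigest
  dest = File.join(root, digest[0, 2], digest)
  unless File.exist?(dest)
    FileUtils.mkdir_p(File.dirname(dest))
    FileUtils.cp(path, dest)
  end
  dest  # the canonical location; duplicates map to the same path
end
```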