It is clear we will need a tool / protocol on top of IPFS to coordinate IPFS nodes together. This issue will track design goals, constraints, proposals, and the progress.
Things worth coordinating between IPFS nodes:
and more (expand this!)
Many of these require consensus, and thus we'll likely bundle a simple (read: FAST!) consensus protocol with ipfs-cluster. This could be Raft (as in etcd) or Paxos, and does not require byzantine consensus. Byzantine consensus would be useful for massive untrusted clusters, but that approaches Filecoin and is a very different use case altogether.
One goal is to represent a virtualized IPFS node sharded across other nodes. This makes for a very nice modular architecture where one can plug ipfs nodes into clusters, and clusters into larger clusters (hierarchies). This makes cluster a bit harder to design, but much, much more useful. Backing up of massive data (like all of archive.org or all of wikimedia, or all scientific data ever produced) would thus become orders of magnitude simpler to reason about.
The general idea here is to make ipfs-cluster provide an API that matches the standard ipfs node API, (i.e. with an identity, being able to be connected to, and providing the ipfs core methods).
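To make the "clusters plug into clusters" idea concrete, here is a minimal sketch in which a cluster exposes the same methods as a single node, so either one can be a member of a larger cluster. The `Pinner` interface, the `Node` and `Cluster` classes, and the "most free room" placement rule are all assumptions made for illustration; none of them come from IPFS or ipfs-cluster.

```python
from typing import Protocol


class Pinner(Protocol):
    """Minimal surface shared by a node and a cluster (hypothetical)."""

    def pin(self, cid: str) -> None: ...
    def has(self, cid: str) -> bool: ...
    def capacity(self) -> int: ...


class Node:
    """Stand-in for a single IPFS node; keeps pinned CIDs in memory."""

    def __init__(self, name: str, slots: int) -> None:
        self.name = name
        self.slots = slots
        self.pins: set[str] = set()

    def pin(self, cid: str) -> None:
        if cid not in self.pins and len(self.pins) >= self.slots:
            raise RuntimeError(f"{self.name} is full")
        self.pins.add(cid)

    def has(self, cid: str) -> bool:
        return cid in self.pins

    def capacity(self) -> int:
        return self.slots - len(self.pins)


class Cluster:
    """Exposes the same methods as Node, so clusters can nest."""

    def __init__(self, members: list[Pinner]) -> None:
        self.members = members

    def pin(self, cid: str) -> None:
        if self.has(cid):
            return
        # Naive placement: hand the CID to the member with the most free room.
        target = max(self.members, key=lambda m: m.capacity())
        target.pin(cid)

    def has(self, cid: str) -> bool:
        return any(m.has(cid) for m in self.members)

    def capacity(self) -> int:
        return sum(m.capacity() for m in self.members)


# A two-node cluster used as one member of a larger cluster.
inner = Cluster([Node("a", 2), Node("b", 2)])
outer = Cluster([inner, Node("c", 1)])
for cid in ["Qm1", "Qm2", "Qm3", "Qm4", "Qm5"]:
    outer.pin(cid)
```

The caller only ever talks to `outer`; whether a member is a single node or a whole cluster is invisible at that level, which is the property the virtualized-node goal is after.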
:+1:
CC: @kyledrake
this is great stuff.
ping @DevonJames
As a sort of "hack" solution people can use right away, this gist allows you to make a quick service for pinning all data that's on a different IPFS node using a simple ncat one liner: https://gist.github.com/kyledrake/e8e2a583741b3bb8237e
This could be tossed into a cronjob and would basically replicate a single node as a mirror.
I see there being two different uses here:
The first is IPFS nodes replicating each other, as in one IPFS node pins _all_ the content provided by a different IPFS node. Ideally there is a mechanism in the protocol to inform the node of new data on the node it's replicating.
The second use case is helping to back up a chunk of something crazy large, like the Internet Archive. There's no way a hobbyist could back up the entire thing, the best they could do is agree to host some of that content. If you wanted to do this without pinning, you would be agreeing to share some amount of information that you didn't previously agree to pin. It would be more like "I'll help pin 2TB of data for you of whatever you want". This introduces all sorts of questions, like how do we make sure that the data is being evenly federated across multiple nodes.
There are some big questions and I'm not sure there are clean and obvious solutions to them.
In terms of priority, I would generally put IPNS at a much higher priority than this, because I feel that whatever solution we came up with would sit on top of IPNS in some capacity. It seems to make more sense for the design of things to agree to replicate an IPNS pubkeyhash rather than a bunch of random IPFS hashes. That pubkeyhash could point to all of the data that you want to archive, and then it would be easier to break up that to figure out what you want to replicate. Otherwise you're right back to the location addressing problem (as you are with my ncat solution).
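So, maybe this flow:
- A data archive publishes an IPNS pubkeyhash signing the IPFS hash you want to federate.
- The archiving node "subscribes" to the IPNS pubkeyhash, and then all IPFS data there is pinned.
- The archiving node is informed when the IPNS pubkeyhash changes, which tells the archive node to store the updated information.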
That would be a good start for replication anyways. Then the case of federating a portion of that data rather than the whole thing could be considered. There's also some performance questions there (the Internet Archive IPNS pubkeyhash would be pointing to an enormous IPFS unixfs object that changes quite often!).
I think we can reuse Kademlia's distance metrics to help distribute the content to some degree. Although, reassignments may make sense as the number of peers in a cluster grows. As long as peers in the cluster agree to abide by the rules, it should work out pretty well, and it synergizes (I actually get to use this word for real?) well with normal lookups. By the time you find the provider entries, you will likely have found the people actually providing.
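A rough sketch of what reusing the Kademlia distance metric for placement could look like: hash peers and content into the same key space, then store each object on the `k` peers whose keys are XOR-closest to the content key. The function names, the use of SHA-256 for both kinds of key, and the peer names are assumptions for illustration, not how IPFS derives its IDs.

```python
import hashlib


def key(data: bytes) -> int:
    """Map a peer ID or content hash to a 256-bit integer key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def closest_peers(content_id: str, peers: list[str], k: int) -> list[str]:
    """Return the k peers whose keys are XOR-closest to the content key."""
    target = key(content_id.encode())
    return sorted(peers, key=lambda p: key(p.encode()) ^ target)[:k]


peers = ["peer-a", "peer-b", "peer-c", "peer-d", "peer-e"]
holders = closest_peers("QmExampleContent", peers, k=2)
print(holders)
```

Every peer can compute the same answer locally, which is why this fits well with normal lookups. The trade-off raised in the thread still applies: when the peer set grows, some objects get new closest peers and have to be reassigned.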
@whyrusleeping I'm not super familiar with the tech behind this, but I think it's reasonable to assume nodes will abide by the rules, since the usage in this case is intrinsically philanthropic and I'm not sure why someone would want to participate and also mess with the distribution at the same time. I also wouldn't expect any guarantees on the degree and evenness of federation from the originating nodes.
@kyledrake In terms of how to break things between nodes, a strategy similar to what IA.BAK is doing could work:
@kyledrake
> The first is IPFS nodes replicating each other, as in one IPFS node pins all the content provided by a different IPFS node. Ideally there is a mechanism in the protocol to inform the node of new data on the node it's replicating.
yep. mirroring mode. (though can actually be the same as the next case, if local_disk > pinset_size)
> The second use case is helping to back up a chunk of something crazy large, like the Internet Archive. There's no way a hobbyist could back up the entire thing, the best they could do is agree to host some of that content. If you wanted to do this without pinning, you would be agreeing to share some amount of information that you didn't previously agree to pin. It would be more like "I'll help pin 2TB of data for you of whatever you want".
yep! this is what i mean by "collaborative pin sets -- back up large pin sets together, to achieve redundancy and capacity constraints (including RAID-style modes)." above.
> This introduces all sorts of questions, like how do we make sure that the data is being evenly federated across multiple nodes.
Accounting, and historical consensus. i mean it to be auditable.
> In terms of priority, I would generally put IPNS at a much higher priority than this,
agreed. This issue is here now because people keep asking about this (i wanted something to point them to), and to maybe inspire someone to take a stab at it.
> because I feel that whatever solution we came up with would sit on top of IPNS in some capacity. It seems to make more sense for the design of things to agree to replicate an IPNS pubkeyhash rather than a bunch of random IPFS hashes.

It wouldn't be "a bunch of random IPFS hashes", it would always be a single IPFS head, which would point to the rest. And, one IPNS name can only point to one IPFS hash at a time anyway. In practice we will want to use IPNS for this, yes, but to point to the accounting/allocation index instead, not directly to the data (the metadata will point to/include the pinset, which points to/includes the data). Such an allocation index could be an object like this:
parent: <parent-hash>
pinset: <pinset-hash>
members: <list-of-cluster-members-hash>
allocations: <allocation-log-hash>
and the IPNS name could point to it.
The allocation/accounting index mentioned here does not need to be exhaustive (i.e. include every hash). Instead it can work like the pinset, as it is possible to write a _precise_ allocation of all objects to all cluster members as a compact expression (a trivial example is sharding with mod, though we would want something more clever here).
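As an illustration of an allocation written as a compact expression, the sketch below shows the trivial mod-sharding rule next to one possible "more clever" rule. The second one is rendezvous (highest-random-weight) hashing, which is a swapped-in technique chosen for this example and not something the thread settles on. Function names and member names are hypothetical.

```python
import hashlib


def h(*parts: str) -> int:
    """Hash the given strings to an integer."""
    digest = hashlib.sha256("/".join(parts).encode()).digest()
    return int.from_bytes(digest, "big")


def allocate_mod(cid: str, members: list[str], replicas: int) -> list[str]:
    """Trivial rule: start at hash(cid) mod n and take the next members in order."""
    start = h(cid) % len(members)
    return [members[(start + i) % len(members)] for i in range(replicas)]


def allocate_rendezvous(cid: str, members: list[str], replicas: int) -> list[str]:
    """Rendezvous hashing: rank members by hash(member, cid), keep the top ones."""
    return sorted(members, key=lambda m: h(m, cid), reverse=True)[:replicas]


members = ["n1", "n2", "n3", "n4", "n5"]
print(allocate_mod("QmSomeObject", members, replicas=2))
print(allocate_rendezvous("QmSomeObject", members, replicas=2))
```

Both rules are precise (anyone holding the member list can compute exactly who stores what) without listing every hash. The difference shows up on membership changes: with mod, changing `len(members)` moves most objects, while with rendezvous hashing an object only moves if one of its own holders leaves or a new member outranks one of them.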
> So, maybe this flow:
> - A data archive publishes an IPNS pubkeyhash signing the IPFS hash you want to federate.
> - The archiving node "subscribes" to the IPNS pubkeyhash, and then all IPFS data there is pinned.
> - The archiving node is informed when the IPNS pubkeyhash changes, which tells the archive node to store the updated information.
Yep! something of this sort. +1 to the idea of signing the _allocations_. btw, the _allocations_ could be done automatically by the cluster-leader (a program that manages the replication, and is likely elected by consensus), or by a cluster-administrator (a program or person who created the cluster and may want to express manual allocations, instead of getting automatic balancing, according to some external user policy).

> Maybe it's even possible to unpin data that's no longer referenced as an optional feature if you want it to.
this will already happen in dev0.4.0. better gc.
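The subscribe-and-pin flow discussed here can be sketched as a small loop body: resolve the IPNS name, pin the head if it changed, and optionally drop the previous head. The `Mirror` class is hypothetical, and `resolve`, `pin` and `unpin` are injected callables standing in for whatever actually talks to the node (for example wrappers around `ipfs name resolve`, `ipfs pin add` and `ipfs pin rm`). The sketch polls rather than being notified, since no notification mechanism is described in the thread.

```python
from typing import Callable, Optional


class Mirror:
    """Follows one IPNS name and keeps whatever it points to pinned."""

    def __init__(
        self,
        name: str,
        resolve: Callable[[str], str],
        pin: Callable[[str], object],
        unpin: Optional[Callable[[str], object]] = None,
    ) -> None:
        self.name = name
        self.resolve = resolve
        self.pin = pin
        self.unpin = unpin
        self.head: Optional[str] = None  # last IPFS hash we pinned

    def sync(self) -> bool:
        """Resolve the name once; pin the new head if it changed."""
        head = self.resolve(self.name)
        if head == self.head:
            return False
        self.pin(head)
        # Optional cleanup: drop the previous head once the new one is pinned.
        if self.unpin is not None and self.head is not None:
            self.unpin(self.head)
        self.head = head
        return True


# In-memory stand-ins so the sketch runs without a daemon.
records = {"archive-name": "QmVersion1"}
pinned: set[str] = set()
mirror = Mirror("archive-name", records.__getitem__, pinned.add, pinned.discard)

mirror.sync()                           # pins QmVersion1
records["archive-name"] = "QmVersion2"  # the archive publishes an update
mirror.sync()                           # pins QmVersion2, unpins QmVersion1
```

Calling `sync()` from a cronjob or timer gives plain mirroring; leaving `unpin` unset keeps every historical head pinned instead of only the latest.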
@whyrusleeping
> I think we can reuse Kademlia's distance metrics to help distribute the content to some degree. Although, reassignments may make sense as the number of peers in a cluster grows. As long as peers in the cluster agree to abide by the rules, it should work out pretty well, and it synergizes (I actually get to use this word for real?) well with normal lookups. By the time you find the provider entries, you will likely have found the people actually providing.
:-1: For ipfs-cluster i want _explicit_ tracking of every single copy. I want to keep exact allocation logs for which node is storing what (this doesn't mean a big log, can use precise AND short expressions), this helps to know if things fail, and how to rebalance. i'd like to represent a strong auditable contract (i.e. if someone loses a copy, you know who it was, and that is actionable data to an organization). ipfs-cluster is meant to _also_ address the needs of orgs and groups of orgs to collaboratively back up critical data.
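A minimal sketch of what explicit tracking could mean in practice: keep a log of who promised to store what, then audit it against what each member actually holds. The `Allocation` record, the `audit` and `under_replicated` helpers, and the shape of the `held` report are assumptions for illustration; the thread does not define a log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    """One log entry: `member` promised to store `cid`."""

    cid: str
    member: str


def audit(log: list[Allocation], held: dict[str, set[str]]) -> list[Allocation]:
    """Return the promises that are currently broken."""
    return [a for a in log if a.cid not in held.get(a.member, set())]


def under_replicated(
    log: list[Allocation], held: dict[str, set[str]], want: int
) -> dict[str, int]:
    """Map each CID with fewer than `want` live copies to its live copy count."""
    live: dict[str, int] = {}
    for a in log:
        live.setdefault(a.cid, 0)
        if a.cid in held.get(a.member, set()):
            live[a.cid] += 1
    return {cid: n for cid, n in live.items() if n < want}


log = [
    Allocation("QmA", "org1"),
    Allocation("QmA", "org2"),
    Allocation("QmB", "org2"),
    Allocation("QmB", "org3"),
]
# What each member reports (or is verified) to hold right now.
held = {"org1": {"QmA"}, "org2": {"QmA"}, "org3": {"QmB"}}

print(audit(log, held))                # org2 no longer holds QmB
print(under_replicated(log, held, 2))  # QmB is down to one copy
```

The first result is the actionable part (who lost a copy), the second is the input to rebalancing. In a real cluster the per-entry log would presumably be replaced by the compact allocation expression, with the same two questions asked of it.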
@kyledrake
> @whyrusleeping I'm not super familiar with the tech behind this, but I think it's reasonable to assume nodes will abide by the rules, since the usage in this case is intrinsically philanthropic and I'm not sure why someone would want to participate and also mess with the distribution at the same time. I also wouldn't expect any guarantees on the degree and evenness of federation from the originating nodes.
there could always be attackers, but yeah. lots of the use will be trusted.
but, regardless, for ipfs-cluster i would like to get some concrete non-byzantine scenarios working first (way easier), and only then attempt to reduce trust in the designs. that said, i do want all the comm messages signed, so that a node's signed "ACK" in a consensus round would mean binding agreement to replicate content according to the agreed-upon allocation (i.e. you can see who failed to keep the promise, important in cross-org backing up of stuff).
@kyledrake
> the Internet Archive IPNS pubkeyhash would be pointing to an enormous IPFS unixfs object that changes quite often!
There's an issue about that: https://github.com/ipfs/ipfs/issues/96
If you, say, just took the huge directory object as data and applied chunking by a rolling hash to it, you could have rather efficient updates.
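A small sketch of chunking by a rolling hash (content-defined chunking), to show why updates become cheap: chunk boundaries depend on the bytes near them rather than on absolute offsets, so an insertion only disturbs the chunks around the edit. The polynomial hash, window size, mask and size cap are arbitrary choices for this example and are not the parameters of any IPFS chunker.

```python
def chunk(data: bytes, window: int = 16, mask: int = 0x3F,
          max_size: int = 1024) -> list[bytes]:
    """Split data where a rolling hash of the last `window` bytes hits a pattern."""
    base, mod = 257, (1 << 31) - 1
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= window:
            # Slide the window: remove the contribution of the oldest byte.
            h = (h - data[i - window] * top) % mod
        h = (h * base + byte) % mod
        size = i - start + 1
        if (size >= window and h & mask == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


# A fake "huge directory object": one line per entry.
listing = b"".join(b"file-%05d\n" % i for i in range(2000))
middle = len(listing) // 2
updated = listing[:middle] + b"file-new-entry\n" + listing[middle:]

old, new = chunk(listing), chunk(updated)
reused = len(set(old) & set(new))
print(f"{reused} distinct chunks of the old object are reused by the new one")
```

Chunks that are byte-identical hash to the same IPFS object, so only the chunks that actually changed would need to be stored and transferred after an update.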
What would be nice, would be a smart replication model. For example new data and data that is frequently accessed is replicated to more nodes, while old and infrequently accessed data is less replicated.
would it be possible to have per-object replication policies?
It could-- though would get trickier. We could have things like the pins that mark subdags with a RAID type.
> On Sat, Jan 2, 2016 at 15:19 Grant Haywood wrote:
> would it be possible to have per-object replication policies?
I can see a way to achieve consensus in ipfs-cluster using just conflict-free replicated data types (CRDTs), if all nodes in the cluster can be trusted not to lie (although someone from the outside could subscribe to the list and pin things). This gives, for example, the option of rebalancing the cluster after a prolonged split, and guarantees that information about a pin added by one node operator will sooner or later (at the first possible time) propagate through the network.
Those two structures allow the cluster to operate conflict-free, to balance the data, to request data that is not yet distributed, and so on.
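For readers unfamiliar with CRDTs, here is a minimal sketch of one that could hold a cluster's pin list: a two-phase set, where pins and unpins are each grow-only sets and merging is a plain union. The choice of a two-phase set and the `PinSet` name are assumptions for illustration; the comment above does not say which structures it has in mind here.

```python
from dataclasses import dataclass, field


@dataclass
class PinSet:
    """Two-phase set: merge is a union, so replicas converge in any order."""

    added: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)

    def pin(self, cid: str) -> None:
        self.added.add(cid)

    def unpin(self, cid: str) -> None:
        if cid in self.added:
            self.removed.add(cid)

    def merge(self, other: "PinSet") -> None:
        self.added |= other.added
        self.removed |= other.removed

    def pins(self) -> set[str]:
        return self.added - self.removed


# Two operators edit their replicas during a network split.
left, right = PinSet(), PinSet()
left.pin("QmA")
right.merge(left)   # both know QmA before the split
left.pin("QmB")     # split: left adds QmB
right.unpin("QmA")  # split: right drops QmA
right.pin("QmC")

# After the split heals, each side merges the other's state.
left.merge(right)
right.merge(left)
print(left.pins() == right.pins(), sorted(left.pins()))
```

Because union is commutative, associative and idempotent, it does not matter in which order or how often states are exchanged. The known limitation of this particular structure is that an unpinned hash can never be pinned again, so a real design would likely need something richer.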
Big data management
When a node decides to pin only a fragment of a requested pin, it would publish which part it has pinned and what percentage of the original file that is. Other nodes that also want to pin parts of the file would check which parts are already pinned in the network and choose the parts with minimal coverage. This gives us the possibility of storing big data in the cluster.
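A sketch of the "pick the parts with minimal coverage" rule: each node looks at how many copies of every part the network already advertises, takes the least-covered parts up to its own budget, and publishes its choice. The `choose_parts` helper, the part names and the tie-break by part ID are assumptions made for this example.

```python
def choose_parts(coverage: dict[str, int], budget: int) -> list[str]:
    """Pick up to `budget` parts, least-covered first (ties broken by part ID)."""
    return sorted(coverage, key=lambda part: (coverage[part], part))[:budget]


# Six parts of one large file, nobody holds anything yet.
coverage = {f"part-{i}": 0 for i in range(6)}

# Four nodes join one after another, each able to hold three parts.
for node in ["n1", "n2", "n3", "n4"]:
    for part in choose_parts(coverage, budget=3):
        coverage[part] += 1  # the node publishes that it now pins this part

print(coverage)
```

With nodes choosing one after another, the copies spread evenly without any central allocator. Nodes choosing at the same moment could pick the same parts, which is where the publish-and-converge behaviour described above matters.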
I think that there should be traffic and diskspace constraints that are set by the node itself.
E.g., a file on the node should store two parameters that cap the traffic and the disk space contributed to the cluster.
A cluster operator should be able to see those constraints, and the underlying replication server (auto-pin process) should take them into account.
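One way node-set limits could feed into placement, as a minimal sketch: each member declares its caps, and the allocator only considers members that can still fit the object. The `Limits` and `Member` classes, their field names and the "most free disk wins" rule are hypothetical. Only the disk cap is enforced here; the traffic cap is carried along to show where it would live.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limits:
    """Caps a node operator sets locally (field names are hypothetical)."""

    max_disk_bytes: int
    max_traffic_bytes_per_day: int


@dataclass
class Member:
    name: str
    limits: Limits
    used_disk_bytes: int = 0

    def free_disk(self) -> int:
        return self.limits.max_disk_bytes - self.used_disk_bytes


def place(size: int, members: list[Member]) -> Optional[Member]:
    """Assign an object to the member with the most free disk that can fit it."""
    fits = [m for m in members if m.free_disk() >= size]
    if not fits:
        return None  # nobody has room: the operator needs more members
    target = max(fits, key=lambda m: m.free_disk())
    target.used_disk_bytes += size
    return target


members = [
    Member("big", Limits(max_disk_bytes=10, max_traffic_bytes_per_day=100)),
    Member("small", Limits(max_disk_bytes=5, max_traffic_bytes_per_day=50)),
]
for size in [4, 6, 5, 1]:
    chosen = place(size, members)
    print(size, chosen.name if chosen else "no room")
```

The point of keeping the limits on the member record is that the operator can read them and the auto-pin process can respect them from the same source.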
@Kubuxu Why would a node decide to pin parts of a file? Isn't the whole point of ipfs-cluster that the node gets instructed to pin a certain file or part of it?!
Hi, last week I put some ideas together on this topic: https://github.com/hsanjuan/ipfsclusterspec/blob/master/README.md and what a pure on-top-of-IPFS implementation might look like. Hopefully it can serve as a starting point for further iterations. It aligns a lot with @jbenet's proposals, although I left the vIPFS nodes aside for the moment.
I used IPNS to publish messages because it is the only way of passing messages around nowadays, but I have heard work is being done to provide a message-passing/pubsub solution, which would make it possible to stop abusing IPNS for this, so obviously this would have immediate application for implementing RAFT etc.
Hola. So, I started to work on an IPFS cluster using kubernetes for scheduling and Nginx as reverse proxy. How can I help with the development of ipfs-cluster?
(Externalizing some notes)
We want:
Likely construction (notes for discussion):
- ipfs-clusterd is a "per-ipfs-node" service that talks to a given ipfs node process.
- ipfs-clusterd speaks with other instances of ipfs-clusterd with its own wire protocol
  - (ipfs-clusterd COULD be mounted onto the ipfs-node)
- ipfs-clusterd exposes the IPFS API (over HTTP, like ipfs node)
- ipfs-clusterd process can respond to requests, but commits them to consensus protocol before externalizing the result.

Some important principles here:
My use case: Guarantee that a given set of hashes is shared only among a given set of nodes. Would a cluster be the right tool for that?
And: Can a node be a direct member of different clusters at the same time, or is only a hierarchy/onion-like structure possible?
See the first ipfs-cluster design meeting notes https://github.com/ipfs/ipfs-cluster/issues/1


Design notes/discussions on ipfs-cluster should probably happen in that repo now. it's graduated out of notes into a thing. I'll keep the issue open though cause the open/close thing makes it annoying for search.
Just found this issue after having received a helpful pointer from someone on IRC. I'm certain that you'd answer more questions if you changed the title of this issue to: "Storing data permanently on IPFS". More likely to show up in a google search. "How permanent is data stored on IPFS?" does show up in a google search, but it takes a lot of attention to find a link to this issue.
@petrsnm this is not a "how to" issue, check the FAQ for that. (issues like https://github.com/ipfs/faq/issues/47 or https://github.com/ipfs/faq/issues/93 which are appropriately named, have extensive explanations, and show up in google results). This is a development issue, not meant as an entry point.
cc @nicola
Hi everyone! In case you missed it, ipfs-cluster is a big thing now! Check it out

ipfs-cluster now has its own repo https://github.com/ipfs/ipfs-cluster