Notes: pub/sub - publish / subscribe

Created on 9 Oct 2015  Â·  77Comments  Â·  Source: ipfs/notes

We've known for some time we need to layer a pub/sub system on IPFS. We should try to reuse other things. The least work, the better. But it should be a simple protocol, easy to implement, well-layered, and meshing well with the rest of IPFS abstractions.

Requirements

  • _very, very fast_
  • flexible (maybe different topology-forming algorithms)
  • multiple modalities (single publisher, multiple publishers, etc)
  • support both encrypted and unencrypted streams (encrypted again, this is above the regular libp2p encryption -- and specific to the pub/sub group)
  • support privately encrypted channels (ie user supplied keys)
  • layers over IPRS to do discovery

We need to:

  • [ ] do a survey of relevant {literature, protocols, and implementations}.
  • [ ] decide on a protocol
  • [ ] build it into libp2p.

I likely won't have time to do a proper survey until late Nov or Dec. If you'd like to speed this along, post links to _great_ {papers, systems} here for me.

Relevant to research:

Candidate Open Problem private content pubsub topilibp2p

Most helpful comment

@fazo96 Matrix is indeed split into server<->server and client<->server API, but that doesn't stop the client running in the same device/app as the server, at which point we have more p2p semantics rather than being so-called "client/server". The server<->server bits of it are currently HTTP-discovered-by-SRV-records, but we have some longer term plans to look at a purely p2p soln for server<->server using something like libp2p (https://github.com/matrix-org/GSoC/blob/master/IDEAS.md#peer-to-peer-matrix).

The account thing isn't a big obstacle as we support mapping any types of identifiers (public keys, email addresses, MSISDNs, whatever) onto matrix IDs, which are currently @username:domain style identifiers. Again, in future we're looking at switching to a fully decentralised DNS-less ID system - probably public keys again - as per https://github.com/matrix-org/GSoC/blob/master/IDEAS.md#decentralised-accounts.

Eitherway: the main thing that Matrix brings to the party is a set of semantics for eventually-consistent distribution of real-time comms data - specifically, a timeline of messages (expressed as a DAG, which may bifurcate and re-form), and a simple set of key-value data. One could layer this on top of IPFS semantics, or keep it in its own layer (as we're currently doing). Whatever happens, Matrix is called Matrix because we're obsessed with building bridges to everything - so it should be trivial to build bridges between an IPFS pubsub ecosystem and Matrix's decentralised communication DAGs :).

If you have any questions, drop by https://vector.im/develop/#/room/#matrix-dev:matrix.org.

All 77 comments

Cc #42

It might be easier start with a basic [slow] implementation before doing the high performance multicast P2P thing. For example, this paper (https://www.cs.utexas.edu/~yzhang/papers/mad-info09.pdf) has two modes depending on how active the group is.

The basic implementation would be as simple as:

  • Pick the IPNS address you want to subscribe to changes at
  • Append signed message with nodeID and TTL/expiry to DHT at IPNS address
  • Owner of IPNS address checks all subscriptions and sends a notifyIpnsUpdate message to each node

Encrypted streams with group-specific encryption, multicast and multiple publishers can then be deferred to the more advanced implementation.

I guess the basic could be made even simpler by using the backlink ideas (ipfs/ipfs#31) instead of a new interface to the routing. You then only need a new P2P message to notify a node of changes.

I stumbled upon this "Decentralized Reddit using a DHT to store content and a blockchain to rank it"
https://news.ycombinator.com/item?id=10391996

The first comment seems particularly relevent:
liamzebedee 21 hours ago
Re: the hosting of topics/subreddits in the DHT, I've done quite a lot of research [1] into a very innovative yet not well known P2P publish-subscribe network design [2] from some Norweigan computer scientists that removes the role of hosting for nodes not interested in a topic, even designing a decentralised microblogging platform on top of it [3].

[1] http://liamz.co/wp-content/uploads/2015/03/Computer-Science-Extended-Essay_Liam-Edwards-Playne.pdf
[2] http://acropolis.cs.vu.nl/~spyros/www/papers/PolderCast.pdf
[3] BitWeav http://liamz.co/wp-content/uploads/2015/03/whitepaper.pdf

I'm reading the papers, but it seems like some very interesting discussion for pubsub. In particular Poldercast's finding a subset of nodes with a particular interest and only those nodes host that interest reminds me of IPFS's policy of not downloading anything unless you ask for it.

I'll read the papers before I make any specific pub/sub recommendations.

@bharrisau agreed on starting simple to get something working and moving to more efficient constructions later.

@spikebike [1] link is broken.

@davidar @jbenet thanks, right, fixed.

The xmpp stuff proved really hard in comparison to federation doing pubsubhubub.

Maybe not new to folks here, but new to me:
https://hackpad.com/Probabilistic-data-structures-7UPPH2soDvw

Now I see that discussion about the use of Bloom filters is already underway:
https://github.com/ipfs/ipfs/issues/31#issuecomment-55875124

See also http://www.w3.org/TR/push-api/ (HT @nicola)

Came across this (IPFS) while looking for pubsub + rpc over webtorrent. Out of the all existing pubsub and rpc protocols, the best one that comes close to practical use is WAMP / AutoBhan. It facilitates both pubsub and rpc over web (means browser to browser rpc + pub/sub) and light-weight enough to run on IOT devices (raspberry-pi etc.).

Its easy to setup and highly performant. However, the major problem with WAMP is: it is 'routed' / 'brokered'. If there is a way this WAMP can be integrated with webRTC/bittorrent (webTorrent), for P2P, then it would pave way for next generation IOT apps.

For PubSUB - here is one sample functionality what we are trying to achieve with IOT:

  • Imagine large file served through bittorrent (means, many chunks of the file are served from multiple sources)
  • Now, imagine these chunks are all updated by different sources / sensors independently (like rows in a databases)
  • Whenever a chunk is updated, all the readers connected to that chunk should be updated with that new data. Just like BitTorrent, except the connection between the file chunk and reader 'stays alive' (like comet / long-poll of http).

Similarly for RPC - here is one sample functionality that we are looking to achieve:

  • Imagine large scientific data file (once again spread across hosts as chunks served by, say bittorrent), each chunk containing large set of records
  • We should be able to run a function over all the records, but the data should not be moved across machines. Rather copies of the function should get executed as an RPC call over each host (with only part of the data local to that host), similar to HDFS + map-reduce.

Is it possible to achieve above kind of functionality with IPFS? If not yet, I would suggest to strongly consider integrating WAMP as pubsub+rpc part of the protocol (rather than reinventing the wheel). Autobahn implementations of WAMP comes with clients for many languages and performance metrics are also very good.

@KrishnaPG as soon as pub/sub is implemented, you should be able to do all that! You can use IPNS to expose each node's chunks. You can aggregate the data by having a list of the nodes on each node. For RPC calls, you can't do that using IPFS, but you can have the nodes download the code to execute from somewhere using IPNS, publish their results, then you'd have to aggregate them.

You can't really do the RPC stuff you described but your usecase would work really well using only IPFS for all your networking (assuming it has pub/sub implemented) you just need to implement that functionality in a different way.

Also keep in mind that IPFS has global deduplication :+1: But pub/sub is not implemented yet.

Thank you @fazo96 .

Yes, you are right - Doing RPC (moving the code to the data and executing it locally on each chunk) may require an additional layer on top of IPFS (which involves additional functionality that involves job queues, schedulers, retry mechanism and result aggregators).

If we look at it, fundamentally RPC is kind of _opposite_ to the basic file-read or pub/sub (in terms of data flow direction).

For example, in a simple file-read/copy, data goes from _the disk/chunk_ --> _the client/reader_. Whereas, in RPC the data (the code to be executed) has to go from _the client_ --> _the disk/chunk_ and get executed, and the results should either go back to the _client_ (if the size is small) or get stored as additional files/chunks on the local machine (if the results data size is large).

This requirement to _be able to create additional chunks/files_ on the host locally may need support from the base IPFS, though.

On the other-hand, there is another radical way to look at this.

That is, treating every operation (including file_read, copy, delete etc.) as RPC call, and allowing _transformations_ over basic operations. For example, consider this

                Fn()
client ------------------> chunk

In a normal read operation, the Fn = get_me_chunk(), withget_me_chunk being the usual built-in file-read Op.

And when we need to execute, say do_something on the chunk it would become Fn = do_something(get_me_chunk()) with both functions getting executed locally on that chunk. The client would send the do_something code to the chunk and get back results as usual (or gets back the details of additional chunks created and stored as results).

Fn here can be thought of being similar to HTTP Verb (GET/PUT etc.). The verb can take pre-defined functions (the usual CRUD ops), and also custom-defined ops (where the function code is passed along as the request body).

This model treats RPC as first-class citizen (where all the regular operations, such as file CRUD operations and notifications are implemented on top of RPC). Not sure how difficult/easy it would be to do this with present architecture, though.

For the pubsub, wondering how easy/complex it would be to reuse/integrate the Autobahn. If it can be done, then RPC comes for free on top of it (Demos).

AFAICT, autobahn requires websockets, that's too much, we need something more basic that can run over any transport. we also need a pub/sub that can scale to millions of subscribers -- this isnt going to cut it: http://wamp-proto.org/why/#unified_routing -- we need a protocol that creates spanning trees/dags with fan out on its own, using measured latencies/bw in the network, etc. basically, a serious protocol from the multicast literature.

@KrishnaPG you should read more into how IPFS works and how the protocols work. suggest also looking at:

Thanks @jbenet I was looking for the spec info, your pointers are helpful.

As for WAMP, yes - it started out as websocket based initially. Now, it is decoupled and works with any message based transport (http://wamp-proto.org/faq/#is_websocket_necessary_for_wamp)

However, my intention in pointing to WAMP was not to use it as is, but rather to adapt its pubsub+RPC part of the spec while removing the broker part (replacing it with whatever routing the ipfs uses, dht etc..)

WAMP is a perfect protocol for IOT requirements, but the strong dependency on router/broker is a deal-breaker.

@KrishnaPG ah ok, good they generalized it

Hi,
I see a lot of discussion on how to implement pub/sub, but not really about the semantics of pub/sub for IPFS (is it that too obvious?).
How will pub/sub be exposed to users (the API)?
The idea is to have something as simple as

machine-1$ ipfs sub <topic>
<hash-1>  # receive when this is published
<hash-2>
...

machine-2$ ipfs pub <topic> <hash-1>

machine-3$ ipfs pub <topic> <hash-2>

or I am missing something here?

@fsantanna

I think the bare minimum pub/sub should expose this kind of interface:

ipfs.sub(target_node_id, handler)

handler would be called every time IPFS finds a new hash published by target_node_id. Of course a more complicated API could be built, this is the bare minimum but still very useful implementation.

CLI version:

$ ipfs sub $target_node_id
/ipfs/...
/ipfs/...
# A new line is emitted every time a new record is found

Re RPC: we've discussed this briefly before, but I'm not sure what the actual plans are

@fazo96
In this case, the subscriber has to know about (and subscribe to) every single potential content provider of his interest.
Shouldn't pubsub decouple publishers from subscribers?

Hi,
I couldn't resist and made a proof-of-concept implementation of pub/sub. :)

https://github.com/fsantanna/go-ipfs/blob/ipps/IPPS.md

I describe the desired API, show some examples (with screencasts), and discuss the naive implementation.

Thanks,
Francisco

One more question, what about "promises" or "futures"? It would be great to be able to get some "bucket" (ID) in advance and once value is available user can add it under that bucket, resolving the promise/future. Of course the issue here is that in plain IPFS hash is based on the content, but content is not yet known at that moment.

Then pub/sub could be seen as a series of such promises. Where one value would tell you the next bucket on which to wait for next value. And so on. (Not sure how performant this would be, but it could be a nice minimal API.)

I was thinking about p2p pubsub yesterday, not directly tied to IPFS, but the idea may apply since my toy system is a Kademlia network of content addressed storage servers. Joe Armstrong's gittorrent schtick has infested my mind as has IPFS & Tahoe-LAFS.

My idea for a solution was to have subscribers store matchers (on key/values, xquery, bytecode) on the network that need refreshing. These matchers are distributed like a Kademlia network distributes data: to the nodes that are nearest to some origin key. The matcher's keyS for distribution would be derived from the keys the matcher checks.

Now subscribers have matchers located near where the data is expected to arrive, and these matchers know where to send any matches. So once a message is distributed according to it's keys (really thinking email style key/values + body, or a standard json like body linked to a blob on the network), it hits a node that has a matcher that matches, sending the message to the subscriber/matcher owner.

For one to many I was thinking of matching on the publisher's key used to sign the message. That would be authenticated too. Authenticated many to many may be tricky. Unauthenticated spam channels would be arbitrary key/values: X-Geolocation: ABQ. Combine with an authenticated Sender spam may be limited.

And this could probably morph into a real-time map/reduce network. But there's my drive-by help.

BTW, how does pub/sub relate to streaming? Will IPFS support streaming (is there a ticket for it)? So that one could stream movies or never-ending streams of data in real-time?

Had very similar thoughts to @mitar over the weekend. One nice property is that you can implement one-way pubsub trivially without changing anything about the current IPFS client or network, and without introducing new p2p message types of any kind. Here's a writeup on the topic:

Proposed low-perf IPFS/IPNS pubsub

Thanks @eboto .

After going through you document, _one-way pubsub_ means:

  • Not directly possible to discover new content, since you need a channel identifier before hand.
  • Only the key owner can publish to that channel, since you need IPNS.
  • Publishing is centralized, since you need a precomputed forward link attached to the current post.

Is this correct?

In any case, not requiring any changes to IPFS is a huge benefit!
It is a good drop in replacement for RSS.

@eboto: nice work and write-up!

One thing I wasn't clear on was how this would differ in performance vs simply periodically polling the user's IPNS directly? If Alice shared /ipns/alicepubkey/blog/1 widely, I could poll for that until it resolves. Once it does, I could poll for /ipns/alicepubkey/blog/2 periodically, and so on. (Maybe she also publishes /ipns/alicepubkey/blog/index too, which lets me get speedy random access to any of per posts.)

@fsantanna @noffle thanks for reading the post!

@noffle You may be right. I was working on a few assumptions -- are they incorrect?

  1. I had assumed pushing preferable to polling. I thought enough pollers would challenge whatever system is responsible for name resolution.
  2. I had assumed the wantlist provides a push solution, as I understand that the wantlist gets satisfied by a peer publishing that it has acquired a particular block, which I interpretted as a _push_.

Is there anywhere I can go to read more about how the router and wantlist behave? I have read the IPFS paper already. Telling me to just go read the code is also OK =)

@fsantanna, I hope I understand your points well. These are I think some workarounds:

HOW TO DISCOVER NEW CONTENT

It's true that with this scheme you do need to receive exactly one publication event to discover a content stream. But that link could be provided in various ways that facilitate discovery. e.g. Alice embeds Bob's first publish event as an ipfs:// link to the PayloadTree in her webpage. By clicking, and therefore wanting that link, her viewers can now subscribe to Bob too.

I think this is exactly how content gets discovered currently: you go to reddit and that links you to the content you're actually interested in.

MULTIPLE PUBLISHERS AND DISTRIBUTED PUBLICATION

You could simulate multiple-owner publication (and distributed publication) by creating a small network of peers that re-publish each other's work.

e.g. Imagine Alice and Bob are co-authoring their marriage vows vows.txt, which they want to share with Mama and Papa.

  • Alice and Bob subscribe to each other with the scheme in my previous post.
  • Mama subscribes to Alice. Papa subscribes to Bob.
  • Alice publishes an event with payload I, Alice, take you, Bob, as my husband. to /ipns/alice/vows/1
  • Bob and Mama receive the event, updating their vows.txt. Papa still doesn't have the event.
  • Bob immediately publishes an event _with the same payload_, but on his _own_ stream: /ipns/bob/vows/1
  • Papa dereferences /ipns/bob/vows/1/data, and receives the data from either Alice, Bob, or Mama (each of which have already pinned a copy of the payload). He updates his vows.txt and puts Bob's next update (/ipns/bob/vows/1/next) on his wantlist, continuing his subscription to Bob.
  • Bob, not to be left out, publishes an event with payload I, Bob, take you, Alice, as my wife.
  • Papa and Alice receive the event, updating their vows.txt.
  • Mama receives the event via Alice the same way that Papa received the first via Bob.
  • At the end, all 4 vows.txt copies say I, Alice take you, Bob, as my husband. I, Bob, take you, Alice, as my wife.

This solution sucks a bit because clients that subscribe to republishers pay the IPNS delay cost for each index of graph distance they subscribe to away from the original source. (e.g. Papa in the example above had to wait for Alice's IPNS to update, then for Bob's to update, before he could receive the payload from Bob)

You'd also need some scheme for concurrent update management and preventing infinite republications.

DISTRIBUTED PUBLICATION

You mentioned that publication is centralized in this model. That is true. However that may not be a problem considering these points:

  • Knowledge has to come from somewhere, so initial publication of some new datum is likely to come from a single peer.
  • If I understand the wantlist correctly (and its very likely that I don't) I think that although publishing is centralized, _distribution_ is distributed. A peer that is distant from the publisher in the network should be able to receive both the resolved publication event and that event's payload through the network.

Why not integrate the Matrix protocol into IPFS? It could function as a pub/sub layer and would allow an IPFS node to interop with many other existing projects.

@amstocker looks like a client/server protocol, I don't think it's the right fit for IPFS Pub/sub because some nodes will need to talk directly and a direct TCP connection won't be possible and UDP packets won't arrive (for example due to NAT, or restrictive firewalls).

This could be solved by relaying on other nodes and it doesn't look like the Matrix protocol was built with that in mind. Matrix also uses an account system, while IPFS uses asymmetric encryption for authentication.

We can learn from it for sure, but it doesn't look like a fit.

@fazo96 Matrix is indeed split into server<->server and client<->server API, but that doesn't stop the client running in the same device/app as the server, at which point we have more p2p semantics rather than being so-called "client/server". The server<->server bits of it are currently HTTP-discovered-by-SRV-records, but we have some longer term plans to look at a purely p2p soln for server<->server using something like libp2p (https://github.com/matrix-org/GSoC/blob/master/IDEAS.md#peer-to-peer-matrix).

The account thing isn't a big obstacle as we support mapping any types of identifiers (public keys, email addresses, MSISDNs, whatever) onto matrix IDs, which are currently @username:domain style identifiers. Again, in future we're looking at switching to a fully decentralised DNS-less ID system - probably public keys again - as per https://github.com/matrix-org/GSoC/blob/master/IDEAS.md#decentralised-accounts.

Eitherway: the main thing that Matrix brings to the party is a set of semantics for eventually-consistent distribution of real-time comms data - specifically, a timeline of messages (expressed as a DAG, which may bifurcate and re-form), and a simple set of key-value data. One could layer this on top of IPFS semantics, or keep it in its own layer (as we're currently doing). Whatever happens, Matrix is called Matrix because we're obsessed with building bridges to everything - so it should be trivial to build bridges between an IPFS pubsub ecosystem and Matrix's decentralised communication DAGs :).

If you have any questions, drop by https://vector.im/develop/#/room/#matrix-dev:matrix.org.

@ara4n thanks for the detailed reply! I didn't know the Matrix protocol was so wide in scope.

@eboto have you started working on your pubsub proposal ?

reacting to @fsantanna :

  • Not directly possible to discover new content, since you need a channel identifier before hand.
  • Only the key owner can publish to that channel, since you need IPNS.

Is there anything preventing the secret key not to be that secret, but shared among those allowed to publish changes ? It could even go public.

  • Publishing is centralized, since you need a precomputed forward link attached to the current post.

Knowing the secret key, the forward link payload could be known as well.

Is there anything preventing the secret key not to be that secret, but shared among those allowed to publish changes ?

@mildred I'm wondering the same thing. I want to create shared folders of files that specific people can modify but that anyone in the world can read.

So is this likely to be the way forward? https://github.com/libp2p/go-floodsub

To everyone following this thread, check out the latest Tutorial published by @pgte on how to use PubSub to create Real-Time apps over IPFS

image

Full discussion here: https://github.com/libp2p/research-pubsub/issues/18

Folks here might be interested on this thread https://github.com/ipfs/ipfs/issues/244

Is there a description of how the floodsub implementation works?

+1

How many hops does Floodsub currently propagate from sending node? Thanks!

Is there a description of how the floodsub implementation works?

  1. Peers advertise the topics they're subscribed to to all connected peers.
  2. When receiving a message on a subscribed topic for the first time, a peer will forward it to all connected peers (except the sender) that are subscribed to that topic.

If this sounds really dumb, it is. That's why it's still "experimental" (there's a lot of room for optimization).

How many hops does Floodsub currently propagate from sending node? Thanks!

Forever (cycles are prevented by remembering recently forwarded messages).

So peers that aren't subscribed to a topic won't propagate it?

@RangerMauve yes. It would be nice to decouple subscribing and broadcasting (to allow passive rebroadcast nodes for certain topics) but you can currently just emulate that by subscribing and throwing the results away (with little overhead).

We don't want all nodes to just rebroadcast everything they receive. That would be a perfect DoS vector.

@Stebalien Is there anything in place for discovering peers that are interested in a given topic?

@Stebalien
Q1) +1 for @RangerMauve question on discovering peers topic subscriptions

Q2)
Also, if two peers subscribing to the same node are connected via a third node (not subscribing to topic), would that third node not bridge the two subscribing nodes by rebroadcasting? I tried this with 3 separate IPFS nodes, by bootstrapping the first two nodes located behind firewalls with only the address of the third node, and the two nodes received topic messages. So I assumed it was rebroadcast by the non-subscribed man in the middle ... did I miss something here?

@RangerMauve Sort of... I'm not sure about js-ipfs but the go-ipfs ipfs pubsub command has an optional 'discovery' option that uses the DHT. Unfortunately, this is kind of a hack. To register "interest" in a topic, one uses the provider system to provide a block with the content floodsub:$topic_name and peers looking for other peers interested in the topic search for peers providing that block.

@jachiang

would that third node not bridge the two subscribing nodes by rebroadcasting

Are you sure the other two nodes weren't directly connected? IPFS tries pretty hard to connect to new peers and can bypass some NATs via hole punching. I'd take a look at the output if ipfs swarm peers on both of those peers.

@Stebalien So is the floodsub:$topic_name pattern the way to go for bypassing the DHT not allowing arbitrary get/put? :P

Thank you for the information! Would a new pubsub implementation be welcome if it conforms to the same public interface?

@Stebalien Thanks for the answers!!! @RangerMauve thanks for the great questions I am also learning from!

@RangerMauve Sort of... I'm not sure about js-ipfs but the go-ipfs ipfs pubsub command has an optional 'discovery' option that uses the DHT. Unfortunately, this is kind of a hack. To register "interest" in a topic, one uses the provider system to provide a block with the content floodsub:$topic_name and peers looking for other peers interested in the topic search for peers providing that block.

So the topic needs to be known apriori? If a secret channel topic is not made public for example, there is no way it can be exposed?

@Stebalien So is the floodsub:$topic_name pattern the way to go for bypassing the DHT not allowing arbitrary get/put? :P

How does OpenBazaar put profile info into DHT? Is it only on their OB network I suppose?

@RangerMauve

So is the floodsub:$topic_name pattern the way to go for bypassing the DHT not allowing arbitrary get/put? :P

It's not really an arbitrary get/put. It puts a record mapping CidOfBlock("floodsub:$topic_name") to the peer's ID.

Would a new pubsub implementation be welcome if it conforms to the same public interface?

You're free to do so but it may break in the future. There's a reason this is marked experimental (although we will try to avoid breaking things too badly).

@jachiang

So the topic needs to be known apriori? If a secret channel topic is not made public for example, there is no way it can be exposed?

You do need the topic to subscribe to it (there's no "list all topics" command). However, your node will tell all of its peers which topics it's interested in so they're hardly secret.

How does OpenBazaar put profile info into DHT? Is it only on their OB network I suppose?

Yes. We explicitly do not want to allow arbitrary data on the DHT (too easy to abuse). The correct way to do that is with IPNS (IPNS record points to a "profile" stored in IPFS). IPNS is currently a bit... finicky in practice.

Thank you for all the details @Stebalien!

From the README, it looks like the JS version of the DHT isn't totally implemented. How does pubsub work in the browser if the DHT support is iffy?

As I said, that particular peer-finding mechanism is a bit of a hack. I doubt that JS uses it. @diasdavid?

Is there any progression on the pubsub front?

I've been thinking about the state of pubsub, was wondering if I could get comments on the following:

I think that constructing a minimum spanning tree (MST) for groups is the way to go.

  • Use one of the algorithms out there for Distributed Minimum Spanning Trees
  • For the "distance" use the logical distance between peer IDs

    • Using latency is hard because you'd need to connect to peers before knowing what your latency is

    • Logical distance is what the DHT already uses for logic, so you're likely to be connected to peers with a close distance to you already if you've bootstrapped into the dht

  • Individual nodes that may not be subscribed to much will end up getting more traffic, but there will be less traffic overall in the network
  • Instead of relaying based on listeners, join "networks" with a given ID and broadcast all events through your network
  • Have peers publish to the DHT with which pubsub network they're on for bootstrapping the MST
  • Find more peers by querying the peers you're connected to
  • Subscribing becomes filtering what data your application will react to when events are passed through the node

    • This allows fancier patterns for subscribing like MQTT wildcards

  • Network IDs can be hashes of public keys and packets should be automatically encrypted with the key using AES or something

    • Maybe allow for signing instead of encryption so that there can be nodes that only consume and relay and can't publish

  • Allow for cleartext network names for where security isn't a consideration (a la current pubsub)
  • Maybe network ID is based on a hash of a network description object in IPLD (name, encryption type, etc)

Distributed minimum spanning tree does look like the right approach for this, I have to say.

@RangerMauve interesting!

About the "distance" function, I'm thinking that poorly connected nodes could impact the network if the latency or failure rate is not taken into account.
Discovering new nodes will be done through some sort of gossip protocol, right?
Idea: why not also spread the perceived "health" of nodes in a cooperative way?
I think there are already some models for this, from the top of my head:

Perhaps this could bring some type of decentralised measure of "health", which could then be used to calculate distance?

@pgte The distance function is there to determine which nodes are logically "close" to each other in order to determine who should connect together to form the minimum spanning tree. The main benefit is that it's computationally cheap to determine the logical distance since all you need is the ID of the peer, a node can calculate it's distance to any peer without connecting to it or knowing anything else about it. I'm taking this concept directly from the kademlia DHT design (which is the DHT used in IPFS)

I think peers will only bother learning about the nodes connected to their immediate neighbors, so some sort of gossip protocol will be useful here. I'm still reading up about the different strategies out there for distributed MSPs so I'm not 100% sure what the requirements are yet.

I like the idea of adding weights based on what a node things the "health" of their peers is, I'd be worried about spreading that information though, because two nodes may have a poor connection between each other, but another node may have a good connection to both based on what their physical relationship is.

I think that for an MVP of a MSP type pub-sub protocol, adding in health measures will complicate the implementation, but it should definitely be investigated once we can actually construct a MSP.

@pgte https://github.com/pgte The distance function is there to determine which nodes are logically "close" to each other in order to determine who should connect together to form the minimum spanning tree. The main benefit is that it's computationally cheap to determine the logical distance since all you need is the ID of the peer, a node can calculate it's distance to any peer without connecting to it or knowing anything else about it. I'm taking this concept directly from the kademlia DHT https://en.wikipedia.org/wiki/Kademlia#System_details design (which is the DHT used in IPFS)

Gotcha.
I think peers will only bother learning about the nodes connected to their immediate neighbors, so some sort of gossip protocol will be useful here. I'm still reading up about the different strategies out there for distributed MSPs so I'm not 100% sure what the requirements are yet.

I like the idea of adding weights based on what a node things the "health" of their peers is, I'd be worried about spreading that information though, because two nodes may have a poor connection between each other, but another node may have a good connection to both based on what their physical relationship is.

I think that it's exactly what both the Swim.js protocol and the Accrual failure detection model solve: it gives a cooperative measure of health, not a one-sided measure (the swim.js less so).
I think that for an MVP of a MSP type pub-sub protocol, adding in health measures will complicate the implementation, but it should definitely be investigated once we can actually construct a MSP.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/ipfs/notes/issues/64#issuecomment-362608541, or mute the thread https://github.com/notifications/unsubscribe-auth/AAC7Jhr2UhfYsk9_lC0xA1SOKC6uVS2Lks5tQyI5gaJpZM4GL9fG.

I've been tinkering around and here's a demo of an algorithm I've made for building a MSP.
In our case the actual weights don't matter too much because they're arbitrary logical distances.

  • Every node will first boostrap by getting a list of random other peers.
  • It will then sort those peers by their logical distance using an xor of their IDs.
  • It will then choose the closest peer that it isn't already connected to and connect to it

I've made a demo with 32 nodes which boostrap with a list of 6 random nodes.

Here's the link

I'm not sure how viable this is yet, so I'll work on other visualizations in order to catch possible partitions forming.

I've updated my demo to better visualize what's happening.

I've found that having nodes find a random list of other node IDs, then sorting them my logical distance and connecting to the closes one, I always get a fully-connected tree.

I'll explore logic for visualizing messages being sent down the tree (for pub sub) and dropped connections.

@RangerMauve what you're describing really is just a network topology mesh, where your route priority metric is driven only by logical distance.

Look at the Babel routing algorithm, there is a lot we can learn from that. Batman-adv as well.

@paralin Yes. I'm not focusing on routing data across the mesh at the moment. I'm focusing on forming the mesh in a way where individual nodes know every little about other nodes or the shape of the graph itself.

The existing pubsub connects to nodes at random and has a chance of having "supernodes" which have too much bandwidth going to them and network partitions which prevent groups of peers from sending events between each other.

Once an efficient and self-healing mesh is set up, it can be a basis for the protocols that will be built on top (pubsub to start, but other things can probably make use of it).

My initial goal is to make a more efficient pubsub.

I love what the Kademlia DHT does for discovering peers that have a resource, but there needs to be something for forming a persistent network for distributed applications that need to get data to all nodes interested in it.

@RangerMauve Wrt. Babel: I am suggesting we use it as a reference for more reasons -

  • They also limit knowledge of the network topology to only the next hop.
  • They avoid routing loops (black holes).
  • They successfully balance multiple routing metrics, and have experience with understanding which to prioritize when.

The same goes for batman-adv, except that project focuses more on radio transmission optimizations.

This matches with what you're talking about with mesh balancing. It's the same problem we are solving. Routing in the typical internet sense is the same thing as routing here.

It might be a good idea to look at generalizing the concept of routing a bit, and applying it in pub-sub as well as content and network routing.

@paralin Sounds good. Is the RFC a good place to start?

Babel looks like a protocol that should be used after the actual connections are made between nodes. I'm still at the stage where I'm figuring out which nodes should connect to which, but after that this will be great for making the actual traffic over the mesh be more efficient.

@RangerMauve I see what you're saying, they are actually two different problems actually when you consider the structure of the open internet, and don't limit connections to link-local.

In a way though it is actually the same thing. Consider this: all of the peers online in the world are actually just next-hop peers in the babel algorithm. What you're doing is trying to figure out what subset of that global set to connect to, and then prioritize further based on that. But it's still the same problem.

The 'brute-force' solution would rank all of the known peers based on available information about them, used to compute a routing metric. You then sort by the routing metric, chose the top N nodes, connect to those, etc.

Babel fits here as a mechanism of generating that routing metric. It's one of many. I think maybe a pluggable system for deriving routing metrics might be a good approach to this. You can run the fast route scoring algorithms first, before exploring a more advanced plan. It might be possible to map this to some sort of opportunistic graph solver.

@paralin I get what you're saying, but I think that's a bit heavy and requires a node to know a lot more peers. With my current algorithm, a node can search the DHT for the first 20 random nodes it finds, and then connect to the mesh without focusing the load too much in once place. All it needs to do to decide who to connect to is their ID without having to somehow measure latency or any other metrics about them. Otherwise a node would need to connect to the network, learn about potential peers and how they might relate to themselves, and then actually connect to the network. Though this will be more efficient in the long run, the initial setup is likely going to be too long for the time it takes for a typical web app to load.

@RangerMauve We are in agreement, I'm just generalizing what you're describing a bit.

You chose a next peer to send traffic to, then you build a connection to that peer, then you send the packet to the peer. The question of where is best to send the traffic is the same as the question of where to connect to next. What you are proposing, is to start with no information about the peers, and to randomly connect to some from the DHT, and then optimize from there. This is the same as what I described in my last comment - you start with no information, you make a decision at random, and then as you learn more information about the network, the routing can be optimized further.

Babel is just one example of a protocol to quickly exchange information about available routes in a closed network. It does not apply to this issue directly, but it is an example of an efficient way of exchanging information about the network between peers that happen to already be connected.

If we structure the code in such a way that these routing metrics, and the way they are prioritized and balanced together, are pluggable, then we can actually start to make use of the traffic metrics in LibP2P to make better routing decisions.

There must be some way to make a incentive-based system like BitSwap, but for incentivizing routing packets through the network. (Different problem, of course :))

@RangerMauve It's a little bit like a particle filter in a way... If you make a measurement of the ping to a peer, for example, the confidence and derived weight that information has on routing decisions can decay over time (temporal decay) as well as when you have other detected changes in the network topology (event-based invalidation). If we have the structure described above for pluggable metrics, you could get really fancy really quickly with algorithms to make, probably.

The BitSwap ledger is already based on bytes sent and received, so I'd assume it should be relatively simple (and I do mean _relatively_) to extend it to also account for routed bytes (maybe as a separate category, or with some weight etc. since message sizes are likely negligible compared to blocks)

Oh, and I don't know if someone's already looked into Chord, but there's been some work on how to build pub/sub systems with it:

  • This paper gives the basic idea
  • This paper shows how to implement equality, suffix, prefix and contains predicates for subscriptions
  • This paper gives an overview of content-based pub/sub systems, including DHTs

There are also a couple of different pub/sub-like schemes over Pastry:

I'd argue that mimicing their approaches might be a good way to go. There's likely similar systems on top of Kademlia, so it might be worth while to look into them. Chord, for example, is provably correct and functions well even with high churn.

edit: for an example of a wildly different approach, there's Secure Scuttlebutt that builds on the Scuttlebutt gossip protocol. Having a gossip protocol in libp2p doesn't sound bad at all.

revenge of the edit: @spikebike talked about PolderCast some years ago, and as he said, it seems like an extremely good choice.

I'm curious if you folks are aware of what these folks are up to.
https://github.com/libp2p/go-floodsub/pull/67

Submitting aeron for us to read about and investigate:

It seems to be a publisher/subscriber system that uses efficient UDP multicast or inter-process communication to get the job done.

What's the relationship between this discussion and floodsub? Is floodsub the "official" pub/sub for IPFS, or is this discussion completely independent?

Was this page helpful?
0 / 5 - 0 ratings