Beaker: Change Default Seeding Behavior to Seed Everything Forever

Created on 6 Aug 2019 · 10 comments · Source: beakerbrowser/beaker

Okay, this one is going to be a bit of a long one, but I think it's really, really fundamental!

I'd like to make a strong plea that you radically change Beaker's default seeding behavior, such that all accessed content be permanently seeded unless the user explicitly says otherwise by manually deleting local files. This would bring Beaker's behavior in line with the default behavior of all other significant peer to peer software. (The current behavior most closely matches BitTyrant, which was an experiment in how to be an evil peer.)

The current behavior of only seeding when actually simultaneously visiting the same page (or when manually opting in) means that there is essentially zero peer activity for even the most popular datsites, and always zero (non-hashbase) peer activity for less popular sites. For instance, I'm currently unable to access even the Beaker homepage via dat due to lack of peers:

[Screenshot, 6 Aug 2019, 6:44 PM: the Beaker homepage failing to load over dat]

The Beaker homepage should be the most popular Dat of all time, and yet there aren't even enough clients to access that, likely because nobody is currently visiting the site. If that's not enough for the project itself, it will never, ever be useful for low traffic sites like people's personal homepages, let alone experimental new services as they're getting off the ground.

However, if this is changed so that every person who has ever accessed the page is seeding, then suddenly this page goes from being unusable to _extremely_ robust - which should be the point of this whole thing! That's the part when we can start building new types of services and things get really interesting.

I really like Beaker and Hashbase so far, but for it to ever be useful for anything "real", this change needs to be implemented. Otherwise, it simply reduces to being just another browser that accesses a single server for the overwhelming majority of cases, and the whole experiment is basically pointless.

As this is peer to peer software, seeding all consumed content is actually what I _expected_ the default behavior to be, as that's been the default case in all P2P software since Napster. (I only found out this wasn't the default behavior when I realized I was unable to connect to what should be well-seeded dats, like the Beaker homepage. Even then, it was a bit confusing - are 'cached' items in my preferences seeded? I still don't know, but I don't think so.)

Since this project originates from Portland, I'm guessing that the original reason for this design decision was probably political in nature, as some people won't want to redistribute politically incorrect content. However, there is an obvious response to this - if you don't want to redistribute unpleasant content, _don't consume unpleasant content!_ (Obviously that's not always possible since you don't always know what's on a website in advance, but even then it shouldn't be too difficult to manually unseed recent content.)

There may be some other policies to explore here, such as seeding until an upload ratio or quantity has been met, but for the health of the network, I strongly suggest that the default behavior match software such as Transmission, where all content consumed is content shared.
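The ratio/quantity idea can be expressed as a stop condition, similar in spirit to the share-ratio limits found in BitTorrent clients such as Transmission. This is a hypothetical sketch, not Beaker code: `SeedStats`, `should_keep_seeding`, and the limit parameters are assumed names. With no limits configured, it reduces to seed-forever.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeedStats:
    uploaded_bytes: int    # bytes served to other peers for this dat
    downloaded_bytes: int  # bytes fetched from peers for this dat
    seconds_seeded: float  # total time spent seeding this dat


def should_keep_seeding(
    stats: SeedStats,
    target_ratio: Optional[float] = None,
    target_upload_bytes: Optional[int] = None,
    max_seconds: Optional[float] = None,
) -> bool:
    """Keep seeding until any configured limit is met; with no limits, seed forever."""
    # Ratio limit: only meaningful once something has been downloaded.
    if target_ratio is not None and stats.downloaded_bytes > 0:
        if stats.uploaded_bytes / stats.downloaded_bytes >= target_ratio:
            return False
    # Quantity limit: stop after serving a fixed number of bytes.
    if target_upload_bytes is not None and stats.uploaded_bytes >= target_upload_bytes:
        return False
    # Time limit: stop after seeding for a fixed duration.
    if max_seconds is not None and stats.seconds_seeded >= max_seconds:
        return False
    return True


# Usage: stop once twice the downloaded amount has been uploaded.
print(should_keep_seeding(SeedStats(150, 100, 3600), target_ratio=2.0))  # True
print(should_keep_seeding(SeedStats(200, 100, 3600), target_ratio=2.0))  # False
```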

Anyway, I've got quite a bit of experience building P2P services with BitTorrent clients/trackers and with WebP2P services using WebRTC, and I'd like to start using and contributing to Beaker in a major way, but I think this issue is going to be really fundamental to the success of the network. Happy to chat more about it if you're open to considering this idea!

Thanks for reading,
Rich

discussion

Miserlou

Most helpful comment

> I think the CP case described is pretty much obscure enough to be irrelevant

I think people would do it to smear the technology, to smear an individual in a targeted fashion, or for the lulz. If we're talking about copyrighted material or malware, people would do it to maintain uptime on content that they want access to. I don't think it's an edge-case; it's the first thing I'd do if I had an interest in any of that.

> A malicious actor can already use the vanilla web to ... so on and so on. Obviously, these are real threats to web users and should be taken seriously, but it doesn't mean that we should block all JavaScript or POST requests, or only visit websites that our social network has whitelisted

You're right that we do need to match the response to the size of the threat.

> So let's imagine a hypothetical scenario that the people of Hong Kong are protesting a new authoritarian law.

Dat is a peace-time network. It is not designed to protect against state actors. It could be adapted for that but it's not now. Access to Hashbase is not the only problem they'd have.

The peer discovery DHT is not resilient to an ISP shutdown.
The network traffic is encrypted but not masked and so an ISP could detect and block it.
The peer discovery DHT broadcasts reader IPs which would put them at risk.

In the scenario you're describing, you really have three options: (1) build a network that's designed to operate covertly in existing ISP networks, (2) build a highly secure intranet, or (3) do it all in person.

> To put it even more plainly: reseeding doesn't imply any endorsement of content, but it is a way to ensure a healthy network that everybody can participate in.

"To ensure a healthy network" is what I want us to solve.

There are three policies we've brought up so far: seed-forever-by-default, seed-temporarily-by-default, and leech-by-default. Even though I'm arguing for leech-by-default, I'm not resolved on the best choice (which is why I'm having the discussion).

Currently Beaker is seed-temporarily-by-default, and I think there's a strong case to be made that it's necessary to handle spikes in visitor traffic. I'm much less sure of seed-forever-by-default.

I could see making seed-forever-by-default a policy you could opt into in the settings page. I could see making all 3 options available to people. Obviously the default policy will be the most common so we still need to have the discussion.
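The three defaults differ only in when a visited site gets seeded. A minimal sketch under a simplified model: "seed-temporarily" is taken to mean "seed while the site is open", and a manual opt-in always wins. `SeedPolicy` and `should_seed` are hypothetical names, not actual Beaker settings.

```python
from enum import Enum


class SeedPolicy(Enum):
    SEED_FOREVER = "seed-forever-by-default"
    SEED_TEMPORARILY = "seed-temporarily-by-default"
    LEECH = "leech-by-default"


def should_seed(
    policy: SeedPolicy,
    *,
    ever_visited: bool,
    currently_open: bool,
    manually_seeded: bool,
) -> bool:
    """Decide whether to seed one site under the chosen default policy."""
    # A manual opt-in overrides every default.
    if manually_seeded:
        return True
    if policy is SeedPolicy.SEED_FOREVER:
        # Anything the user has ever loaded keeps being seeded.
        return ever_visited or currently_open
    if policy is SeedPolicy.SEED_TEMPORARILY:
        # Simplification: seed only while the site is open.
        return currently_open
    # Leech-by-default: download, but never re-serve.
    return False


# A site visited yesterday and now closed, under each default:
for policy in SeedPolicy:
    print(policy.value, should_seed(policy, ever_visited=True,
                                    currently_open=False, manually_seeded=False))
```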

> Why is this trust necessary?

It's a fairly minimal form of trust we're talking about here: trust that they're not going to publish illegal content.

It's also not necessary. You can manually seed any site you like without it being a part of your social graph. Therefore I'm not overly concerned about your points about peer pressure; there are alternative mechanisms for when you want to less-publicly seed things.

> I'd much rather have a world where we're all able to browse and share ideas with some level of privacy without censorship or corporate control.

I personally believe we should be able to read freely and privately but publish with responsibility.

pfrazee on 8 Aug 2019

All 10 comments

Hi Rich,

First a quick aside: the issue with the Beaker site seems to be some kind of bug in the Dat networking stack. There are dedicated peers for it but I get the same connectivity issue. I have no idea what's causing it and I've been waiting for Dat 2.0 to land (which rewrites a lot of the stack) before I dig into it.

Beaker actually originates from Austin! And this decision isn't about political speech. It's much more motivated by resource constraints -- you can only seed so many sites. Legal concerns are a second thing to consider: it wouldn't take much to get users seeding CP without them knowing it. In fact, that's a concern I have with the current design of auto-seeding upon visit, not just with the continuous seeding that you're proposing.

In the next version of Beaker, we'll be adding an identity and social system. Users have personal sites and follow each other's. We can use that to create a social-seeding mechanism that has more accountability: you can choose to seed sites published by people you trust. The upside of that approach is it derisks the legal concerns.
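Social seeding, as described here, amounts to a filter on publishers. A hypothetical sketch, assuming the user's follow list is known and each site maps to a single publisher; the function name, users, and site identifiers are illustrative only.

```python
from typing import Dict, Iterable, Set


def social_seed_set(me: str, follows: Iterable[str],
                    site_publishers: Dict[str, str]) -> Set[str]:
    """Return the sites to seed: those published by the user or by anyone they follow."""
    trusted = set(follows) | {me}
    return {site for site, publisher in site_publishers.items()
            if publisher in trusted}


# Hypothetical users and sites, for illustration only.
sites = {"alice-blog": "alice", "bob-zine": "bob", "mallory-files": "mallory"}
print(sorted(social_seed_set("alice", ["bob"], sites)))
```

A site from a publisher nobody follows is never seeded under this rule, which is the uptime asymmetry debated in the replies that follow.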

With any approach, we'll need to have the browser balance the number of seeds against the available resources.
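One way to balance seeds against resources is a byte budget with least-recently-visited eviction, where manually seeded sites are pinned. This is an assumed design for illustration only; `SeedCache` is hypothetical and does not describe how Beaker actually allocates storage.

```python
from collections import OrderedDict
from typing import Dict, Set


class SeedCache:
    """Seed recently visited dats within a byte budget; pinned dats are never evicted."""

    def __init__(self, budget_bytes: int) -> None:
        self.budget_bytes = budget_bytes
        self._recent: "OrderedDict[str, int]" = OrderedDict()  # url -> size, oldest first
        self._pinned: Dict[str, int] = {}  # manually seeded: url -> size

    def _used(self) -> int:
        return sum(self._recent.values()) + sum(self._pinned.values())

    def _evict(self) -> None:
        # Drop least recently visited dats until usage fits the budget.
        while self._recent and self._used() > self.budget_bytes:
            self._recent.popitem(last=False)

    def pin(self, url: str, size: int) -> None:
        """Manually seed a dat; it stays even if that pushes others out."""
        self._recent.pop(url, None)
        self._pinned[url] = size
        self._evict()

    def visit(self, url: str, size: int) -> bool:
        """Record a visit; return True if the dat is seeded afterwards."""
        if url in self._pinned:
            return True
        self._recent.pop(url, None)
        if size + sum(self._pinned.values()) > self.budget_bytes:
            return False  # too large to fit even with every unpinned dat evicted
        self._recent[url] = size  # most recently visited goes to the end
        self._evict()
        return True

    def seeding(self) -> Set[str]:
        return set(self._recent) | set(self._pinned)


cache = SeedCache(budget_bytes=100)
cache.visit("a", 40)
cache.visit("b", 40)
cache.visit("a", 40)  # "a" is now the most recently visited
cache.visit("c", 40)  # over budget: evicts "b", the least recently visited
print(sorted(cache.seeding()))
```

The same budget logic applies whichever default policy decides what enters the cache, which matches the point that resource balancing is needed under any approach.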

pfrazee on 6 Aug 2019

Hi Paul! Thank you for your swift reply!

🤠 Texas is the reason! Nice! 🐮 👨 🔫 (..why did I think Portland? Oops! No offense intended!)

Resource constraints are for sure a valid concern, but I think there is room for a vastly more permissive sharing policy even with those constraints, such as seeding until a given upload amount, ratio, or sharing time is reached.

Do we have any statistics for what those resource constraints actually _are_, as a starting point for the conversation? I'm imagining that since we're only talking about websites in most cases, the constraints for the average user are far, far lower than the equivalent in BitTorrent. Still, a modern BitTorrent client on an ordinary PC can fairly easily host hundreds to thousands of multi-gigabyte torrents, each with thousands of chunks and swarms in the hundreds, so I really don't see any reason why a P2P web browser would need to be leech-by-default in the name of resource constraints.

I suppose that legal concerns could be a valid reason, but I think that changing a default behavior in order to defend the legal rights of hypothetical child pornography consumers is a really counterproductive position to take for the default behavior of the flagship client, as it immediately kneecaps any censorship-resistance utility of the client. As I mentioned before, the easiest way to avoid redistributing CP with Beaker is to _not consume CP_. Furthermore, the current policy of seeding only while simultaneously visiting doesn't actually mitigate any legal risks; it simply narrows the exposure window, so the point is a bit moot anyway.

However, I think that this is actually a straw man argument, and if anything a seed-by-default policy will _reduce_ the amount of nasty things consumed on the web. And, really, Beaker/dat make no claims about anonymity, which is what enables the nefarious activity on Tor to begin with. (I have been a Tor HS op for many years, but this problem has driven me away from Tor entirely, which is why I am so eager for dat to succeed in the first place!) As I mentioned before, I was _expecting_ that everything on the P2P web would be seeding by default, and I expect most other users do as well.

I have to say that unfortunately, as a user/developer/operator, a social identity system baked into the client is the exact opposite of what I want right now, at least at the protocol/client layer. A policy where an inside group of popular people have better content distribution than new users is not the peer to peer web I want to see. I'm old enough to remember the GPG web-of-trust problem and this seems very similar to that. Social publishing will be great, but that seems like an application-layer decision, not a protocol/client decision, as it comes at the cost of all of the other exciting possible applications of a peer to peer web (curing link rot, censorship avoidance, lower cost content distribution, more liberated publishing, and on and on and on..)

Please don't kill the peer to peer web by making leeching the default behavior! It will be impossible to grow the network! If resource constraints are the primary concern, let's figure out what those constraints are and build a policy that maximizes sharing while being minimally constraining! Be a kind neighbor, seed by default!

Cheers,
Rich

Miserlou on 7 Aug 2019

> Thank you for your swift reply!

Thank you for engaging me on this! It's important we have these discussions, and I appreciate the thought you're putting into your position.

First, on the resource usage: let's drop that point, because it's going to apply for any policy we choose. Beaker is going to need to balance its resource usage in all cases. Back to the more substantive discussion.

My concern about CP is an edge-case but it's an important one: you visit somebody's site and it seems to be fine but it turns out there's CP nestled into a subdirectory. (Or: copyrighted material. Or: malware.) You have no idea that you're hosting that content because all you did was click on some link and you never saw that content. You never knew you were consuming it.

We can discuss an approach where only the files you view are seeded, but it'll be a difficult cat-and-mouse game. People will find ways to get the unsavory content downloaded... iframes hidden in the page, the download() API, etc.

If we come to decide that fundamentally the P2P web can't work without default-seed, then let's give it a look. But since we're engaging in engineering tradeoffs, I think it's worth looking at alternative mechanisms such as social seeding.

Some quick comments on ideology and why I'm introducing an identity/social layer. This discussion does connect to free speech and personal responsibility. I believe in both and I believe they are linked. Should you be able to publish anonymously? I think so! But should your visitors help broadcast your anonymous sites by default? I question that. I think visitors should either decide to seed a person because they trust in them, or they should decide to seed a specific site because they believe in it. Broadly speaking, our network is a self-governing society and it needs the tools to self-govern effectively, or else we risk somebody else coming in to govern for us.

You're right that social-seeding lends toward a world where early-adopters and the well-connected get better uptime. That's a tradeoff that we'd need to correct against, just like seed-by-default has its tradeoffs to correct against.

> And, really, Beaker/dat make no claims about anonymity, which is what enables the nefarious activity on Tor to begin with.

Quick note on this: I'm not entirely sure how I feel about the current situation. I don't believe that people should have their browsing history broadcast to the world. Publishing should come with responsibility, but should visiting? There are plenty of legitimate things that people would want to read privately: medical information, legal information, history, politics. Broadly speaking: I don't want users to ever be afraid to click on a link or type a search query.

pfrazee on 7 Aug 2019

Thanks again for your swift and thoughtful reply!

Let me try to unpack some of that.

I think the CP case described is pretty much obscure enough to be irrelevant, and certainly too obscure to be the basis of the fundamental behavior of the network, for the specific reason that most of the nastiness described can already be done on the normal web. A malicious actor can already use the vanilla web to surreptitiously download illegal content to a victim's local storage, redistribute that content to open directories or the FBI or god knows where, scan a victim's internal network to drop malicious executables in network shares, mine Bitcoin, redress websites to hijack credentials to a victim's bank accounts, take creepshots with a webcam, and so on and so on. Obviously, these are real threats to web users and should be taken seriously, but it doesn't mean that we should block all JavaScript or POST requests, or only visit websites that our social network has whitelisted, or live in wooden shacks in the forest. The threats are just too unlikely to be a major worry for normal web users going about their routine business.

Now, let me describe a far more likely scenario.

I think it's very likely that the peer to peer web could become popular in places like Hong Kong, or Cuba, or Australia, or Iceland, where simple geography means a lack of redundant connectivity to the main internet, and provides a chokepoint for authoritarian regimes to control the flow of information into and out of a country.

So let's imagine a hypothetical scenario that the people of Hong Kong are protesting a new authoritarian law. Since the physical internet goes through mainland China, access to Hashbase is blocked and most of the internet is slow or completely unavailable. Meanwhile, protestors are trying to discuss ideas for a new constitution for a new independent nation of Hong Kong.

In this scenario, the most vital thing is that _everybody in the country_ has _equal and maximal_ access to read and contribute their viewpoints about what should be done on a peer to peer web. It is the duty of the network to maximize universal connectivity and participation, and to ensure that the ideas of popular people, the protest leadership, or the more technically inclined don't have better priority than anybody else's. For the sake of democracy, all voices should be heard, not just those with more internet friends. (Furthermore, not only is equal and maximal access the most morally and technically robust behavior for the network, it's also the correct security behavior, as those who become powerful social seeds also make the best targets for censors and goons.)

To put it even more plainly: reseeding doesn't imply any endorsement of content, but it is a way to ensure a healthy network that everybody can participate in. (As an example, if I download and seed _Pulgasari_, it doesn't mean that I think it's a good movie, but it does mean that more people have greater diversity of films that they can consume.)

> I think visitors should either decide to seed a person because they trust in them, or they should decide to seed a specific site because they believe in it. Broadly speaking, our network is a self-governing society and it needs the tools to self-govern effectively, or else we risk somebody else coming in to govern for us.

Why is this trust necessary? I don't trust the New York Times, but I still read it and want other people to read it. I think it's far better for diversity and access that the network be as minimally governing as possible. Trust should be de-fanged. To me, "somebody else coming in to govern for us" sounds like a pretty good way to describe social seeding. The "Twitterati" using social pressure to decide what people can and can't share sounds like a nightmare scenario to me. What happens when somebody gets "cancelled" and doxxed by a left-wing mob because they're re-seeding an interesting article about limiting immigration, or gets attacked by a right-wing mob because they're interested in reseeding information about feminism? Will our so-called friends browse all of our reseeds to judge our moral purity? It's easy to see how this could devolve into ideological mudslinging right from the get-go. The health and redundancy of the network should be detached from its content as much as possible, and detached from the real identities of the users even further.

> I don't believe that people should have their browsing history broadcast to the world.

Isn't this exactly the problem that social seeding _creates_? If my social network can see what I'm seeding, and reseeds are considered endorsement, then we will all be judged on what we're seeding, and we'll develop a constant anxiety about who is seeing what we're choosing to browse and reseed.

I'd much rather have a world where we're all able to browse and share ideas with some level of privacy without censorship or corporate control. My neighbor won't be able to attach my browsing history or my political beliefs to my real identity, and because of how dat's lookup works (peers only swarm when they know the common address), nobody will be able to deduce my browsing history without already knowing it.

> If we come to decide that fundamentally the P2P web can't work without default-seed, then let's give it a look.

I strongly, strongly believe that this is the case, and that the symptoms of other treatments are worse than the disease. What will it take to show that the P2P web can't work with leech-by-default? Should there be more voices and experts in this discussion? Should we run some statistical simulations about the connectivity of the graph? I'm happy to engage on both.
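A toy starting point for that kind of simulation: if each of n seeders is online independently with probability p, a site is reachable with probability 1 - (1 - p)^n. The independence assumption is a deliberate simplification (real peers' uptimes are correlated by time zone, network, and so on), and `availability` is a hypothetical helper, not part of Dat.

```python
def availability(p_online: float, n_seeders: int) -> float:
    """Probability that at least one of n independent seeders is online.

    Toy model: each seeder is online with probability p_online,
    independently of the others.
    """
    if not 0.0 <= p_online <= 1.0:
        raise ValueError("p_online must be between 0 and 1")
    return 1.0 - (1.0 - p_online) ** n_seeders


# Same 10% per-peer uptime, different numbers of seeders.
for n in (1, 10, 40):
    print(n, round(availability(0.1, n), 3))
```

Under this model, ten seeders at 10% uptime give roughly 0.65 availability and forty give about 0.985. Seed-forever-by-default grows n with every past visitor; seed-temporarily-by-default only counts visitors who happen to have the site open, which this model would represent as a much smaller effective p.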

I'd also like to point out that it'll be a lot easier for us to be permissive now and then to find solutions to problems as they arise later, rather than to try to anticipate problems in the future and "solve" them before they exist at the expense of the health and growth of the network. It also means that our solutions to future problems will be in response to the way they actually manifest, which we cannot yet actually anticipate. So, I think it's important that we get this sorted out sooner rather than later.

(Okay, that's probably enough for you to chew on for now!)

Cheers again,
Rich

Miserlou on 8 Aug 2019

@pfrazee when is Dat 2.0 planned to land?

I'm having some issues with Beaker again: dat-cli works fine, but in Beaker Browser all connections fail.

nettiopsu on 13 Aug 2019

@nettiopsu We're trying to get dat 2 and beaker 0.9 in beta by the end of September at the latest

pfrazee on 13 Aug 2019

I think this is a very important and profound discussion that needs to be had and I'm glad @Miserlou raised it.

Paul said: _"I personally believe we should be able to read freely and privately but publish with responsibility."_

I would suggest, as many have before, that in fact we are entirely responsible for our own experience of this world and because self-censorship is a bitch, I would argue that it's this way round:

_I personally believe we should be able to read (consume information) freely, privately and responsibly, but publish freely._

We cannot be responsible for other people's experience (that is their responsibility). So publishing responsibly is a loose and dangerous term in my book. Because any authoritarian can decide what they don't like to hear or see is "irresponsible". So I would prefer an "it is forbidden to forbid" bias and the responsibility placed on the consumer of data over the producer of data.

I could after all make a very strong case that it's highly irresponsible to state that Dark Matter exists, when paper after paper is flying off the peer-reviewed journal presses stating that all evidence points to the contrary. The reader has to decide what passes for nonsense in this world.

As far as CP is concerned. Certain things are illegal, there are bad people in this world. Likewise pavements are hard and made of concrete. We could cover them with polystyrene or we could entrust people not to fall and hurt themselves. Some people will have accidents, and some people may inadvertently or by nefarious means get CP on their HDs. But do we sacrifice so much freedom, because wickedness exists in the world?

I don't have answers I'm afraid, but I do have a bias, and that is toward decentralisation and toward liberty and personal responsibility and I hope eventually Beaker will choose this leaning.

Personally, I think the default should be "seed for this session", but it should be easy for people to edit their seeding preferences per site. So, for example, let's say I like where both Paul and Miserlou are coming from (even though they may disagree); I would then say seed permanently, because they seem to be intelligent and sincere types unlikely to stoop to highly immoral practices.
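A session default with per-site overrides could be modeled as a small preferences store. A sketch under assumptions: the three modes, `SeedingPreferences`, and the site names are hypothetical, and "session" is taken to mean the site was visited since the browser started.

```python
from enum import Enum
from typing import Dict


class SiteSeeding(Enum):
    NEVER = "never"
    SESSION = "seed for this session"
    PERMANENT = "seed permanently"


class SeedingPreferences:
    """A global default plus per-site overrides edited by the user."""

    def __init__(self, default: SiteSeeding = SiteSeeding.SESSION) -> None:
        self.default = default
        self._overrides: Dict[str, SiteSeeding] = {}

    def set_site(self, site: str, mode: SiteSeeding) -> None:
        self._overrides[site] = mode

    def mode_for(self, site: str) -> SiteSeeding:
        # Fall back to the default for sites the user never configured.
        return self._overrides.get(site, self.default)

    def should_seed(self, site: str, visited_this_session: bool) -> bool:
        mode = self.mode_for(site)
        if mode is SiteSeeding.PERMANENT:
            return True
        if mode is SiteSeeding.SESSION:
            return visited_this_session
        return False


prefs = SeedingPreferences()
prefs.set_site("pfrazee-site", SiteSeeding.PERMANENT)  # hypothetical site names
print(prefs.should_seed("pfrazee-site", visited_this_session=False))
print(prefs.should_seed("unknown-site", visited_this_session=False))
```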

Again, this is such an important conversation, especially since Beaker is the vanguard for this DAT experiment.

Thanks for listening and sorry my two-pennies worth was more pounds and pence.

parfish on 30 Sep 2019

> We cannot be responsible for other people's experience (that is their responsibility). So publishing responsibly is a loose and dangerous term in my book. Because any authoritarian can decide what they don't like to hear or see is "irresponsible". So I would prefer an "it is forbidden to forbid" bias and the responsibility placed on the consumer of data over the producer of data.

Bear in mind that we're not debating censorship. We're debating whether users ought to automatically host other people's speech for free. There are complexities to the options which is why I'm still undecided on the issue, but I don't think we can characterize either position as authoritarian.

pfrazee on 30 Sep 2019

In that case I don't understand what "but publish with responsibility" means.

I didn't characterize anyone's position (here) as authoritarian. I just don't know who decides what's responsible.

I guess a lot of this comes down to how people are going to find content on the DAT network and whether that comes down to a popularity (re-seeding / trust) contest. I'm reminded of the line in The Big Short: "The truth is like poetry and most people f***ing hate poetry."

parfish on 30 Sep 2019
