PeerTube: Add video server redundancy (cache in the federation)

Created on 27 Nov 2017  ·  47 Comments  ·  Source: Chocobozzz/PeerTube

For now, a video is only webseeded by the server on which it was uploaded.

It would be very interesting if other servers seed this video:

  • Better resilience (if the origin server is down, the video is still accessible)
  • Better availability: servers share their bandwidth

The hard part is the algorithm that chooses which videos to duplicate.

All 47 comments

I guess the algorithm to choose which video to duplicate can be seen as a cache replacement problem.

It all comes down to defining a filter (either all videos, or only those favorited by the admin, those with not enough average availability, etc…) and then ordering the filtered list with the chosen algorithm. From that ordered list we can take the first n videos, or the first videos whose total size fits within a limit fixed by the server.

I guess we want to go with a simple algorithm, and all of these considerations might change the choice of algorithm later on. LRU, LFU and LFRU are the most prevalent, along with variants used in YT.

Not a JS dev here, but there seem to be a few examples of LRU caches around:

Hi @rigelk, thanks!

LRU should be fine. We already use it for the video previews cache in PeerTube.
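
For readers who have not met LRU (least recently used) before, here is a minimal sketch of the policy. It is not PeerTube's previews cache: the class name, the capacity counted in items, and the video identifiers are assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` entries, evicting the least recently used."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached value (and mark it as recently used), or None."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Store a value and return the list of keys evicted to make room."""
        self._items[key] = value
        self._items.move_to_end(key)
        evicted = []
        while len(self._items) > self.capacity:
            old_key, _ = self._items.popitem(last=False)
            evicted.append(old_key)
        return evicted


cache = LRUCache(capacity=2)
cache.put("video-a", "a.mp4")
cache.put("video-b", "b.mp4")
cache.get("video-a")                       # video-a is now the most recent
evicted = cache.put("video-c", "c.mp4")    # video-b is the one that goes
```

A real video cache would more likely bound total bytes than item count, but the eviction order works the same way.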

The difficulty is to define which video we want to cache:

  • Manually set by the administrator?
  • Videos watched on the current instance (example: 2 viewers watched the same video within a 2-hour interval, so cache it)?
  • Ask the fediverse who needs help?

Hello there, I'm discovering PeerTube right now, and I just wanted to share my 2 cents on this:
"Manually set by the administrator?". No, you do not want to do that, and admins do not want to spend their time choosing which video they should share, either. It's just technically the easiest way...

The obvious choice would be the most viewed ones (which means we count the views), and the new ones?

Rather than wondering which videos we want to seed, what about thinking of it the other way around: seed them all, then delete the least bandwidth-hungry ones? Or the ones most available on known peers? etc.

> "Manually set by the administrator?". No, you do not want to do that, and admins do not want to spend their time choosing which video they should share, either. It's just technically the easiest way...

Of course I agree :)

> The obvious choice would be the most viewed ones (which means we count the views), and the new ones?

Yes, we count the views of each video. So it's a good idea to cache the most viewed ones within a certain interval. But it does not solve the problem of a server with many videos that needs help when many people watch different videos.

> Rather than wondering which videos we want to seed, what about thinking of it the other way around: seed them all, then delete the least bandwidth-hungry ones? Or the ones most available on known peers? etc.

We cannot cache all videos of the fediverse :)

> We cannot cache all videos of the fediverse :)

Of course you can't. But you do not know all videos of the fediverse, not even every server out there. You may want to add videos that have been requested through your server. Or the ones your friends' servers have the fewest copies of. Or... there are a lot of strategies.

Servers with a lot of videos and visitors could also ask others to store some.

How about doing it the way it's being done in the effort to back up the Internet Archive: for each video, try to have three servers seeding it (the original instance where it was uploaded, and two others)? I guess a "maximum file size" for this could be configured, as well as a "maximum number of videos mirrored"? Perhaps a feature could be implemented where an additional instance would only seed a friend's videos for a certain amount of time; when that time is up, it would notify the origin instance so that it could poll for another potential seeder?

At least, any admin should be able to choose whether to be a redundant server or not, and be able to have 2 servers, one main and one redundant.
I think a video should be on "at least" 2 servers if the servers provide enough space for this feature.

I second what @Letiteuf55 suggests − I guess that's a nice feature for those who want to ensure maximum performance and not impact the videos of their server, plus some people might want to help but not provide a fully-featured instance of their own.

However, should it be a requirement? Some instances will just want to mirror a handful of videos, and all that extra setup might just deter them from doing so.

idk but it seems sane to me that it should be seeded if the video has been shared to your instance. Any videos you have on your server that are shared to my server I should help seed.

> Manually set by the administrator?

I would prefer this. Not on a per-video basis necessarily, but the same way I can choose specific instances (or no instances) to follow, I would only want to seed videos from certain other instances as well. It could be something simple like a 'Seed videos' checkmark next to each entry in the admin following page.

There has been discussion in Framacolibri revolving around potential incentives for users to provide redundancy.

what’s needed is for normal users to easily assist with the task, even when they aren’t actively watching a video.

If one were to look at traditional torrent systems there is an answer to the question of “why”. Offer something in return to users who help host the content. Pride and status is always cheap. But increased upload sizes for users at the discretion of each instance is an obvious answer. Tit for tat so to say.

What you are looking for is a ratio system (or at least what I’ve experienced of it in many private and semi-private torrent trackers/communities). They provide an elegant solution in the torrent realm, where you can trade the ability to download with your capacity to share enough or long enough.
We can’t, however, trade the ability to download without losing a fair share of users, so this tit for tat effectively needs to rely more on social status. It is, however, a weaker incentive, though arguably enough to appeal to enough users. Ideally, we should find more reward mechanisms.
In the meantime, we could store the volume of uploaded content and the time for which a given user has seeded a given content.

Maybe this algorithm can help: https://raft.github.io (I found it in a presentation on decentralization).

@Serkan-devel on what level would raft help? It isn't obvious.

Raft offers a generic way to distribute a state machine across a cluster of computing systems, ensuring that each node in the cluster agrees upon the same series of state transitions.

Namely, we haven't agreed upon a state machine to distribute among instances willing to help host the content, and that was more or less what we were discussing. Feel free to suggest one.

As someone interested in both hosting a PT instance, and supporting the PT federation as an average user, I can think of at least two issues at play here:
a) Unless I'm hosting on a spare tower in my broom closet, storage from my instance may well cost me as much or more than bandwidth.
b) I may not mind if people find videos I disapprove of from my instance via federated search, but I may balk at contributing resources to seeding videos I disapprove of ("no platform" attitudes, legal liability issues, etc.)

One way to help solve issue a) is to empower home users to store videos they like on their local hard drive (which is finite but cheap storage), and seed them into the PT network, just the way people do currently with BitTorrent, without having to keep loads of browser tabs open. This would need to be done in a way that doesn't require the PC to be running a full webserver and PT instance. See this discussion for some context:
https://github.com/Openmedianetwork/visionOntv/issues/1

There could be an official PT desktop client that does this? Helping with WebTorrent support in desktop BT clients (and maybe desktop media players like VLC?) might make maintaining and supporting an official client unnecessary?

As for issue b), allowing users to download and seed videos they like from PT instances (see above) would reduce the need for instances to cache each others' content, but I can definitely see the benefit of some kind of automated redundancy. At the same time, instance admins are definitely going to want control over how much storage space is used for caching remote videos on their own instance, which videos are stored on their own instance (not only for resource reasons, but for aesthetic and political ones), and which instances videos can and can't be cached from on their own instance. They're also not going to want to spend a lot of time moderating this stuff, or it cancels out the benefits of making the redundancy automated.

Perhaps the best solution is to build in a way for like-minded instance admins to be "backup buddies". For example, the admins of MeowTube and KittyTube could agree to mirror all the videos uploaded to each others' instances, which they could do by entering each others' URLs into a 'backup' tab in the admin UI. This could also work in clusters. The admins of MarxTube, ChicoTube, HarpoTube, and ZeppoTube could similarly agree to mirror the videos on each others' instances, so the 'backup' tab would need to allow multiple backup URLs. Whenever a video is uploaded to an instance, it would have to push a notification to the instance at each URL entered in the 'backup' tab, which would then download the video and make it available in the same WebTorrent swarm.

I would like to add that there are people like myself who have access to a fair bit of bandwidth and disk space, so making it possible to offer this to others as a "dynamic" seed would be no issue.

> Perhaps the best solution is to build in a way for like-minded instance admins to be "backup buddies".

I think this is the way to go for starters, as it's the easiest to implement and has the least legal and technical implications.

The replication process could simply point to an ActivityPub collection, that is, a channel or a whole instance. There could be support for "short" URLs like https://myptinstance.org for the whole instance, or mastodon-style URIs (@user@instance) for user channels. There should be an option indicating how much disk space to allocate to it (with a default setting provided).

There should also be a connection to the remote tracker to find out about all the files to replicate, so that the UI for replication can present how much space would be used (at the given moment) for replication of the channel/instance.

We could have default server-wide settings for :

  • how much storage to allocate per replica by default?
  • (potentially, in the future) allow users to replicate other users' content using their own diskspace quota? ⁽¹⁾

What do you think about this proposition?

⁽¹⁾ I believe this is also desirable as it will discourage users from reuploading the same videos on so many different instances. If an instance provides you with a certain storage space, then you should be free to use it to upload your own content, or to replicate other content that you like. Please note that in this case of user-based replication, several users may want to replicate the same remote content, and all of them should have their diskspace impacted so that when one user wants to withdraw from replicating the content from their diskspace, the video is still replicated using other people's quota.
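
The bookkeeping described in this footnote can be sketched as a small ledger: every user replicating a video is charged its full size, and the file only becomes deletable once the last user withdraws. This is only an illustration of the proposal; the class, method names and sizes below are hypothetical and do not come from PeerTube.

```python
from collections import defaultdict


class ReplicaLedger:
    """Tracks which users replicate which remote videos (hypothetical model)."""

    def __init__(self):
        self._holders = defaultdict(set)  # video_id -> users replicating it
        self._sizes = {}                  # video_id -> size in bytes

    def replicate(self, user, video_id, size_bytes):
        """Charge `user` for replicating the video (stored once on disk)."""
        self._sizes[video_id] = size_bytes
        self._holders[video_id].add(user)

    def withdraw(self, user, video_id):
        """Stop charging `user`; return True if the file can now be deleted."""
        if video_id not in self._holders:
            return False
        self._holders[video_id].discard(user)
        if self._holders[video_id]:
            return False  # someone else's quota still keeps the replica alive
        del self._holders[video_id]
        del self._sizes[video_id]
        return True

    def quota_used(self, user):
        """Bytes counted against the user's quota for replicated content."""
        return sum(
            self._sizes[video_id]
            for video_id, users in self._holders.items()
            if user in users
        )


ledger = ReplicaLedger()
ledger.replicate("alice", "remote-video-1", 100)
ledger.replicate("bob", "remote-video-1", 100)
alice_before = ledger.quota_used("alice")                      # both are charged
deletable_after_alice = ledger.withdraw("alice", "remote-video-1")
deletable_after_bob = ledger.withdraw("bob", "remote-video-1")
```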

There are two ways which I would like to see this handled myself:

  • Set instances (or maybe accounts) whose videos are mirrored automatically
  • When a video is watched on an instance, mirror it

Now, there are some limitations I would want to set on this, especially with a heavily federated instance such as my own: with the second option it would be nice to have limits. I think this could be handled by a "cache user settings" type of thing.

That is, create some settings for each cache type we operate (and I'd say get one setup and in testing first), which would set things like a "cache size". If there are multiple cache types, there likely would be issues with figuring out how to drop cached videos.

This relates to #674 as well. It would be cool if we could support an option to use S3 instead of local paths.

Imagine 3 servers behind a load balancer, a user uploads a video to one of the servers, the server in turn sends the file to the staging S3 bucket. An event is triggered, a worker picks up the newly uploaded file, transcodes it, and writes the output files to the storage S3 bucket. Any of the 3 servers can now serve the video. The server and the transcoding layer can be autoscaled independently depending on server load.
Sounds pretty nice, no?

Where can one find some info about how this very feature (now that the "V1 de luxe" has been funded, kudos BTW) is being implemented? I'm not only asking that because all my videos are unwatchable because the instance is down 😱 as a music producer, looking at full weeks of disappearance in the middle of summer is like staring at death itself 💀

It's not yet being implemented :wink:

There are still things to figure out, and things to settle on. For instance we should settle on how deep the replication goes: is the replica fully independent from the origin? Or is the replica "just" here to help spread the load, without removing the need for the origin instance to orchestrate the video federation?

@yPhil-gh I guess your use case clearly needs option 1 (a.k.a. replication as fail-safety), but the issue so far has been leaning more towards option 2 (a.k.a. replication as cache). As for which we will implement, option 1 can be thought of as option 2 + extra mechanisms to announce the video on the federated network. That way we can proceed incrementally.

> For instance we should settle on how deep the replication goes: is the replica fully independent from the origin? Or is the replica "just" here to help spread the load, without removing the need for the origin instance to orchestrate the video federation?

Isn't 2 what we have right now, with webtorrent?

> I guess your use case clearly needs option 1 (a.k.a. replication as fail-safety)

But, 2 should (well, could) take care of that, shouldn't it?
If video X, published on instance A, has been watched a number of times and is still in the _cache_ (in the webtorrent / seed sense), and I'm on instance B, then when I request this video (say, from the search results page) it should stream from a number of torrent clients, right?

> announce the video on the federated network

BTW, one thing is implementable pretty quickly: the error message. Currently, in our use case, the user stares at a black screen. We should tell them what's going on: "This video is hosted on a machine that does not respond right now. Please try again later". Now if you'll excuse me, I'm gonna go cry up on the roof :|

> For instance we should settle on how deep the replication goes: is the replica fully independent from the origin? Or is the replica "just" here to help spread the load, without removing the need for the origin instance to orchestrate the video federation?

> Isn't 2 what we have right now, with webtorrent?

Yes. That's client replication, and right now it has the drawback of being ephemeral (when you quit the video view page, you stop seeding the video). Here we are talking about server replication (other instances willing to help seed the video).

> If video X, published on instance A, has been watched a number of times and is still in the cache (in the webtorrent / seed sense), and I'm on instance B, then when I request this video (say, from the search results page) it should stream from a number of torrent clients, right?

No, because the origin instance is still the one telling which peers and webseeds exist. It still acts as an orchestrator.

> one thing is implementable pretty quickly: the error message. Currently, in our use case, the user stares at a black screen.

Yes, we should do this :+1: . Would you mind opening an issue for it?

> now if you'll excuse me, I'm gonna go cry up on the roof :|

It's really sad the instance you put your videos on is not online anymore :worried: One day we'll have enough tools and options in and around PeerTube to make sure that kind of dreadful event can be overcome more easily.

> Would you mind opening an issue for it?

No I wouldn't, it's done.

> right now it has the drawback of being ephemeral (when you quit the video view page, you stop seeding the video). Here we are talking about server replication (other instances willing to help seed the video).

That seems to be already possible thanks to busy fediverse bees :+1:

> the instance you put your videos on is not online anymore

Wait, it will be up again, please don't talk doom; what I'm wondering is: suppose it goes up again, and then I set up a seedbox like the one described in the linked article, one that specifically seeds those videos. Now the hosting instance goes down again: would that transparently and gracefully solve the current (my) problem? That would be a somewhat volatile system, but at least it would be a failsafe..?

> That seems to be already possible thanks to busy fediverse bees

Wow! I wasn't aware of that one :open_mouth: ! Let me add that to the wiki right away

> Suppose it goes up again. And then, I set up a seedbox like described in the linked article, that specifically seeds those videos. And now the hosting instance is down again; would that solve my problem?

Again, the origin instance is necessary for clients to discover peers. Your seedbox is just going to register itself as a potential peer…

> Again, the origin instance is necessary for clients to discover peers. Your seedbox is just going to register itself as a potential peer…

Ah, I see (I hope I do), but then it looks like a much easier problem to solve... Like an hourly list of all peers, distributed across them? In our case, the fallen instance was registered before it fell, and the seedbox IS registered, so the client can put 2 and 2 together... Sorry if I misunderstood _again_ :p

No, you got it :) But…

> Like an hourly list of all peers, distributed across them?

Clashes with the problem of privacy of the viewers. More specifically it clashes with the countermeasures we put in place/plan to put in place as explained in the new about page.

> the new about page.

Wow. Basically the opposite of youtube's EULA, huh ? :1st_place_medal:

Now what I don't get (and then I'm gone, I promise) is the connection "privacy of the viewers" / "list of all peers" ooh yes I see it now. Hum.

@yPhil-gh what do you mean by "the opposite of YouTube's EULA"?

> @yPhil-gh what do you mean by "the opposite of YouTube's EULA"?

I mean they don't care about user privacy half as much as you guys do and if they do they won't tell us anyway.

I believe that one way to handle the issue raised by @yPhil-gh is to have at least certain instances act as additional trackers. But I'm not sure if you can change the list of trackers in the torrent file without changing the torrent.

I know that if I'm running a torrent client, I can add trackers there, and it doesn't break things, but I don't know if adding the tracker (and federating the addition ideally) would mean it's a different torrent, or if it's only a different torrent if the "contents" change. My quick looking at http://www.bittorrent.org/beps/bep_0003.html does not make that clear.
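
One way to check this question by hand: under BEP 3 the torrent's identity (its info hash) is the SHA-1 of the bencoded `info` dictionary only, and the tracker URLs live outside that dictionary (`announce`, plus `announce-list` from BEP 12). The sketch below uses a tiny hand-written bencoder; the tracker URLs and file details are made up for illustration, and this applies to v1 torrents.

```python
import hashlib


def bencode(value):
    """Tiny bencoder covering the types a v1 metainfo file needs."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        encoded = {
            (k.encode("utf-8") if isinstance(k, str) else k): v
            for k, v in value.items()
        }
        body = b"".join(bencode(k) + bencode(encoded[k]) for k in sorted(encoded))
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(metainfo):
    """SHA-1 of the bencoded info dictionary, as hex."""
    return hashlib.sha1(bencode(metainfo["info"])).hexdigest()


# Hypothetical metainfo for a single-file torrent.
torrent = {
    "announce": "wss://origin.example/tracker/socket",
    "info": {
        "name": "video.mp4",
        "length": 1024,
        "piece length": 16384,
        "pieces": b"\x00" * 20,
    },
}

# Same torrent with an extra (hypothetical) tracker added outside `info`.
with_backup = dict(torrent)
with_backup["announce-list"] = [
    ["wss://origin.example/tracker/socket"],
    ["wss://buddy.example/tracker/socket"],
]

# Changing anything inside `info` does produce a different torrent.
changed_content = dict(torrent, info=dict(torrent["info"], length=2048))
```

So the `.torrent` file bytes differ once a tracker is added, but the info hash, and therefore the swarm, stays the same; only a change to the content description produces a new torrent.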

@JigmeDatse

"I believe that one way to handle the issue at hand by @yPhil-gh is to have the system work that at least certain instances acting as an additional tracker. But I'm not sure if you can change the lists of trackers in the torrent file, without changing the torrent."

That occurred to me too. If you can't add trackers to an existing torrent, again, the 'backup buddies' concept (see: https://github.com/Chocobozzz/PeerTube/issues/123#issuecomment-392578415) could be helpful here. 2 or more instances could automatically add their backup buddies as extra trackers for any video uploaded to them. If video creators like @yPhil-gh have a seedbox running for their own channel, there will always be at least one complete copy of the video available to the swarm. So as long as at least one of the instances in the backup cluster of the instance hosting their channel is up, the video ought to remain available to the fediverse.

I haven't read everything, but I'm working on a desktop version with Electron. It will use webtorrent-hybrid and keep videos around for a while: the user allocates some space on their hard drive, new videos replace the oldest ones, and a video is shared for as long as it is kept.

@Jorropo that sounds like a good way to start with this. And it works well for my use case right now.

If I may offer an idea for the "which videos should we mirror?" question?

Well it seems to me that the videos most in need of mirroring, are the ones being watched the most, with the smallest number of seeders.

So to start with a simple algorithm, why not just have the instance admin set how much storage space they're prepared to give to mirroring? After that, just look at each existing video the instance knows of in the fediverse, and start mirroring as many videos as fit in that space, in order, starting with the video with the lowest ratio of seeders to views.

It doesn't depend on admins manually curating lists. It limits storage to what the admins are comfortable with. It allocates bandwidth to videos that need it most. It's privacy friendly, using just info already available.

Possible enhancements could be:

  1. Factor in the length of the video, as long videos need more bandwidth and get fewer full views. Maybe make the video ratio (views * length in minutes) / # of seeders.
  2. Periodically update which videos are mirrored, to ensure new videos get a chance, even if the number of new instances slows.
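
The proposal above can be sketched as a greedy fill of the admin's storage budget, ranking candidates by the suggested ratio. The field names, the sample numbers and the choice to treat zero seeders as one (to avoid dividing by zero) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    views: int
    minutes: float
    seeders: int
    size_bytes: int


def need_score(candidate):
    """(views * length in minutes) / number of seeders; higher means needier."""
    return (candidate.views * candidate.minutes) / max(candidate.seeders, 1)


def pick_mirrors(candidates, budget_bytes):
    """Greedily mirror the neediest videos that still fit in the budget."""
    chosen = []
    remaining = budget_bytes
    for candidate in sorted(candidates, key=need_score, reverse=True):
        if candidate.size_bytes <= remaining:
            chosen.append(candidate.video_id)
            remaining -= candidate.size_bytes
    return chosen


candidates = [
    Candidate("a", views=100, minutes=10, seeders=10, size_bytes=50),  # score 100
    Candidate("b", views=50, minutes=60, seeders=1, size_bytes=80),    # score 3000
    Candidate("c", views=10, minutes=5, seeders=0, size_bytes=30),     # score 50
]
mirrored = pick_mirrors(candidates, budget_bytes=120)
```

Re-running `pick_mirrors` periodically with fresh numbers would cover the second enhancement, since newly published videos enter the ranking on the next pass.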

I agree with "After that, just look at each existing video the instance knows of in the fediverse, and start mirroring as many videos as fit in that space", but I don't think we should mirror videos depending on the number of seeders because:

  • Some people cannot use P2P, we need to think about them too
  • Even if a video has many seeders, we don't really know how much upload bandwidth the swarm has

Instead, I think we should just have a scheduler that checks every x hour(s) for videos to cache, i.e. videos having score > threshold in the last y hours (score could be views/num_instances, where num_instances is the number of instances caching this particular video). If the cache is full, just evict the videos with the smallest score.
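
A single scheduler pass along these lines could look like the sketch below. It is an illustration, not the eventual implementation: the cache capacity is counted in videos rather than bytes, `num_instances` of zero is treated as one, and all names and numbers are hypothetical.

```python
def scheduler_pass(cached_ids, stats, threshold, max_videos):
    """Decide what to fetch and what to evict in one pass.

    cached_ids: ids of videos currently cached by this instance
    stats: video_id -> (views in the last y hours, num_instances caching it)
    Returns (to_fetch, to_evict) as sorted lists of video ids.
    """
    scores = {
        video_id: views / max(num_instances, 1)
        for video_id, (views, num_instances) in stats.items()
    }
    wanted = {video_id for video_id, score in scores.items() if score > threshold}

    # Everything we hold or want competes for the limited slots;
    # the smallest scores fall off the end when the cache is full.
    pool = set(cached_ids) | wanted
    ranked = sorted(pool, key=lambda vid: (-scores.get(vid, 0.0), vid))
    keep = set(ranked[:max_videos])

    to_fetch = sorted(keep - set(cached_ids))
    to_evict = sorted(set(cached_ids) - keep)
    return to_fetch, to_evict


stats = {
    "a": (10, 1),    # score 10
    "b": (100, 2),   # score 50
    "c": (300, 1),   # score 300
    "d": (5, 1),     # score 5, below the threshold
}
to_fetch, to_evict = scheduler_pass({"a", "b"}, stats, threshold=20, max_videos=2)
```

Note that a cached video below the threshold is only dropped when something with a higher score needs its slot, which matches "if the cache is full, just evict".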

If an instance decides to cache a video, we need to send the information to the fediverse so that a video is not cached by 100 instances. This information should be propagated on a regular basis, as a way of saying "I keep caching this video".

When an instance caches a video:

  • It sends the information to the origin instance
  • Origin instance forwards the info to its followers
  • Origin instance updates the torrent file to add the webseed URL of the new instance
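
The last step can be pictured on the decoded torrent metadata. BEP 19 puts web seed URLs under a top-level `url-list` key, which may hold a single string or a list of strings. The sketch below assumes the torrent file has already been decoded into a Python dict; the helper name and the URLs are hypothetical, and how PeerTube itself rewrites its torrent files is not shown here.

```python
def add_webseed(metainfo, url):
    """Return a copy of the decoded metainfo with `url` added as a web seed."""
    updated = dict(metainfo)
    current = updated.get("url-list", [])
    if isinstance(current, str):      # BEP 19 allows a single string
        current = [current]
    current = list(current)
    if url not in current:            # adding the same mirror twice is a no-op
        current.append(url)
    updated["url-list"] = current
    return updated


origin = {
    "announce": "wss://origin.example/tracker/socket",
    "url-list": "https://origin.example/static/webseed/video.mp4",
    "info": {"name": "video.mp4", "length": 1024},
}
updated = add_webseed(origin, "https://mirror.example/static/webseed/video.mp4")
```

The `info` dictionary is left untouched, so clients that already have the torrent keep talking to the same swarm.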

Would there be a way to mirror only the first N bytes/seconds of a larger collection of videos?

(I'm trying to optimize performance and user experience on my PeerTube server a bit, and it seems like video startup time can be pretty bad if the client and server are geographically far away. This seems like it might help more videos start quickly, despite a long tail of content that needs to be available but won't be watched by a huge number of people.)

@Chocobozzz if we only want to cache part of a video (say the first 10%) so that we can cache more videos as suggested by @scanlime, there is a way to mirror only part of a video: the WebSeed endpoint just has to honor the Range request header and respond with 206 Partial Content.
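
To make the idea concrete, here is a sketch of how a mirror holding only the beginning of a file might answer a byte-range request. It is framework-free and only handles the simple `bytes=start-end` form; answering 416 for bytes the mirror does not hold is an assumed policy, not something taken from PeerTube or from the WebSeed specification.

```python
import re


def serve_range(range_header, held, total_size):
    """Answer a Range request from a mirror holding only a file's prefix.

    held: the first len(held) bytes of a file that is total_size bytes long.
    Returns (status, headers, body).
    """
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header.strip())
    if not match:
        return 400, {}, b""
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else total_size - 1
    end = min(end, total_size - 1)
    if start > end or end >= len(held):
        # Assumed policy: refuse ranges that reach past the mirrored prefix.
        return 416, {"Content-Range": f"bytes */{total_size}"}, b""
    body = held[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{total_size}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body


prefix = b"0123456789"   # first 10 bytes of a 100-byte file
status, headers, body = serve_range("bytes=2-5", prefix, total_size=100)
```

Whether a client copes well with a web seed that refuses part of the file would need testing before relying on this.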

@scanlime if you want to replicate videos already - outside the scope of the current development of PeerTube instances replicating each others' videos relying on WebSeed for the diffusion -, you can certainly select files to be downloaded as per the webtorrent API, but I couldn't find a way to select which parts of a file.

Just out of interest, despite the fact that I support using how much a video is watched as a factor in choosing which videos to cache, I am wondering how easy views would be to spoof... Say an instance is modified to say that every video uploaded by its owner instantly has 1,000,000 views, so it a) looks popular and b) gets automatically cached by a ton of legitimate servers and thus streams faster and more reliably at the expense of non-cheating uploaders...

Also, just using views will penalize long videos (which would need more help seeding) in favour of short videos (which can easily rack up a large number of views). While it's equally open to abuse as in my example above, it seems to me that time watched might be a better metric than views...

@Bugsbane as a third party trying to gauge videos in need of redundancy, you have to trust the origin server for its metrics anyway. But view counts are already difficult to trust as a good metric for the origin server. They are only accounted for by clients using the web interface, and can indeed easily be spoofed.

There is no silver bullet in that matter. But if you know of ways around or research papers (even partially answering the question), I'm interested.

> @Bugsbane as a third party trying to gauge videos in need of redundancy, you have to trust the origin server for its metrics anyway. But view counts are already difficult to trust as a good metric for the origin server. They are only accounted for by clients using the web interface, and can indeed easily be spoofed.

> There is no silver bullet in that matter. But if you know of ways around or research papers (even partially answering the question), I'm interested.

While views are a simple metric, I'm sure it's possible to track the torrent/DHT traffic, looking at the number of peers and seeders?

@Bugsbane In the PR (https://github.com/Chocobozzz/PeerTube/pull/1054), admins can specify multiple strategies. For now we only have most-views for the sake of simplicity, but we can imagine many others in the future (like you said: by video duration, etc.).

Most of the metrics we will use in the different strategies could be spoofed by a bad server, but it's the administrators that choose what server they want to cache. So they can easily remove bad ones.

Are you going to add some documentation on how this works? For example the different caching strategies available, what cache replacement policy is used, etc?

@Nutomic edges are still rough and we should still test it functionally before documenting at length, so please bear with us.

Okay no problem, just curious how it will work in detail.

I'm wondering if it might be possible to share the load with something that might be called a broadcast node. The idea is that if I want to provide bandwidth to a PeerTube instance, I use some API to tell it some maximum bandwidth, and total daily / hourly data limits. Then, when a video is requested, the PeerTube instance can simply push content directly to the broadcast node, which can take on the burden of seeding that content to clients up to the specified limits.

@MobiusHorizons You could probably set up a PeerTube instance and disable signups, then set it to cache other instances. Bandwidth limits can be set in Nginx.
