Notes: Comparison of IPFS and BitTorrent for Archives

Created on 30 Dec 2016  ·  9 Comments  ·  Source: ipfs/notes

For a project that's looking to store a lot of data redundantly and validate it (e.g. #ClimateMirror), what's the best way to explain the differences between IPFS and BitTorrent? What advantages and weaknesses should a project like that consider?

As a starting point, there's this bit on page 4 of the IPFS whitepaper:

Unlike BitTorrent, BitSwap is not limited to the blocks in one torrent. BitSwap operates as a persistent marketplace where node can acquire the blocks they need, regardless of what files those blocks are part of. The blocks could come from completely unrelated files in the filesystem. Nodes come together to barter in the marketplace

All 9 comments

The main distinction I'm aware of is the fact that BitTorrent relies on torrent files, each of which contains a content-addressed manifest of the blocks that make up particular content. This has some ramifications:

  • forces you to choose what is in each torrent file -- i.e., do you create one huge torrent file for all of your datasets or do you make a torrent file per dataset?
  • forces you to track the torrent files themselves with some other tool/system
  • requires you to create metadata _about_ the torrent files
  • does not natively provide a way to identify torrent files themselves using cryptographic hashes
  • does not handle versioning of content

By contrast, IPFS lets you build a DAG of arbitrary size and structure.

Some advantages that occur to me:

  • You can track both the content and the metadata in the IPFS DAG
  • You can add multiple versions of a dataset to IPFS. Each version gets a unique hash and IPFS does its best to avoid storing duplicate blocks
  • You have complete control over which blocks are stored on which IPFS node -- this has huge advantages for distributing storage/backup (see ipfs-cluster)

Oh, and you can reference contents/files within a dataset using merkle paths and link to them with merkle links.
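To make the DAG points above concrete, here is a minimal sketch of a content-addressed store with directory nodes, deduplication across versions, and path resolution. It is a toy under stated assumptions, not how IPFS is implemented: real IPFS uses multihash-based CIDs and its own node formats rather than bare SHA-256 hex digests and JSON, and the names `store`, `put`, `put_dir`, and `resolve` are made up for illustration. The file contents are placeholder data.

```python
import hashlib
import json

# Toy block store: hex digest -> raw bytes (stand-in for an IPFS node's blockstore).
store = {}

def put(data: bytes) -> str:
    """Store a block under the hash of its content and return that hash."""
    key = hashlib.sha256(data).hexdigest()
    store[key] = data  # identical content always lands on the same key
    return key

def put_dir(links: dict) -> str:
    """Store a directory node: a name -> child-hash map, serialised deterministically."""
    return put(json.dumps(links, sort_keys=True).encode())

def resolve(root: str, path: str) -> bytes:
    """Walk a path such as 'data.csv' down from a root hash, one link per segment."""
    node = root
    for name in path.strip("/").split("/"):
        links = json.loads(store[node])
        node = links[name]
    return store[node]

# Two versions of a dataset that share an unchanged README (placeholder contents).
readme = put(b"station metadata\n")
v1 = put_dir({"README": readme, "data.csv": put(b"2016,14.8\n")})
v2 = put_dir({"README": readme, "data.csv": put(b"2016,14.8\n2017,14.9\n")})

print(v1 != v2)               # each version gets its own root hash
print(len(store))             # the shared README block is stored only once
print(resolve(v2, "README"))  # fetch one file without touching data.csv
```

Each version has its own root hash, yet the unchanged README is stored once, and a single file can be fetched by root hash plus path without reading the rest of the dataset.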

For Climate Mirror, the big advantages include:

  • Being able to access files in folders without downloading an entire dataset (especially for the researchers who need to use this data)
  • IPNS. Need I say more? We can host an index of both IPFS hashes and normal mirrors, and update it frequently. Thus, we have a content discovery mechanism. https://ipfs.io/ipns/QmRsCTmkqL35LZ7uBGDoPnLtgJuyiEDDXjLaFYmMWsmTaM
  • No duplicate blocks is huge.

That's among several other advantages, but those are some key points I've found.

NOTE: That index is simply a sampling for an explorer I'm building. The real index will have IPFS datasets, etc.

@flyingzumwalt, @20zinnm I'd be interested to hear your thoughts on how IPFS compares to BitTorrent v2 — it seems to me the gap has gotten smaller.

The key distinguishing factor in my mind is the fact that IPFS allows you to use any hash, of any content or any subset of content, as an identifier. You can use that hash to ask the network who has that exact content. This makes the system much more flexible than BitTorrent, because you can precisely identify exactly the content you are providing or requesting, regardless of whether it's a huge set of files, a single file, a part of a file, or a single entry from some dataset. Contrast this with BitTorrent's reliance on torrent files, which bundle data together according to however that torrent file was originally structured by its creator.

As far as I can tell, BitTorrent v2 does not decrease this reliance on torrent files.
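As a rough illustration of "use that hash to ask the network who has that exact content", the sketch below keeps a toy provider index keyed by content hash. It assumes a plain in-memory dictionary in place of the real distributed lookup, and the names `providers`, `announce`, `find_providers`, and the peer labels are hypothetical.

```python
import hashlib
from collections import defaultdict

# Toy stand-in for the network's provider records: content hash -> peer names.
providers = defaultdict(set)

def content_id(data: bytes) -> str:
    """Identify content purely by the hash of its bytes."""
    return hashlib.sha256(data).hexdigest()

def announce(peer: str, data: bytes) -> str:
    """Record that a peer can serve this exact content; return its identifier."""
    cid = content_id(data)
    providers[cid].add(peer)
    return cid

def find_providers(cid: str) -> set:
    """Ask the index who has the content behind this hash."""
    return set(providers.get(cid, set()))

dataset = b"row1\nrow2\nrow3\n"
row = b"row2\n"  # a single entry taken from the dataset

whole = announce("peer-a", dataset)   # peer-a offers the full dataset
single = announce("peer-a", row)      # ...and the single entry on its own
announce("peer-b", row)               # peer-b only holds the single entry

print(find_providers(whole))
print(find_providers(single))
```

The same lookup works whether the hash names a whole dataset or one entry from it, which is the flexibility described above. A torrent-style manifest would instead fix the grouping at creation time.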

or a single entry from some dataset

@flyingzumwalt This seems like a big advantage over torrents. Can you point to a reference on how to do that in IPFS? Thanks!

@DougAnderson444 This is a consequence of the merkle tree structure of IPFS. BitTorrent breaks up a folder of files into equally sized blocks that can cut across file boundaries, whereas IPFS treats each file as its own unit (and sometimes as a collection of units). A "folder" in IPFS is a merkle node that contains links to other nodes, which IPFS then retrieves by walking down the tree using content-addressed data.

For people more familiar with BitTorrent, one can think of IPFS as a single large swarm, where each folder is a link to another torrent (within the same swarm).
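The difference between fixed-size pieces that cut across files and per-file addressing can be sketched as follows. This assumes a BitTorrent v1-style layout (files concatenated, then hashed in fixed-length pieces with SHA-1) and a simplified per-file hash in place of IPFS's actual chunking. The function names and the tiny piece length are illustrative only.

```python
import hashlib

def piece_hashes(files, piece_len):
    """v1-style layout: concatenate the files, then hash fixed-size pieces."""
    stream = b"".join(files)
    return [
        hashlib.sha1(stream[i:i + piece_len]).hexdigest()
        for i in range(0, len(stream), piece_len)
    ]

def per_file_hashes(files):
    """Simplified per-file addressing: each file is hashed as its own unit."""
    return [hashlib.sha256(f).hexdigest() for f in files]

a, b, c = b"AAAAAA", b"BBBB", b"CDEFGH"

# Append one byte to the middle file and compare both schemes.
pieces_before = piece_hashes([a, b, c], piece_len=4)
pieces_after = piece_hashes([a, b + b"!", c], piece_len=4)
files_before = per_file_hashes([a, b, c])
files_after = per_file_hashes([a, b + b"!", c])

# Pieces after the edit are shifted, so the untouched last file no longer
# matches any old piece hash; its per-file hash is unchanged.
print(len(set(pieces_before) & set(pieces_after)))
print(files_before[2] == files_after[2])
```

In this toy, the last file is byte-for-byte identical in both versions, but only the per-file scheme still identifies it by the same hash.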

What about the content immutability?

This is a consequence of the merkle tree structure of IPFS. BitTorrent breaks up a folder of files into equally sized blocks that can cut across file boundaries, whereas IPFS treats each file as its own unit (and sometimes as a collection of units).

BitTorrent can do per-file blocks by adding padding blocks that are then ignored by the clients:

https://en.wikipedia.org/wiki/BitComet#Padding_files
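The padding idea can be sketched as a layout calculation: after each file, insert a filler entry that is just long enough for the next file to start on a piece boundary. This is a hypothetical helper meant to show the arithmetic, not the exact format any particular client writes. The name `aligned_layout` and the tuple shape are assumptions.

```python
def aligned_layout(file_lens, piece_len):
    """Return (kind, offset, length) entries so every file starts on a piece boundary."""
    layout, offset = [], 0
    for i, n in enumerate(file_lens):
        layout.append(("file", offset, n))
        offset += n
        # Bytes needed to reach the next multiple of piece_len (0 if already aligned).
        pad = (-offset) % piece_len
        if pad and i < len(file_lens) - 1:  # no padding needed after the last file
            layout.append(("pad", offset, pad))
            offset += pad
    return layout

layout = aligned_layout([6, 4, 6], piece_len=4)
for entry in layout:
    print(entry)
```

With files aligned this way, no piece spans two real files, so a piece hash covers data from one file only (plus any ignored padding).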
