Js-ipfs: 🌟 adding Torrent support to IPFS

Created on 8 Mar 2017 · 17 Comments · Source: ipfs/js-ipfs

I've started working on enabling Torrent support for js-ipfs, very much in the same way that we have support for: dag-pb, dag-cbor, eth-blocks, eth-tx, zcash (go-ipfs only), git (go-ipfs only) and bitcoin (go-ipfs only).

The end goal is to expose two top level commands to add and retrieve files that are Torrents, from the IPFS or BitTorrent network (through a bridge and in the future, by connecting directly). The commands being:

  • jsipfs torrent add
  • jsipfs torrent cat

However, I stumbled upon a question that requires a decision, and I would like to get feedback before going at full speed. In BitTorrent, torrent files are not referenced by a cryptographic hash because of their ephemeral and mutable nature (in fact, per the spec, decoding and re-encoding is not even always idempotent). The only thing that has a cryptographic identifier is the info field in the torrent file.
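To make the point about the info field concrete, here is a minimal sketch of how a v1 infoHash is derived: it is the SHA-1 of the bencoded info dictionary, so the rest of the torrent file can change without affecting it. The `bencode` and `infoHash` helpers and the example torrent values are hypothetical, written only for this illustration; a real implementation would more likely use an existing bencode library.

```javascript
const crypto = require('crypto')

// Minimal bencode encoder (integers, byte strings, lists, dictionaries).
// Illustrative only: not a complete or hardened implementation.
function bencode (value) {
  if (Buffer.isBuffer(value)) {
    return Buffer.concat([Buffer.from(`${value.length}:`), value])
  }
  if (typeof value === 'string') {
    return bencode(Buffer.from(value, 'utf8'))
  }
  if (Number.isInteger(value)) {
    return Buffer.from(`i${value}e`)
  }
  if (Array.isArray(value)) {
    return Buffer.concat([Buffer.from('l'), ...value.map(bencode), Buffer.from('e')])
  }
  if (value !== null && typeof value === 'object') {
    // Dictionary keys are sorted as raw byte strings.
    const keys = Object.keys(value).sort((a, b) =>
      Buffer.compare(Buffer.from(a), Buffer.from(b)))
    const parts = keys.map((k) => Buffer.concat([bencode(k), bencode(value[k])]))
    return Buffer.concat([Buffer.from('d'), ...parts, Buffer.from('e')])
  }
  throw new TypeError('unsupported bencode value')
}

// The v1 infoHash is the SHA-1 of the bencoded info dictionary.
function infoHash (info) {
  return crypto.createHash('sha1').update(bencode(info)).digest('hex')
}

// Hypothetical torrent contents, made up for this example.
const info = {
  name: 'example.txt',
  length: 11,
  'piece length': 16384,
  pieces: crypto.createHash('sha1').update('hello world').digest()
}
const torrentA = { announce: 'http://tracker-a.example/announce', info }
const torrentB = { announce: 'http://tracker-b.example/announce', info }

// The two torrent files differ as a whole, but share one infoHash.
console.log(bencode(torrentA).equals(bencode(torrentB))) // false
console.log(infoHash(torrentA.info) === infoHash(torrentB.info)) // true
```

This is why the torrent file as a whole makes a poor content-addressed object, while the info dictionary is a natural one.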

I started implementing the IPLD format for a Torrent file, but I'm guessing that most people will want to fetch their torrent through the infoHash that they get from something like a magnet URI. The crux is that there is never a standalone file for the info field: as soon as an infoHash query is performed, a Torrent file is retrieved, raising the question:

Should dag.get(<infoHash>/somePath) resolve through the retrieved Torrent file or only over the info field?

  • Resolve through the Torrent file - This is weird to the IPLD resolver, as it would be resolving an immutable pointer to something that has more fields
  • Resolve only within the info field - This would force us to make the info field a full standalone object that can be transferred independently (the solution I'm leaning towards). This option would result in two multicodecs for Torrents, torrent-file and torrent-info.

Thoughts? //cc @jbenet @whyrusleeping @nicola


All 17 comments

Resolve only within the info field - This would force us to make the info field a full standalone object that can be transferred independently (the solution I'm leaning towards). This option would result in two multicodecs for Torrents, torrent-file and torrent-info.

This sounds like the pragmatic way for me too -- we'll likely get a better idea of what to do with the whole torrent in the process of working on this.

Given that the torrent file itself is not already content-addressed, it's also the "correct" way I think. Magnet URIs address the info hash anyway.

{
  "infoHash": "d2474e86c95b19b8bcfdb92bc12c9d44667cfa36",
  "infoHashBuffer": {"/": "$infoHashAsCID"},
  "name": "Leaves of Grass by Walt Whitman.epub"
}
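Since magnet URIs address the info hash, pulling it out of a link is straightforward. A minimal sketch, assuming only the 40-character hex form of `urn:btih:` (magnet links may also carry the hash base32-encoded, which this does not handle); the function name is hypothetical:

```javascript
// Extract the v1 infoHash (lowercase hex) from a magnet URI.
// Returns null when no hex-encoded urn:btih: topic is present.
function infoHashFromMagnet (magnetUri) {
  const url = new URL(magnetUri)
  if (url.protocol !== 'magnet:') {
    throw new Error('not a magnet URI')
  }
  // A magnet link may carry several xt ("exact topic") parameters.
  for (const xt of url.searchParams.getAll('xt')) {
    const match = /^urn:btih:([0-9a-fA-F]{40})$/.exec(xt)
    if (match) return match[1].toLowerCase()
  }
  return null
}

console.log(infoHashFromMagnet(
  'magnet:?xt=urn:btih:d2474e86c95b19b8bcfdb92bc12c9d44667cfa36&dn=example'
))
```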

@lgierth I only received your comments after I posted; it seems that we had this chat while you were writing those :)

Notes from a chat with @jbenet and @whyrusleeping

  • Instead of having ipld-torrent-file and ipld-torrent-info, we just need ipld-bencode to be able to resolve through bencode encoded objects. After rethinking this, I remembered why we need ipld-torrent-file and ipld-torrent-info: they enable the resolver to resolve through paths inside these objects (i.e. dag.get(infoHash/pieces/0) should return the piece and not the string that is the sha1 of the piece).
  • When importing a Torrent, two objects need to be created, one for the info and one for the Torrent file itself.
  • New command added: torrent import <torrent-file, magnetic-uri, infohash>
  • jsipfs torrent will be available through a module called ipfs-torrent that exposes both a CLI and a module (like ipfs-unixfs-engine).

This leads to the following steps

1. Implement the IPLD Formats to support torrents

  • [ ] ipld-bencode
  • [ ] ipld-torrent-file (see https://github.com/ipfs/js-ipfs/issues/779#issuecomment-285126839)

    • [ ] make sure to canonicalize them (see https://github.com/ipfs/js-ipfs/issues/779#issuecomment-285126839)

  • [ ] ipld-torrent-info (see https://github.com/ipfs/js-ipfs/issues/779#issuecomment-285126839)
  • [ ] integrate in ipld-resolver
  • [ ] integrate in js-ipfs

2. Implement a blockstore that uses webtorrent as its storage driver

  • [ ] torrent-pull-blob-store
  • [ ] confirm that we can dag.get(<torrentHash or infoHash>) and traverse through those objects

3. Implement the ipfs-torrent service (like ipfs-unixfs-engine)

  • module

    • [ ] .import (adds the torrent file and creates an infoHash object too)

      • [ ] import by magnet URI

      • [ ] import by infoHash

    • [ ] .add

      • [ ] single files support

      • [ ] directories support

    • [ ] .cat (single files)

    • [ ] .get

  • cli

    • [ ] spawn a js-ipfs daemon or connect to a remoteDaemon

\o/

@diasdavid maybe hold off on the torrent blob store until the datastore refactor?

@dignifiedquire I see the value, but won't block Torrent support because of the datastore refactor, it is not a dependency.

For the record, here is the real structure of both the Torrent file and the info field - https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure

Bringing this one back (🎩)

Instead of having ipld-torrent-file and ipld-torrent-info, we just need ipld-bencode to be able to resolve through bencode encoded objects. After rethinking this, I remembered why we need ipld-torrent-file and ipld-torrent-info, we need them to enable the resolver to resolve through paths inside these objects (i.e. dag.get(infoHash/pieces/0) should return the piece and not the string that is the sha1 of the piece).

It turns out that we might actually just need to do the bencode, because the format, as described in -- https://wiki.theory.org/BitTorrentSpecification#Metainfo_File_Structure -- prescribes that the SHA1 hashes of the pieces be all concatenated, which means that there won't be any <infohash>/info/pieces/<insert piece number>, unless we apply a transformation to the bencoded data in the first place.

This means that we won't be able to use IPLD resolver to traverse through, without transforming the data, as that pieces field will just be a very long byte array value.

It's pretty ironic, but we can exploit the fact that it prescribes SHA1 and split every 40 bytes.

20 bytes*. @lgierth we can indeed, but that falls into the 'Transformations' category. As far as IPLD-compatible formats go, we are strict about not messing with the data.
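For reference, the split being discussed is small. A minimal sketch, assuming `pieces` is the raw byte string from the info dictionary held in a Node.js Buffer; `splitPieces` is a hypothetical helper name:

```javascript
const SHA1_LENGTH = 20 // a SHA-1 digest is 20 bytes (40 hex characters)

// Split the concatenated `pieces` byte string into one Buffer per piece hash.
// The returned Buffers are views over the original bytes, not copies.
function splitPieces (pieces) {
  if (pieces.length % SHA1_LENGTH !== 0) {
    throw new Error('pieces length is not a multiple of 20 bytes')
  }
  const hashes = []
  for (let offset = 0; offset < pieces.length; offset += SHA1_LENGTH) {
    hashes.push(pieces.subarray(offset, offset + SHA1_LENGTH))
  }
  return hashes
}
```

Because the slices are views, the serialized data is never rewritten, which is the crux of the transformation-versus-resolver question in this thread.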

@diasdavid I don't think splitting on 20 bytes for each piece id is any different than biting off the first N bytes for the first parameter of any binary serialization

i would say it's not a transformation if the serialization doesn't need to change

our thinking seems to diverge here, based on previous discussions around ethereum resolvers

@kumavis agreed that there might be space to be a little less strict with the separation of local resolver vs transformation. Note: I intuitively did the same as you with dag-pb https://github.com/ipld/js-ipld-dag-pb/blob/master/src/resolver.js#L44-L47 .

I'll be with @nicola next week and revisit this question for IPLD transformations. Let's continue this thread on the IPLD repo https://github.com/ipld/ipld/issues/13.

I think ipld/ipld#13 is slightly more complicated (pre-process with hash, split into halfbytes).

splitting the concatenated SHA1 refs still falls under (consume path part, return result) which is no more of a transformation than any IPFS resolver performs.
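A sketch of what "consume path part, return result" could look like for the pieces field. The `{ value, remainderPath }` return shape is an assumption loosely modeled on how IPLD format resolvers reported results at the time, and `resolve` here is a hypothetical standalone function, not the ipld-resolver API:

```javascript
const SHA1_LENGTH = 20

// Hypothetical local resolver over a decoded info dictionary.
// Consumes "pieces" or "pieces/<n>" from the path and returns the value
// together with whatever part of the path was left unconsumed.
function resolve (info, path) {
  const parts = path.split('/').filter(Boolean)
  if (parts[0] !== 'pieces') {
    throw new Error(`path not available: ${path}`)
  }
  if (parts.length === 1) {
    return { value: info.pieces, remainderPath: '' }
  }
  const index = Number(parts[1])
  const count = info.pieces.length / SHA1_LENGTH
  if (!Number.isInteger(index) || index < 0 || index >= count) {
    throw new RangeError(`no such piece: ${parts[1]}`)
  }
  const start = index * SHA1_LENGTH
  return {
    // A view into the original bytes: nothing is re-serialized.
    value: info.pieces.subarray(start, start + SHA1_LENGTH),
    remainderPath: parts.slice(2).join('/')
  }
}
```

Whether the returned value should be the 20-byte hash or a link to the piece data is exactly the open design question above; this sketch returns the hash.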

I wanted to note the release of The BitTorrent Protocol Specification v2. I don't expect it to be fully supported soon, but it's probably worth being aware of it when designing v1 support. My understanding may not be entirely correct, but here are the key points as I understand them:

v2 torrents use different structures than v1 in the info dictionary and metainfo .torrent files. v2 torrents are identified using SHA-2-256 hash of the info dictionary, truncated to 20 bytes to match the length of v1's SHA-1 hashes. It's possible to create hybrid torrents that contain both v1 and v2 structures, and can be identified by either hash.

Because a different hash function is used, v1 and v2 torrents' IPFS paths can be distinguished (because the hash function is included in their multihash):

/ipfs/f 017b 11 14123456fc77d23aca05a8b58066bb55fe06c72f8e - SHA-1, v1
/ipfs/f 017b 12 14cd5877ccec0ebc8c231ecc70265ce239a90bdb9e - truncated SHA-2-256, v2

EDIT: the following is wrong, see my next comment.

BitTorrent magnet links do not have this information; v1 and v2 magnet links cannot be distinguished. I think you need to connect to the torrent swarm and download the metadata before you can check which version and hash algorithm were used.

So it may not be strictly possible to map BitTorrent magnet URLs (e.g. ipfs/ipfs-companion#256) to a specific IPFS path, because the hash algorithm will not be known.

ping @arvidn Maybe you know if magnet: links uniquely identify content, or if it needs network discovery, and if this is considered a feature or bug for v2?

What I wrote above is wrong! I apologize for the misinformation. >_<

The updated BEP-9 does in fact use a multihash under a different key to identify a v2 torrent's data. I thought that this had been cut before the final version. (The idea of using multihash elsewhere in the protocol was cut; I didn't realize it remained here.) So I think the direct mapping is like:

SHA-1, v1
/ipfs/f017b1114123456fc77d23aca05a8b58066bb55fe06c72f8e
magnet:?xt=urn:btih:123456fc77d23aca05a8b58066bb55fe06c72f8e

truncated SHA-2-256, v2
/ipfs/f017b1214cd5877ccec0ebc8c231ecc70265ce239a90bdb9e
magnet:?xt=urn:btmh:1214123456fc77d23aca05a8b58066bb55fe06c72f8e

Hybrid torrents still have two possible addresses, but that shouldn't be a problem.
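The mapping above can be sketched as plain string assembly. The prefix values (`f` for base16 multibase, `01` for CIDv1, `7b` for the codec, `11`/`12` for SHA-1/SHA-2-256) are taken from the example paths in this thread; the function names are hypothetical, and the single-byte length prefix assumes digests shorter than 128 bytes:

```javascript
const SHA1 = '11'
const SHA2_256 = '12'

// Build a hex multihash: <hash function code><digest length><digest>.
function toMultihashHex (fnCode, digestHex) {
  if (!/^([0-9a-f]{2})+$/.test(digestHex)) {
    throw new Error('digest must be lowercase hex with an even length')
  }
  const length = (digestHex.length / 2).toString(16).padStart(2, '0')
  return fnCode + length + digestHex
}

// /ipfs/ + multibase "f" + CID version "01" + codec "7b" + multihash.
function toIpfsPath (fnCode, digestHex) {
  return '/ipfs/f' + '01' + '7b' + toMultihashHex(fnCode, digestHex)
}

// v1 magnets carry the bare SHA-1 digest; v2 magnets carry a multihash.
function toMagnet (fnCode, digestHex) {
  return fnCode === SHA1
    ? `magnet:?xt=urn:btih:${digestHex}`
    : `magnet:?xt=urn:btmh:${toMultihashHex(fnCode, digestHex)}`
}

console.log(toIpfsPath(SHA1, '123456fc77d23aca05a8b58066bb55fe06c72f8e'))
console.log(toMagnet(SHA1, '123456fc77d23aca05a8b58066bb55fe06c72f8e'))
```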

yeah, the hash in the magnet link definitely identifies the content. However, it also identifies some other metadata such as piece size, file names, etc. So even with bittorrent v1, it's possible to have two separate magnet links refer to exactly identical content (but with different piece sizes for instance).
