Notes: Converting IPFS hash format to SHA-256

Created on 14 Oct 2017 · 4 Comments · Source: ipfs/notes

I was looking into IPFS's design and noticed that it doesn't use SHA-256 as the final hash, but rather "Qm"+stuff, which appears to be a base58 encoding of a Merkle DAG node's hash (with a SHA-256 hash in there somewhere).

Anyways, I was wondering whether it is possible, knowing only the IPFS hash and not the file, to still obtain the file's SHA-256 hash by base58-decoding and then reversing whatever operations IPFS performs. I know this probably wouldn't work when the IPFS hash is that of a directory, but maybe it would work for the IPFS hash of a specific file?

Crypt-iQ

Most helpful comment

IPFS uses multihash to address data; the Qm.. prefix is in fact the multihash prefix of a SHA-256 hash (the bytes 0x12 0x20, meaning sha2-256 with a 32-byte digest, once base58-encoded).
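To make the multihash layout concrete, here is a minimal Python sketch that base58-decodes a Qm.. hash and peels off the two prefix bytes. The helper names (`b58decode`, `b58encode`, `digest_from_qm_hash`, `qm_hash_from_digest`) are made up for this illustration and are not part of any IPFS library. Note that the digest it returns is the SHA-256 of the block IPFS stored, which is not necessarily the SHA-256 of the original file.

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I, l), as used for Qm.. hashes.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58 string into bytes."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' character stands for one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def b58encode(data: bytes) -> str:
    """Encode bytes as a base58 string."""
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def digest_from_qm_hash(qm_hash: str) -> bytes:
    """Return the 32-byte SHA-256 digest embedded in a Qm.. multihash."""
    raw = b58decode(qm_hash)
    # 0x12 = sha2-256 function code, 0x20 = digest length (32 bytes).
    if len(raw) != 34 or raw[0] != 0x12 or raw[1] != 0x20:
        raise ValueError("not a sha2-256 multihash")
    return raw[2:]


def qm_hash_from_digest(digest: bytes) -> str:
    """Wrap a 32-byte SHA-256 digest as a base58 multihash string."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte digest")
    return b58encode(bytes([0x12, 0x20]) + digest)


if __name__ == "__main__":
    # Stand-in bytes; a real IPFS block would be a serialized DAG node.
    block = b"stand-in for a serialized block"
    digest = hashlib.sha256(block).digest()
    qm = qm_hash_from_digest(digest)
    print(qm)
    print(digest_from_qm_hash(qm).hex())
```

Because the first two bytes are always 0x12 0x20, every such hash comes out 46 characters long and starts with "Qm", which is why all of these hashes look alike at the front.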

When you add a file to IPFS, it gets chunked and wrapped in unixfs protobufs (https://github.com/ipfs/go-ipfs/blob/master/unixfs/pb/unixfs.pb.go), so the resulting hash won't be the direct SHA-256 of the file you added.

There is a --raw-leaves option for ipfs add. Adding a file smaller than 256k with it yields a multibase-base58 CIDv1, which contains the raw SHA-256 of the file.
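As a rough sketch of the --raw-leaves case, the Python below builds and parses a base58 CIDv1 for a single raw block, assuming the layout "z" (base58btc multibase prefix) + 0x01 (CID version) + 0x55 (raw codec) + 0x12 0x20 (sha2-256, 32 bytes) + digest. The function names are hypothetical, all four prefix values are treated as single-byte varints (true for these particular codes), and the file is assumed to fit in one chunk. Newer IPFS versions may print CIDv1 in base32 instead, which this sketch does not handle.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# CIDv1, raw codec, sha2-256, 32-byte digest (each code fits in one byte).
RAW_SHA256_PREFIX = bytes([0x01, 0x55, 0x12, 0x20])


def b58encode(data: bytes) -> str:
    """Encode bytes as a base58 string."""
    num = int.from_bytes(data, "big")
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def b58decode(text: str) -> bytes:
    """Decode a base58 string into bytes."""
    num = 0
    for ch in text:
        num = num * 58 + B58_ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def raw_leaf_cidv1(content: bytes) -> str:
    """Build the base58 CIDv1 a single raw block with this content would get."""
    digest = hashlib.sha256(content).digest()
    return "z" + b58encode(RAW_SHA256_PREFIX + digest)


def sha256_from_raw_cidv1(cid: str) -> bytes:
    """Extract the SHA-256 digest from a base58 CIDv1 with the raw codec."""
    if not cid.startswith("z"):
        raise ValueError("expected the base58btc multibase prefix 'z'")
    raw = b58decode(cid[1:])
    if len(raw) != 36 or raw[:4] != RAW_SHA256_PREFIX:
        raise ValueError("not a CIDv1 / raw / sha2-256 identifier")
    return raw[4:]


if __name__ == "__main__":
    content = b"a small file that fits in one chunk\n"
    cid = raw_leaf_cidv1(content)
    # For a raw leaf, the embedded digest is the file's own SHA-256.
    print(cid)
    print(sha256_from_raw_cidv1(cid).hex() == hashlib.sha256(content).hexdigest())
```

The check only holds while the whole file is one raw block; once a file is split into several chunks, the root hash covers a node that links to the chunks rather than the file bytes themselves.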

So no, by default it's not possible to derive the file's SHA-256 from the hashes that add returns; --raw-leaves makes that possible for smaller files.

magik6k on 14 Oct 2017

👍3

All 4 comments

Thanks for the quick reply. Answered my question so I'll close this issue!

Crypt-iQ on 15 Oct 2017

One thing we've been hoping for is that the IPLD node produced by files.add would start including the multihash of the content as a field, so we could check that what we got back is what was intended.

Apart from providing a confirmatory check that we got back exactly the expected file, it would also allow for better tracking of IPFS bugs such as the one in https://github.com/ipfs/js-ipfs/issues/1049.

mitra42 on 13 Nov 2017

Unfortunately, that would double the amount of hashing we'd have to do, and hashing is already one of our more expensive operations. At the end of the day, it's IPFS's job to verify the file's hash and give you the correct file.