Nix: Nix and IPFS

Created on 24 Mar 2016 · 159Comments · Source: NixOS/nix

(I wanted to split this thread from https://github.com/NixOS/nix/issues/296#issuecomment-200603550 .)

Let's discuss relations with IPFS here. As I see it, mainly a decentralized way to distribute nix-stored data would be appreciated.

What we might start with

The easiest usable step might be to allow distribution of fixed-output derivations over IPFS. That are paths that already _are_ content-addressed, typically by (truncated) sha256 over either a flat file or a tar-like dump of a directory tree; more details are in the docs. These paths are mainly used for compressed tarballs of sources. This step itself should avoid lots of problems with unstable upstream downloads, assuming we could convince enough nixers to serve their files over IPFS.

Converting hashes

One of the difficulties is that we use different kinds of hashing than in IPFS, and I don't think it would be good to require converting those many thousands of hashes in our expressions. (Note that it's infeasible to convert among those hashes unless you have the whole content.) IPFS people might best suggest how to work around this. I imagine we want to "serve" a mapping from the hashes we use to the IPFS's hashes, perhaps realized through IPNS. (I don't know details of IPFS's design, I'm afraid.) There's an advantage that one can easily verify the nix-style hash in the end after obtaining the paths in any way.

Non-fixed content

If we get that far, it shouldn't be too hard to manage distributing everything via IPFS, as for all other derivations we use something we could call _indirect_ content addressing. To explain that, let's look at how we distribute binaries now – our binary caches. We hash the build recipe, including all its recipe dependencies, and we inspect the corresponding narinfo URL on cache.nixos.org. If our build farm has built that recipe, various information is in that file, mainly the hashes of the _content_ of the resulting outputs of that build and crypto-signatures of them.

Note that this narinfo step just converts our problem to the previous fixed-output case, and the conversion itself seems _very_ reminiscent of IPNS.

Deduplication

Note that nix-built stuff has significantly greater than usual potential for chunk-level deduplication. Very often we do a rebuild of a package only because something in a dependency has changed, so there are only very minor changes expected in the results, mainly just exchanging the references to runtime dependencies as their paths have changed. (In seldom occasions even lengths of the paths would change.) There's a great potential to save on that during distribution of binaries, which would be utilized by implementing the section above, and even potential in saving disk space in comparison to our way of hardlinking equal files (the next paragraph).

Saving disk space

Another use might be to actually store the files in a FS similar to what IPFS uses. That seems a little more complex and tricky thing to deploy, e.g. I'm not sure someone already trusts the implementation of the FS enough to have the whole OS running of it.

It's probably premature to speculate too much on this use ATM; I'll just write I can imagine having symlinks from /nix/store/foo to /ipfs/*, representing the locally trusted version of that path. (That's working around the problems related to making /nix/store/foo content-addressed.) Perhaps it could start as a per-path opt-in, so one could move only the less vital paths out of /nix/store itself.

I can help personally with bridging the two communities in my spare time. Not too long ago, I spent many months on researching various ways to handle "highly redundant" data, mainly from the point of view of theoretical computer science.

Source

vcunat

👍72 ❤32 🎉16

Most helpful comment

The IPFS team apparently made package managers their top priority for 2019 :tada:

timokau on 6 Apr 2019

🎉23

All 159 comments

I'm curious what the minimalist way to associate store paths to IPFS objects while interfering as little as possible with IPFS-unaware tools would be.

ehmry on 24 Mar 2016

I described such a way in the second paragraph from bottom. It should work with IPFS and nix store as they are, perhaps with some script that would move the data, create the symlink and pin the path in IPFS to avoid losing it during GC. (It could be unpinned when nix deletes the symlink during GC.)

vcunat on 24 Mar 2016

I was thinking about avoiding storing store objects in something that wouldn't require a daemon, but of course you can't have everything.

ehmry on 24 Mar 2016

@vcunat Great write up! More thoughts on this later, but one thing that gets me is the tension between wanting incremental goals, and avoiding work we don't need long term. For example it will take some heroics to use our current hashing schemes, but for things like dedup and the intensional store we'd want to switch to what IPFS already does (or much closer to that) anyways.

Maybe the best first step is a new non-flat/non-NAR hashing strategy for fixed output derivations? We can slowly convert nixpkgs to use that, and get IPFS mirroring and dedup in the fixed-output case. Another step is using git tree hashes for fetch git. We already want to do that, and I suspect IPFS would want that too for other users. IPFS's multihash can certainly be heavily abused for such a thing :).

Ericson2314 on 24 Mar 2016

For me the end goal should be only using IPNS for the derivation -> build map. Any trust-based compatibility map between hashing schemes long term makes the perfectionist in me sad :).

Ericson2314 on 24 Mar 2016

👍2

For example it will take some heroics to use our current hashing schemes, but for things like dedup and the intensional store we'd want to switch to what IPFS already does (or much closer to that) anyways.

I meant that we would "use" some IPFS hashes but also utilize a mapping from our current hashes, perhaps run over IPNS, so that it would still be possible to run our fetchurl { sha256 = "..." } without modification. Note that it's these flat tarball hashes that most upstreams release and sign, and that's not going to change anytime soon, moreover there's not much point in trying to deduplicate _compressed_ tarballs anyway. (We might choose to use uncompressed sources instead, but that's just another partially independent decision I'm not sure about.)

vcunat on 24 Mar 2016

For single files / IPFS blobs, we should be able to hash the same way without modification.

Ericson2314 on 24 Mar 2016

But for VCS fetches we currently do a recursive/nar hash right? That is what I was worried about.

Ericson2314 on 24 Mar 2016

@ehmry I assume it would be pretty easy to make the nix store an immutable FUSE filesystem backed by IPFS (hopefully such a thing exists already). Down the road I'd like to have package references and the other things currently in the SQLite database also backed by IPFS: they would "appear" in the fuse filesystem as specially-named symlinks/hard-links/duplicated sub-directories. "referees" is the only field I'm aware of that'd be a cache on top. Nix would keep track of roots, but IPFS would do GC itself, in the obvious way.

Ericson2314 on 24 Mar 2016

one idea i had, was to keep all outputs in NAR format, and have the fuse layer dynamically unpack things on-demand, that can then be used with some other planned IPFS features to share a file without copying it into the block storage

then you get a compressed store and don't have to store 2 copies of everything (the nar for sharing and the installed)

cleverca22 on 8 Apr 2016

@cleverca22 yeah, I had same thoughts about that, its unclear how hard this would impact performance though

nmikhailov on 8 Apr 2016

could keep a cache of recently used files in a normal tmpfs, and relay things over to that to boost performance back up

cleverca22 on 8 Apr 2016

@cleverca22 another idea that was mentioned previously was to add support for NAR to ipfs, so that we can transparently unpack it as we do with TAR currently (ipfs tar --help)

davidar on 8 Apr 2016

NAR sucks though---no file-level dedup we could otherwise get for free. The above might be fine as a temporary step, but Nix should learn about a better format.

Ericson2314 on 8 Apr 2016

👍1

@Ericson2314 another option that was mentioned was for Nix and IPFS (and perhaps others) to try to standardise on a common archive format

https://github.com/ipfs/archive-format

davidar on 9 Apr 2016

@davidar Sure that's always good. For the shortish term, I was leaning towards a stripped down unixfs with just the attributes NAR cares about. As far as Nix is concerned this is basically the same format but with a different hashing scheme.

Ericson2314 on 9 Apr 2016

Yeah looking at Car, it's seems to be both an "IPFS Schema" over the IPFS Merkel DAG (Unless it just reuses unixfs), and then an interchange format for packing the dag into one binary blob.

That former is cool, but I don't think Nix even needs the latter (except perhaps as a new way to fall back on http etc if IPFS is not available while using a compatible format). For normal operation, I'd hope nix could just ask IPFS to populate the fuse filesystem that is the store given a hash, and everything else would be transparent.

Ericson2314 on 9 Apr 2016

https://github.com/cleverca22/fusenar

i now have a nixos container booting with a fuse filesystem at /nix/store, which mmap's a bunch of .nar files, and transparently reads the requested files

cleverca22 on 9 Apr 2016

👍8 ❤4

What is currently missing for using IPFS? How could I contribute? I really _need_ this feature for work.

knupfer on 20 Jul 2016

Pinging @jbenet and @whyrusleeping because they are only enlisted on the old issue.

knupfer on 20 Jul 2016

@knupfer I think writing a fetchIPFS would be a pretty easy first step. Deeper integration will be more work and require touching Nix itself.

copumpkin on 20 Jul 2016

👍6 😕1

Ok, I'm working on it but there are some problems. Apparently, ipfs doesn't save the executable flag, so stuff like stdenv doesn't work, because it expects an executable configure. The alternative would be to distribute tarballs and not directories, but that would be clearly inferior because it would exclude deduplication on file level. Any thoughts on that? I could make every file executable, but that would be not very nice...

knupfer on 28 Jul 2016

@knupfer it's not great, but would it be possible to distribute a "permissions spec file" paired with a derivation, that specifies file modes out of band? Think of it like a JSON file or whatever format and your thing pulls from IPFS, then applies the file modes to the contents of the directory as specified in the spec. The spec could be identified unique by the folder it's a spec for.

copumpkin on 28 Jul 2016

In fact, the unit of distribution could be something like:

{
  "contents": "/ipfs/12345",
  "permissions": "/ipfs/647123"
}

copumpkin on 28 Jul 2016

Yep, that would work. Albeit it makes it more complicated for the user to add some sources to ipfs. But we could for example give an additional url in the fetchIPFS which wouldn't be in ipfs, and if it fetches from normal web automatically generate the permissions file and add that to ipfs... I'll think a bit about it.

knupfer on 28 Jul 2016

ipfs doesn't save the executable flag

should it? @jbenet

how is ipfs-npm doing it? maybe also just distributes tarballs. that is of course not the most elegant solution.

davidak on 28 Jul 2016

I think @knupfer is talking about that thin-waist dag format. This is supposed to be a minimal building block for building more complex data structures. [This is one of the reasons why I am annoyed it supports keys at all---that's an unneeded feature that just causes confusion.]

Ericson2314 on 28 Jul 2016

additional url in the fetchIPFS which wouldn't be in ipfs

why would it have to be not in IPFS? it seems like you'd just need a single URL pointing at some sort of structure that points to another IPFS path

copumpkin on 28 Jul 2016

It can be everything in ipfs, the question is how does it enter there. So I thought there could be a conventional url to fetch from when it isn't already in ipfs.

knupfer on 28 Jul 2016

Ok, I've got a working draft of fetchipfs which reuses a lot of code of fetchzip:

http://lpaste.net/172901

And an example of hello with fetchipfs:

http://lpaste.net/172904

You'd have to add the following to all-packages.nix

fetchipfs = import ../build-support/fetchipfs {
   inherit fetchurl lib unzip;
};

The ipfs-path now contains a file named executables which lists all files which should be executable seperated by newlines and a directory which contains the source (hello-2.10 in this case).

But I'm not sure, perhaps it would be better to list the executables as optional argument to fetchipfs instead of storing that directly into ipfs. This would be much nicer for updates: Someone just has to add the directory of the updated source, change the hash and change the sha256. Now, you'd have to write a file with executables, even if these didn't change and add this together with the source, like for example:
ipfs add -r -w executables hello-2.10

Any thoughts?

knupfer on 30 Jul 2016

This is a longer term concern, but I just opened an issue for Nix to use the git tree object format as an alternative to Nar https://github.com/NixOS/nix/issues/1006 . Similarly, I'd like to hack multihash to support that format too (I.e. a "hashing algorithm" which only supports a subset of IPFS dags isomorphic to git trees wrt hashing and DAG shape.)

Ericson2314 on 31 Jul 2016

Also note that it looks like @wkennington already added support for ipfs to fetchurl in triton.

copumpkin on 2 Aug 2016

Oh, I didn't know about triton (what's the point of it?). But there are some issues if I'm not wrong. He uses tryDownload, so he's forced to serve tarballs via ipfs. My approach downloads directories from ipfs, but asks the ipfs api to wrap it up as tarball. In this way, it can just use curl but deduplicates on file basis. Afterwards, you'd have to unpack the tarball and make a recursive hash, the tarballs from the ipfs api aren't stable.

At the moment I'm on vacation, in about a week I'll dump a branch here.

knupfer on 4 Aug 2016

@knupfer my understanding is that triton is an attempt to "clean slate" nixpkgs/NixOS, and shed a lot of historical stuff we've accumulated over time. Also, it assumes linux and gets rid of the various hackery we need to support darwin and other platforms.

@wkennington can probably comment more sensibly on the actual approach to the IPFS stuff. I just wanted to make people were aware of related work 😄

copumpkin on 4 Aug 2016

will respond more thoughtfully soon.

cc @nicola @davidar @whyrusleeping @diasdavid

The work we're doing with IPLD will simplify a lot of this. More soon,
maybe others can fill in too
On Thu, Aug 4, 2016 at 10:57 Daniel Peebles [email protected]
wrote:

@knupfer https://github.com/knupfer my understanding is that triton is
an attempt to "clean slate" nixpkgs/NixOS, and shed a lot of historical
stuff we've accumulated over time. Also, it assumes linux and gets rid of
the various hackery we need to support darwin and other platforms.

@wkennington https://github.com/wkennington can probably comment more
sensibly on the actual approach to the IPFS stuff. I just wanted to make
people were aware of related work 😄

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/NixOS/nix/issues/859#issuecomment-237579274, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAIcoUqf0O0fi9N-zqdMbRZCmVWfsoJdks5qcf3ggaJpZM4H32W0
.

jbenet on 4 Aug 2016

👍1

So, here is my first draft. It tries in order

And I've changed the hello program to use fetchipfs.
Ipfs doesn't store exec flags, so I've added a optional list of execs to fetchipfs, which will be applied (this seems cleaner to me than a different file which describes the execs like discussed in this thread).

The ipfs path expects a directory of the source code and not a tarball. This allows deduplication on file basis and delta updates of big code bases (like latex for example).

Downsides:
1) It doesn't depend on ipfs and therefore can't add files to ipfs
2) Ipfs (the newer versions) is hard to install on nixos and the older version is nonfunctional
3) it needs an ipfs path and a sha256 because of not depending on ipfs
4) when ipfs is running, it increases storage requirements

Comments:
1) It would make a lot of things easier to bite the bullet and depend on ipfs (the exec is about 30mb). Alternatively, we could offer the option in an ipfs module to add new store paths to ipfs (with opt out).
2) We need some script to convert a gx package to a normal go package
4) This is alleviated a bit by btrfs. The dangerous alternative would be to mount ipfs and replace nix store paths with symlinks to ipfs paths. This would also allow some curious features like lazy source code, where only the files which are used for compilation will be downloaded. The itch with that idea is, that we would have to symlink every file and not the directory, to be able to download files with exec flags and change these to have these exec flags (ipfs doesn't store exec flags and symlinks can't change the flag).

https://github.com/knupfer/nixpkgs/tree/fetchIPFS

Any thoughts on how to proceed?

knupfer on 23 Aug 2016

This is assuming the input source is already in IPFS (such as the example with the hello program) right? @knupfer we (me and @plintX) are working on a similar solution but approaching from a different angle. Basically to make it a bit more seamless integration, we will have a package archiver that supplies an ipfs store as a content addressed "input" cache. When a build expression is evaluating a fetchUrl it will first contact the input cache check if the input is already available there, and if not the input is fetched and archived. At some later point this can be integrated into hydra and combined with the "output" cache. Our project is here: https://github.com/MatrixAI/Forge-Package-Archiving

There are extra issues relating to deduplicating compressed inputs...

How we collaborate?

CMCDragonkai on 23 Aug 2016

@knupfer awesome work so far, this is really exciting :)

One thing you might really like is the ipfs tar subcommand set. Its not advertised well because we aren't sure what will become of it once ipld lands (I think it belongs inside unixfs). But it allows you to import and export tar files directly into ipfs. It will expand the tars structure inside ipfs so files get deduped nicely (especially important when dealing with multiple tar files) and retains all executable flags, symlinks and other fun filesystem stuff.

If you want to avoid running a full ipfs node, you could take a look at ipget but i'm not sure if its functional (we havent looked at it in quite a while).

1) It would make a lot of things easier to bite the bullet and depend on ipfs (the exec is about 30mb).

With go1.7 and some other build flags, you can get this down into the 12-15MB range, or even lower depending on what guides youre following.

2) We need some script to convert a gx package to a normal go package

Could you elaborate here on why this is needed? and what it would entail?

4) ...

If the exec flags things is a feature that would really help you, we can add it fairly easily. And maybe only expose it at first behind a feature flag. We havent put it in yet because it presents interesting security concerns that need to be thoroughly thought through.

whyrusleeping on 23 Aug 2016

N.B. https://github.com/ipfs/specs/issues/130 here are some plans for IPFS to support foreign data, including git tree objects which support exactly the FS metadata we care about. I previously opened an issue https://github.com/NixOS/nix/issues/1006 for teaching Nix to use git trees in addition to NAR. If that IPFS spec is accepted, then we and IPFS can implement git trees in parallel.

Also I wonder if @shlevy's new DaemonStore abstraction is a first step towards eventually teaching Nix to use an arbitrary content addressable storage+networking layer.

All this is much more work and shouldn't impead progress on a more simple fetchIPFS however.

Ericson2314 on 23 Aug 2016

ipfs doesn't save the executable flag

It should, yes. it doesn't yet. we've run into this problem. unixfs should store some unix flags, as much as git.

jbenet on 24 Aug 2016

@CMCDragonkai How do you convert the hashes of fetchurl to ipfs paths? Isn't that impossible? I've read your repo, but I don't understand if you're trying to replace nix or to improve it.
@whyrusleeping Thanks for the hint with ipfs tar add, I've seen it but I misused it (I tried to ipfs get the resulting path instead of ipfs tar cat), this actually solves my current issues with exec flags.
For me, it would be best to just depend on ipfs, but I can't expect that an entire operating system will depend on it. So by not depending and just checking whether there is something that looks like ipfs under the appropriate url, I can expect to integrate this at least into some packages. For example into grsecurity, which has very wonky tarballs.
I've looked into ipget, but I dislike the fact that the user reaps the benefit without helping the community.

The problem with gx (don't misunderstand me, I think gx is a great project) is, that it is a quite new package manager which hasn't got any infrastructure on the ipfs side. A fast hack would be to write something like fetchgx. Writing a normal package just for ipfs isn't possible, because gx needs network access, which doesn't exist in the install phase to ensure reproduceability.

knupfer on 28 Aug 2016

my original idea to solve this was to use fuse to mount .nar files onto /nix/store, and it sounds like IPLD could be used to add a nar in a de-duppable fasion maybe?

but that still leaves the issue of how you access a nar from the initrd to mount the rootfs, my original plan was just a directory of bare nar's for the entire store, and then ipfs was free to read/share them

cleverca22 on 28 Aug 2016

@knupfer The package archiver acts like the binary cache. For a nix client, given a URL and other metadata such as the hashes, it should first check the package archiver to see if it has it. If the package archiver doesn't have the input package, it will download the package and multihash it and store it in ipfs and also add an entry into a multikeyed data structure which acts like a multiindex into ipfs and then serve it to the nix client. This should work for any client wishing to content address its inputs and not just ipfs enabled clients. But the multikeyed index is what I think should be able to map the arbitrary hashes that people have specified to the ipfs hash. Also later this can be integrated into hydra so build expressions in nixpkgs can be automatically archived.

CMCDragonkai on 29 Aug 2016

@CMCDragonkai It seems to me that your project takes a lot more work, so in the mean time a fetchipfs makes sense (and helps future ipfs paths).

So, now it's using ipfs tar. It doesn't require anymore to specify execs and it does now when the ipfs path doesn't produce anything download from the given url and upload afterwards to ipfs (when the demon is running). While uploading it does verify the ipfs hash.

Any review would be appreciated!

https://github.com/knupfer/nixpkgs/tree/fetchIPFS

knupfer on 29 Aug 2016

Currently the stream (constant memory) downloading works, parallel multihashing is also working. We are currently working on concurrent threads for each download (supporting just http protocol for now) and perhaps parallel conduit for each step in the pipeline and just benchmarking the best performance parameters. Afterwards we just have to construct the multikeyed index and see how well that integrates to lookups on ipfs.

The fetchipfs would only work for new build expressions right? Also who and where will be hosting the ipfs nodes?

CMCDragonkai on 29 Aug 2016

Also who and where will be hosting the ipfs nodes?

We're happy to help run some nodes for you and pin graphs. Ideally we wouldn't run the only nodes, but we can help support bandwidth

jbenet on 29 Aug 2016

Yes, it would work only for new build expressions. Every person which has got a daemon running will be hosting exactly the build expressions which this person installed. I think that's sensible, because there are a lot of people which would use it. And it's a strictly better solution than fetchurl, because it has got a fetchurl as a fallback.

knupfer on 29 Aug 2016

Btw. I'm really interested in your project, perhaps I'll join forces, haskell is my favorite language.

knupfer on 29 Aug 2016

So the package archiver could potentially serve ipfsed content to clients that don't have IPFS enabled. I was thinking there would have to be a node of last resort (sort of like the "reserve node") that is just for NixOS community which only archives nixpkgs stuff. Other people hosting their own packages will need to run their own node and I'm not sure but if there was a way to merge different IPFS graphs together that would make cross-graph querying easy.

But whoever wants to join in a torrent like network, every nix client could act like that, and help distribute bandwidth.

I think fetchipfs can also support this package archiver once its ready.

CMCDragonkai on 29 Aug 2016

@knupfer No problem, our repo is a bit messy atm, we should clean it up so its clearer what's happening and the progress. @plintX

@jbenet That will be cool. Does anyone have an estimate for what the total input package size of all nixpkgs would be?

CMCDragonkai on 29 Aug 2016

Well, the question is how much redundancy is needed. The somewhat guaranteed last node would be the plain url, for example https://ftp.gnu.org/gnu/hello/hello-2.10.tar.gz. But considering that there are generous people with a lot of disk space and that there are more than 1000 nix users, I think there won't be any issue.

If we're talking only about source, I'd guess a TB.

knupfer on 29 Aug 2016

Are people still working on this? It sounds interesting

arianvp on 21 Dec 2016

👍3

I'm not aware. I've been a bit overloaded lately and thus neglecting "larger" issues.

vcunat on 21 Dec 2016

Yes, see https://github.com/MatrixAI/Forge-Package-Archiving

We are currently working on deep integration into ipfs (reading about
libp2p). That is we need something closer to the storage than the http api.
On 21/12/2016 11:29 PM, "Arian van Putten" notifications@github.com wrote:

Are people still working on this? It sounds interesting

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/NixOS/nix/issues/859#issuecomment-268511394, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAnHHXFPvsLes6iJ2HaJCd507PEubc2Vks5rKRujgaJpZM4H32W0
.

CMCDragonkai on 21 Dec 2016

👍3

Please have a look at #1167 and let me know what you think.
It adds IPFS support to the binary cache generation. If a binary cache is being generated (nix copy) each .nar will be added to IPFS and the resulting hash written into the corresponding .narinfo.
When retrieving the .narinfo a signed IPFSHash will be found and instead of downloading the .nar from the same cache, IPFS can be used.

mguentner on 1 Jan 2017

@mguentner: I wondered why you decided adding *.nar files into IPFS. I would find it much more practical to add the /nix/store/xxx subtree as it is, because that would be (almost) directly usable when mounted at /ipfs/. (The only remaining step is to add a symlink /nix/store/xxx -> /ipfs/yyy.)

vcunat on 1 Jan 2017

👍1

@vcunat: Currently the unixfs implementation of IPFS lacks an execute bit which is quite useful for the store which is why I opted for .nar distribution until the new unixfs implementation (using IPLD) is done.
Then, IPFS contents can be symlinked/bind-mounted to the store like you describe it. However, this requires ipfs running on the system while the .nar method also works using a gateway.
While the concept of a almost decentralized distribution is awesome, it requires that each instance of Nix(OS) also runs an IPFS daemon which impacts not only the memory footprint but also is a security concern among other things.
Don't get me wrong, I really like the idea of using IPFS at the FS level but for some use cases this might not be the ideal choice.

Basically there are two scenarios:
Scenario A:

[Machine 1] ----|
[Machine 2] ----|---------HTTP--------[ IPFS Gateway ] -------- IPFS
[Machine 3] ----|

Scenario B

[Machine 1] ----|
[Machine 2] ----|---------IPFS
[Machine 3] ----|

In A a local IPFS gateway which fetches/distributes content and local Nix(OS) machines fetch their content via this gateway using HTTP. This gateway is not necessarily a dedicated machine but can also be some form of container (e.g. nixos-container).
You just need to manage IPFS on the gateway like setting storage, networking quotas and limiting resources IPFS uses (memory, CPU, IO). The distribution method should be uncompressed .nar files.

In B you need to manage IPFS on all machines with the upside that IPFS can be used at the FS level, i.e. mounting /ipfs to /nix/store.

A is better suited for laptops and servers since your machine will not start distributing files when you don't want it to.
B is nice for desktops and/or machines where IO and bandwidth can be donated.

We should focus on a distribution of uncompressed nar files using IPFS and later on directly symlinking/mounting IPFS contents to /nix/store.

mguentner on 1 Jan 2017

👍3

I didn't realize/remember the +x problem. Thanks!
I meant that "moving to IPFS" would be per-path. In particular, I preferred to avoid having system-critical stuff on such an experimental FS.

Gateways

I really like the idea of gateways, and *.nar is a very good fit there. For now it truly seems better if most NixOS instances don't serve their /nix/store directly and instead they upload custom-built stuff to some gateway(s). People could contribute by:

providing such gateways (perhaps each with some policy about what paths are accepted and from whom);
uploading new builds and/or verifying existing ones (signed by their key; perhaps even some Hydra-like SW for this could be created).

Together this ecosystem might (soon) offer some properties that we don't have with our current solution (centralized farm + standard CDN).

vcunat on 2 Jan 2017

@vcunat Have a look:
https://github.com/mguentner/nix-ipfs/blob/master/ipfs-gateway.nix
This gateway currently accepts all requests to /ipfs while this is the original config that is
whitelist-only:
https://github.com/mguentner/nix-ipfs-gateway/blob/master/containers.nix
The config still lacks the means to compile the whitelist in a sane way (i.e. checking for duplicates, including older hashes that are not in the latest binary cache etc.)
This script could be extended for that:
https://github.com/mguentner/nix-ipfs/blob/master/ipfs-mirror-push.py

mguentner on 2 Jan 2017

@vcunat And I really like your idea of distributing/decentralizing the actual build process. The most critical part here is the web-of-trust which is currently missing in Nix(OS). Other package managers have integrated gpg and each package is being signed by the respective maintainer (Arch comes to mind).

All this could possibly be achieved using the IPFS ecosystem. Have you looked at IPLD yet?

mguentner on 2 Jan 2017

Currently: our build farm signs the results and publishes that within the *.narinfo files; nix.conf then contains a list of trusted keys in binary-cache-public-keys.

With IPFS: I don't remember details from my studying IPFS anymore :-) but I remember IPNS seemed the very best fit for publishing the mapping: signing key + derivation hash -> output hashes (+ signature).

vcunat on 2 Jan 2017

my old ideas for IPFS+NIX was to store whole nar files in IPFS, and to use https://github.com/taktoa/narfuse to turn a directory of nar files into a /nix/store

then the IPFS daemon can be started/stopped, and serve the raw nar files as-is

but you would need a mapping from storepath(hash of build scripts) to IPFS path(multi-hash of nar)

main downside to this plan i had was that it had to store the entire NAR uncompressed in the IPFS system, and on the end-users systems, though normal users pay the same cost once its unpacked to /nix/store

cleverca22 on 2 Jan 2017

The mapping problem is also an issue for forge package archiving. In this case we would like to map arbitrary upstream source hashes to the ipfs path. We're hoping to do this without the need of a separate moving part. Like if there was a way to embed extra hashes into ipfs object. But is there other ways?

CMCDragonkai on 3 Jan 2017

@CMCDragonkai https://github.com/ipld/cid would be exactly what you want I think, but that spec sadly seems to be stalled. The basic idea is allowing IPFS-links to point to more things than IPFS-objects as long as the "pointing" is via content-addressing.

Ericson2314 on 3 Jan 2017

the original idea i had to solve the mapping problem was for hydra to multi-hash every nar, and include that into the .narinfo file, but to leave the "ipfs add" as an optional step anybody can do to contribute bandwidth

main downside is that you still need cache.nixos.org for the narinfo files, it just stops being a bandwidth issue

cleverca22 on 3 Jan 2017

The haskell code we got currently stream multihashes http resources, so that could be integrated into hydra. But cid project looks interesting, we will check it out indepth soon.

CMCDragonkai on 3 Jan 2017

@cleverca22 Nice idea with narfuse! That solves the problem of duplicate storage.
If you leave the ipfs add step optional you still need some authority that does the mapping of .nar hashes and the IPFS hash. A user that does ipfs add still needs to inform other users that the .nar is now available using that IPFS hash.

Just an idea how it could work (the code is already finished for that, see #1167):
(That is Scenario A in https://github.com/NixOS/nix/issues/859#issuecomment-269922805)

A Hydra will build a jobset (e.g. nixpkgs), create binary cache afterwards and publish the resulting IPFS Hashes to a set of initial IPFS Nodes (initial seeders in bittorrent language). These seeders will download everything from the Hydra and once this is finished, the Hydra can (in theory) stop distributing that jobset since from this moment the initial seeders and everyone else running IPFS on their Nix(OS) machine will start distributing.
Have a look at this script which is a basic implementation of what I describe.

How to distribute the .narinfo files is open for debate. Either use the traditional HTTP method (a .narinfo hardly generates any traffic) or also put the information inside some IPFS/IPLD structure.

The upside of distributing using HTTP is that there is a single authority that does the mapping between .nar files and IPFS hashes and no IPFS daemon needs to be installed on the "client" side since .nar
files can also be fetched using a gateway (e.g. one of the initial seeders, some local machine or the one running @ https://ipfs.io).

I am confident that IPFS could revolutionize the way we distribute things but I don't consider it mature enough to be running on every machine out there. We need to find pragmatic solutions and come up with some sort road map for Nix and IPFS.
Starting to distribute .nar files using IPFS could be the first step, mapping .nar files from a mounted IPFS to /nix/store the second step, making all sources (fetchgit, fetchFromGithub) available through IPFS the third (what @knupfer started) and the utopia of building everything from IPFS to IPFS the last one. :rocket: :arrow_right: :new_moon:

mguentner on 3 Jan 2017

👍7

A part from the fact that /nix/store can contain files which are not safe for sharing, because of other issues.
I want to raise security concerns to any P2P & Nix integration.

The biggest issue here is how to guarantee the anonymity of both peers. To highlight the issues let suppose we have 2 peers, Alice (A) and Bob (B) as usual, and that A request one package P to B.

B sees which version of P is requested, and knows the IP of A. Thus can deduce that A does not yet have P.
If A does not have P, this means that either A is installing it for the first time, or upgrading it P.
In which case, B can try to attack A with the issues fixed in the latest version of P.
A sees if P is available, and knows the IP of B. If newer version of P are not available on B, then either P is no longer used in B's configuration or P is not yet upgraded.
In which case, A can try to attack B with the issues fixed in the latest version of P which is not available on B.

In both cases we might think that both issues can be avoided by faking the fact that we have or not a package P, by forwarding the content of someone else. But this suffer from timing attack, and might increase the DoS surface.

What these examples highlight is that we need to either trust the peers, or that we need to provide anonymity between the peers, such that nor A, nor B knows the IP of the others.

nbp on 20 Jan 2017

👍2

@nbp Thanks for mentioning this!

That is very true and will be something that needs to be addressed once it makes sense to run a IPFS daemon on the same system that requests the /nix/store paths using IPFS (as nar or directly
pinning them).

For now IPFS itself is the biggest security concern on a system and then the information about the system it potentially leaks.

However, currently every NixOS user who uses https://cache.nixos.org leaks information about the installed versions to a central entity (Amazon Cloudfront) and all systems in between (filesize).

Depends on your scenario but running a local IPFS gateway might even improve security by reducing the ability to fingerprint your system since many Nix installtions potentially share this gateway. But that's just guesswork plus the security is based partly on obscurity :)

mguentner on 20 Jan 2017

👍2

@nbp
another factor to consider, is that IPFS will advertise the multi-hash of every object you can serve

even if you never advertise locally built things with secrets like users-groups.json, you are still going to advertise you have a copy of hello-2.10 built with a given nixpkgs, and then an attacker could make use of that

cleverca22 on 21 Jan 2017

👍1

You could serve only store paths which could be garbage collected. So you'll only leak information when you download from ipfs, but not by serving.

knupfer on 21 Jan 2017

but now you will never contribute bandwidth towards current build-products, only out of date things

cleverca22 on 21 Jan 2017

Or build-products which you've deinstalled, or brought only via nix-shell into your system

knupfer on 21 Jan 2017

yeah, that would limit its use-fullness while giving security, feels more like something the end-user should decide on via a config option

cleverca22 on 21 Jan 2017

Agree. Don't forget that newer version of sources have normally a lot of untouched files, so it would even help with old garbage (this is obviously not so often with binaries).

knupfer on 21 Jan 2017

main issue i can spot with adding raw uncompressed NAR's to the IPFS network is the lack of compression, and lack of file-level dedup within the NAR, but the IPLD stuff i've heard about could add the NAR in file sized chunks, inspecting the contents of the NAR as it goes, at the cost of having a different hash from plain "ipfs add"

cleverca22 on 21 Jan 2017

I think you do get file-level dedup within and across NARs, as IPFS is supposed to do chunking based on content IIRC.

vcunat on 21 Jan 2017

@copumpkin's comment https://github.com/NixOS/nix/issues/520#issuecomment-275666718 sketching a possible implementation of non-deterministic dependencies shares a lot of characteristics with IPNS.

Ericson2314 on 27 Jan 2017

Will this IPFS enhancement help the Nix community overcome another AWS S3 outage? (Like the one which just happened recently)

equalunique on 10 Mar 2017

As long as the ipfs nodes has their content hosted outside s3.
On 10/03/2017 3:27 PM, "Evan Rowley" notifications@github.com wrote:

Will this IPFS enhancement help the Nix community overcome another AWS S3
outage? (Like the one which just happened recently)

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/NixOS/nix/issues/859#issuecomment-285574769, or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAnHHXnrl06rtsU61dX5-X1SkWE2Vq9Bks5rkNE1gaJpZM4H32W0
.

CMCDragonkai on 10 Mar 2017

I'm wondering if the new nix 2.0 store abstraction would help and adding an IPFS store.

CMCDragonkai on 5 May 2018

For reference, the experiments around https://github.com/NixIPFS found that IPFS isn't able to offer reasonable performance for the CDN part, at least not yet.

vcunat on 5 May 2018

Are there benchmarks?

On 5 May 2018 23:27:50 GMT+10:00, "Vladimír Čunát" notifications@github.com wrote:

For reference, the experiments around https://github.com/NixIPFS found
that IPFS isn't able to offer reasonable performance for the CDN part,
at least not yet.

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/NixOS/nix/issues/859#issuecomment-386805735

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

CMCDragonkai on 5 May 2018

I don't remember any definite results, except that it wasn't usable. @mguentner might remember more.

vcunat on 5 May 2018

@CMCDragonkai No runnable benchmark, just personal experience.
Here you can read about the last deployment:

https://github.com/NixIPFS/infrastructure/blob/master/ipfs_mirror/logbook_20170311.txt

I have no idea how IPFS behaves currently, but I assume that the DHT management traffic is still a problem. Without a DHT you have to manually connect the instances.
Please note that IPFS itself works fine for smaller datasets (<= 1 GiB) but does not compare well against old-timers like rsync (which we used in a second deployment of nixipfs-scripts).

mguentner on 5 May 2018

@whyrusleeping is aware of these things

He wrote in some issue at the end of 2017:

In general, with each update we've had improvements that reduce bandwidth consumption.

So it might be already "usable" for this use case?

It is still not fixed completely. Here are some related issues to follow.

https://github.com/ipfs/go-ipfs/issues/2828
https://github.com/ipfs/go-ipfs/issues/3429
https://github.com/ipfs/go-ipfs/issues/3065

davidak on 5 May 2018

👍2

would love to revive this, anyone on the nix side actively involved as of now?

parkan on 5 Sep 2018

@parkan i don't think so. The linked ipfs issues in my last comment are still open, so we have to wait for fixes (or get involved there and help resolve them).

davidak on 6 Sep 2018

👍1

@parkan: as written, there were severe performance problems with IPFS for our use case. I haven't heard of them being addressed, but I haven't been watching IPFS changes...

vcunat on 6 Sep 2018

gotcha, thanks for the TLDR 😄

there's ongoing work on improving DHT performance, but the most effective approach will likely involve non-DHT based content routing -- I'll review the work in @NixIPFS to see if there's anything obvious we can do today

are there stats on things like total number of installed machines, cached binaries, etc somewhere?

parkan on 6 Sep 2018

❤2

@parkan: there's a list of binary packages for a single snapshot (~70k of them). We have that amount roughly thrice at a single moment (stable + unstable + staging), and we probably don't want to abandon older snapshots before a few weeks/months have passed, though subsequent snapshots will _usually_ share most of the count (and size). Overall I'd guess it might be on the order of hundreds of gigabytes of data to keep up at once (maybe a couple terabytes, I don't know).

I suppose the publishing rate of new data in GB/day would be interesting for this purpose (DHT write throughput), but I don't know how to get that data easily. And also the "download" traffic: I expect there will be large amounts, given that a single system update can easily cause hundreds of MB in downloads from the CDN, and github shows roughly a thousand of unique visitors to the repo each day (even though by default you download source snapshots from the CDN directly instead of git).

I'm sure I did see some stats on a NixCon, but I can't find them and they might be double nowadays. @AmineChikhaoui: any idea if it's easy to get similar stats from Amazon, or who could know/do that?

vcunat on 6 Sep 2018

@parkan

total number of installed machines: 0 (once 3)
cached binaries: 0 (once ~ 400 GiB, rougly 40 jobsets of nixpkgs)

The project is dead at the moment because no one showed interested. I decided that I won't force something if the community is happy with the AW$ solution.

The @NixIPFS project was also an attempt to free the NixOS project of the AW$ dependency which seemed really silly and naive to me.

Since a simple rsync mirror already fulfills that requirement I went ahead with that. However I found nobody who wanted to commit themselves with server(s) and time.
The idea would have been a setup with mirrorbits (redudant with redis sentinel) and optional geo dns. Old issue

Ping me if you need assistance.

mguentner on 6 Sep 2018

I appreciate the scaling issues with serving NARs, etc. over IPFS, but it looks like this "full-blown" approach has derailed the orthogonal issue of making external sources more reliable (described under "What we might start with" in the first comment).

I've certainly encountered things like URLs and repos disappearing (e.g. disappearing TeXLive packages, people deleting their GitHub repos/accounts after the Microsoft aquisition, etc.), which has required me to search the Web for the new location (if it even exists elsewhere...) and alter "finished" projects to point at these new locations. This is especially frustrating for things like reproducible scientific experiments, where experimental results are tagged with a particular revision of the code, but that revision no longer works (even with everything pinned) due to the old URLs.

As far as I see it there are two problems that look like low hanging fruit:

The first is to make a fetchFromIPFS function which doesn't require hardcoding a HTTP gateway. This could be as simple as e.g.

fetchFromIPFS = { contentHash, sha256 }: fetchurl {
  inherit sha256;
  url = "https://ipfs.io/ipfs/${contentHash}";
}

This prevents having HTTP gateways scattered all over Nix files, and allows a future implementation to e.g. look for a local IPFS node, which would (a) remove the gateway dependency, (b) use the local IPFS cache and (c) have access to private nodes e.g. on a LAN.

The second issue is that personally, I would like to use a set of sources, a bit like metalink files or magnet links. The reason is that upstream HTTP URLs might be unreliable, but so might IPFS! At the moment, fixed-output derivations offer a false dichotomy: we must trust one source (except for the hardcoded mirrors.nix), so we can either hope that upstream works or force ourselves to reliably host things forever (whether through IPFS or otherwise). Whilst I don't trust upstreams to not disappear, I trust my own hosting ability even less!

I'm not sure how this would work internally, but I would love the ability to say e.g.

src = fetchAny [
  (fetchFromIPFS { inherit sha256; contentHash = "abc"; })
  (fetchurl { inherit sha256; url = http://example.org/library-1.0.tar.lz; })
  (fetchurl { inherit sha256; url = http://chriswarbo.net/files/library-1.0.tar.lz; })
];

The same goes for other fetching mechanisms too, e.g.

src = fetchAny [
  (fetchFromGitHub { inherit rev sha256; owner = "Warbo"; repo = "..."; })
  (fetchgit { inherit rev sha256; url = http://chriswarbo.net/git/...; })
  (fetchFromIPFS { inherit sha256; contentHash = "..."; })  # I also mirror repos to IPFS/IPNS
];

Whilst all of the hash conversion, Hydra integration, etc. discussed in this thread would be nice; simple mechanisms like the above would be a great help to me, at least. I could have a go at writing them myself, if there was concensus that I'm not barking up the wrong tree? ;)

Warbo on 6 Sep 2018

👍7

I don't think it's orthogonal at all. Sources are cached in the CDN as well. (Once in a longer while IIRC.) EDIT: maybe only fetchurl-based sources ATM, I think, but that's vast majority and not a blocker anyway, as it's only store paths again. Current example: https://github.com/NixOS/nixpkgs/pull/46202

vcunat on 6 Sep 2018

I must admit it's difficult to compete with these CDNs, as long as someone pays/donates them. My systems commonly update with 100 Mb/s, reply < 5 ms. I'm convinced this WIP has taken lots of effort to get into this stage, but to make it close to the CDN, that would surely take many times more. I personally am "interested" in this, but it's a matter of priorities, and I've been overloaded with nix stuff that works much worse than the content distribution...

vcunat on 6 Sep 2018

👍3

@vcunat Just to be clear (can't tell if you were replying to me or not) my thoughts above were mostly concerned with custom packages (of which I have a lot ;) ) which have no CDN, etc. rather than "official" things from nixpkgs.

Warbo on 6 Sep 2018

OK, in this whole issue I've been only considering sources used in the official nixpkgs repository plus the binaries generated from that by hydra.nixos.org. Ability to seamlessly go above that would be nice, but it feels like overstretching my wishlist.

vcunat on 7 Sep 2018

Whoops, never mind; it looks like https://github.com/NixOS/nixpkgs/tree/master/pkgs/build-support/fetchipfs basically does what I described (fetch from a local IPFS node, with a HTTP fallback)!

Warbo on 7 Sep 2018

Just a note, proprietary sources are not cached in the CDN. And these tend to break the most I find. In one instance the source link is not even encoded in nixpkgs (cuDNN) and you're expected to login to NVIDIA to get them. I did however find automatable link for acquiring cuDNN.

My original goal here was to have transparent ipfs fetching. So you dont need to special case the fetches, it just works reproducibly as the first time a fetch is applied, it gets put into an IPFS node.

CMCDragonkai on 7 Sep 2018

@vcunat I think @edolstra has a script that generates few stats/graphs from the binary cache, if I'm not wrong the latest was shared in https://nixos.org/~eelco/cache-stats/, I believe it should be possible to generate that again.
Is that what you're looking for ?

AmineChikhaoui on 7 Sep 2018

❤1

I think that's exactly the link I had seen. It's data until December 2017, but that should still be good enough for a rough picture.

Unfree packages aren't cached as a matter of policy, in some cases even distribution of sources isn't legally allowed by the author. Yes, switching to IPFS would make it possible to decentralize that decision (and the legal responsibility), which _might_ improve the situation from your point of view. But... you can use fetchIPFS for those already ;-) (and convince people to "serve" them via IPFS) – I don't expect anyone would oppose switching the undownloadable ones to fetchIPFS in upstream nixpkgs.

vcunat on 7 Sep 2018

@Warbo
https://github.com/NixOS/nixpkgs/blob/082169ab029b4a111309f7d9a795b88e6429222c/pkgs/build-support/fetchurl/default.nix#L38-L43

pkgs.fetchurl already supports a list of URL's and will try each one in order until one returns something
so its just a matter of generating a call to fetchurl, that knows the ipfs hash, sha256, and the original upstream url

cleverca22 on 7 Sep 2018

@cleverca22 Wow, now that you point it out it's obvious; I've looked through that code so many times, but the ability to give multiple URLs didn't stick in my mind, maybe because I've not used it (because I forgot it was possible... and so on) :P

I've moved my other thoughts to #2408 since they're not specific to IPFS.

Warbo on 7 Sep 2018

Unfree packages aren't cached as a matter of policy, in some cases even distribution of sources isn't legally allowed by the author. Yes, switching to IPFS would make it possible to decentralize that decision (and the legal responsibility), which might improve the situation from your point of view. But... you can use fetchIPFS for those already ;-) (and convince people to "serve" them via IPFS) – I don't expect anyone would oppose switching the undownloadable ones to fetchIPFS in upstream nixpkgs.

I want to also get these standard deep learning weights into Nixpkgs as well: https://github.com/fchollet/deep-learning-models/releases

But they are large fixed output derivations. Weights represent source code when there is more and more deep learning applications coming out. For example libpostal.

Someone on IRC mentioned it shouldn't be cached by hydra or something like that. At any case, I want to make use of Nix for scientific reproducibility, the only way to truly make Nix usable for all of these usecases and not bog down the official Nix caching systems with all our large files is to decentralised the responsibility. So another reason IPFS would be important here.

I was wondering if anyone considered Dat?

On another note, I had some work previously involving attempting to get Hydra integrated with IPFS. To do that we had to look more deeply in the IPFS functionality specifically its libp2p framework. We have moved on to other things for now, but we have some knowledge about this particular area. For deeper integration between Nix and IPFS beyond just fetchIPFS, feel free to put up issues in https://github.com/MatrixAI?utf8=%E2%9C%93&q=libp2p&type=&language=.

CMCDragonkai on 12 Sep 2018

👍1

I might be missing something, but I'm not sure what's particularly large about that.

LnL7 on 12 Sep 2018

I see two downloads over 300 MiB each, so perhaps that. (I don't know particulars at all.)

vcunat on 12 Sep 2018

The IPFS team apparently made package managers their top priority for 2019 :tada:

timokau on 6 Apr 2019

🎉23

They made a nice blueprint graphic how people mirror packages which includes Nix, but they need help with details. I think nobody had done it except @mguentner, but also more general details like how is our "package registry" git-powered and how are build products related. One point is, you don't have to mirror the whole package cache to use specific version of nixpkgs. You actually don't need it at all. In worst case you can build the whole system from nixpkgs + source files. https://github.com/ipfs/package-managers/issues/86

They also talked about Nix in the meeting yesterday. https://github.com/ipfs/package-managers/issues/1#issuecomment-525384207

They are also working on performance issues. So we might be able to share packages soon.

(I like their way of organising and outlining use cases and wish we had that too, so we can focus to solve real problems like reproducible environments for universities and HPC without the struggle it is to get started using Nix today...)

davidak on 28 Aug 2019

❤2 👍1

Also keep in mind that one unique design assumption of Nix / Hydra is that the resulting binary cache grows without much structure (at least that I know of).
That means each evaluation of a jobset piles more binaries (.nars + narinfo) without some containment like folders. That makes garbage collection based on closures / jobset / evaluation quite hard.
If a binary cache is stored in a limited environment (not fastly / S3 / CDN with attachable storage), you will run into problems.
Most of my work over at https://github.com/NixIPFS/nixipfs-scripts is mapping Hydra outputs (.nars + narinfo) to folders using symbolic links. All symbolic links point again to a global store. If you would delete a folder of an evaluation, you only delete symbolic links. To garbage collect the global store you only need to check for all files that have no more links from the individual folders.
The output was first put into IPFS but later synced using rsync for performance reasons.

Also note that the approach taken was the naive approach as it only leverages distributed filesystem sharing part of IPFS using a central build service and a central exporter. That's also the reason why you can simply use rsync or simply scp for network transfers.
IPFS has much more features like a pub/sub service, support for linked data structures (see https://github.com/ipld/ipld) which could be really useful for building something that integrates way deeper into the architecture of Nix (if desired).

mguentner on 30 Aug 2019

❤1 👍1

I just noticed that the year has rotated and was a bit let down by the fact that we still don't have any progress on this. (Can't really blame anyone :))

Started reading into what @mguentner said last and found a potentially relevant discussion on hackernews how IPLD: https://news.ycombinator.com/item?id=13441305 (https://web.archive.org/save/https://news.ycombinator.com/item?id=13441305)

andir on 23 Feb 2020

@andir we'll get to it. The trick is getting both sides to agree on how to hash data to avoid indirection. Steps building up to that would be CA-derivations and hashing data using something common like git tree hashes instead of nar. Guess what I was thinking of (after code quality) with https://github.com/NixOS/nix/pull/3455 ? :)

Ericson2314 on 31 Mar 2020

❤2 👍2

Ideally something still rolling-hash chunked like bup rather than pure git, to be able to store binary outputs without overmuch duplication?

ohAitch on 1 Apr 2020

@andir @Ericson2314 I have some news that might be helpful! we've recently launched the IPFS Grant Platform (not announced publicly just yet 🤫) and Nix <> IPFS integration work seems like an ideal grant candidate

this could take the shape of either a direct grant to someone in the Nix community to work on the problem or a jointly formulated call for proposals from 3rd parties

would this be helpful?

parkan on 1 Apr 2020

🎉13

ok, I'm seeing a lot of 🎉 -- who is the best person in the Nix community to chat with to make this happen?

parkan on 2 Apr 2020

@parkan Well I don't want to flatter myself as the single best person, but given the work I've been doing on adjacent issues, I'd be happy to kick off the conversation. How should I reach you?

Ericson2314 on 2 Apr 2020

👍7 👀3

@parkan Well I don't want to flatter myself as the single _best_ person, but given the work I've been doing on adjacent issues, I'd be happy to kick off the conversation. How should I reach you?

dropped an email to the address in your github bio 🙂

parkan on 4 Apr 2020

👍1

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/obsidian-systems-is-excited-to-bring-ipfs-support-to-nix/7375/1

nixos-discourse on 27 May 2020

https://discourse.nixos.org/t/obsidian-systems-is-excited-to-bring-ipfs-support-to-nix/7375 the fruit of @parkan's and my discussion! [edit I guess the bot beat me to it :)]

Ericson2314 on 27 May 2020

❤1 🎉1

I guess I should keep this thread up to date with the major highlights.

Milestone 1 is released!

blog post: https://blog.ipfs.io/2020-09-08-nix-ipfs-milestone-1/
tutorial: https://github.com/obsidiansystems/ipfs-nix-guide/blob/master/tutorial.md
guide to work behind the scenes: https://github.com/obsidiansystems/ipfs-nix-guide/blob/master/branches.md

Ericson2314 on 15 Sep 2020

🎉9

Adhoc Nix over IPFS

I've found a way we could make Hydra populate an IPFS binary cache
on each build and then allow users to consume it and help with the
distribution by serving their local portion/copy of the Hydra IPFS binary cache.

Hydra current state

Hydra currently builds a derivation and produces a store path: /nix/store/hash123123-name
Hydra currently have a signing key used in the cache.nixos.org distribution.
let's call this signing pair hydra.public and hydra.private.
A new signing pair can be generated with:
$ nix-store --generate-binary-cache-key hydra hydra.private hydra.public

By default a nix.conf is like:

substituters = https://cache.nixos.org
trusted-public-keys = <content of hydra.public>

Which reads NAR files in https://cache.nixos.org.

If one NAR file matches and the content signature can be verified
with any of the trusted-public-keys it's used as cache for
$ nix-build, $ nix-shell, etc

Hydra required additions (option 1)

The extra steps we need Hydra to perform are:

Have a running daemon of IPFS, it's as simple as running $ ipfs daemon
Have a normal privileges folder, any location used as NAR store $ mkdir /path/to/nar-store
Sign the nix store path $ nix sign-paths --key-file /path/to/hydra.private --recursive /nix/store/hash123123-name
Copy the closure to the NAR store
nix copy --to file:///path/to/nar-store /nix/store/hash123123-name
Add the NAR store to IPFS and get its content ID
$ CID=$(ipfs add -Qr /path/to/nar-store)
Publish the CID under IPNS $ HYDRA_IPNS=$(ipfs name publish -Q "$CID") # This always yields the same value, it just notifies the network that a change happened
Now $HYDRA_IPNS (which is static) points to the latest version of the Hydra Cache that includes the last built derivation

Hydra required additions (option 2)

Find the way to replicate https://cache.nixos.org over IPNS
(this is an static hash that points to the latest version of content at https://cache.nixos.org)
Naive way is:
- $ aws s3 sync s3://cache-nixos-org /path/to/local-folder
- $ CID=$(ipfs add -Qr /path/to/local-folder)
- $ HYDRA_IPNS=$(ipfs name publish -Q "$CID") # This always yields the same value, it just notifies the network that a change happened

User required additions

Have a running daemon of IPFS, it's as simple as running $ ipfs daemon
Create 2 local folders, normal privileges, any location
$ mkdir /path/to/ipfs /path/to/ipns
Mount the Hydra IPFS substituter with
$ ipfs mount -f /path/to/ipfs -n /path/to/ipns.
This is lazy, both dirs are actually 0 size and data is streamed as-needed via IPFS either from the master Hydra node, or from the nearest peer
Configure nix.conf to consume the substituter:

# Replace the var $HYDRA_IPNS with the static IPNS hash provided by Hydra

substituters = file:///path/to/ipns/$HYDRA_IPNS https://cache.nixos.org
trusted-public-keys = <content of hydra.public>

That's it! I've tested it and it works, required steps on the user-side are short and effective, Hydra new steps should be technically possible

kamadorueda on 29 Oct 2020

👀3 🚀2

I could be wrong but I think having the entire caching sitting on a single
machine is probably infeasible. I think you are right that you could
publish via IPNS but I think you would have to simply add a file entry to
the existing directory.

This is technically possible but the last time I looked into the IPFS tools
it was a complete mess. It seems that you have the following options:

Use ipfs files add. However this requires pinning the entire directory
so it is probably a no-go. It is also very slow.
Use ipfs object patch add-link however this doesn't handle directory
hashing which is almost certainly required.
Manually build the hashed directory objects.

The last one isn't pretty, but probably required. The better solution is
probably fix up go-ipfs to have a nice interface for that and use it.

kevincox on 29 Oct 2020

I wonder how much ram it would take to load all the derivation hashes from the cache.

For IPFS to scale it will be necessary to split the announcement from the actual storage. Basically, have one or more hosts announce the store paths and when they get the request for the actual file, they would retrieve it from the binary cache. A bit like what https://github.com/ipfs/go-ds-s3 does.

zimbatm on 4 Nov 2020

I wonder how much ram it would take to load all the derivation hashes from the cache.

If I had to guess this is probably feasible, but citation needed. However you don't even need to do that, with directory sharding and similar you can do partial updates with only a subset of the directory tree locally. (There is no logical tree but the sharding adds one). Just make sure you pick the sharding algorithm that allows this. Of course this is also in theory. I don't know if any current implementation actually supports this type of operation.

For IPFS to scale it will be necessary to split the announcement from the actual storage

I agree. It is far too high of a maintenance cost for us to run our own storage servers, so we would want to farm out to something like s3 that manages it for us. I'm not sure it actually "doesn't scale" if we wanted to run our own servers, I guess it depends on how much overhead would be running the publishing logic on the storage servers. At the end of the day you do need some machines with storage attached.

As noted with the project you linked this is entirely possible with IPFS.

kevincox on 4 Nov 2020

Following this concepts: https://github.com/NixOS/nix/issues/859#issuecomment-718355215

I wrote a whitepaper of the exact steps we need to follow in order to create a software that mirrors any binary cache over IPFS

The benefits are the same of the ones described in the issue

In short the implementation allows users to become peers of the distribution network just by using Nix normally (after launching some easy magic commands) and, of course, get the benefits of fetching data over IPFS instead of HTTP.

The implementation is serverless, every user launch a small server locally and there is nothing we have to modify on Hydra or the core of Nix, we've always had it all!

Please read it here! https://github.com/kamadorueda/nix-ipfs/blob/latest/README.md

And let me know your thoughts!

kamadorueda on 9 Nov 2020

🚀1

@kamadorueda I really like the approach but I have one question: Wouldn't it make more sense to write the fuse file system in Rust or C for better performance? Depending on network speed I could imagine that python isn't that fast. But maybe you already considered that and have a longer explanation of your decision.

mohe2015 on 9 Nov 2020

@kamadorueda looks nice, but I have one question: why fuse filesystem? Wouldn't it be easier (and more scalable) to use HTTP interface?

iavael on 9 Nov 2020

I agree the HTTP interface sounds like a much cleaner solution. Then the user can just use {gateway}/ipns/{key} as the cache. Where gateway can be a local IPFS gateway (http://localhost:8080) or a public gateway (https://cloudflare-ipfs.com)

Furthermore this allows configuring multiple IPFS caches to trust trivially, instead of needing to run another fuse filesystem locally for each IPFS cache you want to support. In fact this works today with no additional configuration necessary.

# /etc/nix/nix.conf
substituters = https://cloudflare-ipfs.com/ipns/{ipns-key}
trusted-public-keys = {nix-key}

kevincox on 9 Nov 2020

@kevincox as far as I understand, the problem with ipns is that you have to populate all nix cache keys and repopulate them on cache update. So @kamadorueda proposed essentially a proxy which translates nix hash to ipfs hash.
And my question was why use fuse interface for this proxy instead of http?

iavael on 9 Nov 2020

@kamadorueda I really like the approach but I have one question: Wouldn't it make more sense to write the fuse file system in Rust or C for better performance? Depending on network speed I could imagine that python isn't that fast. But maybe you already considered that and have a longer explanation of your decision.

@kamadorueda looks nice, but I have one question: why fuse filesystem? Wouldn't it be easier (and more scalable) to use HTTP interface?

Both of you are right, and HTTP interface could work and be simpler to implement! Pending to do some tests and update the README

I wrote the examples on Python because that's the language I know more. Truth is that given this is an Input / Output bound problem then a language with low concurrency costs would be the most performant

At the end of the day what the community knows more is better as it allows the project to receive more contributions! network bandwidth is the bottleneck

I agree the HTTP interface sounds like a much cleaner solution. Then the user can just use {gateway}/ipns/{key} as the cache. Where gateway can be a local IPFS gateway (http://localhost:8080) or a public gateway (https://cloudflare-ipfs.com)

Furthermore this allows configuring multiple IPFS caches to trust trivially, instead of needing to run another fuse filesystem locally for each IPFS cache you want to support. In fact this works today with no additional configuration necessary.
# /etc/nix/nix.conf
substituters = https://cloudflare-ipfs.com/ipns/{ipns-key}
trusted-public-keys = {nix-key}

The good thing about serving the substituter as a fuse filesystem or some localhost:1234 is that there is no need to deal with trust (take into account that adding a bad-substituter can yield a full host takeover and lots of damage to the info in your system) with a local substituter all you have to trust is yourself and the upstream binary cache (cache.nixos.org, your-own-and-trusted-cachix, etc)

If we use the implementation as it is, the only ipfs command needed is ipfs get <hash> which is automatically protected by cryptography

kamadorueda on 9 Nov 2020

So @kamadorueda proposed essentially a proxy which translates nix hash to ipfs hash.

An IPFS directory is a translation of filename -> IPFS hash. I guess this does sidestep the current issue of incremental IPFS directory update. However for the long term it is probably best just to fix that issue.

If it's not but it's available on a binary cache, stream it from the binary cache to the user AND add it to the user IPFS node.

I missed this bit. Currently this can't be done by hitting the gateway directly. However I wonder if it would just be easier to have a cron job that adds the current store to IPFS every once and a while instead of a proxy? However either solution would be good.

Also if we are doing this how does the user publish this info? Just uploading the nar isn't enough to let other people use it.

kevincox on 9 Nov 2020

@kevincox I don't know how many keys are there in nix binary cache, but preloading many pb of data with cron job sounds impractical.
I think even creating ipns directory of all keys in cache (and regulary update it) is a bit too much by itself.

iavael on 9 Nov 2020

If it's not but it's available on a binary cache, stream it from the binary cache to the user AND add it to the user IPFS node.

I missed this bit. Currently this can't be done by hitting the gateway directly. However I wonder if it would just be easier to have a cron job that adds the current store to IPFS every once and a while instead of a proxy? However either solution would be good.

Also if we are doing this how does the user publish this info? Just uploading the nar isn't enough to let other people use it.

For the moment users are just Mirroring binary caches over IPFS

In other words, users download the data they need from (the upstream binary cache / the nearest ipfs node that has it)

The upstream binary cache is who has the .narinfo files (small metadata files) and the distributed ipfs swarm (other people) is who has the nar.xz (may be big content files)

Users CANT announce store paths at discretion (security, trust problems, it's hard to implement but it's possible).

Users can only announce and receive from peers store paths that are in the upstream binary cache (cache.nixos.org, etc). If it exists on the upstream binary cache then it's trusted

I think we can start implementing this read-only proxy, the benefits are HUGE, mainly in costs savings for all involved parties and speed of transfer. This benefits both the nixos/cachix infrastructure, and end-users

Implementing a write-proxy is possible, it's hard but it's possible, I just think it's better to go step by step, solving problems and adding value every day. Start with the smallest possible change that changes things for better

kamadorueda on 9 Nov 2020

👍1

It sounded like this was being done by the client right? This is just things that you have on your disk already. And since the nix store is immutable IPFS doesn't even need to copy the data.

I think even creating ipns directory of all keys in cache (and regulary update it) is a bit too much by itself.

I doubt it. The amount of work per-narinfo is based on the depth of the IPFS directory tree. IPFS can easily store thousands of directory entries in a single block so the depth is logarithmic with that base. This means that while the amount of work will grow over time it will still be relatively small.

The slightly more concerning number may be that the NixOS project may want to host all of those narinfo files forever. This will likely require something slightly more complicated than just pinning the tree however we currently pay for all of the narinfo and nar on S3 so I can't imagine that it is much worse.

I would love to see info on the total size of narinfo files in the current cache.nixos.org.

kevincox on 9 Nov 2020

The upstream binary cache is who has the .narinfo files

Ah, so this is just proxying the narinfo requests? The doc isn't very clear on the difference between how the narinfo vs the nar are handled.

If you are just proxying the narinfo you can do something very cool. You can just transform the url parameter to point at the user's preferred gateway. (I'm assuming that that field supports absolute URLs, if not it shouldn't be that hard to add).

Then your proxy doesn't even see the nar requests. (And performance becomes mostly a non-issue).

Furthermore if this becomes widespread then we can at some point start publishing all the narinfos (pointing to IPFS) directly and remove the need for the proxy all together. This also allows people to publish their own caches via IPFS without needing to serve HTTP at all.

kevincox on 9 Nov 2020

From the nixos team perspective they pay S3 storage + data transfer

If we implement the proxy as I propose it nixos team would spend the same on S3 storage, but less in data transfer because some objects would be fetched by client from other clients in the IPFS network instead of S3 (or Cloudfront)

Basically users become a small CDN server of the derivations they use, care about, and have locally

There is no need for pinning services, $0 cost for it

Users benefit from speed, binary caches benefit from costs savings, win-win, the added cost is the time it take us (the volunteers) to create such software: https://github.com/kamadorueda/nix-ipfs

kamadorueda on 9 Nov 2020

@kevincox wouldn't ipns approach require to list all keys of binary cache for every cache update? I don't think that there are thousands of them, it's more likely there are much much more.
And I don't even touch properties of ipfs and it's scalability. At first is it practical to create listing of millions of keys (or even dozens/hundreds of millions) for every cache update with cron job?

iavael on 9 Nov 2020

The upstream binary cache is who has the .narinfo files

Ah, so this is just proxying the narinfo requests? The doc isn't very clear on the difference between how the narinfo vs the nar are handled.

nix requests for the narinfo files go to the upstream binary cache always
nix requests for the nar.xz file go to the upstream binary cache OR another peer that has such nar.xs file, the the user becomes a peer of such nar.xz file

Is it more clear know?

this is because nar.xz files are content-addressed, but narinfo are not. ipfs is content-addressed and that's why it's possible with nar.xz but not possible with narinfos

kamadorueda on 9 Nov 2020

👍2

wouldn't ipns approach require to list all keys of binary cache for every cache update

No, you can do incremental updates. It is just a tree and you don't need to recompute unchanged subtrees. (Although currently the implementations that do this are not the best. I think we can use the go-ipfs mutable filesystem API as the scale of narinfos is small. However in the future we may need to implement something new, however that shouldn't be that hard).

kevincox on 9 Nov 2020

👍2

nix requests for the narinfo files go to the upstream binary cache always

However IIUC we need to proxy the request so that we can modify it to point the url field at the proxy. (Although I guess since most caches use relative URLs we don't actually change anything, but in theory we would need to for non-relative URLs).

nix requests for the nar.xz file go to the upstream binary cache OR another peer that has such nar.xs file, the the user becomes a peer of such nar.xz file

That makes sense. One thing to be aware of here is the timeout for when the file isn't on IPFS yet. This may result in more fetches but otherwise the user could be left there waiting forever.

kevincox on 9 Nov 2020

That makes sense. One thing to be aware of here is the timeout for when the file isn't on IPFS yet. This may result in more fetches but otherwise the user could be left there waiting forever.

Sure, this one is easy! thanks

kamadorueda on 9 Nov 2020

Yes, you'll want a short timeout on the IPFS lookup. If something doesn't exist, it can take a long time for IPFS to decide that by default - you can't really prove it doesn't exist, you just have to decide when to give up. Since you have a good fallback, the best user experience is to give up much more quickly than normal. However, if I understand correctly, fetching the file from cache.nixos.org still results in adding the file to IPFS for future users, right?

lordcirth on 10 Nov 2020

I just updated the document taking into account everything you guys said! The change has so many deltas that I think it's faster to read it all again

https://github.com/kamadorueda/nix-ipfs/blob/latest/README.md

Yes, you'll want a short timeout on the IPFS lookup. If something doesn't exist, it can take a long time for IPFS to decide that by default - you can't really prove it doesn't exist, you just have to decide when to give up. Since you have a good fallback, the best user experience is to give up much more quickly than normal. However, if I understand correctly, fetching the file from cache.nixos.org still results in adding the file to IPFS for future users, right?

yes, that's right! you may want to read this section (added a few minutes ago) https://github.com/kamadorueda/nix-ipfs/blob/latest/README.md#implementing-the-local-server

kamadorueda on 10 Nov 2020

We turn this FileHash into an IPFS CID by calling a remote translation service

I'm pretty sure the is no need for a translation service. You can just decode and re-encode the hash.

The only other nit is that you hardcode the assumption that nars live at nar/* which I don't think is required.

kevincox on 10 Nov 2020

We turn this FileHash into an IPFS CID by calling a remote translation service

I'm pretty sure the is no need for a translation service. You can just decode and re-encode the hash.

Man I did the math trying to translate the nix-sha256 into the IPFS CID and couldn't :(

I think I couldn't do it because the CID stores the hash of the merkle-whatever-techy-thing-composed-of-chunked-bits-with-metadata-and-raw-data-together instead of the nix-sha256 of the raw-data only

so nix_sha256_to_ipfs_cid(nix_sha256_hash_as_string) is not possible in terms of math operations ~~it's possible in terms of OS/network commands if we download the entire data in order to ipfs add it and get the merkle-whatever hash (but this defeats the purpose of the entire project)~~

If you have any idea on this tell us please! of course that translation service is something I'd prefer not to develop (and pay for) but seems needed until know

The only other nit is that you hardcode the assumption that nars live at nar/* which I don't think is required.

That's true, although nothing to worry for now I think. If we follow the URL field of the .narinfo everything would be ok

kamadorueda on 11 Nov 2020

We turn this FileHash into an IPFS CID by calling a remote translation service

I'm pretty sure the is no need for a translation service. You can just decode and re-encode the hash.

If it's not possible in terms of math-only (I wish I'm wrong) something really helpful that saves us the translation service would be having a new field for the IPFSCID in the .narinfo.

In such case I think nix-copy-closure should be modified to add: IPFSCID = $(ipfs add -q --only-hash <.nar.xz>) (this just hash, this stores nothing in the host)

kamadorueda on 11 Nov 2020

👍1

$ nix-hash --type sha256 --to-base16 17g1n8hxhq7h5h4jh0vy15pp6l1yyy1rg9mdq3pi60znnj53dzzz
ffff368ab4f60313efc0ada69783f73e50736f097e0328092cf060d821b2e19d

$ sha256sum 17g1n8hxhq7h5h4jh0vy15pp6l1yyy1rg9mdq3pi60znnj53dzzz.nar.xz 
ffff368ab4f60313efc0ada69783f73e50736f097e0328092cf060d821b2e19d  17g1n8hxhq7h5h4jh0vy15pp6l1yyy1rg9mdq3pi60znnj53dzzz.nar.xz

$ ipfs add -q 17g1n8hxhq7h5h4jh0vy15pp6l1yyy1rg9mdq3pi60znnj53dzzz.nar.xz
QmPW7pVJGdV4wkANRgZDmTnMiQvUrwy4EnQpVn4qHAdrTj

https://cid.ipfs.io/#QmPW7pVJGdV4wkANRgZDmTnMiQvUrwy4EnQpVn4qHAdrTj

base58btc - cidv0 - dag-pb - (sha2-256 : 256 : 1148914FBEEBDBB92D2DEC92697CFA76D7D36DA30339F84FCE76222941015BA2)

ipfs sha256: 1148914FBEEBDBB92D2DEC92697CFA76D7D36DA30339F84FCE76222941015BA2
nix  sha256: ffff368ab4f60313efc0ada69783f73e50736f097e0328092cf060d821b2e19d

ipfs hash is the hash of a data-structure composed of metadata and linked chunks, nix hash is just the hash of the raw content

kamadorueda on 11 Nov 2020

Ah shoot you are right. The file will at least have the proto wrapper. And it gets more complicated if the file is multiple blocks in size (which it probably is). I think I was confused but the IPFS git model because it has isomorphic hashes. However it appears that it doesn't really work, it just breaks for files larger than a block. I guess I'll sleep on it and see if there is something clever we can do.

In such case I think nix-copy-closure should be modified to add: IPFSCID = $(ipfs add -q --only-hash <.nar.xz>) (this just hash, this stores nothing in the host)

Of course this forces the chunking strategy to be the current default. It would probably be better to use variable length hashing. (This is probably something worth adding to the current design). But either way encoding the CID without actually pinning the file to IPFS or somehow indicating the chunking method will probably result in issues down the line.

kevincox on 11 Nov 2020

I do still hope my idea at the bottom, https://discuss.ipfs.io/t/git-on-ipfs-links-and-references/730/24, will work. It could work for nars too (modern IPFS underneath the hood cares more about the underlying multihash than the multicodec part of the CID).

Ericson2314 on 11 Nov 2020

🚀1

In such case I think nix-copy-closure should be modified to add: IPFSCID = $(ipfs add -q --only-hash <.nar.xz>) (this just hash, this stores nothing in the host)

Of course this forces the chunking strategy to be the current default.

this one can be specified

-s, --chunker string - Chunking algorithm, size-[bytes], rabin-[min]-[avg]-[max] or buzhash. Default: size-262144.

so maybe adding another field to the .narinfo: IPFSChunking = size=262144, could work

this way the ipfs add can be reproduced on any host, past or future

from a user perspective the ipfs get will work for any chunking strategy

It would probably be better to use variable length hashing. (This is probably something worth adding to the current design). But either way encoding the CID without actually pinning the file to IPFS or somehow indicating the chunking method will probably result in issues down the line.

Maybe, yes, someone who reads the .narinfo can be tempted to think the file is pinned/stored somewhere on the ipfs swarm and then discover it's not

at the end of the day I think this is kind of intended-behaviour, everyone knows data can be available, and then not! only data that people cares about remains over time

kamadorueda on 11 Nov 2020

so maybe adding another field to the .narinfo: IPFSChunking = size=262144, could work

Yeah, I think that would be a necessary addition if we are going to do that.

from a user perspective the ipfs get will work for any chunking strategy

Yes, but my understanding is that this proposal relies on users uploading the nar. And if they can't upload the nar and end up with the same hash no one will never be able to download it from IPFS.

at the end of the day I think this is kind of intended-behaviour, everyone knows data can be available, and then not! only data that people cares about remains over time

This is an okay guarantee if we want to keep the fallback forever. However it would be nice if this was a solution that could potentiality replace the current cache. (Of course an HTTP gateway would be provided for those not using IPFS natively)

I'm starting to wonder if this is the best approach. What about something like this:

Publish a feed of narinfo files published to cache.nixos.org
Write a service that consumes this feed and:
1. Implement an IPFS store that is backed by HTTP and use this to advertise the hash.
2. Publishes its own narinfo files that point at IPFS. (the URL field is modified to /ipfs/{hash})
3. For now this could be any sort of storage.
4. Eventually it would be nice to use a directory published in IPNS.

The obvious downside is that the service itself will use more bandwidth as it needs to upload the nar files (hopefully only occasionally). It also requires writing an IPFS Store HTTP backend that doesn't yet exist (AFAIK).

The upsides are:

The cache could transparently be made self-standing. By hosting the nar files directly we can remove s3 from the equation (if one day this becomes the most popular solution).
No extra software to run on the client. The client only needs to run an IPFS node. (Or use a public gateway, but probably bust to encourage running your own node)
Could be run "fully decentralized" with the IPNS directory. (Although we need to have someone publishing it)
The "uploader" is the one writing the hash so there is no concern with chunking. It can be changed at any time and be fine.

I think the thing I like about this is that it is simple to the user. It just looks like we are hosting a cache over IPFS. They don't need to worry about proxies and explicitly uploading files that they downloaded.

It is probably also worth pointing out that the long-term solution is probably to drop nar files altogether and just upload the directories themselves to IPFS. I think all of the proposals could work fine with this, you just need to add a field to the narinfo saying that it is a directory rather than an archive. However this would require much bigger client-side changes and would not be directly accessible over HTTP gateways. So I think that is a long way off.

kevincox on 11 Nov 2020

I think chunking should be set to Rabin; if the majority of these packages are going to be uploaded by this implementation anyway, there is little downside to being non-standard. Rabin is more advanced and should save on incremental update sizes. Though maybe that doesn't apply to compressed files?

lordcirth on 11 Nov 2020

👍1

It depends on how you do the compression. gzip has the --rsyncable option but the xz cli doesn't appear to have it. Basically if you reset the compression stream every once and a while (ideally using some rolling hash yourself) the hashes can sync up.

Of course the ideal IPFS chunking would just reuse the compression chunking but I don't think that is supported by the current implementation (and would need to be custom for each compression format).

So I guess the answer is that as of today it probably won't help much.

kevincox on 12 Nov 2020

👍1

You could always do the https://bup.github.io/ thing and store the explicitly chunked into files compressed version in IPFS, especially if the chunks are small enough you know they won't be separately chunked by IPFS itself.

ohAitch on 12 Nov 2020

I guess if there is no big win, then it is best to stick with IPFS defaults. Incremental dedup between versions isn't the main reason we want IPFS anyway.

lordcirth on 12 Nov 2020

Guys let me introduce you to a beta of CachIPFS, an encrypted (and private) read/write Nix binary cache over IPFS

https://4shells.com/docs#about-cachipfs

Sure it has a lot of rough edges, but it is an MVP that works, so let me know your thoughts!

Demo.sh
Demo GIF

kamadorueda on 26 Dec 2020

🚀6

@kamadorueda I tried CachIPFS twice. The first run took a long time (which I guess is normal) but the second run still took over 11 hours. Is that normal? I was publishing my nix-config.

I wasn't running 4s cachipfs daemon since in the demo it looks like it's only used for fetching.

bbigras on 30 Dec 2020

I think having faster second and later executions is a must

The current algorithm is very naive:

create a temporary directory
do nix copy the /nix/store/path you want to publish to the directory
encrypt the files
add them to IPFS
publish them to the CachIPFS coordinator

But naive is slow

We'll definitely improve it, thanks for the feedback @bbigras, I'm taking note! 😄

Does someone have a user case beyond publishing/retrieving from a private cache? we'd love to hear it

kamadorueda on 30 Dec 2020

👍1

Does someone have a user case beyond publishing/retrieving from a private cache? we'd love to hear it

If both a stranger and me publish our nix-config with CachIPFS. Could both caches by used by the 2 of us?

Maybe another similar use case would be, if a lot of people are using CachIPFS, and they are all using the same channel (let say unstable). It could be nice and efficient to all share the same stuff.

If the file are trustable. I didn't read everything on CachIPFS yet.

bbigras on 31 Dec 2020

If both a stranger and me publish our nix-config with CachIPFS. Could both caches by used by the 2 of us?

Yes! As long as both of you use the same CachIPFS API token

Every account has associated:

an api token
a secret key for encryption/decryption
a private binary cache that can be accessed with the api token and encrypted/decrypted with the secret key

We'll have the ability to rotate those secrets soon. If many machines (you + your friend) use the same api token, they use the same encryption keys, and upload/retrieve from the same private binary cache

This is the private layer of CachIPFS, it requires trust but this is actually a feature (we don't want untrusted people to read/modify our data)

Maybe another similar use case would be, if a lot of people are using CachIPFS, and they are all using the same channel (let say unstable). It could be nice and efficient to all share the same stuff.

If the file are trustable. I didn't read everything on CachIPFS yet.

I'm thinking on this one, this would be the CachIPFS public layer in which all Nix users share the binary cache with all Nix users. This creates a distributed binary cache over IPFS (this is my dream and purpose)

The problem is security, An attacker can place a virus under /nix/store/gqm07as49jn3gqmxlxrgpnqhzmm18374-gcc-9.3.0 and upload it to the binary cache. If someone else requires gcc, (s)he downloads the virus instead of gcc. This is why trust is very important, you only want to fetch data from people you can trust (not hackers)

But trust can be negotiated in many ways:

users can decide which users to share data with. Sounds like the CachIPFS private layer? well, it is
with algorithms: https://www.tweag.io/blog/2020-12-16-trustix-announcement/ which is not very different from CachIPFS private layer but we find this solution very cool, we just need to find a way in which we can guarantee to the network 100% confidence they are downloading legit nix store paths (I take security very seriously)
by implementing cryptographic protocols that make trust unnecessary, like git https://blog.ipfs.io/2020-09-08-nix-ipfs-milestone-1/ this is by far the ideal solution, but the implementation is hard and there may be many months/years until we can offer a good user experience to the community, also it may not be possible to have 100% content-addressability in all cases (https://www.tweag.io/blog/2020-11-18-nix-cas-self-references/) and those cases will require trust anyway

This is a very exciting topic, we are thinking about it every day

In the big picture, CachIPFS can be defined as a let's-implement-something-useful-with-the-things-we-have-today

kamadorueda on 2 Jan 2021

👍4

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Nix Installation Failure on "Bash on Ubuntu on Windows"

chexxor · 4Comments

Outputting '${ from indented strings

ericsagnes · 4Comments

UX issue with nix-shell: No clean way to preserve my $PS1

drewm1980 · 3Comments

[macOS] NIX_SSL_CERT_FILE: unbound variable

bflyblue · 3Comments

nixUnstable doesn't build with gcc10

luc65r · 3Comments