Nixpkgs: `buildRustPackage` needs improving or replacing - abuse of fixed-output derivations, etc.

Created on 5 Jun 2020 · 12 Comments · Source: NixOS/nixpkgs

Current issues with rustPlatform.buildRustPackage:

Candidates:


Comment below changes/additions that can be made to this issue, or edit this issue directly

bug rust

All 12 comments

Should cargo test always run against debug, or against the value of buildType?

Fixed in https://github.com/NixOS/nixpkgs/pull/82342. (Currently only on staging though).

Other alternative: https://github.com/edolstra/import-cargo

This seems nice! Is this just a POC or intended to be used at some point? :)

At this point import-cargo is not usable in nixpkgs. It uses builtins.fetchGit, which has an impure dependency on git during evaluation. We don't allow builtin fetchers during evaluation. Also, the code is currently delivered as a Nix flake, and flakes are considered experimental.

CC @kolloch since he may not have seen this issue.

Well-known points, but not stated in this issue yet:

Approaches using cargo's vendoring support (buildRustPackage, import-cargo, etc.) rebuild a lot of dependent crates between different Rust derivations, since compiled crates are not realized as separate output paths. Moreover, if the vendored dependencies are not built in a separate step, then changes to the derivation will also rebuild all the dependencies. (This is solved by naersk.)

Approaches using buildRustCrate solve these problems, but at the cost of having to generate a Nix representation of the Cargo metadata of the crate and its dependencies. While it is easy to use buildRustCrate directly with fromTOML on Cargo.lock and Cargo.toml for really trivial crates and dependencies, this does not work in the general case, because e.g. features for dependent crates are only defined in their Cargo.toml files, which are not available during evaluation.

My question: is it acceptable to include derivations based on crate2nix in nixpkgs? The downside is that the generated Cargo.nix files are large. But if crate2nix were to be accepted as a standard Rust packaging method in nixpkgs, the common parts of Cargo.nix could be imported to nixpkgs, and what would remain is effectively the Nix representation of cargo metadata + checksums.

I had a meeting with @kolloch in the past, and we discussed the best way to implement this. The idea is to share one Cargo.nix between all crates that we want to build as applications, to keep the expression size under control. We discussed this here: https://github.com/kolloch/crate2nix/issues/102
It's not ready yet, unfortunately.

node creates large diffs with deps.nix, which I'm assuming could be shared across many node projects. However, it is concerning having to merge PRs with 3000+ line diffs frequently.

For the specific use case of cargo vendor shas, I think it is perfectly acceptable to have the current "abuse" of fixed-output derivations. The Cargo.lock already contains a lot of logic around verifying the contents of each package used, and is likely to be very reproducible. I think the main argument is that each rust package can't reuse individual packages.

I had a meeting with @kolloch in the past, and we discussed the best way to implement this. The idea is to share one Cargo.nix between all crates that we want to build as applications, to keep the expression size under control.

One issue I can see with cargo as well is that it allows many versions of the same package to be used by a single build (dependencies may require different versions). This may not be much of an issue with "stable" packages, but it could be an issue if many beta/unstable packages create a need to keep tens of versions of the same package around.

node creates large diffs with deps.nix, which I'm assuming could be shared across many node projects. However, it is concerning having to merge PRs with 3000+ line diffs frequently.

OTOH, npm-builds aren't as reproducible as you might think: https://github.com/NixOS/nixpkgs/pull/76618 (and all the previous hash-fixes).

Doesn't yarn have content hashes in its lock-file? We could use those as well to fetch dependencies we analyze at eval-time (thus no need for big lock-files) and then build a node_modules/ directory.

For the specific use case of cargo vendor shas, I think it is perfectly acceptable to have the current "abuse" of fixed-output derivations. The Cargo.lock already contains a lot of logic around verifying the contents of each package used, and is likely to be very reproducible. I think the main argument is that each rust package can't reuse individual packages.

Sooner or later we may want to use something like edolstra/import-cargo. As you correctly pointed out, we can use the content-hashes from Cargo.lock for that.

One issue I can see with cargo as well is that it allows many versions of the same package to be used by a single build (dependencies may require different versions). This may not be much of an issue with "stable" packages, but it could be an issue if many beta/unstable packages create a need to keep tens of versions of the same package around.

Why is that a problem? If we recreate a vendor-directory with the same semantics cargo does, it should be fine, right?

node creates large diffs with deps.nix, which I'm assuming could be shared across many node projects. However, it is concerning having to merge PRs with 3000+ line diffs frequently.
[...]
I think the main argument is that each rust package can't reuse individual packages

I am getting a lot of mileage out of that for personal/work projects, since they share a lot of dependencies at the same minor versions. Builds are generally much faster than with buildRustPackage. If packages use a large number of different minor versions, then the benefits of sharing dependencies obviously go away.

Nearly all Rust crates adhere to semver, and API breakages are generally considered to be bugs. I think this is because Cargo ignores lock files in library projects. Changing an API in violation of semver breaks a lot of downstream library crates, so such changes are caught quickly.

I think it is at least worth experimenting (in a crate2nix/buildRustCrate approach) with only using the latest version within semver for all crates. This would have several benefits:

  • Hugely increase the amount of sharing between derivations.
  • Reduce the sizes of diffs in a shared Cargo.nix file.
  • Reduce eval time of full nixpkgs.

Given that Cargo doesn't lock down dependencies in library crates, I think this will break far less than e.g. Python dependencies. AFAIR, this is also what the Guix folks do.

That said, I still have some worries:

  • Large diffs in a shared Nix file can quickly lead to conflicts between PRs.
  • I wonder what having a gazillion crates as real Nix expressions will do to eval time of all of nixpkgs.
  • buildRustCrate generally leads to more complex derivations. E.g. when you have dependencies that need native libraries, you need to override individual crates, whereas with buildRustPackage you can just stash everything in the main derivation's (native)?BuildInputs.

Edit: after reading @Ma27's comment, I realized that I may have incorrectly read "can't reuse individual packages" as "can't reuse output paths of built crate derivations".

Why is that a problem? If we recreate a vendor-directory with the same semantics cargo does, it should be fine, right?

Just having a lot of packages with relatively minor changes. I guess this is not much of a point, given that the current paradigm exhibits the same behavior.

Just having a lot of packages with relatively minor changes. I guess this is not much of a point, given that the current paradigm exhibits the same behavior.

Well, this is something the upstream developers have to take care of. We shouldn't mess up a package's dependency graph just because we don't like it (it's fine though if we really have to do it, e.g. to apply CVE-relevant patches).

One issue I can see with cargo as well, is that it allows for many versions of the same package to be used by a single build (dependencies may require different versions).

Only semver-incompatible versions. This is a necessity, otherwise the dependencies wouldn't build due to API changes. For semver-compatible versions in transitive dependencies, Cargo will use the same version for all transitive dependencies (typically the latest version permitted by semver, unless dependencies have more specific constraints).
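The rule described above can be sketched roughly as follows. This is a simplified model in Python, not Cargo's actual resolver: it assumes plain `MAJOR.MINOR.PATCH` versions and default caret requirements, and it ignores any more specific constraints. The function names and the list of requested versions are hypothetical.

```python
from collections import defaultdict


def parse(version: str) -> tuple[int, int, int]:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def compat_key(version: str) -> tuple[int, ...]:
    """Versions sharing a key are semver-compatible under the caret rule."""
    major, minor, patch = parse(version)
    if major > 0:
        return (major,)        # 1.x.y: compatible within the same major
    if minor > 0:
        return (0, minor)      # 0.x.y: compatible within the same minor
    return (0, 0, patch)       # 0.0.z: each patch release stands alone


def unify(requested: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Keep one version (the highest) per crate and compatibility range."""
    best: dict[tuple[str, tuple[int, ...]], str] = {}
    for name, version in requested:
        key = (name, compat_key(version))
        if key not in best or parse(version) > parse(best[key]):
            best[key] = version
    kept = defaultdict(list)
    for (name, _), version in best.items():
        kept[name].append(version)
    return {name: sorted(versions, key=parse) for name, versions in kept.items()}


# Hypothetical versions requested across one dependency graph.
requested = [
    ("rand", "0.6.5"),
    ("rand", "0.7.2"),
    ("rand", "0.7.3"),
    ("serde", "1.0.104"),
    ("serde", "1.0.110"),
]
print(unify(requested))
```

With this input the sketch prints `{'rand': ['0.6.5', '0.7.3'], 'serde': ['1.0.110']}`: the two compatible `rand` 0.7 releases collapse into one, while the incompatible 0.6 release stays as a second copy.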

For semver-compatible versions in transitive dependencies, Cargo will use the same version for all transitive dependencies (typically the latest version permitted by semver, unless dependencies have more specific constraints).

I guess I'm just jaded by the python ecosystem (and even more so by the machine learning ecosystem) which has no set standard and is subject to breakages all the time.
