Nixpkgs: python3Packages.tensorflow build is broken on master

Created on 21 Oct 2019 · 28 comments · Source: NixOS/nixpkgs

Describe the bug
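The build fails during Bazel's analysis phase. The vendored jpeg BUILD file still uses the nocopts attribute, which was removed in the Bazel 1.0 that master now ships: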

Analyzing: 2 targets (358 packages loaded, 7386 targets configured)
    currently loading: tensorflow/core/platform/cloud
ERROR: /build/output/external/jpeg/BUILD.bazel:220:15: in nocopts attribute of cc_library rule @jpeg//:simd_x86_64: This attribute was removed. See https://github.com/bazelbuild/bazel/issues/8706 for details.
Analyzing: 2 targets (359 packages loaded, 7419 targets configured)
    currently loading: tensorflow/core/platform/cloud
ERROR: /build/output/external/jpeg/BUILD.bazel:220:15: in nocopts attribute of cc_library rule @jpeg//:simd_x86_64: This attribute was removed. See https://github.com/bazelbuild/bazel/issues/8706 for details.
Analyzing: 2 targets (359 packages loaded, 7419 targets configured)
    currently loading: tensorflow/core/platform/cloud
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: Analysis of target '@jpeg//:simd_x86_64' failed; build aborted
Analyzing: 2 targets (360 packages loaded, 7419 targets configured)
INFO: Elapsed time: 7.017s
Analyzing: 2 targets (360 packages loaded, 7419 targets configured)
INFO: 0 processes.
Analyzing: 2 targets (360 packages loaded, 7419 targets configured)
FAILED: Build did NOT complete successfully (360 packages loaded, 7419 targets configured)

To Reproduce
nix-build -A python3Packages.tensorflow

Labels: bug, python

All 28 comments

@timokau @flokli should we add back the previous bazel toolchain? I don't see a problem with having the 1.0 version alongside the previous one

That might be possible, but I'm not sure how the different parts of the bazel toolchain interact, how you'd get buildBazelPackage to use the right version etc. Someone more involved with bazel would need to do that.

well, I was thinking of going back to the commit where it was bumped to 1.0, freezing the previous toolchain and aliasing it to bazel_0_x

Yes but we'd have to at least duplicate buildBazelPackage as well to make it use bazel_0_x. Or maybe make it possible to override the bazel that is used. That might come in handy in the future too, if they keep breaking backwards compatibility (although hopefully they don't now that we have 1.0).

there could always be a 2.0...
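A rough sketch of the "make it possible to override the bazel that is used" idea, written as an overlay. It assumes that buildBazelPackage is produced by callPackage with bazel as one of its arguments (so .override can swap it; worth checking in your nixpkgs revision), and that tensorflow accepts a buildBazelPackage argument, as the python37 overlay later in this thread relies on. The attribute name buildBazelPackage_0_x is hypothetical, the pinned revision is the Bazel 0.29 one used elsewhere in the thread, and whether the current builder still works with Bazel 0.29 is untested here.

```nix
# Sketch only: assumes buildBazelPackage is built with callPackage and takes
# `bazel` as an argument, so `.override { bazel = ...; }` can swap it out.
self: super:
let
  # nixpkgs revision that still ships Bazel 0.29 (the one used in this thread).
  # `overlays = [ ]` stops the pinned tree from loading the user's overlays.
  oldPkgs = import (builtins.fetchTarball {
    url = "https://github.com/nixos/nixpkgs/archive/b8ab4451b53e3f8282ff3cbd8dde763e8b5e1d28.tar.gz";
    sha256 = "0cw3ff9n0zfc1k11sp8cm12g6b7x90yl23llhp8b78017x8lq71p";
  }) { overlays = [ ]; };
in {
  # Hypothetical attribute: the current builder, but driven by the old Bazel.
  buildBazelPackage_0_x = super.buildBazelPackage.override { bazel = oldPkgs.bazel; };

  # Only tensorflow is switched over; every other package keeps Bazel 1.0.
  python37 = super.python37.override {
    packageOverrides = pself: psuper: {
      tensorflow = psuper.tensorflow.override {
        buildBazelPackage = self.buildBazelPackage_0_x;
      };
    };
  };
}
```

The design choice here is to swap only the bazel used by one builder rather than replacing bazel globally, so packages that already build with 1.0 are not rebuilt.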

No opinion from myself: I'm only here to add my experience reports :)

anyone know a workaround to get an older version of tensorflow working? Seems like https://nixos.wiki/wiki/FAQ/Pinning_Nixpkgs should be relevant but not sure how to install a single package this way.

you could probably do an overlay with the older bazel version, and then I assume tensorflow would build just fine

@jonringer would you kindly share an example? I've been trying to do this for a week in my Nixpkgs and only got it to work by building python.withPackages from nixos-19.09. Couldn't figure out anything else.

let
  oldBazelPkgs = import (builtins.fetchTarball {
    name = "nixos-old-bazel";
    url = "https://github.com/nixos/nixpkgs/archive/<commit with old bazel>.tar.gz";
    # Hash obtained using `nix-prefetch-url --unpack <url>`
    sha256 = "<some hash>";
  }) {};
in
self: super: {
  inherit (oldBazelPkgs) bazel bazel-buildtools bazel-deps bazel-remote bazel-watcher;
}

I haven't tested this, but I don't see why it wouldn't work
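For the other workaround mentioned earlier in the thread, taking the Python environment from the 19.09 release instead of master, a minimal shell.nix might look like this. It is a sketch: the release-19.09 branch tarball URL is an assumption (pin a specific commit and sha256 for reproducibility), and it relies on the report above that tensorflow still builds there.

```nix
# shell.nix (sketch): interpreter and tensorflow both come from release-19.09,
# so nothing is built with the Bazel 1.0 from master.
let
  # Branch tarballs move over time; for a reproducible pin, use a commit URL
  # plus a sha256 from `nix-prefetch-url --unpack <url>`.
  stablePkgs = import (builtins.fetchTarball
    "https://github.com/nixos/nixpkgs/archive/release-19.09.tar.gz") { };

  # Python environment containing tensorflow from the 19.09 package set.
  pythonEnv = stablePkgs.python3.withPackages (ps: [ ps.tensorflow ]);
in
stablePkgs.mkShell {
  buildInputs = [ pythonEnv ];
}
```

Unlike the overlay approach, this does not touch the rest of your package set; the trade-off is that every Python package in the shell comes from 19.09.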

To throw a bone to those who are struggling:

shell.nix

let
  oldBazelPkgs = import (builtins.fetchTarball {
    name = "bazel-0.29-nixpkgs";
    url = https://github.com/nixos/nixpkgs/archive/b8ab4451b53e3f8282ff3cbd8dde763e8b5e1d28.tar.gz;
    # Hash obtained using `nix-prefetch-url --unpack <url>`
    sha256 = "0cw3ff9n0zfc1k11sp8cm12g6b7x90yl23llhp8b78017x8lq71p";
  }) {};
  bazelOverlay = self: super: {
    inherit (oldBazelPkgs) bazel bazel-buildtools bazel-deps bazel-remote bazel-watcher;
  };
  pkgs = import <nixpkgs> { overlays = [ bazelOverlay ];};
....
# rest of shell script

Global overlay (e.g. ~/.config/nixpkgs/overlays/bazel.nix):

let
  oldBazelPkgs = import (builtins.fetchTarball {
    name = "bazel-0.29-nixpkgs";
    url = https://github.com/nixos/nixpkgs/archive/b8ab4451b53e3f8282ff3cbd8dde763e8b5e1d28.tar.gz;
    # Hash obtained using `nix-prefetch-url --unpack <url>`
    sha256 = "0cw3ff9n0zfc1k11sp8cm12g6b7x90yl23llhp8b78017x8lq71p";
  }) {};
in self: super: {
  inherit (oldBazelPkgs) bazel bazel-buildtools bazel-deps bazel-remote bazel-watcher;
}

this will cause a rebuild of tensorflow and all dependent packages, but you will have tensorflow again.

In my case, I only wanted a jupyter notebook service and went with running the tensorflow container:

@jonringer for the latter overlay, I get "infinite recursion encountered". I have it in ~/.config/nixpkgs/overlays.nix (wrapped in parentheses); not sure if that makes a difference?

you should be able to pass --show-trace to see what was being evaluated; I could have missed a dependency

@jonringer here's the output:

โฏ nix-shell --show-trace -I nixpkgs=/Computer/nixpkgs -p 'python3.buildEnv.override { extraLibs = [ python3Packages.tensorflow ]; }'
error: while evaluating the attribute 'buildInputs' of the derivation 'shell' at /Computer/nixpkgs/pkgs/build-support/trivial-builders.nix:7:14:
while evaluating the attribute 'passAsFile' of the derivation 'python3-3.7.5-env' at /Computer/nixpkgs/pkgs/build-support/trivial-builders.nix:7:14:
while evaluating 'requiredPythonModules' at /Computer/nixpkgs/pkgs/top-level/python-packages.nix:64:27, called from /Computer/nixpkgs/pkgs/development/interpreters/python/wrapper.nix:15:13:
while evaluating 'unique' at /Computer/nixpkgs/lib/lists.nix:643:12, called from /Computer/nixpkgs/pkgs/top-level/python-packages.nix:66:6:
while evaluating 'unique' at /Computer/nixpkgs/lib/lists.nix:643:12, called from /Computer/nixpkgs/lib/lists.nix:649:17:
while evaluating anonymous function at /Computer/nixpkgs/lib/lists.nix:152:16, called from undefined position:
while evaluating the attribute 'outPath' at /Computer/nixpkgs/lib/customisation.nix:159:7:
while evaluating the attribute 'src' of the derivation 'python3.7-tensorflow-gpu-1.14.0' at /Computer/nixpkgs/pkgs/development/interpreters/python/mk-python-derivation.nix:102:3:
while evaluating the attribute 'deps' of the derivation 'tensorflow-gpu-1.14.0' at /Computer/nixpkgs/pkgs/build-support/build-bazel-package/default.nix:15:10:
while evaluating the attribute 'nativeBuildInputs' of the derivation 'tensorflow-gpu-1.14.0-deps' at /Computer/nixpkgs/pkgs/build-support/build-bazel-package/default.nix:18:5:
while evaluating 'getOutput' at /Computer/nixpkgs/lib/attrsets.nix:464:23, called from undefined position:
while evaluating anonymous function at /Computer/nixpkgs/pkgs/stdenv/generic/make-derivation.nix:133:17, called from undefined position:
while evaluating the attribute 'bazel' at /home/tyler/.config/nixpkgs/overlays/bazel.nix:9:25:
while evaluating the attribute 'bazel' at /home/tyler/.config/nixpkgs/overlays/bazel.nix:9:25:
infinite recursion encountered, at /home/tyler/.config/nixpkgs/overlays/bazel.nix:9:25

Edit: if I add buildBazelPackage to the other attributes inherited by the overlay, a similar issue happens:

error: while evaluating the attribute 'buildInputs' of the derivation 'shell' at /Computer/nixpkgs/pkgs/build-support/trivial-builders.nix:7:14:
while evaluating the attribute 'passAsFile' of the derivation 'python3-3.7.5-env' at /Computer/nixpkgs/pkgs/build-support/trivial-builders.nix:7:14:
while evaluating 'requiredPythonModules' at /Computer/nixpkgs/pkgs/top-level/python-packages.nix:64:27, called from /Computer/nixpkgs/pkgs/development/interpreters/python/wrapper.nix:15:13:
while evaluating 'unique' at /Computer/nixpkgs/lib/lists.nix:643:12, called from /Computer/nixpkgs/pkgs/top-level/python-packages.nix:66:6:
while evaluating 'unique' at /Computer/nixpkgs/lib/lists.nix:643:12, called from /Computer/nixpkgs/lib/lists.nix:649:17:
while evaluating anonymous function at /Computer/nixpkgs/lib/lists.nix:152:16, called from undefined position:
while evaluating the attribute 'outPath' at /Computer/nixpkgs/lib/customisation.nix:159:7:
while evaluating 'assertValidity' at /Computer/nixpkgs/pkgs/stdenv/generic/check-meta.nix:247:20, called from /Computer/nixpkgs/pkgs/stdenv/generic/make-derivation.nix:278:18:
while evaluating 'checkValidity' at /Computer/nixpkgs/pkgs/stdenv/generic/check-meta.nix:228:19, called from /Computer/nixpkgs/pkgs/stdenv/generic/check-meta.nix:248:18:
while evaluating anonymous function at /Computer/nixpkgs/pkgs/stdenv/generic/check-meta.nix:41:56, called from /Computer/nixpkgs/pkgs/stdenv/generic/check-meta.nix:231:13:
while evaluating 'hasLicense' at /Computer/nixpkgs/pkgs/stdenv/generic/check-meta.nix:35:16, called from /Computer/nixpkgs/pkgs/stdenv/generic/check-meta.nix:42:5:
while evaluating the attribute 'buildBazelPackage' at /home/tyler/.config/nixpkgs/overlays/bazel.nix:9:24:
while evaluating the attribute 'buildBazelPackage' at /home/tyler/.config/nixpkgs/overlays/bazel.nix:9:24:
infinite recursion encountered, at /home/tyler/.config/nixpkgs/overlays/bazel.nix:9:24

This seems to be an error in how the overlays directory gets folded over. Doing

$ nix-shell -p "with import ./. { overlays = [ (import /home/jon/bazelOverlay.nix) ];}; python3Packages.tensorflow"

builds just fine.

it seems that overlays specified in the overlays directory get applied twice
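A possible explanation for the recursion, offered as an assumption rather than something confirmed in this thread: calling import on a nixpkgs tree with {} and no overlays argument makes that tree read ~/.config/nixpkgs/overlays itself. The pinned tree therefore loads bazel.nix again, and oldBazelPkgs.bazel ends up depending on itself, which matches the trace pointing at bazel.nix:9:25. The shell.nix variant and the explicit overlays = [ ... ] test above presumably avoid this because the overlay file is not in the overlays directory. If that is the cause, passing an empty overlay list to the pinned import should break the cycle:

```nix
# ~/.config/nixpkgs/overlays/bazel.nix (sketch)
let
  oldBazelPkgs = import (builtins.fetchTarball {
    name = "bazel-0.29-nixpkgs";
    url = "https://github.com/nixos/nixpkgs/archive/b8ab4451b53e3f8282ff3cbd8dde763e8b5e1d28.tar.gz";
    # Hash obtained using `nix-prefetch-url --unpack <url>`
    sha256 = "0cw3ff9n0zfc1k11sp8cm12g6b7x90yl23llhp8b78017x8lq71p";
  }) {
    # Keep the pinned tree from loading ~/.config/nixpkgs/overlays,
    # which would include this very file.
    overlays = [ ];
  };
in self: super: {
  inherit (oldBazelPkgs) bazel bazel-buildtools bazel-deps bazel-remote bazel-watcher;
}
```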

The following also worked for me in .config/nixpkgs/overlays/python.nix:

self: super:
let
  bazelSrc = super.fetchFromGitHub {
    owner  = "nixos";
    repo   = "nixpkgs";
    # nixpkgs revision that still ships Bazel 0.29
    rev    = "b8ab4451b53e3f8282ff3cbd8dde763e8b5e1d28";
    sha256 = "0cw3ff9n0zfc1k11sp8cm12g6b7x90yl23llhp8b78017x8lq71p";
  };
  oldPkgs = import bazelSrc {};
  # buildBazelPackage from the old tree, which uses the old Bazel
  oldBuildBazelPackage = oldPkgs.buildBazelPackage;
in
{
  # Build tensorflow (with CUDA) using the old Bazel builder
  pythonOverrides = python-self: python-super: {
    tensorflow = python-super.tensorflow.override { cudaSupport = true; buildBazelPackage = oldBuildBazelPackage; };
  };
  # Apply the Python overrides to python37
  python37 = super.python37.override { packageOverrides = self.pythonOverrides; };
}

I'm pretty sure that there is something wrong with the opengl-driver runpath (for libcuda): https://github.com/NixOS/nixpkgs/commit/1c429acbffc650d960ec64014e9af92c8f2577f3#diff-02b80aedf2ac4a1f97ca849034b2e240
Couldn't really find it in any of the outputs of readelf -d

cc @ambrop72

I'll try this and check the runpaths.
By the way, why does the package put nvidia_x11 into buildInputs? That seems wrong to me. No package should depend on nvidia_x11; it's up to users to select which version of the driver to use in the system configuration.

I'm assuming it's for cuda, they rely on shared libraries that are built as part of nvidia_x11 package

> I'm assuming it's for cuda, they rely on shared libraries that are built as part of nvidia_x11 package

For building, cudatoolkit provides some stubs to satisfy the linker during build time, so nvidia_x11 should really not be a buildInput.

oh, then maybe it could be improved. I'm not super familiar with the cuda toolchain

I'm not very familiar with it either, but anybody who knows more about it should feel free to file a PR and ping me :)

> I'm pretty sure that there is something wrong with the opengl-driver runpath (for libcuda): 1c429ac#diff-02b80aedf2ac4a1f97ca849034b2e240
> Couldn't really find it in any of the outputs of readelf -d

OK, sorry about the noise: libtensorflow-bin does contain /run/opengl-driver/lib in its runpath when correctly built with cudaSupport. However, the nvidia_x11 dependency is probably not supposed to be there.
