On 18.03-Impala the package pythonPackages.tensorflow fails to build when using pythonPackages, python3Packages or python36Packages (didn't work on my machine and the machine of @fpletz).
The error message looks like this:
$ nix-build -A pythonPackages.tensorflow
...
____Loading package: tensorflow/tools/pip_package
____Loading package: @bazel_tools//tools/cpp
____Loading package: @bazel_tools//tools/jdk
____Loading package: @local_config_xcode//
____Loading package: @local_jdk//
____Loading package: @local_config_cc//
____Loading complete. Analyzing...
____Loading package: tensorflow/python/tools
____Loading package: @nccl_archive//
____Loading package: tensorflow/python
ERROR: /tmp/nix-build-python2.7-tensorflow-1.3.1.drv-0/source/tensorflow/tools/pip_package/BUILD:100:1: no such package '@zlib_archive//': BUILD file not found on package path and referenced by '//tensorflow/tools/pip_package:licenses'.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted.
____Elapsed time: 4.076s
builder for '/nix/store/1chcdjqqnljpwd8xznwf7pql02s2h38x-python2.7-tensorflow-1.3.1.drv' failed with exit code 1
error: build of '/nix/store/1chcdjqqnljpwd8xznwf7pql02s2h38x-python2.7-tensorflow-1.3.1.drv' failed
$ nix-build -A python3Packages.tensorflow
...
____Loading package: tensorflow/tools/pip_package
____Loading package: @bazel_tools//tools/cpp
____Loading package: @bazel_tools//tools/jdk
____Loading package: @local_config_xcode//
____Loading package: @local_jdk//
____Loading package: @local_config_cc//
____Loading complete. Analyzing...
____Loading package: tensorflow/contrib/slim
____Loading package: tensorflow/python
____Loading package: tensorflow/contrib/tensor_forest
____Loading package: @grpc//
____Loading package: tensorflow/contrib/timeseries
ERROR: /tmp/nix-build-python3.6-tensorflow-1.3.1.drv-0/source/tensorflow/tools/pip_package/BUILD:100:1: no such package '@png_archive//': BUILD file not found on package path and referenced by '//tensorflow/tools/pip_package:licenses'.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted.
____Elapsed time: 4.044s
builder for '/nix/store/2hqd8afh58mvr72301apqh5yv6x57bql-python3.6-tensorflow-1.3.1.drv' failed with exit code 1
error: build of '/nix/store/2hqd8afh58mvr72301apqh5yv6x57bql-python3.6-tensorflow-1.3.1.drv' failed
The first commit which affects tensorflow and is not incorporated in 17.09 is https://github.com/NixOS/nixpkgs/commit/1f2a18d9163f75c1001a04157f195557b0c24f8a#diff-4c48ecaa454daa000c372b8b2ca7cfbe by @abbradar.
However this might be a transitive issue caused by Bazel.
Run nix-build -A pythonPackages.tensorflow or nix-build -A python3Packages.tensorflow on master.
When pinning NixOS to 17.09 it works perfectly fine.
note: I'm just trying to "learn" tensorflow, so I'm definitely not an expert about this...
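For anyone who wants to compare against 17.09 without switching channels, pinning could look roughly like the sketch below. The tarball URL (the release-17.09 branch of nixpkgs) and the file name pinned.nix are assumptions for illustration, not what was actually used above; a fixed commit plus sha256 would be more reproducible.

```nix
# pinned.nix (hypothetical file name)
let
  # Assumption: the release-17.09 branch tarball of NixOS/nixpkgs.
  pinned = import (builtins.fetchTarball
    "https://github.com/NixOS/nixpkgs/archive/release-17.09.tar.gz") { };
in
  # Same attribute that fails on master, taken from the pinned package set.
  pinned.pythonPackages.tensorflow
```

Building it with nix-build pinned.nix should then use the 17.09 expression rather than the one from the current channel.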
Hm, I guess this is because of sandboxing. Can you try enabling it and see if it works?
Unfortunately not:
$ nix run nixpkgs.pythonPackages.tensorflow --sandbox
builder for '/nix/store/ikrfvs0blbi9ibvw7y1s0nkv3l7ikbcr-python2.7-tensorflow-1.3.1.drv' failed with exit code 1; last 10 log lines:
____Loading package: tensorflow/contrib/boosted_trees
____Loading package: tensorflow/contrib/cluster_resolver
____Loading package: tensorflow/python/saved_model
____Loading package: tensorflow/contrib/signal
____Loading package: tensorflow/core
____Loading package: @protobuf//
____Loading package: third_party/hadoop
ERROR: /tmp/nix-build-python2.7-tensorflow-1.3.1.drv-0/source/tensorflow/tools/pip_package/BUILD:100:1: no such package '@lmdb//': BUILD file not found on package path and referenced by '//tensorflow/tools/pip_package:licenses'.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted.
____Elapsed time: 4.051s
[0 built (1 failed), 0.0 MiB DL]
error: build of '/nix/store/ikrfvs0blbi9ibvw7y1s0nkv3l7ikbcr-python2.7-tensorflow-1.3.1.drv' failed
I can reproduce this on master; strange, but most likely something else has changed that broke the build. Either way, I have a 1.4 update ready that builds for me.
if 1.4 builds I think we don't need any further investigation here, do we?
Let's leave this open until 1.4 lands.
In case this needs fixed prior to 1.4, c3255fe8ec326d2c8fe9462d49ed83aa64d3e68f appears to be the commit that breaks this, though there seem to be some glibc 2.26 issues too (at least with CUDA).
@abbradar what's needed to merge your tensorflow branch?
Successfully built it without cuda support with this patch on top of your tensorflow-new branch:
From 1eb8b76515e50aad1c7fbc3690f75df19567d418 Mon Sep 17 00:00:00 2001
From: Robin Gloster <[email protected]>
Date: Thu, 7 Dec 2017 19:45:17 +0100
Subject: [PATCH] tensorflow: correctly optionalize cuda + fix deps hash
---
pkgs/development/python-modules/tensorflow/default.nix | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/pkgs/development/python-modules/tensorflow/default.nix b/pkgs/development/python-modules/tensorflow/default.nix
index 8b59916f009..fcf6fbfe6e6 100644
--- a/pkgs/development/python-modules/tensorflow/default.nix
+++ b/pkgs/development/python-modules/tensorflow/default.nix
@@ -71,7 +71,7 @@ let
mkdir -p "$PYTHON_LIB_PATH"
'';
- NIX_CFLAGS_COMPILE = cudatoolkit.ccFlags;
+ NIX_CFLAGS_COMPILE = lib.optional cudaSupport cudatoolkit.ccFlags;
NIX_LDFLAGS = lib.optionals cudaSupport [ "-lcublas" "-lcudnn" "-lcuda" "-lcudart" ];
hardeningDisable = [ "all" ];
@@ -89,7 +89,7 @@ let
rm -rf $bazelOut/external/{bazel_tools,\@bazel_tools.marker,local_*,\@local_*}
'';
- sha256 = "0sq0a7vsajzqwxgg82xw1q74n7vdq37n9d5z7p0c8gzpmyw7mgc9";
+ sha256 = "10k7i61ya33dcy98i0s7r8f1d4s4rwjl5myfyiyr46skjpzydxdv";
};
buildAttrs = {
--
2.15.0
Does it fail if you build with CUDA?
Built both with the patch above.
ping @abbradar
My build demands the same hash as @globin's. Because tensorflow is broken in master anyway, I'd propose to go with this patch until we build a better understanding of what's going on. (The alternative would be to roll back to a wheel-based pre-build of tensorflow.)
Actually, I did not manage to rebase the branch in question onto master. (The patch of bazel does not apply any more, and the simplest fix yields a version which can't build tensorflow.) It seems that in the absence of @abbradar we'll have to return to the wheel-based build.
After some more investigation: the bazel patch is not applied in the tensorflow-new branch nor in master (in fact, it won't apply). After rebasing, applying it is attempted but fails. Fixing or disabling the patch leads to a build failure for tensorflow.
I went back to the wheel-based build (locally). For the record here is the nix file that I'm using.
{ stdenv
, symlinkJoin
, lib
, fetchurl
, buildPythonPackage
, isPy3k, isPy35, isPy36, isPy27
, cudaSupport ? false
, cudatoolkit ? null
, cudnn ? null
, linuxPackages ? null
, tensorflow-tensorboard
, six
, protobuf
, numpy
, mock
, backports_weakref
, absl-py
, zlib
, python
}:
assert cudaSupport -> cudatoolkit != null
&& cudnn != null
&& linuxPackages != null;
# unsupported combination
assert ! (stdenv.isDarwin && cudaSupport);
# tensorflow is built from a downloaded wheel, because the upstream
# project's build system is an arcane beast based on
# bazel. Untangling it and building the wheel from source is an open
# problem.
buildPythonPackage rec {
pname = "tensorflow";
version = "1.5.0rc1";
name = "${pname}-${version}";
format = "wheel";
disabled = ! (isPy35 || isPy36 || isPy27);
# cudatoolkit is split (see https://github.com/NixOS/nixpkgs/commit/bb1c9b027d343f2ce263496582d6b56af8af92e6)
# However, this means that libcusolver is not loadable by tensorflow. So we undo the split here.
cudatoolkit_joined = symlinkJoin {
name = "unsplit_cudatoolkit";
paths = [ cudatoolkit.out
cudatoolkit.lib ];};
src = let
tfurl = sys: proc: pykind:
let
tfpref = if proc == "gpu"
then "gpu/tensorflow_gpu"
else "cpu/tensorflow";
in
"https://storage.googleapis.com/tensorflow/${sys}/${tfpref}-${version}-${pykind}.whl";
dls =
{
darwin.cpu = {
py2 = {
url = tfurl "mac" "cpu" "py2-none-any" ;
sha256 = "0nkymqbqjx8rsmc8vkc26cfsg4hpr6lj9zrwhjnfizvkzbbsh5z4";
};
py3 = {
url = tfurl "mac" "cpu" "py3-none-any" ;
sha256 = "1rj4m817w3lajnb1lgn3bwfwwk3qwvypyx11dim1ybakbmsc1j20";
};
};
linux-x86_64.cpu = {
py2 = {
url = tfurl "linux" "cpu" "cp27-none-linux_x86_64";
sha256 = "09pcyx0yfil4dm6cij8n3907pfgva07a38avrbai4qk5h6hxm8w9";
};
py35 = {
url = tfurl "linux" "cpu" "cp35-cp35m-linux_x86_64";
sha256 = "0p10zcf41pi33bi025fibqkq9rpd3v0rrbdmc9i9yd7igy076a07";
};
py36 = {
url = tfurl "linux" "cpu" "cp36-cp36m-linux_x86_64";
sha256 = "1qm8lm2f6bf9d462ybgwrz0dn9i6cnisgwdvyq9ssmy2f1gp8hxk";
};
};
linux-x86_64.cuda = {
py2 = {
url = tfurl "linux" "gpu" "cp27-none-linux_x86_64";
sha256 = "10yyyn4g2fsv1xgmw99bbr0fg7jvykay4gb5pxrrylh7h38h6wah";
};
py35 = {
url = tfurl "linux" "gpu" "cp35-cp35m-linux_x86_64";
sha256 = "0icwnhkcf3fxr6bmbihqzipnn4pxybd06qv7l3k0p4xdgycwzmzk";
};
py36 = {
url = tfurl "linux" "gpu" "cp36-cp36m-linux_x86_64";
sha256 = "16n8fx8h66jy07p93fvny8knq8ri1i2svm2sbw9fq44lhrhqi4az";
};
};
};
in
fetchurl (
if stdenv.isDarwin then
if isPy3k then
dls.darwin.cpu.py3
else
dls.darwin.cpu.py2
else
if isPy35 then
if cudaSupport then
dls.linux-x86_64.cuda.py35
else
dls.linux-x86_64.cpu.py35
else if isPy36 then
if cudaSupport then
dls.linux-x86_64.cuda.py36
else
dls.linux-x86_64.cpu.py36
else
if cudaSupport then
dls.linux-x86_64.cuda.py2
else
dls.linux-x86_64.cpu.py2
);
propagatedBuildInputs =
[ numpy six protobuf mock backports_weakref absl-py ]
++ lib.optional (!isPy36) tensorflow-tensorboard
++ lib.optionals cudaSupport [ cudatoolkit_joined cudnn stdenv.cc ];
# tensorflow-gpu depends on tensorflow_tensorboard, which cannot be
# built at the moment (some of its dependencies do not build
# [html5lib9999999 (seven nines) -> tensorboard], and it depends on an old version of
# bleach). Hence we disable dependency checking for now.
installFlags = lib.optional isPy36 "--no-dependencies";
# Note that we need to run *after* the fixup phase because the
# libraries are loaded at runtime. If we run in preFixup then
# patchelf --shrink-rpath will remove the cuda libraries.
postFixup = let
rpath = stdenv.lib.makeLibraryPath
(if cudaSupport then
[ stdenv.cc.cc.lib zlib cudatoolkit_joined cudnn
linuxPackages.nvidia_x11 ]
else
[ stdenv.cc.cc.lib zlib ]
);
in
''
rrPath="$out/${python.sitePackages}/tensorflow/:${rpath}"
internalLibPath="$out/${python.sitePackages}/tensorflow/python/_pywrap_tensorflow_internal.so"
find $out -name '*.so' -exec patchelf --set-rpath "$rrPath" {} \;
'';
doCheck = false;
meta = with stdenv.lib; {
description = "TensorFlow helps the tensors flow";
homepage = "http://tensorflow.org";
license = licenses.asl20;
maintainers = with maintainers; [ jyp ];
platforms = with platforms; if cudaSupport then linux else linux ++ darwin;
};
}
@jyp I'm trying this out. Which nixpkgs commit is this built on top of?
@timsears d2d1a2dfbabaf723ebc2102a3c7baa5138303bc2
Note that this was quick and dirty --- only one hash is updated (py 3.6 with cuda).
@jyp thanks. For others (or myself when I forget :-) I got tensorflowWithCuda running using @jyp's wheel-based build. I am using a local nixpkgs GitHub checkout under my home directory, tracking nixos-unstable. Tested with commit 5402412b97247bcc plus the following changes.
Replace ~/nixpkgs/pkgs/development/python-modules/tensorflow/default.nix with the expression provided above.
In ~/nixpkgs/pkgs/top-level/python-packages.nix, make the expression for tensorflow look like:
tensorflow = callPackage ../development/python-modules/tensorflow rec {
cudaSupport = pkgs.config.cudaSupport or false;
cudatoolkit = pkgs.cudatoolkit9;
cudnn = pkgs.cudnn_cudatoolkit9;
};
(Note: This uses a recent version of cudnn that you have to upload with nix-prefetch-url.)
For completeness' sake, here are the recent versions I ended up with. Loading cuda after nix packaging can be very flaky.
nix-repl> cudnn.version
"7.0.5"
nix-repl> cudatoolkit.version
"9.0.176"
nix-repl> cudnn_cudatoolkit9
«derivation /nix/store/1al281ymrdlhj2z4nhbnycsclwf9yh5n-cudatoolkit-9.0-cudnn-7.0.5.drv»
{ pkgs ? (import <nixpkgs>) {} }:
with pkgs;
stdenv.mkDerivation {
name = "MachineLearningEnv";
buildInputs = [
pandoc # or other command line tools you may need
]
++ (with python36Packages; [
tensorflowWithCuda
#tensorflow # provides the cpu version
jupyter
Keras
pillow
widgetsnbextension
scikitlearn
seaborn
matplotlib
]);
}
Related to @jyp's comment, I should note that I had to change one of the hashes in the expression. Those changes are not reflected in the expression.
I was planning to make a PR with the above patch for tensorflow 1.5.0, but unfortunately I got:
RuntimeError: module compiled against API version 0xc but this version of numpy is 0xb
I tried to track this down but it is daunting to even get a mapping from API versions to numpy versions. Updating to numpy 1.14 did not change the error.
Actually, tensorflow 1.5.0 now requires numpy 1.14, but nixpkgs has an issue with that version: https://github.com/NixOS/nixpkgs/issues/33559
Some weirdness:
PR is available in #34418 but cannot be merged at the moment. If you care, please help with the blocking issues: #33559
@jyp Although your PR is closed, regarding Numpy: you could override numpy in the tensorflow derivation, e.g.
let
numpy_1_14 = numpy.overridePythonAttrs(oldAttrs: rec {
version = "1.14.0";
pname = "numpy";
name = "${pname}-${version}";
src = python.pkgs.fetchPypi {
inherit pname version;
extension = "zip";
sha256 = "1ywrq31sy8hkgis1sv9kgac53v2478r1i01442s0f8r1bf9l7rix";
};
});
in
...
buildInputs = [ numpy_1_14 ];
...
Hacky but it should work. I'm going to try and tackle building it this weekend as I have a project which depends on it.
Regarding the sha not triggering a re-fetch: fetchurl and fetchFromGitHub behave the same way. If the sha is unchanged, there is no reason to fetch again. For quick testing I usually change the sha by one character to force a re-fetch; the mismatch error then prints the actual hash, which you can copy.
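A minimal sketch of that trick, using an all-zero placeholder hash instead of changing a single character (a common variant of the same idea). The URL is hypothetical and only there for illustration.

```nix
{ fetchurl }:

fetchurl {
  # Hypothetical URL, for illustration only.
  url = "https://example.org/some-package-1.0.tar.gz";
  # Deliberately wrong placeholder (52 base32 characters). Since nothing in
  # the store matches it, Nix has to fetch, and the resulting hash mismatch
  # error reports the actual hash of the download.
  sha256 = "0000000000000000000000000000000000000000000000000000";
}
```

Once the build fails with the mismatch, the reported hash can be pasted back into the sha256 field.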
@lukeadams
I have another PR with 1.4 which should be mergeable now. If you support the revert to a wheel-based build please comment about it in the PR.
@abbradar is back working on the bazel build. Some fixes went in and he thinks that the current failure is due to other factors.
Fixed in https://github.com/NixOS/nixpkgs/commit/94ebc13a6ac5c6448a932ca48ae9e2bd9ce755ea -- the core issue was tensorflow after all; because I tested exclusively CUDA builds, it went unnoticed. Let's leave this open until Hydra builds the package.
https://hydra.nixos.org/build/70215238 finally, success!
awesome, thanks!
@abbradar isn't the hydra server building bin.nix and not the default.nix? Or, at least, on master in python-packages.nix it says
tensorflow =
if stdenv.isDarwin
then callPackage ../development/python-modules/tensorflow/bin.nix { }
else callPackage ../development/python-modules/tensorflow/bin.nix rec {
...
};
If I change this to use the bazel-based default.nix file I get
nix-build -A pythonPackages.tensorflow
...
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
........
Loading:
Loading: 0 packages loaded
Analyzing: target //tensorflow/tools/pip_package:build_pip_package (2 packages loaded)
Analyzing: target //tensorflow/tools/pip_package:build_pip_package (65 packages loaded)
Analyzing: target //tensorflow/tools/pip_package:build_pip_package (115 packages loaded)
ERROR: /build/output/external/jpeg/BUILD:122:12: Illegal ambiguous match on configurable attribute "deps" in @jpeg//:jpeg:
@jpeg//:k8
@jpeg//:armeabi-v7a
Multiple matches are not allowed unless one is unambiguously more specialized.
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted:
/build/output/external/jpeg/BUILD:122:12: Illegal ambiguous match on configurable attribute "deps" in @jpeg//:jpeg:
@jpeg//:k8
@jpeg//:armeabi-v7a
Multiple matches are not allowed unless one is unambiguously more specialized.
INFO: Elapsed time: 4.480s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (124 packages loaded)
FAILED: Build did NOT complete successfully (124 packages loaded)
builder for '/nix/store/lq5lr44hp3c61qbfafllwvin93zsx7p4-tensorflow-build-1.5.0.drv' failed with exit code 1
cannot build derivation '/nix/store/lfazj9kahmpgqmbp8ncgxd0647f1ckvn-python2.7-tensorflow-1.5.0.drv': 1 dependencies couldn't be built
error: build of '/nix/store/lfazj9kahmpgqmbp8ncgxd0647f1ckvn-python2.7-tensorflow-1.5.0.drv' failed
@jyp @timsears the bin.nix build in master also fails to work with python.withPackages. Checking out master and then doing
nix-build -E 'with import ./. { }; python.withPackages (p: with p; [tensorflow])'
building '/nix/store/zj8adg9ri2jfmr0xg86rlk0i9nk0sw85-python-2.7.15-env.drv'...
collision between `/nix/store/z9a2y31akz53df8g9d6f5klr7kdzgmvb-python2.7-tensorflow-tensorboard-1.7.0/bin/.tensorboard-wrapped' and `/nix/store/wmxg0d5zw88a9m8zk1pmza3ga05npgzc-python2.7-tensorflow-1.7.1/bin/.tensorboard-wrapped'
builder for '/nix/store/zj8adg9ri2jfmr0xg86rlk0i9nk0sw85-python-2.7.15-env.drv' failed with exit code 25
error: build of '/nix/store/zj8adg9ri2jfmr0xg86rlk0i9nk0sw85-python-2.7.15-env.drv' failed
I'm guessing this is due to tensorflow-tensorboard being a propagated-build-input of tensorflow.
@jyp @timsears I unpacked the tensorflow wheel and poked around in it. Found this in purelib/tensorflow/tools/pip_package/setup.py:
CONSOLE_SCRIPTS = [
'freeze_graph = tensorflow.python.tools.freeze_graph:main',
'toco_from_protos = tensorflow.contrib.lite.toco.python.toco_from_protos:main',
'toco = tensorflow.contrib.lite.toco.python.toco_wrapper:main',
'saved_model_cli = tensorflow.python.tools.saved_model_cli:main',
# We need to keep the TensorBoard command, even though the console script
# is now declared by the tensorboard pip package. If we remove the
# TensorBoard command, pip will inappropriately remove it during install,
# even though the command is not removed, just moved to a different wheel.
'tensorboard = tensorboard.main:run_main',
]
From the sounds of that, possibly the tensorboard executable should just be erased as a post step?
Possibly #42783 is a way out of this.
With regard to the bin.nix variant and tensorboard, I added
rm $out/bin/tensorboard $out/bin/.tensorboard-wrapped
to the postFixup phase script of tensorflow and it solved the pythonPackages issue (tested both the GPU and CPU versions under Linux).
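For anyone who wants to try the same workaround without editing nixpkgs, here is a rough sketch as an override. It assumes a nixpkgs checkout where pythonPackages.tensorflow comes from bin.nix and where python.withPackages is available; the name tensorflowNoTensorboard is made up for this example, and rm -f is used instead of plain rm so the step does not fail if the files are absent.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Hypothetical name: tensorflow with its duplicate tensorboard entry
  # points removed, so only tensorflow-tensorboard provides them.
  tensorflowNoTensorboard =
    pkgs.pythonPackages.tensorflow.overridePythonAttrs (old: {
      postFixup = (old.postFixup or "") + ''
        rm -f $out/bin/tensorboard $out/bin/.tensorboard-wrapped
      '';
    });
in
  # The environment that previously failed with the file collision.
  pkgs.python.withPackages (p: [ tensorflowNoTensorboard ])
```

Whether this is the right long-term fix is a separate question; it only mirrors the rm line above as an override.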
@twhitehead Please submit a PR -- this issue is about something else entirely (and closed).
@jyp will do. Also found issue #42809 about the bazel build issue I noticed too, so will continue that discussion there.