Nixpkgs: nixos.ova build fails on Hydra

Created on 18 May 2017 · 22 Comments · Source: NixOS/nixpkgs

Issue description

The ova job has been failing for the past week, blocking the nixos-unstable channel. I don't know what the exact problem is.

Steps to reproduce

I'm unable to reproduce the failure locally.

blocker

Most helpful comment

Please bear in mind I don't know anything about the low-level details involved. I was able to reproduce the failure in a VM, and while debugging and following some breadcrumbs I ended up with the patch below, although my reasoning here may well be flawed. Hopefully it is of some help and doesn't add to the confusion.


Applying the following:

$ git status --short
 M pkgs/tools/system/fakeroot/default.nix
?? pkgs/tools/system/fakeroot/einval.patch
$ git diff
diff --git a/pkgs/tools/system/fakeroot/default.nix b/pkgs/tools/system/fakeroot/default.nix
index 5286b6b2cb..a3b858db2d 100644
--- a/pkgs/tools/system/fakeroot/default.nix
+++ b/pkgs/tools/system/fakeroot/default.nix
@@ -10,7 +10,7 @@ stdenv.mkDerivation rec {
   };

   # patchset from brew
-  patches = stdenv.lib.optionals stdenv.isDarwin [
+  patches = [ ./einval.patch ] ++ (stdenv.lib.optionals stdenv.isDarwin [
     (fetchpatch {
       name = "0001-Implement-openat-2-wrapper-which-handles-optional-ar.patch";
       url = "https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=0001-Implement-openat-2-wrapper-which-handles-optional-ar.patch;att=1;bug=766649";
@@ -26,7 +26,7 @@ stdenv.mkDerivation rec {
       url = "https://bugs.debian.org/cgi-bin/bugreport.cgi?att=2;bug=766649;filename=fakeroot-always-pass-mode.patch;msg=20";
       sha256 = "0i3zaca1v449dm9m1cq6wq4dy6hc2y04l05m9gg8d4y4swld637p";
     })
-    ];
+    ]);

   buildInputs = [ getopt ]
     ++ stdenv.lib.optional (!stdenv.isDarwin) libcap
$ cat pkgs/tools/system/fakeroot/einval.patch
diff --git a/libfakeroot.c b/libfakeroot.c
index 68a95fb..70da8bc 100644
--- a/libfakeroot.c
+++ b/libfakeroot.c
@@ -792,7 +792,7 @@ int chown(const char *path, uid_t owner, gid_t group){
     r=next_lchown(path,owner,group);
   else
     r=0;
-  if(r&&(errno==EPERM))
+  if(r&&(errno==EPERM||errno==EINVAL))
     r=0;

   return r;
@@ -819,7 +819,7 @@ int lchown(const char *path, uid_t owner, gid_t group){
     r=next_lchown(path,owner,group);
   else
     r=0;
-  if(r&&(errno==EPERM))
+  if(r&&(errno==EPERM||errno==EINVAL))
     r=0;

   return r;
@@ -843,7 +843,7 @@ int fchown(int fd, uid_t owner, gid_t group){
   else
     r=0;

-  if(r&&(errno==EPERM))
+  if(r&&(errno==EPERM||errno==EINVAL))
     r=0;

   return r;
@@ -870,7 +870,7 @@ int fchownat(int dir_fd, const char *path, uid_t owner, gid_t group, int flags)
   else
     r=0;

-  if(r&&(errno==EPERM))
+  if(r&&(errno==EPERM||errno==EINVAL))
     r=0;

   return r;

on top of:

  • System: 17.09pre108553.0011f9065a (Hummingbird)
  • Nix version: nix-env (Nix) 1.12pre5350_7689181e
  • Nixpkgs version: 17.09pre108553.0011f9065a
  • Sandboxing enabled: build-use-sandbox = true

makes the test pass for me.

Patch and ideas stolen from https://github.com/NixOS/nixpkgs/issues/10496
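
To make the effect of einval.patch easier to see in isolation, here is a minimal sketch of the pattern it applies: call the real ownership-changing function, then report success if it failed only with EPERM or (new in the patch) EINVAL. This is not fakeroot's actual code. The names `chown_fn`, `is_ignorable_chown_error`, `tolerant_chown` and the two stub functions are hypothetical, and the sketch leaves out everything else the real wrappers do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for fakeroot's next_lchown(): whatever performs
 * the real ownership change. */
typedef int (*chown_fn)(const char *path, uid_t owner, gid_t group);

/* The patched condition: EINVAL is ignored alongside EPERM. */
static int is_ignorable_chown_error(int err) {
  return err == EPERM || err == EINVAL;
}

/* Call the real function, then report success if it failed only with
 * one of the ignorable errors. Any other failure is passed through. */
static int tolerant_chown(chown_fn real_chown, const char *path,
                          uid_t owner, gid_t group) {
  assert(real_chown != NULL);
  int r = real_chown(path, owner, group);
  if (r && is_ignorable_chown_error(errno))
    r = 0;
  return r;
}

/* Demo stubs (assumptions for illustration only): one fails with the
 * "Invalid argument" error seen in the failing job, the other with an
 * unrelated error. */
static int fail_with_einval(const char *path, uid_t owner, gid_t group) {
  (void)path; (void)owner; (void)group;
  errno = EINVAL;
  return -1;
}

static int fail_with_enoent(const char *path, uid_t owner, gid_t group) {
  (void)path; (void)owner; (void)group;
  errno = ENOENT;
  return -1;
}
```

With these stubs, `tolerant_chown(fail_with_einval, path, 0, 0)` returns 0, while `tolerant_chown(fail_with_enoent, path, 0, 0)` still returns -1, so only the two named errors are hidden from the caller.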

All 22 comments

Differences between succeeding and failing job:

+copying closure closure to /build/root...
-copying closure closure to /tmp/nix-build-nixos-ova-17.09pre107278.42bf19cc04-x86_64-linux.drv-0/root...
+error: changing ownership of path ‘/build/root/nix/store’: Invalid argument

Probably the same reason as explained in 6cfb3b636418526d1c49d14316a127133cf09c9d.

No idea on how to fix, though.

This issue seems to have no owner (just stating facts, no offense intended). Is there a process to identify the offending commit and roll it back?

Apparently it wasn't triggered by a nixpkgs commit but by a nix change. That's why most people won't reproduce it. I don't think a process really exists (for this).

/cc @edolstra for the option to roll that change back on the build farm for now, as there's no idea how to fix it properly and the channel is on a ~11 days old commit already.

Seems it's now timing out while building the ibus package?

ibus was updated on master (since the last ova failure) and it seems to build fine on Hydra now, though the ova job doesn't really show that (yet).

Now the job built successfully, though I can't see why. AFAIK it's possible some build slaves still use an older version of nix, or something...

Right, probably only the packet machine can succeed.

I managed to make Hydra build the tested job successfully now, after two weeks, but this issue remains a channel blocker IMO.

I guess the packet machine was updated so now the job will never succeed...

The problem seems to be that we're running install commands for the ova inside the sandbox, which prevents setuid/setgid; _however_, we do use setuid/setgid in some places:

e.g., in nixos-prepare-root:

mkdir -m 1775 -p $mountPoint/nix/store

Here are some additional places:

grahamc@Morbo> rg  -g '!*.xml' nix/store | grep -E "[0-7]{4}"
nixos/modules/virtualisation/qemu-vm.nix:          mkdir -p 0755 $targetRoot/nix/.rw-store/store $targetRoot/nix/.rw-store/work $targetRoot/nix/store
nixos/modules/system/boot/stage-2-init.sh:chmod -f 1775 /nix/store
nixos/modules/installer/tools/nixos-prepare-root.sh:mkdir -m 1775 -p $mountPoint/nix/store
pkgs/tools/package-management/nix/nix/nix.spec.in:chmod 1775 /nix/store

Right, /nix/store seems to use 1775 root nixbld on standard NixOS, but that's only the sticky bit, not set(u/g)id. If I read the seccomp code right, it only attempts to disallow those two bits.

@pbogdan I'm in no position to evaluate if your patch is the best solution (I have no idea what I'm doing here,) however I can definitely appreciate your great digging and patch. Thank you!

@grahamc you can safely skip the sticky bit when creating the image. It will be fixed by stage-2 automatically: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/boot/stage-2-init.sh#L62

Oh, I didn't realize this isn't reproducible with the latest stable nix. According to the log from the ova job, it still builds in /tmp and not /build, and it just succeeds as it is.
EDIT: I wonder why that is, as that change seems to generate more secure binaries when that daemon is used to build them.

This solution seems OK. I think I now understand. @pbogdan: thank you a lot for finding a solution; I designated you as the author of the modified commit :-)

Not only did it work okay, it passed: https://hydra.nixos.org/build/54649680#tabs-summary

I think we'll get a channel update: https://hydra.nixos.org/job/nixos/trunk-combined/tested#tabs-constituents

https://channels.nix.gsc.io/graph.html

Yes, I do believe we'll finally get a channel bump within several hours 🎉 There are just some heavy packages left, e.g. webkitgtk...

nixos-unstable bumped!

Maybe we should just get rid of that fakeroot stuff? I mean, the goal was to get rid of the QEMU VM, but that's still being used, so fakeroot just seems an unnecessary complication...
