Nixpkgs: dbus: unable to reload configuration: Failed to open "/etc/dbus-1/system.conf"

Created on 26 Jun 2016 · 31 Comments · Source: NixOS/nixpkgs

Issue description

Using nixos-unstable in a VirtualBox VM via nixops, I am not able to reload dbus because it cannot load /etc/dbus-1/system.conf. The file is not present:

$ ls -l /etc/dbus-1/
total 8
-r--r--r-- 1 root root  989 Jan  1  1970 session-local.conf
-r--r--r-- 1 root root 1093 Jan  1  1970 system-local.conf

Steps to reproduce

works with release-16.03:

git checkout release-16.03
nixops deploy -d trivial -I nixpkgs=/path/to/nixpkgs

does not work with current nixos-unstable 453086a:

git checkout 453086a
nixops deploy -d trivial -I nixpkgs=/path/to/nixpkgs

Technical details

  • Guest System: NixOS: 16.09.git.453086a (Flounder)
  • Host system: Gentoo Base System release 2.2
  • Nix version: nix-env (Nix) 1.11.2
  • Nixpkgs version: 16.09pre85639.453086a
  • NixOps version: e33c18c

    NixOps output

building all machine configurations...
trace: warning: The option `boot.loader.grub.timeout' defined in `/nix/store/rknr5q5sw48f041bxscn9gdq7jncdn1i-nixops-1.4pre0_abcdef/share/nix/nixops/virtualbox-image-nixops.nix' has been renamed to `boot.loader.timeout'.
webserver> copying closure...
trivial> closures copied successfully
webserver> updating GRUB 2 menu...
webserver> activating the configuration...
webserver> setting up /etc...
webserver> reloading the following units: dbus.service
webserver> Job for dbus.service failed because the control process exited with error code.
webserver> See "systemctl status dbus.service" and "journalctl -xe" for details.
webserver> the following new units were started: httpd.service
webserver> error: unable to activate new configuration
error: activation of 1 of 1 machines failed (namely on ‘webserver’)

systemctl status dbus.service

● dbus.service - D-Bus System Message Bus
   Loaded: loaded (/nix/store/gv4r19mlazqb7nzzyj28l468awv3km5c-dbus-1.10.8/etc/systemd/system/dbus.service; bad; vendor preset: enabled)
  Drop-In: /nix/store/b7dmswganv200acahv5ipnn5ys7craxg-system-units/dbus.service.d
           └─overrides.conf
   Active: active (running) (Result: exit-code) since Sun 2016-06-26 09:54:35 UTC; 1h 7min ago
     Docs: man:dbus-daemon(1)
  Process: 8596 ExecReload=/nix/store/gv4r19mlazqb7nzzyj28l468awv3km5c-dbus-1.10.8/bin/dbus-send --print-reply --system --type=method_call --dest=org.freedesktop.DBus / org.freedesktop.DBus.ReloadConfig (code=exited, status=1/FAILURE)
 Main PID: 447 (dbus-daemon)
   CGroup: /system.slice/dbus.service
           └─447 /nix/store/ridpx1hxllj4xs1l1dpqrdhgxx15d7q0-dbus-tools-1.8.20/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

Jun 26 11:00:57 webserver systemd[1]: Reloading D-Bus System Message Bus.
Jun 26 11:00:57 webserver dbus[447]: [system] Unable to reload configuration: Failed to open "/etc/dbus-1/system.conf": No such file or directory
Jun 26 11:00:57 webserver dbus-send[8336]: Error org.freedesktop.DBus.Error.FileNotFound: Failed to open "/etc/dbus-1/system.conf": No such file or directory
Jun 26 11:00:57 webserver systemd[1]: dbus.service: Control process exited, code=exited status=1
Jun 26 11:00:57 webserver systemd[1]: Reload failed for D-Bus System Message Bus.
Jun 26 11:01:08 webserver systemd[1]: Reloading D-Bus System Message Bus.
Jun 26 11:01:08 webserver dbus[447]: [system] Unable to reload configuration: Failed to open "/etc/dbus-1/system.conf": No such file or directory
Jun 26 11:01:08 webserver dbus-send[8596]: Error org.freedesktop.DBus.Error.FileNotFound: Failed to open "/etc/dbus-1/system.conf": No such file or directory
Jun 26 11:01:08 webserver systemd[1]: dbus.service: Control process exited, code=exited status=1
Jun 26 11:01:08 webserver systemd[1]: Reload failed for D-Bus System Message Bus.
blocker

All 31 comments

I think the relevant commit on nixpkgs is 60b34849284db4aec8f4fdab722273096e1ed002

Best would be to bisect commits and see which one breaks the deployment.
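
A bisect session for this could look roughly like the sketch below. It uses the two refs from the report (453086a as bad, release-16.03 as good); the function name is made up for illustration, and the block is wrapped in a function so that nothing runs unless you call it from a nixpkgs checkout.

```shell
# Sketch only: start_dbus_bisect is a hypothetical name, not an existing tool.
start_dbus_bisect() {
    # bad = the failing nixos-unstable commit, good = the working branch
    git bisect start 453086a release-16.03
}

# At each step git checks out a candidate commit. Deploy it as in the
# report, then record the outcome:
#   nixops deploy -d trivial -I nixpkgs=/path/to/nixpkgs
#   git bisect good    # dbus reloaded cleanly
#   git bisect bad     # activation failed on dbus.service
# When git names the first bad commit, finish with: git bisect reset
```

If the failure depends on the state of the running machine, each step probably has to start again from a machine deployed with the good revision.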

I am also affected. This is very annoying on my remote web server, even if dbus seems to start correctly after failing.
I cannot reproduce the issue by reloading the dbus unit. It only occurs when nixos-rebuild reloads it...

This is the commit causing the issue.
Look at 68a4a6df3971d66aa988bba680351a30fbadbed3, it simply removed /etc/dbus-1/system.conf by removing the cp ... line and the following sed.

@wkennington, your input would be much appreciated here. What do you think? Could it be that dbus fails on reload but not on (re)start? Why does it only happen with nixos-rebuild switch/test?

I've run a bisection and the culprits are indeed 60b3484 or 68a4a6d (same content).

BUT: The problem is in the upgrade path. release-16.03 uses dbus-1.8, while master uses dbus-1.10. dbus-1.10 runs fine without /etc/dbus-1/system.conf, dbus-1.8 does not. While upgrading a running system, the config files get changed but dbus does not get restarted, so the old 1.8 daemon is asked to reload a configuration written for 1.10. When I restart dbus manually (not just reloading it), the problem goes away.
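
The version mismatch is visible in the systemctl status output from the report: ExecReload comes from dbus-1.10.8 while the main PID still runs dbus-daemon from dbus-tools-1.8.20. A minimal sketch that pulls the versions out of those two store paths (store_path_version is a hypothetical helper written for this illustration):

```shell
# Hypothetical helper: print the version suffix of the package directory
# in a /nix/store path, e.g. ".../<hash>-dbus-1.10.8/bin/dbus-send" -> 1.10.8
store_path_version() {
    printf '%s\n' "$1" | sed -E 's|^/nix/store/[a-z0-9]+-.+-([0-9][0-9.]*)/.*$|\1|'
}

# Paths copied from the `systemctl status dbus.service` output in the report.
reload_tool=/nix/store/gv4r19mlazqb7nzzyj28l468awv3km5c-dbus-1.10.8/bin/dbus-send
running_daemon=/nix/store/ridpx1hxllj4xs1l1dpqrdhgxx15d7q0-dbus-tools-1.8.20/bin/dbus-daemon

new_version=$(store_path_version "$reload_tool")
old_version=$(store_path_version "$running_daemon")

if [ "$new_version" != "$old_version" ]; then
    echo "running daemon is $old_version, unit files are from $new_version"
fi
```

The regular expression assumes the package directory ends in a dotted numeric version, which holds for these two paths but not for every store path.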

I have seen other bug reports where too eager restarts of dbus caused other sorts of problems. How do we proceed?

We only reload dbus since 1c39a47ac87959b2589ef797e519af96d73c27d6.

I'd like to be able to reproduce this issue, the steps are not very clear to me.

Should I deploy a plain 16.03 VM and then upgrade to nixos-unstable?

Yeah, upgrading a running 16.09 system to master should trigger the bug as described at the top.

(^^ I assume you meant 16.03)

Major OS upgrade without a reboot... I'm not sure it's efficient to spend much energy on that. Maybe we should mention in docs that for such large changes it's recommended to use nixos-rebuild boot?

I got the same switching from freshly booted 16.03 and upgrading to 16.09:

Sep 04 17:50:26 guava dbus[921]: [system] Unable to reload configuration: Failed to open "/etc/dbus-1/system.con
Sep 04 17:50:26 guava dbus-send[13285]: Error org.freedesktop.DBus.Error.FileNotFound: Failed to open "/etc/dbus

Pretty sure we'll get more reports like this once 16.09 channels are available.

system.conf is a dummy file, let's keep it there for at least one more release not to cause any troubles.

Testing partial revert of https://github.com/NixOS/nixpkgs/commit/68a4a6df3971d66aa988bba680351a30fbadbed3

So that won't do since dbus configuration changed:

Configuration file needs one or more <listen> elements giving addresses

I see the following options:

  • check if we can generate the configuration that works with old dbus versions as well
  • change dbus to restart on package change
  • ignore reloading errors

@vcunat

Major OS upgrade without a reboot... I'm not sure it's efficient to spend much energy on that. Maybe we should mention in docs that for such large changes it's recommended to use nixos-rebuild boot?

Yes and no. We need not make the transition seamless, but could we at least detect such scenarios and add a warning when someone tries to switch between incompatible versions? Something like an "epoch" integer, maybe?

On a related note, we have dbus-1.10.8 while the latest upstream is 1.10.10.

(dbus update staged in 741527a)

@vcunat NixOS upgrades have traditionally not required a reboot, let's not get into the habit of requiring that.

change dbus to restart on package change

This option isn't working very well. We had that before, and it was killing xfce4-session at least (and I see no indication of that being fixed).

It's better than a reboot. I think it's fair to expect restart of desktop
manager on stable upgrade.

IMHO it's not fair to expect an X session to be killed on nixos-rebuild switch :-) but if it only happens in specific cases (channel switching) and is mentioned in the docs, I suppose it would be bearable.

We could give dbus a passthru.interfaceVersion attribute, like systemd, to tell switch-to-generation when dbus should be restarted rather than reloaded.
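
To make the idea concrete, here is a rough sketch of the decision such an attribute would drive. dbus_switch_action and the version numbers are invented for this illustration; this is not the actual activation code.

```shell
# Hypothetical sketch: decide how to bring dbus onto a new generation.
# $1 = interface version of the running dbus, $2 = that of the new one.
dbus_switch_action() {
    if [ "$1" = "$2" ]; then
        echo reload    # same interface: keep the daemon and its clients running
    else
        echo restart   # interface changed: the old daemon cannot use the new config
    fi
}

dbus_switch_action 1 1
dbus_switch_action 1 2
```

With this scheme an ordinary rebuild would keep reloading, and only a bump of the interface version would trigger a restart.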

Do I understand correctly that during this issue dbus fails to reload its config but otherwise continues to work normally, as if it was neither reloaded nor restarted? That wouldn't seem a bad situation to me.

If the issue is purely the absence of /etc/dbus-1/system.conf, how about instead of generating /etc/dbus-1/system-local.conf we just generate system.conf directly and then patch $out/share/dbus-1/system.conf to include that instead?

Edit: Secondly, from the manual:

SIGHUP will cause the D-Bus daemon to PARTIALLY reload its configuration file and to flush its 
user/group information caches. Some configuration changes would require kicking all apps off 
the bus; so they will only take effect if you restart the daemon. Policy changes should take 
effect with SIGHUP.

The question is if there is a graceful way to restart all the affected services. Maybe we shouldn't restart but only reload?

I tried that; dbus will complain that the <listen> directive is missing. I'm not sure the configuration files are compatible.

That sounds unlikely. On an FHS system, the following 3 files would be exactly the same format:

  • /usr/share/dbus-1/system.conf
  • /etc/dbus-1/system.conf
  • /etc/dbus-1/system-local.conf

They are the same format, but the old dbus would be using the new configuration files (that's what happens on reload).

Alright, so another idea then.

The --system flag is equivalent to --config-file=/etc/dbus-1/system.conf. How about not using --system and instead using the full path to the config inside the correct derivation?
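
As a sketch of what that would mean for the command line seen in the status output: the helper below (a hypothetical name) prints the same invocation with --system swapped for an explicit --config-file pointing into the package. The share/dbus-1/system.conf location follows the $out/share/dbus-1/system.conf path mentioned earlier in the thread, and /path/to/dbus is a placeholder for the real store path.

```shell
# Hypothetical helper: print the daemon command line for a given dbus
# package prefix, naming the config file explicitly instead of --system.
dbus_daemon_cmd() {
    echo "$1/bin/dbus-daemon --config-file=$1/share/dbus-1/system.conf --address=systemd: --nofork --nopidfile --systemd-activation"
}

# Placeholder prefix; on NixOS this would be the dbus store path.
dbus_daemon_cmd /path/to/dbus
```

Because the config path is tied to the same prefix as the binary, a running daemon would keep reading the config of its own version, whatever ends up under /etc/dbus-1.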

I think I've figured out the listen problem.

I'm using FHS paths here to hopefully make it clearer.

dbus-daemon --system will make the daemon read /usr/share/dbus-1/system.conf which has a line to include /etc/dbus-1/system-local.conf for overriding settings, so -local.conf only has the overridden config.

If you create /etc/dbus-1/system.conf that will replace /usr/share/dbus-1/system.conf, so if we move -local to system.conf that will only contain a subset of settings and thus fail.
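
A quick way to tell a complete top-level config from an overrides-only file is to look for a <listen> element, which is what the error message above complains about. The sketch below is a rough grep, not a real XML parse, and the two sample files are illustrative stand-ins rather than the files NixOS generates.

```shell
# Rough check, not XML-aware: does the file declare a <listen> address?
has_listen() {
    grep -q '<listen>' "$1"
}

# Demo on two throwaway files with made-up contents.
tmpdir=$(mktemp -d)
printf '<busconfig>\n  <listen>unix:path=/run/dbus/system_bus_socket</listen>\n</busconfig>\n' > "$tmpdir/full.conf"
printf '<busconfig>\n  <includedir>/etc/dbus-1/system.d</includedir>\n</busconfig>\n' > "$tmpdir/local-only.conf"

for f in "$tmpdir/full.conf" "$tmpdir/local-only.conf"; do
    if has_listen "$f"; then
        echo "$(basename "$f"): has a <listen> element"
    else
        echo "$(basename "$f"): overrides only, no <listen>"
    fi
done
rm -rf "$tmpdir"
```

A file of the second kind is fine as an include, but would trigger the <listen> error if the daemon loaded it as its main config.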

I think I know how to fix this. I'll get back shortly.

I have an additional commit on top of the work done in #18382 that makes dbus use absolute paths:
commit

I haven't tested out the restart logic during upgrades yet as I'm generating the VM from the outside using nixos-rebuild build-vm, but I'll give that a go during the weekend.

@peterhoeg did you make any additional efforts on this issue?

@domenkozar, this is how I see it:

1) the patch to the dbus package to install the unit files is safe and should be done no matter what IMHO

2) changing to socket activation is safe as well:
a) people using machines without a need for dbus will not see it running
b) users needing dbus will now have it guaranteed running without any odd x session startup scripts trying to detect and launch it

3) restarting dbus (socket or no socket-activation) is a bad idea no matter what. dbus itself will come up just fine but dbus spawned services will not restart.

4) the patch to the dbus services so they use absolute names to configuration files instead of using the --system and --session argument is safe as well.

So that leaves the case of the upgrade and the odd error due to /etc/dbus-1/system.conf. I haven't been able to replicate it with item 4 applied, but no guarantees from here.

I think we should do 1 and 4 no matter what. They are self-contained and have no side effects.

Item 2 can be hidden behind a configuration option that defaults to the old behaviour if you want.

Item 3 cannot be addressed the way I see it.

If you want me to split up this PR into multiple PRs ("safe" and non-safe) I can do that Friday (UTC+8 here).

@peterhoeg where is the patch for 4), I'd like to test if it really fixes the odd error described in this issue.

@domenkozar, this PR: #18777

I've merged that into 16.09, so the issue is fixed. Thanks!
