On a semi-clean install (where no stateVersion is set since upgrade is immediate and no state is expected/important), systemd-timesyncd fails due to https://github.com/NixOS/nixpkgs/pull/61321
Other reports: https://github.com/NixOS/nixpkgs/issues/31540#issuecomment-506936820, https://github.com/NixOS/nixpkgs/issues/63688
https://github.com/NixOS/nixpkgs/pull/61321 includes a fix that is only run on stateVersion < 19.09, but most installs with no user-generated state do not have stateVersion set, so another solution might be needed.
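To illustrate why an unset stateVersion skips the fix, here is a rough sketch of what a migration gated on stateVersion could look like. The script name and the shell body are illustrative assumptions, not copied from the PR; only the `stateVersion < 19.09` condition and the /var/lib/systemd/timesync path come from this thread.

```nix
{ config, lib, ... }:
{
  # Hypothetical script name; the body is a sketch, not the PR's actual code.
  system.activationScripts.timesyncd-state-migration =
    lib.mkIf (lib.versionOlder config.system.stateVersion "19.09") ''
      # Assumed layout: the old state lives under /var/lib/private and
      # /var/lib/systemd/timesync is a symlink pointing at it.
      if [ -L /var/lib/systemd/timesync ]; then
        rm /var/lib/systemd/timesync
        mv /var/lib/private/systemd/timesync /var/lib/systemd/timesync
      fi
    '';
}
```

Because an unset stateVersion falls back to the current release, the `versionOlder` check evaluates to false on such systems and the script is never generated.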
Any reason to NOT set the state version? If the filesystem were really stateless, the problem should not occur. I initially added that check since I did not want to keep those lines in all activation scripts until the sun erupts. Seeing that a few people are running without a stateVersion set is surprising.
The state version is not set by default, and it's not obvious that it should be set when instantly upgrading an image from nixos-19.03 to nixos-unstable.
The standard installer sets it to the current stable release. Which kind of images are you referring to?
Well, there are 9 reports so far. It seems some people are running the .iso. I'm referring to the packet.com nixos-19.03 images, which I think are installed using nixos-install, and do not include a state version.
I remembered Packet.com's installers configuring the stateVersion, so I provisioned an instance to check. Indeed, it does:
[grahamc@test:~]$ cat /etc/nixos/configuration.nix
# Edit this configuration file to define what should be installed on
# your system. Help is available in the configuration.nix(5) man page
# and in the NixOS manual (accessible by running ‘nixos-help’).
{ config, pkgs, ... }:

{
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
      ./packet.nix
    ];

  # Use the GRUB 2 boot loader.
  boot.loader.grub.enable = true;
  boot.loader.grub.version = 2;
  # boot.loader.grub.efiSupport = true;
  # boot.loader.grub.efiInstallAsRemovable = true;
  # boot.loader.efi.efiSysMountPoint = "/boot/efi";
  # Define on which hard drive you want to install Grub.
  # boot.loader.grub.device = "/dev/sda"; # or "nodev" for efi only

  [ snip a bunch of commented stuff ]

  # This value determines the NixOS release with which your system is to be
  # compatible, in order to avoid breaking some software such as database
  # servers. You should change this only after NixOS release notes say you
  # should.
  system.stateVersion = "19.03"; # Did you read the comment?
}
Well, there are 9 reports so far. It seems some people are running the .iso.
I am not questioning the relevance of this bug. I am just trying to figure out why nobody I talked to so far has ever done that. With the .iso, are you referring to the ISOs from the website? nixos-generate-config definitely adds a stateVersion. My main concern right now is that I do not see how people end up in that situation when following the standard installation procedures, so that we can properly test it now and in the future.
Maybe state version should be mandatory. I might still have set it to 19.09, since I never considered some timesyncd symlink "state" (and generally expect state version issues to result in missing data, not failing units).
My config is missing stateVersion, and I believe that's because it was generated way too long ago (before the installer was generating it)... though I don't think that would be the most common reason nowadays.
@andir We're discussing making stateVersion mandatory in the above PR, but it would be great if this weren't necessary. In particular, the changes you made in https://github.com/NixOS/nixpkgs/pull/61321/commits/024a383d64036dab02157927369ca680427aa61d don't seem like they need stateVersion, as mentioned by @oxij in https://github.com/NixOS/nixpkgs/pull/65314#issuecomment-520118767. stateVersion should only really be used for things that have to be migrated manually. In this case I'd think the migration can be done automatically, which would fix this problem for a lot of people who for some reason have a messed up stateVersion (which is not optimal, but that's the current state). Another report of this problem just rolled in on IRC.
How does the automatic migration fit in with the rollback story of NixOS?
Good point @grahamc, maybe automatic migration shouldn't be done after all because of this, meaning stateVersion might be needed after all for this.
In the absence of stateVersion, setting it to 19.03 should be enough to deal with this. However, an in-place NixOS install from another distribution will cause the generated configuration file to use whatever stateVersion is currently set in nixpkgs, causing this error until it is adjusted to use the latest release stateVersion.
@grahamc systemd will automatically migrate nonprivate -> private. They recently added code to migrate private -> nonprivate as well, see https://github.com/systemd/systemd/issues/12131. Maybe we should backport those patches and scrap the stateVersion migration thing.
I am happy that there is now a patch and backporting that seems like the best option we have right now. The patch is also not that complicated but I didn't check if any previous patches are a prerequisite for it.
I also think that stateVersion should be mandatory, but that is nothing we should discuss here now :-)
Does this migration mean a rollback won't be smooth anymore back to generations without it? Or can systemd handle it? Because if not this will cause some trouble for almost every NixOS user wanting to do such a rollback
systemd can roll back this migration, according to their issue about this
Could we automatically perform the migration if stateVersion is unset? As long as it doesn't error if you are already migrated, I think this should be safe.
Never mind, I understand why this is necessary. stateVersion defaults to the current NixOS version when unset, so we have no way to distinguish between "stateVersion = "19.09";" and "unset stateVersion". Letting systemd handle this is much better.
The patch looks like just https://github.com/systemd/systemd/commit/5c6d40d13238e56699074fc1b01f4ac929ba62b8 ?
... we have no way to distinguish between "stateVersion = "19.09";" and "unset stateVersion".
There, kinda, is. See the use of highestPrio in 1f0b6922d3c2de3da235a7075d0d3cb9255b7cd7.
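A minimal sketch of the highestPrio idea, assuming that the fallback value for stateVersion comes from the option declaration's `default` (which the module system assigns the `mkOptionDefault` priority). The module below and its warning text are hypothetical and not taken from the referenced commit.

```nix
{ lib, options, ... }:
let
  # Priority that declaration-level defaults receive (mkOptionDefault).
  optionDefaultPrio = (lib.mkOptionDefault null).priority;

  # highestPrio is the numerically lowest, i.e. strongest, priority among
  # all definitions of the option. Anything stronger than the declaration
  # default means some module set the option.
  stateVersionIsExplicit =
    options.system.stateVersion.highestPrio < optionDefaultPrio;
in
{
  warnings = lib.optional (!stateVersionIsExplicit)
    "system.stateVersion is not set explicitly; falling back to the current release.";
}
```

Note that a value set with `lib.mkDefault` in some other module would also count as explicit under this check, so it only distinguishes "nobody defined it" from "somebody defined it".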
The upstream patch is in systemd version 243. The PR for that (#68096) just landed in staging.
Would be nice to know if it works for your scenarios (that do not set a stateVersion).
@yorickvP is this still an issue now that we have the migration patch on our systemd branch?
I would assume not. We could remove the workaround in that case.
Apparently people are still reporting this issue: https://github.com/NixOS/nixpkgs/issues/69258#issuecomment-534030279
The fix should be in our systemd, anyone have any idea why it's not running?
@yorickvP if you need any particular information on this, I can reproduce it on every nixos-rebuild switch I do on my system currently. I haven't dug into the cause due to lack of time though.
@alunduil The easiest method would probably be stepping through systemd in gdb to see why it's not triggering the migration code, but it would take time to do.
@yorickvP that doesn't sound undoable, but I would need some instructions on how to do that (it's been almost a decade since I even looked at gdb). Is there an easy way to tell if I have debugging symbols so I can find the code in consideration quickly?
Another thing you could do is set the SYSTEMD_LOG_LEVEL=debug environment variable on the timesyncd service. (I might have made a typo in the env var. I'm on mobile.) It will then log more, and journald will have code line information for each log entry. This might help in seeing what's going wrong, if systemd logs sufficiently.
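On NixOS, one way to apply that suggestion is through the unit's environment in configuration.nix. This is a small sketch assuming the unit is named systemd-timesyncd, as it is referred to at the top of this issue:

```nix
{ ... }:
{
  # Raise the log level of timesyncd only, not of systemd as a whole.
  systemd.services.systemd-timesyncd.environment.SYSTEMD_LOG_LEVEL = "debug";
}
```

After a rebuild and a restart of the unit, any extra output should show up in the journal for that unit.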
@arianvp adding that logging didn't seem to get anything extra in the journal but I do have a paste with my current output (all of it so it's a bit long): https://gist.github.com/alunduil/8c3a8ad7ccc679fd12c20171377ddf54
If I can figure out the gdb debugging process reasonably well, I'll try that next but I will be pretty busy until next week.
Removing from 19.09 milestone since it's not a blocker for the release anymore.
For anyone else who hits this, the steps to recover are:
1. Add system.stateVersion = "19.03"; to your configuration
2. sudo nixos-rebuild switch
3. Remove system.stateVersion = "19.03"; again
4. sudo nixos-rebuild switch
You should see no more units listed in systemctl --failed.
With these instructions available, I'm comfortable marking this issue resolved, but I'll leave it up to others that are watching this issue.
Do leave stateVersion set however! Not setting it is what caused the original issue, and might very well cause future ones too.
I removed mine and subsequent updates worked as expected. If I need to permanently set the stateVersion then I'm worried about moving forward with the distribution. Is it assumed that when a new version is released I have to completely reinstall?
No need to fear getting stuck, as almost nothing changes based on it. Feel free to git grep stateVersion yourself. EDIT: note that Nix* is trying relatively hard to minimize any state.
Is there a set of best practices for stateVersion? Specifically, it'd be nice to know if it should always be set and should be part of the release channel update process. I haven't seen this recommendation in the past, but I'll look at the current upgrade documentation. It sounds like clarifying the use of this variable is the missing piece of this and related issues.
It's generated into your (first) configuration according to the current release, like this:
{
  # This value determines the NixOS release from which the default
  # settings for stateful data, like file locations and database versions
  # on your system were taken. It‘s perfectly fine and recommended to leave
  # this value at the release version of the first install of this system.
  # Before changing this value read the documentation for this option
  # (e.g. man configuration.nix or on https://nixos.org/nixos/options.html).
  system.stateVersion = "20.03"; # Did you read the comment?
}
My general installation practice writes over /etc/nixos/configuration.nix with a git checkout so, yes, I did miss this. Could we add this to the description on NixOS options? In that description it says this doesn't need to stick around and can be deleted (at least that's how I read it). This is also getting off-topic so let me know if another issue or other media would be appropriate.
I run into the issue at the end of nixos-rebuild switch. Putting system.stateVersion = "20.03"; into /etc/nixos/configuration.nix doesn't help.
nixos-version
20.03.2648.69af91469be (Markhor)
@p-alik if you originally installed 19.09 or earlier, use "19.09" for stateVersion.
@lheckemann, my system has been on 20.03 since April, but I ran into the issue for the first time yesterday.
As suggested in https://github.com/NixOS/nixpkgs/issues/31540#issuecomment-598838035 I've solved the issue by rm -f /var/lib/systemd/timesync