Describe the bug
Having managed two NixOS releases (the second almost complete), I have decided that it's going to be unsustainable not to have a set of criteria that release managers can verify. Validating that the release is ready is currently done by feel, and that's not going to be enough to offer a consistent experience. Releasing NixOS should be a clear and straightforward task every release.
The idea is to make a checklist of criteria that are blocking for the release, and also a process to identify a bug or issue as blocking (that needs to be handled outside of this issue). We then confirm this by holding a GO / NO-GO meeting where we discuss whether the criteria for the release are being met. Even better would be for a QA team to be formed that discusses its findings with the release team and developers.
What work has been done already
For 20.03 and 20.09 there have been GO / NO-GO meetings for the final milestone (the 20.09 one is tomorrow), and this was already a great improvement. I've also drafted WIP criteria, which I will circulate to future RMs to expand.
What inspired a lot of this
Most of this comes from my observation of the process Fedora uses to release. When they do something, people know about it, which is very encouraging. Fedora makes two releases a year, similar to us, but maintains them for longer for the convenience of desktop users. A good number of their criteria for releasing a desktop have been included in the WIP doc, with appropriate edits and additions.
What I need from you
I need to start a discussion about what needs to be a part of the criteria for releasing NixOS.
Two people just cannot figure out the breadth of the community's needs without a discussion :grin:
We can also open a good discussion about what has been missing from NixOS releases for a while.
One that has bugged me is artwork. The default wallpaper has been the same in desktops for way more than two releases.
New wallpapers for the release would make having a mascot a lot more meaningful than just a new logo that isn't available in the OS.
cc @jonringer
I agree, it definitely feels like "what could be an issue" when trying to think of when we can release. Having a set of criteria would go a long way toward aligning the community on similar goals. For ZHF, just saying "hey, look at failing hydra jobs, we'll try to get as many as we can" isn't the most focused stabilization effort; there's also no prioritization of efforts. Are you fixing a package that no one uses, or are you fixing something that affects a lot of users? We don't know.
Maybe we should introduce some package tiers, so that we can prioritize which packages to fix. An interesting metric to determine this might be how many maintainers there are for a given package. If you want to see a package maintained, then you should add yourself as a maintainer. Maybe also expand the "maintainer teams" concept to include a "core" team, so that some critical packages which are kind of "owned" by the community still have representation in this model.
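As a rough illustration of the tier idea, here is a minimal Python sketch that orders failing packages by a tier derived from maintainer count and team membership. Everything in it is an assumption made for illustration: the `Package` record, the `CORE_TEAM` name, and the tier thresholds are hypothetical and do not correspond to anything that exists in nixpkgs today.

```python
from dataclasses import dataclass, field

# Hypothetical team name; the comment above only proposes that a "core" team could exist.
CORE_TEAM = "core"


@dataclass
class Package:
    name: str
    maintainers: list[str] = field(default_factory=list)
    teams: list[str] = field(default_factory=list)


def tier(pkg: Package) -> int:
    """Return a priority tier, where a lower number means "fix first".

    The thresholds are illustrative assumptions, not an agreed policy.
    """
    if CORE_TEAM in pkg.teams:
        return 0  # community-"owned" critical packages
    if len(pkg.maintainers) >= 2:
        return 1  # several people have committed to the package
    if len(pkg.maintainers) == 1:
        return 2
    return 3  # no maintainer at all


def prioritize(failing: list[Package]) -> list[Package]:
    """Sort failing packages by tier, then by maintainer count, then by name."""
    return sorted(failing, key=lambda p: (tier(p), -len(p.maintainers), p.name))
```

Maintainer count is only a proxy for how many users a package affects, so a real scheme would probably want to combine it with other signals.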
Some "difficulties" I've had so far (these are semi-off-topic, but impacted the ability to review PRs or gain momentum for ZHF):
staging-next is "stable-enough"
Ofborg has had no darwin builder for weeks now: https://github.com/NixOS/ofborg/issues/529#issuecomment-690913092; GitHub Actions might be an alternative. (It's a discussion for a different thread, I believe.)
> I need to start a discussion about what needs to be a part of the criteria for releasing NixOS.
Overall, my opinion on these is perhaps disappointing. If no one fixes a thing, it won't work, regardless of whether we somehow defined it as a priority.
We have meta.maintainers who make a promise to keep a package in shape (if I simplify it); I do believe that includes QA. They should get notified when the package has problems, or that a release is coming so they can put in extra effort, though the release schedule has been very predictable for years. The field easily supports teams etc. (used e.g. for gnome IIRC). If the maintainers don't keep up, we most likely don't want a one-off fix but to find new/additional maintainer(s); we could have some announcement workflow for such occasions.
Perhaps we should somehow refine this maintainership concept of ours? We surely don't want "release criteria" to hold only during the moment a release is announced.
As for tiers, we already do have a higher tier: the channel-critical set of builds/tests. (Note: even tests support the maintainers field). Some people notice that it's relatively small. My take: if the respective maintainers (promise to) keep a package/test in very good shape and fix blockers _quickly_, by all means make it channel-blocking (perhaps except some very special cases). The -small channels could be thought of as an even higher tier, though they have a rather specific focus: quick propagation of security fixes to servers. In case the maintainers can't keep this promise, after some time there will obviously be no choice but to demote the package and keep the channel going, but that's just how it is.
I guess one important criterion should be tracking increases in closure bloat early in the release branch-off, as it has been a matter of contention recently. https://github.com/NixOS/nixpkgs/issues/98094
I would suggest tracking the size increase of the installer image, but that wouldn't suffice here, as this specific issue doesn't cause closure bloat for NixOS, only for users of Nix in standalone mode.
There are closure "tests" which don't have any hard limit IIRC, but you can have a look at the development of the value over time, e.g.: https://hydra.nixos.org/job/nixos/trunk-combined/nixos.closures.smallContainer.x86_64-linux#tabs-charts (these take a long time to load). I think they're handy when you need to find what caused a closure size regression.
I made a PR to check closure sizes in VM tests. This should help to catch unexpected closure size increases by turning them into a build-time error.
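To make the idea of a hard limit concrete, here is a minimal Python sketch of turning a closure size measurement into a failure. This is not the code from the PR; the function name, the exception, and the limit values are all hypothetical, and the measured size is assumed to come from elsewhere (for example from a closure-size query on the built store path).

```python
class ClosureTooLarge(Exception):
    """Raised when a closure exceeds its configured size limit."""


def check_closure_size(name: str, size_bytes: int, limit_bytes: int) -> None:
    """Fail loudly if the measured closure size is above the allowed limit.

    `size_bytes` is assumed to be measured by the caller; this function only
    applies the policy, so an unexpected increase becomes an error instead of
    a chart that someone has to remember to look at.
    """
    if size_bytes > limit_bytes:
        over_mib = (size_bytes - limit_bytes) / (1024 * 1024)
        raise ClosureTooLarge(
            f"{name}: closure is {size_bytes} bytes, "
            f"{over_mib:.1f} MiB over the limit of {limit_bytes} bytes"
        )
```

A limit like this needs to be raised deliberately when a size increase is intended, which is the point: the increase gets reviewed instead of slipping in unnoticed.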
I'm reminded of some old discussions about evaluation speed (#79943, #57477), the role of flakes, and the possibility of splitting nixpkgs into several semi-independently maintained flakes. I can imagine a world where nixpkgs contains only the core essential packages and modules, and users pull in flakes for the rest of their packages and modules. In this world, the release criteria could actually be zero hydra failures in nixpkgs.
I'm not saying this is necessarily the best solution, or that it is a feasible short-term goal, but it is food for thought. For example, perhaps we should have a set of "release" hydra jobs. NixOS isn't ready for release until the release jobs pass, and other jobs are just a bonus.
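A minimal Python sketch of that gating rule, assuming job results are available as a simple name-to-pass/fail mapping. The function name and the data shape are hypothetical; this does not use Hydra's actual API.

```python
def release_ready(results: dict[str, bool], release_jobs: set[str]) -> tuple[bool, list[str]]:
    """Decide GO / NO-GO from job results.

    Only jobs listed in `release_jobs` can block the release. A release job
    with no recorded result is treated as failing, so a missing job cannot
    silently pass. Failures in any other job are ignored here.
    """
    blocking = sorted(job for job in release_jobs if not results.get(job, False))
    return (not blocking, blocking)
```

The returned list of blocking jobs could double as the agenda for a GO / NO-GO meeting.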
Let's move that discussion elsewhere. It seems off-topic for this issue. I also do not see how flakes would fix closure bloat. They're probably more likely to introduce it accidentally than to get rid of it. Monorepos make getting rid of closure bloat easier, as you have one single source of truth: one version of a package. With flakes you potentially have endless sources of truth.