I have a nixpkgs branch where I have made a series of commits to upgrade Pharo from version 5 to version 6 (#26924). Is it possible to merge this branch directly into master, or does it need to be rebased first?
I am reluctant to rebase because this is a public branch, in the sense that I have published it in the open and made it available for other people to merge. If I rebase the branch then I run the risk of creating tricky conflicts for anybody else who has merged this branch (and indeed for myself on other branches where I have merged this change).
Generally, I'm looking for a Git workflow that allows me to merge topic branches before they have landed on master, without getting burned by rebases rewriting the commit IDs after I have already merged.
IMHO rebasing is not necessary (or even desirable for longer-lived branches, since it can obscure the real history). For example, we don't rebase the staging branch either.
Hey @lukego, good question.
> Is it possible to merge this branch directly into master, or does it need to be rebased first?
The important and tricky thing is that our contribution guidelines ask that 1 package upgrade = 1 commit. Looking at this specific history, I'd expect to see the following commits (if not all of them) squashed into the one "pharo: 5.0 -> 6.0" commit:
- pharo6: Minor fixes and cleanups
- pharo: Removed obsolete duplicate ofwrapper.sh
- pharo: More quoting
- pharo: Add missing file: vms.nix
That said, the important operation here isn't necessarily a rebase, but a squash.
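A squash needs no rebase machinery at all; here is a minimal runnable sketch in a throwaway repo (branch names and commit messages are illustrative, not the real PR history) using a soft reset to the merge base:

```shell
#!/bin/sh
# Hedged sketch: collapse a multi-commit topic branch into one commit.
set -e
cd "$(mktemp -d)"
git init -q
git checkout -q -b master
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m "base"
git checkout -q -b pharo6
echo 6.0 > version
git add version
git -c user.email=a@b -c user.name=a commit -q -m "pharo: 5.0 -> 6.0"
echo quoting >> version
git add version
git -c user.email=a@b -c user.name=a commit -q -m "pharo: More quoting"
# Move the branch pointer back to the merge base; the combined changes
# stay staged, so they can be recorded as a single commit.
git reset -q --soft "$(git merge-base pharo6 master)"
git -c user.email=a@b -c user.name=a commit -q -m "pharo: 5.0 -> 6.0"
git log --oneline master..pharo6   # the branch now carries one commit
```

(`git rebase -i` with `fixup` lines achieves the same result interactively.)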
> If I rebase the branch then I run the risk of creating tricky conflicts for anybody else who has merged this branch (and indeed for myself on other branches where I have merged this change).

> Generally, I'm looking for a Git workflow that allows me to merge topic branches before they have landed on master, without getting burned by rebases rewriting the commit IDs after I have already merged.
I'm not certain how to handle this, but it certainly seems somewhat at odds with our contribution guideline of 1 update being 1 commit :(. It is also at odds with NixOS in another important way: if your branch somehow gets out of sync with nixpkgs near the "core" (stdenv), the binary cache won't be effective for you anymore.
Personally, I rebase my public PRs constantly, responding to feedback and improving them mostly 'in place'. That said, I have no interest in maintaining my own branch for my own use, nor do I consider my branches long-running in any way. I would be surprised if anyone was pulling my branches for incorporation into their own branch.
I consider PRs to be more like sending patches over email: they don't exist in a "long-running branch" because they're just in email. They only become fixed in time and immutable once they get pulled by the maintainer and incorporated into an upstream long-running branch. It is just (IMO) a bit of an accident that GitHub put all our emails out on the public stage as public branches :)
One option here is that I can do the squashing of your commits and push the single change to master. It will then say "Author: lukego, Committer: grahamc". I'm not sure that solves anything but your own branch not being rebased, though.
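For illustration, a hedged sketch of that squash-and-push in a throwaway repo (names and emails are illustrative): `git merge --squash` stages the branch's combined changes, and `--author` keeps the contributor as Author while the maintainer is recorded as Committer:

```shell
#!/bin/sh
# Hedged sketch: maintainer squashes a contributor's branch, preserving
# authorship while taking the committer role.
set -e
cd "$(mktemp -d)"
git init -q
git checkout -q -b master
git -c user.email=l@x -c user.name=lukego commit -q --allow-empty -m "base"
git checkout -q -b pharo6
echo 6.0 > version
git add version
git -c user.email=l@x -c user.name=lukego commit -q -m "pharo: 5.0 -> 6.0"
git checkout -q master
# Squash the branch into the index, then commit as the maintainer while
# crediting the original author:
git merge -q --squash pharo6 >/dev/null
git -c user.email=g@x -c user.name=grahamc commit -q \
  --author="lukego <l@x>" -m "pharo: 5.0 -> 6.0"
git log -1 --format='Author: %an, Committer: %cn'
```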
What do you think?
Taking a big step back for a moment if I may...
I’d say the fundamental question is whether nixpkgs uses a _distributed_ development model or requires a _centralized_ one.
In the distributed model every branch is first-class and can merge with any other branch. This is what Linus advocated for in his Google talk on Git and this is the way Linux works. If you want to work this way then you need to be careful to follow special rules to preserve history in the form that Git needs to perform merges effectively. This can be summarised as “never squash or rebase a commit that has been made available to merge.” (You can still squash/rebase changes that you have not published, or that you have published with a suitable “don’t merge this branch yet” disclaimer.)
In the centralized model the master branch (and a few others) is special. Commits are considered ephemeral drafts until they land on master, at which point they become permanent and available to merge. This means you can freely squash/rebase any branch that has not been merged into master - it’s only a draft. It also makes master a bottleneck where changes have to land before they can be more widely adopted.
That’s all to a first approximation. There is a bit of wiggle room in each model, but basically bad things happen if you try to do both at the same time, e.g. tricky merge conflicts where Git has lost track of which changes should take precedence.
So - I’d say the big question is whether nixpkgs is using the distributed or the centralized model. The squash/rebase policy should follow from this decision.
> I have no interest in maintaining my own branch for my own use, nor do I consider my branches long-running in any way.
I'm in the opposite situation. I want to have my own "production" branch that is conservative about merging "background" changes (e.g. pulling from nixpkgs master) but is eager to merge especially relevant changes (e.g. bleeding edge packages for software that I am packaging myself.)
This is simple in the distributed model. I make my changes on topic branches based on the latest stable release of nixpkgs (common ancestor of both master and my own production branch) and then I both PR that upstream and, without waiting, merge it directly into my own production branch. Later the change will land on master, and I will pull that, and Git will take the commit-id into account when making the merge (important if there have been follow-on changes on master and/or my branch.)
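A minimal runnable demonstration of this, in a throwaway repo with illustrative branch names: because the same commit ID reaches production both directly and later via master, the final pull merges cleanly rather than re-resolving the change:

```shell
#!/bin/sh
# Hedged sketch of the distributed workflow: topic branch off the common
# ancestor, merged into production early and into master later.
set -e
cd "$(mktemp -d)"
g() { git -c user.email=a@b -c user.name=a "$@"; }
git init -q
git checkout -q -b master
g commit -q --allow-empty -m "release base"        # common ancestor
git checkout -q -b pharo6
g commit -q --allow-empty -m "pharo: 5.0 -> 6.0"   # the topic commit
git checkout -q -b production master
g merge -q --no-ff -m "merge pharo6 into production" pharo6  # merge early
git checkout -q master
g merge -q --no-ff -m "merge pharo6 into master" pharo6      # lands upstream
git checkout -q production
# Both histories contain the same commit ID, so this merge is clean:
g merge -q -m "pull master" master
git merge-base --is-ancestor pharo6 production && echo ok
```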
This is hard in the centralized model though. The difference is that when I pull my own change from master it will have a new Git history and Git will have no idea how to resolve any differences that it finds. So I will have to manually resolve any conflict somehow which is time consuming and error prone.
So, it's not my decision to make, but from my perspective, if I can't trust my commit IDs to be stable then it is a disincentive to push changes upstream, because the benefit of sharing development with other people is offset by the challenge of keeping in sync with their changes.
_unfortunately I can't provide much of a reply at this time, but wanted to make a very brief suggestion_
Have you looked at a git cherry-pick-based workflow? I believe it is designed for this sort of situation.
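For example, a minimal sketch in a throwaway repo (names are illustrative): instead of merging the topic branch, `git cherry-pick -x` copies the change onto the production branch and records the original commit ID in the message, so the change stays traceable even if upstream later rewrites its history:

```shell
#!/bin/sh
# Hedged sketch of a cherry-pick based workflow.
set -e
cd "$(mktemp -d)"
g() { git -c user.email=a@b -c user.name=a "$@"; }
git init -q
git checkout -q -b master
g commit -q --allow-empty -m "base"
git checkout -q -b pharo6
echo 6.0 > version
git add version
g commit -q -m "pharo: 5.0 -> 6.0"
upgrade=$(git rev-parse pharo6)
git checkout -q -b production master
# -x appends "(cherry picked from commit ...)" to the message, so the
# origin of the change stays visible even after upstream squashes/rebases.
g cherry-pick -x "$upgrade" >/dev/null
git log -1 --format=%s    # the upgrade is now on production
```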
> I consider PRs to be more like sending patches over email: they don't exist in a "long-running branch" because they're just in email. They only become fixed in time and immutable once they get pulled by the maintainer and incorporated into an upstream long-running branch. It is just (IMO) a bit of an accident that GitHub put all our emails out on the public stage as public branches :)
Thanks for clarifying this. I read it too quickly the first time and did not take on board what you were saying.
This is the centralized workflow. The upstream master branch is special, its history is the one that counts, and all other histories are ephemeral. This implies that people should only merge other branches, like my pharo6 branch, if they are prepared to manually resolve any conflicts with the version that will later arrive via master. (If people are not prepared to deal with conflicts manually then they will need to wait for changes to propagate via master so that they have acquired a stable history that git can use to merge automatically.)
The Git DMZ Flow makes an interesting contrast if you are feeling philosophical.
As another datapoint, I keep a list of commits that are cherry-picked into a special branch which is created by branching off the relevant channel (stable for servers, unstable for workstations). Those special production branches are regenerated regularly.
It's a very simple shell script but it does the trick.
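I don't know the actual script, but a hypothetical sketch of this kind of regeneration, demonstrated in a throwaway repo (the channel branch and the picks file are illustrative stand-ins):

```shell
#!/bin/sh
# Hedged sketch: recreate a production branch from the channel, then
# replay a pinned list of cherry-picked commits on top.
set -e
cd "$(mktemp -d)"
g() { git -c user.email=a@b -c user.name=a "$@"; }
git init -q
git checkout -q -b nixos-unstable            # stands in for the channel
g commit -q --allow-empty -m "channel snapshot"
git checkout -q -b fixes
echo patched > fix.txt
git add fix.txt
g commit -q -m "my local fix"
git rev-parse fixes > picks.txt              # the pinned list of commits
# The regeneration itself: reset production to the channel, then replay
# each pinned commit with cherry-pick.
git checkout -q -B production nixos-unstable
while read -r c; do g cherry-pick -x "$c" >/dev/null; done < picks.txt
git log -1 --format=%s
```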
Thank you for your contributions.
This has been automatically marked as stale because it has had no activity for 180 days.
If this is still important to you, we ask that you leave a comment below. Your comment can be as simple as "still important to me". This lets people see that at least one person still cares about this. Someone will have to do this at most twice a year if there is no other activity.