Git: Rebase drops commit that was previously part of upstream, but no longer is

Created on 1 Mar 2017 · 27 Comments · Source: git-for-windows/git

  • [x] I was not able to find an open or closed issue matching what I'm seeing

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
$ git --version --build-options
git version 2.10.2.windows.1.895.g8126884
sizeof-long: 4
machine: x86_64
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.14393]
  • What options did you set as part of the installation? Or did you choose the
    defaults?
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

Path Option: BashOnly
SSH Option: OpenSSH
CRLF Option: CRLFCommitAsIs
Bash Terminal Option: ConHost
Performance Tweaks FSCache: Enabled
Enable Symlinks: Disabled
  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

Nothing I can think of.

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

Reproducible regardless of where it's being run from.

git checkout -b parent
touch child.txt
git add child.txt
git commit -m'Intended for child'
git checkout -b child --track
git checkout parent
git reset --hard HEAD^
touch parent.txt
git add parent.txt
git commit -m'Intended for parent'
git checkout child
git rebase
git log --oneline --decorate -2
  • What did you expect to occur after running these commands?

Both 'parent.txt' and 'child.txt' should exist. Output from the last log command should be:

<commit hash> (HEAD -> child) Intended for child
<commit hash> (parent) Intended for parent
  • What actually happened instead?

Only parent.txt exists. Output from the last log command begins with:

<commit hash> (HEAD -> child, parent) Intended for parent
  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?

Reproducible with any repository.

git-upstream question

All 27 comments

I tried the rebase with -i to see what it said. I had also started a gitk & session alongside to see how it progressed. The repo creation felt a bit hacky with the --track and other swaps, which meant a few information messages popped up that made me wonder if there was finger trouble.

The rebase -i gave a noop as the action in the todo list, rather than a pick of the commit.
# Rebase 4b14903..a897d5f onto 4b14903 (1 command(s))

4b14903 is 'intended for parent'
a897d5f is 'intended for child'

So it looks like git doesn't spot that it needs another path (of 'child.txt') for the common empty file (i.e. same blob id) when doing the creation of the todo list, and onwards to rebasing that extra commit.

The example is a rather contrived one, just for purposes of providing an MCVE, but the same thing occurs with any kind of commit in place of "intended for parent" and "intended for child". I.e. it doesn't have to be a file addition, nor does it need to touch the same files/directories. The workflow I keep hitting this with is:

  • The branch I've got checked out is a remote-tracking branch, the main shared branch for my team
  • I look at a problem and start some experimental hacking
  • I commit a few things forgetting I never created a topic branch
  • Later I realize I need a topic branch, create one, then roll the remote-tracking branch back to where it was before my changes
  • I pull others' changes from upstream, then rebase my topic branch on top => boom, the changes I initially committed on the remote-tracking branch go missing.

It also doesn't have to be just one commit. Just before submitting this bug, I actually had 3 consecutive commits disappear in this manner. (Easy to resurrect using the reflog, of course, once you notice something's missing.) Don't ask me why it took me 3 commits to realize I'm on the wrong branch though :)
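For anyone who needs the recovery step spelled out, here is a minimal sketch of resurrecting the dropped commit from the reflog. It assumes a throwaway repository created with mktemp; the helper commit_file and the identity settings are made up for illustration, and it relies on child@{1} being the pre-rebase tip, which holds for this exact sequence but should be checked with git reflog in a real repository.

```shell
#!/bin/sh
# Sketch: reproduce the dropped commit, then restore it from the branch reflog.
# Repository location, identity and helper names are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/parent   # start on a branch named "parent"
git config user.name "Example User"
git config user.email "user@example.com"

commit_file() {
    # Create a file named "$1.txt" and commit it with message "$2".
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$2"
}

commit_file base 'Base'
commit_file child 'Intended for child'
git checkout -q -b child --track parent
git checkout -q parent
git reset -q --hard HEAD^
commit_file parent 'Intended for parent'
git checkout -q child

git rebase -q                      # implicit --fork-point drops the commit
if [ ! -f child.txt ]; then
    echo "child.txt is gone after the rebase"
fi

# The previous tip of "child" is still recorded in the branch reflog.
lost=$(git rev-parse 'child@{1}')
git cherry-pick "$lost" >/dev/null

git log --oneline -2
```

With several dropped commits, the same idea applies with a range given to git cherry-pick instead of a single commit.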

I think this is by design. Details are in git help rebase: RECOVERING FROM UPSTREAM REBASE section and rebase.missingCommitsCheck configuration option.

@radrik5: isn't it the same noop issue as I had?

@webmaster33, no, it isn't. Your issue is not reproducible and you said it depends on version of Git Extensions.

@webmaster33, no, it isn't. Your issue is not reproducible and you said it depends on version of Git Extensions.
@radrik5, It might be better to say that there are similarities, and that we now have a test case that can be used to expand on the issue. Maybe @webmaster33 now has a better idea of the hidden aspects (at that time) that may allow a reproducible MCVE to be generated there.

I had the view that the upstream was set the right way, and that the child branch was being rebased onto its upstream, which is what rebase should be doing (IIUC for the empty parameter case). (Not had time to re-check - life got in the way ;-)

It may be that the manual is unclear, etc. If the code is ideally correct, then the presence of two separate reports implies the docs could be that bit clearer.

OK, so a bit more investigation:
[Note that I have always used the full cli params with rebase, so this is a bit of digging]

The problem would appear to be the --fork-point option which, to quote, says:

If either <upstream> or --root is given on the command line, then the default is --no-fork-point, otherwise the default is --fork-point.

So in this case you are operating with --fork-point. "fork_point is the result of git merge-base --fork-point <upstream> <branch>".

..does not just look for the common ancestor of the two commits, but also takes into account the reflog of <ref> to see if the history leading to <commit> forked from an earlier incarnation of the branch <ref> (see discussion on this mode..

With that in place, I believe you get the behaviour seen. The fork point is the tip of child itself, so there is nothing to move. QED.

When I run git rebase --no-fork-point -i, I see

pick aad734c Intended for child

# Rebase c1b6e62..aad734c onto c1b6e62 (1 command(s))

which is what you expected.
One learns something new every day.
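To see the difference between the two bases directly, a minimal sketch along these lines may help. It rebuilds the MCVE in a throwaway repository (the mktemp location, the identity settings and the commit_file helper are assumptions for illustration) and prints the plain merge base next to the reflog-aware fork point.

```shell
#!/bin/sh
# Sketch: compare the plain merge base with the reflog-aware fork point.
# Repository location, identity and helper names are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/parent   # start on a branch named "parent"
git config user.name "Example User"
git config user.email "user@example.com"

commit_file() {
    # Create a file named "$1.txt" and commit it with message "$2".
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$2"
}

commit_file base 'Base'
commit_file child 'Intended for child'
git checkout -q -b child
git checkout -q parent
git reset -q --hard HEAD^
commit_file parent 'Intended for parent'

plain_base=$(git merge-base parent child)
fork_point=$(git merge-base --fork-point parent child)
child_tip=$(git rev-parse child)

echo "plain merge base: $(git log -1 --format=%s "$plain_base")"
echo "fork point:       $(git log -1 --format=%s "$fork_point")"
```

The plain merge base is the 'Base' commit, while the fork point is the 'Intended for child' commit that the reflog of parent still remembers, i.e. the tip of child, which is why the todo list ends up empty.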

@PhilipOakley fantastic, --fork-point is exactly what is happening. My hunch was that there must be some kind of cache somewhere that doesn't get invalidated when I reset the parent branch. Turns out the "cache" is the reflog - makes perfect sense.

This gives me an easy workaround, easier than cherry-picking from the reflog: just use --no-fork-point.
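As a rough sketch of that workaround, the MCVE can be rerun in a throwaway repository with the flag added. The mktemp location, the identity settings and the commit_file helper are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: the same scenario as the MCVE, rebased with --no-fork-point.
# Repository location, identity and helper names are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/parent   # start on a branch named "parent"
git config user.name "Example User"
git config user.email "user@example.com"

commit_file() {
    # Create a file named "$1.txt" and commit it with message "$2".
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$2"
}

commit_file base 'Base'
commit_file child 'Intended for child'
git checkout -q -b child --track parent
git checkout -q parent
git reset -q --hard HEAD^
commit_file parent 'Intended for parent'
git checkout -q child

# Ignore the reflog of the upstream branch when picking the base.
git rebase -q --no-fork-point

git log --oneline --decorate -2
```

Going by the documentation quoted earlier, naming the upstream explicitly (git rebase parent) should have the same effect, since that also defaults to --no-fork-point.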

All that being said, I'm not quite sure the behaviour is ideal. Looking at the full output from the last few commands in the MCVE:

$ git checkout child
Switched to branch 'child'
Your branch and 'parent' have diverged,
and have 1 and 1 different commits each, respectively.

$ git rebase
First, rewinding head to replay your work on top of it...

... and applying nothing - I find that very counter-intuitive, and especially because that's not how things worked before I switched to Git for Windows. (I used cygwin-git before, which is quite a bit older than the latest git version - maybe it just hadn't caught up to where --fork-point was introduced yet? Just guessing here.)

Do we know why --fork-point is the default (in any of the cases)? It sounds to me as if --no-fork-point should always be the default, but maybe I just can't think of a scenario where defaulting to --fork-point makes better sense.

Having checked the documentation of git merge-base --fork-point including the discussion section at the bottom, it sounds like --fork-point does "the right thing" when you rewrite upstream history in such a way that the changes committed between the new merge base and the fork point are still part of the parent branch, except the commits themselves are no longer identical (e.g. due to conflict resolution or fixups).

I can't really guess whether that's more common than the workflow presented here, but I would argue that rewriting history like that is certainly more advanced than just resetting a branch to an earlier commit, and therefore always having --no-fork-point as the default makes more sense to me. (It's also easier to explain in the docs.)

If that's no longer an option (e.g. because of backwards compatibility reasons) then some configurability would be nice - something like rebase.useForkPoint which people like me could set to "never".
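Note for later readers: newer Git releases do ship a setting along these lines, named rebase.forkPoint rather than rebase.useForkPoint (added around Git 2.31, if I recall correctly; git help config on your installation is the authority). A minimal sketch, using a throwaway repository so that no real configuration is touched:

```shell
#!/bin/sh
# Sketch, assuming a Git release that honours the rebase.forkPoint setting.
# The repository location is an illustrative assumption.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Per-repository setting; add --global to apply it to every repository.
git config rebase.forkPoint false

git config --get rebase.forkPoint
```

On versions that do not know the setting, it is stored but has no effect, so passing --no-fork-point explicitly remains the safe fallback.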

I think the divergence point was the git checkout -b child --track command, which comes after the commit intended for the child branch was created (with similar issues for the other cases where one has already created commits on the wrong branch...).

The alternative to code changes is improved documentation, but then this is Git for Linux where code rules :wink:

I think the divergence point was the git checkout -b child --track command, which comes after the commit intended for the child branch was created.

For clarification, could you paste the output of git log --graph --oneline --left-right --decorate child...master?

I'm not at my regular machine, but the graph (at the rebase point) is very simple.

m -- p
 \
  -- c

master, parent, child commits

The labelling of the files doesn't help - child.txt is added onto the original parent branch, which is later reset hard, but the child branch is created just before that, tracking that pre-reset point of parent (where child exists, [see reflog?]). Eventually we ask rebase (with --fork-point implicitly enabled) to bring across any newer stuff onto the updated parent. But (a) parent had been reset hard losing child.txt, and (b) child has nothing new since the fork-point [as I recall].

The historic (backward compatibility) semantics associated with the pre-remote era is, I think, catching folks.
[reflog: I'm expecting that the reflog points at a sha1 that could actually be garbage collected!]

As I said earlier, the example is contrived in an effort to make it minimal. The general scenario is (with a graph after each step):

  • Perform any number of git commits
O - A - B - C (master)
  • git checkout -b topic --track
O - A - B - C (master, topic)
  • git checkout master; git reset --hard origin/master
O (master)
 \
  A - B - C (topic)
  • Optionally perform further commits on the topic branch
O (master)
 \
  A - B - C - D - E - F (topic)
  • Optionally pull in changes to master from upstream
O - P - Q - R (master)
 \
  A - B - C - D - E - F (topic)
  • Rebase topic without specifying any parameters to rebase (so it defaults to --fork-point) or just do it via git pull --rebase:
O - P - Q - R (master)
             \
              D' - E' - F' (topic)

A-B-C are gone because --fork-point identifies in the reflog that topic was created at C, and assumes that everything before C is still part of topic's upstream even if the exact commits are no longer present there. (Which is not the case, although admittedly it could be if upstream history was rewritten in a non-trivial way.)
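One way to preview which commits a parameterless rebase would replay is to list them against the fork point before rebasing. The following is only a sketch of the scenario above in a throwaway repository: the make_commit helper, the identity settings and the use of HEAD~3 as a stand-in for origin/master are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: list the commits a rebase would replay with and without the fork point.
# Repository location, identity and helper names are illustrative assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

make_commit() {
    # Create a file named "$1.txt" and commit it with message "$1".
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

make_commit O
make_commit A
make_commit B
make_commit C
git checkout -q -b topic --track master
git checkout -q master
git reset -q --hard HEAD~3        # stand-in for: git reset --hard origin/master
make_commit P                     # stand-in for changes pulled from upstream
git checkout -q topic
make_commit D

fork_point=$(git merge-base --fork-point master topic)

# Commits replayed when the reflog-derived fork point is used as the base.
with_fork_point=$(git log --reverse --format=%s "$fork_point"..topic | xargs)
# Commits replayed when only the commit graph is considered.
without_fork_point=$(git log --reverse --format=%s master..topic | xargs)

echo "with --fork-point:    $with_fork_point"
echo "with --no-fork-point: $without_fork_point"
```

If the first list is shorter than the second, the difference is exactly the set of commits that would silently go missing.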

Looks like there has been some discussion about the same problem on StackOverflow here and here.

@kisslas sorry, I was unclear: I am knee deep in other issues so I can only use half a brain on this ticket, so I was actually asking for a simpler graph, not several more complicated ones... :sweat_smile:

[the] graph is very simple.

m -- p
 \
  -- c

master, parent, child commits

@PhilipOakley: thanks for your clarification, but that is not the entire truth... the entire truth would be:

  a - m
   \
     m' - c

where m' has been a previous state of m, and is therefore omitted from the rebase (rationale: it had been part of that branch, but somebody then decided that it should not be part of it).

Yes, I fear that this is working as designed...

@dscho: no, that graph is the entire truth, and that's exactly the problem. In fact you can even simplify it down to

O (master)
 \
  C (topic)

and run a git rebase only to see C disappear. Just run this, assuming you're on master:

git commit --allow-empty -m'This will disappear'
git checkout -b topic --track
git checkout master
git reset --hard HEAD~
git checkout topic
git rebase

@PhilipOakley's digging has established that this is working as designed - my understanding is that we are now discussing whether the design is correct.

@PhilipOakley's digging has established that this is working as designed - my understanding is that we are now discussing whether the design is correct.

I'd say we want to be discussing what needs to be done to the _documentation_ to make this effect clearer / easier to understand.

The only finesse I'd add for coding would be for the -i / --interactive mode where I'd try (if I had time) to improve the info message in the todo script to say that the --fork-point had used the reflog to determine the right base (to give the reader fair warning, should they read the message)

no, that graph is the entire truth, and that's exactly the problem.

As per @PhilipOakley's analysis, there is a fork point (in the reflog), and that is also part of the entire truth. So that graph you showed was missing a crucial detail.

my understanding is that we are now discussing whether the design is correct.

If you want to discuss that design, you need to move the discussion to the Git mailing list.

The only finesse I'd add for coding would be for the -i / --interactive mode where I'd try (if I had time) to improve the info message in the todo script to say that the --fork-point had used the reflog to determine the right base (to give the reader fair warning, should they read the message)

That may be easier once the rebase-i-extra branch hits upstream Git. I am looking in particular at this commit: https://github.com/dscho/git/commit/5053647e2824c67e22d00c93581f0b3caa823cca

I tried to explain this issue in this other issue: #1072. As you can see, I noticed this issue as well.

However, this issue has also explained some other things. In that case I can separate the other bug that you explained here from that issue.

As per @PhilipOakley's analysis, there is a fork point (in the reflog), and that is also part of the entire truth. So that graph you showed was missing a crucial detail.

Hmm, quite true. But again, that's exactly the (design) problem: the fork point is not represented in the graph, and the fact that something in the reflog should be crucial violates my mental model. In my ~5 years of using git every day, I've never had to consider any information other than the commits themselves and their parent-child relationships to work out what a command I'm about to run will do. Therefore the fact that in this one unusual case git relies on the reflog (which I've always viewed as auxiliary information for purposes of mining history or recovering from mistakes, and until now, rightly so) to automagically do something I don't want is confusing.

(Note although I'm saying "I" I'm no longer really talking about myself, but most other users; I'm pretty sure I'll never fall into this trap again, and I'll notice if I do.)

Anyway...

If you want to discuss that design, you need to move the discussion to the Git mailing list.

Thanks for the pointer, that's what I shall do.

If you want to discuss that design, you need to move the discussion to the Git mailing list.

Thanks for the pointer, that's what I shall do.

Perfect.

I tried to explain this issue in this other issue: as you can see I noticed this issue as well.

#1072

No, that report is about an entirely different issue: in #1072, you report that a combination of interactive GPG failing and rebase -i then incorrectly suggesting to amend the commit squashes everything into a single commit, while here, in #1076, the issue is that a local reflog has unintuitive consequences on the interactive rebase.

Here's a link to the email sent to the Git mailing list in case anyone wants to follow (now or in future).

Thanks!
