This appears to be a common problem:
and I've tried everything (that I'm allowed to do... I don't have Administrator access and will never get it). I'm using Windows 7 64 bit and have no control over changing that.
All the fs cache options mean that a git status is relatively fast (maybe a few seconds) on a humongous codebase, but still nothing compared to what I'm used to on GNU/Linux. Anything like an interactive rebase (even to squash two trivial commits) can take as long as a minute.
Looking at the Task Manager, I can see lots and lots of forked git subprocesses. This may be the cause of the performance problems: I understand this is very slow in Windows.
Would it be a major piece of work to redesign the subprocess execution to be in-process? Even if it meant doing everything in serial.
I'm considering alternatives, possibly even contributing to http://www.eclipse.org/jgit/ as an alternative, as it does not require any subprocess forking. I suspect that the performance cost of the JVM startup (and loss of C-level optimisations) is insignificant compared to the process forking overhead on Windows.
> This appears to be a common problem:
Yes, it is known, and there was already somebody working on it, but he decided that he ran out of time: https://github.com/git-for-windows/git/pull/461
> and I've tried everything (that I'm allowed to do... I don't have Administrator access and will never get it).
Nope, you did not try to rewrite the shell script that implements interactive rebase in C.
:smile: I should rephrase that as "I tried everything sane". But I'm very happy to hear that insane people exist. I'd be willing to fund this work if it is small enough.
> I'd be willing to fund this work if it is small enough.
It is not small.
Just for comparison: the latest Google Summer of Code sponsored the shell script -> C conversion for git pull and git am, i.e. two shell scripts. Google paid the student $4,500 in total IIRC. The rebase functionality consists of _four_ shell scripts.
@dscho good to get some numbers on it!
BTW, there is a new GSOC coming up, is there any interest from capable students for the final four scripts? I'd be willing to fund one, maybe two, if there is a shortfall.
This has been marked as an upstream matter. Is there a ticket upstream that I can subscribe to?
> Is there a ticket upstream that I can subscribe to?
No.
@dscho I'm investigating funding possibilities. Would you be able to find somebody who would be willing to work on this?
what about perl instead of native? Might be a lot easier to do that first and get some quick / easy wins.
> what about perl instead of native? Might be a lot easier to do that first and get some quick / easy wins.
Wishful thinking! Our Perl is using a POSIX emulation layer, so it is not a speed demon. And you would still have to call Git pretty much as many times as the shell scripts do, and those spawned processes are what kills performance.
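To get a rough feel for the per-process cost described above, here is a minimal sketch (not part of Git, and not a rigorous benchmark). It times a trivial child process against a trivial in-process call. The function names `seconds_per_spawn` and `seconds_per_call` are made up for illustration, and the child is a Python interpreter rather than Git, so the figure includes interpreter startup.

```python
import subprocess
import sys
import time


def seconds_per_spawn(runs: int = 20) -> float:
    """Average wall-clock seconds to start and wait for one trivial child process."""
    start = time.perf_counter()
    for _ in range(runs):
        # The child does no work, so the time measured is dominated by
        # process creation, startup, and teardown.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


def seconds_per_call(runs: int = 20) -> float:
    """Average wall-clock seconds for one trivial in-process call, as a baseline."""
    start = time.perf_counter()
    for _ in range(runs):
        len("pass")
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    spawn = seconds_per_spawn()
    call = seconds_per_call()
    print(f"per spawn: {spawn * 1000:.2f} ms, per in-process call: {call * 1000:.5f} ms")
```

If `git` is on the `PATH`, replacing the command with `["git", "--version"]` should give a number closer to what a shell script pays each time it calls Git. Multiplying that by the hundreds of calls a scripted rebase may make gives a rough idea of the overhead.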
Hi @dscho, @fommil,
I'm planning on participating in GSoC again this year. I'm interested in tackling this challenge if Git applies as well :-)
For now, I'm working on a prototype to see if this will be feasible during the GSoC timeframe. I'll send a WIP PR to this repo when I get the basic structure done.
Thanks.
(Watching with excitement)
For quick/easy wins, I found replacing /usr/bin/sh with busybox cut a ~45 second git subtree split to ~25 seconds. http://frippery.org/busybox/
@npostavs those might be quick performance wins, but at what price? Last time I tried, I could not even build a 64-bit version of BusyBox, let alone pass Git's test suite. So it might be faster, but at least at the moment, it is also incorrect.
About slow rebase: Before re-implementing rebase in C, I suggest using cherry-pick where possible.
> Last time I tried, I could not even build a 64-bit version of BusyBox
I tried now and my initial attempt for 64-bit compile failed as well; probably not worth pursuing as the speedup isn't big enough to be worth spending a significant amount of time on.
@linquize that doesn't help, because that would involve an extreme amount of manual intervention, plus cherry-pick is also really slow on Windows :-/ (which is a fundamental problem no matter what we do)
@fommil
> plus cherry-pick is also really slow on Windows
Eh? How slow is it, and how many commits are we talking about? It seems fast enough to me, especially compared to interactive rebase.
Take a big repository, e.g. the Linux kernel, and squash some commits. That can take 1-2 minutes. Rebasing (without conflicts) can take as long.
I submitted a barebones git-rebase-in-C patch series to the gitml. A rewrite would likely lead to a speedup of 1.7x-13x on Windows with medium-sized repos like git.git.
Any news on that pull request ? ;-)
I understand that somebody submitted a gsoc on this topic, so we can but hope.
> I understand that somebody submitted a gsoc on this topic, so we can but hope.
@fommil unfortunately, @pyokagan decided not to apply this year.
As to faster interactive rebase: I am working on it (yes, it is a mess right now, but the commits are clearly marked). It is slow going because it is such a big task, and quite frankly, I would much prefer actual help to cheers from the peanut gallery :grinning:
As to @pyokagan's patch series, there have not been updates since the first round received tons of feedback.
I would also like to caution that I have seen no evidence for the claimed speedup. This is actually something with which users _who are really interested in a faster interactive rebase_ (read: who are prepared to put in some time and effort themselves) can easily help. Git's test suite has a very special place for performance tests, and its notable lack of rebase testing is an obvious place to start contributing: https://github.com/git-for-windows/git/tree/master/t/perf
Hi @dscho,
> As to @pyokagan's patch series, there have not been updates since the first round received tons of feedback.
Yes, as I've mentioned in the mailing list thread I'll wait for your reworking of sequencer first. This is also the reason why I've not applied this year.
> I would also like to caution that I have seen no evidence for the claimed speedup.
Hmm, is the cover letter from my patch series not enough evidence?
Before patch series:
Test                                 this tree
----------------------------------------------------
3400.2: rebase --onto master^        10.90(0.06+0.47)
3402.2: rebase -m --onto master^     86.87(0.04+0.47)
3404.2: rebase -i --onto master^     191.65(0.09+0.44)
After patch series:
Test                                 this tree
----------------------------------------------------
3400.2: rebase --onto master^        6.45(0.13+0.40)
3402.2: rebase -m --onto master^     12.32(0.13+0.40)
3404.2: rebase -i --onto master^     14.16(0.15+0.40)
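For reference, the speedup ratios implied by the two tables above can be worked out with a short sketch like the one below. It assumes the first number on each row is elapsed wall-clock seconds and the parenthesised pair is user+sys time. The helpers `parse` and `speedups` are hypothetical names for this illustration, not part of Git's perf tooling.

```python
import re

BEFORE = """\
3400.2: rebase --onto master^ 10.90(0.06+0.47)
3402.2: rebase -m --onto master^ 86.87(0.04+0.47)
3404.2: rebase -i --onto master^ 191.65(0.09+0.44)
"""

AFTER = """\
3400.2: rebase --onto master^ 6.45(0.13+0.40)
3402.2: rebase -m --onto master^ 12.32(0.13+0.40)
3404.2: rebase -i --onto master^ 14.16(0.15+0.40)
"""

# One row: "<test name> <elapsed>(<user>+<sys>)" (assumed layout).
ROW = re.compile(r"^(?P<name>.+?)\s+(?P<real>\d+\.\d+)\([\d.]+\+[\d.]+\)$")


def parse(report: str) -> dict:
    """Map each test name to its elapsed time in seconds."""
    timings = {}
    for line in report.splitlines():
        match = ROW.match(line.strip())
        if match:
            timings[match["name"]] = float(match["real"])
    return timings


def speedups(before: str, after: str) -> dict:
    """Ratio of old to new elapsed time for every test present in both reports."""
    old, new = parse(before), parse(after)
    return {name: old[name] / new[name] for name in old if name in new}


if __name__ == "__main__":
    for name, ratio in speedups(BEFORE, AFTER).items():
        print(f"{name}: {ratio:.2f}x")
```

With the numbers quoted above, this gives roughly 1.7x, 7.1x and 13.5x, which is where the "1.7x-13x" range comes from. It says nothing about whether the two versions did the same amount of work.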
@pyokagan :sob:
> This is also the reason why I've not applied this year.
I hope you did not misunderstand my comments as indicating that you should not apply this year.
> > I would also like to caution that I have seen no evidence for the claimed speedup.
>
> Hmm, is the cover letter from my patch series not enough evidence?
It is a bit hard to believe those numbers, given that the cherry-pick call is still spawned and given that there are quite a few missing pieces (e.g. rewritten-list is not generated, patches are not generated in case of a failed pick, neither is autostash respected, etc). I would have been convinced by a perf test that I can run locally...
I have seen evidence that interactive rebases are sped up by having a perl rewrite of the script, such that there are no spawned processes. But that code was proprietary and only a proof of concept (plus it only did the part _up to_ calling out to git rebase). It certainly felt a lot snappier.
If it is not public, it is as if it never existed.
> A rewrite would likely lead to a speedup of 1.7x-13x on Windows with medium-sized repos like git.git.
As I feared: this statement is questionable. I am now done with the first round of optimizations (which port the actual processing of the rebase script to C) and get only a 1.03x speedup on Linux and a 2x speedup on Windows.
The reason for these wildly different numbers is that the 1.7x-13x claim was made based on code that did not port the sanity checks nor the post-rebase actions to the builtin. In other words, these impressive speedups were only possible because the tested fast version did substantially _less_ than the slow version.
This is a great disappointment that could have easily been avoided simply by not claiming such high gains based on incomplete data.
For sure 2x is not 13x speedup - however, a two times speedup for Windows is a really really big thing :-)
@dscho what size of project are you testing against? Can you do some numbers against the Linux Kernel git repository please (and turn on some virus checkers that scan newly launched processes, for the full experience of corporate development)? The Linux Kernel is closer in size to my industry project. 2x is a good result.
I would also be interested to hear the size of the repository tested. Our corporate repository is > 200,000 files and Windows is our development platform. Given that, a 2x speedup on Windows -- especially if it's on a codebase smaller than ours -- is a great win in my book.
> what size of project are you testing against?
@fommil you just need to look at my branch. I added a perf script that uses git.git itself.
> Can you do some numbers against the Linux Kernel git repository please
Sure. Do you want fries with that, too?
[EDIT] Seriously again: if you want those numbers, why not go ahead and build Git from my branch and actually do the testing? You do not need _me_ to do those tests. You actually do not need me _to spend time testing_ when my time is _much better_ spent working on finalizing that branch.
@fommil the fact that @dscho is even attempting something like translating the git-rebase--interactive script into a C builtin should be praised. It's a heroic undertaking as there's a lot of untraveled ground between the lush world of BASH and the barren wastes of C.
> Sure. Do you want fries with that, too?
Yes please :yum:
> I would also be interested to hear the size of the repository tested.
@KorkyPlunger, @dscho has access to a repo of roughly 3,000,000 files (yes, three million - no I didn't mistype) to prove his concept against; and the repo requires Windows. If there's any repository that can help find optimizations, this is the one. :wink:
@whoisj yes, I know, which is why I have been very supportive of any initiative from the start - including background work to try and raise funding / contribution for the initiative and noting that even a 2x speedup is enough to be encouraging.
I edited my comment to clarify the frustrated "Do you want fries with that?"
For the record, money is not the issue here. The main problem is that there is too much cheering and not much in the way of assisting.
> The main problem is that there is too much cheering and not much in the way of assisting.
For the record, I even made it super-easy for everybody to test the performance in their setup, by introducing a new t/perf/p3404-rebase-interactive.sh test. _Everybody_ can test the impact of my work on their system's interactive rebase with that.
For those who were curious, interactive rebases are indeed essentially instantaneous on the upcoming Ubuntu-Bash-on-Windows.
For anybody who is up for some testing in addition to talking, have a look here: https://github.com/dscho/git/releases/tag/rebase-helper-v0 (and be prepared to work with me).
For the one person who was up to test in addition to filling the airwaves: sorry. Here is the next attempt: https://github.com/dscho/git/releases/tag/rebase-helper-v0.1
@richardszalay excellent, thanks for looking into it. It'll take perhaps a decade for large corporates to move onto Windows 10 (Windows 7 is the new Windows XP), but at least that's sounding positive.
@dscho in the case that your remarks are aimed at me, I'd like to apologise for not having run these performance tests. I am unable to install or run arbitrary software (including source code distributions) at my work due to it being a heavily regulated environment. I do not use Windows at home. I could install Windows at home, but coupled with the amount of C (and C tooling) that I would have to learn, the advice of https://xkcd.com/1205/ recommends that I return to being an "air filler" user of git for windows, instead of offering to help fund / matchmake a third party to do the work on my/your behalf, since you have stated that is of no interest to you. Good luck.