Magit: Implement an Elisp binding for libgit2

Created on 12 Jan 2017 · 108 Comments · Source: magit/magit

This description was taken from #2956. I intend to replace it with a more in-depth description at a later time.

Magit is slow and part of fixing that involves the use of libgit2, "a portable, pure C implementation of the Git core methods provided as a re-entrant linkable library with a solid API, allowing you to write native speed custom Git applications in any language which supports C bindings." Unfortunately nobody has written that for Elisp yet and since improving performance is a top priority now, I'll do it.

This will be named just libgit.el (or libgit2.el) and be pretty basic, i.e. just expose the functions provided by libgit2 to Elisp.


Older discussions: #2539, https://github.com/magit/magit/issues/2442#issuecomment-165093660, https://github.com/magit/magit/issues/1327#issuecomment-39678570. (Yes, this goes back a while, but note that doing this is only even possible since Emacs v25.1, which was released in September 2016.)


Some resources:

And of course...


Most helpful comment

Sorry for the long silence. I had to think about this and then come up with an implementation that demonstrates that we actually can use git-based and libgit-based functions alongside each other, see #3622. It's very ugly, but I don't think there is a way around that.


As for the alternative approaches suggested above: If git itself provided a git command-server similar to what hg apparently provides, then that would be great. I don't really want to use a third-party libgit-based implementation of that idea though. Too much responsibility would rest on our shoulders if we did that.


If you plan to contribute to libegit2, then please concentrate on functions that collect information (git rev-parse-like functions) not functions that are run to produce some effect (git push-like functions). Of course eventually we would also want to get around to the latter, but I do not intend to use those functions in Magit.

All 108 comments

@tarsius If you have any libgit2 related questions, feel free to ask me. I've used it extensively, both for personal projects and in production, and I maintain the Haskell bindings to libgit2.

That's awesome to hear! I suspect you also have some experience with the new module support. I intend to get started with this soon, but could definitely use some help.

I haven't yet looked into the module support, but this would be a great way to learn it. Count me in!

@jwiegley I'm not current on emacs-devel, but have we thought at all about how modules will be distributed? Is package.el planning support for them?

I ask since they impact the structure of the implementation and I've been thinking about starting that project up again personally (now that magithub is stable-ish).

> part of fixing that involves the use of libgit2

I would love to see some data to back up that claim. It _sounds_ right to me, but it would be a shame for you to spend precious time on an optimization that might not bear fruit as anticipated. Perhaps we could start with just a minimal implementation of the libgit2 bindings and do some benchmarks to prove the concept.

I'm guessing you've already thought this through... But it would be nice to see the data. I can help out if there's any dividing and conquering that can be done.

This is a good question, and maybe we'll be the first ones to answer it for future package authors too. I just don't know yet. :)

@mgalgs If someone can show me a set of commands that are being used by magit, and which are presumed to be slow, I can tell you how libgit2 might affect the performance there and why. It's possible too that we could use more caching, and more low-level Git commands (for example, direct tree manipulation) to defer going the libgit2 route.

@mgalgs @tarsius From my memory of prior conversations, rev-parse is probably the biggest single hitter. See also ksjogo/emacs-libgit2#4.

> From my memory of prior conversations, rev-parse is probably the biggest single hitter.

... because we call it a lot. So it would be a good idea to implement support for that first.

> If someone can show me a set of commands that are being used by magit, and which are presumed to be slow,

The problem isn't that certain git commands are slow on Windows, but that starting subprocesses is slow per se and Magit starts many.

> It's possible too that we could use more caching,

We already do a lot of caching. Identical calls (same arguments and directory) during a single refresh (i.e. after every Magit command) get the value from a cache. I don't think there is much room for improvement here. Well there is--see #2982--but that goes much further than just a stupid cache.
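The per-refresh cache described above can be sketched roughly as follows. This is a Python illustration for brevity, not Magit's Elisp implementation; `RefreshCache`, `run_git`, and `git_string` are hypothetical names, and the only behaviour assumed from the thread is that identical calls (same directory, same arguments) within one refresh share a result.

```python
import subprocess
from contextlib import contextmanager


def run_git(directory, args):
    """Run git in DIRECTORY and return the first line of output, or None."""
    result = subprocess.run(
        ["git", *args], cwd=directory, capture_output=True, text=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


class RefreshCache:
    """Cache git results for the duration of a single refresh (hypothetical)."""

    def __init__(self, runner=run_git):
        self.runner = runner
        self.cache = None  # None: no refresh in progress, so nothing is cached

    @contextmanager
    def refresh(self):
        """Enable caching inside the block and discard all entries afterwards."""
        self.cache = {}
        try:
            yield self
        finally:
            self.cache = None

    def git_string(self, directory, *args):
        """Return the cached value for (directory, args), running git on a miss."""
        if self.cache is None:
            return self.runner(directory, args)
        key = (directory, args)
        if key not in self.cache:
            self.cache[key] = self.runner(directory, args)
        return self.cache[key]
```

Because the entries are dropped when the refresh ends, nothing stale survives into the next command. The cost is that every refresh still pays for at least one subprocess per distinct call.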

> and more low-level Git commands

There have been some reports that e.g. rebasing can be slow on Windows, I think. But we cannot do much about that--I certainly don't want to reimplement every Git command that is still implemented as a shell script.

However a few months ago a similar (but much less severe) instance of "starting a subprocess is slow" was fixed, but only on macOS/Darwin. I am hoping that something similar can be done on Windows.

Unfortunately I never got around to asking the right people for help. We should dig up the old discussions and then bring those to their attention. The issue on macOS was that "the wrong fork" was being used and since that was being done for a very long time on macOS, the same thing might very well be true on Windows also.

But even if that gives us (not just Magit, but any package that uses many subprocesses) an amazing performance boost, I would still like to be able to use libgit from elisp.

For anyone who's curious, I've implemented a type of benchmark in this gist. What I've done is I've redirected magit-git-executable to a shell script that logs the input and times it. I've got some elisp also that processes that; once I'm done running errands, I'll be doing more processing of that log output so we can say with certainty how long we spend doing git commands.

More useful metrics would have to correlate these data with the timestamps of actual magit commands, but here are my own numbers (using the approach and code above) after using magit to review some history. First column is number of calls, second column is total time taken by that command.

231  1.6963  "rev-parse --show-toplevel"
196  1.4190  "rev-parse --show-cdup"
 19  0.2855  "show -p --cc --format=%n --no-prefix --numstat --stat --no-ext-diff <short-hash>^{commit} --"
 19  0.2591  "branch --merged <short-hash>"
 19  0.2483  "branch --contains <short-hash>"
 19  0.2398  "show --no-patch --format=%d --decorate=full <short-hash>^{commit} --"
 19  0.1996  "describe --contains <short-hash>"
 19  0.1855  "describe --long --tags <short-hash>"
 19  0.1669  "show --no-patch --format=Author:     %aN <%aE>\nAuthorDate: %ad\nCommit:     %cN <%cE>\nCommitDate: %cd\n <short-hash>^{commit} --"
 19  0.1601  "show --no-patch --format=%h %s <long-hash>^{commit} --"
 19  0.1584  "rev-parse --verify <short-hash>^{commit}"
 19  0.1535  "show --no-patch --format=%B <short-hash>^{commit} --"
 19  0.1506  "rev-parse <short-hash>^{commit}"
 19  0.1495  "rev-list -1 --parents <short-hash>"
 19  0.1485  "cat-file -t <short-hash>"
 19  0.1472  "notes show <short-hash>"
...
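For readers who want to produce a similar table from their own logs, here is a minimal Python sketch of the aggregation step. The log format is an assumption: one line per git invocation of the form `<seconds><TAB><arguments>`. The wrapper script in the gist may record something different, and `summarize` and `format_rows` are hypothetical helper names.

```python
from collections import defaultdict


def summarize(log_lines):
    """Group timing lines by command; return (calls, total_seconds, command) rows.

    Rows are sorted by total time, largest first.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seconds, command = line.split("\t", 1)
        entry = totals[command]
        entry[0] += 1
        entry[1] += float(seconds)
    rows = [(count, total, command) for command, (count, total) in totals.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def format_rows(rows):
    """Render rows in the same three-column layout as the table above."""
    return [f'{count:>3}  {total:.4f}  "{command}"' for count, total, command in rows]
```

Sorting by total time rather than by call count keeps rarely run but expensive commands visible next to cheap commands that are called hundreds of times.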

Be careful with libgit2 as I don't think it implements file locks in a compatible way with Git itself. For example if git gc is run in the background and libgit2 is doing things at the same time, there could be problems.

Based on issues like https://github.com/libgit2/libgit2/issues/2902, I think the developers both think about these sorts of issues, and would be open to bug reports about them.

Maybe but in general I think libgit2 development is lagging behind Git development. See:
https://github.com/git/git/graphs/contributors
https://github.com/libgit2/libgit2/graphs/contributors
(Disclosure: I am a Git developer)

I can certainly believe that.

I started an experimental module for this https://github.com/ubolonton/magit-libgit2

It currently advises magit-rev-parse to use libgit2 where possible.

Some notes:

  • A quick benchmark on my laptop showed 40x speedup for that function. I'm going to check if the difference can be perceived in daily uses.
  • We should probably add some automated benchmarks, ideally integrated with CI, to identify slow parts.
  • Writing the module in Rust is quite nice. The tooling is good, and I got a live reloading setup going.
  • We can start with implementing only functionalities needed by magit. A generic libgit2.el can be extracted much later on.

@ubolonton I am taking a two week break, but am excited to look at this when I get back.

Hi everyone,

I was interested in working on this a bit. From what I see there are two cases of "prior art":

I played around with my own module here: https://github.com/TheBB/libegit2

Now since this is a relatively ambitious project, I'm wary of inviting further fracturing, but in my defense, (a) I'm not comfortable with Rust, (b) @ksjogo's repo seems abandoned, and (c) I had fun anyway.

I'm aiming for a thin wrapper. If you're familiar with PyQt, it's possible to read Qt's C++ documentation and translate directly to Python. That's the level I'd like to aim for: you can read libgit2's C documentation and use it directly from Emacs with no go-between.

I haven't yet tried to get magit to play with this module.

What's the current status on your side? Should I continue working on this?

> I'm aiming for a thin wrapper. [...] That's the level I'd like to aim for: you can read libgit2's C documentation and use it directly from Emacs with no go-between.

That's exactly what I was hoping for and would have done eventually if you didn't beat me to it. But it would probably have taken me much longer than someone more familiar with C.

So far this is pretty incomplete but it is very promising that you have already outlined your plans on what you do or don't intend to implement and that you have added documentation that allows others to contribute.

I think I am going to add this to Emacs.g very soon - not just to the magit-directors-cut branch, but master.

> I haven't yet tried to get magit to play with this module.

I have only played with it a tiny bit. But that already confirms that this is easily installable (when using borg :grinning: ). Also I already ran into the first problem: you probably want to expand-file-name all paths before handing them to libgit2 so that not every caller has to do it (git and libgit2 don't understand ~/).

I am already quite sure that this is what I am going to use in Magit (*). If you would like to do that too, then I would like to welcome this project into the magit "organization". Combined with your instructions, that might help encourage contributions.

Beside the need for greater coverage, I think the most important tasks ahead are:

  1. Reconsider the symbol prefix. While this is the package that most deserves the git- prefix (especially now that the git.el that used to be part of Git itself has been removed), this is going to lead to a lot of conflicts with many existing packages and that could lead to a lot of unnecessary work. What about libgit-, lid-, lgit- or egit? (lgit was the name of my own pre-magit git library, and egit is a very old abandoned magit competitor of sorts.)

  2. Make it easily installable from Melpa. Again it's important to get this right or else the price has to be paid later in the form of having to help lots of lost users.


(*) I don't want to discourage other efforts though. (By the way, @ubolonton sorry for not getting back to you.) But I do favor this implementation not least because @TheBB has maintained other important Emacs projects before and because, as I said, his approach is pretty much what I had hoped for. It also has less of a proof-of-concept feel to it.

Pinging some people who might be interested in contributing to this effort - @jwiegley @vermiculus @mgalgs @chriscool.

(I've added some useful resources to my initial post above.)

Great!

> you probably want to expand-file-name all paths before handing them to libgit2 so that not every caller has to do it

That's fair enough.

> I would like to welcome this project into the magit "organization".

I'd be happy to.

> Reconsider the symbol prefix.

If I can't use git I'd rather just go straight to libgit I think.

> Make it easily installable from Melpa.

If the accepted route for packages with compiled components is still the way pdf-tools does it, there's going to be some trying and failing to get that to work. :-s

I haven't worked more on my approach, since I was working on the underlying Rust binding, and I'm not familiar with the requirements of magit's low level APIs.

It's great to see another effort getting traction. My only concern is that writing it in C makes it easier to introduce bugs/instability. Rust is a much saner choice to write native modules, in terms of both language and tooling. However, since C would probably result in more people participating, this would be a non-issue.

I'm happy to see this moving forward!

Hi everyone, my effort was mainly started by me being forced to use a Windows laptop for some time. That changed and I was fine with the performance of Magit again, so I didn't continue. If anyone wants to use the code for something, feel free to do so.
Regarding implementation language, I would advise going for the language which is the easiest to compile for all platforms in all required configurations. I am not sure how well Rust is standing there. C was relatively easy to set up.

In terms of ease of compilation, Rust is better than C. That's not the most important factor though, IMO.

> Make it easily installable from Melpa. Again it's important to get this right or else the price has to be paid later in the form of having to help lots of lost users.

Is there any reason not to get this into GNU ELPA from the outset?

This library would surely provide benefits for vc-git as well, so taking steps now to avoid the contributors-without-FSF-copyright-assignment trap seems like a good idea to me.

No particular reason. I'd be happy to put it on ELPA too.

One approach that might be worth considering is using an FFI: https://github.com/tromey/emacs-ffi. The advantage is that there's no need to write C code, nor any need to compile the glue code for various platforms [update: oh, but we need to compile emacs-ffi's glue code, sadly]. You do need libffi, but that should be easy to install on Linux distributions (usually pre-installed since Python and OpenJDK depend on it), it ships with OS X, and, perhaps most importantly, it ships with Git for Windows.

Furthermore, emacs-ffi (or something along its lines) seems like something that might one day ship with Emacs? That would avoid the need to compile emacs-ffi's glue code. I've opened an issue about that here: https://github.com/tromey/emacs-ffi/issues/20.

That said, I haven't actually tried to use emacs-ffi, yet.

> I would like to welcome this project into the magit "organization".

> I'd be happy to.

Let's do that then. I will have to give you admin access to the organization for you to be able to transfer the repository. Are you going to be available tomorrow (or now)? After the transfer I will limit your admin access to just this repository. (I am rather busy right now and not on the computer all the time so this might fail and we will have to try again in a week or so.)


Sorry for not replying to all the other new input. Using Rust or emacs-ffi sounds appealing too, but @TheBB's approach is the one that has progressed the most so far, so I will be using that. But please keep working on the other approaches and/or contribute to his efforts.

I'm available all day today.

I'd also love to use an ffi library, if only because the deployment of cross-platform compiled modules to MELPA looks tiring. I'd be interested to see if @tromey can manage it.

What would be the set of libgit2 functions with most priority to make available in elisp so magit can start using them? So those functions could be made available in libegit2 ASAP.

@dacap rev-parse is a need. See above.

So it turns out that I've had chatty-git running since last year, so this is a more rounded example of my use of magit. See that rev-parse is still the big-hitter:

107577  809.5919  "rev-parse --show-toplevel"
 93328  726.4509  "rev-parse --show-cdup"
 23072  177.1379  "symbolic-ref --short HEAD"
 23105  170.0326  "rev-parse --git-dir"
     1  166.8760  "rebase --edit-todo"
 21202  159.4894  "rev-parse @{upstream}"
    36  150.9729  "rebase --continue"
  9471  119.5018  "describe --long --tags"
 15180  111.8685  "rev-parse --is-bare-repository"
  9469   86.6703  "describe --contains HEAD"
  9580   83.2508  "update-index --refresh"
  8402   82.2610  "diff --no-ext-diff --no-prefix --"
 10690   78.8588  "config --list -z"
  9822   78.1585  "for-each-ref --format=%(refname:short) refs/heads"
  8541   77.0307  "name-rev --name-only --no-undefined @{upstream}"
  9436   75.3226  "rev-parse --short HEAD~"
  9785   73.9831  "rev-parse --verify HEAD"
  9436   72.7641  "rev-parse --short HEAD"
  8402   71.7437  "diff --cached --no-ext-diff --no-prefix --"
  9515   71.3091  "rev-parse --verify refs/stash"
  7631   66.2776  "for-each-ref --format=%(refname:short) refs/heads refs/remotes"
  6966   65.9412  "status -z --porcelain"
  8557   64.5011  "remote"
     1   64.1535  "merge --edit --no-ff sa/set-confirm-default"
  8252   60.6545  "config -z --get-all remote.origin.url"
  7390   60.4802  "for-each-ref --format=%(refname) refs/heads refs/remotes"
  6569   57.5323  "reflog --format=%gd%x00%aN%x00%at%x00%gs refs/stash"
  4557   53.6275  "log --format=%h%d%x00%x00%aN%x00%at%x00%s --decorate=full -n10 --use-mailmap --no-prefix HEAD~10..HEAD --"
  5600   47.7975  "rev-parse --verify HEAD~10"
     1   46.7084  "merge --edit --no-ff clone-freely"
  4588   44.7677  "log --format=%h%d%x00%x00%aN%x00%at%x00%s --decorate=full -n256 --use-mailmap --no-prefix ..@{upstream} --"
  5302   41.4387  "symbolic-ref refs/remotes/origin/HEAD"
  4466   38.2130  "show --no-patch --format=%h %s master^{commit} --"
  3654   36.6293  "name-rev --name-only --no-undefined origin/master"
  3794   35.3988  "merge-base --is-ancestor HEAD <long-hash>"
  2522   34.7922  "status -z --porcelain --"
  3868   32.0821  "show --no-patch --format=%s origin/master^{commit} --"
  3035   30.5711  "log --format=%h%d%x00%x00%aN%x00%at%x00%s --decorate=full -n256 --use-mailmap --no-prefix @{upstream}.. --"

I also want to call out rev-parse --verify since I wasn't able to get a good regex to group these together. This is used in the magit-branch-p family of checks and is probably a big time-sink, too.

Note that for speeding up rev-parse, I still use this hack locally, which helps quite a bit:

--- a/lisp/magit-git.el
+++ b/lisp/magit-git.el
@@ -754,16 +754,41 @@

 ;;; Revisions and References

+(defvar magit--rev-parse-toplevel-cache (make-hash-table :test #'equal))
+(defvar magit--rev-parse-cdup-cache (make-hash-table :test #'equal))
+(defvar magit--rev-parse-git-dir-cache (make-hash-table :test #'equal))
+
+(defmacro magit--use-rev-parse-cache (cmd args)
+  `(pcase ,args
+     ('("--show-toplevel")
+      (or (gethash default-directory magit--rev-parse-toplevel-cache)
+          (let ((dir ,cmd))
+            (puthash default-directory dir magit--rev-parse-toplevel-cache)
+            dir)))
+     ('("--show-cdup")
+      (or (gethash default-directory magit--rev-parse-cdup-cache)
+          (let ((dir ,cmd))
+            (puthash default-directory dir magit--rev-parse-cdup-cache)
+            dir)))
+     ('("--git-dir")
+      (or (gethash default-directory magit--rev-parse-git-dir-cache)
+          (let ((dir ,cmd))
+            (puthash default-directory dir magit--rev-parse-git-dir-cache)
+            dir)))
+     (_ ,cmd)))
+
 (defun magit-rev-parse (&rest args)
   "Execute `git rev-parse ARGS', returning first line of output.
-If there is no output, return nil."
-  (apply #'magit-git-string "rev-parse" args))
+  If there is no output, return nil."
+  (magit--use-rev-parse-cache
+   (apply #'magit-git-string "rev-parse" args) args))

 (defun magit-rev-parse-safe (&rest args)
   "Execute `git rev-parse ARGS', returning first line of output.
-If there is no output, return nil.  Like `magit-rev-parse' but
-ignore `magit-git-debug'."
-  (apply #'magit-git-str "rev-parse" args))
+  If there is no output, return nil.  Like `magit-rev-parse' but
+  ignore `magit-git-debug'."
+  (magit--use-rev-parse-cache
+   (apply #'magit-git-str "rev-parse" args) args))

 (defun magit-rev-parse-p (&rest args)
   "Execute `git rev-parse ARGS', returning t if it prints \"true\".

@jwiegley I wonder how your rev-parse behaves after a rebase when all the hashes change.

As a side note, I'd like to improve the performance of magit-log/magit-log-double-commit-limit and magit-stage/magit-unstage-file, which for me are the slowest high-end functions. Here are some profiles using these functions. (You can check your results using profiler-start, profiler-report, profiler-stop, profiler-reset on Emacs.) I don't know where to start yet (eieio-oref and the GC look like a huge performance penalty when doing magit-log, and magit-run-section-hook when staging/unstaging.)

@vermiculus I believe it's dependent only on these three specific commands having consistent output when invoked from a given directory:

  • git rev-parse --show-toplevel
  • git rev-parse --show-cdup
  • git rev-parse --git-dir

So git hashes aren't relevant? (assuming that's what you were referring to; the elisp hashes are just of default-directory).

I presume the cache would break if a submodule was introduced into (or removed from) a previously-cached directory. Which could happen outside of Magit. Which seems like the kind of maddening edge-case which could make it difficult to apply such a change generally.

Found this resource in my travels: https://phst.github.io/emacs-modules

Looks to be a fairly complete exposition and demonstration of the module system.

> Found this resource in my travels: https://phst.github.io/emacs-modules

> Looks to be a fairly complete exposition and demonstration of the module system.

Indeed, that documentation is written by one of the co-authors of Emacs' support for dynamic modules, and was intended to be reviewed as a basis for inclusion in the Elisp manual. See the following emacs-devel thread(s):

I've been away on holiday for a week. I'm back now though so if you want to transfer, let me know whenever.

I've spent a rather embarrassing amount of time setting up building and CI for all three major OSes in the hopes of distributing binary files. If anyone is more knowledgeable than I am about this, I'd be happy to accept help.

@TheBB I've added you to the organisation and think you should be able to transfer the repository now.

Have you tried yet? Just asking because you have arrived but not yet your repository :wink:

Yeah sorry, just clicked the e-mail link on the phone at dinnertime, but waited to transfer until I was back at a computer. It looks like it's done now.

For public record: https://github.com/magit/libegit2

Weeee! (Okay, now I am off for a few hours.)

@tarsius Could you enable appveyor for the libegit2 repo, or otherwise allow me to do it? I changed the repo for my appveyor project that used to be pointed to TheBB/libegit2 to magit/libegit2, and it builds PRs, but doesn't report build successes or failures to github any more. I guess all we need is to make a new one from scratch.

@TheBB I have set up appveyor for this repository and it seems to work (though it only builds on one platform I think). You also have admin access for the github repository now, which propagates to appveyor I believe.

Thanks! Yeah, Appveyor is for Windows.

@tarsius Just curious: will this be ready soon(ish)? I installed the libgit package the other day, but I was disappointed when I found out that the actual usage of libgit2 hasn't been implemented yet. I am really looking forward to this feature, since I work a lot on a project which usually has lots of unstaged & untracked changes... Magit sadly chokes on it more often than not. My only recourse currently is to use Tower or tig. But Tower is too bright for my taste and tig, well... it just isn't Magit ;)

Not trying to rush anyone, just curious if there might be some progress soon here. Also, I'm curious as to where the performance improvements will be in the interface, i.e., will my issues above be lessened? Or are the speed improvements mostly for other parts of the interface at first?

Yeah, I am hoping to get around to that soon. One problem is that the module is fairly incomplete, so I cannot go much further than a proof-of-concept right now, using whatever is already available and holding out for the rest. On the other hand if I did that, then that might encourage @TheBB and others to implement some of the missing functions, and I would be able to list some of the functions that would be most useful.

@tarsius That sounds like a good place to start. I'd like to help, but I'm probably too strapped for time to justifiably do so... if there are some low-hanging fruits that @TheBB or you would like to have done once you get started, I think it'd be a good idea to post it somewhere so that perhaps someone such as myself could get into it without feeling lost. Or without spending inordinate amounts of time on it :laughing:

Is anybody still working on this?

I am curious as @brotzeit is; I haven't seen any developments or pointers on how to get started here. Is this still a priority for Magit?

I have to admit I fell off the wagon for this somewhat. Aside from a vaguely nonspecific idea to get back on it eventually, I'm still watching the repo though and will for sure review PRs if anyone would like to keep the ball rolling. I believe the readme is quite thorough in terms of how to expand libegit2. If that's not the case please just ask questions.

https://github.com/magit/libegit2

@TheBB Thanks for the info. I myself probably won't have much time to contribute to this in the near future (although I'd like to), but thanks a lot for pointing to the README. After looking at it more thoroughly (rather than glancing through it), you're right; it provides good info on how to get started. Hopefully this ball can get rolling again soon. I'm sure we're all looking forward to a faster Magit!

I would like to do it, but I won't have the time :/
Maybe post it on reddit to see if somebody is interested in implementing it? Worth a try...

Agreed. A "Help Wanted" thread on reddit could be useful. /r/emacs would be the place to post it.

FWIW, I'm interested in either funding someone or being funded myself to do this.

Thanks for the tip, I'll write a call to arms on reddit tonight.

@luismbo Sounds good. How much would you expect?

I'm going to create a libgit branch in the repository over the next two days or so.

I just realized that libgit won't work over Tramp. It's quite amazing that neither I nor, apparently, anybody else realized that before. That's very bad news. It means that for every magit function that we port to libgit we have to keep the old implementation around. And then we need wrappers to dispatch the proper implementation (probably based on the file-handler functionality). That's going to be a lot of work and the resulting bloat will be with us forever. Or we drop support for Tramp. We are stuck between a rock and a hard place.
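The kind of dispatch wrapper this implies might look roughly like the following. It is a Python sketch with hypothetical names (`Dispatcher`, `is_remote`). In Emacs itself one would ask `file-remote-p` rather than match on the file name; the regular expression here is only a rough approximation of Tramp-style prefixes such as `/ssh:host:/path`.

```python
import re

# Rough approximation of a Tramp prefix: "/method:user@host:" (host may be empty).
_TRAMP_PREFIX = re.compile(r"^/[A-Za-z0-9_-]+:[^/:]*:")


def is_remote(directory):
    """Return True if DIRECTORY looks like a Tramp-style remote file name."""
    return bool(_TRAMP_PREFIX.match(directory))


class Dispatcher:
    """Route each operation to a native or a subprocess implementation."""

    def __init__(self, native, fallback):
        self.native = native      # operations ported to the in-process library
        self.fallback = fallback  # the existing subprocess-based implementations

    def call(self, operation, directory, *args):
        """Use the native implementation only for local directories."""
        if not is_remote(directory) and operation in self.native:
            return self.native[operation](directory, *args)
        return self.fallback[operation](directory, *args)
```

Every operation still needs an entry in `fallback`, which is exactly the duplication the comment above worries about. The dispatcher only decides which implementation runs.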

Why can't libgit work over Tramp? Is that something that can be accounted for in the module?

For the same reason Tramp has to use the ls executable instead of the opendir() and readdir() functions to list the contents of a remote directory.

The emacs process runs on the local machine. Adding bindings for libgit2 teaches that process new tricks, but making system calls on another machine is not one of them. That may be possible with some academic Erlang OS, but not with Linux, *BSD, macOS, or Windows.

Currently when Magit needs to run git "over Tramp" it does that by running git on the other machine. That's how Tramp works -- it runs processes on the other machine and then does something with the output of those processes.

The equivalent with a libgit-enabled emacs would be to run another such emacs instance on the other machine. That would defeat the purpose of Tramp at least for one popular use-case: do stuff on another machine which doesn't have Emacs installed, from the comfort of the local Emacs. (And obviously starting many emacs processes would be much slower than starting many git processes, so this is a non-starter.)

Couldn't the module run git remotely if it must? I'm thinking a normal remote execution over something like ssh. That should provide some output that could be parsed into the appropriate data structures.

@vermiculus, are you suggesting that if it did that and returned data which was equivalent to what it would return if it could use libgit, then Magit wouldn't need to worry about maintaining multiple ways of doing things in the future?

I guess that might work, but it seems to me like it's just shifting the burden of the legacy approach from Magit (where it is already implemented and working) into the C module (where it is not implemented at all), so I'm not sure that's the best approach?

As libgit can't work with a remote git repository, maybe the sensible approach is not to implement libgit support directly as an Emacs module, but instead to implement a separate program which uses libgit, and to which Emacs can communicate (whether locally or remotely) using a persistent process?

The idea being that this program could be installed locally (in which case a local Emacs can start it as a process to do local git things), and it could likewise be installed remotely (in which case I believe a local Emacs could talk to the remote process in order to do remote git things).

It would still be necessary for Magit to maintain non-libgit functionality (unless installing the program became a requirement); but this still seems like an improvement upon "Cannot take advantage of libgit over tramp at all"?

Edit: Maybe such a program already exists? Regardless, it would clearly have applicability to more than just Emacs, so such a project might get wider support from the git community, compared to an Emacs-specific module?

Yes, you understand correctly. I can see your point about shifting work though, but my gut instinct is that it would be more straightforward in C -- in some ways, this low-level parsing is what the language was built to do.

I think I might be on your same thought process -- it seems to me like we've identified a deficiency in libgit. Could that deficiency be appropriately resolved within libgit? In other words, can we teach libgit to operate over a network?

Would it really be that much effort to provide both libgit and Tramp functionality?

Having concurrent implementations of some functionality is doable if there is a plan to remove the duplication. Otherwise, the two implementations will surely fall out of sync.

It is undeniable that implementing two backends, git and libgit, will add complexity, particularly maintenance-wise. In terms of initial implementation, it might not be too different effort-wise, though. In fact, there might be some advantages to having two backends:

  1. For read-only operations, the git backend can be used to test the libgit backend and vice-versa. Both backends can be executed and have their outputs compared. Write operations would be trickier, but could be done with appropriately set up clones, if worthwhile.

  2. The git backend can be used whenever libgit2 and/or the libegit2 glue library are unavailable.

  3. And, of course, we can have TRAMP _and_ fast local execution, particularly on Windows. (That is to say, this solution fits the requirements.)
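Point 1 above, using one backend to check the other for read-only operations, could be sketched like this. It is Python for illustration only; `cross_check` and the two backend callables are hypothetical stand-ins for a git-based and a libgit-based implementation of the same query.

```python
def cross_check(queries, backend_a, backend_b):
    """Run every read-only query through both backends and collect disagreements.

    Returns a list of (query, result_a, result_b) tuples; an empty list
    means the two backends agreed on every query.
    """
    mismatches = []
    for query in queries:
        result_a = backend_a(query)
        result_b = backend_b(query)
        if result_a != result_b:
            mismatches.append((query, result_a, result_b))
    return mismatches
```

Running both backends doubles the work, so a check like this belongs in a test suite or a debugging mode and not in the normal refresh path.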

The maintenance burden will be pretty annoying, no doubt. Whenever the backend API needs tweaking, you'll need to implement it twice. Perhaps we could think of the second implementation as a unit test of sorts?

In my humble, non-maintainer opinion, I don't think having two concurrent and ongoing implementations is the right path for Magit – or indeed for any project. It's early here, so I may not be understanding fully/correctly, but (1) seems like a performance drain rather than an improvement. Instead of replacing shell-outs, we would add to them with more in-process work. (2) is reasonable, but I don't think it's a good enough reason to maintain two implementations indefinitely. It will make it much harder for the project to move forward unless a simple, obvious, and robust system can be put in place that makes dual implementations truly easy (and makes it apparent when they've broken). I doubt such a system exists, but I'm willing to be proven wrong.

I've raised the possibility/question to libgit2. I do not know when/if the conversation will move to a more stable medium, but it's currently on their slack channel.


three hours later: there's some good conversation there

@vermiculus the main motivation for me (not necessarily for others) for using libgit is that launching n git processes for each magit operation is _unbearably_ slow on Windows. (1) is an interesting point, though. At least some libgit calls should probably take place in a separate thread, and either Emacs threads can context switch at the call-to-C boundary or libegit2 needs to provide asynchronicity somehow.

That's a main motivator for me as well, though I'm not following your thinking. What does async have to do with the current conversation?

@vermiculus you mentioned that calling libgit2 meant more in-process work. I was just noticing that such in-process work will block the Emacs process and would become a problem for operations that aren't instantaneous. The most obvious offenders would be fetch and push.

How many people rely on using Magit-via-tramp? I'd be happy to drop Tramp support, in order to gain libgit2.

@jwiegley I felt the same way at first, but a number of people on reddit have made it clear that they rely on Magit+TRAMP. I've been in their position many times, so I cannot in good conscience advocate for the disregard of these users.

Someone made a good point on the reddit thread: the libgit backend doesn't need to be complete. It can focus on performance-critical operations and fall back to git elsewhere.
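A rough sketch of what such a fallback could look like, assuming each backend signals "I can't do this" with an exception. The `Unsupported` exception, `first_supported`, and the two `*_rev_parse` stand-ins are all hypothetical names made up for this example. Magit itself would do this in Elisp, and the real dispatch mechanism may look quite different.

```python
from typing import Any, Callable, Sequence


class Unsupported(Exception):
    """Raised by a backend that cannot handle a request (hypothetical)."""


def first_supported(backends: Sequence[Callable[..., Any]]) -> Callable[..., Any]:
    """Return a function that tries each backend in order of preference."""
    def call(*args: Any, **kwargs: Any) -> Any:
        for backend in backends:
            try:
                return backend(*args, **kwargs)
            except Unsupported:
                continue  # this backend opted out, try the next one
        raise Unsupported("no backend could handle the request")
    return call


def libgit_rev_parse(rev: str) -> str:
    """Incomplete fast backend: pretend it only knows about HEAD."""
    if rev != "HEAD":
        raise Unsupported(rev)
    return "libgit:HEAD"


def git_rev_parse(rev: str) -> str:
    """Complete but slow backend: pretend to shell out to git."""
    return f"git:{rev}"


# Prefer the fast backend, fall back to the complete one.
rev_parse = first_supported([libgit_rev_parse, git_rev_parse])
```

With this arrangement the fast backend can grow one operation at a time, and callers never need to know which backend answered.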

John Wiegley notifications@github.com writes:

How many people rely on using Magit-via-tramp? I'd be happy to drop Tramp support, in order to gain libgit2.

I do this a lot when I'm editing configs on my server. It's nice to be
able to commit changes on the server and then pull to my local machine
later.

I mean it wouldn't be the end of the world if I had to open up a shell
and do it from there instead. It would probably be faster anyways
because magit via tramp is quite slow from my experience.

--
https://jb55.com

I too rely on remote file support frequently. I think that remote file support is one of the basic features of Emacs that packages should aim to provide. Removing it seems like a significant step backwards.

While editing and committing on remote machines is nice, editing locally and pushing to the remote machine may be sufficient, given that we are talking about git. When push-to-checkout is set up, this allows conveniently pushing to a remote which will then directly check out the updated branch. It may be a good idea to add support to magit for configuring the remote repository to support push-to-checkout, which e.g. can be done via

# Setup git push_to_checkout on a remote server

ssh -t $1 "cd $2; git config receive.denyCurrentBranch updateInstead"
ssh -t $1 "cd $2; echo 'git read-tree -u -m HEAD \"\$1\"' > .git/hooks/push-to-checkout"
ssh -t $1 "cd $2; chmod +x .git/hooks/push-to-checkout"

The thing with tramp in general is that it's unlikely to be the most performant, and there's usually a 'better' way if you take the time to set it up, but it is incredibly convenient when you haven't set up something better (to the point where many people do indeed use it all the time, even if others would not consider working that way).

Tramp is awesome for this reason, and the fact that Magit works over tramp is awesome too. I think that eliminating that facility from Magit would be a tremendous shame (especially given how much effort has gone into making it work in the first place).

I do not intend to remove tramp support.

So I'm not too familiar with the TRAMP support, but is it acceptable if a libgit-based solution were to install some RPC server on the remote that libgit could communicate with? I originally disregarded the idea because it was the definition of remote code execution, but from the example above it appears we're already doing that.

Such a remote server would need to be using libgit, so I think that's a similar suggestion to https://github.com/magit/magit/issues/2959#issuecomment-428839121 ?

(Also, authorized remote code execution isn't cause for alarm.)

Ah, yes -- I was recalling a conversation on libgit's slack channel, but yours is essentially the same suggestion.

I do think such a thing would be widely useful, so integration into libgit itself might not be a bad idea.

Sorry for the long silence. I had to think about this and then come up with an implementation that demonstrates that we actually can use git-based and libgit-based functions along-side, see #3622. It's very ugly, but I don't think there is a way around that.


As for the alternative approaches suggested above: If git itself provided a git command-server similar to what hg apparently provides, then that would be great. I don't really want to use a third-party libgit-based implementation of that idea though. Too much responsibility would rest on our shoulders if we did that.


If you plan to contribute to libegit2, then please concentrate on functions that collect information (git rev-parse-like functions) not functions that are run to produce some effect (git push-like functions). Of course eventually we would also want to get around to the latter, but I do not intend to use those functions in Magit.

Thanks for pushing forwards with this, tarsius.

I don't really want to use a third-party libgit-based implementation of that idea though. Too much responsibility would rest on our shoulders if we did that.

In principle, if such a program was maintained by other people, would you consider supporting it?

I don't know whether such a thing would gain interest/traction, and I'm not in a position to contribute to it myself, but it seemed like the sort of hypothetical program which could be generally useful, outside of the Emacs sphere.

My main enthusiasm for the idea is that, AFAICS, this would be the only way to facilitate libgit usage in situations when the repository is being accessed by tramp; so if it came to pass that such a program was written, it would be nice if Magit was able to take advantage.

it seemed like the sort of hypothetical program which could be generally useful, outside of the Emacs sphere

This is my hope -- libgit2 maintainers were receptive to the idea, though it would be a large project.

Would it be so large? It seems to me that the existing library would do almost all the work, and the server would be a fairly simple wrapper around that, which would accept network connections and then, for each request, hand it off to the library, and return the result. I'm imagining that the requests would be in a format dictated by libgit's own interface, so that they would be translated automatically.
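To illustrate the "fairly simple wrapper" idea, here is a minimal sketch of the request-handling core only, with no networking. It assumes a made-up wire format (one JSON object per line with `id`, `method`, and `params`) and a plain dictionary `api` mapping method names to callables. None of this is an existing libgit2 or git protocol, and a real server would also need transport, authentication, and a way to serialize libgit2's data structures.

```python
import json
from typing import Any, Callable, Dict


def handle_request(line: str, api: Dict[str, Callable[..., Any]]) -> str:
    """Decode one request, hand it to the library, and encode the reply."""
    req_id = None
    try:
        req = json.loads(line)
        req_id = req.get("id")
        func = api[req["method"]]          # look up the library function
        result = func(*req.get("params", []))
        return json.dumps({"id": req_id, "result": result})
    except Exception as err:               # report failures instead of dying
        return json.dumps({"id": req_id, "error": repr(err)})
```

The client side would write one such line per request and read one line back, whether the process runs locally or at the other end of an ssh connection.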

I don't know what was discussed upstream, though (the aforementioned Slack link not being public).

The slack link is 'kinda' public – you can get an invite from the libgit2 README.

I think the architectural concerns were with integrating the RPC layer into libgit2, though it's difficult to tell in hindsight. If the RPC layer were completely above libgit2 (IOW libgit2 didn't know about the RPC layer), then I don't imagine libgit2 would see any difference.

@ethomson – did I understand you correctly? I don't want to relay bad information.

An RPC layer could easily use any number of existing bindings for interpreted languages, but there is a portability tradeoff with compiled binaries. (Does the remote have the right version of the interpreter? Does the binary I have work with the remote OS? Probably more questions…) I think we're going to have that problem regardless of who would own the development.

I've been doing a lot of research. There exist some solutions for this general problem already (GitRPC and Gitaly were the two big ones I found), though these both involve running through an interpreter (Ruby) that may not be on the remote. I also found a library that may be used for any serialization of binary data (tpl), but the license for this (BSD) may be incompatible with magit's future goals of being included in emacs core (IANAL even if I play one on TV).

One option that I think deserves some serious consideration is enhancing git to provide this functionality – possibly via supporting development in libgit2 (is this even an option? the proper relationship between these two would need to be explored). At the end of the day, there's no infrastructure difference in asking git to maintain a connection than asking another process.

There are three big advantages I can see:

  • Larger community of testing and support (nice)
  • No extra software required on the remote (nicer)
  • The inevitable demise of shell-out support (nicest)

There are some disadvantages as well:

  • Larger projects (i.e., git) come with more process and stricter contribution requirements. This may mean not using a library when we otherwise could – but this may be in-line with magit's aforementioned 'emacs core' goal.
  • I would imagine git's source is something of a beast. There is probably a mix of interpreted and compiled code that would need to remain consistent with each other. Simply using libgit2 may not be an option unless the relevant bits are reimplemented with libgit2. I don't know if the project is seen as perfect-enough for git maintainers to be comfortable with that (not to mention the inherent risks in changing working code). The alternative to reimplementation is screen-scraping yet again – but now within git.

These will slow development, but I believe it's the best approach to achieve the stated goals.


That all handles how to maintain a connection with the remote. libgit2 (or libegit2) will still need to be taught how to communicate with the remote git and construct the appropriate data structures to truly deal with remote repositories similarly to local ones (a la the TRAMP shell-out approach).


Ha, this is exactly what tarsius is saying here: https://github.com/magit/magit/issues/2959#issuecomment-430021992

I need to stop doing this work at 4:30am.


If there is any amount of consensus on this approach, I can take point on bringing the issue up on the git mailing list to get their reactions.

@vermiculus git should definitely be implemented on top of libgit2. Sadly, it isn't. I'm not sure why but I imagine it's not a priority from the point of view of the git developers/maintainers. I also agree that it'd be nice if git/libgit2 featured Gitaly-like functionality. I imagine it'll be hard for any of us to steer git in this direction, but don't let my negativity stop you from trying! 😃

Nice job finding Gitaly, by the way. I couldn't figure out what GitRPC is, though. Did you mean gRPC?

The backend architecture we're hashing out in #3622 should allow the implementation and coexistence of different backends. For example, we can start with git (shell) and libgit2 backends and the libgit2 backend doesn't need to be complete: it can fall back to the git backend. A third backend could be Gitaly. Hopefully, at some point, one backend will emerge as the winner, since maintaining several backends seems less than optimal, to say the least.

It's GitRPC: https://twitter.com/brynary/status/236271927474413568?lang=en sadly not open source today.

Might emacs-ffi be useful for this effort?

That option has been explored, but it would preclude inclusion into the GNU ELPA since there are no copyright guarantees.

Sorry about another delay.

In principle, if such a program [a command-server] was maintained by other people, would you consider supporting it?

@phil-s Yes.

One option that I think deserves some serious consideration is enhancing git to provide this functionality – possibly via supporting development in libgit2 (is this even an option? the proper relationship between these two would need to be explored). At the end of the day, there's no infrastructure difference in asking git to maintain a connection than asking another process.

I would prefer that.

libgit2 seems to exist mostly for legal reasons. libgit2 (I don't know whether libgit1 ever was a thing) was created because Github and possibly others wanted to link git but they didn't want to share their own code and the Git developers didn't want to add a linking exception to Git itself. (I don't really know whether this is true, just making educated guesses.)

git should definitely be implemented on top of libgit2.

@luismbo libgit2 will always lag behind git, so rebasing git on top of libgit2 isn't an option, I would expect. That also would go against the "no linking-exception for git itself" position.

But the Git developers might be open to the idea of implementing a command server.

The backend architecture we're hashing out in #3622 should allow the implementation and coexistence of different backends.

Exactly. @phil-s The use of generic functions makes it fairly easy to implement a third (or fourth) backend. This would require some minor changes; currently some support functions assume that there are no backends besides the git and the libgit backends, but changing that should be easy.

Might emacs-ffi be useful for this effort?

That option has been explored, but it would preclude inclusion into the GNU ELPA since there are no copyright guarantees.

Now that we have figured out that we cannot depend on libgit alone and have to keep the current implementation at least for use over Tramp (and for people who have difficulties installing libgit), that doesn't really matter anymore. Additional backends do not have to be distributed with Magit.

@diamond-lizard If you want to give emacs-ffi a try please go ahead. The reason I decided against emacs-ffi wasn't legal in the first place, but that emacs-ffi isn't proven technology yet and that libegit2 was progressing quickly. But now that it is easy to add multiple backends (and development of libegit2 has slowed considerably) I would welcome it if someone looked into emacs-ffi more.

We probably have a chicken-and-egg issue here. I didn't start using libegit2 in Magit because it is still rather incomplete, and Magit not using it yet probably didn't help motivate potential contributors. Anyway, I will concentrate on all things libgit this week.

@luismbo libgit2 will always lag behind git, so rebasing git on top of libgit2 isn't an option

Exactly. If git were implemented on top of libgit2, then libgit2 wouldn't lag behind. :-)

@diamond-lizard see https://github.com/magit/magit/issues/2959#issuecomment-386608304 for a bit of analysis on that option.

@tarsius Thanks for the update and for giving your thoughts on some matters. We definitely have a sticky situation here, but I think if we decide on a good course of action and really start implementing something, we'll see some actual headway on this problem.

Hi 👋, I've been following this a bit since you've been contemplating using https://github.com/libgit2/libgit2. Just a few notes:

libgit2 seems to exist mostly for legal reasons. libgit2 (I don't know whether libgit1 ever was a thing) was created because Github and possibly others wanted to link git but they didn't want to share their own code and the Git developers didn't want to add a linking exception to Git itself. (I don't really know whether this is true, just making educated guesses.)

As far as the history of libgit2 goes, it was created because git itself is not architected in a way that it could be included inside another application. Although git does build a number of helper functions into a shared library (this is the original libgit, hence where _libgit2_ gets its name), this library is definitely not something that you could consume yourself, as an end-user. Random example: in failure modes, it often calls exit. Not great for inclusion in your application.

libgit2 was started by @spearce as a linkable library solution to that problem (before he instead went to build jgit). The linking exception was agreed to by most of the git contributors, and as a result, libgit2 includes a decent amount of git code where it makes sense to use it. (It includes more original code, though that's because our architecture is different than git's, not for any licensing reason.)

@luismbo libgit2 will always lag behind git, so rebasing git on top of libgit2 isn't an option, I would expect. That also would go against the "no linking-exception for git itself" position.

I agree that git itself will never be implemented on top of libgit2, but I don't think that the reasons are legal. I can't speak to Shawn's original motives, but I think that he did intend libgit2 to possibly be the core of git itself. Although that's not something that will happen, I don't think there was a goal to preclude it, so I'm not sure that I'm familiar with the position that you're referring to?

@ethomson Thanks a lot for the information. I jumped to too many conclusions. Sorry about that!

@ethomson Thanks a lot for the information. I jumped to too many conclusions. Sorry about that!

Not at all! I know it doesn't really matter either way, but I think it's interesting. I'll let you get back to architecture; I'm even more interested to see what you're doing here. 😀

Thanks, I appreciate it. I was feeling pretty embarrassed about this.

I appreciate it... that was a little embarrassing too 😜

@vermiculus

If there is any amount of consensus on this approach, I can take point on bringing the issue up on the git mailing list to get their reactions.

Any progress on this front? I'd say that it's worthwhile to ask even if another approach is taken here.

@fice-t I hadn't seen other interest until your comment; I'll put it on my to-do list for today :-)

Looking at @vermiculus's table: Except for maybe some special setups, --show-toplevel and --show-cdup might as well be implemented in pure Lisp.
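For what it's worth, the logic is small enough to sketch. The version below is in Python rather than Lisp and deliberately ignores the "special setups": it assumes a regular work tree and does not handle `GIT_DIR`, `GIT_WORK_TREE`, `core.worktree`, bare repositories, or being inside the `.git` directory itself. The function names simply mirror the `git rev-parse` options and are otherwise made up.

```python
from pathlib import Path
from typing import Optional, Union


def show_toplevel(start: Union[str, Path]) -> Optional[Path]:
    """Walk upwards from START until a directory containing .git is found."""
    path = Path(start).resolve()
    for candidate in [path, *path.parents]:
        # .git is a directory in a normal clone and a file in linked
        # worktrees and submodules, so only test for existence.
        if (candidate / ".git").exists():
            return candidate
    return None


def show_cdup(start: Union[str, Path]) -> Optional[str]:
    """Return the ../ sequence leading from START to the top level."""
    top = show_toplevel(start)
    if top is None:
        return None
    depth = len(Path(start).resolve().relative_to(top).parts)
    return "../" * depth
```

Over Tramp the existence checks would themselves be remote file operations, so whether this beats a single `git rev-parse` call there is an open question.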
