Rust: Replace submodules with subtrees

Created on 1 Apr 2020  路  43Comments  路  Source: rust-lang/rust

cc @rust-lang/clippy @RalfJung

cc @rust-lang/compiler @rust-lang/infra we gate on clippy soon, so if you break clippy you need to fix it immediately in the same PR. This is trivial now though, since we are basically in a monorepo state, so you just edit stuff in src/tools/clippy/ until everything works again.

Synchronizations of the remote repo and the local subtree is explained in CONTRIBUTING.md

The first port (clippy) is being done in https://github.com/rust-lang/rust/pull/70655


A-meta C-enhancement E-easy T-compiler T-infra

Most helpful comment

I'm really glad this effort is underway. On Fuchsia we try to roll to a new tip-of-tree rust compiler every 3-6 weeks, which helps with catching bugs. If we could we'd roll more often. But the main thing slowing us down is the constant state of toolstate breakage.

Some of this (like rustfmt) we can work around in practice by distributing separate versions, but it's a pain to set up and maintain. Some tools (like clippy and rls) really have to match the version of the compiler.

This seems like it would make toolstate breakage a thing of the past, which is why I'm very supportive of this effort and would be glad to help however I can.

All 43 comments

Cc @rust-lang/miri because we have that now :D

Hmm, I was led to believe that git subrepo was nesting git repositories, in a way which git natively handles. But it turns out it's just a monorepo sync tool, same as manually making large commits. I don't think losing commit history is acceptable.

At a glance, git subtree looks like it might do the right thing, i.e. keep the repository nesting of a submodule, but I haven't tried it yet to see how it actually behaves in practice.
EDIT: ah, no, git subtree just seems to create synthetic commit histories. Still, could be better than a single sync mega-commit.
EDIT2: somehow it still managed to preserve commit history?! Compare:
https://github.com/rust-lang/rust-clippy/commits/master
https://github.com/rust-lang/rust/commits/c211cea3e99e04c2980a853b6637de22881b72eb

I moved to git subtree.

In case it gets lost on the PR, I said: https://github.com/rust-lang/rust/pull/70655#issuecomment-607240782

Whoa, clippy commit hashes were kept intact, this makes me really happy!11

I moved to git subtree.

Would be good to get the pro's and con's of that decision documented somewhere. git submodule advertises itself as a better alternative to git subtree; why is subtree still the better choice for us?

(Also I have no idea in which thread to post now, there's at least this issue and 2 PRs. It's very confusing.^^)

I don't know why git subrepo is better than git subtree other than what the documentation of git subrepo states at https://github.com/ingydotnet/git-subrepo#benefits

Fixes known rebase failures with git-subtree.

git subtree is better than git subrepo because it preserves all commits in history instead of squishing them in the name of the person who synchronizes it.

git subtree is upstream, which is nice, and it also supports preserving the commits better.
I believe the "rebase failures" that git subrepo talks about are actually an anti-pattern: rebasing a merge commit, it's not meant to work at all (the rebase will include both branches and lose the merge commit - this can already happens if you accidentally rebase the wrong thing in rust-lang/rust).

git subrepo seems to want to just throw all the changes into single large commits so you "don't have to deal with it", whereas git subtree is more history-preserving-oriented, which I prefer.

git subtree is better than git subrepo because it preserves all commits in history instead of squishing them in the name of the person who synchronizes it.

Is this true in both directions (subtree pull/push)?

Does git subtree also have the problem that a "subtree push" needs a commit on the rustc side?

no, git subtree push does not need a commit on the rustc side.

The only downside of git subtree is that you need to specify the folder name and target repo on every synchronization (because there's no synchronization metadata file)

I would prefer the history preserving option. I was under the impression that subrepo had a no squash mode but that could be subtree.

The only downside of git subtree is that you need to specify the folder name and target repo on every synchronization (because there's no synchronization metadata file)

This still needs to be reflected in https://github.com/rust-lang/rust/pull/70654 then.
I presume we could have scripts for that in the rustc repo?

This still needs to be reflected in #70654 then.

it already is:

You always need to specifiy the -P prefix to the subtree directory. If you specify the wrong directory
you'll get very fun merges that try to push the wrong directory to the remote repository. Luckily you
can just abort this without any consequences.

I was under the impression that subrepo had a no squash mode but that could be subtree.

subtree doesn't squash by default and has a sqash option.

@oli-obk IIUC, Clippy can then fix a nightly version and doesn't have to use the rustc master anymore (correct?). If that's the case, I'd go ahead and make these changes to the Clippy CI/infra, so it will be ready once the subtree PRs are ready to get merged.

it already is:

It doesn't say that you need to give the target repo (and the example commands do not mention a target repo, just a folder name).

Clippy can then fix a nightly version and doesn't have to use the rustc master anymore (correct?). If that's the case, I'd go ahead and make these changes to the Clippy CI/infra, so it will be ready once the subtree PRs are ready to get merged.

well, only if we make the synchronization at specific commits that work on the current nightly. Between two nightlies there can be many states where the clippy subtree will only compile with a master-rustc.

IMO we want to do this, since otherwise, we would have to do rustup commits in Clippy just as often as before. It will just be easier, since it will only be a git subtree push.

Well, we can do it at random points in time, but yes, we can target those points in time to the commit that caused the most recent nightly to be created.

Right, we can have a nightly clippy CI job that notices when clippy stops building with master, and sync then.

at that point we can just have a nightly job that synchronizes once a night automatically to ensure that even nonbreaking changes get synchronized as often as possible.

Clippy's CI could perform pull every time it tries to merge PR.
That would avoid most of conflicts (from maintainer PoV) when new lint using the "old code" is being added but Clippy in Rust repo was updated for the "new code" already.

Edit: There would be still Rustups necessary for lints added to Clippy shortly before breaking changes since Clippy in Rust repo won't have them yet.

@oli-obk @RalfJung What do you think of this logic? Just thought of it, not sure where to leave it.

git subtree pull is an injection, git subtree push is a projection, hence why pulling preserves commits verbatim (hash included) while pushing rewrites them to keep only the relevant parts.
They're both information preserving (wrt file diffs and commit messages) for the subtree directory, and only for that directory.

git subtree pull is an injection, git subtree push is a projection

I have no idea what this means in this context.^^
I mean I know waht both do as you explained it elsewhere, but this terminology here is not useful at all I feel.

I'm really glad this effort is underway. On Fuchsia we try to roll to a new tip-of-tree rust compiler every 3-6 weeks, which helps with catching bugs. If we could we'd roll more often. But the main thing slowing us down is the constant state of toolstate breakage.

Some of this (like rustfmt) we can work around in practice by distributing separate versions, but it's a pain to set up and maintain. Some tools (like clippy and rls) really have to match the version of the compiler.

This seems like it would make toolstate breakage a thing of the past, which is why I'm very supportive of this effort and would be glad to help however I can.

We've protyped this with clippy but are waiting for an upstream fix to git subtree before it becomes feasible for the rustc repo without a patched git

Yeah, I'm hoping for that fix to land before I start seriously pushing it for others. I wanna start having the conversations soon though

Hi! I contribute to Git in my spare-time and I'm really interested in improving the user experience around submodules. I would really be interested to know the reasons for why this change of approach is being made for your project :) After all, git subtree is in contrib/, so it's not an "official" Git command and it's not being actively developed...

I summarized the reasons in https://github.com/rust-lang/compiler-team/issues/266

TLDR: keeping submodules in sync is a lot of effort, we need to be able to easily make changes to the "sub-whatever" to keep it from breaking due to other changes in the main repo, but at the same time not have a pure monorepo, because the "sub-whatever" is also being developed out of tree.

@phil-blain The links Oli posted should give some context, but in short:

  • We have multiple subprojects that are logically separate projects, but have strong code dependencies with this repository. Mainly, they make use of internal, unstable APIs that are prone to a lot of change.

    • If these projects are vendored in tree, this largely just works out: When an internal API is changed, the person changing it will update all internal consumers, as usual. However, the subproject needs to do all development on this main repo, which can slow down stuff.

    • With a submodule, this is far more annoyign to do and would make it harder to iterate on internal APIs. Instead of making compiler developers need to make frequent coordinated submodule updates (which would slow down compiler development and push compiler developers into situations where they may prefer to not update internal APIs even though that's the technically best option), we would have CI catch submodule breakages and submodule maintainers would update things. This made life easier for compiler devs but submodule devs, who may not be 100% familiar with the changes to the internal API, would have to figure this out, and it would often be a lot of work.

    • With subtrees, we have the best of both worlds: Subproject development is done in a separate repo, but whenever someone breaks an internal API it is very easy to fix up the subprojects. We do biweekly syncs and it all works pretty great.

While I do think the UX around submodules could use a lot of love (and I'm glad someone is looking into it!), in this case I think it's more of them not being the right tool for the job.

TLDR: we could create a git-subtree command installable via cargo install rustc-git-subtree to make the use of git subtree on the rustc repo easy.

Some links to start out with:

Even if the git subtree patch is merged and distributed, it may take years to appear in everyone's distro.
@Mark-Simulacrum suggested that we could remove the blocker on the git subtree patch by distributing it ourselves. All we need is a version that works for us after all. So basically we create a new rust project in the rust-lang org and all it does when installed is invoke the scripts (that we'll shamelessly-but-with-the-right-license copy from the git-PR). This way anyone who wants do subtree synchronizations can "overwrite" their git subtree command, by installing ours via cargo install rustc-git-subtree. This will put the git-subtree binary into the PATH like it does with cargo-* subcommands. This will take precedence over the git-subtree of your OS, and thus just work out of the box.

This will put the git-subtree binary into the PATH like it does with cargo-* subcommands. This will take precedence over the git-subtree of your OS, and thus just work out of the box.

Careful though, people running git-subtree will get your version, but people using git subtree will get the original script from contrib/ in git.git, which gets installed to git --exec-path by most distros' git package, since they install subtree by default.

Probably we want to call it rustc-subtree then, or something like that. Doesn't seem like a huge concern -- people interacting with it are a relatively small set, hopefully.

This is a good idea, IMO. Some things the rustc-subtree command should take care of:

  • download and use the new fixed script
  • set the stack limit higher (on command line this is done with ulimit -s 60000) for large subrepos on first use.
  • nicer command line features, either configurable or hardcoded, so we don't have to specify the directory, remote, and branch with every use

It would be funny to put rustc-subtree in its own repo and include it here as a subtree repo. But sadly I can't think of a reason, why it has to be in the Rust repo 馃槃

I was assuming this would be another rustup component

Though I guess this doesn't actually need to be versioned with Rust so a cargo install makes more sense

I marked this as "looking for participation". Working on this git subcommand that can be installed via cargo install requires zero work on rustc. Basically what you need to do is to grab the latest version of the subcommand and copy the file with the appropriate license mentioned into a new cargo project. The main function can then include_str! the script and invoke sh to execute the text (maybe piped in via stdin?) by forwarding all command line args to the script. You'll need to add rust-lang-owner to the owners of your git repository and the crates.io owners of your crate.

Ideally you test the crate by talking with the clippy team on zulip and doing a clippy sync with the "new" tool. If all that works, we'll update the documentation on our git subtree usage to reflect the fact that ppl need to install the tool and invoke a slightly different subcommand name.

I'd like to work on this issue!

great! If you have any questions, feel free to ping me on zulip or just ask here. There's also a bot, so if you write @rustbot claim it should assign this issue to you.

@rustbot claim

I'm wondering if we need this to actually be on crates.io. We could vendor the script in the repository and call ./x.py subtree.

I like having it as an independent thing. Others may want to use it, too. It's not rustc specific

Here's the gitree crate which wraps the git subtree script. I have included the script form the git PR mentioned here. I am not well versed in handling git, so can anyone test it or give me the instructions for doing it? Here's the Zulip topic about this crate.

The crate metadata doesn't refer to the repo, so posting it here for others trying to find it: https://github.com/mominul/gitree

Was this page helpful?
0 / 5 - 0 ratings

Related issues

nikomatsakis picture nikomatsakis  路  210Comments

nikomatsakis picture nikomatsakis  路  274Comments

cramertj picture cramertj  路  512Comments

nikomatsakis picture nikomatsakis  路  236Comments

nikomatsakis picture nikomatsakis  路  268Comments