I have a branch on a GitHub repo which doesn't have submodules, but master does. I want Binder to build my repo/branch, and I can see it's still checking out master's submodules.
I think the checking out of submodules should be done after the checking out of the branch.
The code first checks out the main branch of a repo, then switches to the specified ref (e.g. a branch), and then runs a command to sync the submodules.
https://github.com/jupyter/repo2docker/blob/master/repo2docker/contentproviders/git.py#L18-L55
The first checkout uses --recursive, so maybe removing that would fix it? However, I don't really use or understand submodules, so I don't know if that would break things for repos that do have and want to use submodules. Do you know more?
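To make the sequence described above concrete, here is a rough Python sketch of the three steps as (command, working directory) pairs. It is only an illustration of the description in this comment: the function name is made up, and the exact git invocations (in particular using git checkout to switch refs) are assumptions, not a copy of what git.py does.

```python
def described_clone_steps(repo, ref, output_dir):
    """Return (command, cwd) pairs for the sequence described in this comment.

    Hypothetical helper: the concrete commands are assumptions for illustration.
    """
    return [
        # Step 1: clone the default branch; --recursive also fetches the
        # submodules recorded on that default branch (e.g. master).
        (["git", "clone", "--recursive", repo, output_dir], None),
        # Step 2: switch to the ref that was actually requested.
        (["git", "checkout", ref], output_dir),
        # Step 3: sync submodules for whatever is now checked out.
        (["git", "submodule", "update", "--init", "--recursive"], output_dir),
    ]


if __name__ == "__main__":
    for cmd, cwd in described_clone_steps("REPO", "BRANCH", "out"):
        print(cwd or ".", " ".join(cmd))
```

Written this way, it is easier to see that step 1 already pulls in the default branch's submodules before the requested ref is known.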
The ideal solution would be to clone the branch and submodules directly with a single command: git clone --branch BRANCH --recursive REPO
@betatim Is the git checkout done in two steps for a reason?
I don't know off the top of my head why we do it in two steps.
Submodules and shallow clones are the two topics that come to mind as problematic when we try to do clever things with checkouts. That's all I can think of.
Would git clone --branch WITH-A-REF-HERE --recursive REPO work? repo2docker is almost always invoked with a resolved ref rather than a branch name.
I've just tested it: you can clone a branch but not a specific commit.
However, git clone has a -n option that stops it from automatically checking out HEAD, which I think will do the trick:
$ git clone -n https://github.com/ome/omero-py.git xxx
$ cd xxx/
$ git branch
* master
$ ls -a
. .. .git
Then I think
git checkout COMMIT
git submodule init
git submodule update
will check out the required submodules.
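For what it's worth, the steps above could be sketched in Python roughly like this. The function names are hypothetical and this is not taken from repo2docker; it also folds git submodule init and git submodule update into the single git submodule update --init form, and adds --recursive on the assumption that nested submodules should be fetched too.

```python
import subprocess


def no_checkout_clone_steps(repo, ref, output_dir):
    """Return (command, cwd) pairs for a clone that defers the checkout.

    Hypothetical helper illustrating the idea in this thread.
    """
    return [
        # -n (--no-checkout): fetch the repository but leave the work tree empty,
        # so no branch's submodules are touched yet.
        (["git", "clone", "-n", repo, output_dir], None),
        # Check out the requested ref (branch name, tag or commit hash).
        (["git", "checkout", ref], output_dir),
        # Initialise and fetch only the submodules recorded at that ref.
        (["git", "submodule", "update", "--init", "--recursive"], output_dir),
    ]


def run_steps(steps):
    """Run each command in order, stopping at the first failure."""
    for cmd, cwd in steps:
        subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them; pass the list to
    # run_steps() to execute against a real repository.
    for cmd, cwd in no_checkout_clone_steps("REPO", "COMMIT", "xxx"):
        print(cwd or ".", " ".join(cmd))
```

If the ref has no submodules, the last command should simply have nothing to do.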
Do you want to make a PR trying this out, @davidbrochart?
PR here: https://github.com/jupyter/repo2docker/pull/809.