can we add more git hooks for dvc pull/push/status when git pull/push/status are run? ideally, most git commands would have a hook. this will make it much less likely for someone to forget to dvc push
after running a pipeline
A new reason to prioritize this https://discordapp.com/channels/485586884165107732/485586884165107734/561380967311212554
Another question:
is there anyway to have git hooks automatically install on pull or clone?
just thinking of first time developer experience, right now its quite a few steps
Good point @shcheklein ! Added p1 priority, we'll try to take a look at it next week.
After some quick research I have following conclusions:
git pull
- we can use post-merge
hook to do dvc pull
git push
- I would like to discuss this one. It seems to me that we want to execute dvc push
after successfull git push
. However, all "post push hooks" (pre-receive
, update
, post-receive
) are executed on push-side repository. We could use pre-push
but with that we have to consider that dvc might take some time to push everything to cache, and then, in the end, git fails for some reason. It would be some inconvenience for the user. Though it does not seem as some repository breaking problem, as simple git reset
and dvc pull
should bring repository to original state.git status
- it seems that we will be unable to implement this one, none of hooks is triggered upon statusgit clone
- we can use post-checkout
hook to trigger dvc checkout
git checkout
- i would consider creating issue for that one, because we can checkout whole branch or file, and in case of file we would need some logic to handle only checked out file, if git checks out stage file@pared
- git pull - we can use post-merge hook to do dvc pull
But not every pull
involves merge. It might do rebase instead.
- git push - I would like to discuss this one. It seems to me that we want to execute dvc push after successfull git push. However, all "post push hooks" (pre-receive, update, post-receive) are executed on push-side repository. We could use pre-push but with that we have to consider that dvc might take some time to push everything to cache, and then, in the end, git fails for some reason. It would be some inconvenience for the user. Though it does not seem as some repository breaking problem, as simple git reset and dvc pull should bring repository to original state.
pre-push
should be good enough :+1:
- git clone - we can use post-checkout hook to trigger dvc checkout
- git checkout - i would consider creating issue for that one, because we can checkout whole branch or file, and in case of file we would need some logic to handle only checked out file, if git checks out stage file
We already have post-checkot hook that simply runs dvc checkout
(without arguments). Would it be possible to make it handle specific stage files as well?
@efiop
But not every pull involves merge. It might do rebase instead.
Well we can use post-rewrite
, according to the documentation it is triggered after amend
or rebase
, but git is supposed to pass argument indicating whether it is amend
or rebase
, so we could dvc pull
upon rebase
only.
Would it be possible to make it handle specific stage files as well?
According to documentation, git is supposed to pass flag whether it was branch or file, but it does not explicitly point which file. So i have been too fast with saying that it is possible. To use that information we would need mechanism to find checked out files. I think it is better to keep dvc checkout
as it is.
@pared
Well we can use post-rewrite, according to the documentation it is triggered after amend or rebase, but git is supposed to pass argument indicating whether it is
amend
orrebase
, so we could dvc pull upon rebase only.
But that one is going to be called not only on git pull
but also on regular amend
and rebase
. Not sure if it is something we actually want. What do you think?
According to documentation, git is supposed to pass flag whether it was branch or file, but it does not explicitly point which file. So i have been too fast with saying that it is possible. To use that information we would need mechanism to find checked out files. I think it is better to keep dvc checkout as it is.
Nice! Would it make sense to make our checkout hook _not_ perform dvc checkout
if it sees that it is a file that you are checking out? I imagine you git checkout
-ing specific file and then our dvc checkout
(without args) gets triggered and even though it has confirmation prompts, you might accidentally checkout some file that you didn't want.
So for now, only git push
hook is the most straightforward that we could easily add, right?
Looks like even though some git hooks do suffice for some of our purposes, the support would still be far from perfect. Maybe having simple aliases like git push
-> git push + dvc push
would feel better? I don't quite like aliases on dvc's side(e.g. dvc push
, or some other command, that would call git push
) because they would bring a lot of hustle caused by passing through git options. On the other hand, things like aliases and/or git plugins would have to be handled and installed explicitly by a user, so it would add even more complication. I've been opposing this a lot since the early days, but considering everything (including requests for --autogitadd/commit
feature), maybe dvc indeed should handle those. E.g.
dvc config core.autoscm true
-> would make dvc push
automatically git push
, dvc status
automatically git status
and so on? What do you guys think?
But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?
I must agree thats a bit of a bummer. We could solve amend vs rebase problem by reading flag passed by git, but problem "Was it pull or rebase" will remain unsolved. Both hooks (post-rewrite
, post-merge
) contain functionality that we need, but those details make them hard to use in our case.
Nice! Would it make sense to make our checkout hook not perform dvc checkout if it sees that it is a file that you are checking out?
I think we could do that, but then there is one problem left: what if checked out file was stage? In such case we should probably validate our stages against their outputs checksums.
So for now, only git push hook is the most straightforward that we could easily add, right?
I agree.
dvc config core.autoscm true -> would make dvc push automatically git push, dvc status automatically git status and so on?
I think that this idea is the best in terms of versatility. We tried the other way around (git hooks) and we already have problems with covering our commands with avialable hooks. I think that handling and debugginig this on dvc side with autoscm
flag will be easier than handling issues about different OS-s that will come after introducing aliases.
For the record: agreed privately that we would implement git push
hook for now to provide at least some quick aid, but then we would think about autoscm
and stuff like that.
@AlJohri pre-push
hook was released as a part of 0.40.1. Please feel free to give it a try 馃檪
I'm still interested in adding a git pull
hook.
But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?
I must agree thats a bit of a bummer. We could solve amend vs rebase problem by reading flag passed by git, but problem "Was it pull or rebase" will remain unsolved. Both hooks (post-rewrite, post-merge) contain functionality that we need, but those details make them hard to use in our case.
It seems like we can at least just implement it for the post-merge
hook? For post-rewrite
, I understand we can't easily identify between actual rebase vs pull --rebase
but this only affects users who are running git pull --rebase
, right?
I think the majority of users just use git pull
without --rebase
so can we can have it automatically dvc pull
for at least these users?
Most helpful comment
I'm still interested in adding a
git pull
hook.It seems like we can at least just implement it for the
post-merge
hook? Forpost-rewrite
, I understand we can't easily identify between actual rebase vspull --rebase
but this only affects users who are runninggit pull --rebase
, right?I think the majority of users just use
git pull
without--rebase
so can we can have it automaticallydvc pull
for at least these users?