$ dvc get https://github.com/iterative/dvc scripts
ERROR: unexpected error - Could not accommodate requested object type 'tree', got commit
And with -v it hangs on computing hashes :scream:
This had been happening on my machine for a long time, and I thought something was wrong with my setup (it was also an intermittent issue).
I get different sets of error messages, some of them are:
1. SHA b'parent' could not be resolved, git returned: b'parent d3145f92171b593748cc36452455dd099c49e239'
2. ValueError: Could not accommodate requested object type 'tree', got commit
3. IndexError: index out of range
4. ValueError: Failed to parse header: b'40000 .dvc\x00t\xa8Y3vH\xb7\x80\xb5\x97b\x02Y\xdb\\\x8d\xae<+\x1f100644 .gitignore\x00u^R\n'
Looks like an issue with GitPython. At least, I found two similar issues: https://github.com/gitpython-developers/GitPython/issues/1016 and gitpython-developers/GitPython#584
I am able to reproduce it if I change the directories around in the command. I get different errors though. Very interesting.
Ok, I think I found it. There is a bug in gitpython where cat-file --batch returns output in an unexpected format. Looking into it...
@efiop, I tried running this on 0.94, and it did not fail once. I was able to bisect the problem to 9ead641c. Since that commit, save_info is passed a tree that is a RepoTree instance. This felt wrong to me, as it was not the case before that particular commit, but I am too tired to debug right now.
Maybe this is the reason why it has started hitting GitTree (which it should not? :confused:)?
We are computing the checksum in a thread pool, while stream_object_data is not thread safe. Looks like that is the cause.
@skshetry Yeah, that's the one for sure. Nothing inherently wrong with it, we just forgot that some gitpython methods are not thread safe. Looking into solving it somehow... Thanks for bisecting! Have a good rest :slightly_smiling_face:
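For anyone following along, here is a minimal sketch of the failure mode and one possible way around it: hashing in a thread pool while serializing access to a reader that keeps shared state. `SharedStreamReader` and `compute_checksums` are hypothetical names made up for illustration. They are not GitPython or DVC APIs, and this is not necessarily how the fix will look.

```python
import hashlib
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SharedStreamReader:
    """Hypothetical reader that keeps per-request state on the instance,
    the way a reader built on one shared stream would.
    This is NOT GitPython's API, only a stand-in for illustration."""

    def __init__(self, blobs):
        self._blobs = blobs
        self._requested = None  # shared state: the request currently in flight
        self._lock = threading.Lock()

    def _read_unlocked(self, name):
        # Two steps that must not interleave with another thread's request.
        self._requested = name
        time.sleep(0)  # gives other threads a chance to run in between
        return self._blobs[self._requested]

    def read(self, name):
        # Serialize the request/response pair so each caller gets its own data.
        with self._lock:
            return self._read_unlocked(name)


def compute_checksums(reader, names, jobs=4):
    """Hash every named blob in a thread pool; only the read is serialized."""
    def checksum(name):
        return hashlib.sha256(reader.read(name)).hexdigest()

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(zip(names, pool.map(checksum, names)))
```

The trade-off is that reads become sequential while only the hashing runs concurrently. Giving each worker thread its own reader would be another option, at the cost of more open handles.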
@efiop, okay, but shouldn't it read through the checked-out directory rather than git objects?
@skshetry No, we are trying to move away from checking out git repos and do everything in-memory instead. So we are on the right path.