version: 0.35.5
platform: Ubuntu 18.10
installation method: pip
Description:
dvc metrics show -T
ERROR: unexpected error - Cannot retrieve the name of a submodule if it was not set initially
Completely deleting the submodule helps, but the repo then has to be re-cloned.
From the discussion on Slack: adding the submodule dir to .dvcignore (the upcoming dvcignore feature) might help, but since re-cloning was required, that seems quite unlikely.
The bug itself comes from gitpython: https://github.com/gitpython-developers/GitPython/issues/597. The workaround listed in that issue does not work.
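To narrow this down, it may help to see which submodule entries (gitlinks) Git has recorded and compare them with what `.gitmodules` declares. The sketch below is only a diagnostic aid: the helper name `list_gitlinks` is made up for illustration, and the idea that a gitlink without a matching `.gitmodules` entry triggers the error is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads `git ls-files --stage` output on stdin and
# prints the paths of gitlink entries (mode 160000 marks a submodule).
list_gitlinks() {
    # Each input line looks like: <mode> <object> <stage><TAB><path>
    awk '$1 == "160000" { sub(/^[^\t]*\t/, ""); print }'
}

# Example usage inside a repository (left commented out on purpose):
#   git ls-files --stage | list_gitlinks
#   git config --file .gitmodules --get-regexp '\.path$'
```

Any path printed by the first command but missing from the second would be a candidate for the entry gitpython cannot name.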
I'm having the same issue, but I'm running dvc fetch -a -T.
My workaround was to fetch the data from a previous commit without the submodule.
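A rough sketch of how that workaround could be scripted, assuming the submodule was introduced by the commit that first added `.gitmodules` on the current branch. The function name `last_commit_before_submodules` is hypothetical, and the `dvc fetch` step is only shown as a comment.

```shell
#!/bin/sh
# Hypothetical helper: prints the parent of the oldest commit on the current
# branch that added .gitmodules, i.e. the last commit without a submodule.
last_commit_before_submodules() {
    # --diff-filter=A keeps only commits where the file was added;
    # git log lists newest first, so the oldest match is the last line.
    added=$(git log --diff-filter=A --format=%H -- .gitmodules | tail -n 1)
    [ -n "$added" ] || return 1
    # Fails quietly if that commit has no parent (it is the root commit).
    git rev-parse --verify --quiet "${added}^"
}

# Example usage (left commented out on purpose):
#   git checkout "$(last_commit_before_submodules)"
#   dvc fetch
#   git checkout -
```

This only fetches the data referenced by that older commit, so anything added after the submodule would still be missing.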
I have the same problem. Here is some code to reproduce the error:
```
mkdir ~/test_repo
cd ~/test_repo
mkdir repo storage_dvc storage_git data_sshfs
cd repo
git init
dvc init
git init --bare ~/test_repo/storage_git/myproject.git
git remote add origin ~/test_repo/storage_git/myproject.git
git config --global credential.helper 'cache --timeout 7200'
git add -A
git commit -m 'init project'
git push origin master
dvc remote add -d local_dvccache ~/test_repo/storage_dvc
dvc push
mkdir data
sshfs $(id -un)@$(hostname -f):${HOME}/test_repo/data_sshfs/ data/
git submodule add https://github.com/mastaer/create_mnist_data.git create_dataset
dvc repro -P
echo -e "data\ncreate_dataset\ndata/\ncreate_dataset/\n" >> .dvcignore
git add -A
git commit -m 'create the dataset'
git push --set-upstream origin master
dvc push
echo -e "import numpy as np\nimport json\n\ntrain_data = np.load('data/mnist1.npz')\nx = train_data['x_train']\ny = train_data['y_train']\n\nx_pred = np.array(train_data['x_train'].mean(axis=(1,2)) / 15.0, dtype=int)\n\nacc = (y==x_pred).sum() / float(y.shape[0])\n\ndata = {}\ndata['acc'] = acc+np.random.random()/100.\n\nwith open('train_acc.json', 'w') as outfile:\n    json.dump(data, outfile)" >> train.py
echo -e "import numpy as np\nimport json\n\ntrain_data = np.load('data/mnist2.npz')\nx = train_data['x_test']\ny = train_data['y_test']\n\nx_pred = np.array(train_data['x_test'].mean(axis=(1,2)) / 15.0, dtype=int)\n\nacc = (y==x_pred).sum() / float(y.shape[0])\n\ndata = {}\ndata['acc'] = acc+np.random.random()/100.\n\nwith open('test_acc.json', 'w') as outfile:\n    json.dump(data, outfile)" >> test.py
dvc run -d data/mnist1.npz -m train_acc.json --no-exec python train.py
dvc run -d data/mnist2.npz -d train_acc.json -m test_acc.json --no-exec python test.py
git add -A
git commit -m 'build pipeline'
git push
git tag -a 001_firstexperiment -m 'First message!'
git tag -a 002_secondexperiment -m 'Second message!'
git push origin --tags
cd ..
git clone ~/test_repo/storage_git/myproject.git myproject1
cd myproject1
git checkout tags/001_firstexperiment -b result_001_firstexperiment
ln -s ../data_sshfs data
dvc repro -P
git add -A
git commit -m 'result_001_firstexperiment'
git push -u origin result_001_firstexperiment
dvc push
cd ..
git clone ~/test_repo/storage_git/myproject.git myproject2
cd myproject2
git checkout tags/002_secondexperiment -b result_002_secondexperiment
ln -s ../data_sshfs data
dvc repro -P
git add -A
git commit -m 'result_002_secondexperiment'
git push -u origin result_002_secondexperiment
dvc push
cd ..
cd repo
git pull
dvc pull
git branch -a
git checkout result_001_firstexperiment
dvc pull
git checkout result_002_secondexperiment
dvc pull
dvc metrics show -a
```
Note: Might be solved by migrating to dulwich.
Updating step 4 as follows fixed this error for me: just remove the submodule before adding or pushing it to the remote branch.
```
#########################################
# 4. Step:                              #
# Download the submodule                #
# and creates the data in the sshfs-dir #
# the data can be that large that I do  #
# not want them in my computer          #
# or I don't want them multiple times   #
# in different repos in my computer     #
#########################################
cp .git/config .git/config_tmp
git submodule add https://github.com/mastaer/create_mnist_data.git create_dataset
cp .git/config_tmp .git/config
rm .git/config_tmp
rm .gitmodules
git add .gitmodules
git rm --cached create_dataset
rm -rf .git/modules/create_dataset
rm create_dataset/.git
dvc repro -P
echo -e "data\ncreate_dataset\ndata/*\ncreate_dataset/*\n" >> .dvcignore
```
(But now I get another error: `ERROR: unexpected error - expected str, bytes or os.PathLike object, not NoneType`, but this is not part of this issue.)
How can I test it with dulwich?
> How can I test it with dulwich?
@mastaer You would have to go through dvc/scm/git and replace all gitpython invocations with dulwich. Just to be clear, it is not a trivial task.
> (But now I get another error: `ERROR: unexpected error - expected str, bytes or os.PathLike object, not NoneType`, but this is not part of this issue.)

Could you please post the full log with -v?
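One way to capture such a log so it can be pasted here is to send both stdout and stderr through `tee`. This is a minimal sketch: the wrapper `run_and_log` and the file name `dvc_verbose.log` are made up for illustration, and the command to run is whichever one fails for you.

```shell
#!/bin/sh
# Hypothetical wrapper: runs a command, shows its combined stdout/stderr
# on the terminal, and saves the same output to the given log file.
run_and_log() {
    logfile=$1
    shift
    "$@" 2>&1 | tee "$logfile"
}

# Example usage (left commented out on purpose):
#   run_and_log dvc_verbose.log dvc metrics show -a -v
```

Note that the pipeline's exit status is that of `tee`, so the wrapper does not report whether the wrapped command failed.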