Dvc: Bug with showing metrics when repo contains a submodule

Created on 17 Apr 2019 · 7 Comments · Source: iterative/dvc

version: 0.35.5
platform: Ubuntu 18.10
installation method: pip

Description:

dvc metrics show -T
ERROR: unexpected error - Cannot retrieve the name of a submodule if it was not set initially

Completely deleting the submodule helps, but the repo then has to be re-cloned.

bug p2-medium

All 7 comments

From the discussion on Slack: using the upcoming dvcignore feature and adding the submodule directory to .dvcignore might help, but since re-cloning was required it seems quite unlikely :slightly_frowning_face:

The bug itself comes from GitPython: https://github.com/gitpython-developers/GitPython/issues/597. The workaround listed in that issue does not work.
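For anyone trying to see which branches or tags are affected, here is a minimal sketch. It assumes (this is not confirmed in the thread) that the error is triggered when a tree containing a submodule entry, i.e. a gitlink with mode 160000, gets walked. `refs_with_gitlinks` is a hypothetical helper name; it only relies on plain `git for-each-ref` and `git ls-tree`.

```shell
# Hypothetical helper: print "<ref><TAB><path>" for every submodule entry
# (gitlink, mode 160000) found in any local branch or tag.
refs_with_gitlinks() {
    git for-each-ref --format='%(refname)' refs/heads refs/tags |
    while IFS= read -r ref; do
        # ls-tree prints "<mode> <type> <object><TAB><path>" per entry.
        git ls-tree -r "$ref" |
        awk -F'\t' -v ref="$ref" '$1 ~ /^160000 / { print ref "\t" $2 }'
    done
}
```

Refs that do not show up in the output contain no submodule entries, which would be consistent with the workaround of using a commit from before the submodule was added.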

I'm having the same issue, but I'm running dvc fetch -a -T.

My workaround was to fetch the data from a previous commit without the submodule.
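A rough sketch of how such a previous commit could be located, assuming the submodule was introduced together with `.gitmodules`. `last_commit_without_gitmodules` is a hypothetical helper name, and the commented `dvc fetch` at the end is only the intended follow-up, not something verified here.

```shell
# Hypothetical helper: print the parent of the oldest commit (reachable from
# HEAD) that added .gitmodules, i.e. the last commit before the submodule.
last_commit_without_gitmodules() {
    # git log lists newest first, so the last line is the earliest addition.
    first_with=$(git log --diff-filter=A --format=%H -- .gitmodules | tail -n 1)
    [ -n "$first_with" ] || return 1
    # Fails (non-zero exit) if .gitmodules was added in the root commit.
    git rev-parse --verify --quiet "${first_with}^"
}

# Intended usage (not verified here):
#   git checkout "$(last_commit_without_gitmodules)"
#   dvc fetch
```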

I have the same problem. And here is some code to reproduce this error:

```

#!/bin/bash

mkdir ~/test_repo
cd ~/test_repo

mkdir repo storage_dvc storage_git data_sshfs

cd repo

git init
dvc init

#########################################
# 1. Step:                              #
# GIT with LOCAL (remote) server        #
#########################################

git init --bare ~/test_repo/storage_git/myproject.git

git remote add origin ~/test_repo/storage_git/myproject.git

git config --global credential.helper 'cache --timeout 7200'

git add -A
git commit -m 'init project'

git push origin master

#########################################
# 2. Step:                              #
# DVC with LOCAL (remote) server        #
#########################################

dvc remote add -d local_dvccache ~/test_repo/storage_dvc
dvc push

#########################################
# 3. Step:                              #
# Create SSHFS to an empty folder       #
#########################################

mkdir data

# maybe need to install: sudo apt-get install openssh-server

sshfs $(id -un)@$(hostname -f):${HOME}/test_repo/data_sshfs/ data/

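#########################################
# 4. Step:                              #
# Download the submodule                #
# and creates the data in the sshfs-dir #
# the data can be that large that I do  #
# not want them in my computer          #
# or I don't want them multiple times   #
# in different repos in my computer     #
#########################################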

git submodule add https://github.com/mastaer/create_mnist_data.git create_dataset

dvc repro -P

echo -e "data\ncreate_dataset\ndata/*\ncreate_dataset/*\n" >> .dvcignore

#########################################
# 4. Step:                              #
# Push Everything                       #
#########################################

git add -A
git commit -m 'create the dataset'
git push --set-upstream origin master
dvc push

#########################################
# 5. Step:                              #
# SETUP A PROJECT                       #
#########################################

echo -e "import numpy as np\nimport json\n\ntrain_data = np.load('data/mnist1.npz')\nx = train_data['x_train']\ny = train_data['y_train']\n\nx_pred = np.array(train_data['x_train'].mean(axis=(1,2)) / 15.0, dtype=int)\n\nacc = (y==x_pred).sum() / float(y.shape[0])\n\ndata = {}\ndata['acc'] = acc+np.random.random()/100.\n\nwith open('train_acc.json', 'w') as outfile:\n    json.dump(data, outfile)" >> train.py

echo -e "import numpy as np\nimport json\n\ntrain_data = np.load('data/mnist2.npz')\nx = train_data['x_test']\ny = train_data['y_test']\n\nx_pred = np.array(train_data['x_test'].mean(axis=(1,2)) / 15.0, dtype=int)\n\nacc = (y==x_pred).sum() / float(y.shape[0])\n\ndata = {}\ndata['acc'] = acc +np.random.random()/100.\n\nwith open('test_acc.json', 'w') as outfile:\n    json.dump(data, outfile)" >> test.py

dvc run -d data/mnist1.npz -m train_acc.json --no-exec python train.py
dvc run -d data/mnist2.npz -d train_acc.json -m test_acc.json --no-exec python test.py

#########################################
# 6. Step:                              #
# TEST EXECUTEPY                        #
#########################################

git add -A
git commit -m 'build pipeline'
git push
git tag -a 001_firstexperiment -m 'First message!'
git tag -a 002_secondexperiment -m 'Second message!'
git push origin --tags

cd ..

git clone ~/test_repo/storage_git/myproject.git myproject1
cd myproject1

git checkout tags/001_firstexperiment -b result_001_firstexperiment
ln -s ../data_sshfs data
dvc repro -P
git add -A
git commit -m 'result_001_firstexperiment'
git push -u origin result_001_firstexperiment
dvc push

cd ..

git clone ~/test_repo/storage_git/myproject.git myproject2
cd myproject2

git checkout tags/002_secondexperiment -b result_002_secondexperiment
ln -s ../data_sshfs data
dvc repro -P
git add -A
git commit -m 'result_002_secondexperiment'
git push -u origin result_002_secondexperiment
dvc push

cd ..
cd repo
git pull
dvc pull

git branch -a
git checkout result_001_firstexperiment
dvc pull
git checkout result_002_secondexperiment
dvc pull

# Here comes the error

dvc metrics show -a

#########################################
# 7. Step:                              #
# Remove the dummy project              #
#########################################

cd ~

fusermount -u -z ${HOME}/test_repo/repo/data

rm -rf ~/test_repo
```

Note: Might be solved by migrating to dulwich.

Updating step 4 as follows fixed this error for me (I just remove the submodule before adding or pushing it to the remote branch):

#########################################
# 4. Step:                              #
# Download the submodule                #
# and creates the data in the sshfs-dir #
# the data can be that large that I do  #
# not want them in my computer          #
# or I don't want them multiple times   #
# in different repos in my computer     #
#########################################
cp .git/config .git/config_tmp

git submodule add https://github.com/mastaer/create_mnist_data.git create_dataset

cp .git/config_tmp .git/config
rm .git/config_tmp


rm .gitmodules
git add .gitmodules
git rm --cached create_dataset
rm -rf .git/modules/create_dataset

rm create_dataset/.git


dvc repro -P

echo -e "data\ncreate_dataset\ndata/*\ncreate_dataset/*\n" >> .dvcignore

(But now I get another error: ERROR: unexpected error - expected str, bytes or os.PathLike object, not NoneType. That is not part of this issue, though.)

How can I test it with dulwich?

> How can I test it with dulwich?

@mastaer You would have to go through dvc/scm/git and replace all gitpython invocations with dulwich. Just to be clear, it is not a trivial task.

> (But now I get another error: ERROR: unexpected error - expected str, bytes or os.PathLike object, not NoneType. That is not part of this issue, though.)

Could you please post the full log with -v?
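One way to capture such a log, as a small sketch: `capture_log` is a hypothetical helper (not part of dvc) that shows a command's output and also saves it to a file, with stderr merged in so a traceback is not lost. The file name `dvc_verbose.log` in the usage comment is just an example.

```shell
# Hypothetical helper: run a command, print its output, and keep a copy
# (stdout and stderr combined) in the given log file.
capture_log() {
    log_file="$1"
    shift
    # 2>&1 merges stderr into the stream that tee writes to the log.
    "$@" 2>&1 | tee "$log_file"
}

# Intended usage (not run here):
#   capture_log dvc_verbose.log dvc metrics show -T -v
```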
