Please provide information about your setup
DVC version: 0.71.0
Platform: Ubuntu 18.04
Method of installation: pip
I've used dvc for managinf model files. I have not updated dvc for a while and after doing so going from 0.59.2 to 0.71.0 I could not add any new files to the existing dvc cache.
Here is the verbose output of the dvc add command
dvc add Empty\ Document -v
/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/git/repo/base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
warnings.warn("The use of environment variables in paths is deprecated" +
ERROR: unexpected error - unable to open database file
------------------------------------------------------------
Traceback (most recent call last):
File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/main.py", line 49, in main
ret = cmd.run()
File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/command/add.py", line 25, in run
fname=self.args.file,
File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/repo/__init__.py", line 35, in wrapper
with repo.lock, repo.state:
File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/state.py", line 136, in __enter__
self.load()
File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/state.py", line 228, in load
self.database = _connect_sqlite(self.state_file, {"nolock": 1})
File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/state.py", line 495, in _connect_sqlite
return sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file
Downgrading dvc version to 0.59.2 makes the problem dissapear again.
It feels like some backwards incopatibility issue?
Hi @sremm !
Indeed, looks like state backward compatibility issue 馃檨 , probably caused by some locking adjustments we've done. The only solution for you right now is to install the newest version and delete .dvc/state
, after that everything should start working. You will, however, lose all the cached hashes that are in that state db, but that will be recomputed once when you next use some particular file, so shouldn't be a big issue, even though inconvenient. Would that work for you?
I couldn't reproduce it @sremm, @efiop :disappointed:
pip install dvc==0.59.2
dvc init --no-scm
dvc run -o foo "echo foo > foo"
pip uninstall dvc && pip install dvc
rm foo
dvc checkout
dvc run -o bar "echo bar > bar"
Maybe your state file got corrupted somehow.
How did you perform the update?
Good catch @mroutis !
@sremm Btw, could you please show dvc version
output on both dvc versions? It contains some additional info, which might help us better understand what happened.
We have this new piece of code in https://github.com/iterative/dvc/blob/0.71.0/dvc/state.py#L491 , where we try to connect by the uri, if we can. Might be that we didn't quite get the sqlite requirements right. In which case, we should be able to recover from this issue without losing the state db.
Also, python -c 'import sqlite3; print(str(sqlite3.sqlite_version_info))'
would be helpful as well :)
@mroutis I used pip install dvc --upgrade
doing pip uninstall dvc && pip install dvc
like in your example did not change the behaviour
@efiop I tried updating dvc and then removing .dvc/state
but still experience the same behaviour.
Here are the outputs:
For the working version
dvc version
/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/git/repo/base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
warnings.warn("The use of environment variables in paths is deprecated" +
DVC version: 0.59.2
Python version: 3.7.4
Platform: Linux-4.15.0-70-generic-x86_64-with-debian-buster-sid
Binary: False
Cache: reflink - False, hardlink - False, symlink - True
Filesystem type (cache directory): ('fuseblk', '/dev/sda2')
Filesystem type (workspace): ('ext4', '/dev/sdb2')
and for the latest version
dvc version
/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/git/repo/base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
warnings.warn("The use of environment variables in paths is deprecated" +
DVC version: 0.71.0
Python version: 3.7.4
Platform: Linux-4.15.0-70-generic-x86_64-with-debian-buster-sid
Binary: False
Package: pip
Cache: reflink - False, hardlink - False, symlink - True
Filesystem type (cache directory): ('fuseblk', '/dev/sda2')
Filesystem type (workspace): ('ext4', '/dev/sdb2')
Output of python -c 'import sqlite3; print(str(sqlite3.sqlite_version_info))'
(3, 29, 0)
Perhaps there some other small example I could also try to reproduce?
@sremm You've removed .dvc/state _after_ upgrading, right?
Double checked and yes, upgraded -> removed .dvc/state -> try to add a simple file
same behaviour.
Since I also use a windows machine with the same repository, I thought I'd try to upgrade there as well.
DVC version: 0.71.0
Python version: 3.7.3
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: pip
Cache: reflink - False, hardlink - True, symlink - False
Filesystem type (cache directory): ('NTFS', 'F:\')
Filesystem type (workspace): ('NTFS', 'F:\')
first I upgraded (had a little older version than 0.59.2, but everything still worked with 0.59.2)
then tried to add a small text file and got a different error
dvc add test.txt -v
c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\git\repo\base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
warnings.warn("The use of environment variables in paths is deprecated" +
DEBUG: Trying to spawn '['c:\\users\\srg\\appdata\\local\\continuum\\anaconda3\\envs\\tf-gpu\\python.exe', 'C:\\Users\\srg\\AppData\\Local\\Continuum\\anaconda3\\envs\\tf-gpu\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['c:\\users\\srg\\appdata\\local\\continuum\\anaconda3\\envs\\tf-gpu\\python.exe', 'C:\\Users\\srg\\AppData\\Local\\Continuum\\anaconda3\\envs\\tf-gpu\\Scripts\\dvc', 'daemon', '-q', 'updater']'
ERROR: unexpected error - [WinError 123] The filename, directory name, or volume label syntax is incorrect: ' 16232\n'
------------------------------------------------------------
Traceback (most recent call last):
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\main.py", line 49, in main
ret = cmd.run()
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\command\add.py", line 25, in run
fname=self.args.file,
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\repo\__init__.py", line 35, in wrapper
with repo.lock, repo.state:
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\flufl\lock\_lockfile.py", line 334, in __enter__
self.lock()
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\lock.py", line 78, in lock
super(Lock, self).lock(timedelta(seconds=DEFAULT_TIMEOUT))
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\flufl\lock\_lockfile.py", line 267, in lock
self._break()
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\flufl\lock\_lockfile.py", line 527, in _break
os.unlink(winner)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: ' 16232\n'
------------------------------------------------------------
But after removing the .dvcstate I got the same error as on Ubuntu
dvc add test.txt -v
c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\git\repo\base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
warnings.warn("The use of environment variables in paths is deprecated" +
DEBUG: Trying to spawn '['c:\\users\\srg\\appdata\\local\\continuum\\anaconda3\\envs\\tf-gpu\\python.exe', 'C:\\Users\\srg\\AppData\\Local\\Continuum\\anaconda3\\envs\\tf-gpu\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['c:\\users\\srg\\appdata\\local\\continuum\\anaconda3\\envs\\tf-gpu\\python.exe', 'C:\\Users\\srg\\AppData\\Local\\Continuum\\anaconda3\\envs\\tf-gpu\\Scripts\\dvc', 'daemon', '-q', 'updater']'
ERROR: unexpected error - unable to open database file
------------------------------------------------------------
Traceback (most recent call last):
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\main.py", line 49, in main
ret = cmd.run()
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\command\add.py", line 25, in run
fname=self.args.file,
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\repo\__init__.py", line 35, in wrapper
with repo.lock, repo.state:
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\state.py", line 136, in __enter__
self.load()
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\state.py", line 228, in load
self.database = _connect_sqlite(self.state_file, {"nolock": 1})
File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\state.py", line 495, in _connect_sqlite
return sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file
------------------------------------------------------------
Rolling back to
DVC version: 0.59.2
Python version: 3.7.3
Platform: Windows-10-10.0.16299-SP0
Binary: False
Cache: reflink - False, hardlink - True, symlink - False
Filesystem type (cache directory): ('NTFS', 'F:\\')
Filesystem type (workspace): ('NTFS', 'F:\\')
And I can add the file without issues
@sremm Oh, that is very interesting. I suspect problems in our by-url sqlite connection. Let's try to narrow it down. Please install the latest dvc and run this script in the root of your repo:
import sqlite3
from dvc.state import _build_sqlite_uri
uri = _build_sqlite_uri(os.path.join(os.getcwd(), ".dvc", "state"), {"nolock": 1})
print(uri)
sqlite3.connect(uri, uri=True)
CC @Suor , maybe you can see better where uri might've broken 馃檪
Alrighty, here are the results, it does result in the same error.
test_for_dvc_bug.py
import sqlite3
import os
from dvc.state import _build_sqlite_uri
uri = _build_sqlite_uri(os.path.join(os.getcwd(), ".dvc", "state"), {"nolock": 1})
print(uri)
sqlite3.connect(uri, uri=True)
updated to latest dvc, removed .dvcstate again and ran the above script
this is the output
file:///F:/repo_base/.dvc/state?nolock=1
Traceback (most recent call last):
File "test_for_dvc_bug.py", line 7, in <module>
sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file
The above is from running on windows, but it was the same error on Ubuntu.
@sremm Would be great to see output on ubuntu too, since we are more *nix guys than windows, so it is easier for us to grasp posix paths 馃檪 But so far uri looks alright.
@sremm And how about:
import sqlite3
import os
sqlite3.connect(os.path.join(os.getcwd(), ".dvc", "state"))
? Just a sanity check to make sure that this is the uri that is broken.
@efiop
So :) the prevous output from Ubuntu
Note that in my previous comment I also had to change the actual repository name, I realised that special characters might also have an effect and that there is the "%20" in the folder name as seen below, everything else is just letters. (same name on widows for the actualy repository folder)
python test_for_dvc_bug.py
file:///home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1
Traceback (most recent call last):
File "test_for_dvc_bug.py", line 7, in <module>
sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file
and the new test test_for_dvc_bug_2.py
import sqlite3
import os
sqlite3.connect(os.path.join(os.getcwd(), ".dvc", "state"))
So this runs without errors. I guess this is what you expected since the "_build_sqlite_uri" function is under suspect. The "%20" in the name does not seem to be an issue for the sqlite3.connect() call though.
@sremm Thanks! Interestingly enough, I can reproduce if I feed a relative path to the _build_uri. How about this piece of code on Ubuntu:
import sqlite3
uri = "file:/home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)
I suppose that /home/mlburk/Repositories/Repo%20Base/.dvc/state
is a valid one for your machine, right? If so, please run the code above and show the output. I suspect that it might be "//" situation that is at fault.
@efiop
test_for_dvc_bug_3.py contents
import sqlite3
uri = "file:/home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)
Output
python test_for_dvc_bug_3.py
Traceback (most recent call last):
File "test_for_dvc_bug_3.py", line 4, in <module>
sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file
And yes with uri = "/home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1"
, there is no error, so without "file:" otherwise the same.
Also tried with uri = = "///home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1"
no error, empty output.
@sremm Without file:
it most likely creates state?nolock=1
file instead of state
. And probably created Repo%20Base
directory, though I'm not sure about this one. Pretty weird stuff. Let's try this one too:
import sqlite3
uri = "file:.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)
Is there anything special about your python or your setup in general?
The "%20" in the name does not seem to be an issue for the sqlite3.connect() call though.
Btw, how did you test that? I suppose you have a directory Repo Base
with the space in the name, right?
@efiop with the code contents as the below snipped, there is no error and a state file is created
import sqlite3
uri = "file:.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)
Hmm, the folder itself has the "%20" in the name, the repository in our internal server has a space though, but after forking with git it sets the "%20" to the actual folder name instead of the space.
And I do suspect that there is something to do with the "%20" since i tried renaming the folder to have an actualy space instead of "%20" and the issue dissapears. I can add files with the latest dvc version.
Here are some more cases where I get the error to help debug.
Tried to run with uri = "file:/home/mlburk/TestFolder/Base%20Repo/.dvc/state?nolock=1"
where the "TestFolder" does not exist and get the same old error.
Also tired with uri = "file:/home/mlburk/Repositories/NewFolder/.dvc/state?nolock=1"
where NewFolder exists, but ".dvc" folder does not.
So when some folders are missing sqlite3.OperationalError: unable to open database file
seems to pop up as well.
I also tested creating a new folder with path /home/mlburk/Repositories/Test%20Folder/.dvc
(no space the actual "%20" in the name) manually and tried running the code below, which also gave the error.
import sqlite3
uri = "file:/home/mlburk/Repositories/Test%20Folder/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)
Hmm, the folder itself has the "%20" in the name
Ah, ok, now it all makes sense! The uri should be encoded, and %20
means empty space, so sqlite tries to open "Repo Base" instead of "Repo%20Base", since it decodes it. So our uri builder should escape that properly. So most likely we need to
import urllib.parse
filename = urllib.parse.quote_plus(filename)
in https://github.com/iterative/dvc/blob/master/dvc/state.py#L499 . Are you familiar with modifying our project and installing the dev version? If you are, could you do those modifications and give it a try? E.g.
git clone https://github.com/iterative/dvc
cd dvc
# modify dvc/state.py
pip install .
@sremm To make sure that my conclusions are correct, could you try running:
import sqlite3
uri = "file:///home/mlburk/Repositories/Repo%2520Base/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)
Notice that "Repo%20Base" got encoded into "Repo%2520Base"
Running the snipped below
import sqlite3
uri = "file:///home/mlburk/Repositories/Repo%2520Base/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)
results in no errors and the state file is created in /home/mlburk/Repositories/Repo%20Base/.dvc/
@sremm Great news! So we've found the issue. Patch is coming soon, stay tuned! 馃檪 Thanks for the feedback!
@sremm FYI: fix was merged. Releasing a new version right now 馃檪 If you want to try this before the official release, feel free to install upstream version through pip install git+https://github.com/iterative/dvc
. Thanks a lot for the feedback! 馃檪