Dvc: "unexpected error - unable to open database" file after updating from 0.59.2 to 0.71.0

Created on 3 Dec 2019  路  21Comments  路  Source: iterative/dvc

Please provide information about your setup
DVC version: 0.71.0
Platform: Ubuntu 18.04
Method of installation: pip

I've used dvc for managinf model files. I have not updated dvc for a while and after doing so going from 0.59.2 to 0.71.0 I could not add any new files to the existing dvc cache.

Here is the verbose output of the dvc add command

dvc add Empty\ Document -v
/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/git/repo/base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
  warnings.warn("The use of environment variables in paths is deprecated" +
ERROR: unexpected error - unable to open database file
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/main.py", line 49, in main
    ret = cmd.run()
  File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/command/add.py", line 25, in run
    fname=self.args.file,
  File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/repo/__init__.py", line 35, in wrapper
    with repo.lock, repo.state:
  File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/state.py", line 136, in __enter__
    self.load()
  File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/state.py", line 228, in load
    self.database = _connect_sqlite(self.state_file, {"nolock": 1})
  File "/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/dvc/state.py", line 495, in _connect_sqlite
    return sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file

Downgrading dvc version to 0.59.2 makes the problem dissapear again.

It feels like some backwards incopatibility issue?

bug c8-full-day p0-critical

All 21 comments

Hi @sremm !

Indeed, looks like state backward compatibility issue 馃檨 , probably caused by some locking adjustments we've done. The only solution for you right now is to install the newest version and delete .dvc/state , after that everything should start working. You will, however, lose all the cached hashes that are in that state db, but that will be recomputed once when you next use some particular file, so shouldn't be a big issue, even though inconvenient. Would that work for you?

I couldn't reproduce it @sremm, @efiop :disappointed:

pip install dvc==0.59.2
dvc init --no-scm
dvc run -o foo "echo foo > foo"
pip uninstall dvc && pip install dvc
rm foo
dvc checkout
dvc run -o bar "echo bar > bar"

Maybe your state file got corrupted somehow.

How did you perform the update?

Good catch @mroutis !

@sremm Btw, could you please show dvc version output on both dvc versions? It contains some additional info, which might help us better understand what happened.

We have this new piece of code in https://github.com/iterative/dvc/blob/0.71.0/dvc/state.py#L491 , where we try to connect by the uri, if we can. Might be that we didn't quite get the sqlite requirements right. In which case, we should be able to recover from this issue without losing the state db.

Also, python -c 'import sqlite3; print(str(sqlite3.sqlite_version_info))' would be helpful as well :)

@mroutis I used pip install dvc --upgrade doing pip uninstall dvc && pip install dvc like in your example did not change the behaviour

@efiop I tried updating dvc and then removing .dvc/state but still experience the same behaviour.

Here are the outputs:
For the working version

dvc version
/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/git/repo/base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
  warnings.warn("The use of environment variables in paths is deprecated" +
DVC version: 0.59.2
Python version: 3.7.4
Platform: Linux-4.15.0-70-generic-x86_64-with-debian-buster-sid
Binary: False
Cache: reflink - False, hardlink - False, symlink - True
Filesystem type (cache directory): ('fuseblk', '/dev/sda2')
Filesystem type (workspace): ('ext4', '/dev/sdb2')

and for the latest version

dvc version
/home/mlburk/anaconda3/envs/tf-gpu/lib/python3.7/site-packages/git/repo/base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
  warnings.warn("The use of environment variables in paths is deprecated" +
DVC version: 0.71.0
Python version: 3.7.4
Platform: Linux-4.15.0-70-generic-x86_64-with-debian-buster-sid
Binary: False
Package: pip
Cache: reflink - False, hardlink - False, symlink - True
Filesystem type (cache directory): ('fuseblk', '/dev/sda2')
Filesystem type (workspace): ('ext4', '/dev/sdb2')

Output of python -c 'import sqlite3; print(str(sqlite3.sqlite_version_info))'
(3, 29, 0)

Perhaps there some other small example I could also try to reproduce?

@sremm You've removed .dvc/state _after_ upgrading, right?

Double checked and yes, upgraded -> removed .dvc/state -> try to add a simple file

same behaviour.

Since I also use a windows machine with the same repository, I thought I'd try to upgrade there as well.

DVC version: 0.71.0
Python version: 3.7.3
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: pip
Cache: reflink - False, hardlink - True, symlink - False
Filesystem type (cache directory): ('NTFS', 'F:\')
Filesystem type (workspace): ('NTFS', 'F:\')

first I upgraded (had a little older version than 0.59.2, but everything still worked with 0.59.2)
then tried to add a small text file and got a different error

dvc add test.txt -v
c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\git\repo\base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
  warnings.warn("The use of environment variables in paths is deprecated" +
DEBUG: Trying to spawn '['c:\\users\\srg\\appdata\\local\\continuum\\anaconda3\\envs\\tf-gpu\\python.exe', 'C:\\Users\\srg\\AppData\\Local\\Continuum\\anaconda3\\envs\\tf-gpu\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['c:\\users\\srg\\appdata\\local\\continuum\\anaconda3\\envs\\tf-gpu\\python.exe', 'C:\\Users\\srg\\AppData\\Local\\Continuum\\anaconda3\\envs\\tf-gpu\\Scripts\\dvc', 'daemon', '-q', 'updater']'
ERROR: unexpected error - [WinError 123] The filename, directory name, or volume label syntax is incorrect: ' 16232\n'
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\main.py", line 49, in main
    ret = cmd.run()
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\command\add.py", line 25, in run
    fname=self.args.file,
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\repo\__init__.py", line 35, in wrapper
    with repo.lock, repo.state:
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\flufl\lock\_lockfile.py", line 334, in __enter__
    self.lock()
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\lock.py", line 78, in lock
    super(Lock, self).lock(timedelta(seconds=DEFAULT_TIMEOUT))
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\flufl\lock\_lockfile.py", line 267, in lock
    self._break()
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\flufl\lock\_lockfile.py", line 527, in _break
    os.unlink(winner)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: ' 16232\n'
------------------------------------------------------------

But after removing the .dvcstate I got the same error as on Ubuntu

dvc add test.txt -v
c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\git\repo\base.py:129: UserWarning: The use of environment variables in paths is deprecated
for security reasons and may be removed in the future!!
  warnings.warn("The use of environment variables in paths is deprecated" +
DEBUG: Trying to spawn '['c:\\users\\srg\\appdata\\local\\continuum\\anaconda3\\envs\\tf-gpu\\python.exe', 'C:\\Users\\srg\\AppData\\Local\\Continuum\\anaconda3\\envs\\tf-gpu\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['c:\\users\\srg\\appdata\\local\\continuum\\anaconda3\\envs\\tf-gpu\\python.exe', 'C:\\Users\\srg\\AppData\\Local\\Continuum\\anaconda3\\envs\\tf-gpu\\Scripts\\dvc', 'daemon', '-q', 'updater']'
ERROR: unexpected error - unable to open database file
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\main.py", line 49, in main
    ret = cmd.run()
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\command\add.py", line 25, in run
    fname=self.args.file,
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\repo\__init__.py", line 35, in wrapper
    with repo.lock, repo.state:
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\state.py", line 136, in __enter__
    self.load()
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\state.py", line 228, in load
    self.database = _connect_sqlite(self.state_file, {"nolock": 1})
  File "c:\users\srg\appdata\local\continuum\anaconda3\envs\tf-gpu\lib\site-packages\dvc\state.py", line 495, in _connect_sqlite
    return sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file
------------------------------------------------------------

Rolling back to

DVC version: 0.59.2
Python version: 3.7.3
Platform: Windows-10-10.0.16299-SP0
Binary: False
Cache: reflink - False, hardlink - True, symlink - False
Filesystem type (cache directory): ('NTFS', 'F:\\')
Filesystem type (workspace): ('NTFS', 'F:\\')

And I can add the file without issues

@sremm Oh, that is very interesting. I suspect problems in our by-url sqlite connection. Let's try to narrow it down. Please install the latest dvc and run this script in the root of your repo:

import sqlite3
from dvc.state import _build_sqlite_uri

uri = _build_sqlite_uri(os.path.join(os.getcwd(), ".dvc", "state"), {"nolock": 1})
print(uri)
sqlite3.connect(uri, uri=True)

CC @Suor , maybe you can see better where uri might've broken 馃檪

Alrighty, here are the results, it does result in the same error.

test_for_dvc_bug.py

import sqlite3
import os
from dvc.state import _build_sqlite_uri

uri = _build_sqlite_uri(os.path.join(os.getcwd(), ".dvc", "state"), {"nolock": 1})
print(uri)
sqlite3.connect(uri, uri=True)

updated to latest dvc, removed .dvcstate again and ran the above script
this is the output

file:///F:/repo_base/.dvc/state?nolock=1
Traceback (most recent call last):
  File "test_for_dvc_bug.py", line 7, in <module>
    sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file

The above is from running on windows, but it was the same error on Ubuntu.

@sremm Would be great to see output on ubuntu too, since we are more *nix guys than windows, so it is easier for us to grasp posix paths 馃檪 But so far uri looks alright.

@sremm And how about:

import sqlite3
import os

sqlite3.connect(os.path.join(os.getcwd(), ".dvc", "state"))

? Just a sanity check to make sure that this is the uri that is broken.

@efiop
So :) the prevous output from Ubuntu

Note that in my previous comment I also had to change the actual repository name, I realised that special characters might also have an effect and that there is the "%20" in the folder name as seen below, everything else is just letters. (same name on widows for the actualy repository folder)

python test_for_dvc_bug.py 
file:///home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1
Traceback (most recent call last):
  File "test_for_dvc_bug.py", line 7, in <module>
    sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file

and the new test test_for_dvc_bug_2.py

import sqlite3
import os

sqlite3.connect(os.path.join(os.getcwd(), ".dvc", "state"))

So this runs without errors. I guess this is what you expected since the "_build_sqlite_uri" function is under suspect. The "%20" in the name does not seem to be an issue for the sqlite3.connect() call though.

@sremm Thanks! Interestingly enough, I can reproduce if I feed a relative path to the _build_uri. How about this piece of code on Ubuntu:

import sqlite3

uri = "file:/home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)

I suppose that /home/mlburk/Repositories/Repo%20Base/.dvc/state is a valid one for your machine, right? If so, please run the code above and show the output. I suspect that it might be "//" situation that is at fault.

@efiop
test_for_dvc_bug_3.py contents

import sqlite3

uri = "file:/home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)

Output

python test_for_dvc_bug_3.py 
Traceback (most recent call last):
  File "test_for_dvc_bug_3.py", line 4, in <module>
    sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file

And yes with uri = "/home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1", there is no error, so without "file:" otherwise the same.

Also tried with uri = = "///home/mlburk/Repositories/Repo%20Base/.dvc/state?nolock=1" no error, empty output.

@sremm Without file: it most likely creates state?nolock=1 file instead of state. And probably created Repo%20Base directory, though I'm not sure about this one. Pretty weird stuff. Let's try this one too:

import sqlite3

uri = "file:.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)

Is there anything special about your python or your setup in general?

The "%20" in the name does not seem to be an issue for the sqlite3.connect() call though.

Btw, how did you test that? I suppose you have a directory Repo Base with the space in the name, right?

@efiop with the code contents as the below snipped, there is no error and a state file is created

import sqlite3

uri = "file:.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)

Hmm, the folder itself has the "%20" in the name, the repository in our internal server has a space though, but after forking with git it sets the "%20" to the actual folder name instead of the space.
And I do suspect that there is something to do with the "%20" since i tried renaming the folder to have an actualy space instead of "%20" and the issue dissapears. I can add files with the latest dvc version.

Here are some more cases where I get the error to help debug.
Tried to run with uri = "file:/home/mlburk/TestFolder/Base%20Repo/.dvc/state?nolock=1" where the "TestFolder" does not exist and get the same old error.
Also tired with uri = "file:/home/mlburk/Repositories/NewFolder/.dvc/state?nolock=1" where NewFolder exists, but ".dvc" folder does not.
So when some folders are missing sqlite3.OperationalError: unable to open database file seems to pop up as well.

I also tested creating a new folder with path /home/mlburk/Repositories/Test%20Folder/.dvc (no space the actual "%20" in the name) manually and tried running the code below, which also gave the error.

import sqlite3

uri = "file:/home/mlburk/Repositories/Test%20Folder/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)

Hmm, the folder itself has the "%20" in the name

Ah, ok, now it all makes sense! The uri should be encoded, and %20 means empty space, so sqlite tries to open "Repo Base" instead of "Repo%20Base", since it decodes it. So our uri builder should escape that properly. So most likely we need to

import urllib.parse
filename = urllib.parse.quote_plus(filename)

in https://github.com/iterative/dvc/blob/master/dvc/state.py#L499 . Are you familiar with modifying our project and installing the dev version? If you are, could you do those modifications and give it a try? E.g.

git clone https://github.com/iterative/dvc
cd dvc
# modify dvc/state.py
pip install .

@sremm To make sure that my conclusions are correct, could you try running:

import sqlite3

uri = "file:///home/mlburk/Repositories/Repo%2520Base/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)

Notice that "Repo%20Base" got encoded into "Repo%2520Base"

Running the snipped below

import sqlite3

uri = "file:///home/mlburk/Repositories/Repo%2520Base/.dvc/state?nolock=1"
sqlite3.connect(uri, uri=True)

results in no errors and the state file is created in /home/mlburk/Repositories/Repo%20Base/.dvc/

@sremm Great news! So we've found the issue. Patch is coming soon, stay tuned! 馃檪 Thanks for the feedback!

@sremm FYI: fix was merged. Releasing a new version right now 馃檪 If you want to try this before the official release, feel free to install upstream version through pip install git+https://github.com/iterative/dvc. Thanks a lot for the feedback! 馃檪

Was this page helpful?
0 / 5 - 0 ratings

Related issues

mdscruggs picture mdscruggs  路  3Comments

anotherbugmaster picture anotherbugmaster  路  3Comments

prihoda picture prihoda  路  3Comments

tc-ying picture tc-ying  路  3Comments

shcheklein picture shcheklein  路  3Comments