Output of dvc version:
1.7.2
Additional Information (if any):
Running dvc.api.get_url("some_dir") fails when the path is a directory.
Traceback (most recent call last):
File "x.py", line 6, in <module>
print(dvc.api.get_url("some_dir"))
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/api.py", line 37, in get_url
hash_info = _repo.repo_tree.get_hash(path_info)
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/tree/base.py", line 263, in get_hash
hash_info = self.get_dir_hash(path_info, **kwargs)
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/tree/repo.py", line 379, in get_dir_hash
return dvc_tree.get_dir_hash(path_info, **kwargs)
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/tree/dvc.py", line 242, in get_dir_hash
self._fetch_dir(out, **kwargs)
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/tree/dvc.py", line 146, in _fetch_dir
if self.fetch and out.changed_cache(filter_info=filter_info):
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/output/base.py", line 197, in changed_cache
self.hash_info, filter_info=filter_info
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/cache/base.py", line 412, in changed_cache
hash_info, path_info=path_info, filter_info=filter_info
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/cache/base.py", line 404, in _changed_dir_cache
if self.changed_cache_file(entry_hash):
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/cache/base.py", line 366, in changed_cache_file
actual = self.tree.get_hash(cache_info)
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/tree/base.py", line 242, in get_hash
hash_ = self.state.get(path_info)
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/state.py", line 408, in get
existing_record = self.get_state_record_for_inode(actual_inode)
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/state.py", line 352, in get_state_record_for_inode
self._execute(cmd, (self._to_sqlite(inode),))
File "/Users/matthieu/conda/lib/python3.6/site-packages/dvc/state.py", line 133, in _execute
return self.cursor.execute(cmd, parameters)
AttributeError: 'NoneType' object has no attribute 'execute'
Demo file:
import os, subprocess, dvc.api
os.chdir("/tmp/")
subprocess.check_output(["git", "clone", "https://dagshub.com/matthieu_yokai/test_dagshub.git"])
os.chdir("/tmp/test_dagshub")
print(dvc.api.get_url("some_dir"))
Analysis
self.cursor is None because state.load() is never called, and state.load() is never called because state.__enter__() is never called: state is initialized in repo/__init__.py with self.state = State(self.cache.local) and is never used in a "with" statement.
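The failure mode in this analysis can be reproduced in isolation. Below is a minimal sketch, assuming a class whose cursor is only created in `__enter__`. `StateSketch`, `run_without_context`, and `run_with_context` are hypothetical names for illustration, not DVC's actual code, and an in-memory SQLite database stands in for the real state file.

```python
import sqlite3


class StateSketch:
    """Hypothetical stand-in for a state object; NOT DVC's actual State class."""

    def __init__(self):
        # Nothing is opened here, so the cursor starts out as None.
        self.database = None
        self.cursor = None

    def __enter__(self):
        # The connection and cursor only exist once the context is entered.
        self.database = sqlite3.connect(":memory:")
        self.cursor = self.database.cursor()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.database.close()
        self.database = None
        self.cursor = None

    def execute(self, cmd, parameters=()):
        # Fails with AttributeError if __enter__ was never called.
        return self.cursor.execute(cmd, parameters)


def run_without_context():
    """Use the object without a "with" statement and return the error message."""
    state = StateSketch()
    try:
        state.execute("SELECT 1")
    except AttributeError as exc:
        return str(exc)
    return None


def run_with_context():
    """Use the object inside a "with" statement so the cursor is set up first."""
    with StateSketch() as state:
        return state.execute("SELECT 1").fetchone()


if __name__ == "__main__":
    print(run_without_context())
    print(run_with_context())
```

Without the "with" statement the call ends in the same AttributeError as the traceback above; inside it, the query runs normally.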
Related: #4135
Hi, @MatthieuBizien. Thanks for the report. As a workaround, I suggest the following:
import dvc.api
url = dvc.api.get_url("some_dir/file", repo="https://dagshub.com/matthieu_yokai/test_dagshub.git")
print(url)
Alternatively, you could clone the repo and run dvc pull before calling dvc.api.get_url.
@MatthieuBizien Just wanted to clarify that currently get_url for a dir will give you a URL to a .dir cache file, which is a JSON file listing the files in the directory with their corresponding hashes. That might not be what you are looking for, but it should be easily adaptable. We are considering changing that behaviour to something more friendly (e.g. a list of files and their URLs), but that would break backward compatibility, so we'll probably make the switch at some point in the future. See https://github.com/iterative/dvc/issues/3182 .
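As a rough illustration of how the .dir listing could be adapted into per-file URLs, here is a minimal sketch. It rests on two assumptions that are not confirmed in this thread: that each entry in the JSON has `md5` and `relpath` keys, and that the remote stores each file under `<remote>/<first two hash characters>/<rest of hash>`. The function name `urls_from_dir_listing`, the remote URL, and the hashes are made up for the example.

```python
import json


def urls_from_dir_listing(dir_json, remote_url):
    """Map each relative path in a .dir listing to a guessed remote URL.

    Assumes entries look like {"md5": "...", "relpath": "..."} and that the
    remote uses a "<first 2 hash chars>/<remaining chars>" layout.
    """
    entries = json.loads(dir_json)
    base = remote_url.rstrip("/")
    return {
        entry["relpath"]: f"{base}/{entry['md5'][:2]}/{entry['md5'][2:]}"
        for entry in entries
    }


if __name__ == "__main__":
    # Made-up listing and remote, for illustration only.
    sample = json.dumps(
        [
            {"md5": "aabbccddeeff00112233445566778899", "relpath": "a.txt"},
            {"md5": "00112233445566778899aabbccddeeff", "relpath": "sub/b.txt"},
        ]
    )
    for relpath, url in urls_from_dir_listing(sample, "s3://bucket/cache/").items():
        print(relpath, url)
```

The listing itself would first have to be downloaded from the URL that get_url returns for the directory; that step is left out here.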
Hi @efiop, thanks for the fix! I'm aware that we get the .dir cache files. I think this behavior is fine, as it is consistent with what is done for a single file. Maybe adding another function, or an option to dvc.api.get_url, could provide that feature without breaking backward compatibility?