When I run the first dvc get listed here, I get the following error:
paul ~/GitHub/dvc » dvc get https://github.com/iterative/dataset-registry \
> get-started/data.xml -o data/data.xml
WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
name: ../../home/ubuntu/GitHub/dvc/data/data.xml.jCpvxLLAJHBNeBGwZSWmUp.tmp, md5: a304afb96060aad90176268345e10355
WARNING: Cache 'a304afb96060aad90176268345e10355' not found. File 'data/data.xml.jCpvxLLAJHBNeBGwZSWmUp.tmp' won't be created.
ERROR: failed to get 'get-started/data.xml' from 'https://github.com/iterative/dataset-registry' - The path 'get-started/data.xml' does not exist in the target repository 'https://github.com/iterative/dataset-registry' neither as an output nor a git-handled file.
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
PR#1057 isn't the solution. Documentation needs to be updated to reflect steps at iterative/dataset-registry.
@paulkaefer I can't reproduce this. Could you run it with -v and share the log, please?
@shcheklein:
paul ~/GitHub/test » dvc get -v https://github.com/iterative/dataset-registry \ get-started/data.xml -o data/data.xml
2020-03-16 11:51:23,701 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2020-03-16 11:51:23,873 DEBUG: erepo: git clone https://github.com/iterative/dataset-registry to a temporary dir
2020-03-16 11:51:24,267 DEBUG: erepo: making a copy of https://github.com/iterative/dataset-registry clone
2020-03-16 11:51:24,380 DEBUG: Removing '/home/ubuntu/GitHub/test/data/.3yXxuEYVW5sfU8QDEk5GC4'
2020-03-16 11:51:24,380 ERROR: failed to get ' get-started/data.xml' from 'https://github.com/iterative/dataset-registry' - The path ' get-started/data.xml' does not exist in the target repository 'https://github.com/iterative/dataset-registry' neither as an output nor a git-handled file.
------------------------------------------------------------
Traceback (most recent call last):
File "/snap/dvc/241/lib/python3.6/site-packages/dvc/external_repo.py", line 94, in pull_to
fs_copy(fspath(path_info), fspath(to_info))
File "/snap/dvc/241/lib/python3.6/site-packages/dvc/utils/fs.py", line 27, in fs_copy
shutil.copy2(src, dst)
File "/snap/dvc/241/usr/lib/python3.6/shutil.py", line 263, in copy2
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/snap/dvc/241/usr/lib/python3.6/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpq0m5c03gdvc-erepo/ get-started/data.xml'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/snap/dvc/241/lib/python3.6/site-packages/dvc/command/get.py", line 41, in _get_file_from_repo
rev=self.args.rev,
File "/snap/dvc/241/lib/python3.6/site-packages/dvc/repo/get.py", line 55, in get
repo.pull_to(path, PathInfo(out))
File "/snap/dvc/241/lib/python3.6/site-packages/dvc/external_repo.py", line 96, in pull_to
raise PathMissingError(path, self.url)
dvc.exceptions.PathMissingError: The path ' get-started/data.xml' does not exist in the target repository 'https://github.com/iterative/dataset-registry' neither as an output nor a git-handled file.
------------------------------------------------------------
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
To be clear, I'm going through the tutorial. I was able to (1) clone the dataset-registry repo, and (2) run cp ../dataset-registry/get-started/data.xml data/ to get the data.xml file into my getting started local repo.
OK, I see where the problem comes from:
The path ' get-started/data.xml'
Note the space before the path.
Looks like a copy-paste + terminal problem.
This is exactly how regular CLI tools behave:
touch " test"
rm -f \ test
I would suggest changing the command somehow to avoid these copy-paste problems.
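For anyone who wants to see the effect without touching the file system, here is a minimal sketch in Python. It uses the standard library's `shlex.split` as an approximation of POSIX shell tokenization (an assumption: it is not the shell itself, and it does not model backslash-newline line continuation). The variable names are only for illustration. It shows that a backslash followed by a space escapes the space, so the path argument starts with a space.

```python
import shlex

# The command as it was run in the -v log: a space follows the backslash,
# so the backslash escapes that space instead of continuing the line.
typed = r"dvc get https://github.com/iterative/dataset-registry \ get-started/data.xml -o data/data.xml"

# The same command on one line, with no backslash at all.
fixed = "dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml"

typed_args = shlex.split(typed)
fixed_args = shlex.split(fixed)

# The fourth token is the path argument that dvc receives.
print(repr(typed_args[3]))  # → ' get-started/data.xml'
print(repr(fixed_args[3]))  # → 'get-started/data.xml'
```

The first result matches the path quoted in the error message, leading space included.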
@paulkaefer what OS, what browser and terminal do you use?
@shcheklein Ubuntu, default Terminal (bash). Brave Browser.
Command was copied from the top example @ https://dvc.org/doc/get-started/add-files.
@shcheklein good call, though. I removed the space before the \ and dvc get -v https://github.com/iterative/dataset-registry \get-started/data.xml -o data/data.xml works.
@paulkaefer so, when I copy it and paste I'm getting something like this in my browser:
(.env) [ivan@ivan /tmp]$ dvc get https://github.com/iterative/dataset-registry \
> get-started/data.xml -o data/data.xml
is it the same for you?
@shcheklein yes!
@paulkaefer and when you run w/o modifications, does it work? (it works for me, in my terminal as-is if I just copy-paste it)
@shcheklein yes. Did you change something? Maybe I mis-copied before? I believe the first time, I typed it out in the interest of developing muscle memory for dvc.
@paulkaefer No, I didn't change anything. Looks like a mis-copy or an honest typo :) Closing this for now. Thanks for reporting this though; if we get more complaints, we'll think about simplifying some commands so they fit on a single line.
Thanks, @shcheklein. I've shared the tutorial internally, with my recommendation:
this is how tech tutorials _should_ be (easy to follow, colorful, expand boxes for concepts you might or might not know).
I'll be sure and open issues or PRs if I find anything else.
Hi, I'm also having a problem with this command that doesn't seem to be related to the space issue.
I've tried copying and pasting directly from the docs:
dvc get -v https://github.com/iterative/dataset-registry \
get-started/data.xml -o data/data.xml
and putting it all on one line:
dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
In both cases I get:
$ dvc get -v https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml
2020-07-02 11:36:24,693 DEBUG: Creating external repo https://github.com/iterative/dataset-registry@None
2020-07-02 11:36:24,693 DEBUG: erepo: git clone https://github.com/iterative/dataset-registry to a temporary dir
2020-07-02 11:36:27,278 DEBUG: Saving '../../../../../tmp/tmpn23yad7vdvc-clone/get-started/data.xml' to 'data/.HsELAmwFokBPr9emR7s3sd/a3/04afb96060aad90176268345e10355'.
2020-07-02 11:36:27,279 DEBUG: cache '/home/matthew/Documents/muanalytics/dvc/data/.HsELAmwFokBPr9emR7s3sd/a3/04afb96060aad90176268345e10355' expected 'a304afb96060aad90176268345e10355' actual 'None'
2020-07-02 11:36:27,279 DEBUG: cache '/home/matthew/Documents/muanalytics/dvc/data/.HsELAmwFokBPr9emR7s3sd/a3/04afb96060aad90176268345e10355' expected 'a304afb96060aad90176268345e10355' actual 'None'
2020-07-02 11:36:27,303 DEBUG: Preparing to download data from 'https://remote.dvc.org/dataset-registry'
2020-07-02 11:36:27,303 DEBUG: Preparing to collect status from https://remote.dvc.org/dataset-registry
2020-07-02 11:36:27,304 DEBUG: Collecting information from local cache...
2020-07-02 11:36:27,306 DEBUG: cache '/home/matthew/Documents/muanalytics/dvc/data/.HsELAmwFokBPr9emR7s3sd/a3/04afb96060aad90176268345e10355' expected 'a304afb96060aad90176268345e10355' actual 'None'
2020-07-02 11:36:27,308 DEBUG: Collecting information from remote cache...
2020-07-02 11:36:27,309 DEBUG: Matched '0' indexed hashes
2020-07-02 11:36:27,309 DEBUG: Querying 1 hashes via object_exists
2020-07-02 11:36:31,232 DEBUG: Removing '/home/matthew/Documents/muanalytics/dvc/data/.HsELAmwFokBPr9emR7s3sd'
2020-07-02 11:36:31,232 ERROR: failed to get 'get-started/data.xml' from 'https://github.com/iterative/dataset-registry' - could not perform a HEAD request
------------------------------------------------------------
Traceback (most recent call last):
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connection.py", line 160, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 976, in _validate_conn
conn.connect()
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connection.py", line 308, in connect
conn = self._new_conn()
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connection.py", line 172, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f99a1d22b38>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 765, in urlopen
**response_kw
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 765, in urlopen
**response_kw
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 765, in urlopen
**response_kw
[Previous line repeated 2 more times]
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 725, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/urllib3/util/retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='s3-us-east-2.amazonaws.com', port=443): Max retries exceeded with url: /dvc-public/remote/dataset-registry/a3/04afb96060aad90176268345e10355 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f99a1d22b38>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/http.py", line 104, in request
**kwargs,
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/requests/sessions.py", line 665, in send
history = [resp for resp in gen]
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/requests/sessions.py", line 665, in <listcomp>
history = [resp for resp in gen]
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/requests/sessions.py", line 245, in resolve_redirects
**adapter_kwargs
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='s3-us-east-2.amazonaws.com', port=443): Max retries exceeded with url: /dvc-public/remote/dataset-registry/a3/04afb96060aad90176268345e10355 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f99a1d22b38>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/command/get.py", line 41, in _get_file_from_repo
rev=self.args.rev,
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/repo/get.py", line 53, in get
repo.get_external(path, out)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/external_repo.py", line 143, in get_external
_, _, save_infos = self.fetch_external([path])
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/external_repo.py", line 133, in fetch_external
download_callback=download_update,
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/base.py", line 1161, in save
return self._save(path_info, tree, hash_, save_link, **kwargs)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/base.py", line 1169, in _save
return self._save_file(path_info, tree, hash_, save_link, **kwargs)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/base.py", line 1096, in _save_file
with tree.open(path_info, mode="rb") as fobj:
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/repo/tree.py", line 274, in open
path, mode=mode, encoding=encoding, **kwargs
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/repo/tree.py", line 94, in open
self.repo.cloud.pull(cache_info, remote=remote)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/data_cloud.py", line 85, in pull
cache, jobs=jobs, remote=remote, show_checksums=show_checksums
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/base.py", line 79, in wrapper
return f(obj, named_cache, remote, *args, **kwargs)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/local.py", line 710, in pull
download=True,
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/local.py", line 610, in _process
download=download,
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/local.py", line 469, in _status
md5s, jobs=jobs, name=str(remote.path_info)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/base.py", line 812, in hashes_exist
remote_hashes = self.tree.list_hashes_exists(hashes, jobs, name)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/base.py", line 701, in list_hashes_exists
ret = list(itertools.compress(hashes, in_remote))
File "/home/matthew/.pyenv/versions/3.7.2/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
yield fs.pop().result()
File "/home/matthew/.pyenv/versions/3.7.2/lib/python3.7/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/home/matthew/.pyenv/versions/3.7.2/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/matthew/.pyenv/versions/3.7.2/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/base.py", line 694, in exists_with_progress
ret = self.exists(path_info)
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/http.py", line 125, in exists
return bool(self.request("HEAD", path_info.url))
File "/home/matthew/Documents/muanalytics/dvc/build/virtualenv/lib/python3.7/site-packages/dvc/remote/http.py", line 122, in request
raise DvcException(f"could not perform a {method} request")
dvc.exceptions.DvcException: could not perform a HEAD request
------------------------------------------------------------
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
I'm on Ubuntu 18.04.4 LTS with zsh 5.4.2 (x86_64-ubuntu-linux-gnu). I installed with:
pip install dvc
pip install dvc[s3]
@ivyleavedtoadflax how about this:
wget https://s3-us-east-2.amazonaws.com/dvc-public/remote/dataset-registry/a3/04afb96060aad90176268345e10355
does it work for you?
Ah :facepalm: sorry @shcheklein I needed to disconnect my VPN. It's all working now :+1:
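The `wget` check above can also be sketched in Python, for cases where `wget` is not available. This is a rough sketch, not anything DVC ships: `cache_object_url` and `is_reachable` are hypothetical helper names, and the URL layout (first two characters of the md5 as a directory, the rest as the file name) is inferred from the log and the `wget` URL in this thread.

```python
import urllib.error
import urllib.request


def cache_object_url(remote_base: str, md5: str) -> str:
    """Build the object URL using the layout seen in the log: <base>/<md5[:2]>/<md5[2:]>."""
    return f"{remote_base.rstrip('/')}/{md5[:2]}/{md5[2:]}"


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the host could be contacted at all."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status, so the network path works.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: what a blocking VPN or proxy looks like.
        return False


url = cache_object_url(
    "https://s3-us-east-2.amazonaws.com/dvc-public/remote/dataset-registry",
    "a304afb96060aad90176268345e10355",
)
print(url)
# To run the actual check (needs network access):
# print(is_reachable(url))
```

If `is_reachable(url)` returns `False` while the same URL opens from another network, a VPN, proxy, or firewall is a likely cause, as it was here.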