Short description
The download manager should be able to download files.
However, for one particular URL, the download fails with a NotFoundError.
Environment information
tensorflow-datasets/tfds-nightly version: 4.1.0+nightly
tensorflow/tf-nightly version: 2.4.0
tfds-nightly package (pip install --upgrade tfds-nightly)? Yes
Reproduction instructions
Create any dataset with the following _split_generators code (minimal reproduction):
def _split_generators(self, dl_manager: tfds.download.DownloadManager):
  """Returns SplitGenerators."""
  dl_manager.download(
      "https://aslsignbank.haskins.yale.edu/dictionary/protected_media/glossvideo/ASL/BO/BOOK-418.mp4")
Link to logs
It first says:
INFO[download_manager.py]: Downloading https://aslsignbank.haskins.yale.edu/dictionary/protected_media/glossvideo/ASL/BO/BOOK-418.mp4 into /home/nlp/amit/tensorflow_datasets/downloads/asls.hask.yale.edu_dict_prot_medi_glos_ASLfhvOgypj047EznvbC8apzfVHWR69qsI29Clf-twND88.mp4.tmp.7953dc7f67ab4d42a107981d38e45335...
and then:
tensorflow.python.framework.errors_impl.NotFoundError: /home/nlp/amit/tensorflow_datasets/downloads/asls.hask.yale.edu_dict_prot_medi_glos_ASLfhvOgypj047EznvbC8apzfVHWR69qsI29Clf-twND88.mp4.tmp.7953dc7f67ab4d42a107981d38e45335/glossvideo/ASL/BO/BOOK-418.mp4; No such file or directory
Expected behavior
The download should be successful
Could you share the full logs, including the stack trace? Are you using the returned value, as in downloaded_path = dl_manager.download('http://...')?
I do use the downloaded path, but it is not required for the minimal reproduction example.
Here is the full log from the command tfds build:
2021-01-08 21:29:59.552876: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-01-08 21:29:59.552935: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
INFO[build.py]: Loading dataset from path: /home/nlp/amit/datasets/wlasl/wlasl.py
2021-01-08 21:30:04.682786: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".
INFO[build.py]: download_and_prepare for dataset wlasl/default/0.3.0...
INFO[dataset_builder.py]: Generating dataset wlasl (/home/nlp/amit/tensorflow_datasets/wlasl/default/0.3.0)
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /home/nlp/amit/tensorflow_datasets/wlasl/default/0.3.0...
INFO[download_manager.py]: Downloading https://aslsignbank.haskins.yale.edu/dictionary/protected_media/glossvideo/ASL/BO/BOOK-418.mp4 into /home/nlp/amit/tensorflow_datasets/downloads/asls.hask.yale.edu_dict_prot_medi_glos_ASLfhvOgypj047EznvbC8apzfVHWR69qsI29Clf-twND88.mp4.tmp.b88962d8805149259ed48536c4d2e7bd...
Dl Size...: 0 MiB [00:01, ? MiB/s] | 0/1 [00:01<?, ? url/s]
Dl Completed...: 0%| | 0/1 [00:01<?, ? url/s]
Traceback (most recent call last):
File "/home/nlp/amit/anaconda2/envs/meta-scholar/bin/tfds", line 8, in <module>
sys.exit(launch_cli())
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/scripts/cli/main.py", line 120, in launch_cli
app.run(main, flags_parser=_parse_flags)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/scripts/cli/main.py", line 115, in main
args.subparser_fn(args)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/scripts/cli/build.py", line 199, in _build_datasets
_download_and_prepare(args, builder)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/scripts/cli/build.py", line 357, in _download_and_prepare
download_config=dl_config,
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 434, in download_and_prepare
download_config=download_config,
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1136, in _download_and_prepare
dl_manager, **optional_pipeline_kwargs
File "/home/nlp/amit/datasets/wlasl/wlasl.py", line 154, in _split_generators
"https://aslsignbank.haskins.yale.edu/dictionary/protected_media/glossvideo/ASL/BO/BOOK-418.mp4")
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 549, in download
return _map_promise(self._download, url_or_urls)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 777, in _map_promise
res = tf.nest.map_structure(lambda p: p.get(), all_promises) # Wait promises
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow/python/util/nest.py", line 659, in map_structure
structure[0], [func(*x) for x in entries],
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow/python/util/nest.py", line 659, in <listcomp>
structure[0], [func(*x) for x in entries],
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 777, in <lambda>
res = tf.nest.map_structure(lambda p: p.get(), all_promises) # Wait promises
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/promise/promise.py", line 512, in get
return self._target_settled_value(_raise=True)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/promise/promise.py", line 516, in _target_settled_value
return self._target()._settled_value(_raise)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/promise/promise.py", line 226, in _settled_value
reraise(type(raise_val), raise_val, self._traceback)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/promise/promise.py", line 844, in handle_future_result
resolve(future.result())
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow_datasets/core/download/downloader.py", line 224, in _sync_download
file_.write(block)
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py", line 102, in write
self._prewrite_check()
File "/home/nlp/amit/anaconda2/envs/meta-scholar/lib/python3.7/site-packages/tensorflow/python/lib/io/file_io.py", line 88, in _prewrite_check
compat.path_to_bytes(self.__name), compat.as_bytes(self.__mode))
tensorflow.python.framework.errors_impl.NotFoundError: /home/nlp/amit/tensorflow_datasets/downloads/asls.hask.yale.edu_dict_prot_medi_glos_ASLfhvOgypj047EznvbC8apzfVHWR69qsI29Clf-twND88.mp4.tmp.b88962d8805149259ed48536c4d2e7bd/glossvideo/ASL/BO/BOOK-418.mp4; No such file or directory
We get the following response headers when we make a GET request to the URL:
{
  'headers': {
    'Content-Disposition': 'inline;filename=glossvideo/ASL/BO/BOOK-418.mp4;filename*=UTF-8',
    ...
  },
  'url': 'https://aslsignbank.haskins.yale.edu/dictionary/protected_media/glossvideo/ASL/BO/BOOK-418.mp4',
  ...
}
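So, when we pass the above response to the _get_filename function, it returns glossvideo/ASL/BO/BOOK-418.mp4, but the expected value is BOOK-418.mp4.
The extra path components get appended to the temporary download directory, so the write targets a nested directory that was never created. That is why the traceback ends with NotFoundError on .../glossvideo/ASL/BO/BOOK-418.mp4.
So, we will have to make some changes to that function.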
For Content-Disposition grammar rule reference, see https://tools.ietf.org/html/rfc6266#section-4.1
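To illustrate the kind of change described above, here is a minimal sketch of stripping directory components from a Content-Disposition filename. It is not the actual tensorflow_datasets code: the helper name basename_from_content_disposition and the regular expression are assumptions made for this example, and it only handles the plain filename= parameter (not the extended filename*= form from RFC 6266).

```python
import posixpath
import re
from typing import Optional

# Hypothetical pattern: matches `filename=value` or `filename="value"`,
# but not the extended `filename*=` parameter.
_FILENAME_RE = re.compile(
    r'\bfilename\s*=\s*(?:"(?P<quoted>[^"]*)"|(?P<token>[^;\s]+))',
    re.IGNORECASE,
)


def basename_from_content_disposition(header: str) -> Optional[str]:
  """Returns only the final path component of the header's filename, if any."""
  match = _FILENAME_RE.search(header)
  if not match:
    return None
  raw = match.group('quoted')
  if raw is None:
    raw = match.group('token')
  # Treat backslashes as separators too, then drop any directory part.
  name = posixpath.basename(raw.replace('\\', '/'))
  return name or None


header = 'inline;filename=glossvideo/ASL/BO/BOOK-418.mp4;filename*=UTF-8'
print(basename_from_content_disposition(header))  # BOOK-418.mp4
```

With the directory part removed, the name can be joined to the temporary download directory without pointing at subdirectories that do not exist.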
Hi, I would like to work on this issue