Short description
Loading the celeb_a dataset results in an error.
Environment information
tensorflow-datasets/tfds-nightly version: 1.0.1
tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: tf-nightly 1.14.1-dev20190301
Reproduction instructions
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> import tensorflow_datasets as tfds
>>> r = tfds.load("celeb_a")
Downloading / extracting dataset celeb_a (?? GiB) to /home/ayush99/tensorflow_datasets/celeb_a/0.3.0...
Dl Completed...: 100%|██████████| 4/4 [05:44<00:00, 102.30s/ url]
Traceback (most recent call last):
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 90, in _sync_extract
for path, handle in iter_archive(from_path, method):
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 160, in iter_zip
extract_file = z.open(member)
File "/home/ayush99/anaconda3/lib/python3.7/zipfile.py", line 1480, in open
self._fpclose, self._lock, lambda: self._writing)
File "/home/ayush99/anaconda3/lib/python3.7/zipfile.py", line 722, in __init__
self.seekable = file.seekable
AttributeError: 'GFile' object has no attribute 'seekable'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/registered.py", line 259, in load
dbuilder.download_and_prepare(**download_and_prepare_kwargs)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/dataset_builder.py", line 220, in download_and_prepare
max_examples_per_split=download_config.max_examples_per_split)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/dataset_builder.py", line 651, in _download_and_prepare
for split_generator in self._split_generators(dl_manager):
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/image/celeba.py", line 122, in _split_generators
"landmarks_celeba": LANDMARKS_DATA,
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 340, in download_and_extract
return _map_promise(self._download_extract, url_or_urls)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 376, in _map_promise
res = utils.map_nested(_wait_on_promise, all_promises)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 128, in map_nested
for k, v in data_struct.items()
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 128, in <dictcomp>
for k, v in data_struct.items()
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 142, in map_nested
return function(data_struct)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 360, in _wait_on_promise
return p.get()
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 510, in get
return self._target_settled_value(_raise=True)
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 514, in _target_settled_value
return self._target()._settled_value(_raise)
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 224, in _settled_value
reraise(type(raise_val), raise_val, self._traceback)
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 842, in handle_future_result
resolve(future.result())
File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 93, in _sync_extract
raise ExtractError(resource, err)
tensorflow_datasets.core.download.extractor.ExtractError: Error while extracting file /home/ayush99/tensorflow_datasets/downloads/ucexport_download_id_0B7EVK8r0v71pZjFTYXZWM3FlDDaXUAQO8EGH_a7VqGNLRtW52mva1LzDrb-V723OQN8 (https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM): 'GFile' object has no attribute 'seekable'.
Does this dataset load when you do the same thing via the dataset builder?
No, same error.
OK, I will look into the problem.
May want to try this on Python 3.6. We don't test against Python 3.7. It may be that the zipfile implementation changed. Please update here if you find that this is the issue.
Same error while loading cats_vs_dogs.
I have encountered the same problem; pasting the logs here in case it helps:
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Downloading / extracting dataset cats_vs_dogs (786.68 MiB) to C:\Users\Jaydev\tensorflow_datasets\cats_vs_dogs\2.0.0...
0 examples [00:00, ? examples/s]
Traceback (most recent call last):
File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-8-7b4aa71b13f0>", line 8, in <module>
with_info=True, as_supervised=True)
File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\lib\site-packages\wrapt\wrappers.py", line 564, in __call__
args, kwargs)
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\registered.py", line 253, in load
dbuilder.download_and_prepare(**download_and_prepare_kwargs)
File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\lib\site-packages\wrapt\wrappers.py", line 603, in __call__
args, kwargs)
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 219, in download_and_prepare
max_examples_per_split=download_config.max_examples_per_split)
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 668, in _download_and_prepare
output_files,
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 107, in write_from_generator
_write_tfrecords_from_generator(wrapped, output_files, shuffle=True)
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 272, in _write_tfrecords_from_generator
_round_robin_write(writers, generator)
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 285, in _round_robin_write
for i, example in enumerate(tqdm.tqdm(generator, unit=" examples")):
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tqdm\_tqdm.py", line 1022, in __iter__
for obj in iterable:
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 106, in <genexpr>
_dict_to_tf_example(d).SerializeToString() for d in generator_fn())
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 638, in generator_fn
for i, ex in enumerate(self._generate_examples(**kwargs)):
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\image\cats_vs_dogs.py", line 90, in _generate_examples
for fname, fobj in archive:
File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\download\extractor.py", line 160, in iter_zip
extract_file = z.open(member)
File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\Lib\zipfile.py", line 1480, in open
self._fpclose, self._lock, lambda: self._writing)
File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\Lib\zipfile.py", line 722, in __init__
self.seekable = file.seekable
AttributeError: 'GFile' object has no attribute 'seekable'
Similar error to the above using Python 3.7.3 and a fresh Anaconda install:
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Extraction completed...: 0%| | 0/1 [00:00, ? file/s]
Downloading / extracting dataset celeb_a (1.38 GiB) to /data8/martin/DifferentialPrivacy/TensorflowDatasets/celeb_a/0.3.0...
AttributeError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/tensorflow_datasets/core/download/extractor.py in _sync_extract(self, resource, to_path)
89 try:
---> 90 for path, handle in iter_archive(from_path, method):
91 _copy(handle, path and os.path.join(to_path_tmp, path) or to_path_tmp)
~/anaconda3/lib/python3.7/site-packages/tensorflow_datasets/core/download/extractor.py in iter_zip(arch_f)
159 for member in z.infolist():
--> 160 extract_file = z.open(member)
161 if extract_file: # File with data (not directory):
~/anaconda3/lib/python3.7/zipfile.py in open(self, name, mode, pwd, force_zip64)
1479 zef_file = _SharedFile(self.fp, zinfo.header_offset,
-> 1480 self._fpclose, self._lock, lambda: self._writing)
1481 try:
~/anaconda3/lib/python3.7/zipfile.py in __init__(self, file, pos, close, lock, writing)
721 self._writing = writing
--> 722 self.seekable = file.seekable
723 self.tell = file.tell
AttributeError: 'GFile' object has no attribute 'seekable'
@rsepassi The Python version does indeed seem to be the issue: there was a breaking change in zipfile in Python 3.7.
This issue is fixed in the current tensorflow:master for Python 3.7 with the following PR: https://github.com/tensorflow/tensorflow/pull/28006
(If upgrading is not an option, an alternative workaround is to call tfds.load() once under Python 3.6. That call downloads and extracts the dataset, and the extracted data can then be used by later tfds.load() calls under Python 3.7.)
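The tracebacks above all end in zipfile's internal `_SharedFile.__init__`, which on Python 3.7 copies `file.seekable` from the underlying file object, so any file object without a `seekable` attribute fails inside `ZipFile.open()`. Below is a minimal, self-contained sketch of that failure mode. It does not use TF at all: `NoSeekableFile` and `SeekableFile` are hypothetical wrappers around `io.BytesIO` standing in for the GFile object, and the sketch is only meant to illustrate the mechanism, not to reproduce TF's actual fix.

```python
import io
import zipfile


class NoSeekableFile:
    """Hypothetical stand-in for a file object (like GFile in the traceback)
    that supports read/seek/tell but has no seekable() method."""

    def __init__(self, raw):
        self._raw = raw

    def read(self, n=-1):
        return self._raw.read(n)

    def seek(self, offset, whence=0):
        return self._raw.seek(offset, whence)

    def tell(self):
        return self._raw.tell()


class SeekableFile(NoSeekableFile):
    """Same wrapper, plus the seekable() method zipfile looks up on 3.7+."""

    def seekable(self):
        return True


def make_zip():
    """Build a small in-memory zip archive with a single member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("hello.txt", b"hi")
    return buf


def try_read(wrapper_cls):
    """Open the archive through wrapper_cls and read its only member."""
    with zipfile.ZipFile(wrapper_cls(make_zip())) as z:
        # On Python 3.7+, z.open() raises AttributeError here if the
        # wrapped file object has no seekable attribute.
        with z.open("hello.txt") as f:
            return f.read()
```

With this sketch, `try_read(NoSeekableFile)` raises `AttributeError: 'NoSeekableFile' object has no attribute 'seekable'`, matching the GFile error above, while `try_read(SeekableFile)` returns `b"hi"`. Presumably the same code would not fail on Python 3.6, which is consistent with the Python 3.6 workaround mentioned above.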
Thank you for investigating. It seems to be an issue with TF, so I'm not sure we can do anything on our side. It should be resolved by itself in the next TF release.
Leaving the issue open for reference.
+1, same issue.
Found the same issue here...
Environment information
tensorflow-datasets version: 1.0.2
tensorflow version: 2.0.0a0
In the case of the caltech101 dataset, the issue is a bit different, though:
WARNING: Logging before flag parsing goes to stderr.
E0605 14:28:30.191188 140040460883840 registered.py:171] Failed to construct dataset caltech101
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
<ipython-input-7-bb54b14eaab4> in <module>()
4 (raw_train, raw_test), metadata = tfds.load(
5 'caltech101', split=list(splits),
----> 6 with_info=True, as_supervised=True)
13 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
546 None, None,
547 compat.as_text(c_api.TF_Message(self.status.status)),
--> 548 c_api.TF_GetCode(self.status.status))
549 # Delete the underlying status object from memory otherwise it stays alive
550 # as there is a reference to status from this from the traceback due to
NotFoundError: /usr/local/lib/python3.6/dist-packages/tensorflow_datasets/image/caltech101_labels.txt; No such file or directory
Environment information:
Having this issue with a number of tfds datasets.
Same error while loading cats_vs_dogs:
AttributeError: 'GFile' object has no attribute 'seekable'
For those having this issue, installing tf-nightly should fix it.
Does anyone still have this issue? (The bug has not been updated for 2 months.)
Same with TF 1.13.1 and Python 3.7.
This still fails with imagenet_resized on py35, py36, and py37 with tf1.15 and tf2.0, in every combination -.-
Have you tried tf-nightly?
I have exactly the same problem.
I have tried:
from __future__ import absolute_import, division, print_function, unicode_literals
try:
  !pip install -q tf-nightly
except Exception:
  pass
import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()
import os
datasets = tfds.load(name='celeb_a', with_info=True, as_supervised=True)
I still get the error message:
NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM, downloaded to /root/tensorflow_datasets/downloads/ucexport_download_id_0B7EVK8r0v71pZjFTYXZWM3FlDDaXUAQO8EGH_a7VqGNLRtW52mva1LzDrb-V723OQN8.tmp.b487bbf725dd4b8999e33068270ffc74/uc, has wrong checksum.
Closing this issue, as it should be solved with recent versions of TF.
@faajabbari, see https://github.com/tensorflow/datasets/issues/1482 for an explanation of your issue.