Datasets: Error in loading the celeb_a dataset (Py 3.7)

Created on 3 Mar 2019 · 21 Comments · Source: tensorflow/datasets

Short description
Loading the celeb_a dataset results in an error.

Environment information

  • Operating System: Ubuntu 18.04
  • Python version: python 3.7
  • tensorflow-datasets/tfds-nightly version: 1.0.1
  • tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: tf-nightly 1.14.1-dev20190301

Reproduction instructions

>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> import tensorflow_datasets as tfds
>>> r = tfds.load("celeb_a")
Downloading / extracting dataset celeb_a (?? GiB) to /home/ayush99/tensorflow_datasets/celeb_a/0.3.0...
Dl Completed...: 100%|█████████████████████████| 4/4 [05:44<00:00, 102.30s/ url]
Traceback (most recent call last): MiB/s]
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 90, in _sync_extract
    for path, handle in iter_archive(from_path, method):
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 160, in iter_zip
    extract_file = z.open(member)
  File "/home/ayush99/anaconda3/lib/python3.7/zipfile.py", line 1480, in open
    self._fpclose, self._lock, lambda: self._writing)
  File "/home/ayush99/anaconda3/lib/python3.7/zipfile.py", line 722, in __init__
    self.seekable = file.seekable
AttributeError: 'GFile' object has no attribute 'seekable'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/registered.py", line 259, in load
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/dataset_builder.py", line 220, in download_and_prepare
    max_examples_per_split=download_config.max_examples_per_split)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/dataset_builder.py", line 651, in _download_and_prepare
    for split_generator in self._split_generators(dl_manager):
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/image/celeba.py", line 122, in _split_generators
    "landmarks_celeba": LANDMARKS_DATA,
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 340, in download_and_extract
    return _map_promise(self._download_extract, url_or_urls)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 376, in _map_promise
    res = utils.map_nested(_wait_on_promise, all_promises)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 128, in map_nested
    for k, v in data_struct.items()
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 128, in <dictcomp>
    for k, v in data_struct.items()
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/utils/py_utils.py", line 142, in map_nested
    return function(data_struct)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/download_manager.py", line 360, in _wait_on_promise
    return p.get()
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 510, in get
    return self._target_settled_value(_raise=True)
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 514, in _target_settled_value
    return self._target()._settled_value(_raise)
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 224, in _settled_value
    reraise(type(raise_val), raise_val, self._traceback)
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/ayush99/GitHub/dsets/lib/python3.7/site-packages/promise/promise.py", line 842, in handle_future_result
    resolve(future.result())
  File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/ayush99/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/ayush99/GitHub/datasets/tensorflow_datasets/core/download/extractor.py", line 93, in _sync_extract
    raise ExtractError(resource, err)
tensorflow_datasets.core.download.extractor.ExtractError: Error while extracting file /home/ayush99/tensorflow_datasets/downloads/ucexport_download_id_0B7EVK8r0v71pZjFTYXZWM3FlDDaXUAQO8EGH_a7VqGNLRtW52mva1LzDrb-V723OQN8 (https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM): 'GFile' object has no attribute 'seekable'.
bug

All 21 comments

Does this dataset load when you do the same task via the dataset builder?

No, same error.

OK, I will look into the problem.

May want to try this on Python 3.6. We don't test against Python 3.7. It may be that the zipfile implementation changed. Please update here if you find that this is the issue.

same error while loading cats_vs_dogs

I have encountered the same problem. Pasting the logs here, in case it helps:
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Downloading / extracting dataset cats_vs_dogs (786.68 MiB) to C:\Users\Jaydev\tensorflow_datasets\cats_vs_dogs\2.0.0...
0 examples [00:00, ? examples/s]
Traceback (most recent call last):
  File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-8-7b4aa71b13f0>", line 8, in <module>
    with_info=True, as_supervised=True)
  File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\lib\site-packages\wrapt\wrappers.py", line 564, in __call__
    args, kwargs)
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\registered.py", line 253, in load
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\lib\site-packages\wrapt\wrappers.py", line 603, in __call__
    args, kwargs)
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\api_utils.py", line 52, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 219, in download_and_prepare
    max_examples_per_split=download_config.max_examples_per_split)
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 668, in _download_and_prepare
    output_files,
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 107, in write_from_generator
    _write_tfrecords_from_generator(wrapped, output_files, shuffle=True)
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 272, in _write_tfrecords_from_generator
    _round_robin_write(writers, generator)
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 285, in _round_robin_write
    for i, example in enumerate(tqdm.tqdm(generator, unit=" examples")):
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tqdm\_tqdm.py", line 1022, in __iter__
    for obj in iterable:
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\file_format_adapter.py", line 106, in <genexpr>
    _dict_to_tf_example(d).SerializeToString() for d in generator_fn())
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 638, in generator_fn
    for i, ex in enumerate(self._generate_examples(**kwargs)):
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\image\cats_vs_dogs.py", line 90, in _generate_examples
    for fname, fobj in archive:
  File "C:\ML_VirtualEnv\ml_venv\lib\site-packages\tensorflow_datasets\core\download\extractor.py", line 160, in iter_zip
    extract_file = z.open(member)
  File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\Lib\zipfile.py", line 1480, in open
    self._fpclose, self._lock, lambda: self._writing)
  File "C:\Users\Jaydev\AppData\Local\Programs\Python\Python37\Lib\zipfile.py", line 722, in __init__
    self.seekable = file.seekable
AttributeError: 'GFile' object has no attribute 'seekable'

Similar error as above using Python 3.7.3 and a fresh anaconda install

Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]

Dl Completed...: 0 url [00:00, ? url/s] ? file/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]

Extraction completed...: 0%| | 0/1 [00:00
Downloading / extracting dataset celeb_a (1.38 GiB) to /data8/martin/DifferentialPrivacy/TensorflowDatasets/celeb_a/0.3.0...


AttributeError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/tensorflow_datasets/core/download/extractor.py in _sync_extract(self, resource, to_path)
89 try:
---> 90 for path, handle in iter_archive(from_path, method):
91 _copy(handle, path and os.path.join(to_path_tmp, path) or to_path_tmp)

~/anaconda3/lib/python3.7/site-packages/tensorflow_datasets/core/download/extractor.py in iter_zip(arch_f)
159 for member in z.infolist():
--> 160 extract_file = z.open(member)
161 if extract_file: # File with data (not directory):

~/anaconda3/lib/python3.7/zipfile.py in open(self, name, mode, pwd, force_zip64)
1479 zef_file = _SharedFile(self.fp, zinfo.header_offset,
-> 1480 self._fpclose, self._lock, lambda: self._writing)
1481 try:

~/anaconda3/lib/python3.7/zipfile.py in __init__(self, file, pos, close, lock, writing)
721 self._writing = writing
--> 722 self.seekable = file.seekable
723 self.tell = file.tell

AttributeError: 'GFile' object has no attribute 'seekable'

@rsepassi The Python version does seem to be the issue: there was a breaking change in zipfile in Python 3.7 (as the tracebacks above show, zipfile.py now reads file.seekable, which GFile does not provide).

This issue is fixed in the current tensorflow:master for Python 3.7 with the following PR: https://github.com/tensorflow/tensorflow/pull/28006

(If upgrading is an issue, an alternative workaround is to call tfds.load() once under Python 3.6 to download and extract the dataset; the prepared dataset is then usable by later tfds.load() calls under Python 3.7.)
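
The failure described above can be reproduced without TensorFlow. The sketch below is only an illustration under stated assumptions: `BareFile` is a hypothetical stand-in for a file object that, like the `GFile` in the tracebacks, offers `read`, `seek` and `tell` but no `seekable`, and `SeekableWrapper` is a hypothetical adapter. It is not the `GFile` class itself and not necessarily how the linked PR fixes the problem.

```python
import io
import zipfile


class BareFile:
    """Hypothetical stand-in for a file object that lacks `seekable`."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, n=-1):
        return self._buf.read(n)

    def seek(self, offset, whence=0):
        return self._buf.seek(offset, whence)

    def tell(self):
        return self._buf.tell()


class SeekableWrapper:
    """Hypothetical adapter: adds `seekable` and delegates everything else."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def seekable(self):
        # The wrapped object supports seek()/tell(), so report True.
        return True

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself.
        return getattr(self._wrapped, name)


def make_zip_bytes():
    """Build a tiny in-memory zip archive with one member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("hello.txt", "hi")
    return buf.getvalue()


def read_member(fileobj, name):
    """Open `name` inside the archive backed by `fileobj` and return its bytes."""
    with zipfile.ZipFile(fileobj) as z:
        with z.open(name) as member:
            return member.read()


if __name__ == "__main__":
    data = make_zip_bytes()
    try:
        read_member(BareFile(data), "hello.txt")
    except AttributeError as err:
        # On Python 3.7+, z.open() asks the file object for `seekable`.
        print("without seekable:", err)
    print("with wrapper:", read_member(SeekableWrapper(BareFile(data)), "hello.txt"))
```

On Python 3.7 and later, the unwrapped object should fail inside `z.open()` with an `AttributeError` about `seekable`, while the wrapped one reads the member normally.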

Thank you for investigating. This seems to be an issue in TF itself, so I'm not sure we can do anything on our side. It should be resolved by the next TF release.
Leaving the issue open for reference.

mark, same issue

Found the same issue here...

Environment information

  • Operating System: macOS Mojave version 10.14.1
  • Python version: python 3.7.3
  • tensorflow-datasets version 1.0.2
  • tensorflow version 2.0.0a0

In the case of the caltech101 dataset, the issue is a bit different though:

WARNING: Logging before flag parsing goes to stderr.
E0605 14:28:30.191188 140040460883840 registered.py:171] Failed to construct dataset caltech101
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
<ipython-input-7-bb54b14eaab4> in <module>()
      4 (raw_train, raw_test), metadata = tfds.load(
      5                 'caltech101', split=list(splits),
----> 6                 with_info=True, as_supervised=True)

13 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    546             None, None,
    547             compat.as_text(c_api.TF_Message(self.status.status)),
--> 548             c_api.TF_GetCode(self.status.status))
    549     # Delete the underlying status object from memory otherwise it stays alive
    550     # as there is a reference to status from this from the traceback due to

NotFoundError: /usr/local/lib/python3.6/dist-packages/tensorflow_datasets/image/caltech101_labels.txt; No such file or directory

Environment information:

  • OS: macOS Mojave version 10.14.1
  • Python ver: 3.7.1
  • tensorflow version 2.0.0a0
  • tensorflow-datasets version 1.0.2

Having this issue for a number of tfds datasets

same error while loading cats_vs_dogs

AttributeError: 'GFile' object has no attribute 'seekable'

For those having this issue, using tf-nightly should fix it.
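
One rough way to check whether an installed TensorFlow build already exposes what zipfile needs is to test for the `seekable` attribute on the file class. This is a sketch, not an official diagnostic: the helper names are hypothetical, and `tf.io.gfile.GFile` is assumed to be where the class lives (older 1.x builds may only expose it elsewhere, in which case the function returns None).

```python
def has_seekable(file_cls):
    """Return True if `file_cls` defines a callable `seekable` attribute."""
    return callable(getattr(file_cls, "seekable", None))


def installed_gfile_has_seekable():
    """Return True/False for the installed TensorFlow, or None if unavailable."""
    try:
        import tensorflow as tf

        # Assumed location of the class; not guaranteed for every TF version.
        gfile_cls = tf.io.gfile.GFile
    except (ImportError, AttributeError):
        return None
    return has_seekable(gfile_cls)
```

If `installed_gfile_has_seekable()` returns False, the installed build predates the fix and extraction of zip archives on Python 3.7 is likely to fail as in the tracebacks above.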

Does anyone still have this issue? (The bug has not been updated for 2 months.)

same with tf 1.13.1 python3.7

This is still an error with imagenet_resized on py35, py36, py37, tf1.15, tf2.0, and their combinations -.-

Have you tried tf-nightly?

I have exactly the same problem.
I have tried:

# Colab/IPython cell: the `!pip` line is notebook shell syntax, not plain Python.
from __future__ import absolute_import, division, print_function, unicode_literals
try:
  !pip install -q tf-nightly
except Exception:
  pass

import tensorflow_datasets as tfds
import tensorflow as tf
tfds.disable_progress_bar()

import os

datasets = tfds.load(name='celeb_a', with_info=True, as_supervised=True)

I still get the error message:
NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM, downloaded to /root/tensorflow_datasets/downloads/ucexport_download_id_0B7EVK8r0v71pZjFTYXZWM3FlDDaXUAQO8EGH_a7VqGNLRtW52mva1LzDrb-V723OQN8.tmp.b487bbf725dd4b8999e33068270ffc74/uc, has wrong checksum.

Closing this issue, as it should be solved with recent versions of TF.

@faajabbari, see https://github.com/tensorflow/datasets/issues/1482 for an explanation of your issue.
