Short description
According to the TF Datasets Overview, "TensorFlow Datasets is compatible with both TensorFlow Eager mode and Graph mode."
However, when I call take(1) on the dataset while in Graph mode, I get this error:
RuntimeError: dataset.__iter__() is only supported when eager execution is enabled.
Environment information
tensorflow-datasets/tfds-nightly version: tensorflow-datasets 1.0.1
tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: tensorflow 1.13.1
Reproduction instructions
The code below is copied and pasted from https://www.tensorflow.org/datasets/overview
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
mnist_train = tfds.load(name="mnist", split=tfds.Split.TRAIN)
assert isinstance(mnist_train, tf.data.Dataset)
mnist_train
mnist_example, = mnist_train.take(1)
image, label = mnist_example["image"], mnist_example["label"]
plt.imshow(image.numpy()[:, :, 0].astype(np.float32), cmap=plt.get_cmap("gray"))
print("Label: %d" % label.numpy())
Expected behavior
I expect to get the results as shown in this tutorial:
https://www.tensorflow.org/datasets/overview
tfds is compatible with both eager and graph modes, but that doesn't mean they can be used the same way in either context. The code you have copied is missing the tf.enable_eager_execution line from the link. As with most code, if you remove an arbitrary line, expecting subsequent lines to work unchanged is unrealistic.
Unpacking the result of take(1) relies on __iter__, so it raises in graph mode. There are other ways of iterating through datasets in graph mode.
Thank you @jackd for your response.
Looking at the API and Guide I couldn't find any information on which methods work with graph mode and which will work with eager mode. If you can guide me on how to know which functions or methods work with which mode that will be great.
I ended up finding this method, which works in graph mode where take did not, in this article:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
# Fetch the dataset directly
mnist = tfds.image.MNIST()
# or by string name
mnist = tfds.builder('mnist')
# Describe the dataset with DatasetInfo
assert mnist.info.features['image'].shape == (28, 28, 1)
assert mnist.info.features['label'].num_classes == 10
assert mnist.info.splits['train'].num_examples == 60000
# Download the data, prepare it, and write it to disk
mnist.download_and_prepare()
# Load data from disk as tf.data.Datasets
datasets = mnist.as_dataset()
train_dataset, test_dataset = datasets['train'], datasets['test']
assert isinstance(train_dataset, tf.data.Dataset)
# And convert the Dataset to NumPy arrays if you'd like
for example in tfds.as_numpy(train_dataset):
    image, label = example['image'], example['label']
    # np.array is a function, not a type; the array type is np.ndarray
    assert isinstance(image, np.ndarray)
@mostafaelhoushi while your solution works perfectly fine for exploring the dataset, it isn't very tensorflow-onic - i.e. you'll miss out on all the optimizations and everything that makes tf.data.Datasets great. If you want a tensor representation of the inputs, you can use outputs = dataset.make_one_shot_iterator().get_next(). These will be symbolic tensors, so to get numpy arrays out you'll need to launch a Session and do all that hoo-hah, but it will be much more performant for training models.
One important thing to understand is that this repo isn't about the Dataset API/framework (despite the name). That is managed by the core tensorflow repository under tf.data. This repo is concerned with using that framework to bring many different publicly available datasets under one interface with best practices for training pipelines.
Agree!
"These will be symbolic tensors, so to get numpy arrays out you'll need to launch a Session and do all that hoo-hah, but it will be much more performant for training models."
@jackd could you elaborate on this? I am trying this in tf 1.12 and would like to look at the images/get numpy arrays out but I am a bit lost regarding how to do that with a session.
@phiwei I would suggest reading the TF1 tf.data guide if you want to use Graph mode: https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/datasets.md
ds = tfds.load(...)
iterator = ds.make_one_shot_iterator()
next_element = iterator.get_next()
with tf.Session() as sess:
    for _ in range(100):
        value = sess.run(next_element)
I'm facing a similar problem, but can't use the tf.Session() strategy due to using TF 2.0.
In case anyone has this same issue in TF2.0, I was able to solve the problem by using tf.data.experimental.get_single_element(dataset.take(1)) (see docs). The example code didn't work as written, but I could get it to work by using dataset.take(1). It seemed to think that the dataset contained more than one element regardless of batch_size.
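For anyone wanting to try this, here is a minimal sketch of the get_single_element approach, assuming TF 2.x with eager execution on by default. The small in-memory dataset is a hypothetical stand-in for the MNIST data returned by tfds.load, so the snippet runs without a download. Note that get_single_element raises if the dataset holds more than one element, which is probably why take(1) was needed.

```python
import tensorflow as tf

# Hypothetical stand-in for an MNIST-like dataset:
# five blank 28x28x1 images with labels 0..4.
dataset = tf.data.Dataset.from_tensor_slices({
    "image": tf.zeros([5, 28, 28, 1], dtype=tf.uint8),
    "label": tf.range(5, dtype=tf.int64),
})

# take(1) narrows the dataset to exactly one element,
# which is what get_single_element requires.
example = tf.data.experimental.get_single_element(dataset.take(1))

# In eager mode the results are concrete tensors, so .numpy() is available.
image = example["image"].numpy()
label = int(example["label"].numpy())
print(image.shape, label)
```

Recent TF releases mark tf.data.experimental.get_single_element as deprecated in favor of a method on the dataset itself, so treat this as illustrative rather than the recommended long-term form.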
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
It works for me.
@JoshEZiegler Note that you can still access graph-mode in TF2 using tf.compat.v1.Session, tf.compat.v1.placeholder,...
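A rough sketch of what that could look like, assuming TF 2.x: the pipeline is built inside an explicit tf.Graph and run with tf.compat.v1.Session. The three-element dataset is a hypothetical stand-in for a tfds dataset, used so the snippet is self-contained.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # Inside an explicit graph context, ops are symbolic even in TF 2.x.
    # Hypothetical stand-in dataset; a tfds dataset would be built here instead.
    ds = tf.data.Dataset.from_tensor_slices([10, 20, 30])
    iterator = tf.compat.v1.data.make_one_shot_iterator(ds)
    next_element = iterator.get_next()

# Each sess.run call advances the iterator and returns a NumPy value.
with tf.compat.v1.Session(graph=graph) as sess:
    values = [int(sess.run(next_element)) for _ in range(3)]

print(values)
```

Running sess.run past the end of the dataset raises tf.errors.OutOfRangeError, so a real loop would either know the element count or catch that exception.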