Datasets: TF Datasets only working in Eager Execution

Created on 12 Mar 2019 · 9 Comments · Source: tensorflow/datasets

Short description
According to the TF Datasets Overview, "TensorFlow Datasets is compatible with both TensorFlow Eager mode and Graph mode."
However, when I ran take(1) on the dataset in Graph mode and unpacked the result, I got this error:

RuntimeError: dataset.__iter__() is only supported when eager execution is enabled.

Environment information

Operating System: -Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Python version: 3.5.2
tensorflow-datasets/tfds-nightly version: tensorflow-datasets 1.0.1
tensorflow/tensorflow-gpu/tf-nightly/tf-nightly-gpu version: tensorflow 1.13.1

Reproduction instructions
The code below is copied and pasted from https://www.tensorflow.org/datasets/overview

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

mnist_train = tfds.load(name="mnist", split=tfds.Split.TRAIN)
assert isinstance(mnist_train, tf.data.Dataset)
mnist_train

mnist_example, = mnist_train.take(1)
image, label = mnist_example["image"], mnist_example["label"]

plt.imshow(image.numpy()[:, :, 0].astype(np.float32), cmap=plt.get_cmap("gray"))
print("Label: %d" % label.numpy())

Expected behavior
I expect to get the results as shown in this tutorial:
https://www.tensorflow.org/datasets/overview

bug

mostafaelhoushi

👍11

Most helpful comment

@mostafaelhoushi while your solution works perfectly fine for exploring the dataset, it isn't very tensorflow-onic - i.e. you'll miss out on all the optimizations and everything that makes tf.data.Datasets great. If you want a tensor representation of the inputs, you can use outputs = dataset.make_one_shot_iterator().get_next(). These will be symbolic tensors, so to get numpy arrays out you'll need to launch a Session and do all that hoo-hah, but it will be much more performant for training models.

One important thing to understand is that this repo isn't about the Dataset API/framework (despite the name). That is managed by the core tensorflow repository under tf.data. This repo is concerned with using that framework to bring many different publicly available datasets under one interface, with best practices for training pipelines.

jackd on 13 Mar 2019

👍10

All 9 comments

tfds is compatible with both eager and graph modes, but that doesn't mean they can be used the same way in either context. The code you have copied is missing the tf.enable_eager_execution line from the link. As with most code, if you remove an arbitrary line, expecting subsequent lines to work unchanged is unrealistic.

Unpacking the result of take(1) goes through __iter__, so it will raise in graph mode. There are other ways of iterating through datasets in graph mode.

jackd on 13 Mar 2019
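
For readers following along, here is a minimal sketch of the fix described above: enabling eager execution before touching the dataset. It assumes TensorFlow 1.13 or later (where tf.compat.v1 is available) and uses a small synthetic dataset built with tf.data.Dataset.from_tensor_slices as a stand-in for tfds.load, so nothing has to be downloaded. The array contents are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# In TF 1.x this must run before any other TensorFlow call.
# In TF 2.x eager execution is already the default, so the call is harmless.
tf.compat.v1.enable_eager_execution()

# Synthetic stand-in for MNIST (assumption): 4 blank 28x28x1 images with labels 0..3.
images = np.zeros((4, 28, 28, 1), dtype=np.uint8)
labels = np.arange(4, dtype=np.int64)
dataset = tf.data.Dataset.from_tensor_slices({"image": images, "label": labels})

# Tuple unpacking iterates the dataset, which is only supported in eager mode.
example, = dataset.take(1)
image, label = example["image"], example["label"]

# Eager tensors expose .numpy(), as in the overview tutorial.
print(image.numpy().shape, int(label.numpy()))
```

With eager execution enabled, the unpacking line from the original report should run unchanged against a dataset returned by tfds.load as well.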

Thank you @jackd for your response.
Looking at the API and Guide, I couldn't find any information on which methods work in graph mode and which work in eager mode. If you can guide me on how to tell which functions or methods work in which mode, that would be great.

I ended up finding this method, which works in graph mode and can be used instead of take, in this article:

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# Fetch the dataset directly
mnist = tfds.image.MNIST()
# or by string name
mnist = tfds.builder('mnist')

# Describe the dataset with DatasetInfo
assert mnist.info.features['image'].shape == (28, 28, 1)
assert mnist.info.features['label'].num_classes == 10
assert mnist.info.splits['train'].num_examples == 60000

# Download the data, prepare it, and write it to disk
mnist.download_and_prepare()

# Load data from disk as tf.data.Datasets
datasets = mnist.as_dataset()
train_dataset, test_dataset = datasets['train'], datasets['test']
assert isinstance(train_dataset, tf.data.Dataset)

# And convert the Dataset to NumPy arrays if you'd like
for example in tfds.as_numpy(train_dataset):
  image, label = example['image'], example['label']
  assert isinstance(image, np.ndarray)  # np.array is a function, not a type

mostafaelhoushi on 13 Mar 2019

👍7

jackd on 13 Mar 2019

👍10

@mostafaelhoushi while your solution works perfectly fine for exploring the dataset, it isn't very tensorflow-onic - i.e. you'll miss out on all the optimizations and everything that makes tf.data.Datasets great. If you want a tensor representation of the inputs, you can use outputs = dataset.make_one_shot_iterator().get_next(). These will be symbolic tensors, so to get numpy arrays out you'll need to launch a Session and do all that hoo-hah, but it will be much more performant for training models.

One important thing to understand is that this repo isn't about the Dataset API/framework (despite the name). That is managed by the core tensorflow repository under tf.data. This repo is concerned with using that framework to bring many different publicly available datasets under one interface, with best practices for training pipelines.

Agree!

datianshi21 on 9 May 2019

These will be symbolic tensors, so to get numpy arrays out you'll need to launch a Session and do all that hoo-hah, but it will be much more performant for training models.

@jackd could you elaborate on this? I am trying this in tf 1.12 and would like to look at the images/get numpy arrays out but I am a bit lost regarding how to do that with a session.

phiwei on 3 Oct 2019

@phiwei I would suggest reading the TF1 tf.data guide if you want to use Graph mode: https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/datasets.md

ds = tfds.load(...)
iterator = ds.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
  for _ in range(100):
    value = sess.run(next_element)

Conchylicultor on 3 Oct 2019
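
A self-contained variant of the snippet above may help when trying this out. It is only a sketch: it replaces tfds.load with a tiny synthetic dataset (the arrays and labels are invented for illustration), goes through tf.compat.v1 so that it should run on both TF 1.13+ and TF 2.x, and stops on tf.errors.OutOfRangeError instead of looping a fixed number of times.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for an image dataset (assumption): 3 blank images.
images = np.zeros((3, 28, 28, 1), dtype=np.uint8)
labels = np.array([7, 2, 1], dtype=np.int64)

# Build the input pipeline inside an explicit graph so this is graph mode
# regardless of whether eager execution is the default.
graph = tf.Graph()
with graph.as_default():
    dataset = tf.data.Dataset.from_tensor_slices({"image": images, "label": labels})
    iterator = tf.compat.v1.data.make_one_shot_iterator(dataset)
    next_element = iterator.get_next()  # dict of symbolic tensors

seen_labels = []
with tf.compat.v1.Session(graph=graph) as sess:
    while True:
        try:
            # Each run() call advances the iterator and returns NumPy values.
            value = sess.run(next_element)
        except tf.errors.OutOfRangeError:
            break  # the one-shot iterator is exhausted
        seen_labels.append(int(value["label"]))

print(seen_labels)
```

Each value["image"] returned by sess.run is a plain NumPy array, so it can be passed to plt.imshow as in the original tutorial.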

I'm facing a similar problem, but can't use the tf.Session() strategy due to using TF 2.0.

In case anyone has this same issue in TF2.0, I was able to solve the problem by using tf.data.experimental.get_single_element(dataset.take(1)) (see docs). The example code didn't work as written, but I could get it to work by using dataset.take(1). It seemed to think that the dataset contained more than one element regardless of batch_size.
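
The approach above could look roughly like the following sketch. It assumes a TF 2.x release that still ships tf.data.experimental.get_single_element, with eager execution on by default, and again uses a synthetic dataset (invented values) rather than tfds.load. As for the batch_size observation: get_single_element expects a dataset with exactly one element, and batching a dataset changes the size of each element but usually still leaves many of them, which would explain why take(1) is needed.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for MNIST (assumption): 4 blank images with made-up labels.
images = np.zeros((4, 28, 28, 1), dtype=np.uint8)
labels = np.array([5, 0, 4, 1], dtype=np.int64)
dataset = tf.data.Dataset.from_tensor_slices({"image": images, "label": labels})

# take(1) narrows the dataset to exactly one element, which is what
# get_single_element requires; the result mirrors the dataset's structure.
example = tf.data.experimental.get_single_element(dataset.take(1))
image, label = example["image"], example["label"]

print(image.shape, int(label.numpy()))
```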