Datasets: How do I train my Keras model using TF Datasets?

Created on 1 Jul 2019 · 8 Comments · Source: tensorflow/datasets

What I need help with / What I was wondering
I'm looking for a clear example of how to split the examples and labels into x_train and y_train / x_test and y_test for the cifar100 dataset.

The Keras model doesn't take the TF Datasets object in its fit function.

What I've tried so far

import tensorflow as tf
import tensorflow_datasets as tfds

# tfds works in both Eager and Graph modes
tf.enable_eager_execution()

# See available datasets
print(tfds.list_builders())

# Construct a tf.data.Dataset
dataset = tfds.load(name="cifar100", split=tfds.Split.TRAIN)

# Build your input pipeline
dataset = dataset.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
for features in dataset.take(1):
  image, label = features["image"], features["label"]
  print(type(image))
  print(type(label))

input_shape = (32,32, 3)
num_classes = 100  # cifar100 has 100 classes
batch_size = 32
epochs = 10

l = tf.keras.layers

model = tf.keras.Sequential([
    l.Conv2D(
        32, 5, padding='same', activation='relu', input_shape=input_shape),
    l.MaxPooling2D((2, 2), (2, 2), padding='same'),
    l.BatchNormalization(),
    l.Conv2D(64, 5, padding='same', activation='relu'),
    l.MaxPooling2D((2, 2), (2, 2), padding='same'),
    l.Flatten(),
    l.Dense(1024, activation='relu'),
    l.Dropout(0.4),
    l.Dense(num_classes, activation='softmax')
])

model.summary()

logdir = "logs"  # assumed log directory; the original snippet left logdir undefined
callbacks = [tf.keras.callbacks.TensorBoard(log_dir=logdir, profile_batch=0)]

model.compile(
    # tfds labels are integer class ids, so use the sparse variant
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    optimizer='adam',
    metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          callbacks=callbacks,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

It would be nice if...
I could get some way to convert the train_dt TensorFlow dataset into something that is viable to pass in as x_train and y_train.

It's really irritating that there isn't a demo for something so trivial. What use is the Colab demo notebook when it doesn't show us how to train a simple Keras model with TF Datasets? I thought TensorFlow was trying to unify everything...

Some help would be really appreciated.

help

karanchahal

👍5

Most helpful comment

Note that for the documentation, let's try to use in_memory=True and
as_supervised=True instead of batch_size=-1 and manually splitting the
features. That way I think you could pass the dataset directly to the Keras
model.fit method.

rsepassi on 1 Jul 2019

👍5

All 8 comments

Hey @karanchahal, thank you for the feedback.
You are right, documentation about Keras would be great. I will write it as soon as possible. Until then, you can use this.

# batch_size=-1 loads the full split as a single batch of tf.Tensor objects
mnist_train = tfds.load(name="mnist", split=tfds.Split.TRAIN, batch_size=-1)
mnist_test = tfds.load(name="mnist", split=tfds.Split.TEST, batch_size=-1)

# tfds.as_numpy converts those tensors to NumPy arrays
# (given a tf.data.Dataset, it instead returns a generator of NumPy records)
mnist_train = tfds.as_numpy(mnist_train)
mnist_test = tfds.as_numpy(mnist_test)

x_train, y_train = mnist_train["image"], mnist_train["label"]  # separate the x and y
x_test, y_test = mnist_test["image"], mnist_test["label"]

For more information on splits, see https://www.tensorflow.org/datasets/splits.

us on 1 Jul 2019

👍1
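
As a rough illustration of what the snippet above ends up with: after batch_size=-1 and tfds.as_numpy, each split is a dictionary holding one whole-split NumPy array per feature, so separating inputs from labels is plain dictionary indexing. The sketch below uses a small random stand-in dictionary (`fake_split` is a hypothetical name, shaped like MNIST) instead of calling tfds.load, so it runs without a download.

```python
import numpy as np

# Hypothetical stand-in for
#   tfds.as_numpy(tfds.load("mnist", split=..., batch_size=-1))
# i.e. a dict holding every image and label of the split as one array each.
rng = np.random.default_rng(0)
fake_split = {
    "image": rng.integers(0, 256, size=(8, 28, 28, 1), dtype=np.uint8),
    "label": rng.integers(0, 10, size=(8,), dtype=np.int64),
}

# Separate the inputs from the labels.
x_train, y_train = fake_split["image"], fake_split["label"]

# Keras models usually expect float inputs, so scale uint8 pixels to [0, 1].
x_train = x_train.astype("float32") / 255.0

print(x_train.shape, x_train.dtype)
print(y_train.shape, y_train.dtype)
```

Arrays of this form can be passed as model.fit(x_train, y_train, ...). Loading a whole split at once only suits datasets that fit in memory.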

Related issue #561.

ChanchalKumarMaji on 2 Jul 2019

Thank you :)

karanchahal on 7 Jul 2019

You can use the as_supervised=True kwarg to return the (image, label) tuple expected by Keras.
For images, you also have to cast/normalize the image to tf.float32; for this, you can use tf.data.Dataset.map.

def _normalize_img(img, label):
  img = tf.cast(img, tf.float32) / 255.
  return (img, label)

ds = tfds.load('mnist', split='train', as_supervised=True)
ds = ds.batch(32)
ds = ds.map(_normalize_img)

model.fit(ds, epochs=5)

Conchylicultor on 8 Jul 2019

👍2
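
A minimal end-to-end sketch of the approach above, assuming TensorFlow 2.x (eager execution by default). To avoid a download, a synthetic dataset of (image, label) tuples stands in for tfds.load(..., as_supervised=True); the tiny model and the array sizes are illustrative choices, not from the thread.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for tfds.load('mnist', split='train', as_supervised=True):
# a dataset whose elements are (uint8 image, integer label) tuples.
images = np.random.randint(0, 256, size=(64, 28, 28, 1), dtype=np.uint8)
labels = np.random.randint(0, 10, size=(64,), dtype=np.int64)
ds = tf.data.Dataset.from_tensor_slices((images, labels))


def normalize_img(img, label):
    """Cast uint8 pixels to float32 in [0, 1]; leave the label unchanged."""
    return tf.cast(img, tf.float32) / 255.0, label


ds = ds.map(normalize_img).shuffle(64).batch(32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
# Integer labels pair with the sparse variant of categorical cross-entropy.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# The batched dataset is passed directly: no separate y and no batch_size.
history = model.fit(ds, epochs=1, verbose=0)
print(sorted(history.history.keys()))
```

Because the dataset is already batched, batch_size is left out of the fit call; the batch size is set once, in the input pipeline.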

@Conchylicultor why reinvent the wheel? tf.image.convert_image_dtype

ds = tfds.load('mnist', split='train', as_supervised=True)
ds = ds.batch(32)
ds = ds.map(lambda img, label: (tf.image.convert_image_dtype(img, dtype=tf.float32), label))

model.fit(ds, epochs=5)

hollowgalaxy on 6 Aug 2019

👍1
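
For uint8 inputs, the two normalizations suggested in this thread should agree numerically. The following is a small check under that assumption, written for TensorFlow 2.x; the three-pixel "image" is made up for illustration.

```python
import numpy as np
import tensorflow as tf

# A tiny uint8 "image" covering both ends of the value range.
img = tf.constant([[0, 128, 255]], dtype=tf.uint8)

# Manual approach: cast, then divide by the uint8 maximum.
manual = tf.cast(img, tf.float32) / 255.0

# Library approach: integer inputs are rescaled to [0, 1] during conversion.
converted = tf.image.convert_image_dtype(img, dtype=tf.float32)

same = np.allclose(manual.numpy(), converted.numpy())
print(same)
```

One difference to keep in mind: tf.image.convert_image_dtype rescales only when converting from an integer type. A tensor that is already floating point is assumed to be in [0, 1] and is not divided again, whereas the manual cast-and-divide always divides.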

@Rishan123, I'm not sure how your code correlates with the reference, but try feeding the dataset into the tf.keras model directly. Something along these lines:

# as_supervised=True makes each element an (img, label) tuple
(train, test), info = tfds.load(name="mnist", split=[tfds.Split.TRAIN, tfds.Split.TEST],
                                with_info=True, as_supervised=True)
train_size = info.splits['train'].num_examples

model = build_model(...)
model.compile(...)
# this works if train is a Dataset with first item img, and second item label
model.fit(x=train.shuffle(1000).batch(batch_size, drop_remainder=True).repeat(),
          steps_per_epoch=train_size // batch_size,  # steps are counted in batches
          validation_data=test.batch(batch_size),
          epochs=epochs)

Something like this should work.

hollowgalaxy on 23 Oct 2019
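
When a dataset is batched and repeated as in the comment above, steps_per_epoch counts batches rather than examples. A short sketch of that arithmetic, assuming TensorFlow 2.x and a stand-in dataset (the sizes 1000 and 32 are arbitrary):

```python
import tensorflow as tf

train_size = 1000  # hypothetical number of training examples
batch_size = 32

# Stand-in dataset with train_size elements.
train = tf.data.Dataset.range(train_size)

# Shuffle individual examples, then batch, then repeat indefinitely.
pipeline = train.shuffle(1000).batch(batch_size, drop_remainder=True).repeat()

# With drop_remainder=True, one pass over the data yields
# floor(train_size / batch_size) full batches.
steps_per_epoch = train_size // batch_size

# Count the batches in a single, non-repeated pass to confirm.
batches_in_one_pass = sum(
    1 for _ in train.batch(batch_size, drop_remainder=True))
print(steps_per_epoch, batches_in_one_pass)
```

Shuffling before batching mixes individual examples; shuffling after batching would only reorder whole batches.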

@Rishan123, as_numpy returns a generator, so you need to iterate over it first.

train_ds = tfds.as_numpy(train_ds)
for ex in train_ds:
  image = ex['image']

Conchylicultor on 23 Oct 2019
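
If whole arrays are needed rather than one example at a time, the per-example records from the generator can be collected and stacked. The sketch below uses a hypothetical generator, `fake_as_numpy`, that mimics the feature dictionaries yielded when iterating over tfds.as_numpy(train_ds); it is not part of tfds.

```python
import numpy as np


def fake_as_numpy(num_examples=5):
    """Hypothetical stand-in for iterating tfds.as_numpy(train_ds):
    yields one feature dict of NumPy values per example."""
    for i in range(num_examples):
        yield {
            "image": np.full((28, 28, 1), i, dtype=np.uint8),
            "label": np.int64(i % 10),
        }


images, labels = [], []
for ex in fake_as_numpy():
    images.append(ex["image"])
    labels.append(ex["label"])

# Stack the per-example arrays along a new leading (batch) axis.
x_train = np.stack(images)
y_train = np.array(labels)
print(x_train.shape, y_train.shape)
```

This materializes the whole split in memory, so it is only practical for small datasets.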
