What I need help with / What I was wondering
I'm looking for a clear example of how to split the examples and labels into x_train/y_train and x_test/y_test for the cifar100 dataset.
The Keras model doesn't accept the tf datasets object in its fit function.
What I've tried so far
import tensorflow as tf
import tensorflow_datasets as tfds
# tfds works in both Eager and Graph modes
tf.enable_eager_execution()
# See available datasets
print(tfds.list_builders())
# Construct a tf.data.Dataset
dataset = tfds.load(name="cifar100", split=tfds.Split.TRAIN)
# Build your input pipeline
dataset = dataset.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
for features in dataset.take(1):
    image, label = features["image"], features["label"]
print(type(image))
print(type(label))
input_shape = (32,32, 3)
num_classes = 100  # cifar100 has 100 classes
batch_size = 32
epochs = 10
l = tf.keras.layers
model = tf.keras.Sequential([
    l.Conv2D(
        32, 5, padding='same', activation='relu', input_shape=input_shape),
    l.MaxPooling2D((2, 2), (2, 2), padding='same'),
    l.BatchNormalization(),
    l.Conv2D(64, 5, padding='same', activation='relu'),
    l.MaxPooling2D((2, 2), (2, 2), padding='same'),
    l.Flatten(),
    l.Dense(1024, activation='relu'),
    l.Dropout(0.4),
    l.Dense(num_classes, activation='softmax')
])
model.summary()
callbacks = [tf.keras.callbacks.TensorBoard(log_dir=logdir, profile_batch=0)]
model.compile(
    loss=tf.keras.losses.categorical_crossentropy,
    optimizer='adam',
    metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          callbacks=callbacks,
          validation_data=(x_test, y_test))
score = model.evaluate(train_dt, val_dt, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
It would be nice if...
I could get some way to convert the train_dt tensorflow dataset into something that is viable to pass in as x_train and y_train.
It's really irritating that there isn't a demo for something so trivial. What use is the Colab demo notebook when it doesn't show us how to train a simple Keras model with tf datasets? I thought TensorFlow was trying to unify everything...
Some help would be really appreciated.
Hey @karanchahal, thank you for the feedback.
You are right, documentation about Keras would be great. I will write it as soon as possible. Until then, you can use this:
# batch_size=-1 returns the full dataset as tf.Tensor objects in a single batch
mnist_train = tfds.load(name="mnist", split=tfds.Split.TRAIN, batch_size=-1)
mnist_test = tfds.load(name="mnist", split=tfds.Split.TEST, batch_size=-1)
# tfds.as_numpy converts those tf.Tensor objects to NumPy arrays
# (given a tf.data.Dataset instead, it returns a generator that yields NumPy records)
mnist_train = tfds.as_numpy(mnist_train)
mnist_test = tfds.as_numpy(mnist_test)
x_train, y_train = mnist_train["image"], mnist_train["label"]  # separate the x and y
x_test, y_test = mnist_test["image"], mnist_test["label"]
For more information on splits, see https://www.tensorflow.org/datasets/splits.
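Once the feature dict is in NumPy form, the arrays usually still need a cast and rescale before they go into Keras. Here is a minimal sketch of that step. The helper name split_features and the fake_train dict are hypothetical; the dict only mimics the MNIST shapes so the snippet runs without downloading anything.

```python
import numpy as np


def split_features(features):
    """Split a tfds-style feature dict into Keras-ready (x, y) arrays."""
    # uint8 pixels in [0, 255] -> float32 pixels in [0, 1]
    x = features["image"].astype("float32") / 255.0
    # Integer class ids, shape (N,)
    y = features["label"]
    return x, y


# Hypothetical stand-in (MNIST-like shapes) for the dict returned by
# tfds.as_numpy(tfds.load(..., batch_size=-1)).
fake_train = {
    "image": np.full((8, 28, 28, 1), 255, dtype=np.uint8),
    "label": np.arange(8, dtype=np.int64),
}

x_train, y_train = split_features(fake_train)
print(x_train.shape, x_train.dtype, y_train.shape)
```

With the real dict from tfds in place of fake_train, the same call should give arrays that can be passed as model.fit(x_train, y_train, ...).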
Note that for the documentation, let's try to use in_memory=True and
as_supervised=True instead of batch_size=-1 and manually splitting the
features. That way I think you could pass the dataset directly to the Keras
model.fit method.
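To make the as_supervised=True route concrete, here is a rough sketch of a pipeline that produces batches in the (image, label) form that model.fit accepts. It assumes TensorFlow 2.4 or later (for tf.data.AUTOTUNE and eager iteration). The helper names normalize_img and make_pipeline are made up for illustration, and random arrays with CIFAR-100 shapes stand in for the real tfds.load call so that nothing has to be downloaded.

```python
import numpy as np
import tensorflow as tf


def normalize_img(image, label):
    """Scale uint8 pixels to float32 in [0, 1]; the label passes through."""
    return tf.cast(image, tf.float32) / 255.0, label


def make_pipeline(ds, batch_size=32, training=True):
    """Turn a dataset of (image, label) pairs into batches for model.fit."""
    ds = ds.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
    if training:
        ds = ds.shuffle(1024)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)


# Stand-in for: tfds.load("cifar100", split="train", as_supervised=True)
# Random data with CIFAR-100 shapes (32x32x3 images, 100 classes).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(64, 32, 32, 3), dtype=np.uint8)
fake_labels = rng.integers(0, 100, size=(64,), dtype=np.int64)
raw_ds = tf.data.Dataset.from_tensor_slices((fake_images, fake_labels))

train_ds = make_pipeline(raw_ds)
images, labels = next(iter(train_ds))
print(images.shape, images.dtype, labels.shape)
```

With the real dataset, train_ds could then go straight into model.fit(train_ds, epochs=...) with no separate x_train and y_train. Since the labels are integer class ids rather than one-hot vectors, sparse_categorical_crossentropy is the matching loss.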
Related issue #561.
Thank you :)
Use the as_supervised=True kwarg to return the (image, label) tuple expected by Keras. Normalization can then be applied with tf.data.Dataset.map.
def _normalize_img(img, label):
    img = tf.cast(img, tf.float32) / 255.
    return (img, label)
ds = tfds.load('mnist', split='train', as_supervised=True)
ds = ds.batch(32)
ds = ds.map(_normalize_img)
model.fit(ds, epochs=5)
@Conchylicultor why reinvent the wheel? tf.image.convert_image_dtype
ds = tfds.load('mnist', split='train', as_supervised=True)
ds = ds.batch(32)
ds = ds.map(lambda img, label: (tf.image.convert_image_dtype(img, dtype=tf.float32), label))
model.fit(ds, epochs=5)
@Rishan123, not sure how your code correlates with the reference. But try feeding the dataset into the tf.keras model directly. Something along these lines:
(train, test), info = tfds.load(name="mnist", split=[tfds.Split.TRAIN, tfds.Split.TEST], with_info=True, as_supervised=True)
train_size = info.splits['train'].num_examples
model = build_model(...)
model.compile(...)
# this works if train is a Dataset with first item img, and second item label
model.fit(x=train.shuffle(1000).batch(batch_size, drop_remainder=True).repeat(),
          steps_per_epoch=train_size // batch_size,
          validation_data=test.batch(batch_size),
          epochs=epochs)
Something like this should work.
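One detail worth spelling out: when the dataset passed to model.fit is batched and repeated, steps_per_epoch counts batches, not examples. Here is a small sketch of the arithmetic. The helper is hypothetical and is not part of tfds or Keras.

```python
import math


def steps_per_epoch(num_examples, batch_size, drop_remainder=True):
    """Number of batches in one full pass over the data."""
    if drop_remainder:
        # The last partial batch is discarded.
        return num_examples // batch_size
    # The last partial batch is kept.
    return math.ceil(num_examples / batch_size)


# The MNIST train split has 60,000 examples.
print(steps_per_epoch(60000, 32))          # 1875
# The CIFAR-100 train split has 50,000 examples.
print(steps_per_epoch(50000, 32))          # 1562
print(steps_per_epoch(50000, 32, False))   # 1563
```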
@Rishan123, as_numpy returns a generator, so you need to iterate over it first.
train_ds = tfds.as_numpy(train_ds)
for ex in train_ds:
    image = ex['image']
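If you want plain x_train and y_train arrays out of that generator, the per-example records can be collected and stacked. This is a rough sketch and assumes the whole split fits in memory. The names collect and fake_examples are hypothetical; fake_examples only imitates the records that tfds.as_numpy would yield.

```python
import numpy as np


def collect(examples):
    """Stack an iterable of {'image', 'label'} records into x and y arrays."""
    images, labels = [], []
    for ex in examples:
        images.append(ex["image"])
        labels.append(ex["label"])
    return np.stack(images), np.array(labels)


def fake_examples(n):
    """Hypothetical stand-in for the generator from tfds.as_numpy(train_ds)."""
    for i in range(n):
        yield {"image": np.zeros((28, 28, 1), dtype=np.uint8), "label": i % 10}


x_train, y_train = collect(fake_examples(5))
print(x_train.shape, y_train)
```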