Currently the TensorBoard callback does not create histograms when a generator is used to create validation data. When a generator is passed as validation_data to model.fit_generator, the self.validation_data attribute is not set:
However, in order to generate histograms, self.validation_data currently must evaluate to True:
I would like to see a way to create histograms even when using a val_gen. Unfortunately, I can't think of a very clear way to do this. My current workaround is not to pass a generator, but to exhaust it until I have the required number of validation samples. I then concatenate the samples and pass them as a plain array. However, this workaround will fail once my whole validation dataset no longer fits into memory. So I created this issue to discuss possible better solutions.
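The workaround described above can be sketched roughly as follows. This is only an illustration, not part of Keras: the helper name materialize and the toy generator are made up for the example, and it assumes the generator yields (inputs, targets) tuples of NumPy arrays.

```python
import numpy as np


def materialize(batch_gen, nb_steps):
    """Pull nb_steps batches from a generator and stack them into two arrays.

    Hypothetical helper; assumes each batch is an (inputs, targets) tuple.
    """
    inputs, targets = [], []
    for _ in range(nb_steps):
        x, y = next(batch_gen)
        inputs.append(x)
        targets.append(y)
    # Concatenating along axis 0 works even if the batches differ in size.
    return np.concatenate(inputs), np.concatenate(targets)


def toy_generator(batch_size=4):
    """Stand-in for a real validation generator (assumption for this sketch)."""
    while True:
        x = np.ones((batch_size, 2), dtype=np.float32)
        y = np.zeros((batch_size, 1), dtype=np.uint8)
        yield x, y


x_val, y_val = materialize(toy_generator(), nb_steps=3)
```

The resulting arrays could then be passed as validation_data=(x_val, y_val) in place of the generator. Memory use grows with nb_steps, which is exactly the limitation mentioned above.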
I just ran into this and I'm just putting my validation data in memory and that doesn't crash anything, but it's a rough edge that could be cleaned up.
The way to do this is probably to set up an accumulator variable for the distribution and add to it through each batch of validation data, then divide and evaluate the summary node after we're done going through the batches. This would also resolve the comment about GPU memory that's currently there.
Alternatively, what Google's models seem to do is keep a moving average: https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py#L288 Which might be easier to implement than an accumulator, since I don't really see a way to capture the histogram to even accumulate...
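To make the accumulator idea a bit more concrete, here is a small sketch in plain NumPy rather than TensorFlow summary ops. It assumes fixed bin edges chosen up front, which is a simplification made for this example; the function name accumulate_histogram is hypothetical.

```python
import numpy as np


def accumulate_histogram(batches, bin_edges):
    """Sum per-batch histogram counts so only one batch is in memory at a time."""
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for batch in batches:
        batch_counts, _ = np.histogram(batch, bins=bin_edges)
        counts += batch_counts
    return counts


edges = np.linspace(0.0, 1.0, 3)          # two bins: [0, 0.5) and [0.5, 1]
batches = [np.array([0.1, 0.2]), np.array([0.8])]
total = accumulate_histogram(batches, edges)
```

Values outside the chosen edges are silently dropped by np.histogram, so the edges would need to cover the expected range of the weights or activations.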
Will the pull request be merged?
I'm having the same use case
Would be really nice to get this fixed. Having the same problem in keras 1.2.2 - 2.0.4.
same problem here!
any comment Monsieur @fchollet ? :)
i'm not sure but i think this works now in 2.0.5
This is still an issue for me in 2.0.5. It writes out scalars and graphs, but no histograms, or distributions.
A relatively simple fix is to "fill in" the validation_data property before the TensorBoard on_epoch_end hook is called by inheriting TensorBoard in a wrapper like below. Obviously the way you fill in validation_data is specific to your problem. You can just replace your TensorBoard callback with TensorBoardWrapper, pass the batch_gen and nb_steps arguments, and then all of the same arguments as TensorBoard. Unfortunately this also means that if you are using a generator for validation, it will get called once in the wrapper and then again for validation. If you can afford to keep your data in memory, the below solution could be moved into the on_train_begin hook.
import numpy as np
from keras.callbacks import TensorBoard


class TensorBoardWrapper(TensorBoard):
    '''Sets the self.validation_data property for use with TensorBoard callback.'''

    def __init__(self, batch_gen, nb_steps, **kwargs):
        super().__init__(**kwargs)
        self.batch_gen = batch_gen  # The generator.
        self.nb_steps = nb_steps    # Number of times to call next() on the generator.

    def on_epoch_end(self, epoch, logs):
        # Fill in the `validation_data` property. Obviously this is specific to how your generator works.
        # Below is an example that yields images and classification tags.
        # After it's filled in, the regular on_epoch_end method has access to the validation_data.
        imgs, tags = None, None
        for s in range(self.nb_steps):
            ib, tb = next(self.batch_gen)
            if imgs is None and tags is None:
                imgs = np.zeros((self.nb_steps * ib.shape[0], *ib.shape[1:]), dtype=np.float32)
                tags = np.zeros((self.nb_steps * tb.shape[0], *tb.shape[1:]), dtype=np.uint8)
            imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
            tags[s * tb.shape[0]:(s + 1) * tb.shape[0]] = tb
        self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0]
        return super().on_epoch_end(epoch, logs)

...
callbacks = [TensorBoardWrapper(gen_val, nb_steps=5, log_dir=self.cfg['cpdir'], histogram_freq=1,
                                batch_size=32, write_graph=False, write_grads=True)]
...
Thanks a lot! Unfortunately I ran into the next bug: issue 6364. But at least I can now fill the validation data.
Will post an update once I update to the newest keras version.
Thanks a lot for the code snippet!
Here's the Python2 copy & paste version for lazy people like me:
class TensorBoardWrapper(ks.callbacks.TensorBoard):
    '''Sets the self.validation_data property for use with TensorBoard callback.'''

    def __init__(self, batch_gen, nb_steps, **kwargs):
        super(TensorBoardWrapper, self).__init__(**kwargs)
        self.batch_gen = batch_gen  # The generator.
        self.nb_steps = nb_steps    # Number of times to call next() on the generator.

    def on_epoch_end(self, epoch, logs):
        # Fill in the `validation_data` property. Obviously this is specific to how your generator works.
        # Below is an example that yields images and classification tags.
        # After it's filled in, the regular on_epoch_end method has access to the validation_data.
        imgs, tags = None, None
        for s in range(self.nb_steps):
            ib, tb = next(self.batch_gen)
            if imgs is None and tags is None:
                imgs = np.zeros(((self.nb_steps * ib.shape[0],) + ib.shape[1:]), dtype=np.float32)
                tags = np.zeros(((self.nb_steps * tb.shape[0],) + tb.shape[1:]), dtype=np.uint8)
            imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
            tags[s * tb.shape[0]:(s + 1) * tb.shape[0]] = tb
        self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0]
        return super(TensorBoardWrapper, self).on_epoch_end(epoch, logs)
If I understand the solution @alexklibisz suggested correctly, we still have to load the entire validation set into memory.
Is there a way to view the histograms for the entire validation set without loading it all into memory at once?
Same issue. I train over multiple GPUs and putting my validation data into memory isn't possible with today's computers. Following thread.
@GalAvineri I think that's the fundamental issue -- Tensorboard (or at least keras' usage of tensorboard) requires having all the data at once.
For what it's worth, in my experience if the model is performing poorly enough to require debugging via Tensorboard, the problems will still exist for a subset of the original validation set. Similarly for @isaacgerg, perhaps you can still use a small dataset on a single GPU to replicate the behavior which you're trying to debug/observe via Tensorboard.
@alexklibisz Makes sense. However, there is still another issue. It looks like the weights are only shown for some of my convolutional layers. Any idea why?
@isaacgerg I'm not sure why that would be. In my experience it's helpful to explicitly name them because it's very easy to get lost in the many kernels, biases, gradients, activations, outputs, etc..
@alexklibisz I usually develop with tensorflow but had a small toy problem and thought I would give keras a try again, but alas, the TensorBoard functionality doesn't seem to work right at the moment. Would providing any screenshots or code help you to help me? (I don't want to make any extra work for anyone.)
I used alexklibisz's code above and it got some TensorBoard functionality working with validation generators (i.e. TensorBoard scalars), but TensorBoard Images were still blank.
Does anyone know a way to get Images working with validation generators?
@paragon00 This issue is similar to mine, except I get a subset of the images from the conv2d layers. Oddly enough, layers with stride=2 are the ones it exclusively displays.
@alexklibisz I adapted your code to my fit_generator:
callbacks = [
    #EarlyStopping(monitor='val_loss', patience=10, verbose=0),
    ModelCheckpoint('./models/XVIII-2-200-reseg5_weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', save_best_only=True, verbose=2),
    CSVLogger('./log/XVIII-log.csv',append=False, separator=','),
    TensorBoardWrapper(validation_generator, nb_steps=nb_validation_samples // batch_size, log_dir='./tf-log',
                       histogram_freq=1,
                       batch_size=int(batch_size), write_graph=False, write_grads=True)]
#train
model.fit_generator(train_generator,
                    steps_per_epoch=nb_train_samples // batch_size,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=nb_validation_samples // batch_size,verbose=2
                    ,callbacks=callbacks
                    )
but after epoch 1, there's an error:
Traceback (most recent call last):
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-d09cfa5a9da2>", line 6, in <module>
    ,callbacks=callbacks
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/training.py", line 1426, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/training_generator.py", line 229, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/callbacks.py", line 77, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "<ipython-input-7-c03528813be3>", line 20, in on_epoch_end
    imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
ValueError: could not broadcast input array from shape (64,250,250,3) into shape (47,250,250,3)
I guess there's a mismatch between batch sizes. How can I fix this?
hi @NTNguyen13, what are the values for the following variables?
nb_train_samples
nb_validation_samples
batch_size
I suspect setting validation_steps=math.ceil(nb_validation_samples / batch_size) will help
hi @ShiangYong
nb_train_samples = 9004
nb_validation_samples = 2245
batch_size = 64
I tried to set nb_validation_samples = 2240 (equal to 64*35), but I still got the same error. I will try your method and report back later! Thank you
I was wondering why a Sequence dataset is considered a generator. In keras/engine/training_generator.py, self.validation_data is basically not assigned to each callback if val_gen is True, and val_gen is True if the validation dataset is an instance of the Sequence class! But this doesn't make much sense, since a Sequence dataset still offers a random access interface: you can always call the __getitem__ function on any index into the dataset.
This makes it difficult to write callbacks that are intended to use self.validation_data, as the TensorBoard callback currently does.
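The random access point can be illustrated with a small sketch. ToySequence below is a made-up stand-in that only mimics the __len__/__getitem__ interface of a Sequence (it does not import Keras), and first_batches is a hypothetical helper; a real Sequence subclass may behave differently.

```python
import numpy as np


class ToySequence:
    """Minimal stand-in exposing the __len__/__getitem__ interface of a Sequence."""

    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        rows = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[rows], self.y[rows]


def first_batches(seq, nb_batches):
    """Collect the first nb_batches batches by index, without calling next()."""
    picked = [seq[i] for i in range(min(nb_batches, len(seq)))]
    xs, ys = zip(*picked)
    return np.concatenate(xs), np.concatenate(ys)


seq = ToySequence(np.arange(10).reshape(10, 1), np.arange(10), batch_size=4)
x_sub, y_sub = first_batches(seq, nb_batches=2)
```

Because the batches are fetched by index, reading them for a callback does not advance any iterator that the validation loop itself might be using.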
Using
class TensorBoardWrapper(TensorBoard):
    '''Sets the self.validation_data property for use with TensorBoard callback.'''

    def __init__(self, batch_gen, nb_steps, b_size, **kwargs):
        super(TensorBoardWrapper, self).__init__(**kwargs)
        self.batch_gen = batch_gen  # The generator.
        self.nb_steps = nb_steps    # Number of times to call next() on the generator.
        #self.batch_size = b_size

    def on_epoch_end(self, epoch, logs):
        # Fill in the `validation_data` property. Obviously this is specific to how your generator works.
        # Below is an example that yields images and classification tags.
        # After it's filled in, the regular on_epoch_end method has access to the validation_data.
        imgs, tags = None, None
        for s in range(self.nb_steps):
            ib, tb = next(self.batch_gen)
            if imgs is None and tags is None:
                imgs = np.zeros(((self.nb_steps * self.batch_size,) + ib.shape[1:]), dtype=np.float32)
                tags = np.zeros(((self.nb_steps * self.batch_size,) + tb.shape[1:]), dtype=np.uint8)
            imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
            tags[s * tb.shape[0]:(s + 1) * tb.shape[0]] = tb
        self.validation_data = [imgs, tags, np.ones(imgs.shape[0])]
        return super(TensorBoardWrapper, self).on_epoch_end(epoch, logs)
and
TensorBoardWrapper(self.val_generator2, ceil(self.val_dataset_size / self.batch_size), self.batch_size, log_dir="{}/{}".format(self.logs_dir, time()), histogram_freq=1, batch_size=self.batch_size)
worked for me. Notice that the initialization of imgs and tags uses batch_size (imgs = np.zeros(((self.nb_steps * self.batch_size,) + ib.shape[1:]), dtype=np.float32)) instead of the first element of the batch's shape (imgs = np.zeros(((self.nb_steps * ib.shape[0],) + ib.shape[1:]), dtype=np.float32)).
This is because if total_batches % batch_size != 0, the first call to next(self.batch_gen) can return a batch whose shape's first element is not equal to the batch size, resulting in the same broadcast shape error @NTNguyen13 reported.
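The numbers @NTNguyen13 posted (2245 validation samples, batch size 64) are consistent with this explanation. The sketch below is only one reading of that traceback: it assumes the first batch the wrapper drew was the 5-sample remainder batch, which the thread does not confirm.

```python
import numpy as np

nb_validation_samples = 2245
batch_size = 64

nb_steps = nb_validation_samples // batch_size   # 35 steps
remainder = nb_validation_samples % batch_size   # 5 samples in the partial batch

# If the first batch seen is the partial one, the original allocation
# nb_steps * ib.shape[0] yields only 35 * 5 = 175 rows.
allocated_rows = nb_steps * remainder
buffer = np.zeros((allocated_rows, 1), dtype=np.float32)

# The third batch (s = 2) is full-sized, so the code targets rows 128..191,
# but the buffer ends at row 175 and the slice is silently truncated.
s = 2
target_rows = buffer[s * batch_size:(s + 1) * batch_size].shape[0]
```

Here target_rows comes out as 47, matching the (47,250,250,3) destination shape in the reported ValueError.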
Then I got an AssertionError at line 884 of Keras' callbacks.py. According to that line, validation_data must have at most three elements, just like the tensors array.
I fixed that by changing self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0] to self.validation_data = [imgs, tags, np.ones(imgs.shape[0])]. I kept the third element as np.ones(imgs.shape[0]) because my generator only outputs images and labels.
Remember to use a generator that supports multi-threading or use two instances of the same generator to avoid getting a ValueError: generator already executing. I used two instances for a quick fix.
I'm using keras 2.1.4 and tensorflow-gpu 1.4.1 on one NVIDIA Titan Xp with CUDA 8 and I haven't run into any memory issues.
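For the "generator already executing" error mentioned above, one common alternative to keeping two generator instances is to guard a single generator with a lock. The sketch below is a generic Python 3 pattern, not a Keras API; the class name ThreadSafeIterator and the toy generator count_up are made up for this example.

```python
import threading


class ThreadSafeIterator:
    """Wrap an iterator so that only one thread at a time can advance it."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self._lock:
            return next(self._it)


def count_up(n):
    """Toy generator standing in for a batch generator."""
    for i in range(n):
        yield i


shared = ThreadSafeIterator(count_up(1000))
seen = []  # list.append is safe to call from several threads in CPython


def worker():
    for item in shared:
        seen.append(item)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that with a shared generator the wrapper and the validation loop each consume different batches, so two separate instances may still be the simpler choice when both need to see the same data.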
Closing as this is resolved
@juiceboxjoe thanks for sharing the fixed code!
Excuse me if my questions are not that well written, I am still adapting to python, TensorFlow and keras :)
I have two questions:
on_epoch_end function at the step of ib, tb = next(self.batch_gen) because I got an error saying that my DataGenerator is not an iterator. Instead I used for ib, tb in self.batch_gen: directly. Then you also don't need to take into account how often you call the DataGenerator. I am using the TensorBoardWrapper like this:
tbCallBack = TensorBoardWrapper(validation_generator, val_df.shape[0] // validation_generator.batch_size, validation_generator.batch_size, log_dir='./logs/')
val_df is a pandas data frame containing the ids of the validation set, thus val_df.shape[0] gives me the size of the validation set.
Am I using it wrongly?
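The "not an iterator" error is what Python raises when next() is called on an object that supports indexing but has no __next__ method. The sketch below assumes the DataGenerator in question is index-based like that (the thread does not show it); ToyBatches is a made-up stand-in.

```python
class ToyBatches:
    """Index-based batch container with __len__ and __getitem__ but no __next__."""

    def __init__(self, data, batch_size):
        self.data, self.batch_size = data, batch_size

    def __len__(self):
        return -(-len(self.data) // self.batch_size)  # ceiling division

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = idx * self.batch_size
        return self.data[start:start + self.batch_size]


batches = ToyBatches(list(range(5)), batch_size=2)

try:
    next(batches)          # fails: the object itself is not an iterator
    next_works = True
except TypeError:
    next_works = False

all_batches = [b for b in batches]   # a for loop falls back to __getitem__
first = next(iter(batches))          # iter() gives an object that next() accepts
```

A for loop over such an object walks the whole dataset once per call, so the nb_steps argument is effectively ignored and the preallocated arrays need room for every batch.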
AssertionError you mentioned. Can someone explain to me why that happens? Because I would like to use Batch Normalisation for the benefits of faster convergence.
I can confirm what @MaximilianProll says about the AssertionError in the case that you have a BatchNormalization layer in your model. It would be nice to know a fix.
Using the TensorBoardWrapper classes provided here gives me:
Traceback (most recent call last):
  File "/home/guy/workspace/shapedo.com/shapedo/external/ml/cnn_sliding_window/train.py", line 156, in <module>
    shuffle=True,
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 94, in fit_generator
    callbacks.set_model(callback_model)
  File "/usr/local/lib/python3.6/dist-packages/keras/callbacks.py", line 54, in set_model
    callback.set_model(model)
  File "/usr/local/lib/python3.6/dist-packages/keras/callbacks.py", line 799, in set_model
    weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/optimizers.py", line 91, in get_gradients
    raise ValueError('An operation has `None` for gradient. '
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My current optimizer looks like this:
opt = keras.optimizers.rmsprop()
Same with adam() too.
A Keras utils.Sequence should be allowed. But when using Keras Sequence, Tensorboard still replies back with:
ValueError: If printing histograms, validation_data must be provided, and cannot be a generator.
Keras sequences can support slices, or at least the overloaded __getitem__ should handle a slice type.
The TensorBoardWrapper class is a great help when my net contains no BatchNormalization layer. But if I add BatchNormalization, I get the same error as guysoft:
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Any suggestions would be appreciated.
@CinderellaRobaker see https://github.com/keras-team/keras/issues/10881#issuecomment-436203980
I tried @juiceboxjoe's code, but it leads to the following exception:
...
  File "/usr/local/lib/python3.6/dist-packages/keras/callbacks.py", line 941, in on_epoch_end
    result = self.sess.run([self.merged], feed_dict=feed_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1128, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32,) for Tensor 'dense_2_target:0', which has shape '(?, ?)'
I am using the default ImageDataGenerator shipped with keras.
Any ideas what could be the problem here?
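One possible cause, offered as a guess rather than a diagnosis: some ImageDataGenerator class modes (for example 'binary') yield 1-D label arrays, while the target placeholder here is 2-D, and writing self.validation_data by hand skips the reshaping that fit normally applies. Under that assumption, the labels could be given a trailing axis before they are stored. The helper name as_2d_targets is hypothetical.

```python
import numpy as np


def as_2d_targets(targets):
    """Return targets with shape (n, 1) if they were 1-D; leave others unchanged."""
    targets = np.asarray(targets)
    if targets.ndim == 1:
        return targets.reshape(-1, 1)
    return targets


labels = np.zeros(32, dtype=np.float32)   # shape (32,), as in the error message
fixed = as_2d_targets(labels)             # shape (32, 1)
one_hot = as_2d_targets(np.zeros((32, 10), dtype=np.float32))
```

In the wrapper this would be applied to tags just before assigning self.validation_data.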
I managed to get @juiceboxjoe's version running, but I had to include some of the sections he removed (self.batch_size = b_size, self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0])
I'm using keras 2.2.4 based on tensorflow-gpu version 1.13.1
My full Wrapper:
class TensorBoardWrapper(keras.callbacks.TensorBoard):
    '''Sets the self.validation_data property for use with TensorBoard callback.'''

    def __init__(self, batch_gen, nb_steps, b_size, **kwargs):
        super(TensorBoardWrapper, self).__init__(**kwargs)
        self.batch_gen = batch_gen  # The generator.
        self.nb_steps = nb_steps    # Number of times to call next() on the generator.
        self.batch_size = b_size

    def on_epoch_end(self, epoch, logs):
        # Fill in the `validation_data` property. Obviously this is specific to how your generator works.
        # Below is an example that yields images and classification tags.
        # After it's filled in, the regular on_epoch_end method has access to the validation_data.
        imgs, tags = None, None
        for s in range(self.nb_steps):
            ib, tb = next(self.batch_gen)
            if imgs is None and tags is None:
                imgs = np.zeros(((self.nb_steps * self.batch_size,) + ib.shape[1:]), dtype=np.float32)
                tags = np.zeros(((self.nb_steps * self.batch_size,) + tb.shape[1:]), dtype=np.float32)
            imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
            tags[s * tb.shape[0]:(s + 1) * tb.shape[0]] = tb
        self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0]
        return super(TensorBoardWrapper, self).on_epoch_end(epoch, logs)

Called like this:
tBCallback = TensorBoardWrapper(test_it, math.ceil(image_count_test/config.batch_size), config.batch_size,
                                log_dir=model_dir_path, histogram_freq=5, write_graph=True, write_images=True, write_grads=True)
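On the disagreement in this thread about whether validation_data needs the trailing 0.0: as far as I understand the Keras 2.x TensorBoard callback, it expects one entry per model input, target and sample-weight tensor, plus one learning-phase value when the model uses a learning phase (for example with Dropout or BatchNormalization). Treat that as an assumption to check against your own Keras version. The helper below is hypothetical and only shows how the list could be built either way.

```python
import numpy as np


def build_validation_data(inputs, targets, uses_learning_phase):
    """Assemble [inputs, targets, sample_weights] plus an optional learning phase."""
    data = [inputs, targets, np.ones(inputs.shape[0], dtype=np.float32)]
    if uses_learning_phase:
        data.append(0.0)  # 0.0 stands for test mode (assumption, see lead-in)
    return data


imgs = np.zeros((8, 4, 4, 3), dtype=np.float32)
tags = np.zeros((8, 1), dtype=np.float32)

without_phase = build_validation_data(imgs, tags, uses_learning_phase=False)
with_phase = build_validation_data(imgs, tags, uses_learning_phase=True)
```

That would explain why the three-element list worked for one commenter and the four-element list for another: the models likely differed in whether they used a learning phase.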
ValueError: could not broadcast input array from shape (64,224,224,3) into shape (0,224,224,3)
Hi, can anyone help me with this error?
Note: I am using @rabenimmermehr's code above, with flow_from_dataframe as my generator.
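A destination shape of (0, ...) suggests the slice starts at or past the end of the preallocated array, which can happen if the generator's batches are larger than the b_size passed to the wrapper. That is a guess from the shapes alone. The sketch below shows the slicing behaviour and a copy loop with a running offset that stops when the buffer is full; the function name fill is made up for this example.

```python
import numpy as np

buffer = np.zeros((4, 2), dtype=np.float32)
# Slicing past the end of an array does not raise; it returns an empty view.
empty_rows = buffer[4:8].shape[0]


def fill(buffer, batches):
    """Copy batches into buffer using a running offset; stop when it is full."""
    offset = 0
    for batch in batches:
        room = buffer.shape[0] - offset
        if room <= 0:
            break
        n = min(room, batch.shape[0])
        buffer[offset:offset + n] = batch[:n]
        offset += n
    return offset


written = fill(buffer, [np.ones((3, 2)), np.ones((3, 2))])
```

Tracking the offset explicitly avoids relying on every batch having the same size, which the s * ib.shape[0] indexing in the wrappers above implicitly assumes.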
Hello,
I have the same problem.
I'm using keras 2.2.5 in google colab
Use the fit method instead of fit_generator and keep histogram_freq as any integer but not zero.
@srinivascnu166
My dataset is very large, so I cannot load all the data at the same time and have to use fit_generator.
Is there any way to fix it with fit_generator?
Thanks
A generator can also be used with the fit method.
In your code, just replace fit_generator with fit.
@srinivascnu166
After replacing fit_generator with fit, I'm getting this error:
TypeError: Unrecognized keyword arguments: {'generator': <generator object BatchGenerator.next_batch at 0x7f8fae260a40>}
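The keyword in that error suggests the call was changed to fit(generator=...). To my knowledge fit has no generator keyword: in recent tensorflow.keras versions the generator is passed as the first argument, x. Assuming such a version, the renaming could look like the sketch below; to_fit_kwargs is a hypothetical helper, not a Keras function.

```python
def to_fit_kwargs(fit_generator_kwargs):
    """Return a copy of the kwargs with 'generator' renamed to 'x'."""
    kwargs = dict(fit_generator_kwargs)
    if "generator" in kwargs:
        kwargs["x"] = kwargs.pop("generator")
    return kwargs


# Toy arguments standing in for a real fit_generator call.
old_kwargs = {"generator": iter([]), "steps_per_epoch": 10, "epochs": 2}
new_kwargs = to_fit_kwargs(old_kwargs)
```

The call would then become model.fit(**new_kwargs), or equivalently model.fit(my_generator, steps_per_epoch=..., epochs=...). Whether fit accepts a generator at all depends on the installed version.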
Are you using tensorflow.keras or just tensorflow for importing libraries?
I'm not using tensorflow.keras
Then use that, because keras is now officially integrated with tensorflow in newer versions.
Did you solve your problem? I have the same question.