Hey everyone,
At the moment, the model fitting process reports the accuracy/loss metrics once per epoch, as shown below:
Train on 10000 samples, validate on 2000 samples
Epoch 1/2
10000/10000 - 0s - loss: -10.1022 - acc: 0.0515 - val_loss: -49.2346 - val_acc: 0.0464
Epoch 2/2
10000/10000 - 0s - loss: -67.9034 - acc: 0.0470 - val_loss: -111.7369 - val_acc: 0.0464
Is there a way to get more granular control over this progress bar? For example, can we display the accuracy every N batches (let's assume N = 100)? For example:
Train on 10000 samples, validate on 2000 samples
Epoch 1/2
100/10000 - 0s - loss: -10.1022 - acc: 0.0515 - val_loss: -49.2346 - val_acc: 0.0464
200/10000 - 0s - loss: -12.9034 - acc: 0.0490 - val_loss: -80.7369 - val_acc: 0.0403
...
...
10000/10000 - 0s - loss: -67.9034 - acc: 0.0470 - val_loss: -111.7369 - val_acc: 0.0464 # This final entry can be handled by on_epoch_end
Epoch 2/2
...
...
I have tried looking at the source code in the Callbacks module to come up with something that achieves this, but I am just starting out in Python and would appreciate a bit of hand-holding/hinting. Thanks in advance!
Guru
from keras.callbacks import Callback

class NBatchLogger(Callback):
    def __init__(self, display):
        self.seen = 0
        self.display = display

    def on_batch_end(self, batch, logs={}):
        self.seen += logs.get('size', 0)
        if self.seen % self.display == 0:
            # you can access loss, accuracy in self.params['metrics']
            print('\n{}/{} - loss ....\n'.format(self.seen, self.params['nb_sample']))
Thanks a lot @joelthchao! I revised your code slightly to create the NBatchLogger as follows:
class NBatchLogger(Callback):
    def __init__(self, display=100):
        '''
        display: Number of batches to wait before outputting loss
        '''
        self.seen = 0
        self.display = display

    def on_batch_end(self, batch, logs={}):
        # `seen` accumulates the number of samples (not batches) processed
        self.seen += logs.get('size', 0)
        if self.seen % self.display == 0:
            # logs holds the metric values for the batch that just finished
            print('\n{0}/{1} - Batch Loss: {2}'.format(self.seen,
                                                       self.params['nb_sample'],
                                                       logs.get('loss')))
I am getting interesting behavior that doesn't quite correspond to what I want but is still acceptable. I am trying to fit a model with the following criteria:
Batch Size = 128
Number of Training Samples = 1872407
The code snippet used to fit the model is:
# Output batch loss every 1000 batches
out_batch = NBatchLogger(display=1000)
model.fit([X_train_aux,X_train_main],Y_train,batch_size=128,callbacks=[out_batch])
Running the model doesn't give me a loss report every 1000 batches (or every 128 * 1000 = 128,000 training records) but a progress bar which, for some reason, shows snapshots every 16,000 training samples:
Train on 1872407 samples, validate on 468103 samples
Epoch 1/10
15872/1872407 [..............................] - ETA: 1893s - loss: 55.0340 - acc: 0.0000e+00
31872/1872407 [..............................] - ETA: 1875s - loss: 49.5706 - acc: 0.0000e+00
47872/1872407 [..............................] - ETA: 1858s - loss: 45.7401 - acc: 0.0000e+00
...
1855872/1872407 [============================>.] - ETA: 16s - loss: 7.5319 - acc: 1.0777e-06
1871872/1872407 [============================>.] - ETA: 0s - loss: 7.5010 - acc: 1.0684e-06
1872384/1872407 [============================>.] - ETA: 0s - loss: 7.5000 - acc: 1.0682e-06
What could explain this behavior?
@guruprad You can try muting the progress bar by setting verbose to 0. The progress bar sometimes overwrites other callbacks' print messages.
Is this going to be in future versions?
The previous demo no longer works because 'nb_sample' isn't in self.params. I post a verified demo below; it may help anyone who runs into the same question.
class NBatchLogger(Callback):
    """
    A Logger that logs average performance per `display` steps.
    """
    def __init__(self, display):
        self.step = 0
        self.display = display
        self.metric_cache = {}

    def on_batch_end(self, batch, logs={}):
        self.step += 1
        # Accumulate each metric so we can report the average over `display` steps
        for k in self.params['metrics']:
            if k in logs:
                self.metric_cache[k] = self.metric_cache.get(k, 0) + logs[k]
        if self.step % self.display == 0:
            metrics_log = ''
            for (k, v) in self.metric_cache.items():
                val = v / self.display
                if abs(val) > 1e-3:
                    metrics_log += ' - %s: %.4f' % (k, val)
                else:
                    metrics_log += ' - %s: %.4e' % (k, val)
            print('step: {}/{} ... {}'.format(self.step,
                                              self.params['steps'],
                                              metrics_log))
            self.metric_cache.clear()
What can I do about NameError: name 'Callback' is not defined?
You need to import the base class (note the capital C): from tensorflow.keras.callbacks import Callback
@googlesu you can update class NBatchLogger(callback) to class NBatchLogger(tensorflow.keras.callbacks.Callback)
Thanks @wenmin-wu!
I learned from your gist and found an alternative way that also works. We can subclass ProgbarLogger, which is hard-coded into model.fit in TensorFlow. When subclassing it, we can override on_batch_end so that we do not update the progress bar every batch.
Besides, the NBatchLogger from wenmin-wu prints the metrics averaged per display steps, while NBatchProgBarLogger averages per epoch, the same as the default behaviour of model.fit.
To use this solution:
model.fit([X_train_aux,X_train_main],Y_train,batch_size=BS,verbose=0,callbacks=[NBatchProgBarLogger()])
Please note the verbose=0: it disables the hard-coded progress bar, so our subclassed NBatchProgBarLogger logs the losses and metrics instead.
The code follows:
import tensorflow

class NBatchProgBarLogger(tensorflow.keras.callbacks.ProgbarLogger):
    def __init__(self, count_mode='samples', stateful_metrics=None,
                 display_per_batches=1000, verbose=1):
        super(NBatchProgBarLogger, self).__init__(count_mode, stateful_metrics)
        self.display_per_batches = display_per_batches
        self.display_step = 1
        self.verbose = verbose

    def on_train_begin(self, logs=None):
        self.epochs = self.params['epochs']

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        batch_size = logs.get('size', 0)
        # In case of distribution strategy we can potentially run multiple steps
        # at the same time, we should account for that in the `seen` calculation.
        num_steps = logs.get('num_steps', 1)
        if self.use_steps:
            self.seen += num_steps
        else:
            self.seen += batch_size * num_steps

        for k in self.params['metrics']:
            if k in logs:
                self.log_values.append((k, logs[k]))

        self.display_step += 1
        # Skip the progbar update for the last batch;
        # it will be handled by on_epoch_end.
        if (self.verbose and self.seen < self.target
                and self.display_step % self.display_per_batches == 0):
            self.progbar.update(self.seen, self.log_values)
@pennz This implementation has some issues (but it might be a step in the right direction). I don't see a progress bar that actually progresses. Furthermore, I only see the metrics at the end of the epoch, and instead of e.g. 100/100 (where 100 is the number of steps in one epoch), I see something like 8000/100. Moreover, the progress bar is very long.
Which TF and Keras version are you using?
I've found that this implementation will not perform validation. Does anybody know how to modify it so that validation is also performed every N batches?
@nbro You can check https://www.kaggle.com/mmmarchetti/flowers-on-tpu-ii#Models. It meets the OP's needs, and it uses TensorFlow 2.1.0.
I am using fit_generator() with the verified NBatchLogger demo above and got KeyError: 'metrics'. The printout of self.params is {'verbose': 0, 'epochs': 2500, 'steps': 22}, with 'metrics' nowhere in sight. I am using Keras on TensorFlow 1.12. What could be the problem?
Just curious: what would be the difference compared to simply using fewer iterations (e.g. a factor of 10 fewer) per epoch? In this case, the network sees the complete training dataset every 10 "epochs", so the validation accuracy and loss are calculated 10 times more often. Are there any disadvantages to this method, assuming the training generator simply continues where it left off the previous epoch and reshuffles once all the data has been seen?
For everyone with this problem, there's a really simple solution that can actually work for you:
from tensorflow.keras.callbacks import Callback

class print_on_end(Callback):
    def on_batch_end(self, batch, logs={}):
        # print a newline so each progress-bar update is kept on its own line
        print()
You want to call it like this:
model.fit(training_dataset, steps_per_epoch=num_training_samples, epochs=EPOCHS, validation_data=validation_dataset, callbacks=[print_on_end()])
The output looks like this:
1/18 [>.............................] - ETA: 0s - loss: 7.2655 - mean_squared_error: 7.2655
2/18 [==>...........................] - ETA: 29s - loss: 8.3142 - mean_squared_error: 8.3142
3/18 [====>.........................] - ETA: 36s - loss: 6.9459 - mean_squared_error: 6.9459
4/18 [=====>........................] - ETA: 38s - loss: 9.0257 - mean_squared_error: 9.0257
5/18 [=======>......................] - ETA: 38s - loss: 8.7444 - mean_squared_error: 8.7444
6/18 [=========>....................] - ETA: 36s - loss: 8.1615 - mean_squared_error: 8.1615
7/18 [==========>...................] - ETA: 34s - loss: 7.5837 - mean_squared_error: 7.5837
8/18 [============>.................] - ETA: 32s - loss: 7.2963 - mean_squared_error: 7.2963
9/18 [==============>...............] - ETA: 29s - loss: 7.3671 - mean_squared_error: 7.3671
The solution is described here:
https://stackoverflow.com/questions/52205315/plot-loss-evolution-during-a-single-epoch-in-keras
You can also capture the history and plot the loss by doing:
history = model.fit(trainin...