Keras: Show Loss Every N Batches

Created on 30 May 2016 · 15 comments · Source: keras-team/keras

Hey Everyone,

At the moment, the model fitting process reports the accuracy/loss metrics once per epoch, as shown below:

Train on 10000 samples, validate on 2000 samples
Epoch 1/2
10000/10000 - 0s - loss: -10.1022 - acc: 0.0515 - val_loss: -49.2346 - val_acc: 0.0464
Epoch 2/2
10000/10000 - 0s - loss: -67.9034 - acc: 0.0470 - val_loss: -111.7369 - val_acc: 0.0464

Is there a way to get more granular control over this progress bar? For example, can we display the accuracy every N batches (let's assume N=100), like this:

Train on 10000 samples, validate on 2000 samples
Epoch 1/2
100/10000 - 0s - loss: -10.1022 - acc: 0.0515 - val_loss: -49.2346 - val_acc: 0.0464
200/10000 -  0s - loss: -12.9034 - acc: 0.0490 - val_loss: -80.7369 - val_acc: 0.0403
...
...
10000/10000 - 0s - loss: -67.9034 - acc: 0.0470 - val_loss: -111.7369 - val_acc: 0.0464  # this final entry can be handled by on_epoch_end

Epoch 2/2
...
...

I have tried taking a look at the source code in the Callbacks module to come up with something that achieves this, but I am just starting out in Python and would appreciate a bit of hand-holding/hinting. Thanks in advance!
Guru

Most helpful comment

The previous demo cannot work because 'nb_sample' isn't in self.params. I post a verified demo below; it may help anyone who runs into the same question.

from keras.callbacks import Callback

class NBatchLogger(Callback):
    """
    A logger that reports the average of each metric every `display` steps.
    """
    def __init__(self, display):
        self.step = 0
        self.display = display
        self.metric_cache = {}

    def on_batch_end(self, batch, logs={}):
        self.step += 1
        # Accumulate each metric so we can report a running average.
        for k in self.params['metrics']:
            if k in logs:
                self.metric_cache[k] = self.metric_cache.get(k, 0) + logs[k]
        if self.step % self.display == 0:
            metrics_log = ''
            for (k, v) in self.metric_cache.items():
                val = v / self.display
                if abs(val) > 1e-3:
                    metrics_log += ' - %s: %.4f' % (k, val)
                else:
                    metrics_log += ' - %s: %.4e' % (k, val)
            print('step: {}/{} ... {}'.format(self.step,
                                              self.params['steps'],
                                              metrics_log))
            self.metric_cache.clear()

All 15 comments

from keras.callbacks import Callback

class NBatchLogger(Callback):
    def __init__(self, display):
        self.seen = 0
        self.display = display

    def on_batch_end(self, batch, logs={}):
        self.seen += logs.get('size', 0)
        if self.seen % self.display == 0:
            # you can access loss, accuracy in self.params['metrics']
            print('\n{}/{} - loss ....\n'.format(self.seen, self.params['nb_sample']))


Thanks a lot @joelthchao! I revised your code slightly to create the NBatchLogger as follows:

class NBatchLogger(Callback):
    def __init__(self, display=100):
        '''
        display: number of batches to wait before outputting loss
        '''
        self.seen = 0
        self.display = display

    def on_batch_end(self, batch, logs={}):
        self.seen += logs.get('size', 0)
        if self.seen % self.display == 0:
            print('\n{0}/{1} - Batch Loss: {2}'.format(self.seen, self.params['nb_sample'],
                                                       self.params['metrics'][0]))

I am getting interesting behavior that doesn't quite correspond to what I want but is still acceptable. I am trying to fit a model with the following criteria:
Batch size = 128
Number of training samples = 1,872,407
The snippet to fit the model is:

# Output batch loss every 1000 batches
out_batch = NBatchLogger(display=1000)
model.fit([X_train_aux, X_train_main], Y_train, batch_size=128, callbacks=[out_batch])

Running the model doesn't give me a loss report every 1000 batches (i.e. every 128 * 1000 = 128,000 training records) but rather a progress bar which, for some reason, shows snapshots every 16,000 training samples:

Train on 1872407 samples, validate on 468103 samples
Epoch 1/10
  15872/1872407 [..............................] - ETA: 1893s - loss: 55.0340 - acc: 0.0000e+00
  31872/1872407 [..............................] - ETA: 1875s - loss: 49.5706 - acc: 0.0000e+00
  47872/1872407 [..............................] - ETA: 1858s - loss: 45.7401 - acc: 0.0000e+00
...
1855872/1872407 [============================>.] - ETA: 16s - loss: 7.5319 - acc: 1.0777e-06
1871872/1872407 [============================>.] - ETA: 0s - loss: 7.5010 - acc: 1.0684e-06
1872384/1872407 [============================>.] - ETA: 0s - loss: 7.5000 - acc: 1.0682e-06

What could explain this behavior?
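One plausible explanation: self.seen grows by the batch size (128) on every batch, so the test self.seen % self.display == 0 with display=1000 only fires when seen is a common multiple of 128 and 1000, i.e. every lcm(128, 1000) = 16000 samples. A quick check:

from math import gcd

batch_size, display = 128, 1000
# `seen` only ever takes values that are multiples of batch_size, so the
# modulo test fires at common multiples of batch_size and display.
print(batch_size * display // gcd(batch_size, display))  # 16000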

@guruprad You can try muting the progress bar by setting verbose to 0. The progress bar sometimes overwrites other callbacks' print messages.
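For example, a sketch combining that suggestion with the earlier fit call (reusing the names from the snippet above):

# Mute the built-in progress bar so the callback's output is readable.
out_batch = NBatchLogger(display=1000)
model.fit([X_train_aux, X_train_main], Y_train, batch_size=128,
          verbose=0, callbacks=[out_batch])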

Is this going to be in future versions?

The previous demo cannot work because 'nb_sample' isn't in self.params. I post a verified demo below; it may help anyone who runs into the same question.

from keras.callbacks import Callback

class NBatchLogger(Callback):
    """
    A logger that reports the average of each metric every `display` steps.
    """
    def __init__(self, display):
        self.step = 0
        self.display = display
        self.metric_cache = {}

    def on_batch_end(self, batch, logs={}):
        self.step += 1
        # Accumulate each metric so we can report a running average.
        for k in self.params['metrics']:
            if k in logs:
                self.metric_cache[k] = self.metric_cache.get(k, 0) + logs[k]
        if self.step % self.display == 0:
            metrics_log = ''
            for (k, v) in self.metric_cache.items():
                val = v / self.display
                if abs(val) > 1e-3:
                    metrics_log += ' - %s: %.4f' % (k, val)
                else:
                    metrics_log += ' - %s: %.4e' % (k, val)
            print('step: {}/{} ... {}'.format(self.step,
                                              self.params['steps'],
                                              metrics_log))
            self.metric_cache.clear()
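For reference, a minimal usage sketch, assuming a compiled model and hypothetical X_train/Y_train arrays:

logger = NBatchLogger(display=100)
# verbose=0 keeps the built-in progress bar from interleaving with the logger.
model.fit(X_train, Y_train, batch_size=128, verbose=0, callbacks=[logger])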

What should I do about this error?
NameError: name 'Callback' is not defined


from tensorflow.keras.callbacks import Callback

@googlesu you can replace class NBatchLogger(Callback) with class NBatchLogger(tensorflow.keras.callbacks.Callback)

Thanks @wenmin-wu
I learned from your gist and found an alternative way that also works. We can subclass ProgbarLogger, which is hard-coded into model.fit in TensorFlow. When subclassing it, we can override on_batch_end so that we do not update the progress bar on every batch.

Besides, the NBatchLogger from wenmin-wu prints the metrics averaged over `display` steps, while NBatchProgBarLogger averages over the epoch, the same as the default behaviour of model.fit.

To use this solution:
model.fit([X_train_aux, X_train_main], Y_train, batch_size=BS, verbose=0, callbacks=[NBatchProgBarLogger()])

Please pay attention to verbose=0: it disables the hard-coded progress bar so that our subclassed NBatchProgBarLogger logs the losses and metrics instead.

The code follows:

import tensorflow

class NBatchProgBarLogger(tensorflow.keras.callbacks.ProgbarLogger):
    def __init__(self, count_mode='samples', stateful_metrics=None,
                 display_per_batches=1000, verbose=1):
        super(NBatchProgBarLogger, self).__init__(count_mode, stateful_metrics)
        self.display_per_batches = display_per_batches
        self.display_step = 1
        self.verbose = verbose

    def on_train_begin(self, logs=None):
        self.epochs = self.params['epochs']

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        batch_size = logs.get('size', 0)
        # In case of a distribution strategy we can potentially run multiple
        # steps at the same time; account for that in the `seen` calculation.
        num_steps = logs.get('num_steps', 1)
        if self.use_steps:
            self.seen += num_steps
        else:
            self.seen += batch_size * num_steps

        for k in self.params['metrics']:
            if k in logs:
                self.log_values.append((k, logs[k]))

        self.display_step += 1
        # Skip the progbar update for the last batch;
        # it will be handled by on_epoch_end.
        if self.verbose and self.seen < self.target and self.display_step % self.display_per_batches == 0:
            self.progbar.update(self.seen, self.log_values)

Here is how it looks: [asciicast recording]

@pennz This implementation has some issues (but might be a step in the right direction). I don't see a progress bar that actually progresses. Furthermore, I only see the metrics at the end of the epoch, and instead of e.g. 100/100 (where 100 is the number of steps in one epoch) I see something like 8000/100. Moreover, I see a very long progress bar.

Which TF and Keras versions are you using?

I've found that this implementation will not perform validation. Does anybody know how to modify it so that validation is also performed every N batches?
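One possible approach (a sketch, not from this thread): call model.evaluate yourself inside the callback, assuming a hypothetical held-out pair (x_val, y_val):

from tensorflow.keras.callbacks import Callback

class NBatchValidator(Callback):
    """Hypothetical callback: evaluate on held-out data every `display` batches."""
    def __init__(self, x_val, y_val, display=1000):
        super(NBatchValidator, self).__init__()
        self.x_val = x_val
        self.y_val = y_val
        self.display = display
        self.batches = 0

    def on_batch_end(self, batch, logs=None):
        self.batches += 1
        if self.batches % self.display == 0:
            # self.model is attached by Keras before training starts.
            results = self.model.evaluate(self.x_val, self.y_val, verbose=0)
            print('\nbatch {} - validation results: {}'.format(self.batches, results))

Note that evaluating mid-epoch adds a full pass over the validation data every `display` batches, so this can slow training down considerably.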


@nbro you can check this: https://www.kaggle.com/mmmarchetti/flowers-on-tpu-ii#Models . It meets the OP's needs, and it uses TensorFlow version 2.1.0.

> The previous demo cannot work because 'nb_sample' isn't in self.params. I post a verified demo below [...]

I am using fit_generator() and got KeyError: 'metrics'. The printout of self.params is {'verbose': 0, 'epochs': 2500, 'steps': 22}, with 'metrics' nowhere in sight. I am using Keras on TensorFlow 1.12. What could be the problem?

Just curious: what would be the difference compared to simply using fewer iterations per epoch (e.g. a factor of 10)? In that case the network sees the complete training dataset every 10 "epochs", so the validation accuracy and loss are calculated 10 times more often. Are there any disadvantages to this method, assuming the training generator simply continues where it left off in the previous epoch and reshuffles once all the data has been seen?
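As an illustration of that idea (hypothetical numbers and generator names), shrinking each "epoch" by a factor of 10 while keeping the total work constant:

STEPS_PER_EPOCH = 220   # hypothetical: one full pass over the training data
EPOCHS = 25
# Each "epoch" now covers a tenth of the data, so metrics (including
# validation) are reported 10x more often for the same total work.
model.fit(training_generator, steps_per_epoch=STEPS_PER_EPOCH // 10,
          epochs=EPOCHS * 10, validation_data=validation_dataset)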

For everyone with this problem, there's a really simple solution that can actually work for you:

from tensorflow.keras.callbacks import Callback

class print_on_end(Callback):
    def on_batch_end(self, batch, logs={}):
        # Print a newline so each progress-bar update stays on its own line.
        print()

You call it like this:

model.fit(training_dataset, steps_per_epoch=num_training_samples, epochs=EPOCHS, validation_data=validation_dataset, callbacks=[print_on_end()])

The output looks like this:

 1/18 [>.............................] - ETA: 0s - loss: 7.2655 - mean_squared_error: 7.2655
 2/18 [==>...........................] - ETA: 29s - loss: 8.3142 - mean_squared_error: 8.3142
 3/18 [====>.........................] - ETA: 36s - loss: 6.9459 - mean_squared_error: 6.9459
 4/18 [=====>........................] - ETA: 38s - loss: 9.0257 - mean_squared_error: 9.0257
 5/18 [=======>......................] - ETA: 38s - loss: 8.7444 - mean_squared_error: 8.7444
 6/18 [=========>....................] - ETA: 36s - loss: 8.1615 - mean_squared_error: 8.1615
 7/18 [==========>...................] - ETA: 34s - loss: 7.5837 - mean_squared_error: 7.5837
 8/18 [============>.................] - ETA: 32s - loss: 7.2963 - mean_squared_error: 7.2963
 9/18 [==============>...............] - ETA: 29s - loss: 7.3671 - mean_squared_error: 7.3671

The solution is described here:
https://stackoverflow.com/questions/52205315/plot-loss-evolution-during-a-single-epoch-in-keras

You can also capture the history and plot the loss by doing:

history = model.fit(trainin...
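For completeness, a minimal sketch of plotting the recorded loss, assuming history is the object returned by a completed model.fit call:

import matplotlib.pyplot as plt

# history.history maps each metric name to a list of per-epoch values.
plt.plot(history.history['loss'], label='training loss')
if 'val_loss' in history.history:
    plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()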
