Keras: [API Bug] Model.fit()

Created on 11 Oct 2017 · 2 Comments · Source: keras-team/keras

Model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
          batch_size=128,
          steps_per_epoch=2,
          validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y))

complains:

Traceback (most recent call last):
  File "examples/mnist_siamese.py", line 133, in <module>
    validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y))
  File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 1603, in fit
  File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 1093, in _fit_loop
  File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 1031, in _check_num_samples
ValueError: If steps_per_epoch is set, the `batch_size` must be None.

but with

Model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
          batch_size=None,
          steps_per_epoch=2,
          validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y))

complains:

Traceback (most recent call last):
  File "examples/mnist_siamese.py", line 133, in <module>
    validation_data=([te_pairs[:, 0], te_pairs[:, 1]], te_y))
  File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 1603, in fit
  File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 1155, in _fit_loop
  File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 1332, in _test_loop
  File "build/bdist.linux-x86_64/egg/keras/engine/training.py", line 374, in _make_batches
TypeError: float() argument must be a string or a number

The reason is that fit() calls _test_loop(batch_size=None) for the validation pass, and _make_batches then fails when it calls float() on that None.
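A minimal sketch of why batch_size=None breaks the validation pass. The helper below is a simplified stand-in written from the traceback's hint that _make_batches calls float() on the batch size; it is an assumption for illustration, not the actual Keras source, and the name make_batches is hypothetical.

```python
import math


def make_batches(size, batch_size):
    """Return (start, end) index pairs covering `size` samples.

    Simplified stand-in (an assumption, not the Keras source).
    """
    # float(None) raises TypeError, which matches the second traceback.
    num_batches = int(math.ceil(size / float(batch_size)))
    return [(i * batch_size, min(size, (i + 1) * batch_size))
            for i in range(num_batches)]


print(make_batches(10, 4))  # [(0, 4), (4, 8), (8, 10)]

try:
    make_batches(10, None)
except TypeError as exc:
    print("TypeError:", exc)
```

With an integer batch size the helper works as expected; with None it fails before any validation batch is built.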

Passing validation_steps to Model.fit() has no effect on this.

This seems like an API bug to me. Possible solutions:

Allow batch_size when steps_per_epoch is set: ignore it for training, use it for validation.
Introduce a separate validation_batch_size, defaulting to batch_size when steps_per_epoch=None.
Use _test_loop(batch_size=1) (slow). Maybe only when validation_steps is set?

I also don't understand the reason for prohibiting simultaneous setting of steps_per_epoch and batch_size. I actually expected to simply train with 2 minibatches per epoch, with different epochs going either 0,1,2,3,4... or 0,1,0,1,0,1... This is Model.fit(), not Model.fit_generator().
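To make the two expected orderings concrete, here is a small sketch of how minibatch indices could be assigned when both batch_size and steps_per_epoch are given. The function batch_slices and its restart_each_epoch flag are hypothetical names for illustration; nothing here is Keras behaviour.

```python
def batch_slices(n_samples, batch_size, steps_per_epoch, epochs,
                 restart_each_epoch=False):
    """Yield (epoch, batch_index, start, end) for every training step."""
    n_batches = -(-n_samples // batch_size)  # ceiling division
    for epoch in range(epochs):
        for step in range(steps_per_epoch):
            if restart_each_epoch:
                # Every epoch sees the same first batches: 0,1,0,1,...
                index = step % n_batches
            else:
                # Epochs continue where the previous one stopped: 0,1,2,3,...
                index = (epoch * steps_per_epoch + step) % n_batches
            start = index * batch_size
            end = min(n_samples, start + batch_size)
            yield epoch, index, start, end


continuing = [b for _, b, _, _ in batch_slices(1024, 128, 2, 3)]
restarting = [b for _, b, _, _ in
              batch_slices(1024, 128, 2, 3, restart_each_epoch=True)]
print(continuing)  # [0, 1, 2, 3, 4, 5]
print(restarting)  # [0, 1, 0, 1, 0, 1]
```

The (start, end) pairs could then be used to slice the training arrays for each step.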

ozabluda

Most helpful comment

Same problem, don't understand why steps_per_epoch and batch_size cannot both be set. batch_size determines the number of samples per batch and steps_per_epoch should determine how many such batches must be considered in an epoch.

voletiv on 17 Dec 2017

👍7

All 2 comments

voletiv on 17 Dec 2017

👍7

Same here. I would like to be able to have multiple gradient updates per epoch, and cannot afford to have the entire batch in memory to feed into the model.fit() method. I'm using TFRecords to feed the batch, as in this example: https://github.com/keras-team/keras/blob/master/examples/mnist_tfrecord.py

Suggested simple fix:

I suggest renaming "steps_per_epoch" to "batches_per_epoch" for clarity.

_batch_size_ should govern gradient updates, while _batches_per_epoch_ should control things such as progress monitoring (i.e. when an epoch is finished). For example:

EPOCH 1:
Batch 1 - loss = 1.22, acc = 0.7
Batch 2 - loss = 1.17, acc = 0.75
EPOCH 2:
Batch 1 - loss = 1.16, acc = 0.78
Batch 2 - loss = 1.05, acc = 0.81
.
.
(and so on).

As a quick workaround for people facing this issue, I suggest setting steps_per_epoch to the number of samples before a gradient update. To know the actual number of epochs that have passed (i.e. how many times your model has seen the entire dataset), use the following:

n_epochs_true = (n_epochs_displayed * steps_per_epoch) / n_samples_in_dataset
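The conversion can be wrapped in a small helper. This is a sketch only: the function name true_epochs and the samples_per_step parameter are assumptions added here. With the default of 1 it reproduces the formula above exactly; if each step actually consumes a batch of several samples, passing that count scales the result accordingly.

```python
def true_epochs(n_epochs_displayed, steps_per_epoch, n_samples_in_dataset,
                samples_per_step=1):
    """Estimate how many full passes over the dataset have been made.

    samples_per_step is an assumption, not part of the original formula;
    leave it at 1 to get the formula exactly as written in the comment.
    """
    samples_seen = n_epochs_displayed * steps_per_epoch * samples_per_step
    return samples_seen / float(n_samples_in_dataset)


print(true_epochs(10, 6000, 60000))                        # 1.0
print(true_epochs(50, 100, 60000, samples_per_step=120))   # 10.0
```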

Another workaround is to use the model.train_on_batch() method, but there is no official example of how to integrate this method with TFRecords instead of numpy arrays.
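For batches that are already available as Python objects, a train_on_batch() loop might look like the sketch below. It does not cover the TFRecords case; it assumes some iterable that yields (x, y) batches, and the helper name train_in_steps is hypothetical. The only Keras call it relies on is model.train_on_batch(x, y).

```python
from itertools import islice


def train_in_steps(model, batches, steps_per_epoch, epochs):
    """Run `steps_per_epoch` gradient updates per epoch via train_on_batch.

    `batches` is any iterable of (x, y) pairs and is consumed lazily, so
    the whole dataset never has to sit in memory at once.
    """
    batch_iter = iter(batches)
    history = []
    for _ in range(epochs):
        losses = [model.train_on_batch(x, y)
                  for x, y in islice(batch_iter, steps_per_epoch)]
        if not losses:
            break  # the batch source ran dry
        history.append(losses)
    return history


# Hypothetical usage, assuming `model` is a compiled Keras model and
# `my_batches()` is a generator of (x, y) batches:
# history = train_in_steps(model, my_batches(), steps_per_epoch=2, epochs=10)
```

Each inner list in the returned history holds whatever train_on_batch() returned for the steps of one epoch.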