I now have around 500,000 data points. They are all text, so the data itself fits into memory. But if I use all of them, I get killed: 9 when I try to compile and fit the model, because there is no way to fit everything into memory at once.
I've tried the first 1000 data points, and the model could be fit and trained well with a batch size of 150.
So I'm now thinking of taking the data points 1000 at a time (or maybe more) and fitting the model slice by slice. However, I'm not sure how to do that. Do I call model.fit multiple times? Also, if I want to train for more than one epoch, will training with n_epoch = 1 and iterating 10 times give the same results? Something like this:
```python
for _ in range(10):
    # somehow cut the data into slices and fit them one by one
    model.fit(data_slice, label_slice, ......)
```
Yes, successive calls to `fit` will incrementally train the model.
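For illustration, a minimal sketch of the slicing loop the question describes - the slice size and the `data` / `labels` names are placeholders, not something from the answer above:

```python
slice_size = 1000
n = len(data)

for epoch in range(10):
    for start in range(0, n, slice_size):
        data_slice  = data[start:start + slice_size]
        label_slice = labels[start:start + slice_size]
        # each call continues training from the model's current weights
        model.fit(data_slice, label_slice, batch_size=150, epochs=1, verbose=0)
```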
I have a related issue. I generate data on the fly, and the generated data depends on the current state of the network. So I am calling `fit` many times in a loop. The data for each training session is very limited, hence there is no time to adapt hyperparameters during a single call to `fit`.

Therefore my question: when calling `fit` multiple times, is the internal state of the optimizer (in my case adadelta) preserved, and are learning rates adapted as if I did a single "big" training session?
Bump. I'd also like to hear an answer for this question.
@TGlas Did you get any new insights?
@PythEsc I don't have definite insights into this (I did not check the keras code), but from my experience it seems that the state is preserved. In other words, it seems to work.
Anyway, I'd also be interested in getting an official answer :)
It is preserved. You can check for yourself what the current learning rate is between runs or during the run in a callback:
currentLearningRate = K.get_value(model.optimizer.lr)
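For context, a minimal sketch of reading that value from inside a callback - the `LrLogger` class name is just illustrative, assuming the Keras 2.x API:

```python
from keras import backend as K
from keras.callbacks import Callback

class LrLogger(Callback):
    """Prints the optimizer's lr variable at the end of each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        lr = K.get_value(self.model.optimizer.lr)
        print('epoch %d: optimizer.lr = %g' % (epoch, lr))

# usage (sketch): model.fit(x, y, epochs=3, callbacks=[LrLogger()])
```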
FWIW, a similar StackOverflow question was asked and one of the answers/comments recommended the generator pattern. I had this question as well and went with a generator, but if I had known I could re-invoke `.fit` it might have made testing a bit easier (especially in the REPL). I think users would benefit from having this behavior documented.
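As a rough illustration of that generator pattern - a sketch only, with made-up names like `batch_generator`, not anything from the thread:

```python
import numpy as np

def batch_generator(x, y, batch_size=150):
    # yields (inputs, targets) tuples indefinitely, as fit_generator expects
    n = len(x)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield x[batch], y[batch]

# model.fit_generator(batch_generator(x_train, y_train),
#                     steps_per_epoch=len(x_train) // 150,
#                     epochs=10)
```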
https://keras.io/preprocessing/image/ (scroll down to see an example in which iterative fitting is used)
I suppose it's not "well" documented because most backends train models like this, and resetting connection weights to random after `fit` would have been a very strange choice, as you need the state to be preserved for inference or testing anyway.
Thanks for the pointer. Sure, resetting the weights would be strange, but it is not so clear whether the aggregated information in the optimizer (say, adadelta) should be reset or not. Not resetting assumes that the data passed to subsequent fit calls comes from a similar distribution, while starting the optimizer from scratch makes no assumptions at all. That's why I don't think this is a clear-cut case, which is why the behavior should be documented.
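For what it's worth, if you do want the optimizer to start from scratch between fit calls, one option (a sketch, not an official recommendation) is to recompile the model with a fresh optimizer instance; this discards the accumulated optimizer state while keeping the learned weights. The loss and the `next_data_slice` / `next_label_slice` names below are placeholders:

```python
from keras.optimizers import Adadelta

# keep the learned weights, but reset Adadelta's accumulators before the next slice
model.compile(optimizer=Adadelta(), loss='categorical_crossentropy')
model.fit(next_data_slice, next_label_slice, batch_size=150, epochs=1)
```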
How to plot the loss if we fit the model at every iteration?
> It is preserved. You can check for yourself what the current learning rate is between runs or during the run in a callback:
> currentLearningRate = K.get_value(model.optimizer.lr)
Misleading code; `lr` is the _initial_ learning rate, not the _latest_. The latter can be found in the source code, or 'partly' extracted - e.g. for Adam (code below).

All train functions (`fit`, `train_on_batch`, etc.) seem to update weights by calling `optimizer.get_updates` - which does `update_add(iterations, 1)` - suggesting the `iterations` variable is always present; trying `fit` vs. `train_on_batch`, I observe both updating `iterations` the same way. Further, aside from `__init__`, I couldn't find an `iterations = 0` line - suggesting it persists _permanently_ unless the optimizer is re-initialized.

Lastly, `iterations` is _not_ saved via `model.save_weights()`, and is reset to 0 after loading; unsure about `model.to_json()` and other 'complete' methods.
```python
import numpy as np
from keras import backend as K

def get_adamState(model):
    # read Adam's current hyperparameters and step count from the compiled model
    lr         = K.get_value(model.optimizer.lr)
    iterations = K.get_value(model.optimizer.iterations)
    beta_1     = K.get_value(model.optimizer.beta_1)
    beta_2     = K.get_value(model.optimizer.beta_2)

    # effective learning rate Adam would apply on the next update
    t = iterations + 1.
    lr_t = lr * (np.sqrt(1. - (beta_2 ** t)) /
                         (1. - (beta_1 ** t)))

    print('\nOptimizer state:\n')
    print('iterations: ' + str(iterations) + ' (= # of updates)')
    print('lr: ' + str(lr) + ' (initial)')
    print('lr_t: ' + '%.4e' % lr_t + ' (pre-momentum/RMSprop/amsgrad)')
```
Sample output:
Optimizer state:
iterations: 600 (= # of updates)
lr: 1e-04 (initial)
lr_t: 6.7183e-05 (pre-momentum/RMSprop/amsgrad)
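As a side note (my addition, not from the comment above): if you need the optimizer's state - including `iterations` - to survive a save/load cycle, the 'complete' `model.save()` / `load_model()` pair stores the full optimizer state, unlike `save_weights()`. A sketch:

```python
from keras.models import load_model

model.save('model_with_optimizer.h5')         # architecture + weights + optimizer state
model = load_model('model_with_optimizer.h5')
model.fit(x, y, epochs=1)                     # training resumes with the optimizer state intact
```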
> How to plot the loss if we fit the model at every iteration?
```python
import matplotlib.pyplot as plt

train_loss, train_acc = [], []

# --- using fit ---
train_history = model.fit(x, y, epochs=1)
train_loss.append(train_history.history['loss'][0])
train_acc.append(train_history.history['acc'][0])    # if applicable

# --- using train_on_batch ---
batch_loss = model.train_on_batch(x, y)    # returns [loss, acc] when metrics=['acc']
train_loss.append(batch_loss[0])
train_acc.append(batch_loss[1])            # if applicable

plt.plot(train_loss)
plt.plot(train_acc)
plt.show()
```
As this question was asked 3 years ago, I am wondering whether the answer still holds now.
I am using a pre-trained model, so I need to load the model (with all the weights) first.
Then, because I am using two sets of data (different images but with the same shape), I need to call fit_generator a number of times in the code. In this situation, does the model train incrementally (this is what I want)? Or will it always start training from the loaded pre-trained model?
Thanks!
@Darkhunter9 'Training' modifies whatever weights the model has when the call to train is made - i.e., training a pretrained model will update the loaded weights. You can add/replace subsequent layers from any point via `model.pop()` in a `Sequential` model - or, as I prefer, a bypassing 'shortcut' connection:
```python
...
x3  = Conv1D(...)(x2)
x4  = LSTM(...)(x3)
x4_ = LSTM(...)(x3)    # new branch, to train from scratch
...
output  = Dense(...)(xN)
model   = Model(input, output)
model.load_weights(...)            # pretrained weights

output_ = Dense(...)(xN_)
model   = Model(input, output_)    # rebuild the model around the new branch
```
- `x4` through `output` will be disconnected
- `x4_` through `output_` will be trained from scratch
- `x3` through `input` will remain connected and be trained starting with pretrained weights

@OverLordGoldDragon Thanks! I'll make my question clearer:
1. Say the weights in the loaded model are version_0;
2. After I first call fit_generator and training finishes, the weights become version_1;
3. At this point, if I call fit_generator again in my code, will training start updating the parameters from version_0 or from version_1?
@Darkhunter9 version_1, since it contains

> whatever weights the model has when the call to train is made

(P.S. unless using `fit_generator` with `use_multiprocessing=True`, you may be better off with `train_on_batch` for repeated calls)
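A rough sketch of what such a repeated-call loop might look like with `train_on_batch` - `generate_batch` and `num_steps` are placeholders, not from the answer above:

```python
# incremental training where each batch depends on the current model state
for step in range(num_steps):
    x_batch, y_batch = generate_batch(model)    # hypothetical on-the-fly data generation
    loss = model.train_on_batch(x_batch, y_batch)
    # weights and optimizer state carry over between iterations
```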
@OverLordGoldDragon Thanks! I am using fit_generator with multiprocessing, because I need to do image processing (with different recipes) over a large dataset on the fly.
> How to plot the loss if we fit the model at every iteration?
You can append to a list or array and plot after some iterations.