I now have around 500,000 data points. They are all text, so the data itself fits into memory. But if I use all of them, I get killed: 9 when I try to compile and fit the model, because there is no way to fit everything into memory at once.
I've tried the first 1000 data points, and the model could be fit and trained well with a batch size of 150.
So I'm now thinking of taking the data points 1000 at a time (or maybe more) and fitting the model slice by slice. However, I'm not sure how to do that. Do I call model.fit multiple times? Also, if I want to train for more than one epoch, will training with n_epoch = 1 and iterating 10 times give the same results? Something like this:
```python
for _ in range(10):
    # somehow cut the data into slices and fit them one by one
    model.fit(data_slice, label_slice, ......)
```
Yes, successive calls to `fit` will incrementally train the model.
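For illustration, a minimal sketch of the slicing loop the question describes - the slice size and the `data` / `labels` names are placeholders, not something from the answer above:

```python
slice_size = 1000
n = len(data)

for epoch in range(10):
    for start in range(0, n, slice_size):
        data_slice  = data[start:start + slice_size]
        label_slice = labels[start:start + slice_size]
        # each call continues training from the model's current weights
        model.fit(data_slice, label_slice, batch_size=150, epochs=1, verbose=0)
```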
I have a related issue. I generate data on the fly, and the generated data depends on the current state of the network. So I am calling `fit` many times in a loop. The data for each training session is very limited, hence there is no time to adapt hyperparameters during a single call to `fit`.

Therefore my question: when calling `fit` multiple times, is the internal state of the optimizer (in my case adadelta) preserved, and are learning rates adapted as if I did a single "big" training session?
Bump. I'd also like to hear an answer for this question.
@TGlas Did you get any new insights?
@PythEsc I don't have definite insights into this (I did not check the keras code), but from my experience it seems that the state is preserved. In other words, it seems to work.
Anyway, I'd also be interested in getting an official answer :)
It is preserved. You can check for yourself what the current learning rate is between runs or during the run in a callback:
currentLearningRate = K.get_value(model.optimizer.lr)
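For context, a minimal sketch of reading that value from inside a callback - the `LrLogger` class name is just illustrative, assuming the Keras 2.x API:

```python
from keras import backend as K
from keras.callbacks import Callback

class LrLogger(Callback):
    """Prints the optimizer's lr variable at the end of each epoch."""
    def on_epoch_end(self, epoch, logs=None):
        lr = K.get_value(self.model.optimizer.lr)
        print('epoch %d: optimizer.lr = %g' % (epoch, lr))

# usage (sketch): model.fit(x, y, epochs=3, callbacks=[LrLogger()])
```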
FWIW, a similar StackOverflow question was asked and one of the answers/comments recommended the generator pattern. I had this question as well and went with a generator, but if I had known I could re-invoke `.fit` it might have made testing a bit easier (especially in the REPL). I think users would benefit from having this behavior documented.
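As a rough illustration of that generator pattern - a sketch only, with made-up names like `batch_generator`, not anything from the thread:

```python
import numpy as np

def batch_generator(x, y, batch_size=150):
    # yields (inputs, targets) tuples indefinitely, as fit_generator expects
    n = len(x)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield x[batch], y[batch]

# model.fit_generator(batch_generator(x_train, y_train),
#                     steps_per_epoch=len(x_train) // 150,
#                     epochs=10)
```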
https://keras.io/preprocessing/image/ (scroll down to see an example in which iterative fitting is used)
I suppose it's not "well" documented because most backends train models like this, and resetting connection weights to random after `fit` would have been a very strange choice, as you need the state to be preserved for inference or testing anyway.
Thanks for the pointer. Sure, resetting the weights would be strange, but it is not so clear whether the aggregated information in the optimizer (say, adadelta) should be reset or not. Not resetting assumes that the data passed to subsequent fit calls comes from a similar distribution, while starting the optimizer from scratch makes no assumptions at all. That's why I don't think this is a clear-cut case, which is why the behavior should be documented.
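For what it's worth, if you do want the optimizer to start from scratch between fit calls, one option (a sketch, not an official recommendation) is to recompile the model with a fresh optimizer instance; this discards the accumulated optimizer state while keeping the learned weights. The loss and the `next_data_slice` / `next_label_slice` names below are placeholders:

```python
from keras.optimizers import Adadelta

# keep the learned weights, but reset Adadelta's accumulators before the next slice
model.compile(optimizer=Adadelta(), loss='categorical_crossentropy')
model.fit(next_data_slice, next_label_slice, batch_size=150, epochs=1)
```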
How to plot the loss if we fit the model at every iteration?
> It is preserved. You can check for yourself what the current learning rate is between runs or during the run in a callback:
> currentLearningRate = K.get_value(model.optimizer.lr)
Misleading code; `lr` is the _initial_ learning rate, not the _latest_. The latter can be found in the source code, or 'partly' extracted - e.g. for Adam (code below).

All train functions (`fit`, `train_on_batch`, etc.) seem to update weights by calling `optimizer.get_updates` - which does `update_add(iterations, 1)` - suggesting the `iterations` variable is always present; trying `fit` vs. `train_on_batch`, I observe both updating `iterations` the same way. Further, aside from `__init__`, I couldn't find an `iterations = 0` line - suggesting it persists _permanently_ unless the optimizer is re-initialized.

Lastly, `iterations` is _not_ saved via `model.save_weights()`, and is reset to 0 after loading; unsure about `model.to_json()` and other 'complete' methods.
```python
import numpy as np
from keras import backend as K

def get_adamState(model):
    # read Adam's current hyperparameters and step count from the compiled model
    lr         = K.get_value(model.optimizer.lr)
    iterations = K.get_value(model.optimizer.iterations)
    beta_1     = K.get_value(model.optimizer.beta_1)
    beta_2     = K.get_value(model.optimizer.beta_2)

    # effective learning rate Adam would apply on the next update
    t = iterations + 1.
    lr_t = lr * (np.sqrt(1. - (beta_2 ** t)) /
                         (1. - (beta_1 ** t)))

    print('\nOptimizer state:\n')
    print('iterations: ' + str(iterations) + ' (= # of updates)')
    print('lr: ' + str(lr) + ' (initial)')
    print('lr_t: ' + '%.4e' % lr_t + ' (pre-momentum/RMSprop/amsgrad)')
```
Sample output:
Optimizer state:
iterations: 600 (= # of updates)
lr: 1e-04 (initial)
lr_t: 6.7183e-05 (pre-momentum/RMSprop/amsgrad)
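As a side note (my addition, not from the comment above): if you need the optimizer's state - including `iterations` - to survive a save/load cycle, the 'complete' `model.save()` / `load_model()` pair stores the full optimizer state, unlike `save_weights()`. A sketch:

```python
from keras.models import load_model

model.save('model_with_optimizer.h5')         # architecture + weights + optimizer state
model = load_model('model_with_optimizer.h5')
model.fit(x, y, epochs=1)                     # training resumes with the optimizer state intact
```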
> How to plot the loss if we fit the model at every iteration?
```python
import matplotlib.pyplot as plt

train_loss, train_acc = [], []

# --- using fit ---
train_history = model.fit(x, y, epochs=1)
train_loss.append(train_history.history['loss'][0])
train_acc.append(train_history.history['acc'][0])    # if applicable

# --- using train_on_batch ---
batch_loss = model.train_on_batch(x, y)    # returns [loss, acc] when metrics=['acc']
train_loss.append(batch_loss[0])
train_acc.append(batch_loss[1])            # if applicable

plt.plot(train_loss)
plt.plot(train_acc)
plt.show()
```
As this question was asked 3 years ago, I am wondering whether the answer still holds now.
I am using a pre-trained model, so I need to load the model (with all the weights) first.
Then, because I am using two sets of data (different images but with the same shape), I need to call fit_generator a number of times in the code. In this situation, does the model train incrementally (this is what I want)? Or will it always start training from the loaded pre-trained model?
Thanks!
@Darkhunter9 'Training' modifies whatever weights the model has when the call to train is made - i.e., training a pretrained model will update the loaded weights. You can add/replace subsequent layers from any point via `model.pop()` in a `Sequential` model - or, as I prefer, a bypassing 'shortcut' connection:
```python
...
x3  = Conv1D(...)(x2)
x4  = LSTM(...)(x3)
x4_ = LSTM(...)(x3)    # new branch, to train from scratch
...
output  = Dense(...)(xN)
model   = Model(input, output)
model.load_weights(...)            # pretrained weights

output_ = Dense(...)(xN_)
model   = Model(input, output_)    # rebuild the model around the new branch
```
- `x4` through `output` will be disconnected
- `x4_` through `output_` will be trained from scratch
- `x3` through `input` will remain connected and be trained starting with pretrained weights

@OverLordGoldDragon Thanks! I'll make my question clearer:
1. Say the weights in the loaded model are version_0;
2. After I first call fit_generator and training finishes, the weights become version_1;
3. At this point, if I call fit_generator again in my code, will training start updating the parameters from version_0 or from version_1?
@Darkhunter9 version_1, since it contains

> whatever weights the model has when the call to train is made

(P.S. unless using `fit_generator` with `use_multiprocessing=True`, you may be better off with `train_on_batch` for repeated calls)
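A rough sketch of what such a repeated-call loop might look like with `train_on_batch` - `generate_batch` and `num_steps` are placeholders, not from the answer above:

```python
# incremental training where each batch depends on the current model state
for step in range(num_steps):
    x_batch, y_batch = generate_batch(model)    # hypothetical on-the-fly data generation
    loss = model.train_on_batch(x_batch, y_batch)
    # weights and optimizer state carry over between iterations
```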
@OverLordGoldDragon Thanks! I am using fit_generator with multiprocessing, because I need to do image processing (with different recipes) over a large dataset on the fly.
> How to plot the loss if we fit the model at every iteration?
You can append to a list or array and plot after some iterations.