I have built an image classifier with 2 classes, say 'A' and 'B'. I have also saved this model, using model.save().
Now, some time later, a requirement has come up to add one more class, 'C'. Is it possible to call load_model() and then add just one class to the previously saved model, so that the final model has 3 classes ('A', 'B' and 'C'), without having to retrain the whole model on classes 'A' and 'B' again?
Help me out, I'm stuck here @fchollet or anyone else.
Hi @abhijitnathwani ,
I hope what I'm writing here is correct. After loading your model with load_model, you can remove the last Dense layer, which outputs 2 classes, add a new Dense layer that outputs the desired number of classes (3 in your case: A, B, C), and retrain the model.
Here is an example of a model that outputs 2 classes and after loading the weights you remove the last Dense layer and add a new Dense layer that outputs 3 classes:
from keras.models import Sequential, Model
from keras.layers import Dense
import numpy as np
# original 2-class model
model = Sequential()
model.add(Dense(32, input_shape=(1,)))
model.add(Dense(2, activation='softmax'))
model.load_weights("D:\\dataset\\fff.hd5")
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# dummy data: 500 samples with random one-hot labels over 2 classes
history = model.fit(np.random.randn(500, 1), np.eye(2)[np.random.randint(0, 2, 500)])
print(model.predict(np.asarray([[4.0]])))
# remove the last Dense layer of our model
model.pop()
base_model_layers = model.output
# add a new output layer with 3 classes (its weights are freshly initialised)
pred = Dense(3, activation='softmax')(base_model_layers)
model = Model(inputs=model.input, outputs=pred)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# dummy data again, now with random one-hot labels over 3 classes
model.fit(np.random.randn(500, 1), np.eye(3)[np.random.randint(0, 3, 500)])
print(model.predict(np.asarray([[4.0]])))
Hope this was helpful.
Best Regards
Hi @talhadar
I tried your approach. It runs without errors, but the validation accuracy drops drastically. As I understand it, the accuracy should have gone up, since the model already had pretrained weights for classes A and B (when I trained on those two, I got val_acc: 0.93). When training for three classes, I'm getting a val_acc of 0.3667.
Thanks
Abhijit
Hi @abhijitnathwani, @talhadar
What I think is happening is that by removing the Dense layer you lose all the pretrained weights of the output layer. The new Dense layer with the extra class starts with fresh weights that are not fitted to your data, so of course you get a lower validation accuracy. I have the same problem, and my idea is that instead of deleting the Dense layer, you just resize it while preserving the pretrained weights. I am not quite sure how to accomplish this, though, so any advice would be really appreciated.
Hi @abhijitnathwani, I'm trying to do the same thing as you. Have you found a solution or some sample code for this?
Hi @ivancruzbht
What you suggested is a great idea. Did you find a way to do that? If so, please let me know the steps to train a model in this way.
thanks!
Hi, @abhijitnathwani @ivancruzbht @dilipajm @saivineethkumar
How can this be done?
Thank you!
Hi @rnirdhar
I was able to achieve this using the darknet framework, but I didn't find any way to do it with Keras.
Hi @rnirdhar @ivancruzbht @dilipajm
I am using the following code on Keras 2.2.0. The original weights of the last layer are copied back into the new layer. Hope this helps. When model is a Sequential model:
from keras.layers import Dense
import numpy as np
# save the original weights
weights_bak = model.layers[-1].get_weights()
nb_classes = model.layers[-1].output_shape[-1]
model.pop()
model.add(Dense(nb_classes + 1, activation='softmax'))
weights_new = model.layers[-1].get_weights()
# copy the original weights back
weights_new[0][:, :-1] = weights_bak[0]
weights_new[1][:-1] = weights_bak[1]
# use the average weight to init the new class.
weights_new[0][:, -1] = np.mean(weights_bak[0], axis=1)
weights_new[1][-1] = np.mean(weights_bak[1])
model.layers[-1].set_weights(weights_new)
When the model is defined with the functional API:
from keras.layers import Dense
from keras.models import Model
import numpy as np
# save the original weights
weights_bak = model.layers[-1].get_weights()
nb_classes = model.layers[-1].output_shape[-1]
# build the new output on the layer just before the old classifier;
# this avoids model.layers.pop(), which is not reliable across Keras versions
new_layer = Dense(nb_classes + 1, activation='softmax')
out = new_layer(model.layers[-2].output)
inp = model.input
model = Model(inp, out)
weights_new = model.layers[-1].get_weights()
# copy the original weights back
weights_new[0][:, :-1] = weights_bak[0]
weights_new[1][:-1] = weights_bak[1]
# use the average weight to init the new class.
weights_new[0][:, -1] = np.mean(weights_bak[0], axis=1)
weights_new[1][-1] = np.mean(weights_bak[1])
model.layers[-1].set_weights(weights_new)
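The weight copy above can be sanity-checked without Keras. The following is only a sketch on plain NumPy arrays: the helper name expand_dense_weights and the 32-feature, 2-class shapes are made up for illustration. It widens a kernel and bias the same way (old columns copied, new column set to the mean) and confirms that the logits for the original classes do not change.

```python
import numpy as np

def expand_dense_weights(kernel, bias, n_new=1):
    """Return (kernel, bias) widened by n_new output units.

    Existing columns are copied unchanged; every new column is initialised
    with the mean of the existing columns, as in the snippets above.
    """
    new_col = kernel.mean(axis=1, keepdims=True)      # shape (in_dim, 1)
    kernel_out = np.concatenate([kernel] + [new_col] * n_new, axis=1)
    bias_out = np.concatenate([bias, np.full(n_new, bias.mean())])
    return kernel_out, bias_out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(32, 2))   # hypothetical: 32 features -> 2 classes
bias = rng.normal(size=2)
new_kernel, new_bias = expand_dense_weights(kernel, bias)

x = rng.normal(size=(5, 32))        # hypothetical activations of the previous layer
old_logits = x @ kernel + bias
new_logits = x @ new_kernel + new_bias

# The logits of the original classes are identical to the old layer's output.
assert np.allclose(new_logits[:, :2], old_logits)
```

One consequence of the mean initialisation: because the layer is linear, the new class's logit starts out as the average of the old logits, so before any further training the new class is never the top prediction. It only becomes competitive after fine-tuning on data that includes it.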
Hi all, @talhadar @abhijitnathwani @YuriWu
I tried all the suggestions described here. My question is: how exactly does this work once the Dense layer has 3 outputs? Will I still need the training data on which the first model with 2 classes was trained? If not, what should I provide as data for the two classes the model has already been trained on, given that the new model expects 3 labels and corresponding data? Please comment if I am missing a point here. Thanks!
Rishabh Sahrawat
Rishabh Sahrawat wrote: "Will I still need the training data on which the first model with 2 classes was trained?"
From my experience, you still need it. If you fine-tune the 3-output model on data from the 3rd class only, the final model will perform poorly on the first two classes (this is often called "catastrophic forgetting").
Using the code I posted above, starting from the already trained n-class model, I can learn the (n+1)th class in fewer epochs than training from scratch.
Yuri
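One common way to act on this is to mix some of the old-class data back in when fine-tuning on the new class. Below is a minimal sketch in NumPy, not something taken from the thread: the function name build_rehearsal_set, the old_fraction parameter and the toy arrays are all assumptions for illustration. It samples part of the old data, appends the new-class samples, shuffles, and produces one-hot labels in the widened label space.

```python
import numpy as np

def build_rehearsal_set(x_old, y_old, x_new, new_class_id, n_classes,
                        old_fraction=1.0, seed=0):
    """Combine a sample of old-class data with new-class data.

    y_old holds integer class ids; the returned labels are one-hot
    vectors of length n_classes (the widened output layer).
    """
    rng = np.random.default_rng(seed)
    n_keep = int(len(x_old) * old_fraction)
    keep = rng.choice(len(x_old), size=n_keep, replace=False)
    x = np.concatenate([x_old[keep], x_new])
    labels = np.concatenate([y_old[keep], np.full(len(x_new), new_class_id)])
    order = rng.permutation(len(x))           # shuffle old and new together
    return x[order], np.eye(n_classes)[labels[order]]

# Toy data: 100 old samples from classes 0/1, 30 samples of the new class 2.
x_old = np.zeros((100, 4))
y_old = np.random.default_rng(1).integers(0, 2, size=100)
x_new = np.ones((30, 4))

x_mix, y_mix = build_rehearsal_set(x_old, y_old, x_new,
                                   new_class_id=2, n_classes=3,
                                   old_fraction=0.5)
```

The resulting x_mix and y_mix could then be passed to model.fit on the widened model. How much old data is enough depends on the problem, so old_fraction is a knob to experiment with rather than a recommended value.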
@YuriWu thank you for the quick reply. I see, but the problem is that I have a huge dataset (700,000 classes) which cannot be loaded into RAM, so reusing the older data for retraining is not possible.
I am basically doing text classification. I have a model trained and saved on 24,000 classes, but I want to train it on the complete dataset using transfer learning: 24,000 classes, then another 24,000, and so on. Unfortunately, I cannot find any way to achieve this. I don't know how companies do this when they have enormous datasets. Please share any ideas you have for my problem. I would be very thankful.
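For the memory part of this problem, data does not have to be in RAM all at once: it can be streamed in batches, one chunk at a time. The following is only a sketch under assumptions: batch_stream and fake_load_chunk are hypothetical names, and in practice the loader would read one file or shard from disk. It does not address how to handle a 700,000-way output layer.

```python
import numpy as np

def batch_stream(chunk_ids, load_chunk, batch_size):
    """Yield (x, y) batches forever, holding only one chunk in memory at a time.

    load_chunk is any callable that maps a chunk id to an (x, y) pair of arrays.
    """
    while True:
        for chunk_id in chunk_ids:
            x, y = load_chunk(chunk_id)
            for start in range(0, len(x), batch_size):
                yield x[start:start + batch_size], y[start:start + batch_size]

def fake_load_chunk(chunk_id):
    # Stand-in for reading one shard from disk (hypothetical data).
    x = np.full((10, 3), chunk_id, dtype="float32")
    y = np.full(10, chunk_id)
    return x, y

stream = batch_stream([0, 1], fake_load_chunk, batch_size=4)
first_batches = [next(stream) for _ in range(6)]
batch_sizes = [len(x) for x, _ in first_batches]
```

In Keras 2.x a generator like this can be passed to model.fit_generator together with steps_per_epoch, so that old and new data are both seen during fine-tuning without loading everything up front.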
@rishabhsahrawat
I don't know how companies do this either. But you can check my recent paper at ICML'19 and the code repo. I tried to solve the partial label space problem locally in multiparty style, and ensemble them together via communications among local models.
Please cite that paper if you find it useful :-)
Thanks for the reply. I have one question. I retrained my saved model for a newly detected class based on the code you provided, so I was able to retrain the last layer of my vggface model with one new class added. However, the model classifies the new class with a lower probability. In other words, the model doesn't learn the new class as well as the previous ones. I would appreciate your thoughts on this.