Describe the bug
When optimizer.assign_average_vars(model.variables) is called on a model A that contains another model B as one of its components, SWA crashes with a KeyError.
SWA also seems to fail with models running in mixed precision, but that looks more like a missing feature than a bug.
Code to reproduce the issue
I made a Kaggle notebook to show the issue, available at https://www.kaggle.com/morteng/petal2metal
I've also run into this issue in my own work, where I had created both of the models I was trying to join.
Other info / logs
KeyError Traceback (most recent call last)
<ipython-input-14-a1c484b3e13d> in <module>
2 print("\nEvaluate before changing weights")
3 model.evaluate(validate, verbose=2)
----> 4 optimizer.assign_average_vars(model.variables)
5 ## one forward pass with low learning rate to adjust batch normalization
6 print("\nEvaluate before updating batch norm")
/opt/conda/lib/python3.7/site-packages/tensorflow_addons/optimizers/average_wrapper.py in assign_average_vars(self, var_list)
120 [
121 var.assign(self.get_slot(var, "average"))
--> 122 for var in var_list
123 if var.trainable
124 ]
/opt/conda/lib/python3.7/site-packages/tensorflow_addons/optimizers/average_wrapper.py in <listcomp>(.0)
121 var.assign(self.get_slot(var, "average"))
122 for var in var_list
--> 123 if var.trainable
124 ]
125 )
/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py in get_slot(self, var, slot_name)
773 def get_slot(self, var, slot_name):
774 var_key = _var_key(var)
--> 775 slot_dict = self._slots[var_key]
776 return slot_dict[slot_name]
777
KeyError: 'stem_conv/kernel_113'
Can you test with tf-nightly?
I've tried with tf-nightly now. It did not train when I used the GPU. On the CPU it failed at exactly the same point.
@grofte Thanks. One extra effort, if you can: do you have, or can you extract, a very minimal code snippet that reproduces this, instead of the full Kaggle notebook?
/cc @shreyashpatodia
I think @shreyashpatodia no longer works at Google and isn't maintaining this optimizer class. But I tried both MovingAverage and Lookahead and they both work with a composite model.
Here's a different code example:
!pip install tensorflow-datasets
!pip install tensorflow-addons
import tensorflow_addons as tfa
import tensorflow as tf
import tensorflow_datasets as tfds
dataset, metadata = tfds.load('fashion_mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
def normalize(images, labels):
    images = tf.cast(images, tf.float32)
    images /= 255
    # VGG16 needs 3-channel input of at least 32x32: pad 28x28 -> 32x32, then copy the grey channel 3 times
    tiling = tf.constant([1, 1, 3], tf.int32)
    images = tf.image.pad_to_bounding_box(images, 2, 2, 32, 32)
    images = tf.tile(images, tiling)
    return {'image': images}, {'label': labels}
train_dataset = train_dataset.map(normalize).batch(1024)
test_dataset = test_dataset.map(normalize).batch(1024)
vgg = tf.keras.applications.VGG16(input_shape=(32, 32, 3),
                                  weights='imagenet',
                                  include_top=False)
vgg.trainable = False
input_images = tf.keras.layers.Input(shape=(32, 32, 3), name='image')
x = vgg(input_images)
output_labels = tf.keras.layers.Dense(len(class_names), name='label')(x)
model = tf.keras.models.Model(inputs=[input_images], outputs=[output_labels])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
optimizer = tfa.optimizers.SWA(optimizer)
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
model.fit(train_dataset, epochs=3)
model.evaluate(test_dataset, verbose=2)
optimizer.assign_average_vars(model.variables)
model.evaluate(test_dataset, verbose=2)
This ran in about 30 seconds in a Colaboratory notebook with a GPU.
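For readers wondering what normalize does to the images: VGG16 with include_top=False expects 3-channel input no smaller than 32x32, so the 28x28x1 Fashion-MNIST images are padded by 2 pixels on each side and the grey channel is repeated three times. The NumPy sketch below mirrors only the shape changes; pad_and_tile is a hypothetical helper, not the TF ops used above.

```python
import numpy as np


def pad_and_tile(image):
    """Hypothetical NumPy mirror of the shape changes in `normalize` (not the TF ops themselves)."""
    image = image.astype(np.float32) / 255.0
    # Like tf.image.pad_to_bounding_box(images, 2, 2, 32, 32) on a 28x28 image:
    # 2 zero rows on top, 32 - 28 - 2 = 2 at the bottom, and the same for columns.
    image = np.pad(image, ((2, 2), (2, 2), (0, 0)))
    # Like tf.tile(images, [1, 1, 3]): repeat the single grey channel three times.
    return np.tile(image, (1, 1, 3))


sample = np.random.randint(0, 256, size=(28, 28, 1), dtype=np.uint8)
out = pad_and_tile(sample)
print(out.shape)  # (32, 32, 3)
```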
I can take a look at this at some point this week. Thanks for reporting the bug @grofte!
There's also an extremely weird bug with mixed precision: turning on both mixed precision and SWA fails, but only on GPU, not on TPU =(
But maybe that's because I haven't initialised the TPUs correctly...
"""# Test with mixed precision"""
!pip install tensorflow-datasets
!pip install tensorflow-addons
import tensorflow_addons as tfa
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.mixed_precision import experimental as mixed_precision
dataset, metadata = tfds.load('fashion_mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
def normalize(images, labels):
    images = tf.cast(images, tf.float32)
    images /= 255
    return {'image': images}, {'label': labels}
train_dataset = train_dataset.map(normalize).batch(1024)
test_dataset = test_dataset.map(normalize).batch(1024)
policy = mixed_precision.Policy('mixed_float16') # 'mixed_bfloat16' if on TPU instead of GPU
mixed_precision.set_policy(policy)
input_images = tf.keras.layers.Input(shape=(28, 28, 1), name='image')
x = tf.keras.layers.Flatten()(input_images)
output_labels = tf.keras.layers.Dense(len(class_names), name='label')(x)
model = tf.keras.models.Model(inputs=[input_images], outputs=[output_labels])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
optimizer = tfa.optimizers.SWA(optimizer) # toggle this line
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_dataset, epochs=3)
model.evaluate(test_dataset, verbose=2)
This raises TypeError: apply_gradients() got an unexpected keyword argument 'experimental_aggregate_gradients'
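The TypeError means the wrapper's apply_gradients does not declare a keyword argument that its caller passes. A plausible explanation (an assumption, not confirmed in this thread) is that with the 'mixed_float16' policy, compile() wraps the optimizer in a loss-scaling optimizer that forwards experimental_aggregate_gradients to the wrapped optimizer; that would also fit 'mixed_bfloat16' on TPU not failing, since it does not use loss scaling. The pure-Python sketch below reproduces this kind of error and shows one general way to avoid it, forwarding **kwargs. All class and function names are hypothetical, and this is not necessarily what the fix in TFA does.

```python
class InnerOptimizer:
    """Hypothetical inner optimizer that accepts the extra keyword."""

    def apply_gradients(self, grads_and_vars, name=None,
                        experimental_aggregate_gradients=True):
        return len(grads_and_vars), experimental_aggregate_gradients


class RigidWrapper:
    """Hypothetical wrapper whose signature only declares grads_and_vars and name."""

    def __init__(self, inner):
        self._inner = inner

    def apply_gradients(self, grads_and_vars, name=None):
        return self._inner.apply_gradients(grads_and_vars, name=name)


class ForwardingWrapper(RigidWrapper):
    """Hypothetical wrapper that passes any extra keyword arguments through to the inner optimizer."""

    def apply_gradients(self, grads_and_vars, name=None, **kwargs):
        return self._inner.apply_gradients(grads_and_vars, name=name, **kwargs)


def caller(optimizer):
    """Hypothetical caller (think: a loss-scaling wrapper) that always passes the keyword."""
    return optimizer.apply_gradients([(0.1, "w")],
                                     experimental_aggregate_gradients=False)


try:
    caller(RigidWrapper(InnerOptimizer()))
except TypeError as err:
    print(err)  # ...got an unexpected keyword argument 'experimental_aggregate_gradients'

print(caller(ForwardingWrapper(InnerOptimizer())))  # (1, False)
```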
@grofte For this second issue, which TFA version are you using? It was probably fixed in https://github.com/tensorflow/addons/pull/2137
For the first issue: since some of the backbone variables are not trainable, shouldn't it be optimizer.assign_average_vars(model.trainable_variables)?
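Why passing model.trainable_variables avoids the KeyError, as far as the traceback shows: the optimizer only creates an "average" slot for variables it receives gradients for, while assign_average_vars looks up a slot for every var in var_list whose var.trainable is True. Assuming tf.keras 2.x behaviour, vgg.trainable = False removes the backbone's weights from model.trainable_variables but leaves the trainable flag on the underlying tf.Variable objects set to True, so those variables pass the filter and have no slot. The pure-Python toy below reproduces that bookkeeping; ToyVariable and ToySWA are hypothetical stand-ins, not TFA classes.

```python
class ToyVariable:
    """Hypothetical stand-in for tf.Variable: a name, a value, and a trainable flag."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        # Assumed tf.keras 2.x behaviour: freezing the owning layer does not flip this flag.
        self.trainable = True


class ToySWA:
    """Hypothetical model of an averaging optimizer's slot bookkeeping (not the TFA code)."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self._slots = {}  # variable name -> {"average": running mean, "count": n}

    def apply_gradients(self, grads_and_vars):
        for grad, var in grads_and_vars:
            var.value -= self.learning_rate * grad
            # A slot is created only for variables that actually receive gradients.
            slot = self._slots.setdefault(var.name, {"average": 0.0, "count": 0})
            slot["count"] += 1
            slot["average"] += (var.value - slot["average"]) / slot["count"]

    def get_slot(self, var, slot_name):
        return self._slots[var.name][slot_name]  # KeyError for never-trained variables

    def assign_average_vars(self, var_list):
        for var in var_list:
            if var.trainable:
                var.value = self.get_slot(var, "average")


backbone_kernel = ToyVariable("stem_conv/kernel", 1.0)  # belongs to the frozen sub-model
head_kernel = ToyVariable("label/kernel", 1.0)          # belongs to the trainable head
all_variables = [backbone_kernel, head_kernel]          # like model.variables
trainable_variables = [head_kernel]                     # like model.trainable_variables

opt = ToySWA()
for _ in range(2):
    opt.apply_gradients([(1.0, v) for v in trainable_variables])

try:
    opt.assign_average_vars(all_variables)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'stem_conv/kernel'

opt.assign_average_vars(trainable_variables)
print(round(head_kernel.value, 6))  # 0.85
```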
@bhack You are absolutely right. If I install tfa-nightly instead (and make extra sure that I remember to restart the session) and then pass model.trainable_weights instead of model.variables, it runs without any errors.
Why the switch from .variables to .trainable_weights, though? Intuitively, I would have expected .trainable_variables.
@shreyashpatodia It seems that you are off the hook =]
@grofte It was a typo; if you look, the comment was edited, but too late.
I've set up a watch on new releases and look forward to adding it to my code.
Thanks for the help!