Addons: Stochastic Weight Averaging (SWA) fails on composite models

Created on 3 Sep 2020 · 12 Comments · Source: tensorflow/addons

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Kaggle Notebook environment + Ubuntu 18.04
  • TensorFlow version and how it was installed (source or binary): binary, both 2.2 and 2.3
  • TensorFlow-Addons version and how it was installed (source or binary): binary, 0.10.0
  • Python version: 3.7
  • Is GPU used? (yes/no): fails with both

Describe the bug

When optimizer.assign_average_vars(model.variables) is called on a model A that includes another model B as part of its composition, SWA crashes with a KeyError.

SWA also seems to fail with models running in mixed precision, but that seems more like a missing feature than a bug.

Code to reproduce the issue

I made a Kaggle notebook to show the issue, available at https://www.kaggle.com/morteng/petal2metal
I've also hit the issue in my own work, where I had created both of the models I was trying to combine.

Other info / logs

KeyError                                  Traceback (most recent call last)
<ipython-input-14-a1c484b3e13d> in <module>
      2 print("\nEvaluate before changing weights")
      3 model.evaluate(validate, verbose=2)
----> 4 optimizer.assign_average_vars(model.variables)
      5 ## one forward pass with low learning rate to adjust batch normalization
      6 print("\nEvaluate before updating batch norm")

/opt/conda/lib/python3.7/site-packages/tensorflow_addons/optimizers/average_wrapper.py in assign_average_vars(self, var_list)
    120             [
    121                 var.assign(self.get_slot(var, "average"))
--> 122                 for var in var_list
    123                 if var.trainable
    124             ]

/opt/conda/lib/python3.7/site-packages/tensorflow_addons/optimizers/average_wrapper.py in <listcomp>(.0)
    121                 var.assign(self.get_slot(var, "average"))
    122                 for var in var_list
--> 123                 if var.trainable
    124             ]
    125         )

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py in get_slot(self, var, slot_name)
    773   def get_slot(self, var, slot_name):
    774     var_key = _var_key(var)
--> 775     slot_dict = self._slots[var_key]
    776     return slot_dict[slot_name]
    777 

KeyError: 'stem_conv/kernel_113'

All 12 comments

Can you test with tf-nightly?

I've tried with tf-nightly now. It did not train when I used the GPU. On the CPU it failed at exactly the same point.

@grofte Thanks. If you can spare a little extra effort: do you have, or can you extract, a very minimal code snippet that reproduces this instead of the full Kaggle notebook?

/cc @shreyashpatodia

I think @shreyashpatodia no longer works at Google and isn't maintaining this optimizer class. But I tried both MovingAverage and Lookahead and they both work with a composite model.

Here's a different code example:

!pip install tensorflow-datasets
!pip install tensorflow-addons

import tensorflow_addons as tfa
import tensorflow as tf
import tensorflow_datasets as tfds

dataset, metadata = tfds.load('fashion_mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
               'Sandal',      'Shirt',   'Sneaker',  'Bag',   'Ankle boot']

def normalize(images, labels):
  images = tf.cast(images, tf.float32)
  images /= 255
  tileing = tf.constant([1, 1, 3], tf.int32)
  images = tf.image.pad_to_bounding_box(images, 2, 2, 32, 32)
  images = tf.tile(images, tileing)
  return {'image': images}, {'label': labels}

train_dataset =  train_dataset.map(normalize).batch(1024)
test_dataset  =  test_dataset.map(normalize).batch(1024)

vgg = tf.keras.applications.VGG16(input_shape=(32, 32, 3),
                                  weights='imagenet',
                                  include_top=False)

vgg.trainable = False

input_images = tf.keras.layers.Input(shape=(32, 32, 3), name='image')
x = vgg(input_images)
output_labels = tf.keras.layers.Dense(len(class_names), name='label')(x)
model = tf.keras.models.Model(inputs=[input_images], outputs=[output_labels])

optimizer = tf.keras.optimizers.Adam(lr=1e-3)
optimizer = tfa.optimizers.SWA(optimizer)
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
model.fit(train_dataset,
          epochs=3)

model.evaluate(test_dataset, verbose=2)
optimizer.assign_average_vars(model.variables)
model.evaluate(test_dataset, verbose=2)

This ran in about 30 seconds in a Colaboratory notebook with a GPU.

I can take a look at this at some point this week. Thanks for reporting the bug @grofte!

There's also an extremely weird bug with mixed precision where turning on both mixed precision and SWA fails but only with GPU, not TPU =(
But maybe that's because I haven't initialised the TPUs correctly...

"""# Test with mixed precision"""

!pip install tensorflow-datasets
!pip install tensorflow-addons

import tensorflow_addons as tfa
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.mixed_precision import experimental as mixed_precision

dataset, metadata = tfds.load('fashion_mnist', as_supervised=True, with_info=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
               'Sandal',      'Shirt',   'Sneaker',  'Bag',   'Ankle boot']

def normalize(images, labels):
  images = tf.cast(images, tf.float32)
  images /= 255
  return {'image': images}, {'label': labels}

train_dataset =  train_dataset.map(normalize).batch(1024)
test_dataset  =  test_dataset.map(normalize).batch(1024)

policy = mixed_precision.Policy('mixed_float16') # 'mixed_bfloat16' if on TPU instead of GPU
mixed_precision.set_policy(policy)

input_images = tf.keras.layers.Input(shape=(28, 28, 1), name='image')
x = tf.keras.layers.Flatten()(input_images)
output_labels = tf.keras.layers.Dense(len(class_names), name='label')(x)
model = tf.keras.models.Model(inputs=[input_images], outputs=[output_labels])

optimizer = tf.keras.optimizers.Adam(lr=1e-3)
optimizer = tfa.optimizers.SWA(optimizer) # toggle this line
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics='accuracy')
model.fit(train_dataset,
          epochs=3)

model.evaluate(test_dataset, verbose=2)

Returns TypeError: apply_gradients() got an unexpected keyword argument 'experimental_aggregate_gradients'
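
A note on this error: it is the kind of TypeError Python raises when a wrapper's apply_gradients() does not accept a keyword that its caller passes along. The sketch below is a toy illustration under that assumption only. ToyInnerOptimizer, StrictWrapper, ForwardingWrapper and call_like_outer_layer are hypothetical names, not TensorFlow or TensorFlow Addons code, and the sketch does not claim to show how the library behaves internally or how it was fixed.

```python
# Toy illustration only: all class and function names here are hypothetical.

class ToyInnerOptimizer:
    def apply_gradients(self, grads_and_vars, name=None, **kwargs):
        # Pretend to apply the updates; return how many pairs were seen.
        return len(list(grads_and_vars))


class StrictWrapper:
    """Wrapper whose signature has no room for extra keywords."""

    def __init__(self, optimizer):
        self._optimizer = optimizer

    def apply_gradients(self, grads_and_vars, name=None):
        return self._optimizer.apply_gradients(grads_and_vars, name=name)


class ForwardingWrapper:
    """Wrapper that accepts unknown keywords and passes them through."""

    def __init__(self, optimizer):
        self._optimizer = optimizer

    def apply_gradients(self, grads_and_vars, name=None, **kwargs):
        return self._optimizer.apply_gradients(grads_and_vars, name=name, **kwargs)


def call_like_outer_layer(optimizer):
    # Assumption: some outer layer passes this keyword when it calls the optimizer.
    return optimizer.apply_gradients(
        [(0.1, "w")], experimental_aggregate_gradients=False
    )


try:
    call_like_outer_layer(StrictWrapper(ToyInnerOptimizer()))
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)

forwarded = call_like_outer_layer(ForwardingWrapper(ToyInnerOptimizer()))
print(strict_error)
```

If the real cause is of this kind, upgrading to a release where the wrapper accepts the keyword would be the expected remedy, rather than any change to the model code.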

@grofte For this second issue, which TFA version are you using? It was probably fixed in https://github.com/tensorflow/addons/pull/2137

As for the first issue: since some of the backbone variables are not trainable, shouldn't it be optimizer.assign_average_vars(model.trainable_variables)?

@bhack You are absolutely right. If I install tfa-nightly instead (and make extra sure that I remember to restart the session) and then assign to model.trainable_weights instead of model.variables it runs without any errors.

Why the switch from .variables to weights though? Intuitively, I would have said that it should have been .trainable_variables.

@shreyashpatodia It seems that you are off the hook =]

@grofte It was a typo. As you can see, the comment was edited, but too late.
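
To illustrate why passing model.variables can raise the KeyError while model.trainable_variables does not, here is a toy sketch in plain Python. It rests on two assumptions: that the optimizer only creates an "average" slot for variables it actually updated, and that freezing a layer does not change the trainable flag stored on the variables themselves, so a filter like `if var.trainable` still lets frozen backbone variables through. ToyVariable, ToyLayer, ToyModel and ToyAveragingOptimizer are hypothetical stand-ins, not TensorFlow or TensorFlow Addons classes.

```python
# Toy model of the slot lookup; every class here is a hypothetical stand-in.

class ToyVariable:
    def __init__(self, name, value, trainable=True):
        self.name = name
        self.value = value
        self.trainable = trainable  # per-variable flag, set once at creation

    def assign(self, value):
        self.value = value


class ToyLayer:
    def __init__(self, variables):
        self._variables = list(variables)
        self.trainable = True  # layer-level switch, separate from the variable flag

    @property
    def variables(self):
        return list(self._variables)

    @property
    def trainable_variables(self):
        if not self.trainable:
            return []  # a frozen layer reports no trainable variables
        return [v for v in self._variables if v.trainable]


class ToyModel:
    def __init__(self, layers):
        self.layers = list(layers)

    @property
    def variables(self):
        return [v for layer in self.layers for v in layer.variables]

    @property
    def trainable_variables(self):
        return [v for layer in self.layers for v in layer.trainable_variables]


class ToyAveragingOptimizer:
    def __init__(self):
        self._slots = {}

    def apply_updates(self, var_list):
        # Slots exist only for variables the optimizer was asked to update.
        for var in var_list:
            self._slots.setdefault(var.name, {"average": var.value})

    def get_slot(self, var, slot_name):
        return self._slots[var.name][slot_name]  # KeyError if no slot was created

    def assign_average_vars(self, var_list):
        for var in var_list:
            if var.trainable:  # still True for variables of a frozen layer
                var.assign(self.get_slot(var, "average"))


backbone = ToyLayer([ToyVariable("stem_conv/kernel", 1.0)])
backbone.trainable = False  # freeze the sub-model, as with vgg.trainable = False
head = ToyLayer([ToyVariable("label/kernel", 2.0)])
model = ToyModel([backbone, head])

optimizer = ToyAveragingOptimizer()
optimizer.apply_updates(model.trainable_variables)  # "training" touches the head only
head.variables[0].assign(10.0)  # simulate the weight drifting after the snapshot

try:
    optimizer.assign_average_vars(model.variables)
    failed_key = None
except KeyError as exc:
    failed_key = exc.args[0]

optimizer.assign_average_vars(model.trainable_variables)  # succeeds
print("missing slot for:", failed_key)
```

In this toy setup the lookup fails on the frozen backbone variable and succeeds once only the trainable variables are passed, which matches the behaviour reported above.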

I've set up a watch on new releases and look forward to adding it into my code.

Thanks for the help!
