Addons: seq2seq - how to add rnn cell after attention inside decoder in eager mode

Created on 7 Nov 2019 · 11 Comments · Source: tensorflow/addons

System information

OS Platform and Distribution: ubuntu
TensorFlow version and how it was installed: pip install tensorflow_gpu==2.0.0
TensorFlow-Addons version and how it was installed (source or binary): pip install tensorflow_addons==0.6.0
Python version: 3.6.8
Is GPU used? (yes/no): yes

I am trying to add an RNN cell after the attention mechanism, inside a decoder. This is typical in, for example, Tacotron.

Below is a minimal version of the code I am trying (with tensorflow imported as tf and tensorflow_addons as tfa):

attn_mech = tfa.seq2seq.LuongAttention(128)
attn_cell = tfa.seq2seq.AttentionWrapper(
    cell=tf.keras.layers.GRUCell(256),
    attention_mechanism=attn_mech,
)
decoder_cell = tf.keras.layers.StackedRNNCells([
    attn_cell,
    tf.keras.layers.GRUCell(128),
])  # fails on this line


sampler = tfa.seq2seq.TrainingSampler()
decoder = tfa.seq2seq.BasicDecoder(decoder_cell, sampler)

This fails with the following traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-229-14d37ee697e4> in <module>
      6 decoder_cell = tf.keras.layers.StackedRNNCells([
      7     attn_cell,
----> 8     tf.keras.layers.GRUCell(128),
      9 ])
     10 sampler = tfa.seq2seq.TrainingSample()

~/my-project/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/recurrent.py in __init__(self, cells, **kwargs)
     74         raise ValueError('All cells must have a `call` method. '
     75                          'received cells:', cells)
---> 76       if not hasattr(cell, 'state_size'):
     77         raise ValueError('All cells must have a '
     78                          '`state_size` attribute. '

~/my-project/venv/lib/python3.6/site-packages/tensorflow_addons/seq2seq/attention_wrapper.py in state_size(self)
   1807             cell_state=self._cell.state_size,
   1808             time=tf.TensorShape([]),
-> 1809             attention=self._get_attention_layer_size(),
   1810             alignments=self._item_or_tuple(
   1811                 a.alignments_size for a in self._attention_mechanisms),

~/my-project/venv/lib/python3.6/site-packages/tensorflow_addons/seq2seq/attention_wrapper.py in _get_attention_layer_size(self)
   1752         if self._attention_layer_size is not None:
   1753             return self._attention_layer_size
-> 1754         self._attention_mechanisms_checks()
   1755         attention_output_sizes = (
   1756             attention_mechanism.values.shape[-1]

~/my-project/venv/lib/python3.6/site-packages/tensorflow_addons/seq2seq/attention_wrapper.py in _attention_mechanisms_checks(self)
   1733         for attention_mechanism in self._attention_mechanisms:
   1734             if not attention_mechanism.memory_initialized:
-> 1735                 raise ValueError("The AttentionMechanism instances passed to "
   1736                                  "this AttentionWrapper should be initialized "
   1737                                  "with a memory first, either by passing it "

ValueError: The AttentionMechanism instances passed to this AttentionWrapper should be initialized with a memory first, either by passing it to the AttentionMechanism constructor or calling attention_mechanism.setup_memory()

This is because StackedRNNCells checks that each of its constituent cells has a state_size attribute by calling hasattr(cell, 'state_size'). That call actually evaluates cell.state_size, and hasattr only treats an AttributeError as "attribute missing". Here the property raises a ValueError instead, because the state size depends on the memory, which hasn't been set up yet, so the error propagates out of the constructor.
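
The behaviour can be reproduced without TensorFlow. The sketch below uses a hypothetical LazyCell class (my own stand-in, not part of Keras or Addons) whose state_size property raises ValueError until a flag is set, roughly mimicking an attention wrapper without memory. It only illustrates how hasattr interacts with a property getter in Python 3.

```python
class LazyCell:
    """Hypothetical stand-in for a cell whose state_size needs prior setup."""

    def __init__(self):
        self.memory_initialized = False

    @property
    def state_size(self):
        # Mimics a property that cannot be computed before setup.
        if not self.memory_initialized:
            raise ValueError("memory must be set up first")
        return 128


def check_state_size(cell):
    """Describe what hasattr(cell, 'state_size') does for this cell."""
    try:
        return "present" if hasattr(cell, "state_size") else "missing"
    except ValueError as err:
        # hasattr ran the property getter, and its ValueError escaped.
        return "raised ValueError: {}".format(err)


cell = LazyCell()
before_setup = check_state_size(cell)
cell.memory_initialized = True
after_setup = check_state_size(cell)
print(before_setup)
print(after_setup)
```

Before the flag is set, hasattr does not return False; it lets the ValueError through, which is the same shape of failure as in the traceback.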

I suppose I could move these cells inside the AttentionWrapper, but then I think it would only see the previous timestep's attention output.

Here is some non-eager mode code that creates RNN cells in the way I am trying:
https://github.com/keithito/tacotron/blob/00058d6cc466476badc085e01d097637e28e881c/models/tacotron.py#L64

seq2seq


matthen

Most helpful comment

I think we have a similar report for the high order structure as the state size, https://github.com/tensorflow/tensorflow/issues/34269. Will probably address this issue there as well.

Btw, I will take some long vacation and probably won't work much during the holiday season. Feel free to send a PR to TF if any of you have the time to fix the issue. Someone on TF team will handle the PR properly when I am away.

qlzh727 on 20 Nov 2019

👍2

All 11 comments

Apart from initializing the memory before creating the StackedRNNCells, I'm not sure we have a good way to make this work.

Instead, you could build the StackedRNNCells first with GRUCell only and then wrap the first cell with the AttentionWrapper to bypass the check. You can find an example here.

guillaumekln on 7 Nov 2019

Thanks for the tip! I think it's really the fault of the code in keras/layers/recurrent.py, which uses hasattr in __init__. Perhaps it could first check whether the cell is a subclass of AbstractRNNCell, which would avoid actually calling the property methods. But that's outside the scope of this repo.
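
To make the idea of "checking without calling the property" concrete, here is a rough pure-Python sketch. It is not what Keras does and not a proposed patch; LazyCell, PlainCell and declares_state_size are hypothetical names for illustration. It assumes state_size is either an instance attribute or defined on the class, and does not cover attributes provided through __getattr__.

```python
class LazyCell:
    """Hypothetical cell whose state_size property fails until set up."""

    getter_calls = 0  # counts how often the property getter actually runs

    @property
    def state_size(self):
        type(self).getter_calls += 1
        raise ValueError("memory must be set up first")


class PlainCell:
    """Hypothetical cell that stores state_size as an instance attribute."""

    def __init__(self, units):
        self.state_size = units


def declares_state_size(cell):
    """Check for state_size without evaluating a property getter."""
    # Instance attributes live in the instance __dict__.
    in_instance = "state_size" in getattr(cell, "__dict__", {})
    # Looking the name up on the class yields the property object itself,
    # so the getter is not called.
    on_class = hasattr(type(cell), "state_size")
    return in_instance or on_class


lazy_ok = declares_state_size(LazyCell())
plain_ok = declares_state_size(PlainCell(128))
missing = declares_state_size(object())
print(lazy_ok, plain_ok, missing, LazyCell.getter_calls)
```

The check reports the property as present while the getter counter stays at zero, so nothing that depends on the memory is evaluated at construction time.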

matthen on 8 Nov 2019

Thanks! Please file an issue on TF core to see if there should be an upstream fix.

seanpmorgan on 8 Nov 2019

Hi, I have the same issue as @matthen, with my StackedRNNCells built inside tf.keras.layers.RNN.

Following @guillaumekln's advice, I tried something like this:

cells = [tf.keras.layers.LSTMCell(13) for _ in range(3)]
layers = tf.keras.layers.RNN(cells)
layers.cell.cells[0] = _add_attention(layers.cell.cells[0])

but I am getting the following error at recurrent.py@576, when building self.state_spec:

TypeError: Dimension value must be integer or None or have an __index__ method, got [256, 256]

The failing code is:

self.state_spec = [
    InputSpec(shape=[None] + tensor_shape.as_shape(dim).as_list())
    for dim in state_size
]

Here, state_size looks like this:

state_size = {list: 3} [AttentionWrapperState(cell_state=[256, 256], attention=256, time=TensorShape([]), alignments=47, alignment_history=(), attention_state=47), [256, 256], [256, 256]]
 0 = {AttentionWrapperState: 6} AttentionWrapperState(cell_state=[256, 256], attention=256, time=TensorShape([]), alignments=47, alignment_history=(), attention_state=47)
 1 = {list: 2} [256, 256]
 2 = {list: 2} [256, 256]

Any advice, please?

I could not find any related issue raised in tensorflow/tensorflow. @qlzh727, would you please look into this when you have time?

Thanks.

georgesterpu on 19 Nov 2019

I got it to work by first creating a single cell with tf.keras.layers.StackedRNNCells. I'm not sure if that will help.

matthen on 20 Nov 2019

Thanks, @matthen. I get the same error during RNN.call(). It is that construct for self.state_spec in particular that doesn't accept an AttentionWrapperState. For brevity, this is what my code looks like when manually appending cells to an already built StackedRNNCells:

cells = [tf.keras.layers.LSTMCell(13) for _ in range(3)]
layers = tf.keras.layers.RNN(cells[1:])
wrapper = _add_attention(cells[0])
layers.cell.cells.append(wrapper)

Building worked in the previous snippet as well.
I do call setup_memory before RNN.call() so that the attention mechanism check passes. The problem is just that mixing LSTMCells with attention-wrapped cells is incompatible with RNN.

georgesterpu on 20 Nov 2019

It seems like a TensorFlow issue to me. They should probably do something like:

self.state_spec = nest.map_structure(
    lambda dim: InputSpec(shape=[None] + tensor_shape.as_shape(dim).as_list()),
    state_size)

guillaumekln on 20 Nov 2019

👍1

I think we have a similar report for the high order structure as the state size, https://github.com/tensorflow/tensorflow/issues/34269. Will probably address this issue there as well.

qlzh727 on 20 Nov 2019

👍2

@guillaumekln your code works; however, state_spec has a different structure from the original one.
Assume this state_size:

0 = {list: 2} [256, 256]
1 = {list: 2} [256, 256]
2 = {list: 2} [256, 256]

Current state_spec:

0 = {InputSpec} InputSpec(shape=[None, 256, 256], ndim=3)
1 = {InputSpec} InputSpec(shape=[None, 256, 256], ndim=3)
2 = {InputSpec} InputSpec(shape=[None, 256, 256], ndim=3)

Your state_spec:

0 = {list: 2} [InputSpec(shape=[None, 256], ndim=2), InputSpec(shape=[None, 256], ndim=2)]
1 = {list: 2} [InputSpec(shape=[None, 256], ndim=2), InputSpec(shape=[None, 256], ndim=2)]
2 = {list: 2} [InputSpec(shape=[None, 256], ndim=2), InputSpec(shape=[None, 256], ndim=2)]
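
The difference between the two structures can be sketched in plain Python. Everything here is a simplified stand-in that I am assuming for illustration: make_spec returns a plain shape list instead of an InputSpec, and map_structure is a tiny recursive helper that handles only lists and tuples, not the real nest.map_structure.

```python
def make_spec(dim):
    """Hypothetical stand-in for InputSpec: batch dimension followed by dim."""
    dims = list(dim) if isinstance(dim, (list, tuple)) else [dim]
    return [None] + dims


def map_structure(fn, structure):
    """Simplified stand-in for nest.map_structure (lists and tuples only)."""
    if isinstance(structure, (list, tuple)):
        return type(structure)(map_structure(fn, item) for item in structure)
    return fn(structure)


state_size = [[256, 256], [256, 256], [256, 256]]

# A comprehension over the top level treats each [256, 256] as one shape.
flat_spec = [make_spec(dim) for dim in state_size]

# Mapping over the leaves produces one spec per integer, keeping the nesting.
nested_spec = map_structure(make_spec, state_size)

print(flat_spec[0])
print(nested_spec[0])
```

With the comprehension, each entry becomes a single rank-3 shape; with the leaf-wise mapping, each entry becomes a pair of rank-2 shapes, which matches the two dumps above.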

georgesterpu on 20 Nov 2019

Please note that the PR above addresses @matthen's original issue, and not the state_spec one.
We'll need a separate fix for state_spec.

@guillaumekln when I try your snippet in graph mode, the following error is returned:
TypeError: Cannot iterate over a scalar tensor.

Update: I have slightly modified your code to address the Tensor issue:

def get_state_spec(dim):
    if isinstance(dim, ops.Tensor):
        return InputSpec(shape=[None] + dim.shape.as_list())
    else:
        return InputSpec(shape=[None] + tensor_shape.as_shape(dim).as_list())

self.state_spec = nest.map_structure(get_state_spec, state_size)

However, on the latest compatible nightly build (20191119), the following error is raised in the while loop, in graph mode only:

ValueError: Input tensor 'rnn_2/AttentionWrapperZeroState/zeros_3:0' enters the loop with shape (), but has shape (None, 1) after one iteration. To allow the shape to vary across iterations, use the `shape_invariants` argument of tf.while_loop to specify a less-specific shape.

I am not exactly sure if this is because of the code above.

georgesterpu on 21 Nov 2019

Hi @matthen, I have the same problem now. Were you able to solve it? I would be very grateful.