System information
pip install tensorflow_gpu==2.0.0
pip install tensorflow_addons==0.6.0
I am trying to add an RNN cell after the attention mechanism, inside a decoder. This is typical, e.g. in Tacotron.
Below is a minimal version of the code I am trying:
attn_mech = tfa.seq2seq.LuongAttention(128)
attn_cell = tfa.seq2seq.AttentionWrapper(
    cell=tf.keras.layers.GRUCell(256),
    attention_mechanism=attn_mech,
)
decoder_cell = tf.keras.layers.StackedRNNCells([
    attn_cell,
    tf.keras.layers.GRUCell(128),
])  # fails on this line
sampler = tfa.seq2seq.TrainingSampler()
decoder = tfa.seq2seq.BasicDecoder(decoder_cell, sampler)
This fails with the following traceback:
ValueError Traceback (most recent call last)
<ipython-input-229-14d37ee697e4> in <module>
6 decoder_cell = tf.keras.layers.StackedRNNCells([
7 attn_cell,
----> 8 tf.keras.layers.GRUCell(128),
9 ])
10 sampler = tfa.seq2seq.TrainingSample()
~/my-project/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/layers/recurrent.py in __init__(self, cells, **kwargs)
74 raise ValueError('All cells must have a `call` method. '
75 'received cells:', cells)
---> 76 if not hasattr(cell, 'state_size'):
77 raise ValueError('All cells must have a '
78 '`state_size` attribute. '
~/my-project/venv/lib/python3.6/site-packages/tensorflow_addons/seq2seq/attention_wrapper.py in state_size(self)
1807 cell_state=self._cell.state_size,
1808 time=tf.TensorShape([]),
-> 1809 attention=self._get_attention_layer_size(),
1810 alignments=self._item_or_tuple(
1811 a.alignments_size for a in self._attention_mechanisms),
~/my-project/venv/lib/python3.6/site-packages/tensorflow_addons/seq2seq/attention_wrapper.py in _get_attention_layer_size(self)
1752 if self._attention_layer_size is not None:
1753 return self._attention_layer_size
-> 1754 self._attention_mechanisms_checks()
1755 attention_output_sizes = (
1756 attention_mechanism.values.shape[-1]
~/my-project/venv/lib/python3.6/site-packages/tensorflow_addons/seq2seq/attention_wrapper.py in _attention_mechanisms_checks(self)
1733 for attention_mechanism in self._attention_mechanisms:
1734 if not attention_mechanism.memory_initialized:
-> 1735 raise ValueError("The AttentionMechanism instances passed to "
1736 "this AttentionWrapper should be initialized "
1737 "with a memory first, either by passing it "
ValueError: The AttentionMechanism instances passed to this AttentionWrapper should be initialized with a memory first, either by passing it to the AttentionMechanism constructor or calling attention_mechanism.setup_memory()
This is because StackedRNNCells checks that each of its constituent cells has a state_size property by calling hasattr(cell, 'state_size'). That call actually evaluates cell.state_size, and since hasattr only suppresses AttributeError, the ValueError propagates. The property raises because the state size depends on the memory, which hasn't been set up yet.
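To illustrate the mechanism outside TensorFlow, here is a minimal pure-Python sketch. LazyCell and check_state_size are hypothetical names invented for this example, not part of Keras or TensorFlow Addons. It only shows that hasattr runs the property getter and lets a ValueError escape:

```python
class LazyCell:
    """Hypothetical stand-in for a cell whose state_size needs prior setup."""

    def __init__(self):
        self.memory_initialized = False

    @property
    def state_size(self):
        # Mimics a property that cannot be computed before setup.
        if not self.memory_initialized:
            raise ValueError("memory must be set up first")
        return 128


def check_state_size(cell):
    """Run the hasattr check and report what happened (hypothetical helper)."""
    try:
        # hasattr calls the property getter; only AttributeError is swallowed.
        return hasattr(cell, "state_size")
    except ValueError as exc:
        return "raised: {}".format(exc)


cell = LazyCell()
before = check_state_size(cell)   # the getter raises, so hasattr raises too
cell.memory_initialized = True
after = check_state_size(cell)    # the getter succeeds, so hasattr is True
print(before)
print(after)
```

The same check succeeds once the property can be evaluated, which is consistent with the idea of setting up the memory before building the stacked cell.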
I suppose I could move these cells inside the AttentionWrapper, but then I think it would only see the previous timestep's attention output.
Here is some non-eager mode code that creates RNN cells in the way I am trying:
https://github.com/keithito/tacotron/blob/00058d6cc466476badc085e01d097637e28e881c/models/tacotron.py#L64
Apart from initializing the memory before creating the StackedRNNCells, I'm not sure we have a good way to make this work.
Instead, you could build the StackedRNNCells first with GRUCell only and then wrap the first cell with the AttentionWrapper to bypass the check. You can find an example here.
Thanks for the tip! I think it's really the fault of the code in keras/layers/recurrent.py using hasattr in __init__. Perhaps it could first check whether the cell is a subclass of AbstractRNNCell, which would avoid calling the property methods. But that's outside the scope of this repo.
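A rough sketch of that idea in plain Python follows. AbstractCell, AttentionLikeCell, PlainCell and has_state_size are hypothetical names for illustration only, and this is not the actual Keras code. The assumption is that an isinstance test placed before hasattr short-circuits, so the property getter is never run for known cell subclasses:

```python
class AbstractCell:
    """Hypothetical stand-in for an abstract RNN cell base class."""

    @property
    def state_size(self):
        raise NotImplementedError


class AttentionLikeCell(AbstractCell):
    """Hypothetical cell whose state_size cannot be read before setup."""

    @property
    def state_size(self):
        raise ValueError("memory must be set up first")


class PlainCell:
    """Duck-typed cell exposing state_size as a plain instance attribute."""

    def __init__(self, units):
        self.state_size = units


def has_state_size(cell):
    # isinstance is tested first; `or` short-circuits, so the property of a
    # known cell subclass is never evaluated.
    return isinstance(cell, AbstractCell) or hasattr(cell, "state_size")


results = [
    has_state_size(AttentionLikeCell()),  # subclass: accepted without reading the property
    has_state_size(PlainCell(128)),       # duck-typed: falls back to hasattr
    has_state_size(object()),             # no state_size at all
]
print(results)
```

Duck-typed cells still go through hasattr, so this would only avoid the early evaluation for cells that inherit from the base class.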
Thanks! Please file an issue on TF core to see if there should be an upstream fix.
Hi, I have the same issue as @matthen, with my StackedRNNCells built inside tf.keras.layers.RNN.
Following @guillaumekln's advice, I tried something like this:
cells = [tf.keras.layers.LSTMCell(13) for _ in range(3)]
layers = tf.keras.layers.RNN(cells)
layers.cell.cells[0] = _add_attention(layers.cell.cells[0])
but am getting the following error: TypeError: Dimension value must be integer or None or have an __index__ method, got [256, 256] in recurrent.py@576 when building self.state_spec:
self.state_spec = [
    InputSpec(shape=[None] + tensor_shape.as_shape(dim).as_list())
    for dim in state_size
]
Here, state_size looks like this:
state_size = {list: 3} [AttentionWrapperState(cell_state=[256, 256], attention=256, time=TensorShape([]), alignments=47, alignment_history=(), attention_state=47), [256, 256], [256, 256]]
0 = {AttentionWrapperState: 6} AttentionWrapperState(cell_state=[256, 256], attention=256, time=TensorShape([]), alignments=47, alignment_history=(), attention_state=47)
1 = {list: 2} [256, 256]
2 = {list: 2} [256, 256]
Any advice, please?
I could not find any related issue raised in tensorflow/tensorflow. @qlzh727, would you please look into this when you have time?
Thanks.
I got it to work by first creating a single cell with tf.keras.layers.StackedRNNCells. I'm not sure if that will help.
Thanks, @matthen. I get the same error during RNN.call(). It is that construct for self.state_spec in particular that doesn't accept an AttentionWrapperState. For brevity, this is what my code looks like when manually appending cells to an already built StackedRNNCells:
cells = [tf.keras.layers.LSTMCell(13) for _ in range(3)]
layers = tf.keras.layers.RNN(cells[1:])
wrapper = _add_attention(cells[0])
layers.cell.cells.append(wrapper)
Building worked in the previous snippet as well.
I do call setup_memory before RNN.call() so that the attention mechanism check passes. The problem is that mixing LSTMCells with AttentionWrapper cells is incompatible with RNN.
It seems like a TensorFlow issue to me. They should probably do something like:
self.state_spec = nest.map_structure(
    lambda dim: InputSpec(shape=[None] + tensor_shape.as_shape(dim).as_list()),
    state_size)
I think we have a similar report about using a higher-order structure as the state size: https://github.com/tensorflow/tensorflow/issues/34269. I will probably address this issue there as well.
By the way, I will be taking a long vacation and probably won't work much during the holiday season. Feel free to send a PR to TF if any of you have the time to fix the issue. Someone on the TF team will handle the PR properly while I am away.
@guillaumekln your code works, however state_spec has a different structure from the original one.
Assume this state_size:
0 = {list: 2} [256, 256]
1 = {list: 2} [256, 256]
2 = {list: 2} [256, 256]
current state_spec
0 = {InputSpec} InputSpec(shape=[None, 256, 256], ndim=3)
1 = {InputSpec} InputSpec(shape=[None, 256, 256], ndim=3)
2 = {InputSpec} InputSpec(shape=[None, 256, 256], ndim=3)
your state_spec
0 = {list: 2} [InputSpec(shape=[None, 256], ndim=2), InputSpec(shape=[None, 256], ndim=2)]
1 = {list: 2} [InputSpec(shape=[None, 256], ndim=2), InputSpec(shape=[None, 256], ndim=2)]
2 = {list: 2} [InputSpec(shape=[None, 256], ndim=2), InputSpec(shape=[None, 256], ndim=2)]
Please note that the PR above addresses @matthen's original issue, and not the state_spec one.
We'll need a separate fix for state_spec.
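The structural difference described above can be reproduced without TensorFlow. In the sketch below, flat_spec imitates the per-entry list comprehension and map_structure_sketch is a simplified, hypothetical stand-in for nest.map_structure. Plain lists such as [None, 256] stand in for InputSpec shapes, and WrapperState is a hypothetical two-field namedtuple rather than the real AttentionWrapperState:

```python
from collections import namedtuple

# Hypothetical, reduced stand-in for AttentionWrapperState.
WrapperState = namedtuple("WrapperState", ["cell_state", "attention"])


def flat_spec(entry):
    """One spec per top-level entry, like the list comprehension."""
    dims = list(entry) if isinstance(entry, (list, tuple)) else [entry]
    for d in dims:
        if not isinstance(d, int):
            raise TypeError("Dimension value must be an integer, got {!r}".format(d))
    return [None] + dims


def map_structure_sketch(fn, structure):
    """Apply fn to every leaf while keeping the nesting (simplified stand-in)."""
    if isinstance(structure, tuple) and hasattr(structure, "_fields"):
        return type(structure)(*(map_structure_sketch(fn, s) for s in structure))
    if isinstance(structure, (list, tuple)):
        return type(structure)(map_structure_sketch(fn, s) for s in structure)
    return fn(structure)


def leaf_spec(dim):
    return [None, dim]


# Plain LSTM-like state sizes: both approaches run but give different shapes.
lstm_sizes = [[256, 256], [256, 256]]
flat = [flat_spec(e) for e in lstm_sizes]
nested = map_structure_sketch(leaf_spec, lstm_sizes)

# Mixed state sizes: the flat approach fails on the nested entry.
mixed = [WrapperState(cell_state=[256, 256], attention=256), [256, 256]]
try:
    [flat_spec(e) for e in mixed]
    flat_error = None
except TypeError as exc:
    flat_error = str(exc)
nested_mixed = map_structure_sketch(leaf_spec, mixed)

print(flat)
print(nested)
print(flat_error)
print(nested_mixed)
```

The per-leaf mapping handles the nested entry, but it yields one spec per leaf instead of one per cell, which is the mismatch noted in this comment.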
@guillaumekln when I try your snippet in graph mode, the following error is returned:
TypeError: Cannot iterate over a scalar tensor.
Update: I have slightly modified your code to address the Tensor issue:
def get_state_spec(dim):
    if isinstance(dim, ops.Tensor):
        return InputSpec(shape=[None] + dim.shape.as_list())
    else:
        return InputSpec(shape=[None] + tensor_shape.as_shape(dim).as_list())

self.state_spec = nest.map_structure(get_state_spec, state_size)
but on the latest compatible nightly build (20191119) an error is returned in the while loop, graph mode only:
ValueError: Input tensor 'rnn_2/AttentionWrapperZeroState/zeros_3:0' enters the loop with shape (), but has shape (None, 1) after one iteration. To allow the shape to vary across iterations, use the `shape_invariants` argument of tf.while_loop to specify a less-specific shape.
I am not exactly sure if this is because of the code above.
Hi @matthen, I have the same problem now. Were you able to solve it? I would be very grateful...