Hello!
I'm trying to build an encoder-decoder model with an attention mechanism in TensorFlow.
I'm using the tensorflow_addons repository to reproduce and understand this model: https://www.tensorflow.org/addons/tutorials/networks_seq2seq_nmt
Unfortunately, there is not enough documentation on the BasicDecoder, Sampler and AttentionWrapper objects for me to use them confidently. The most explicit documentation I could find during my research is this one: https://medium.com/@dhirensk/tensorflow-addons-seq2seq-example-using-attention-and-beam-search-9f463b58bc6b
The fuzziest steps are where TrainingSampler() and GreedyEmbeddingSampler() are used, but the author does not go deeper into how samplers work. The only information I have to understand them is this piece of code from https://www.tensorflow.org/addons/api_docs/python/tfa/seq2seq/Sampler:
# Sampler instances are used by BasicDecoder. The normal usage of a sampler is like below:
sampler = Sampler(init_args)
(initial_finished, initial_inputs) = sampler.initialize(input_tensors)
cell_input = initial_inputs
cell_state = cell.get_initial_state(...)
for time_step in tf.range(max_output_length):
    cell_output, cell_state = cell(cell_input, cell_state)
    sample_ids = sampler.sample(time_step, cell_output, cell_state)
    (finished, cell_input, cell_state) = sampler.next_inputs(
        time_step, cell_output, cell_state, sample_ids)
    if tf.reduce_all(finished):
        break
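To make the loop above concrete, here is a small self-contained sketch in plain NumPy that mimics the initialize / sample / next_inputs protocol for greedy decoding with an embedding. It is a conceptual re-implementation, not tensorflow_addons code: the class ToyGreedyEmbeddingSampler, the toy tanh cell, the weights and all sizes are hypothetical. One difference from the snippet above: an output projection (w_out, standing in for BasicDecoder's output_layer) turns the cell output into logits before sample() is called.

```python
import numpy as np

class ToyGreedyEmbeddingSampler:
    """Hypothetical stand-in for GreedyEmbeddingSampler; not the tfa class."""

    def __init__(self, embedding):
        self.embedding = embedding  # [vocab_size, embedding_dim] lookup table

    def initialize(self, start_tokens, end_token):
        self.end_token = end_token
        finished = np.zeros(len(start_tokens), dtype=bool)
        # The first cell input is the embedding of the start token.
        return finished, self.embedding[start_tokens]

    def sample(self, time, outputs, state):
        # Greedy choice: the highest-scoring token ID per batch entry.
        return np.argmax(outputs, axis=-1)

    def next_inputs(self, time, outputs, state, sample_ids):
        finished = sample_ids == self.end_token
        # Embed the predicted IDs to build the next cell input.
        return finished, self.embedding[sample_ids], state


def toy_cell(inputs, state, w_in, w_rec):
    # Minimal RNN cell: output and new state are both tanh(x W_in + h W_rec).
    new_state = np.tanh(inputs @ w_in + state @ w_rec)
    return new_state, new_state


rng = np.random.default_rng(0)
batch_size, vocab_size, embedding_dim, hidden_size = 3, 6, 4, 5
embedding = rng.normal(size=(vocab_size, embedding_dim))
w_in = rng.normal(size=(embedding_dim, hidden_size))
w_rec = rng.normal(size=(hidden_size, hidden_size))
w_out = rng.normal(size=(hidden_size, vocab_size))  # plays the role of output_layer

sampler = ToyGreedyEmbeddingSampler(embedding)
finished, cell_input = sampler.initialize(np.full(batch_size, 1), end_token=2)
cell_state = np.zeros((batch_size, hidden_size))
all_finished = finished.copy()
predictions = []
for time_step in range(10):  # max_output_length = 10
    cell_output, cell_state = toy_cell(cell_input, cell_state, w_in, w_rec)
    logits = cell_output @ w_out
    sample_ids = sampler.sample(time_step, logits, cell_state)
    predictions.append(sample_ids)
    finished, cell_input, cell_state = sampler.next_inputs(
        time_step, logits, cell_state, sample_ids)
    # A sequence that has emitted end_token stays finished.
    all_finished |= finished
    if all_finished.all():
        break

predictions = np.stack(predictions, axis=1)  # [batch_size, decoded_steps]
print(predictions)
```

The sampler owns exactly two decisions: which ID to pick from the logits, and how to turn that ID into the next input. Everything else (running the cell, tracking finished sequences) belongs to the decoding loop.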
Furthermore, my model does not contain embedding layers because my input vectors don't need them. So I suppose I have to use another sampler instead of GreedyEmbeddingSampler() during testing/inference.
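When the decoder inputs are already real-valued (no token IDs), the "sample" step can simply return the predicted vector, and next_inputs can feed it back as the next input. The NumPy sketch below illustrates that idea with a hypothetical ToyRegressionSampler, a fixed prediction horizon as the stop rule, and a toy "cell" that predicts 1% growth; these are assumptions for illustration, not tfa code. In tensorflow_addons, tfa.seq2seq.InferenceSampler accepts user-supplied sampling and stopping functions and may fit this case, but check its API page for the exact arguments before relying on it.

```python
import numpy as np

class ToyRegressionSampler:
    """Hypothetical sampler for real-valued sequences: no embedding, no argmax."""

    def __init__(self, horizon):
        self.horizon = horizon  # number of steps to predict (assumed fixed)

    def initialize(self, start_inputs):
        finished = np.zeros(start_inputs.shape[0], dtype=bool)
        return finished, start_inputs

    def sample(self, time, outputs, state):
        # The "sample" is the predicted value itself.
        return outputs

    def next_inputs(self, time, outputs, state, sample_ids):
        finished = np.full(outputs.shape[0], time + 1 >= self.horizon)
        # Feed the prediction back as the next input (autoregressive decoding).
        return finished, sample_ids, state


def toy_cell(inputs, state):
    # Stand-in for an RNN cell plus a Dense(features) head: predicts 1% growth.
    return inputs * 1.01, state


last_prices = np.array([[100.0], [50.0]])  # [batch, features], hypothetical data
sampler = ToyRegressionSampler(horizon=3)
finished, cell_input = sampler.initialize(last_prices)
state = None
forecast = []
for t in range(10):
    outputs, state = toy_cell(cell_input, state)
    prediction = sampler.sample(t, outputs, state)
    forecast.append(prediction)
    finished, cell_input, state = sampler.next_inputs(t, outputs, state, prediction)
    if finished.all():
        break

forecast = np.stack(forecast, axis=1)  # [batch, horizon, features]
print(forecast[:, :, 0])
```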
I hope I was clear enough and that someone can help me understand.
Hi,
You can find a complete example in the README of the seq2seq module (see the second code block, which mentions "TF 2.0, new style").
Also, the tests are usually a good place to find additional examples. For example, the test test_dynamic_decode_rnn uses TrainingSampler and BasicDecoder.
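For comparison with the greedy loop, a minimal sketch of what a training-time sampler does, assuming TrainingSampler follows standard teacher forcing: the input at step t+1 is read from the ground-truth sequence, whatever the model predicted at step t. ToyTrainingSampler, the random logits and the data are hypothetical; this is not the tfa implementation.

```python
import numpy as np

class ToyTrainingSampler:
    """Hypothetical teacher-forcing sampler; a sketch, not the tfa class."""

    def initialize(self, inputs, sequence_length):
        # inputs: [batch, time, features] ground-truth decoder inputs.
        self.inputs = inputs
        self.sequence_length = sequence_length
        finished = sequence_length == 0
        return finished, inputs[:, 0]

    def sample(self, time, outputs, state):
        # Predictions are still computed (e.g. for the loss) but not fed back.
        return np.argmax(outputs, axis=-1)

    def next_inputs(self, time, outputs, state, sample_ids):
        next_time = time + 1
        finished = next_time >= self.sequence_length
        # Read the ground truth for the next step; the index is clipped so the
        # lookup stays in range once every sequence is finished.
        index = min(next_time, self.inputs.shape[1] - 1)
        return finished, self.inputs[:, index], state


rng = np.random.default_rng(0)
batch_size, max_time, features, vocab_size = 2, 4, 3, 5
targets = rng.normal(size=(batch_size, max_time, features))
lengths = np.array([4, 2])

sampler = ToyTrainingSampler()
finished, cell_input = sampler.initialize(targets, lengths)
fed_inputs = [cell_input]
for t in range(max_time - 1):
    logits = rng.normal(size=(batch_size, vocab_size))  # stand-in for cell + output layer
    sample_ids = sampler.sample(t, logits, None)
    finished, cell_input, _ = sampler.next_inputs(t, logits, None, sample_ids)
    fed_inputs.append(cell_input)
    if finished.all():
        break
fed_inputs = np.stack(fed_inputs, axis=1)  # identical to targets: predictions never fed back
```

This is why the training and inference samplers differ: during training the next input is known in advance, while at inference it must be derived from the model's own output.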
Thanks for your answer, but I already have a training part in my model. I was looking for the test/inference part when I don't use GreedyEmbeddingSampler(). Also, these links don't explain the purpose of the Sampler object.
Am I insane, or is there a fundamental problem with the 2.0 interface, in that the Decoder.call function requires the input? Shouldn't the decoder input, by definition, be unknown? In the README, the input is just a variable decoder_inputs that never gets defined, and the seq2seq tutorial shows that you need to manually call initialize() and step() through the network for evaluation, which doesn't seem at all helpful compared to just calling an RNN cell directly. Am I missing something here?
@tessanix Sampler objects sample word IDs from the output distribution (e.g. using argmax) and produce the input vector for the next decoding step.
In your case, how would you produce the input vector from the predicted word ID without using an embedding layer?
@phsyron Usually you don't need to manually call initialize and step unless you need to implement a custom decoding loop. It seems the tutorial is a bad example and should be updated with better practices.
See the beam search example at the end of the README. Here, the decoder call method runs the full decoding and you don't need to pass the full input, just the start_tokens.
@guillaumekln that's closer to what I was expecting in terms of behavior, except that (1) it also seems to be completely undocumented, and that example also passes an undefined embedding_decoder as the first argument to call(), and (2) I don't want to do beam search decoding (unless we mean different things by this term; again, there's no indication of what this class does).
Is there any way to just pass the initial value to, e.g., BasicDecoder?
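On the naming question in the comment above: "beam search" generally refers to keeping the beam_width highest-scoring partial sequences at each step, instead of committing to the single argmax token as greedy search does. Below is a minimal sketch; toy_log_probs is a hypothetical model whose next-token distribution depends on the prefix, and real implementations such as tfa's BeamSearchDecoder additionally handle end tokens, length penalties and batching.

```python
import numpy as np

def beam_search(log_probs_fn, beam_width, num_steps):
    """Keep the beam_width best prefixes ranked by total log-probability."""
    beams = [((), 0.0)]  # (prefix of token IDs, cumulative log-prob)
    for _ in range(num_steps):
        candidates = []
        for prefix, score in beams:
            for token, log_prob in enumerate(log_probs_fn(prefix)):
                candidates.append((prefix + (token,), score + log_prob))
        # Retain only the best beam_width candidates for the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_log_probs(prefix):
    # Hypothetical model: the next-token distribution depends on the prefix.
    if not prefix:
        probs = [0.55, 0.4, 0.05]
    elif prefix[0] == 0:
        probs = [0.34, 0.33, 0.33]
    else:
        probs = [0.9, 0.05, 0.05]
    return np.log(probs)


greedy_best = beam_search(toy_log_probs, beam_width=1, num_steps=2)[0]
beam_best = beam_search(toy_log_probs, beam_width=2, num_steps=2)[0]
print(greedy_best[0], np.exp(greedy_best[1]))
print(beam_best[0], np.exp(beam_best[1]))
```

With these numbers, greedy search (beam_width=1) commits to token 0 first and ends with (0, 0) at probability 0.55 × 0.34 ≈ 0.187, while beam_width=2 keeps the runner-up prefix (1,) alive and finds (1, 0) at 0.4 × 0.9 = 0.36.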
The usage is similar. Here is a small example using a BasicDecoder to run greedy search:
import tensorflow as tf
import tensorflow_addons as tfa
batch_size = 4
hidden_size = 32
vocab_size = 64
start_token_id = 1
end_token_id = 2
embedding_layer = tf.keras.layers.Embedding(vocab_size, hidden_size)
decoder_cell = tf.keras.layers.LSTMCell(hidden_size)
output_layer = tf.keras.layers.Dense(vocab_size)
# At each step, the sampler takes the argmax over the output_layer logits
# and embeds the predicted IDs to form the next decoder input.
sampler = tfa.seq2seq.GreedyEmbeddingSampler(embedding_layer)
decoder = tfa.seq2seq.BasicDecoder(
    decoder_cell, sampler, output_layer, maximum_iterations=10
)
start_tokens = tf.fill([batch_size], start_token_id)
initial_state = decoder_cell.get_initial_state(batch_size=batch_size, dtype=tf.float32)
# No ground-truth inputs at inference time: pass None and start from start_tokens.
# Each sequence finishes at end_token_id; decoding runs for at most maximum_iterations steps.
final_output, final_state, final_lengths = decoder(
    None, start_tokens=start_tokens, end_token=end_token_id, initial_state=initial_state
)
print(final_output.sample_id)  # predicted token IDs, shape [batch_size, decoded_length]
We pass None as the input because we don't know the full input as you mentioned.
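The final_lengths returned by the decoder call above hold one decoded length per batch entry. The NumPy sketch below reproduces that bookkeeping under an assumption worth checking against your tfa version: a sequence's length includes the step at which it emitted end_token, and sequences that never emit it are cut at maximum_iterations. decoded_lengths is a hypothetical helper, not part of tfa.

```python
import numpy as np

def decoded_lengths(sample_ids, end_token, maximum_iterations):
    """Hypothetical helper: per-sequence lengths from a [batch, time] ID matrix."""
    batch_size, max_time = sample_ids.shape
    steps = min(max_time, maximum_iterations)
    lengths = np.zeros(batch_size, dtype=int)
    finished = np.zeros(batch_size, dtype=bool)
    for t in range(steps):
        # A sequence that is still running at step t gets length t + 1.
        lengths = np.where(finished, lengths, t + 1)
        # Once end_token is emitted, the sequence stays finished.
        finished |= sample_ids[:, t] == end_token
    return lengths


ids = np.array([[5, 7, 2, 9, 9],
                [2, 4, 4, 4, 4],
                [3, 3, 3, 3, 3]])
print(decoded_lengths(ids, end_token=2, maximum_iterations=4))
```

Row 0 emits the end token at the third step, row 1 at the first step, and row 2 never does, so it is cut at maximum_iterations.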
I agree that the documentation should be completed...
I feel like maybe we're imagining completely different use cases for this. I'm imagining a model that I can pass a thought vector (i.e. an initial state) to, and have it create a sequence as output. This means my initial_state comes from an Input layer. Since this is symbolic, I can't pass None as the "input" to BasicDecoder, because Keras Layer.__call__ raises a ValueError if it gets a mix of symbolic and non-symbolic tensors (Input being symbolic and None not). Am I going about this the wrong way?
@tessanix
Sampler objects sample word IDs from the output distribution (e.g. using argmax) and produce the input vector for the next decoding step.
In your case, how would you produce the input vector from the predicted word ID without using an embedding layer?
Sorry for my late answer...
My input is a vector of stock price data. I understand that embedding layers are useful for mapping integer IDs to float vectors, but in my context there is no need for that mapping because stock price data are already floats.
In Luong's method, the decoder input is a concatenation of the context vector and the current decoder hidden state, and the context vector is the sum of the encoder hidden states weighted by the attention weights, right? So... I don't see where sampling (finding sample_ids) is useful.
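A sketch of one Luong attention step, using the simplest (dot-product) score, may help separate the two roles. Attention runs inside the cell: scores = encoder_outputs · h_t, weights = softmax(scores), context = weighted sum of encoder outputs, attentional state = tanh(W_c [context; h_t]). With "input feeding", that attentional state is concatenated with the next step's input; tfa's AttentionWrapper does a similar concatenation of its previous attention with the incoming inputs by default, so the "inputs" part still has to come from somewhere, and that is what the sampler provides. For real-valued data it can be the previous prediction rather than a sampled ID. All weights, shapes and the previous_prediction value below are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def luong_attention_step(h_t, encoder_outputs, w_c):
    """One step of Luong dot-product attention (sketch).

    h_t: [batch, units] decoder hidden state at step t.
    encoder_outputs: [batch, src_len, units].
    w_c: [2 * units, units] projection for the attentional state.
    """
    scores = np.einsum("bsu,bu->bs", encoder_outputs, h_t)       # dot-product scores
    weights = softmax(scores)                                     # alignment over source steps
    context = np.einsum("bs,bsu->bu", weights, encoder_outputs)  # weighted sum
    attentional = np.tanh(np.concatenate([context, h_t], axis=-1) @ w_c)
    return attentional, weights


rng = np.random.default_rng(0)
batch, src_len, units = 2, 5, 4
h_t = rng.normal(size=(batch, units))
encoder_outputs = rng.normal(size=(batch, src_len, units))
w_c = rng.normal(size=(2 * units, units))
attentional, weights = luong_attention_step(h_t, encoder_outputs, w_c)

# With input feeding, the next cell input is [previous prediction; attentional].
# The previous_prediction part is what a sampler supplies (here: a last predicted price).
previous_prediction = rng.normal(size=(batch, 1))
next_cell_input = np.concatenate([previous_prediction, attentional], axis=-1)
```

So the context vector does not replace the sampler: attention decides what to read from the encoder, while the sampler decides what the decoder receives as its own input at the next step.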