The paper "Attention Is All You Need" explains that the source sentence is position-encoded and fed into the first layer, which generates the key-value pairs. But what sort of encoding is given as input to the position encoder? How are the word embeddings of the source sentence encoded without an RNN? Please help.
The symbol modality takes the sparse token ID values and embeds the words, then passes both the IDs and the embedded tensors to the model under the "inputs_raw" and "inputs" keys in the features dict.
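A rough pure-Python sketch of that lookup step, assuming a tiny vocabulary and depth for illustration. The helpers `make_embedding_table` and `embed_inputs` are hypothetical stand-ins, not the tensor2tensor API. Only the two feature keys come from the thread, and training of the table is not shown:

```python
import random

def make_embedding_table(vocab_size, depth, seed=0):
    # Hypothetical helper: randomly initialise one vector per token ID.
    # In a real model these rows are trainable parameters.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, depth ** -0.5) for _ in range(depth)]
            for _ in range(vocab_size)]

def embed_inputs(token_ids, table):
    # Hypothetical stand-in for the symbol modality: look up each ID's row.
    embedded = [list(table[i]) for i in token_ids]
    # Keep both the sparse IDs and the dense embeddings, mirroring the
    # "inputs_raw" / "inputs" keys described above.
    return {"inputs_raw": list(token_ids), "inputs": embedded}

table = make_embedding_table(vocab_size=10, depth=4)
features = embed_inputs([3, 1, 3], table)
print(len(features["inputs"]), len(features["inputs"][0]))  # 3 4
```

The same token ID always maps to the same vector, so the embeddings alone carry no information about word order.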
These embeddings are randomly initialised and trained in situ with the rest of the model. This could be changed by using a different modality (or writing your own).
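To connect this back to the question about position encoding without an RNN: the paper adds a fixed sinusoidal signal to the embeddings, with PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)), so identical tokens at different positions end up with different input vectors. Below is a minimal sketch of that formula, interleaving sin/cos as the paper writes it. Other implementations may lay out the sin and cos halves differently, and the zero "embeddings" here are placeholders:

```python
import math

def positional_encoding(length, depth):
    # Sinusoidal encoding from the paper: even dims use sin, odd dims use cos,
    # and each sin/cos pair shares the frequency 1 / 10000^(2i/depth).
    pe = []
    for pos in range(length):
        row = []
        for j in range(depth):
            i = j // 2
            angle = pos / (10000 ** (2 * i / depth))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def add_positions(embedded):
    # Add the fixed positional signal element-wise to each token's embedding.
    length, depth = len(embedded), len(embedded[0])
    pe = positional_encoding(length, depth)
    return [[e + p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(embedded, pe)]

# Placeholder: three identical (all-zero) embeddings of depth 4.
embedded = [[0.0] * 4 for _ in range(3)]
encoded = add_positions(embedded)
print(encoded[0])  # [0.0, 1.0, 0.0, 1.0]
```

Because the signal is added rather than computed recurrently, every position can be processed in parallel by the attention layers.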
Hope this helps.
@kaushalshetty Is this issue solved now? If so, I guess we could close it :)
@stefan-it Yes I will close it. Thanks.