I have a question about the implementation of the position embedding. It seems like the position encoding is randomly initialized and updated during training, just like the token embeddings. What confuses me is: how does this approach learn position-specific information? Can you point out what I misunderstood?
There are two kinds of positional embeddings.
The first kind is learned embeddings [1], which learn a separate embedding for each position in the input. For example, if your sentence is:
```
words:              the    cat    sat    on     the    mat
positions:          0      1      2      3      4      5
input to network:   emb(the)+emb(pos0)  emb(cat)+emb(pos1)  emb(sat)+emb(pos2)  ...
```
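This is what answers your question: position 0 always looks up row 0 of the embedding table, position 1 always looks up row 1, and so on, so each row only ever receives gradients from tokens appearing at that position and gradually specializes to it. Here is a minimal sketch of the idea (not fairseq's actual module, which additionally handles padding offsets; the class name and shapes are illustrative):

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    """Sketch: one trainable vector per position index."""

    def __init__(self, max_positions: int, embed_dim: int):
        super().__init__()
        # Randomly initialized and updated by backprop, just like token embeddings.
        self.pos_emb = nn.Embedding(max_positions, embed_dim)

    def forward(self, token_emb: torch.Tensor) -> torch.Tensor:
        # token_emb: (batch, seq_len, embed_dim)
        seq_len = token_emb.size(1)
        positions = torch.arange(seq_len, device=token_emb.device)
        # Row i of the table is only ever used at position i, so its
        # gradient updates encode information specific to that position.
        return token_emb + self.pos_emb(positions)
```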
The other kind is the sinusoidal embeddings introduced in the "Attention Is All You Need" paper. These are a fixed function of the position index, with no learned parameters [2].
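For reference, a minimal sketch of the sinusoidal variant (this uses the interleaved sin/cos layout from the paper; fairseq's actual module arranges the dimensions differently and handles padding, and this sketch assumes an even embed_dim):

```python
import math
import torch

def sinusoidal_positions(seq_len: int, embed_dim: int) -> torch.Tensor:
    # PE(pos, 2i)   = sin(pos / 10000^(2i/d))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
    position = torch.arange(seq_len).unsqueeze(1).float()        # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, embed_dim, 2).float()
                         * (-math.log(10000.0) / embed_dim))     # (embed_dim/2,)
    pe = torch.zeros(seq_len, embed_dim)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # no parameters: the values are fixed, nothing is learned
```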
[1] https://github.com/pytorch/fairseq/blob/master/fairseq/modules/learned_positional_embedding.py
[2] https://github.com/pytorch/fairseq/blob/master/fairseq/modules/sinusoidal_positional_embedding.py