Transformers: For Hugging Face Transformers' hidden_states output, is the first hidden state tensor that is returned the output of the embeddings?

Created on 8 Jan 2020 · 1 Comment · Source: huggingface/transformers

According to the Hugging Face Transformers documentation for the GPT2DoubleHeadsModel (under the 'output' section):

hidden_states: (optional, returned when config.output_hidden_states=True)
list of torch.FloatTensor (one for the output of each layer + the output of the embeddings)

So in this case, is the first hidden_states tensor that is returned (index 0) the output of the embeddings, or is the very last hidden_states tensor the output of the embeddings?

I am confused about the order in which the hidden_states tensors are returned, because the documentation seems to indicate that the output of the embeddings is the last hidden_states tensor that is returned.

Thank you,

h56cho

Most helpful comment

Indeed, the documentation might be misleading in that regard. The first value is the embedding output, and each following value is the result of passing the preceding value through one additional layer. I'll update the documentation shortly.
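To make the ordering concrete, here is a minimal pure-Python sketch of how such a tuple could be assembled. It is not the library's code: `collect_hidden_states`, `toy_layers`, and the list-of-floats "tensors" are hypothetical stand-ins chosen only to show which index holds what.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def collect_hidden_states(
    embedding_output: Vector, layers: Sequence[Layer]
) -> Tuple[Vector, ...]:
    """Return (embedding output, layer 1 output, ..., layer N output)."""
    hidden_states = [embedding_output]   # index 0: the embedding output
    current = embedding_output
    for layer in layers:
        current = layer(current)         # pass the previous state through one more layer
        hidden_states.append(current)    # index i: the output of layer i
    return tuple(hidden_states)


# Hypothetical stand-in "layers": each adds a constant so the states are easy to tell apart.
toy_layers = [lambda v, k=k: [x + k for x in v] for k in (1, 10, 100)]
embedding = [0.0, 0.5]

states = collect_hidden_states(embedding, toy_layers)
print(len(states))   # number of layers + 1
print(states[0])     # the embedding output, unchanged
print(states[-1])    # the output of the last layer
```

Under this ordering, a model with N layers yields N + 1 entries, so for GPT-2 one would expect `config.n_layer + 1` tensors, with index 0 being the embeddings.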