According to the Hugging Face Transformers documentation for GPT2DoubleHeadsModel (under the 'output' section):
hidden_states: (optional, returned when config.output_hidden_states=True)
list of torch.FloatTensor (one for the output of each layer + the output of the embeddings)
So in this case, would the first hidden_states tensor (index of 0) that is returned be the output of the embeddings, or would the very last hidden_states tensor that is returned be the output of the embeddings?
I am confused about the order in which the hidden_states tensors are returned, because the documentation seems to indicate that the output of the embeddings is the last hidden_states tensor that is returned.
Thank you,
Indeed, the documentation might be misleading in that regard. The first value is the embedding output, every following value is the result of the preceding value being passed through an additional layer. I'll update the documentation shortly.
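To make the ordering concrete, here is a minimal pure-Python sketch of how such a list is built. It is a toy model of the behaviour described above, not the library's implementation: `run_stack`, the lambda "layers", and the two-element vectors are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def run_stack(
    embedding_output: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
) -> List[Vector]:
    """Return [embedding_output, layer_1_output, ..., layer_N_output]."""
    # Index 0 holds the output of the embeddings.
    hidden_states = [embedding_output]
    for layer in layers:
        # Each layer consumes the previous entry and appends its own output.
        hidden_states.append(layer(hidden_states[-1]))
    return hidden_states


# Hypothetical stand-ins for transformer layers (simple arithmetic only).
layers = [
    lambda h: [x + 1.0 for x in h],
    lambda h: [x * 2.0 for x in h],
    lambda h: [x - 3.0 for x in h],
]

embedding_output = [0.5, 1.5]
hidden_states = run_stack(embedding_output, layers)

print(len(hidden_states))  # 4: one embedding output + three layer outputs
print(hidden_states[0])    # [0.5, 1.5], the embedding output
print(hidden_states[-1])   # [0.0, 2.0], the output of the last layer
```

Under this ordering, a model with N layers yields N + 1 entries: hidden_states[0] is the embedding output and hidden_states[-1] is the output of the final layer.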