This is the example given in the documentation of the transformers PyTorch library:
from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# output_hidden_states/output_attentions must be enabled, otherwise the model
# returns only (loss, scores) and the unpacking below fails
model = BertForTokenClassification.from_pretrained(
    'bert-base-uncased', output_hidden_states=True, output_attentions=True)

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)
loss, scores, hidden_states, attentions = outputs
Here hidden_states is a tuple of length 13 containing the hidden states of the model at the output of each layer, plus the initial embedding output. I would like to know whether hidden_states[0] or hidden_states[12] represents the final hidden-state vectors.
Thanks in advance @thomwolf @nreimers
AFAIK, hidden_states[12] does: the tuple is ordered from the initial embedding output at index 0 up to the output of the last of the 12 encoder layers at index 12.
For a detailed explanation, see
https://stackoverflow.com/questions/60847291/confusion-in-understanding-the-output-of-bertfortokenclassification-class-from-t
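A quick way to verify the ordering yourself (a minimal sketch, assuming the tuple-style outputs of transformers v2/v3; the sentence and checkpoint follow the example above):

import torch
from transformers import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForTokenClassification.from_pretrained(
    'bert-base-uncased', output_hidden_states=True)
model.eval()  # disable dropout so the two forward passes match

input_ids = torch.tensor(
    tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)
).unsqueeze(0)

with torch.no_grad():
    scores, hidden_states = model(input_ids)  # no labels, so no loss in the tuple
    last_hidden = model.bert(input_ids)[0]    # base encoder's final-layer output

print(len(hidden_states))                              # 13 = embeddings + 12 layers
print(torch.allclose(hidden_states[12], last_hidden))  # True: index 12 is the final layer
print(torch.allclose(hidden_states[0], last_hidden))   # False: index 0 is the embedding output

In other words, hidden_states[-1] (i.e. hidden_states[12]) is the same tensor that feeds the token-classification head.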