Transformers: Confusion in understanding the output of the BertForTokenClassification class from the Transformers library

Created on 25 Mar 2020 · 2 Comments · Source: huggingface/transformers

This is the example given in the documentation of the Transformers PyTorch library:

from transformers import BertTokenizer, BertForTokenClassification
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# output_hidden_states/output_attentions must be enabled, otherwise the model
# returns only (loss, scores) and the four-way unpacking below fails
model = BertForTokenClassification.from_pretrained(
    'bert-base-uncased', output_hidden_states=True, output_attentions=True
)

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)  # Batch size 1
labels = torch.tensor([1] * input_ids.size(1)).unsqueeze(0)  # Batch size 1
outputs = model(input_ids, labels=labels)

loss, scores, hidden_states, attentions = outputs
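For reference, here is what the four outputs look like for this single-sentence batch (a sketch; the shapes assume bert-base-uncased with its default num_labels=2 and the 8-token encoded sequence):

print(loss.shape)              # torch.Size([]) - scalar token-classification loss
print(scores.shape)            # torch.Size([1, 8, 2]) - (batch, seq_len, num_labels)
print(len(hidden_states))      # 13 - embedding output + one per encoder layer
print(hidden_states[0].shape)  # torch.Size([1, 8, 768]) - (batch, seq_len, hidden_size)
print(len(attentions))         # 12 - one per encoder layer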

Here, hidden_states is a tuple of length 13 containing the hidden states of the model at the output of each of the 12 layers, plus the initial embedding output. I would like to know whether hidden_states[0] or hidden_states[12] represents the final hidden-state vectors.

Thanks in advance @thomwolf @nreimers

Most helpful comment

AFAIK, 12 does
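
A quick way to confirm this is to compare against BertModel, whose first output is the final encoder layer's hidden states. A minimal sketch, assuming the same bert-base-uncased checkpoint and input as above:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)
model.eval()  # ensure dropout is off so the comparison is deterministic

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0)

with torch.no_grad():
    outputs = model(input_ids)

last_hidden_state = outputs[0]  # final encoder layer, shape (1, seq_len, 768)
hidden_states = outputs[2]      # tuple of 13: embedding output + 12 encoder layers

print(torch.equal(last_hidden_state, hidden_states[12]))  # True:  index 12 (i.e. -1) is the final layer
print(torch.equal(last_hidden_state, hidden_states[0]))   # False: index 0 is the embedding output

So hidden_states[0] is the embedding output and hidden_states[12] (equivalently hidden_states[-1]) holds the final hidden-state vectors.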

