How can I extract embeddings for a sentence or a set of words directly from pre-trained models (standard BERT)? For example, I am using spaCy for this at the moment, where I can do it as follows:
sentence vector:
sentence_vector = bert_model("This is an apple").vector
word_vectors:
words = bert_model("This is an apple")
word_vectors = [w.vector for w in words]
I am wondering if this is possible directly with huggingface pre-trained models (especially BERT).
You can use BertModel; it'll return the hidden states for the input sentence.
Found it, thanks @bkkaggle . Just for others who are looking for the same information.
Using PyTorch:
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)  # Batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0]  # The last hidden-state is the first element of the output tuple
Using TensorFlow:
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertModel.from_pretrained('bert-base-uncased')
input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :] # Batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0] # The last hidden-state is the first element of the output tuple
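To connect this back to the spaCy-style usage in the question: the last hidden states already give you one vector per (sub)token, and you can pool them into a sentence vector yourself. A minimal PyTorch sketch, assuming the indexable outputs shown above (mean pooling is just one common choice, not something the library prescribes):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

input_ids = torch.tensor(tokenizer.encode("This is an apple")).unsqueeze(0)  # (1, seq_len)
with torch.no_grad():
    last_hidden_states = model(input_ids)[0]  # (1, seq_len, 768)

word_vectors = last_hidden_states[0]                 # one 768-dim vector per (sub)token
sentence_vector = last_hidden_states[0].mean(dim=0)  # naive sentence vector: average over tokens
```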
This is a bit different for ...ForSequenceClassification models. I've found that outputs[0] contains the logits, and the only way to get the hidden_states is to set config.output_hidden_states=True when initializing the model. Only then was I able to get the hidden_states, which are located at outputs[1].
Example:
inputs = {
"input_ids": batch[0],
"attention_mask": batch[1]
}
output = bertmodel(**inputs)
logits = output[0]
hidden_states = output[1]
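For reference, one way to set that flag when loading a ...ForSequenceClassification model is through an explicit config; a rough sketch (the model name is just an example, and the classifier head will be randomly initialized):

```python
import torch
from transformers import BertTokenizer, BertConfig, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
config = BertConfig.from_pretrained('bert-base-uncased', output_hidden_states=True)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', config=config)

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)
outputs = model(input_ids)
logits = outputs[0]          # (1, num_labels)
hidden_states = outputs[1]   # tuple: embedding output + one tensor per layer, each (1, seq_len, 768)
```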
By using this code, you can obtain a PyTorch tensor of shape (1, N, 768), where _N_ is the number of tokens produced by BertTokenizer. If you want to build a sentence vector from these N token vectors, how do you do that? @engrsfi
I am interested in the last hidden states, which can be seen as a kind of embedding. I think you are referring to all hidden states, including the output of the embedding layer.
"**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
of shape ``(batch_size, sequence_length, hidden_size)``:
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
You can take an average of them. However, I think the embedding at the first position ([CLS]) is usually treated as a kind of sentence vector, because that is the one fed to a further classifier (if any) for downstream tasks. Disclaimer: I am not sure about it.
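Both options, side by side, in a small helper (just a sketch; neither is an "official" sentence embedding, and which works better is task-dependent):

```python
import torch

def sentence_vectors(last_hidden_states: torch.Tensor):
    """last_hidden_states: (batch_size, seq_len, hidden_size), e.g. outputs[0] from BertModel."""
    cls_vectors = last_hidden_states[:, 0, :]       # hidden state at the [CLS] position
    mean_vectors = last_hidden_states.mean(dim=1)   # plain average over all token positions
    return cls_vectors, mean_vectors
```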
Should be as simple as grabbing the last element in the list:
last_layer = hidden_states[-1]
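For the plain BertModel this checks out: when output_hidden_states=True, the last entry of that list is the same tensor as last_hidden_state. A quick sanity check, assuming the 2.x-style tuple outputs used throughout this thread (pass an explicit BertConfig instead if your version doesn't forward config kwargs):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_hidden_states=True)

input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)
with torch.no_grad():
    outputs = model(input_ids)

last_hidden_state = outputs[0]   # (1, seq_len, 768)
hidden_states = outputs[2]       # 13 tensors for bert-base: embeddings + 12 layers
assert torch.equal(hidden_states[-1], last_hidden_state)
```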
@maxzzze According to the documentation, one can get the last hidden states directly without setting this flag to True. See below.
https://huggingface.co/transformers/_modules/transformers/modeling_bert.html#BertModel
Outputs: `Tuple` comprising various elements depending on the configuration (config) and inputs:
**last_hidden_state**: ``torch.FloatTensor`` of shape ``(batch_size, sequence_length, hidden_size)``
Sequence of hidden-states at the output of the last layer of the model.
**pooler_output**: ``torch.FloatTensor`` of shape ``(batch_size, hidden_size)``
Last layer hidden-state of the first token of the sequence (classification token)
further processed by a Linear layer and a Tanh activation function. The Linear
layer weights are trained from the next sentence prediction (classification)
objective during Bert pretraining. This output is usually *not* a good summary
of the semantic content of the input, you're often better with averaging or pooling
the sequence of hidden-states for the whole input sequence.
**hidden_states**: (`optional`, returned when ``config.output_hidden_states=True``)
list of ``torch.FloatTensor`` (one for the output of each layer + the output of the embeddings)
of shape ``(batch_size, sequence_length, hidden_size)``:
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
**attentions**: (`optional`, returned when ``config.output_attentions=True``)
list of ``torch.FloatTensor`` (one for each layer) of shape ``(batch_size, num_heads, sequence_length, sequence_length)``:
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
BTW, for me, the shape of hidden_states in the code below is (batch_size, 768) when I set this flag to True; not sure if I can extract the last hidden states from that.
output = bertmodel(**inputs)
logits = output[0]
hidden_states = output[1]
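For what it's worth, that (batch_size, 768) shape is exactly what you'd expect if output[1] is the pooler output of the plain BertModel rather than the hidden states: with output_hidden_states=True, the standard BertModel's tuple is (last_hidden_state, pooler_output, hidden_states), so the hidden states sit one index further. A sketch of the indices, reusing the bertmodel/inputs names from the snippet above:

```python
outputs = bertmodel(**inputs)    # plain BertModel loaded with output_hidden_states=True
last_hidden_state = outputs[0]   # (batch_size, seq_len, 768)
pooler_output = outputs[1]       # (batch_size, 768) -- the shape observed above
hidden_states = outputs[2]       # tuple of (batch_size, seq_len, 768) tensors: embeddings + each layer
```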
I believe your comment is in reference to the standard models, but it's hard to tell without a link. Can you link to where in the documentation the pasted doc string is from?
I don't know if you saw my original comment, but I was providing an example of how to get hidden_states from the ...ForSequenceClassification models, not the standard ones. The ...ForSequenceClassification models do not output hidden_states by default: https://huggingface.co/transformers/model_doc/bert.html#bertforsequenceclassification
Sorry, I missed that part :) I am referring to the standard BertModel. Doc link:
https://huggingface.co/transformers/model_doc/bert.html#bertmodel
@engrsfi @maxzzze @bkkaggle
Please, look here. I hope it can help :)
@TheEdoardo93 is this example taking the first element in each of the hidden_states?
@engrsfi You can process the hidden states of BERT (all layers or only the last layer) in whatever way you want.
Most people only take the hidden state of the [CLS] token from the last layer; using the hidden states for all tokens or from multiple layers usually doesn't help you that much.
If you want to get the embeddings for classification, just do something like:
input_sentence = torch.tensor(tokenizer.encode("[CLS] My sentence")).unsqueeze(0)
out = model(input_sentence)
embeddings_of_last_layer = out[0]
cls_embeddings = embeddings_of_last_layer[0]
Do you have any reference as to "people usually only take the hidden states of the [CLS] token of the last layer"?
There is some clarification about the use of the last hidden states in the BERT Paper.
According to the paper, the last hidden state for [CLS] is mainly used for classification tasks and the last hidden states for all tokens are used for token level tasks such as sequence tagging or question answering.
From the paper:
At the output, the token representations are fed into an output layer for token level tasks, such as sequence tagging or question answering, and the [CLS] representation is fed into an output layer for classification, such as entailment or sentiment analysis.
Reference:
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/pdf/1810.04805.pdf)
What about ALBERT? The output of the last hidden state isn't the same as the embedding, because the doc says the embeddings have a size of 128 for every model (https://arxiv.org/pdf/1909.11942.pdf, page 6).
But I'm not sure if the 128-dimensional embedding referenced in the table is something used internally to represent words or the final word embedding.
128 is used internally by ALBERT. The output of the model (the last hidden state) gives you the actual word embeddings. To understand this better, you should read the following blog post from Google.
https://ai.googleblog.com/2019/12/albert-lite-bert-for-self-supervised.html
Quote:
"The key to optimizing performance, captured in the design of ALBERT, is to allocate the model’s capacity more efficiently. Input-level embeddings (words, sub-tokens, etc.) need to learn context-independent representations, a representation for the word “bank”, for example. In contrast, hidden-layer embeddings need to refine that into context-dependent representations, e.g., a representation for “bank” in the context of financial transactions, and a different representation for “bank” in the context of river-flow management.
This is achieved by factorization of the embedding parametrization — the embedding matrix is split between input-level embeddings with a relatively-low dimension (e.g., 128), while the hidden-layer embeddings use higher dimensionalities (768 as in the BERT case, or more). With this step alone, ALBERT achieves an 80% reduction in the parameters of the projection block, at the expense of only a minor drop in performance — 80.3 SQuAD2.0 score, down from 80.4; or 67.9 on RACE, down from 68.2 — with all other conditions the same as for BERT."
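You can see both sizes directly in the config; a quick check (assuming a transformers version with ALBERT support and the albert-base-v2 checkpoint):

```python
from transformers import AlbertConfig

config = AlbertConfig.from_pretrained('albert-base-v2')
print(config.embedding_size)  # 128 -> factorized input embedding size (internal)
print(config.hidden_size)     # 768 -> size of the hidden states the model actually outputs
```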
If the batch size is N, how do I convert this?
If I understand you correctly, you are asking how to get the last hidden states for all entries in a batch of size N. If that's the case, here is the explanation.
Your model expects input of the following shape:
(batch_size, sequence_length)
and returns last hidden states of the following shape:
(batch_size, sequence_length, hidden_size)
You can just index into the last hidden states to get the individual last hidden state for each input in the batch of size N.
Reference:
https://huggingface.co/transformers/model_doc/bert.html
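Concretely, after a single forward pass over a padded batch you just index the first dimension; a rough sketch with manual padding (pad id 0 happens to be correct for bert-base-uncased, but in general take it from the tokenizer):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

sentences = ["Hello, my dog is cute", "how are you"]
encoded = [tokenizer.encode(s, add_special_tokens=True) for s in sentences]
max_len = max(len(ids) for ids in encoded)
input_ids = torch.tensor([ids + [0] * (max_len - len(ids)) for ids in encoded])  # (N, max_len)
attention_mask = (input_ids != 0).long()

with torch.no_grad():
    last_hidden_states = model(input_ids, attention_mask=attention_mask)[0]  # (N, max_len, 768)

per_sentence = [last_hidden_states[i] for i in range(len(sentences))]  # one (max_len, 768) tensor each
```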
@engrsfi: What if I want to use the BERT embedding vector of each token as input to an LSTM network? Can I get the embedding of each token of the sentence from the last hidden layer of the BERT model? In this case I think I can't just use the embedding of the [CLS] token, since I need the word embedding of each token.
I used the code below to get BERT's word embeddings for all tokens of my sentences. I padded all my sentences to a maximum length of 80 and used an attention mask to ignore the padded elements. In this case the last_hidden_states element has shape (batch_size, 80, 768). However, when I inspect the embeddings, the vectors for the padded elements are not all the same: I get a vector of size 768 for each token of the sentence (most of them padding tokens), but the vectors for the padded elements are not equal. Is that expected?
import tensorflow as tf
import numpy as np
from transformers import BertTokenizer, TFBertModel
bert_model = TFBertModel.from_pretrained("bert-base-uncased")
bert_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenized = x_train['token'].apply((lambda x: bert_tokenizer.encode(x, add_special_tokens=True, max_length=80)))
padded = np.array([i + [0]*(80-len(i)) for i in tokenized.values])
attention_mask = np.where(padded != 0, 1, 0)
input_ids = tf.constant(padded)
attention_mask = tf.constant(attention_mask)
output= bert_model(input_ids, attention_mask=attention_mask)
last_hidden_states=output[0]
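That is expected: the attention mask only stops the real tokens from attending to the padded positions; the padded positions themselves still flow through the network (with different position embeddings), so their output vectors are neither zero nor identical. The usual workaround is to ignore them via the mask when pooling, e.g. something along these lines on top of the snippet above:

```python
import tensorflow as tf

# `last_hidden_states`: (batch_size, 80, 768) and `attention_mask`: (batch_size, 80) from the code above
mask = tf.cast(attention_mask, tf.float32)[:, :, None]     # (batch_size, 80, 1)
summed = tf.reduce_sum(last_hidden_states * mask, axis=1)  # zero out padded positions, then sum
counts = tf.reduce_sum(mask, axis=1)                       # number of real tokens per sentence
mean_pooled = summed / counts                              # (batch_size, 768) sentence vectors
```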
Hi, could I ask how you would use spaCy to do this? Is there a link? Thanks a lot.
Here is the link:
https://spacy.io/usage/vectors-similarity
Thank you for sharing the code. It really helped in understanding tokenization in BERT. I ran this and had a minor problem. Shouldn't it be:
cls_embeddings = embeddings_of_last_layer[0][0]
This is because embeddings_of_last_layer has dimensions 1 × #tokens × #hidden-units. Then, since [CLS] is the first token (and usually has 101 as its id), we want the embedding corresponding to just [CLS]. embeddings_of_last_layer[0] has shape #tokens × #hidden-units and contains the embeddings of all the tokens.
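For a batch of more than one sequence, the more general form is to slice the token dimension instead of indexing only the first sequence; a small sketch:

```python
# embeddings_of_last_layer: (batch_size, seq_len, hidden_size)
cls_embeddings = embeddings_of_last_layer[:, 0, :]  # (batch_size, hidden_size): one [CLS] vector per sequence
```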
@sahand91
sequence_output, pooled_output = bert_model(input_)
pooled_output.shape = (1, 768), one vector of 768 entries (representing the whole sentence)
sequence_output.shape = (batch_size, max_len, dim), e.g. (1, 256, 768) for bs = 1, n_tokens = 256
The sequence output gives the vector for each token of the sentence.
I have used the sequence output for classification tasks like sentiment analysis. Since the docs mention that the pooled output is not a good representation of the whole sentence, we use the sequence output and feed it into a CNN or LSTM.
So I don't see any problem in using the sequence output for classification tasks, as we get to see the actual vector representation of a word, say "bank", in both contexts: "commercial" and "location" (bank of a river).
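As a concrete illustration of that setup (and of the LSTM question above), the per-token sequence output can be fed into an LSTM like any other embedded sequence. A minimal sketch with arbitrary layer sizes:

```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert = BertModel.from_pretrained('bert-base-uncased')
lstm = nn.LSTM(input_size=768, hidden_size=256, batch_first=True)

input_ids = torch.tensor(tokenizer.encode("This is an apple")).unsqueeze(0)
with torch.no_grad():
    sequence_output = bert(input_ids)[0]      # (1, seq_len, 768): one vector per token

lstm_out, (h_n, c_n) = lstm(sequence_output)  # h_n[-1]: (1, 256) summary you could feed to a classifier
```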
Yes, I think the same. @sumitsidana
embeddings_of_last_layer[0][0].shape
Out[179]: torch.Size([144]) # where 144 in my case is the hidden_size
Can anyone confirm that embeddings_of_last_layer[0][0] is the embedding related to the [CLS] token for each sequence?
Yes, it is, but only for the first sequence in the batch. You will have to loop through the batch and take the first element ([CLS]) of each sentence.
Yes gotcha. Thanks
logits = output[0] means the word embedding. So, does hidden_states = output[1] mean the sentence-level embedding?
outputs[0] is the sentence embedding for "Hello, my dog is cute", right?
Then what is outputs[1]?
If I want to encode a list of strings,
input_ids = torch.tensor(tokenizer.encode(["Hello, my dog is cute", "how are you"])).unsqueeze(0)
does not really give me a 2×768 array. The only way I found is
input_ids = [torch.tensor([tokenizer.encode(text) for text in ["Hello, my dog is cute", "how are you"]]).unsqueeze(0)]
Anything to make it faster?
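With a recent transformers version (roughly 3.x and later), the tokenizer itself can pad a whole list in one call, which avoids the per-sentence loop; a hedged sketch:

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

batch = tokenizer(["Hello, my dog is cute", "how are you"], padding=True, return_tensors="pt")
with torch.no_grad():
    last_hidden_states = model(**batch)[0]  # (2, max_len, 768): one row of token vectors per sentence
```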
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.