Questions & Help

Is there any way we can get access to the vocabulary in GPT-2? Like a list: [subtoken1, subtoken2, ..., subtoken10000, ...]
Thank you in advance!
You can obtain the 50,257 different tokens with the following code:
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# The tokenizer's encoder maps each subtoken string to its id;
# its keys are the full vocabulary.
vocab = list(tokenizer.encoder.keys())
assert len(vocab) == tokenizer.vocab_size  # passes: both are 50257
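As a side note (not from the original answer, just a sketch using the public tokenizer API): transformers also exposes get_vocab(), which returns the same token-to-id mapping without touching the private encoder attribute, and convert_tokens_to_string() renders the byte-level BPE markers (such as Ġ for a leading space) as readable text:

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# get_vocab() returns a dict mapping each subtoken string to its id.
vocab = tokenizer.get_vocab()

# Sort by id to get the list layout from the question:
# [subtoken_0, subtoken_1, ..., subtoken_50256]
tokens_by_id = sorted(vocab, key=vocab.get)
print(tokens_by_id[:10])

# GPT-2 uses byte-level BPE, so raw entries contain markers like 'Ġ';
# convert_tokens_to_string() turns them back into plain text.
print(tokenizer.convert_tokens_to_string(['Ġhello', 'Ġworld']))  # " hello world"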
Close the issue if you've resolved your problem! ;)
thank you!