I just updated to the latest version of transformers. Now whenever I use the tokenizer to encode a word, it shows the warning "This tokenizer does not make use of special tokens."
Is there any way to hide that warning? Thank you.
Same here. Is there any way to suppress this warning? I use run_lm_finetuning.py to fine-tune distilgpt2 and it prints thousands of "This tokenizer does not make use of special tokens." messages. It's so annoying :(
Here's how to suppress the warnings until this is fixed:
import logging
logging.getLogger('transformers.tokenization_utils').setLevel(logging.ERROR)
Here's how to suppress the warnings until this is fixed:
import logging
logging.getLogger('transformers.tokenization_utils').disabled = True
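Both workarounds rely on the standard library's logging hierarchy: the warning is emitted through the 'transformers.tokenization_utils' logger, so raising that logger's level past WARNING (or disabling it outright) filters the message before any handler sees it. A minimal, self-contained sketch of the mechanism, using the logger name from this thread but without importing transformers:

```python
import logging

# Simulate the library-side logger that emits the warning.
lib_logger = logging.getLogger('transformers.tokenization_utils')

# Collect records that actually get through, so the effect is visible.
captured = []

class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())

lib_logger.addHandler(ListHandler())
lib_logger.setLevel(logging.WARNING)

# At WARNING level, the message passes through to the handler.
lib_logger.warning("This tokenizer does not make use of special tokens.")

# Raise the threshold to ERROR: WARNING-level records are now filtered out.
lib_logger.setLevel(logging.ERROR)
lib_logger.warning("This tokenizer does not make use of special tokens.")

print(captured)  # only the first warning was captured
```

Setting `.disabled = True` (the second suggestion above) is the blunter variant: it drops records of every level for that one logger.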
Thank you!
Is this fixed? If not, I think it should be open until it's been fixed.
This has been fixed on master and is included in the latest release (2.2.1).
Hi, I use the latest release but I still have this problem.
@iedmrc
I'm closing this because the logging workaround works. I don't know whether it's a bug or not.
Hi @yeliu918, could you please show us what you obtain when running this script in your environment?
from transformers import GPT2Tokenizer, __version__
print(__version__)
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
print(tokenizer.encode("What does this output?"))
I am still getting the warning despite trying everything mentioned above:
import torch
import logging
from transformers import GPT2LMHeadModel, GPT2Tokenizer, __version__

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # model was missing from the original snippet
print(__version__)
logging.getLogger('transformers.tokenization_utils').disabled = True
tokens_tensor = torch.tensor([tokenizer.encode("some example sentence")])
greedy_output = model.generate(tokens_tensor, max_length=60, num_beams=16)
Version 2.8.0
Setting pad_token_id to 50256 (first eos_token_id) to generate sequence
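Note that this "Setting pad_token_id..." message is emitted during generate(), not during tokenization, so it goes through a different logger than 'transformers.tokenization_utils' and is unaffected by disabling that one. (The exact generation-side logger name depends on your version; 'transformers.generation_utils' is an assumption used for illustration.) Python loggers are filtered per name, as this stdlib-only sketch shows:

```python
import logging

captured = []

class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append((record.name, record.getMessage()))

handler = ListHandler()

# Two distinct library loggers: one for tokenization, one for generation
# ('transformers.generation_utils' is an assumed name for illustration).
tok_logger = logging.getLogger('transformers.tokenization_utils')
gen_logger = logging.getLogger('transformers.generation_utils')
for lg in (tok_logger, gen_logger):
    lg.addHandler(handler)
    lg.setLevel(logging.WARNING)

# Disable only the tokenization logger, as suggested earlier in the thread.
tok_logger.disabled = True

tok_logger.warning("This tokenizer does not make use of special tokens.")
gen_logger.warning("Setting pad_token_id to 50256 (first eos_token_id) to generate sequence")

# Only the generation-side warning gets through.
print(captured)
```

In practice, passing pad_token_id explicitly, e.g. model.generate(tokens_tensor, max_length=60, num_beams=16, pad_token_id=tokenizer.eos_token_id), should avoid that particular message at the source rather than filtering it.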