Transformers: BertTokenizerFast does not support `pad_to_max_length` argument

Created on 25 Jun 2020 · 4 Comments · Source: huggingface/transformers

🐛 Bug

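The fast tokenizer behaves differently from the regular (slow) tokenizer: the same `encode` call succeeds with `BertTokenizer` but raises a `TypeError` with `BertTokenizerFast`.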

from transformers import BertTokenizer, BertTokenizerFast

BertTokenizer.from_pretrained("bert-base-uncased").encode("hello world", max_length=128, pad_to_max_length="right")
# succeeds
BertTokenizerFast.from_pretrained("bert-base-uncased").encode("hello world", max_length=128, pad_to_max_length="right")
# fails with: TypeError: enable_padding() got an unexpected keyword argument 'max_length'

Environment info

  • transformers version: 2.11.0
  • tokenizers version: 0.8.0rc3
  • Platform: Ubuntu 18.04
  • Python version: 3.7
Tokenization

All 4 comments

Hi @jarednielsen, if you installed from source, padding is now handled differently: you'll need to use the newly added `padding` argument. According to the docs:

padding (`Union[bool, str]`, optional, defaults to `False`):
Activate and control padding. Accepts the following values:

        * `True` or `'longest'`: pad to the longest sequence in the batch (or no padding if only a single sequence is provided),
        * `'max_length'`: pad to a max length specified in `max_length` or to the max acceptable input length for the model if no length is provided (`max_length=None`)
        * `False` or `'do_not_pad'` (default): No padding (i.e. can output batch with sequences of uneven lengths)
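
To make the three documented `padding` values concrete, here is a minimal pure-Python sketch of the strategies applied to lists of token ids. It is not the library's implementation: the function name `pad_batch`, right-side padding, a pad id of 0 and a fallback `model_max_length` of 512 are assumptions for illustration, and truncation is ignored.

```python
from typing import List, Optional, Union


def pad_batch(
    batch: List[List[int]],
    padding: Union[bool, str] = False,
    max_length: Optional[int] = None,
    model_max_length: int = 512,  # assumed model limit (512 for BERT-style models)
    pad_id: int = 0,  # assumed id of the padding token
) -> List[List[int]]:
    """Hypothetical helper: right-pad token ids according to the documented `padding` values."""
    if padding is True or padding == "longest":
        # Pad to the longest sequence; a single sequence is left unchanged.
        target = max((len(seq) for seq in batch), default=0)
    elif padding == "max_length":
        # Pad to `max_length`, or to the model's limit when no length is given.
        target = max_length if max_length is not None else model_max_length
    elif padding is False or padding == "do_not_pad":
        # No padding: sequences keep their uneven lengths.
        return [list(seq) for seq in batch]
    else:
        raise ValueError(f"unsupported padding value: {padding!r}")
    # Sequences already at or past the target are returned as-is (no truncation).
    return [list(seq) + [pad_id] * (target - len(seq)) for seq in batch]


batch = [[101, 7, 8, 102], [101, 9, 102]]  # illustrative token ids
print(pad_batch(batch, padding="longest"))
print(pad_batch(batch, padding="max_length", max_length=6))
print(pad_batch(batch))
```

With a transformers release that includes the new API, the failing call from the report would presumably become `BertTokenizerFast.from_pretrained("bert-base-uncased").encode("hello world", max_length=128, padding="max_length")`.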

Yes, this works on master (both the old and new tokenizer API) and should work in the new release that will be out very soon.
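
Keeping both APIs working amounts to translating the legacy boolean into one of the new `padding` values. The sketch below shows one way to do that; `resolve_padding` is a hypothetical helper, not the library's actual code, and the precedence rule (an explicit `padding` wins over the legacy flag) is an assumption. Note that in the original report `pad_to_max_length="right"` is simply a truthy value, so it maps to `'max_length'`.

```python
import warnings
from typing import Union

VALID_PADDING = ("longest", "max_length", "do_not_pad")


def resolve_padding(padding: Union[bool, str] = False, pad_to_max_length: bool = False) -> str:
    """Hypothetical helper: map old- and new-style arguments to one padding strategy."""
    if pad_to_max_length and padding is False:
        # Legacy flag: any truthy value (even "right") means "pad to max_length".
        warnings.warn(
            "`pad_to_max_length` is deprecated; use `padding='max_length'` instead",
            FutureWarning,
        )
        return "max_length"
    if padding is True:
        return "longest"
    if padding is False:
        return "do_not_pad"
    if padding in VALID_PADDING:
        return padding
    raise ValueError(f"unsupported padding value: {padding!r}")


print(resolve_padding(pad_to_max_length=True))  # old-style call
print(resolve_padding(padding="max_length"))    # new-style call
```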

Thank you for the quick response! Reading https://github.com/huggingface/transformers/pull/4510 makes it much clearer.

Yes, we even have a nice tutorial on the new tokenizer API now thanks to the amazing @sgugger:
https://huggingface.co/transformers/master/preprocessing.html
