Transformers: pad_to_max_length param is not supported in PreTrainedTokenizer.encode

Created on 13 Dec 2019 · 4 comments · Source: huggingface/transformers

โ“ Questions & Help

Hello,

I've installed the current version of the transformers package (2.2.1) via pip, on Python 3.6.8rc1 on Windows 10 Pro (build 17763.678, if that matters). I am trying to encode and pad a sentence in a single call:

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
temp = tokenizer.encode(text, add_special_tokens=True, max_length=MAX_LENGTH,
                        pad_to_max_length=True)

This raises an error saying that pad_to_max_length is an unrecognized option. What am I missing?
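
Until a fixed release is available, a minimal manual-padding sketch along these lines should work on 2.2.1, assuming tokenizer.pad_token_id is available (it resolves to the [PAD] token, id 0, for bert-base-uncased):

from transformers import BertTokenizer

MAX_LENGTH = 50
text = 'Hello, my name is Edward'
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Encode and truncate to MAX_LENGTH first, then right-pad manually
# instead of relying on the pad_to_max_length keyword.
ids = tokenizer.encode(text, add_special_tokens=True, max_length=MAX_LENGTH)
ids += [tokenizer.pad_token_id] * (MAX_LENGTH - len(ids))
assert len(ids) == MAX_LENGTH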

All 4 comments

Hello, can you try with the patch that was released today (2.2.2) and let me know if it works for you?

After updating the Transformers library from 2.2.1 to 2.2.2, it works as expected; the bug highlighted by @madrugado no longer appears.

My environment is the following:

  • Python 3.6.9
  • OS: Ubuntu 16.04
  • Transformers: 2.2.2 (installed from PyPI with pip install transformers)
  • PyTorch: 1.3.1
  • TensorFlow: 2.0

The console session (running version 2.2.2) is the following:

Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import transformers
/home/<user>/anaconda3/envs/huggingface/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
[... the same TensorBoard FutureWarning repeats for the quint8, qint16, quint16, qint32, and resource dtype stubs ...]
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
>>> text='Hello, my name is Edward'
>>> temp = tokenizer.encode(text, add_special_tokens=True, max_length=50, pad_to_max_length=True)
>>> temp
[101, 7592, 1010, 2026, 2171, 2003, 3487, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> 
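
Side note: if I'm reading the 2.2.x API correctly, encode_plus accepts the same pad_to_max_length flag and additionally returns an attention_mask separating real tokens from padding, which the model usually needs downstream:

enc = tokenizer.encode_plus(text, add_special_tokens=True, max_length=50,
                            pad_to_max_length=True)
print(enc['input_ids'])       # the same padded ids as above
print(enc['attention_mask'])  # 1 for real tokens, 0 for padding positions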


I also confirm that everything works fine with version 2.2.2. Thanks!

There is no clear documentation on the pad_to_max_length parameter; I had a hard time finding this. It would be great if it were added to the docs, or if such a page already exists, could you point me to it? Thanks.
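
For what it's worth, my understanding of the semantics on 2.2.2 is: with max_length set, shorter sequences are right-padded with the pad token, and longer ones are truncated. A small self-check sketch (the pad token id is 0 for bert-base-uncased):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# A short input is right-padded up to max_length.
short_ids = tokenizer.encode('hi', add_special_tokens=True,
                             max_length=8, pad_to_max_length=True)
assert len(short_ids) == 8 and short_ids[-1] == tokenizer.pad_token_id
# A long input is truncated down to max_length (special tokens kept).
long_ids = tokenizer.encode('one two three four five six seven eight nine',
                            add_special_tokens=True,
                            max_length=8, pad_to_max_length=True)
assert len(long_ids) == 8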
