Transformers: BERT tokenizer - set special tokens

Created on 10 May 2019 · 3 comments · Source: huggingface/transformers

Hi,

I was wondering whether the team could extend BERT so that fine-tuning with newly defined special tokens becomes possible, just as GPT allows.

@thomwolf Could you share your thoughts on this?

Regards,
Adrian.
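
For reference, a minimal sketch of the kind of workflow being asked for here, assuming the present-day transformers API rather than the package this thread was written against; the token names [E1] and [E2] are placeholders, not anything from this issue:

```python
# Hedged sketch: add brand-new special tokens to BERT and grow the embedding
# matrix so they can be learned during fine-tuning. "[E1]"/"[E2]" are made up.
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Register new special tokens; the tokenizer will never split them.
tokenizer.add_special_tokens({"additional_special_tokens": ["[E1]", "[E2]"]})

# Resize the input embeddings so the new ids get (randomly initialised) rows,
# which are then trained along with the rest of the model.
model.resize_token_embeddings(len(tokenizer))

ids = tokenizer.encode("[E1] some span [E2] in context", add_special_tokens=True)
```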

wontfix

All 3 comments

Hi Adrian, BERT already has a few unused tokens that can be used similarly to the special_tokens of GPT/GPT-2.
For more details see https://github.com/google-research/bert/issues/9#issuecomment-434796704 and issue #405 for instance.
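
As an illustration of that suggestion, here is a sketch of reusing one of the reserved [unusedN] entries, assuming the current transformers API (the thread itself predates it); this is not an official recipe from the maintainers:

```python
# Hedged sketch: repurpose a reserved "[unusedN]" vocabulary entry as a custom
# marker so no new embedding rows are needed; fine-tuning gives it a meaning.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The reserved entries already exist in the pretrained vocabulary.
print(tokenizer.convert_tokens_to_ids("[unused0]"))  # 1 in bert-base-uncased

# Register it as a special token so it is never split; since it is already in
# the vocab, the vocabulary (and embedding matrix) size does not change.
tokenizer.add_special_tokens({"additional_special_tokens": ["[unused0]"]})

print(tokenizer.tokenize("[unused0] some marked input"))
# ['[unused0]', 'some', 'marked', 'input']
```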

If we use one of the unused special tokens from the vocabulary, is it enough to fine-tune on a classification task, or do we need to train an embedding for it from scratch? Has anyone already done this?

Two different and somewhat related questions I had when looking into the implementation (a sketch illustrating both follows the list):

1) The BERT paper mentions a (learned) positional embedding. How is this implemented here? examples/extract_features/convert_examples_to_features() defines tokens (the representation), input_type_ids (distinguishing the first and second sequence) and an input_mask (distinguishing padding from real tokens), but no positional embedding. Is this handled internally?

2) Can I use a special token as input_type_ids for BERT? In the classification example, only the values 0 and 1 are possible, and I'm wondering what would happen if I chose a special token instead. Is this possible with a pretrained embedding, or would I need to retrain the whole embedding as a consequence?
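
A rough sketch of how the embedding layer relates to both points, loosely mirroring the library's BertEmbeddings module (simplified, not the exact source): the learned positional embedding is created and indexed inside the model, and the token-type table has only two rows in the pretrained checkpoints, which is why input_type_ids is limited to 0 and 1 unless that table is enlarged and retrained.

```python
# Simplified, illustrative version of BERT's embedding layer (not the actual
# library source): word, position and token-type embeddings are summed.
import torch
import torch.nn as nn

class BertEmbeddingsSketch(nn.Module):
    def __init__(self, vocab_size=30522, hidden_size=768,
                 max_position_embeddings=512, type_vocab_size=2):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        # Learned positional embedding: built inside the model, so callers
        # never pass positions explicitly (question 1).
        self.position_embeddings = nn.Embedding(max_position_embeddings, hidden_size)
        # Only two rows (sentence A / sentence B) in the pretrained weights,
        # so token type ids outside {0, 1} would index past the table
        # unless it is enlarged and retrained (question 2).
        self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)

    def forward(self, input_ids, token_type_ids):
        seq_len = input_ids.size(1)
        # Position ids are generated internally from the sequence length.
        position_ids = torch.arange(seq_len, device=input_ids.device).unsqueeze(0)
        return (self.word_embeddings(input_ids)
                + self.position_embeddings(position_ids)
                + self.token_type_embeddings(token_type_ids))
```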

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
