Allennlp: upgrade pytorch-pretrained-bert => pytorch-transformers

Created on 16 Jul 2019 · 8 Comments · Source: allenai/allennlp

They renamed the library; I'm not sure whether they changed any APIs.

joelgrus

👍5

Most helpful comment

I am working on this right now. My goal is to present a unified TransformerTokenIndexer + TransformerTokenEmbedder that wraps the huggingface abstraction and works with all of their models, but the devil will certainly be in the details.

joelgrus on 18 Jul 2019

👍15

All 8 comments

Or whether we can take advantage of a unified API.

joelgrus on 16 Jul 2019

👍1

Also, XLNet has been added to pytorch-transformers. Are we going to add XLNet support to allennlp?

rulai-huajunzeng on 17 Jul 2019

The tokenizer API has also changed in the latest version of pytorch-transformers. The import changes from

from pytorch_pretrained_bert.tokenization import BertTokenizer

to

from pytorch_transformers import BertTokenizer
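For code that has to run during the transition, the two import paths above could be tried in order. This is only a minimal sketch: the helper name `load_bert_tokenizer_class` and the `CANDIDATES` tuple are hypothetical, not part of either library or of allennlp, and only the two module paths come from the comment above.

```python
import importlib

# Module path and attribute name pairs, new package name first.
# Only these two paths come from the thread; everything else is illustrative.
CANDIDATES = (
    ("pytorch_transformers", "BertTokenizer"),
    ("pytorch_pretrained_bert.tokenization", "BertTokenizer"),
)


def load_bert_tokenizer_class(candidates=CANDIDATES):
    """Return the first importable class from `candidates`, or None.

    Hypothetical helper: it only resolves the class and does not try to
    paper over any differences in the tokenizer's methods.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Package (or submodule) not installed; try the next candidate.
            continue
        found = getattr(module, attr, None)
        if found is not None:
            return found
    return None


tokenizer_cls = load_bert_tokenizer_class()
if tokenizer_cls is None:
    print("neither pytorch_transformers nor pytorch_pretrained_bert is installed")
else:
    print("using", tokenizer_cls.__module__)
```

A shim like this only hides the rename; any behavioural changes between the two packages still need to be handled separately.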

BrambleXu on 18 Jul 2019

joelgrus on 18 Jul 2019

👍15

@joelgrus Any updates? I could use this for a project if it's ready; otherwise I'll have to hack something together myself.

amolk on 26 Oct 2019

This was added a while ago for cases where you have matched tokenization and embedding (see the pretrained_transformer options for tokenizers, indexers, and embedders). It's only the mismatched case that still needs work (and there might be features missing in the new stuff, too). I'm going to close this issue; I think we have an issue for handling mismatched tokenization already, but if not, we should open a separate one.

matt-gardner on 27 Oct 2019

@matt-gardner Could you please elaborate on the mismatched tokenization?

ruijianw on 6 Nov 2019

Mismatched tokenization is when you tokenize by words but do modeling on subwords (or any other similar mismatch). You might do this, e.g., for using BERT for tagging, where you have labels and want to make predictions on words, not on subword units. There are a few ways to handle this, but one way is to have this mismatched tokenization. This is how the pretrained BERT indexer / embedder work (the ones before I added the new pretrained_transformer classes). The new functionality doesn't handle this case, but we should add it.
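To make the word/subword mismatch concrete, here is a rough sketch of one common way to line the two up: record, for each word, the index of its first subword, then read word-level predictions off those positions. It is not necessarily how the allennlp BERT indexer / embedder does it. `toy_wordpiece` is a made-up stand-in for a real subword tokenizer (it just cuts words into 3-character chunks), and `align_words_to_subwords` is a hypothetical helper name.

```python
from typing import Callable, List, Tuple


def toy_wordpiece(word: str) -> List[str]:
    """Made-up subword tokenizer: 3-character chunks, '##' on continuations.

    Assumes `word` is non-empty.
    """
    chunks = [word[i:i + 3] for i in range(0, len(word), 3)]
    return [chunks[0]] + ["##" + chunk for chunk in chunks[1:]]


def align_words_to_subwords(
    words: List[str], tokenize: Callable[[str], List[str]]
) -> Tuple[List[str], List[int]]:
    """Tokenize word by word; return all subwords plus each word's first-subword index."""
    subwords: List[str] = []
    offsets: List[int] = []
    for word in words:
        offsets.append(len(subwords))  # position of this word's first piece
        subwords.extend(tokenize(word))
    return subwords, offsets


words = ["tokenization", "works"]
subwords, offsets = align_words_to_subwords(words, toy_wordpiece)
print(subwords)  # ['tok', '##eni', '##zat', '##ion', 'wor', '##ks']
print(offsets)   # [0, 4]

# One vector per subword comes out of the model (fake values here);
# picking the rows at `offsets` gives one vector per word for tagging.
subword_vectors = [[float(i)] for i in range(len(subwords))]
word_vectors = [subword_vectors[i] for i in offsets]
print(word_vectors)  # [[0.0], [4.0]]
```

Taking the first subword is just one choice; averaging or taking the last subword of each word are alternatives, and special tokens such as [CLS] would shift the offsets.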

matt-gardner on 11 Nov 2019
