Datasets: Replacement for tfds.deprecated.text.SubwordTextEncoder

Created on 17 Dec 2020 · 7 Comments · Source: tensorflow/datasets

Unfortunately, there is no statement addressing the deprecation of tfds.deprecated.text.SubwordTextEncoder.

Why was the SubwordTextEncoder deprecated? Will there be a replacement and what can/should we use instead?


All 7 comments

I think it was replaced by tensorflow/text.

@IIIBlueberry Hm, they do not seem to have a SubwordTextEncoder yet as far as I can see.

There are discussions about it in https://github.com/tensorflow/text/issues/417, but nothing seems to have materialized so far.

We warned about this change in our release notes and have been raising a warning whenever it is used.

It was deprecated because it was unmaintained and had known bugs and performance issues.

TensorFlow Text has a few subword tokenizers, such as text.BertTokenizer and text.SentencepieceTokenizer. There are also pretrained tokenizers you can load from TF-Hub, such as https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/2
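text.BertTokenizer splits each word into WordPiece subwords drawn from a vocabulary you supply. As a rough illustration of that one step, here is a minimal pure-Python sketch of greedy longest-match-first WordPiece. It assumes lowercased, whitespace-split input, the BERT-style "##" continuation prefix, and a made-up toy vocabulary; the function names are hypothetical, and this is not TF-Text's implementation, which also handles normalization, punctuation splitting, and id lookup.

```python
from typing import List, Set


def wordpiece_tokenize(word: str, vocab: Set[str], unk_token: str = "[UNK]",
                       max_chars: int = 100) -> List[str]:
    """Greedy longest-match-first WordPiece split of a single word (sketch)."""
    if len(word) > max_chars:
        return [unk_token]
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No prefix of the remainder is known, so the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, vocab: Set[str]) -> List[str]:
    """Lowercase, split on whitespace, then WordPiece each word."""
    tokens: List[str] = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    return tokens


if __name__ == "__main__":
    toy_vocab = {"un", "##aff", "##able", "token", "##izer", "##s"}
    print(tokenize("unaffable tokenizers", toy_vocab))
    # ['un', '##aff', '##able', 'token', '##izer', '##s']
```

The greedy strategy never backtracks: if the longest matching prefix leads to a dead end, the whole word becomes the unknown token rather than trying shorter alternatives.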

Indeed, it was stated in v3.2.0.

BertTokenizer and SentencepieceTokenizer do not look like they can create a vocabulary for you, so I guess this is something we have to do on our own now, unless I am overlooking something.

@Conchylicultor Thank you for the clarification, though. I guess my question is answered. :)

Yes, creating a vocabulary isn't straightforward right now. I believe TF-Text plans to make it easier. You should open an issue on their repository to ask them directly.
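If you do need to build a subword vocabulary yourself in the meantime, one common approach is byte-pair encoding (BPE): start from single characters and repeatedly fuse the most frequent adjacent pair. The sketch below is a minimal pure-Python BPE learner, assuming a small, whitespace-split corpus that fits in memory. The function name learn_bpe is hypothetical; this is a swapped-in technique, not a TF-Text API and not a reimplementation of SubwordTextEncoder's vocabulary builder.

```python
from collections import Counter
from typing import List, Set, Tuple


def learn_bpe(corpus: List[str], num_merges: int) -> Tuple[List[Tuple[str, str]], Set[str]]:
    """Learn BPE merges from a corpus; return (ordered merges, final vocabulary)."""
    # Represent each word as a tuple of characters, weighted by its frequency.
    word_freqs: Counter = Counter()
    for line in corpus:
        for word in line.lower().split():
            word_freqs[tuple(word)] += 1

    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across all words.
        pair_counts: Counter = Counter()
        for symbols, freq in word_freqs.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)  # ties: first pair seen wins
        merges.append(best)
        # Rewrite every word with the best pair fused into one symbol.
        new_freqs: Counter = Counter()
        for symbols, freq in word_freqs.items():
            merged: List[str] = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_freqs[tuple(merged)] += freq
        word_freqs = new_freqs

    vocab = {sym for symbols in word_freqs for sym in symbols}
    return merges, vocab


if __name__ == "__main__":
    merges, vocab = learn_bpe(["low low low lower lowest"], num_merges=2)
    print(sorted(vocab))  # ['e', 'low', 'r', 's', 't']
```

The learned pieces carry no continuation markers, so turning them into a vocabulary file for a specific tokenizer (for example, one expecting "##" prefixes) would need an extra, tokenizer-specific step that this sketch leaves out.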

@Conchylicultor Will do, thank you! :)

> I believe TF-Text plans to make it easier.

Yes. TF-Text has all the necessary classes to replace this. It's also implemented in TensorFlow, so you can export this preprocessing with the model.

I'm working on getting a tutorial published on how to do this.

