Hi @Michael95-m, thanks for the questions. Please find my answers below:
Currently, we offer a pre-trained model 'tars-base', which was trained on 9 English-only classification corpora. We are considering releasing a multilingual model soon.
Yes, TARS tries to find a match between the label name and the actual text under consideration. So it is recommended that the labels are expressed in the same language as the input text. We have not evaluated on languages other than English yet, though. We would be curious to know how it performs in your case, so keep us posted if possible.
By default, it uses 'bert-base-uncased', but you should be able to use any transformer model without issues. You can use either of the following:
from flair.datasets import ClassificationCorpus
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TARSClassifier
from flair.trainers import ModelTrainer

corpus = ClassificationCorpus('path_to_your_dataset_in_fasttext_format',
                              label_name_map={'label1': 'something',
                                              'label2': 'something else'})
embeddings = TransformerDocumentEmbeddings(model='your_favourite_transformer_model', fine_tune=True, batch_size=16)
tars = TARSClassifier(task_name="your_task", label_dictionary=corpus.make_label_dictionary(), document_embeddings=embeddings)
trainer = ModelTrainer(tars, corpus)
trainer.train(...)
or
corpus = ClassificationCorpus('path_to_your_dataset_in_fasttext_format',
                              label_name_map={'label1': 'something',
                                              'label2': 'something else'})
tars = TARSClassifier(task_name="your_task", label_dictionary=corpus.make_label_dictionary(), document_embeddings='your_favourite_transformer_model')
trainer = ModelTrainer(tars, corpus)
trainer.train(...)
Hope this helps!
Kishaloy
@kishaloyhalder, thanks for your kind answers. That's just what I wanted to know.