Hi @Michael95-m, thanks for the questions. Please find my answers below:
Currently, we offer a pre-trained model 'tars-base', which was trained on 9 English-only classification corpora. We are considering releasing a multilingual model soon.
Yes, TARS tries to find a match between the label name and the actual text under consideration. So it is recommended that the labels are expressed in the same language as the input text. We have not evaluated on languages other than English yet, though. We would be curious to know how it performs in your case, so keep us posted if possible.
By default, it uses 'bert-base-uncased', but you should be able to use any transformer model without issues. You can use either of the following:
from flair.datasets import ClassificationCorpus
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TARSClassifier
from flair.trainers import ModelTrainer

corpus = ClassificationCorpus('path_to_your_dataset_in_fasttext_format',
                              label_name_map={'label1': 'something',
                                              'label2': 'something else'})
embeddings = TransformerDocumentEmbeddings(model='your_favourite_transformer_model', fine_tune=True, batch_size=16)
tars = TARSClassifier(task_name="your_task", label_dictionary=corpus.make_label_dictionary(), document_embeddings=embeddings)
trainer = ModelTrainer(tars, corpus)
trainer.train(...)
or
corpus = ClassificationCorpus('path_to_your_dataset_in_fasttext_format',
                              label_name_map={'label1': 'something',
                                              'label2': 'something else'})
tars = TARSClassifier(task_name="your_task", label_dictionary=corpus.make_label_dictionary(), document_embeddings='your_favourite_transformer_model')
trainer = ModelTrainer(tars, corpus)
trainer.train(...)
Hope this helps!
Kishaloy
@kishaloyhalder, thanks for your kind answers. That's just what I wanted to know.