Fairseq: How can I feed a binarized class label file to BART training?

Created on 25 Jun 2020 · 7 comments · Source: pytorch/fairseq

Is there any way that I can feed a label file to the training mechanism, in parallel with the source and target files?

question

All 7 comments

Could you be more specific, please?

@Vsanku01 Thank you for your interest.

Basically, I want to feed a class label for the source text. I am wondering whether I can feed a class label while feeding the source and target text (similar to a text-generation or translation task) at training time.

I think the easiest way would be to build this into your vocabulary. For example, pick a unique token per class (e.g. __class_label_0__, __class_label_1__, ..., __class_label_n__) and prepend these special tokens onto the beginning (or end) of your sequences before calling fairseq-preprocess.
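As a minimal sketch of the suggestion above: before binarization, each source line gets its class token prefixed. The function name, filenames, and label values here are hypothetical, not part of fairseq itself.

```python
# Sketch: prepend a per-example class-label token to each source line
# before running fairseq-preprocess. Labels and data are illustrative.

def prepend_labels(src_lines, labels):
    """Prefix each source line with its special class-label token."""
    return [f"__class_label_{lab}__ {line}" for line, lab in zip(src_lines, labels)]

src = ["the quick brown fox", "hello world"]
labels = [0, 1]
for line in prepend_labels(src, labels):
    print(line)
```

Because the token is prepended before preprocessing, it ends up in the learned dictionary like any other word, and the model can condition on it at both training and inference time.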

Thank you very much.

@lematt1991

How can I create a unique token as you mentioned above?

What if I append a token like "__class_label_0__" to the text and then do the tokenization?

Yep, that's exactly what I meant.
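Putting the confirmed approach together, one could write label-prefixed parallel files and then binarize them with fairseq-preprocess. The file names, data, and labels below are hypothetical; the CLI invocation in the trailing comment is the standard fairseq-preprocess usage.

```python
# Sketch: write label-prefixed source/target training files, then
# binarize them (outside Python) with fairseq-preprocess.
import os
import tempfile

sources = ["the cat sat", "dogs bark loudly"]   # illustrative data
targets = ["a cat was sitting", "loud dog barking"]
labels = [0, 1]

outdir = tempfile.mkdtemp()
with open(os.path.join(outdir, "train.src"), "w") as f_src, \
     open(os.path.join(outdir, "train.tgt"), "w") as f_tgt:
    for src, tgt, lab in zip(sources, targets, labels):
        f_src.write(f"__class_label_{lab}__ {src}\n")
        f_tgt.write(tgt + "\n")

# Then binarize, e.g.:
#   fairseq-preprocess --source-lang src --target-lang tgt \
#       --trainpref train --destdir data-bin --workers 4
```

One caveat worth noting: if you apply subword tokenization (e.g. BPE) after adding the token, make sure the label token survives intact rather than being split into subwords, or add it to the dictionary as a special symbol.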

Thanks a lot.
