Enable per-token classification in RoBERTa (also called "sentence tagging" or "sequence tagging" in the original BERT paper)
Currently, you can only classify whole sentences with RoBERTa. There are many use cases for classifying each input token instead.
Add a --classify-per-token (or similar) flag to the sentence_prediction task, and ensure that the classification head and the sentence_prediction loss can handle processing all tokens.
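For illustration, a minimal sketch of what a per-token classification head could look like, assuming RoBERTa-style features of shape [batch, seq_len, hidden_dim]. The class and argument names here are hypothetical, not the actual fairseq API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SequenceTaggingHead(nn.Module):
    """Hypothetical head: projects every token representation to tag logits,
    instead of classifying only the first (<s>) token as sentence_prediction does."""

    def __init__(self, hidden_dim: int, num_tags: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(hidden_dim, num_tags)

    def forward(self, features):
        # features: [batch, seq_len, hidden_dim] -> logits: [batch, seq_len, num_tags]
        return self.out_proj(self.dropout(features))

# Toy usage: 2 sequences, 5 tokens each, hidden size 8, 3 tag classes
head = SequenceTaggingHead(hidden_dim=8, num_tags=3)
features = torch.randn(2, 5, 8)
logits = head(features)
print(logits.shape)  # torch.Size([2, 5, 3])

# The matching criterion would compute a token-level cross-entropy by
# flattening the batch and time dimensions:
labels = torch.randint(0, 3, (2, 5))
loss = F.cross_entropy(logits.view(-1, 3), labels.view(-1))
```

The key difference from the sentence-level setup is that the loss is averaged over every (non-padding) token position rather than one pooled representation per sentence.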
-- edit: Settled on creating a separate sequence_tagging task and criterion.
As a workaround, one can use the translation task with a different architecture.
https://arxiv.org/abs/1810.04805 (search "tagging")
A sequence tagging task would be helpful. I'd prefer not to complicate the existing sentence_prediction.py task, but please feel free to copy/paste into sequence_tagging.py and adapt as needed.
Cool, so there should be a sequence_tagging task and sequence_tagging criterion?
@myleott I updated the PR accordingly: https://github.com/pytorch/fairseq/pull/1709