Fairseq: Add preprocessing scripts for zh-en

Created on 17 Oct 2019 · 4 comments · Source: pytorch/fairseq

To run the pre-trained models here, do we need to use the same dictionary and BPE codes that were used to train them? Or does it not matter? If it matters, can you provide the dictionaries?

The German one comes with a prepare script, so that may generate the same dictionary, but others (such as the Chinese one) don't have a prepare script, which makes it hard to reproduce the same dictionary. A sketch of how the released codes could be reused follows below.
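
For anyone in the same situation, here is a minimal sketch of reusing released BPE codes with the `subword_nmt` package. It assumes the archive ships a BPE codes file (called `bpecodes` here) and the model dictionaries; the file names are hypothetical.

```python
# Minimal sketch: apply an existing BPE codes file to pre-tokenized text.
# 'bpecodes' is a hypothetical file name; use whatever the archive contains.
from subword_nmt.apply_bpe import BPE

with open('bpecodes', encoding='utf-8') as codes:
    bpe = BPE(codes)

# Input must already be tokenized the same way as the training data
# (e.g. Moses tokenization for English).
line = 'this is a tokenized sentence'
print(bpe.process_line(line))
```

After applying the codes, the data can be binarized with `fairseq-preprocess`, passing `--srcdict` and `--tgtdict` so the model's own dictionaries are reused rather than built from scratch.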

enhancement

All 4 comments

That paper used the standard preprocessed datasets provided by fairseq. You can follow the instructions to generate them: https://github.com/pytorch/fairseq/blob/master/examples/translation/README.md

You're right that the Zh-En scripts are missing.

@myleott We need the missing BPE codes together with the Zh-En scripts to make use of the provided pre-trained model.

Yep, I can add them later today, thanks for pointing this out.

The BPE codes are now available in a new set of archives with a .tar.gz extension. I've also updated the README with a bunch of additional usage instructions via torch.hub: https://github.com/pytorch/fairseq/tree/master/examples/pay_less_attention_paper#example-usage-torchhub
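
For reference, the torch.hub route from that README looks roughly like the sketch below. The hub identifier `lightconv.glu.wmt17.zh-en` is taken as an assumption from the linked page; substitute whichever model name the README actually lists.

```python
import torch

# Sketch of loading a pre-trained Zh-En model via torch.hub.
# The identifier 'lightconv.glu.wmt17.zh-en' is an assumption based on
# the linked pay_less_attention_paper README.
zh2en = torch.hub.load(
    'pytorch/fairseq',
    'lightconv.glu.wmt17.zh-en',
    tokenizer='moses',
    bpe='subword_nmt',
)

print(zh2en.translate('你好 世界'))  # prints an English translation
```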

