Please consider implementing https://arxiv.org/pdf/1909.11556.pdf with a README.md in the examples to reproduce the results. @huihuifan
We are actively working on this and will release code and commands for reproduction very soon, then open source models soon afterwards. Thanks for your interest!
According to this, I ran --model-overrides "{'decoder_layers_to_keep':'0,2,4,6'}" for a transformer-base model while evaluating, but ran into the error below.
RuntimeError: Error(s) in loading state_dict for TransformerModel:
Missing key(s) in state_dict:
Also tried resuming training from the layer-pruned model; the same error still occurs @huihuifan .
Thanks.
hi @gvskalyan, does the error message say which keys in the state dict are missing?
Sorry, my mistake: I had trained transformer-base but was overriding only the decoder layers, not the encoder layers. Also, it works only when the same layers are kept on both sides. Thanks.
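For anyone hitting the same "Missing key(s)" error: pruning with layers_to_keep has to drop the unwanted layer weights and renumber the surviving ones densely so the smaller model's state dict lines up. The sketch below is a simplified, hypothetical stand-in for that key filtering (prune_layers is not fairseq's actual function), just to illustrate why pruning only one side leaves the other side's keys mismatched:

```python
import re

def prune_layers(state_dict, layers_to_keep, prefix="decoder"):
    """Keep only the listed layers under `prefix`, renumbering them densely.

    Hypothetical sketch of LayerDrop-style pruning, not fairseq's real code.
    `layers_to_keep` is a comma-separated string like "0,2,4,6".
    """
    keep = sorted(int(i) for i in layers_to_keep.split(","))
    remap = {old: new for new, old in enumerate(keep)}  # e.g. 0->0, 2->1, 4->2
    pattern = re.compile(rf"^{prefix}\.layers\.(\d+)\.")
    pruned = {}
    for key, value in state_dict.items():
        m = pattern.match(key)
        if m is None:
            pruned[key] = value  # not a layer weight on this side; untouched
            continue
        layer = int(m.group(1))
        if layer in remap:  # keep and renumber; dropped layers are skipped
            new_key = pattern.sub(f"{prefix}.layers.{remap[layer]}.", key)
            pruned[new_key] = value
    return pruned

# Toy 6-layer decoder state dict (ints stand in for weight tensors).
sd = {f"decoder.layers.{i}.weight": i for i in range(6)}
sd["encoder.layers.0.weight"] = "enc"

pruned = prune_layers(sd, "0,2,4")
# Decoder layers 0, 2, 4 survive as layers 0, 1, 2; encoder keys are untouched,
# which is why the encoder side must be pruned (or left full-depth) consistently.
```

Note that the encoder keys pass through unchanged here, so if only the decoder override is given, the encoder side of the checkpoint must already match the model being built, or loading fails with missing/unexpected keys.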
Thanks for raising this error; I will look into it on the current master branch and get back to you.