Fairseq: AdaFactor to save GPU memory?

Created on 20 Sep 2018 · 3 Comments · Source: pytorch/fairseq

Tensor2Tensor has AdaFactor, which drastically reduces GPU memory usage. I believe it would be helpful for FairSeq to have this by default.
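For a rough sense of where the saving comes from, here is a minimal sketch. It assumes the factored form of AdaFactor, which keeps one running second-moment statistic per row and one per column of a weight matrix, whereas Adam keeps one per parameter. The function names, the matrix shape, and the float32 state size are illustrative assumptions, not anything taken from Tensor2Tensor or fairseq, and first-moment (momentum) state is left out of the count.

```python
# Back-of-the-envelope count of second-moment optimizer state for ONE weight
# matrix. All names and the example shape are hypothetical.

BYTES_PER_VALUE = 4  # assumption: optimizer state stored as float32


def adam_second_moment_values(rows: int, cols: int) -> int:
    # Adam keeps one second-moment estimate per parameter.
    return rows * cols


def adafactor_second_moment_values(rows: int, cols: int) -> int:
    # Factored AdaFactor keeps one statistic per row plus one per column.
    return rows + cols


rows, cols = 32768, 1024  # hypothetical embedding matrix (vocab x model dim)
adam_bytes = BYTES_PER_VALUE * adam_second_moment_values(rows, cols)
adafactor_bytes = BYTES_PER_VALUE * adafactor_second_moment_values(rows, cols)

print(f"Adam second moments:      {adam_bytes / 2**20:.2f} MiB")
print(f"AdaFactor second moments: {adafactor_bytes / 2**20:.2f} MiB")
```

The gap grows with matrix size, so large embedding and projection matrices are where the factored statistics would matter most. Actual savings depend on the model and on whether momentum is enabled.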

enhancement

AranKomat

👍1

Good idea!

myleott on 20 Sep 2018

Working on this

luciodery on 4 Dec 2018

👍6

luciodery on 25 Jan 2019

❤3
