Transformers: Training Transformer XL from scratch

Created on 30 Apr 2019 · 4 Comments · Source: huggingface/transformers

Hello,

I'm trying to train a Transformer-XL model from scratch by combining the architecture code from this library with the training code from the official paper repo. However, this leads to NaNs during training, so I wanted to clarify the recommended way to initialize a new model.

I'm doing it like this:

architecture = TransfoXLConfig().from_json_file(args.config_path)
model = TransfoXLLMHeadModel(architecture)

Is there a bug in this?
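One way to narrow a problem like this down is to record the loss at every step and find the first step at which it stops being finite. A NaN at the very first step would suggest looking at initialization or the input pipeline, while a NaN that only appears later would suggest looking at the learning rate, gradient clipping, or precision settings instead. Below is a minimal, framework-agnostic sketch of that check. `first_non_finite_step` is a hypothetical helper (not part of the library), and the losses are assumed to be plain Python floats, for example collected with `loss.item()` in PyTorch.

```python
import math
from typing import Iterable, Optional


def first_non_finite_step(losses: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN or infinite loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Illustrative values only, not real training output.
example_losses = [10.4, 9.8, float("nan"), float("nan")]
bad_step = first_non_finite_step(example_losses)
if bad_step is None:
    print("all losses finite")
else:
    print(f"loss became non-finite at step {bad_step}")
```

In a real training loop the same check could be run on each loss as it is computed, so that training stops at the first bad step and the offending batch can be inspected.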

wontfix


anshuman1992

Most helpful comment

@anshuman1992 could you share the code snippet/gist you used to train the Transformer-XL model?

ksopyla on 11 Aug 2019

👍3

All 4 comments

This looks good to me.

thomwolf on 1 May 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.