Transformers: Training Transformer XL from scratch

Created on 30 Apr 2019 · 4 Comments · Source: huggingface/transformers

Hello,

I'm trying to train a Transformer-XL model from scratch by combining the architecture code from this library with the training code from the official paper repo, but this yields NaNs during training. I just wanted to clarify the recommended way to initialize a new model.

I'm doing it like this:

architecture = TransfoXLConfig().from_json_file(args.config_path)
model = TransfoXLLMHeadModel(architecture)

Is there a bug in this?

wontfix

All 4 comments

This looks good to me
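If the initialization is fine, one way to narrow down where the NaNs come from is to find the first training step at which the loss stops being finite. The sketch below is only an illustration and uses the standard library alone: `first_non_finite_step` and the `logged` values are hypothetical, not part of this library or of the official training code. As a rough heuristic, a NaN at step 0 tends to point at the inputs or the initial weights, while a NaN that appears later more often points at the learning rate or exploding gradients.

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical per-step loss values collected from a training run.
logged = [10.2, 9.8, 9.9, float("nan"), float("nan")]
print(first_non_finite_step(logged))
```

In a real training loop the same check could be applied to the scalar loss of each step (for example via `loss.item()` in PyTorch) so that the run stops as soon as the loss diverges.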

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@anshuman1992 could you share the code snippet/gist you used for training the TransformerXL model?

@anshuman1992 this would be great for me too.
