Transformers: Training Transformer XL from scratch

Created on 30 Apr 2019 · 4 Comments · Source: huggingface/transformers

Hello,

I'm trying to train a Transformer-XL model from scratch by combining the architecture code from this library with the training code from the official paper repo. However, this yields NaNs during training, so I wanted to clarify the recommended way to initialize a new model.

I'm doing it like this:

architecture = TransfoXLConfig.from_json_file(args.config_path)
model = TransfoXLLMHeadModel(architecture)

Is there a bug in this?

wontfix

All 4 comments

This looks good to me
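
If the initialization itself is fine, one way to narrow down where the NaNs first appear is to check parameters and gradients for non-finite values. The following is only a minimal sketch under assumptions: the helper names `non_finite_parameters` and `non_finite_gradients` are hypothetical, and a small `nn.Linear` stands in for the real model so the snippet runs on its own. The same checks should apply to any `torch.nn.Module`.

```python
import torch
from torch import nn


def non_finite_parameters(model: nn.Module) -> list:
    """Return the names of parameters that contain NaN or Inf values."""
    return [
        name
        for name, param in model.named_parameters()
        if not torch.isfinite(param).all()
    ]


def non_finite_gradients(model: nn.Module) -> list:
    """Return the names of parameters whose gradients contain NaN or Inf."""
    return [
        name
        for name, param in model.named_parameters()
        if param.grad is not None and not torch.isfinite(param.grad).all()
    ]


# Hypothetical stand-in for the freshly initialized model.
model = nn.Linear(4, 2)

# Check the weights right after construction, before any training step.
print("bad parameters:", non_finite_parameters(model))

# Run one backward pass on dummy data, then check the gradients.
loss = model(torch.randn(3, 4)).sum()
loss.backward()
print("bad gradients:", non_finite_gradients(model))
```

Calling the first helper right after constructing the model, and the second after each `loss.backward()`, should show whether the problem is already present at initialization or only appears once training starts.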

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@anshuman1992 could you share a code snippet/gist used for training TransformerXL model?

@anshuman1992 this will be great for me too
