Hello,
I'm trying to train a Transformer-XL model from scratch by combining the architecture code from this library with the training code from the official paper repo. This leads to NaNs during training, so I wanted to clarify the recommended way to initialize a new model.
I'm doing it like this:
architecture = TransfoXLConfig.from_json_file(args.config_path)
model = TransfoXLLMHeadModel(architecture)
Is there a bug in this?
This looks good to me
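If the initialization is fine, it may help to pin down the step at which the loss first turns into NaN or inf, so the cause can be narrowed down in the training loop. Below is a minimal sketch in plain Python. The helper name `first_non_finite_step` and the example loss values are hypothetical and not part of this library or the official repo.

```python
import math


def first_non_finite_step(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Hypothetical per-step loss values, e.g. collected as plain floats
# from the training loop.
logged_losses = [7.9, 7.4, 6.8, float("nan"), float("nan")]

bad_step = first_non_finite_step(logged_losses)
if bad_step is not None:
    print(f"loss became non-finite at step {bad_step}")
```

Knowing the first bad step makes it easier to check things like the learning rate or the input batch at that point, which might be where the problem comes from rather than the initialization.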
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@anshuman1992 could you share the code snippet/gist you used for training the Transformer-XL model?
@anshuman1992 this would be great for me too