Transformers: Is training from scratch possible now?

Created on 18 Sep 2019 · 9 comments · Source: huggingface/transformers

Do the models support training from scratch, together with original (paper) parameters?

wontfix

All 9 comments

You can just instantiate the models without calling .from_pretrained(), like so:

from transformers import BertConfig, BertForPreTraining

config = BertConfig()  # optionally pass your favorite parameters here
model = BertForPreTraining(config)  # weights are randomly initialized
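
For a quick sanity check that the randomly initialized model runs, something like this should work (a sketch; the tensor shapes are arbitrary):

import torch

# dummy batch of token ids: batch size 2, sequence length 16
input_ids = torch.randint(0, config.vocab_size, (2, 16))
outputs = model(input_ids)  # forward pass through the randomly initialized model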

I added a flag to run_lm_finetuning.py that gets checked in main(). Maybe this snippet helps (note: I am only using this with BERT, without next-sentence prediction).

# check if instead initialize freshly
if args.do_fresh_init:
    config = config_class()
    tokenizer = tokenizer_class()
    if args.block_size <= 0:
        args.block_size = tokenizer.max_len  # Our input block size will be the max possible for the model
    args.block_size = min(args.block_size, tokenizer.max_len)
    model = model_class(config=config)
else:
    config = config_class.from_pretrained(args.config_name if args.config_name else args.model_name_or_path)
    tokenizer = tokenizer_class.from_pretrained(args.tokenizer_name if args.tokenizer_name else args.model_name_or_path)
    if args.block_size <= 0:
        args.block_size = tokenizer.max_len  # Our input block size will be the max possible for the model
    args.block_size = min(args.block_size, tokenizer.max_len)
    model = model_class.from_pretrained(args.model_name_or_path, from_tf=bool('.ckpt' in args.model_name_or_path), config=config)
model.to(args.device)

Hi,

thanks for the quick response.
I am more interested in the XLNet and TransformerXL models. Would they have the same interface?

I don't know firsthand, but I suppose so; it is fundamentally an easy problem to re-initialize weights randomly before any kind of training in PyTorch :)
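
As a minimal sketch of such a re-initialization in plain PyTorch (assuming `model` is any nn.Module; the normal initializer with std=0.02 is just an assumed BERT-style example):

import torch.nn as nn

def reinit_weights(module):
    # re-draw weights for the common layer types; adjust the std to taste
    if isinstance(module, (nn.Linear, nn.Embedding)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
    if isinstance(module, nn.Linear) and module.bias is not None:
        nn.init.zeros_(module.bias)
    if isinstance(module, nn.LayerNorm):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)

model.apply(reinit_weights)  # recursively applies to every submodule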

Good luck,
Zacharias

I think XLNet requires a very specific training procedure, see #943 :+1:

"For XLNet, the implementation in this repo is missing some key functionality (the permutation generation function and an analogue of the dataset record generator) which you'd have to implement yourself."

https://github.com/huggingface/pytorch-transformers/issues/1283#issuecomment-532598578

Hmm, tokenizers' constructors require a vocab_file parameter...
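
One workaround is to train your own vocabulary with the separate tokenizers library and point the transformers tokenizer class at the resulting files. A sketch along the lines of the https://huggingface.co/blog/how-to-train post (assuming a reasonably recent tokenizers release; the file paths, vocab size, and special tokens are placeholders):

import os
from tokenizers import ByteLevelBPETokenizer
from transformers import RobertaTokenizer  # GPT-2/RoBERTa-style byte-level BPE

# train a byte-level BPE vocabulary on your own corpus (paths are placeholders)
bpe = ByteLevelBPETokenizer()
bpe.train(files=["my_corpus.txt"], vocab_size=30000, min_frequency=2,
          special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"])

os.makedirs("my_tokenizer", exist_ok=True)
bpe.save_model("my_tokenizer")  # writes vocab.json and merges.txt

# load the trained files with the matching transformers tokenizer class
tokenizer = RobertaTokenizer("my_tokenizer/vocab.json", "my_tokenizer/merges.txt")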

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@Stamenov Did you figure out how to pretrain XLNet? I'm interested in that as well.

No, I haven't. According to a recent tweet, Hugging Face may prioritize putting more effort into providing interfaces for pre-training from scratch.

You can now leave --model_name_or_path as None in run_language_modeling.py to train a model from scratch.

See also https://huggingface.co/blog/how-to-train
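
For example, a from-scratch invocation might look roughly like this (flag names follow that script; the config/tokenizer directories, corpus file, and output path are placeholders):

python run_language_modeling.py \
    --model_type roberta \
    --config_name ./my_model_config \
    --tokenizer_name ./my_tokenizer \
    --train_data_file ./my_corpus.txt \
    --do_train \
    --mlm \
    --output_dir ./model_from_scratch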
