Transformers: Is training from scratch possible now?

Created on 18 Sep 2019 · 9 comments · Source: huggingface/transformers

Do the models support training from scratch, together with original (paper) parameters?

wontfix

All 9 comments

You can just instantiate the models without calling .from_pretrained(), like so:

from transformers import BertConfig, BertForPreTraining

config = BertConfig()  # optionally pass your favorite parameters here
model = BertForPreTraining(config)  # weights are randomly initialized
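
For a quick sanity check that the randomly initialized model runs, something like this should work (a sketch; the tensor shapes are arbitrary):

import torch

# dummy batch of token ids: batch size 2, sequence length 16
input_ids = torch.randint(0, config.vocab_size, (2, 16))
outputs = model(input_ids)  # forward pass through the randomly initialized model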

I added a flag to run_lm_finetuning.py that gets checked in main(). Maybe this snippet helps (note: I am only using this with BERT, without next-sentence prediction).

# check if instead initialize freshly
if args.do_fresh_init:
    config = config_class()
    tokenizer = tokenizer_class()
    if args.block_size <= 0:
        args.block_size = tokenizer.max_len  # Our input block size will be the max possible for the model
    args.block_size = min(args.block_size, tokenizer.max_len)
    model = model_class(config=config)
else:
    config = config_class.from_pretrained(args.config_name if args.config_name else args.model_name_or_path)
    tokenizer = tokenizer_class.from_pretrained(args.tokenizer_name if args.tokenizer_name else args.model_name_or_path)
    if args.block_size <= 0:
        args.block_size = tokenizer.max_len  # Our input block size will be the max possible for the model
    args.block_size = min(args.block_size, tokenizer.max_len)
    model = model_class.from_pretrained(args.model_name_or_path, from_tf=bool('.ckpt' in args.model_name_or_path), config=config)
model.to(args.device)

Hi,

thanks for the quick response.
I am more interested in the XLNet and TransformerXL models. Would they have the same interface?

I don't know firsthand, but I suppose so; it is fundamentally an easy problem to re-initialize weights randomly before any kind of training in PyTorch :)
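
As a minimal sketch of such a re-initialization in plain PyTorch (assuming `model` is any nn.Module; the normal initializer with std=0.02 is just an assumed BERT-style example):

import torch.nn as nn

def reinit_weights(module):
    # re-draw weights for the common layer types; adjust the std to taste
    if isinstance(module, (nn.Linear, nn.Embedding)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
    if isinstance(module, nn.Linear) and module.bias is not None:
        nn.init.zeros_(module.bias)
    if isinstance(module, nn.LayerNorm):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)

model.apply(reinit_weights)  # recursively applies to every submodule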

Good luck,
Zacharias

I think XLNet requires a very specific training procedure, see #943 :+1:

"For XLNet, the implementation in this repo is missing some key functionality (the permutation generation function and an analogue of the dataset record generator) which you'd have to implement yourself."

https://github.com/huggingface/pytorch-transformers/issues/1283#issuecomment-532598578

Hmm, tokenizers' constructors require a vocab_file parameter...
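
One workaround is to train your own vocabulary with the separate tokenizers library and point the transformers tokenizer class at the resulting files. A sketch along the lines of the https://huggingface.co/blog/how-to-train post (assuming a reasonably recent tokenizers release; the file paths, vocab size, and special tokens are placeholders):

import os
from tokenizers import ByteLevelBPETokenizer
from transformers import RobertaTokenizer  # GPT-2/RoBERTa-style byte-level BPE

# train a byte-level BPE vocabulary on your own corpus (paths are placeholders)
bpe = ByteLevelBPETokenizer()
bpe.train(files=["my_corpus.txt"], vocab_size=30000, min_frequency=2,
          special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"])

os.makedirs("my_tokenizer", exist_ok=True)
bpe.save_model("my_tokenizer")  # writes vocab.json and merges.txt

# load the trained files with the matching transformers tokenizer class
tokenizer = RobertaTokenizer("my_tokenizer/vocab.json", "my_tokenizer/merges.txt")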

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@Stamenov Did you figure out how to pretrain XLNet? I'm interested in that as well.

No, I haven't. According to a recent tweet, Hugging Face may prioritize putting more effort into providing interfaces for pre-training from scratch.

You can now leave --model_name_or_path as None in run_language_modeling.py to train a model from scratch.

See also https://huggingface.co/blog/how-to-train
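
For example, a from-scratch invocation might look roughly like this (flag names follow that script; the config/tokenizer directories, corpus file, and output path are placeholders):

python run_language_modeling.py \
    --model_type roberta \
    --config_name ./my_model_config \
    --tokenizer_name ./my_tokenizer \
    --train_data_file ./my_corpus.txt \
    --do_train \
    --mlm \
    --output_dir ./model_from_scratch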
