Do the models support training from scratch, together with original (paper) parameters?
You can just instantiate the models without calling .from_pretrained(), like so:
from transformers import BertConfig, BertForPreTraining  # pytorch_transformers in older releases

config = BertConfig()  # optionally pass your favorite parameters
model = BertForPreTraining(config)
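For example, a minimal sketch of setting the architecture explicitly (argument names as in recent transformers releases; the defaults already correspond to BERT-base, so the values below are just illustrative):

config = BertConfig(
    vocab_size=30522,          # must match the vocabulary of your tokenizer
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
)
model = BertForPreTraining(config)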
I added a flag to run_lm_finetuning.py that gets checked in main(). Maybe this snippet helps (note: I am only using this with BERT, without next sentence prediction).
# check whether to initialize from scratch instead
if args.do_fresh_init:
    config = config_class()
    tokenizer = tokenizer_class()
    if args.block_size <= 0:
        args.block_size = tokenizer.max_len  # Our input block size will be the max possible for the model
    args.block_size = min(args.block_size, tokenizer.max_len)
    model = model_class(config=config)
else:
    config = config_class.from_pretrained(args.config_name if args.config_name else args.model_name_or_path)
    tokenizer = tokenizer_class.from_pretrained(args.tokenizer_name if args.tokenizer_name else args.model_name_or_path)
    if args.block_size <= 0:
        args.block_size = tokenizer.max_len  # Our input block size will be the max possible for the model
    args.block_size = min(args.block_size, tokenizer.max_len)
    model = model_class.from_pretrained(args.model_name_or_path, from_tf=bool('.ckpt' in args.model_name_or_path), config=config)

model.to(args.device)
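The flag itself is not part of the original script; a minimal sketch of how it can be registered with argparse (the name --do_fresh_init matches the check above, the help text is mine):

parser.add_argument(
    "--do_fresh_init",
    action="store_true",
    help="Initialize config, tokenizer and model from scratch instead of loading pretrained weights.",
)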
Hi,
thanks for the quick response.
I am more interested in the XLNet and TransformerXL models. Would they have the same interface?
I don’t know firsthand, but I suppose so, and fundamentally it is an easy problem to reinitialize weights randomly before any kind of training in PyTorch :)
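For example, an untested sketch with the default configurations (XLNetConfig, XLNetLMHeadModel, TransfoXLConfig and TransfoXLLMHeadModel are the classes the library exposes):

from transformers import (  # pytorch_transformers in older releases
    XLNetConfig, XLNetLMHeadModel,
    TransfoXLConfig, TransfoXLLMHeadModel,
)

# Randomly initialized XLNet with the default configuration
xlnet_model = XLNetLMHeadModel(XLNetConfig())

# Randomly initialized Transformer-XL with the default configuration
transfo_xl_model = TransfoXLLMHeadModel(TransfoXLConfig())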
Good luck,
Zacharias
I think XLNet requires a very specific training procedure, see #943 :+1:
"For XLNet, the implementation in this repo is missing some key functionality (the permutation generation function and an analogue of the dataset record generator) which you'd have to implement yourself."
https://github.com/huggingface/pytorch-transformers/issues/1283#issuecomment-532598578
Hmm, tokenizers' constructors require a vocab_file parameter...
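For what it's worth, a minimal sketch assuming you already have a WordPiece vocabulary file (the path is a placeholder):

from transformers import BertTokenizer  # pytorch_transformers in older releases

# vocab.txt: plain text, one WordPiece token per line
tokenizer = BertTokenizer("path/to/vocab.txt")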
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@Stamenov Did you figure out how to pretrain XLNet? I'm interested in that as well.
No, I haven't. According to a recent tweet, Hugging Face may put more effort into providing interfaces for pre-training from scratch.
You can now leave --model_name_or_path unset (None) in run_language_modeling.py to train a model from scratch. See also https://huggingface.co/blog/how-to-train
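A rough invocation sketch (flag names as used in the blog post above, paths are placeholders; check python run_language_modeling.py --help for the exact flags of your version):

python run_language_modeling.py \
    --output_dir ./my-model \
    --model_type bert \
    --config_name ./my-config \
    --tokenizer_name ./my-tokenizer \
    --do_train \
    --train_data_file ./train.txt \
    --mlm
    # --model_name_or_path is deliberately omitted so training starts from scratch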