run_lm_finetuning.py is a very useful tool for finetuning many of the models the library provides, but it doesn't cover all of them. Currently available models are:
And the ones that are not yet available:
The most important part of such a library is that its models can be easily finetuned. run_lm_finetuning.py gives us that opportunity, but why stop there :)
Indeed, here are my 2 cents on that:
Do you want to give it a try? We don't have that in our short term roadmap until the end of the year.
Okay, I'm gonna try to add ctrl, xlm and albert. Then I'll make a pull request so we can discuss it.
Isn't there any example of how to train transfo-xl and xlnet?
You have to look at both original repos
Out of curiosity, has any progress been made on a pull request for this?
+1 for this request, especially transfo-xl :)
Is this issue addressed with https://github.com/huggingface/transformers/commit/a8e3336a850e856188350a93e67d77c07c85b8af?
a8e3336a850e856188350a93e67d77c07c85b8af makes all those models accessible from run_language_modeling.py, but does not do anything special for models whose training has peculiarities, like transfo-xl or xlnet. I'm not familiar with those two so maybe someone else (@patrickvonplaten?) can chime in.
As far as I know:
Currently the run_language_modeling.py script is not really made to train transfo-xl or xlnet.
First, as @thomwolf already said, the mems parameter (the "history") of the models is not taken care of during training. During training the model "caches" past sequences so that it can effectively reuse them afterwards; it's described quite well in Figure 2 of the Transfo-XL paper. This should be rather easy to add though.
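Something along these lines, assuming a hypothetical segment_loader that yields consecutive segments of the same text stream in document order (which the current script does not guarantee):

```python
# Rough sketch of carrying `mems` across consecutive training segments.
# `segment_loader` is hypothetical: it must yield segments of the SAME stream
# in order, otherwise reusing the cache makes no sense.
import torch
from transformers import TransfoXLLMHeadModel

model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

mems = None  # the "history" from previous segments
for input_ids, labels in segment_loader:
    outputs = model(input_ids=input_ids, labels=labels, mems=mems, return_dict=True)
    loss = outputs.losses.mean()  # per-token losses -> scalar
    mems = outputs.mems           # cache is built under no_grad inside the model
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```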
Second, XLNet samples a permutation mask during training, which is one of the core ideas of the paper, see https://github.com/huggingface/transformers/issues/2822 or Equation 5 in the official paper. This is a very special case for XLNet and is not yet implemented in run_language_modeling.py (it shouldn't be too hard to implement though, since there is only one additional sum per training sample).
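For reference, a toy illustration of the perm_mask / target_mapping inputs that XLNetLMHeadModel expects, predicting just a single token rather than sampling a full factorization order as in the paper:

```python
# Toy example: predict the token at `target_pos` while forbidding every token
# from attending to it. Not the full permutation-LM objective, just the inputs.
import torch
from transformers import XLNetLMHeadModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

input_ids = tokenizer("The quick brown fox jumps", return_tensors="pt").input_ids
seq_len = input_ids.shape[1]
target_pos = 3  # position we want the model to predict

# perm_mask[k, i, j] == 1 means token i may NOT attend to token j in sample k.
perm_mask = torch.zeros(1, seq_len, seq_len)
perm_mask[0, :, target_pos] = 1.0  # nobody may see the target token

# target_mapping selects which positions the LM head should predict.
target_mapping = torch.zeros(1, 1, seq_len)
target_mapping[0, 0, target_pos] = 1.0

labels = input_ids[:, target_pos].unsqueeze(-1)  # shape (batch, num_predict)
outputs = model(input_ids, perm_mask=perm_mask, target_mapping=target_mapping, labels=labels)
loss = outputs.loss
```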
Third, Transfo-XL uses adaptive word embeddings and an adaptive softmax, which also leads to some peculiarities when training. See also issue #3310. This should be implemented in the model class itself though.
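(For what it's worth, the adaptive embedding/softmax setup is driven by TransfoXLConfig, so it indeed lives in the model class rather than in the training script:)

```python
# The adaptive embedding / softmax cutoffs are configured on the model config,
# so the training script does not need to know about them.
from transformers import TransfoXLConfig, TransfoXLLMHeadModel

config = TransfoXLConfig(
    vocab_size=267735,
    cutoffs=[20000, 40000, 200000],  # vocabulary clusters for the adaptive softmax
    div_val=4,                       # embedding size is divided by 4 per cluster
)
model = TransfoXLLMHeadModel(config)
```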
I'm assuming that Albert is fine out of the box. What about T5?
Is anybody still working on this currently?
We are currently working on it. Might still take ~2 weeks.
Any update?
I'd like to take a shot at this (#4739), starting with XLNet since that's relevant to my work right now.
I think you would just need to add an XLNet data collator to this file so that the trainer can be used with XLNet :-) So I would add a new XLNetLanguageModelingCollator here: https://github.com/huggingface/transformers/blob/1b5820a56540a2096daeb43a0cd8247c8c94a719/src/transformers/data/data_collator.py#L76
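Something like this, maybe (just a rough sketch of the interface, with no padding, special-token handling, or proper factorization-order sampling; XLNetLanguageModelingCollator is only a suggested name and doesn't exist in the library yet):

```python
# Very rough sketch of a hypothetical XLNet collator: examples -> dict of
# tensors with perm_mask / target_mapping / labels for XLNetLMHeadModel.
from dataclasses import dataclass
from typing import Dict, List

import torch
from transformers import PreTrainedTokenizer


@dataclass
class XLNetLanguageModelingCollator:
    tokenizer: PreTrainedTokenizer
    plm_probability: float = 1 / 6  # fraction of tokens to predict

    def __call__(self, examples: List[torch.Tensor]) -> Dict[str, torch.Tensor]:
        inputs = torch.stack(examples)  # assumes equal-length examples
        bsz, seq_len = inputs.shape
        labels = inputs.clone()

        # Choose which positions to predict for each sample.
        masked = torch.bernoulli(torch.full((bsz, seq_len), self.plm_probability)).bool()
        labels[~masked] = -100  # ignore non-predicted positions in the loss

        # Simplification: no token may attend to any predicted token's content.
        perm_mask = masked.unsqueeze(1).expand(bsz, seq_len, seq_len).float()

        # One-hot mapping of the positions the LM head should predict.
        target_mapping = torch.diag_embed(masked.float())

        return {
            "input_ids": inputs,
            "perm_mask": perm_mask,
            "target_mapping": target_mapping,
            "labels": labels,
        }
```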
Thanks so much! I'll look into it :)
Any progress on XLNet? @shngt
Any updates regarding XLNet?
@patrickvonplaten I added the data collator as you suggested - please review :) You also mentioned earlier that "the mems parameter (the "history") of the models is not taken care of during training" - has that been taken care of, or does the logic need to be implemented separately?
I was looking into the other models requested:

- For XLM, we could add a multilingual_language_modeling example script if others would like an end-to-end example of how this would be properly done.
- For Albert, the span masking scheme should live in the DataCollatorForLanguageModeling class in my opinion.

In summary, I think all that needs to be done right now for XLM and TransformerXL is to add a line or two in the starting docstring mentioning which type of LM to use. For Albert, I think we need to incorporate the masking scheme as a separate procedure in DataCollatorForLanguageModeling, but am not sure if this is the cleanest way to do it. Let me know what you would like.
@patrickvonplaten
I agree very much with what you say. For XLM and TransformerXL the script should work pretty much out of the box, so we would just have to adapt some comments in examples/language-modeling/run_language_modeling.py.
For Albert, it would be nice to create a new SpanMaskLanguageModeling data collator.
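Roughly something like this? (Only a sketch of the idea: no padding/special-token handling, no 80/10/10 replacement, and a uniform span length instead of the n-gram distribution from the ALBERT paper. SpanMaskLanguageModelingCollator is just a placeholder name.)

```python
# Rough sketch of a hypothetical span-masking collator for Albert-style
# n-gram masking. Padding and special-token handling are omitted for brevity.
import random
from dataclasses import dataclass
from typing import Dict, List

import torch
from transformers import PreTrainedTokenizer


@dataclass
class SpanMaskLanguageModelingCollator:
    tokenizer: PreTrainedTokenizer
    mlm_probability: float = 0.15
    max_span_length: int = 3  # ALBERT masks n-grams of up to 3 tokens

    def __call__(self, examples: List[torch.Tensor]) -> Dict[str, torch.Tensor]:
        inputs = torch.stack(examples)  # assumes equal-length examples
        labels = inputs.clone()
        bsz, seq_len = inputs.shape
        num_to_mask = max(1, int(seq_len * self.mlm_probability))

        mask = torch.zeros(bsz, seq_len, dtype=torch.bool)
        for i in range(bsz):
            while int(mask[i].sum()) < num_to_mask:
                span = random.randint(1, self.max_span_length)
                start = random.randrange(0, seq_len - span + 1)
                mask[i, start : start + span] = True  # mask a contiguous span

        labels[~mask] = -100  # only compute loss on masked positions
        inputs[mask] = self.tokenizer.mask_token_id
        return {"input_ids": inputs, "labels": labels}
```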
Great, I'll get started then. I'll try to finish it over the weekend :)
Awesome, no rush though ;-)
Maybe a stupid question, but where should I find run_lm_finetuning.py? The docs point to a dead link, as the file doesn't exist in the master branch.
It's been renamed and moved there.
Thanks for the notice @KristenMoore - The documentation was quite old. The new documentation should have fixed it :-)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.