Transformers: Expand run_lm_finetuning.py to all models

Created on 30 Nov 2019 · 26 Comments · Source: huggingface/transformers

🚀 Feature

run_lm_finetuning.py is a very useful tool for fine-tuning many of the models the library provides, but it doesn't cover all of them. Currently available models are:

  • gpt2
  • openai-gpt
  • bert
  • roberta
  • distilbert
  • camembert

And the ones not yet available (a sketch of how a new model could be registered follows this list):

  • ctrl
  • xlm
  • xlnet
  • transfo-xl
  • albert
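
For context, the script dispatches on --model_type through a MODEL_CLASSES mapping of (config, model, tokenizer) classes. The sketch below shows the general shape of that mapping and how an entry for one of the missing models (ctrl) could be added; the exact contents of the real dict may differ, so treat this as illustrative only:

```python
# Illustrative sketch: how a run_lm_finetuning.py-style script maps --model_type
# to (config, model, tokenizer) classes. The existing entries mirror the
# supported list above; the `ctrl` line shows how a new model could be added.
from transformers import (
    GPT2Config, GPT2LMHeadModel, GPT2Tokenizer,
    BertConfig, BertForMaskedLM, BertTokenizer,
    CTRLConfig, CTRLLMHeadModel, CTRLTokenizer,
)

MODEL_CLASSES = {
    "gpt2": (GPT2Config, GPT2LMHeadModel, GPT2Tokenizer),
    "bert": (BertConfig, BertForMaskedLM, BertTokenizer),
    # proposed addition: CTRL is a causal LM, so it can reuse the CLM code path
    "ctrl": (CTRLConfig, CTRLLMHeadModel, CTRLTokenizer),
}

config_class, model_class, tokenizer_class = MODEL_CLASSES["ctrl"]
model = model_class.from_pretrained("ctrl")
tokenizer = tokenizer_class.from_pretrained("ctrl")
```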

Motivation

The most important part of such a library is that its models can be easily fine-tuned. run_lm_finetuning.py gives us that opportunity, so why not extend it to the remaining models? :)

wontfix

All 26 comments

Indeed, here are my 2 cents on that:

  • ctrl: easy to add (should work out of the box)
  • xlm: should also work out of the box (but need to check if the model is an mlm or a clm model to finetune)
  • albert: should work out of the box
  • transfo-xl: need to take care of history => a little more work
  • xlnet: need to take care of history + permutations => quite more work.

Do you want to give it a try? We don't have that in our short term roadmap until the end of the year.

Okay, I'm going to try to add ctrl, xlm and albert. Then I'll open a pull request so we can discuss it.

Isn't there any example of how to train transfo-xl and xlnet?

You have to look at both original repos.

Out of curiosity, has any progress been made on a pull request for this?

+1 for this request, especially transfo-xl :)

Commit a8e3336a850e856188350a93e67d77c07c85b8af makes all of those models accessible from run_language_modeling.py, but doesn't do anything special for models whose training has peculiarities, like transfo-xl or xlnet. I'm not familiar with those two, so maybe someone else (@patrickvonplaten?) can chime in.

As far as I know:

Currently, the run_language_modeling.py script is not really made to train transfo-xl or xlnet.

First, as @thomwolf already said, the mems parameter (the "history") of these models is not taken care of during training. During training the model "caches" past sequences so it can reuse them later. This is described quite well in Figure 2 of the Transfo-XL paper and should be rather easy to add, though.
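
For illustration, a minimal sketch of what carrying the memory across segments could look like during fine-tuning; the checkpoint name and the output attribute names (losses, mems) are taken from the current model docs and may differ between library versions:

```python
# Rough sketch (not the official script): fine-tune Transfo-XL while passing
# the cached memory ("mems") from one segment to the next, as in Figure 2 of
# the Transfo-XL paper. Output attribute names are assumed from the docs.
import torch
from transformers import TransfoXLLMHeadModel, TransfoXLTokenizer

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

text = "a long training document ..."
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
segment_len = 128
mems = None  # no history before the first segment

model.train()
for start in range(0, input_ids.size(1), segment_len):
    segment = input_ids[:, start:start + segment_len]
    outputs = model(input_ids=segment, mems=mems, labels=segment)
    loss = outputs.losses.mean()  # token-level losses -> scalar
    # keep the history, but make sure we never backprop through past segments
    mems = [m.detach() for m in outputs.mems]
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```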

Second, XLNet samples a permutation mask during training, which is one of the core ideas of the paper; see https://github.com/huggingface/transformers/issues/2822 or Equation 5 in the official paper. This is a very special case for XLNet and is not yet implemented in run_language_modeling.py (it shouldn't be too hard to implement, though, since there is only one additional sum per training sample).
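
For concreteness, a simplified sketch of that objective is shown below. The target-selection heuristic here is much cruder than the paper's span-based sampling and is only meant to show how perm_mask and target_mapping feed into XLNetLMHeadModel:

```python
# Simplified permutation-LM sketch for XLNet: hide a random subset of tokens
# via perm_mask and predict only those positions via target_mapping.
import torch
from transformers import XLNetLMHeadModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

input_ids = tokenizer("The quick brown fox jumps over the lazy dog",
                      return_tensors="pt")["input_ids"]
batch_size, seq_len = input_ids.shape

# pick ~15% of positions as prediction targets (crude stand-in for span sampling)
num_targets = max(1, int(0.15 * seq_len))
target_positions = torch.randperm(seq_len)[:num_targets]

# perm_mask[b, i, j] = 1 means token i may NOT attend to token j
perm_mask = torch.zeros(batch_size, seq_len, seq_len)
perm_mask[:, :, target_positions] = 1.0  # no token sees the target tokens

# target_mapping[b, k, j] = 1 marks position j as the k-th prediction target
target_mapping = torch.zeros(batch_size, num_targets, seq_len)
target_mapping[0, torch.arange(num_targets), target_positions] = 1.0

labels = input_ids[:, target_positions]  # one label per prediction target
outputs = model(input_ids, perm_mask=perm_mask,
                target_mapping=target_mapping, labels=labels)
outputs.loss.backward()  # attribute name assumed from the current docs
```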

Third, Transfo-XL uses adaptive word embeddings and an adaptive softmax, which also leads to some specialties when training. See also issue #3310. This should be implemented in the model class itself, though.

I'm assuming that Albert is fine out of the box. What about T5?

Is anybody still working on this currently?

We are currently working on it. Might still take ~2 weeks.

Any update?

I'd like to try this (#4739), starting with XLNet since that's relevant to my work right now.

I think you would just need to add an XLNet data collator to this file so that the Trainer can be used with XLNet :-) So I would add a new XLNetLanguageModelingCollator here: https://github.com/huggingface/transformers/blob/1b5820a56540a2096daeb43a0cd8247c8c94a719/src/transformers/data/data_collator.py#L76
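
To make the suggestion concrete, a hypothetical skeleton of such a collator is sketched below. The class name follows the comment above and the _build_plm_inputs helper is purely illustrative, not an existing library function; the tensors it would produce are the perm_mask/target_mapping inputs sketched earlier in the thread:

```python
# Hypothetical skeleton for the suggested XLNet collator; the class and helper
# names are illustrative, not part of the library's API.
from dataclasses import dataclass
from typing import Dict, List

import torch
from transformers import PreTrainedTokenizer


@dataclass
class XLNetLanguageModelingCollator:
    tokenizer: PreTrainedTokenizer
    plm_probability: float = 1 / 6  # fraction of tokens used as prediction targets

    def __call__(self, examples: List[torch.Tensor]) -> Dict[str, torch.Tensor]:
        # pad the examples into a single batch tensor
        input_ids = torch.nn.utils.rnn.pad_sequence(
            examples, batch_first=True, padding_value=self.tokenizer.pad_token_id
        )
        # build the permutation-LM inputs (see the perm_mask sketch above)
        perm_mask, target_mapping, labels = self._build_plm_inputs(input_ids)
        return {
            "input_ids": input_ids,
            "perm_mask": perm_mask,
            "target_mapping": target_mapping,
            "labels": labels,
        }

    def _build_plm_inputs(self, input_ids):
        ...  # sample target positions and build the tensors XLNetLMHeadModel expects
```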

Thanks so much! I'll look into it :)

Any progress on XLNet? @shngt

Any updates regarding XLNet?

@patrickvonplaten I added the data collator as you suggested - please review :) You also mentioned earlier "the mems parameter (the "history") of the models is not taken care of during training" - has that been taken care of, or does the logic need to be implemented separately?

I was looking into the other models requested:

  • CTRL -> CLM, works out of the box, already added comments
  • XLM -> can be trained with three different objectives: CLM, MLM and Translation LM (TLM), which is a supervised multilingual extension of MLM. The example script does not seem to require any changes (except maybe a warning somewhere to use the right flag with the right checkpoint). TLM does require a lot of data-specific preprocessing, but it seems relevant only in the multilingual setting. I feel it would be better to incorporate that in a separate multilingual_language_modeling example script if others would like an end-to-end example of how this would be properly done.
  • Albert -> instead of the random masking in BERT, the authors use a span-based masking scheme first seen in SpanBERT (section 3.1 of https://arxiv.org/pdf/1907.10529.pdf). It seems to be a mix of what I implemented for XLNet and the masking procedure in BERT, so in my opinion it should be kept as another function in the main DataCollatorForLanguageModeling class (a rough sketch follows at the end of this comment).
  • TransformerXL -> seems to be CLM with reuse of previous states. I think this functionality has been added, so no additional work should be needed

In summary, I think all that needs to be done right now for XLM and TransformerXL is to add a line or two to the starting docstring mentioning which type of LM to use. For Albert, I think we need to incorporate the masking scheme as a separate procedure in DataCollatorForLanguageModeling, but I'm not sure if that is the cleanest way to do it. Let me know what you would like.
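
For reference, here is a rough, self-contained sketch of the span-masking idea described above (SpanBERT-style contiguous-span masking). The helper name and the span-length sampling are simplified illustrations, not the library's implementation:

```python
# Simplified sketch of span-based masking (SpanBERT / ALBERT style):
# instead of masking independent tokens, mask contiguous spans of tokens.
import random
import torch


def span_mask(input_ids: torch.Tensor, mask_token_id: int,
              mask_ratio: float = 0.15, max_span_len: int = 3):
    """Return (masked_input_ids, labels) with whole spans replaced by the mask token."""
    input_ids = input_ids.clone()
    labels = torch.full_like(input_ids, -100)  # -100 = ignored by the LM loss
    seq_len = input_ids.size(-1)
    budget = int(mask_ratio * seq_len)  # total number of tokens to mask

    while budget > 0:
        span_len = min(random.randint(1, max_span_len), budget)
        start = random.randrange(0, seq_len - span_len + 1)
        # record the original tokens as labels, then mask the whole span
        labels[..., start:start + span_len] = input_ids[..., start:start + span_len]
        input_ids[..., start:start + span_len] = mask_token_id
        budget -= span_len

    return input_ids, labels
```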

@patrickvonplaten

I agree very much with what you say. For XLM and TransformerXL the script should work pretty much out of the box, so we would just have to adapt some comments in examples/language-modeling/run_language_modeling.py.

For Albert, it would be nice to create a new SpanMaskLanguageModeling Data collator.

Great, I'll get started then. I'll try to finish it over the weekend :)

Awesome, no rush though ;-)

Maybe a stupid question, but where should I find run_lm_finetuning.py? The docs point to a dead link, as the file doesn't exist in the master branch.

It's been renamed to run_language_modeling.py and moved to examples/language-modeling/.

Thanks for the notice @KristenMoore - The documentation was quite old. The new documentation should have fixed it :-)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
