Transformers: Expand run_lm_finetuning.py to all models

Created on 30 Nov 2019 · 26 Comments · Source: huggingface/transformers

🚀 Feature

run_lm_finetuning.py is a very useful tool for fine-tuning many of the models the library provides, but it doesn't cover all of them. Currently available models are:

  • gpt2
  • openai-gpt
  • bert
  • roberta
  • distilbert
  • camembert

And the ones not yet available (a sketch of how a new model could be registered follows this list):

  • ctrl
  • xlm
  • xlnet
  • transfo-xl
  • albert
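
For context, the script dispatches on --model_type through a MODEL_CLASSES mapping of (config, model, tokenizer) classes. The sketch below shows the general shape of that mapping and how an entry for one of the missing models (ctrl) could be added; the exact contents of the real dict may differ, so treat this as illustrative only:

```python
# Illustrative sketch: how a run_lm_finetuning.py-style script maps --model_type
# to (config, model, tokenizer) classes. The existing entries mirror the
# supported list above; the `ctrl` line shows how a new model could be added.
from transformers import (
    GPT2Config, GPT2LMHeadModel, GPT2Tokenizer,
    BertConfig, BertForMaskedLM, BertTokenizer,
    CTRLConfig, CTRLLMHeadModel, CTRLTokenizer,
)

MODEL_CLASSES = {
    "gpt2": (GPT2Config, GPT2LMHeadModel, GPT2Tokenizer),
    "bert": (BertConfig, BertForMaskedLM, BertTokenizer),
    # proposed addition: CTRL is a causal LM, so it can reuse the CLM code path
    "ctrl": (CTRLConfig, CTRLLMHeadModel, CTRLTokenizer),
}

config_class, model_class, tokenizer_class = MODEL_CLASSES["ctrl"]
model = model_class.from_pretrained("ctrl")
tokenizer = tokenizer_class.from_pretrained("ctrl")
```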

Motivation

The most important part of such a library is that its models can be easily fine-tuned. run_lm_finetuning.py gives us that opportunity, so why not extend it to the remaining models? :)

wontfix

All 26 comments

Indeed, here are my 2 cents on that:

  • ctrl: easy to add (should work out of the box)
  • xlm: should also work out of the box (but need to check if the model is an mlm or a clm model to finetune)
  • albert: should work out of the box
  • transfo-xl: need to take care of history => a little more work
  • xlnet: need to take care of history + permutations => quite more work.

Do you want to give it a try? We don't have that in our short term roadmap until the end of the year.

Okay, I'm going to try to add ctrl, xlm and albert. Then I'll open a pull request so we can discuss it.

Isn't there any example of how to train transfo-xl and xlnet?

You have to look at both original repos.

Out of curiosity, has any progress been made on a pull request for this?

+1 for this request, especially transfo-xl :)

Commit a8e3336a850e856188350a93e67d77c07c85b8af makes all of those models accessible from run_language_modeling.py, but doesn't do anything special for models whose training has peculiarities, like transfo-xl or xlnet. I'm not familiar with those two, so maybe someone else (@patrickvonplaten?) can chime in.

As far as I know:

Currently, the run_language_modeling.py script is not really made to train transfo-xl or xlnet.

First, as @thomwolf already said, the mems parameter (the "history") of these models is not taken care of during training. During training the model "caches" past sequences so it can reuse them later. This is described quite well in Figure 2 of the Transfo-XL paper and should be rather easy to add, though.
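
For illustration, a minimal sketch of what carrying the memory across segments could look like during fine-tuning; the checkpoint name and the output attribute names (losses, mems) are taken from the current model docs and may differ between library versions:

```python
# Rough sketch (not the official script): fine-tune Transfo-XL while passing
# the cached memory ("mems") from one segment to the next, as in Figure 2 of
# the Transfo-XL paper. Output attribute names are assumed from the docs.
import torch
from transformers import TransfoXLLMHeadModel, TransfoXLTokenizer

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

text = "a long training document ..."
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
segment_len = 128
mems = None  # no history before the first segment

model.train()
for start in range(0, input_ids.size(1), segment_len):
    segment = input_ids[:, start:start + segment_len]
    outputs = model(input_ids=segment, mems=mems, labels=segment)
    loss = outputs.losses.mean()  # token-level losses -> scalar
    # keep the history, but make sure we never backprop through past segments
    mems = [m.detach() for m in outputs.mems]
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```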

Second, XLNet samples a permutation mask during training, which is one of the core ideas of the paper; see https://github.com/huggingface/transformers/issues/2822 or Equation 5 in the official paper. This is a very special case for XLNet and is not yet implemented in run_language_modeling.py (it shouldn't be too hard to implement, though, since there is only one additional sum per training sample).
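
For concreteness, a simplified sketch of that objective is shown below. The target-selection heuristic here is much cruder than the paper's span-based sampling and is only meant to show how perm_mask and target_mapping feed into XLNetLMHeadModel:

```python
# Simplified permutation-LM sketch for XLNet: hide a random subset of tokens
# via perm_mask and predict only those positions via target_mapping.
import torch
from transformers import XLNetLMHeadModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

input_ids = tokenizer("The quick brown fox jumps over the lazy dog",
                      return_tensors="pt")["input_ids"]
batch_size, seq_len = input_ids.shape

# pick ~15% of positions as prediction targets (crude stand-in for span sampling)
num_targets = max(1, int(0.15 * seq_len))
target_positions = torch.randperm(seq_len)[:num_targets]

# perm_mask[b, i, j] = 1 means token i may NOT attend to token j
perm_mask = torch.zeros(batch_size, seq_len, seq_len)
perm_mask[:, :, target_positions] = 1.0  # no token sees the target tokens

# target_mapping[b, k, j] = 1 marks position j as the k-th prediction target
target_mapping = torch.zeros(batch_size, num_targets, seq_len)
target_mapping[0, torch.arange(num_targets), target_positions] = 1.0

labels = input_ids[:, target_positions]  # one label per prediction target
outputs = model(input_ids, perm_mask=perm_mask,
                target_mapping=target_mapping, labels=labels)
outputs.loss.backward()  # attribute name assumed from the current docs
```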

Third, Transfo-XL uses adaptive word embeddings and an adaptive softmax, which also leads to some specialties when training. See also issue #3310. This should be implemented in the model class itself, though.

I'm assuming that Albert is fine out of the box. What about T5?

Is anybody still working on this currently?

We are currently working on it. Might still take ~2 weeks.

Any update?

I'd like to try this (#4739), starting with XLNet since that's relevant to my work right now.

I think you would just need to add an XLNet data collator to this file so that the Trainer can be used with XLNet :-) So I would add a new XLNetLanguageModelingCollator here: https://github.com/huggingface/transformers/blob/1b5820a56540a2096daeb43a0cd8247c8c94a719/src/transformers/data/data_collator.py#L76
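
To make the suggestion concrete, a hypothetical skeleton of such a collator is sketched below. The class name follows the comment above and the _build_plm_inputs helper is purely illustrative, not an existing library function; the tensors it would produce are the perm_mask/target_mapping inputs sketched earlier in the thread:

```python
# Hypothetical skeleton for the suggested XLNet collator; the class and helper
# names are illustrative, not part of the library's API.
from dataclasses import dataclass
from typing import Dict, List

import torch
from transformers import PreTrainedTokenizer


@dataclass
class XLNetLanguageModelingCollator:
    tokenizer: PreTrainedTokenizer
    plm_probability: float = 1 / 6  # fraction of tokens used as prediction targets

    def __call__(self, examples: List[torch.Tensor]) -> Dict[str, torch.Tensor]:
        # pad the examples into a single batch tensor
        input_ids = torch.nn.utils.rnn.pad_sequence(
            examples, batch_first=True, padding_value=self.tokenizer.pad_token_id
        )
        # build the permutation-LM inputs (see the perm_mask sketch above)
        perm_mask, target_mapping, labels = self._build_plm_inputs(input_ids)
        return {
            "input_ids": input_ids,
            "perm_mask": perm_mask,
            "target_mapping": target_mapping,
            "labels": labels,
        }

    def _build_plm_inputs(self, input_ids):
        ...  # sample target positions and build the tensors XLNetLMHeadModel expects
```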

Thanks so much! I'll look into it :)

Any progress on XLNet? @shngt

Any updates regarding XLNet?

@patrickvonplaten I added the data collator as you suggested - please review :) You also mentioned earlier "the mems parameter (the "history") of the models is not taken care of during training" - has that been taken care of, or does the logic need to be implemented separately?

I was looking into the other models requested:

  • CTRL -> CLM, works out of the box, already added comments
  • XLM -> can be trained with three different objectives: CLM, MLM and Translation LM (TLM), which is a supervised multilingual extension of MLM. The example script does not seem to require any changes (except maybe a warning somewhere to use the right flag with the right checkpoint). TLM does require a lot of data-specific preprocessing, but it seems relevant only in the multilingual setting. I feel it would be better to incorporate that in a separate multilingual_language_modeling example script if others would like an end-to-end example of how this would be properly done.
  • Albert -> instead of the random masking in BERT, the authors use a span-based masking scheme first seen in SpanBERT (section 3.1 of https://arxiv.org/pdf/1907.10529.pdf). It seems to be a mix of what I implemented for XLNet and the masking procedure in BERT, so in my opinion it should be kept as another function in the main DataCollatorForLanguageModeling class (a rough sketch follows at the end of this comment).
  • TransformerXL -> seems to be CLM with reuse of previous states. I think this functionality has been added, so no additional work should be needed

In summary, I think all that needs to be done right now for XLM and TransformerXL is to add a line or two to the starting docstring mentioning which type of LM to use. For Albert, I think we need to incorporate the masking scheme as a separate procedure in DataCollatorForLanguageModeling, but I'm not sure if that is the cleanest way to do it. Let me know what you would like.
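
For reference, here is a rough, self-contained sketch of the span-masking idea described above (SpanBERT-style contiguous-span masking). The helper name and the span-length sampling are simplified illustrations, not the library's implementation:

```python
# Simplified sketch of span-based masking (SpanBERT / ALBERT style):
# instead of masking independent tokens, mask contiguous spans of tokens.
import random
import torch


def span_mask(input_ids: torch.Tensor, mask_token_id: int,
              mask_ratio: float = 0.15, max_span_len: int = 3):
    """Return (masked_input_ids, labels) with whole spans replaced by the mask token."""
    input_ids = input_ids.clone()
    labels = torch.full_like(input_ids, -100)  # -100 = ignored by the LM loss
    seq_len = input_ids.size(-1)
    budget = int(mask_ratio * seq_len)  # total number of tokens to mask

    while budget > 0:
        span_len = min(random.randint(1, max_span_len), budget)
        start = random.randrange(0, seq_len - span_len + 1)
        # record the original tokens as labels, then mask the whole span
        labels[..., start:start + span_len] = input_ids[..., start:start + span_len]
        input_ids[..., start:start + span_len] = mask_token_id
        budget -= span_len

    return input_ids, labels
```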

@patrickvonplaten

I agree very much with what you say. For XLM and TransformerXL the script should work pretty much out of the box, so we would just have to adapt some comments in examples/language-modeling/run_language_modeling.py.

For Albert, it would be nice to create a new SpanMaskLanguageModeling Data collator.

Great, I'll get started then. I'll try to finish it over the weekend :)

Awesome, no rush though ;-)

Maybe a stupid question, but where should I find run_lm_finetuning.py? The docs point to a dead link, as the file doesn't exist in the master branch.

It's been renamed to run_language_modeling.py and moved to examples/language-modeling/.

Thanks for the notice @KristenMoore - The documentation was quite old. The new documentation should have fixed it :-)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
