The calculator blog post presented an automated way to derive scaling laws relating model size and compute budget on language modeling tasks. Adding it to the library would help save on training costs by picking an optimal model size and training time for a given budget.
Estimating how big a model to use and how long to train it is currently more of an art than a science. An automated tool for that task would let researchers and practitioners concentrate on the high-level parts of their projects rather than on parameter tweaking.
I can submit a PR with my existing work, probably integrating it within Trainer and/or knockknock.
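To make the idea concrete, here is a minimal sketch of what such a helper could look like. The function names and the coefficients passed below are hypothetical placeholders, not the actual PR; the only built-in assumption is the common approximation that training compute C ≈ 6 · N · D (FLOPs ≈ 6 × parameters × tokens).

```python
# Hypothetical sketch of a compute-budget helper; names and coefficients are
# illustrative, not the actual PR. Real coefficients would come from fitting
# loss-vs-compute curves as in the calculator blog post.

def optimal_model_size(compute_budget_flops, coeff, exponent):
    """Suggest a parameter count from a fitted power law N_opt = coeff * C**exponent."""
    return coeff * compute_budget_flops ** exponent


def affordable_tokens(compute_budget_flops, n_params):
    """Training tokens a budget allows for a given model size, via C ~ 6 * N * D."""
    return compute_budget_flops / (6 * n_params)


# Illustrative usage with made-up coefficients:
budget = 1e20  # training FLOPs available
n_params = optimal_model_size(budget, coeff=0.3, exponent=0.5)
n_tokens = affordable_tokens(budget, n_params)
print(f"suggested size: {n_params:.2e} params, trained on {n_tokens:.2e} tokens")
```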
Great stuff, thank you! The energy estimates look about 1000× off from reality though: a V100 running for 12 h should not consume 5432 kWh, or we'd all be dead. 5.4 kWh looks more reasonable.
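For a rough back-of-envelope check (assuming a V100 TDP of about 300 W; whole-system draw would be somewhat higher):

```python
# Sanity check: a V100 draws on the order of 300 W at full load, so 12 h of
# training should land in the single-digit kWh range, not thousands of kWh.
tdp_watts = 300                        # V100 TDP; actual system draw varies
hours = 12
energy_kwh = tdp_watts * hours / 1000  # W x h = Wh, divided by 1000 -> kWh
print(energy_kwh)                      # 3.6 kWh, same order of magnitude as 5.4
```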

Ah yes - I remember having a doubt about that. I checked the library we used to estimate those again, and there might have been a unit conversion error; I'll fix that ASAP tomorrow!
Edit: it's fixed, thank you @lopuhin!
This is already looking very promising! Good stuff.
When clicking the "initialize in transformers" button, the code block should probably not center-align the code but left-align it instead. That makes the code a lot more readable.
Yeah, that was a bit of an aesthetic choice so as not to break the flow of the web page; it definitely wouldn't be like this in an actual tool rather than a demo!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
unstale, what's the status on this @TevenLeScao? Should we close?
@julien-c we had originally decided not to go forward with this, but I started working on it amid the discussions about the scale of GPT-3. I didn't get to finish it before leaving for holidays two weeks ago, but the PR will be ready this week.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.