Transformers: Add upcoming GPT-3 model

Created on 29 May 2020 · 29 comments · Source: huggingface/transformers

🌟 New model addition

Model description

The GPT-3 paper just landed on arXiv: https://arxiv.org/abs/2005.14165.

It would be great to integrate it into Transformers whenever the models are available.

> Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

Open source status

  • [x] GitHub repository is available: here
  • [ ] the model implementation is available: (give details)
  • [ ] the model weights are available: (give details)
  • [ ] who are the authors: (mention them, if possible by @gh-username)
Label: New model

Most helpful comment

But who put the "Open" in OpenAI then 🤔

All 29 comments

My god, the paper hasn't even been up for a day...

That being said, +1

So who can run 175B parameters and what do I have to do for a favor?

whose pp do i have to diddle to run this model i can't imagine any scenario where i could run this myself lol

The full model will be at least 350 GB (16-bit parameters). You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Not to mention the egress costs of making a model that size available.
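For anyone who wants to sanity-check those numbers, here is a minimal back-of-the-envelope sketch in plain Python. The 16 GB per GPU and 2 bytes per FP16 parameter are assumptions for illustration, and it counts weights only (no activations, KV cache, or optimizer state):

```python
# Rough estimate of how many GPUs it takes just to hold the weights in memory.
# Assumptions: 16 GB cards, 2 bytes/param for FP16 (4 for FP32), weights only.
def gpus_needed(n_params: float, bytes_per_param: int = 2, gpu_mem_gb: int = 16) -> float:
    """Return the (fractional) number of GPUs needed to hold the weights alone."""
    model_gb = n_params * bytes_per_param / 1e9
    return model_gb / gpu_mem_gb

print(gpus_needed(175e9))         # ~21.9 GPUs for 175B params at FP16
print(gpus_needed(175e9, 4, 16))  # ~43.8 GPUs if the weights were FP32
```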

Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂

[image: model-size table from the paper]

Is there any Colab to test at least GPT-3 XL?

> Is there any Colab to test at least GPT-3 XL?

They haven't released any code or pretrained models yet. See the issue on the official repo: https://github.com/openai/gpt-3/issues/1

Note that the released models may be FP16, which may require forcing FP16 for use/finetuning (and therefore hardware-limited), or casting up to FP32.
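As a rough illustration of what that would look like with the existing Transformers API, here is a sketch using GPT-2 as a stand-in (an assumption, since no GPT-3 checkpoint or class exists yet); the `.half()`/`.float()` calls are standard PyTorch module methods:

```python
# Sketch only: GPT-2 stands in for a hypothetical GPT-3 checkpoint.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

# If a checkpoint shipped in FP16, you could keep it in half precision
# (GPU-only, and not every op or finetuning setup supports it) ...
model_fp16 = model.half().to("cuda") if torch.cuda.is_available() else model

# ... or cast the weights up to FP32 at the cost of doubling memory.
model_fp32 = model.float()
```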

> Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂

One of the main benefits of the smaller GPT-3 models compared to their GPT-2 counterparts could be the increased context length of 2048 tokens.
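If the smaller checkpoints ever land, one could imagine expressing that difference with the existing GPT-2 classes. A purely speculative sketch: the sizes below come from the paper's "GPT-3 Small" row, and reusing `GPT2Config`/`GPT2LMHeadModel` assumes the architecture maps over, which is not confirmed:

```python
# Speculative: values from the paper's GPT-3 Small row plus the 2048-token
# context window; no official config or weights have been released.
from transformers import GPT2Config, GPT2LMHeadModel

gpt3_small_config = GPT2Config(
    vocab_size=50257,
    n_positions=2048,  # GPT-3 doubles GPT-2's 1024-token context window
    n_ctx=2048,
    n_embd=768,
    n_layer=12,
    n_head=12,
)
model = GPT2LMHeadModel(gpt3_small_config)  # randomly initialized, no weights
```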

Yeah, personally I wouldn't be able to use any model that doesn't fit on a Tesla P100.

The GPT-3 repo is now archived (read-only) so perhaps OpenAI isn't planning on releasing anything this time around.

> The GPT-3 repo is now archived (read-only) so perhaps OpenAI isn't planning on releasing anything this time around.

That is a crying shame, because my system could do-er... :(

Hopefully they have a better excuse than last time.

> Hopefully they have a better excuse than last time.

@flarn2006 You mean the "ooohhhh, we created something scary and have soggy diapers" excuse, with GPT-3?

@flarn2006 If they don't make excuses or drag their feet, and I finish my system build in a similar time frame, hopefully I can help...

A little update: OpenAI's now running their own API with GPT-3 on it. https://beta.openai.com
You can apply for access, but it seems like they're aiming mostly at big companies, not researchers. Sad, way too sad.

But who put the "Open" in OpenAI then 🤔

I guess we will need to "fundraise" enough GPU compute to run the GPT-3 model. 😄

It should be possible to run the smaller models, like a ~1B-parameter one, on regular GPUs. But we don't have the model itself, and it seems OpenAI is against releasing it and would rather commercialize it. :(

I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific).

> I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific).

Very interesting idea. @StealthySemicolon, do you have a reference to similar work done in the past?

> I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific).

> Very interesting idea. @StealthySemicolon, do you have a reference to similar work done in the past?

No, just a hunch. Even if I did know how to do this, it's not like OpenAI would publicly release the model weights...

Guys when is this gonna be integrated!?

When OpenAI decides to open-source GPT-3, but it seems that won't happen; they just want to sell access to big corporations.

> Hopefully they have a better excuse than last time.

Because Microsoft gave us money.

GPT-3 is not coming out anytime soon :(

This thread signifies capitalism's pros and cons at the same time... 😅

> The full model will be at least 350 GB (16-bit parameters). You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Not to mention the egress costs of making a model that size available.
>
> Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂
>
> [image: model-size table from the paper]

@AdamDanielKing is there a way to estimate the size of the GPT-3 XL model?
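For a rough answer (my own estimate, not @AdamDanielKing's): the paper lists GPT-3 XL at 1.3B parameters, so the weights alone would be on the order of a few gigabytes:

```python
# Back-of-the-envelope size of GPT-3 XL (1.3B parameters per the paper), weights only.
xl_params = 1.3e9
print(f"FP16: {xl_params * 2 / 1e9:.1f} GB")  # ~2.6 GB
print(f"FP32: {xl_params * 4 / 1e9:.1f} GB")  # ~5.2 GB
```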
