Transformers: Add upcoming GPT-3 model

Created on 29 May 2020 · 29 comments · Source: huggingface/transformers

🌟 New model addition

Model description

The GPT-3 paper just landed on arXiv: https://arxiv.org/abs/2005.14165.

It would be great to integrate it into Transformers whenever the models are available.

> Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

Open source status

  • [x] GitHub repository is available: here
  • [ ] the model implementation is available: (give details)
  • [ ] the model weights are available: (give details)
  • [ ] who are the authors: (mention them, if possible by @gh-username)
Label: New model

Most helpful comment

But who put the "Open" in OpenAI then 🤔

All 29 comments

My god, the paper hasn't even been up for a day...

That being said, +1

So who can run 175B parameters and what do I have to do for a favor?

whose pp do i have to diddle to run this model i can't imagine any scenario where i could run this myself lol

The full model will be at least 350 GB (16-bit parameters). You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Not to mention the egress costs of making a model that size available.
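For anyone who wants to sanity-check those numbers, here is a minimal back-of-the-envelope sketch in plain Python. The 16 GB per GPU and 2 bytes per FP16 parameter are assumptions for illustration, and it counts weights only (no activations, KV cache, or optimizer state):

```python
# Rough estimate of how many GPUs it takes just to hold the weights in memory.
# Assumptions: 16 GB cards, 2 bytes/param for FP16 (4 for FP32), weights only.
def gpus_needed(n_params: float, bytes_per_param: int = 2, gpu_mem_gb: int = 16) -> float:
    """Return the (fractional) number of GPUs needed to hold the weights alone."""
    model_gb = n_params * bytes_per_param / 1e9
    return model_gb / gpu_mem_gb

print(gpus_needed(175e9))         # ~21.9 GPUs for 175B params at FP16
print(gpus_needed(175e9, 4, 16))  # ~43.8 GPUs if the weights were FP32
```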

Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂

[image: model-size table from the paper]

Is there any Colab to test at least GPT-3 XL?

> Is there any Colab to test at least GPT-3 XL?

They haven't released any code or pretrained models yet. See the issue on the official repo: https://github.com/openai/gpt-3/issues/1

Note that the released models may be FP16, which may require forcing FP16 for use/finetuning (and therefore hardware-limited), or casting up to FP32.
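As a rough illustration of what that would look like with the existing Transformers API, here is a sketch using GPT-2 as a stand-in (an assumption, since no GPT-3 checkpoint or class exists yet); the `.half()`/`.float()` calls are standard PyTorch module methods:

```python
# Sketch only: GPT-2 stands in for a hypothetical GPT-3 checkpoint.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")

# If a checkpoint shipped in FP16, you could keep it in half precision
# (GPU-only, and not every op or finetuning setup supports it) ...
model_fp16 = model.half().to("cuda") if torch.cuda.is_available() else model

# ... or cast the weights up to FP32 at the cost of doubling memory.
model_fp32 = model.float()
```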

> Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂

One of the main benefits of the smaller GPT-3 models compared to their GPT-2 counterparts could be the increased context length of 2048 tokens.
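If the smaller checkpoints ever land, one could imagine expressing that difference with the existing GPT-2 classes. A purely speculative sketch: the sizes below come from the paper's "GPT-3 Small" row, and reusing `GPT2Config`/`GPT2LMHeadModel` assumes the architecture maps over, which is not confirmed:

```python
# Speculative: values from the paper's GPT-3 Small row plus the 2048-token
# context window; no official config or weights have been released.
from transformers import GPT2Config, GPT2LMHeadModel

gpt3_small_config = GPT2Config(
    vocab_size=50257,
    n_positions=2048,  # GPT-3 doubles GPT-2's 1024-token context window
    n_ctx=2048,
    n_embd=768,
    n_layer=12,
    n_head=12,
)
model = GPT2LMHeadModel(gpt3_small_config)  # randomly initialized, no weights
```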

Yeah, personally I wouldn't be able to use any model that doesn't fit on a Tesla P100.

The GPT-3 repo is now archived (read-only) so perhaps OpenAI isn't planning on releasing anything this time around.

> The GPT-3 repo is now archived (read-only) so perhaps OpenAI isn't planning on releasing anything this time around.

That is a crying shame, because my system could do-er... :(

Hopefully they have a better excuse than last time.

> Hopefully they have a better excuse than last time.

@flarn2006 You mean the "ooohhhh, we created something scary and have soggy diapers" excuse, with GPT-3?

@flarn2006 If they don't make excuses or drag their feet, and I finish my system build in a similar time frame, hopefully I can help...

A little update: OpenAI's now running their own API with GPT-3 on it. https://beta.openai.com
You can apply for access, but it seems like they're aiming mostly at big companies, not researchers. Sad, way too sad.

But who put the "Open" in OpenAI then 🤔

I guess we will need to "fundraise" enough GPU compute to run the GPT-3 model. 😄

It should be possible to run the smaller models, like a ~1B-parameter one, on regular GPUs. But we don't have the model itself, and it seems OpenAI is against releasing it and would rather commercialize it. :(

I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific).

> I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific).

Very interesting idea. @StealthySemicolon, do you have a reference to similar work done in the past?

> I wonder if you could hardcode the 175B model into an electronic chip (like an ASIC, but more specific).

> Very interesting idea. @StealthySemicolon, do you have a reference to similar work done in the past?

No, just a hunch. Even if I did know how to do this, it's not like OpenAI would publicly release the model weights...

Guys when is this gonna be integrated!?

When OpenAI decides to open-source GPT-3, but it seems that won't happen; they just want to sell access to big corporations.

> Hopefully they have a better excuse than last time.

Because Microsoft gave us money.

GPT-3 is not coming out anytime soon :(

This thread signifies capitalism's pros and cons at the same time... 😅

> The full model will be at least 350 GB (16-bit parameters). You'd need to partition it across more than (350 GB) / (16 GB) ~ 22 GPUs just to run it! Not to mention the egress costs of making a model that size available.
>
> Of course, the paper shows 8 different-sized models, 4 of which are smaller than GPT-2, so some of those could be practical. 🙂
>
> [image: model-size table from the paper]

@AdamDanielKing is there a way to estimate the size of the GPT-3 XL model?
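For a rough answer (my own estimate, not @AdamDanielKing's): the paper lists GPT-3 XL at 1.3B parameters, so the weights alone would be on the order of a few gigabytes:

```python
# Back-of-the-envelope size of GPT-3 XL (1.3B parameters per the paper), weights only.
xl_params = 1.3e9
print(f"FP16: {xl_params * 2 / 1e9:.1f} GB")  # ~2.6 GB
print(f"FP32: {xl_params * 4 / 1e9:.1f} GB")  # ~5.2 GB
```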
