It would be nice to have a fine-tuning script for DPR.
Hey @shamanez - I don't think there is a fine-tuning script for DPR at the moment, but contributions like this are always welcome! @lhoestq might have more information.
I just have one more question about the DPR model used in RAG (specifically the doc-encoder network).
Is the doc-encoder pretrained with a 21-million Wikipedia dump as mentioned in the DPR paper?
The DPR encoders (context encoder and question encoder) in RAG are pretrained BERT models that were fine-tuned for retrieval on the question/answer pairs of Natural Questions (and other datasets, depending on the setup), using retrieved passages from the 21-million-passage Wikipedia dump. In the library, the DPR encoders are the ones trained on NQ.
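For reference, here is a minimal sketch of loading those NQ-trained DPR encoders from the library and scoring a passage against a question (the checkpoint names are the ones published by Facebook on the Hub; the example question and passage are made up):

```python
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

# Question encoder and context (doc) encoder, both fine-tuned on Natural Questions.
q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

question = "who wrote the declaration of independence"
passage = "The Declaration of Independence was drafted by Thomas Jefferson."

q_emb = q_encoder(**q_tokenizer(question, return_tensors="pt")).pooler_output
p_emb = ctx_encoder(**ctx_tokenizer(passage, return_tensors="pt")).pooler_output

# DPR ranks passages by the dot product between question and passage embeddings.
score = torch.matmul(q_emb, p_emb.T)
print(score)
```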
Thanks a lot. So can I use these encoders to fine-tune RAG on a customized document setting, given that the question encoder also gets fine-tuned?
Yes you can fine-tune it on your documents. During RAG fine-tuning both the generator and the question encoder are updated.
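To make that concrete, here is a rough sketch of a single RAG fine-tuning step (not the official `finetune.py` script). It uses the dummy index so it runs without downloading the full wiki_dpr index; the question/answer pair is just an illustration. Only the question encoder and the generator receive gradients, since the context (doc) encoder lives in the pre-built index and is not part of the trainable model:

```python
import torch
from transformers import RagRetriever, RagTokenForGeneration, RagTokenizer

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Encode the question with the question-encoder tokenizer and the target with the generator tokenizer.
inputs = tokenizer.question_encoder("who holds the record in 100m freestyle", return_tensors="pt")
labels = tokenizer.generator("michael phelps", return_tensors="pt").input_ids

outputs = model(input_ids=inputs["input_ids"], labels=labels)  # retrieval happens inside the forward pass
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```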
Thanks :). So, finally, what is the best way to arrange a customized set of documents?
You'll find all the info at https://github.com/huggingface/transformers/tree/master/examples/rag#finetuning :)
Amazing. Thanks a lot
I had a look at the fine-tuning script. It shows how to train on custom datasets. What I don't understand is how I should use my own set of documents rather than the Wikipedia dump.
Oh I see. In that case you have to build the RAG knowledge source. We haven't released a code example for that yet, but we're discussing it in #7462
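For anyone looking for a starting point before an official example lands, a rough sketch of building such a knowledge source with the `datasets` library and the DPR context encoder might look like the following (the column names, example passages, and batch size are illustrative assumptions; `add_faiss_index` requires faiss to be installed):

```python
import torch
from datasets import Dataset
from transformers import DPRContextEncoder, DPRContextEncoderTokenizerFast

ctx_tokenizer = DPRContextEncoderTokenizerFast.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base").eval()

# Your own passages, arranged as "title" and "text" columns.
dataset = Dataset.from_dict({
    "title": ["My doc 1", "My doc 2"],
    "text": ["First passage about my domain.", "Second passage about my domain."],
})

def embed(batch):
    # Encode each (title, text) pair with the DPR context (doc) encoder.
    inputs = ctx_tokenizer(batch["title"], batch["text"], truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        embeddings = ctx_encoder(**inputs).pooler_output
    return {"embeddings": embeddings.numpy()}

dataset = dataset.map(embed, batched=True, batch_size=16)
dataset.add_faiss_index(column="embeddings")  # dense index used for retrieval
```

Such an indexed dataset could then, in principle, stand in for the wiki_dpr index used by the retriever; see #7462 for the ongoing discussion.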
Ok will follow it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.