It would be nice to have a fine-tuning script for DPR.
Hey @shamanez - I don't think there is a fine-tuning script for DPR at the moment, but contributions like this are always welcome! @lhoestq might have more information.
I just have one more question about the DPR model used in RAG (specifically the doc-encoder network).
Is the doc-encoder pretrained with a 21-million Wikipedia dump as mentioned in the DPR paper?
The DPR encoders (context encoder and question encoder) in RAG are pretrained BERT models that were fine-tuned for retrieval on the question/answer pairs of Natural Questions (and other datasets, depending on the setup), using retrieved passages from the 21-million-passage Wikipedia dump. In the library, the DPR encoders are the ones trained on NQ.
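For reference, here is a minimal sketch of loading those NQ-trained DPR encoders from the library and scoring a passage against a question (the checkpoint names are the ones published by Facebook on the Hub; the example question and passage are made up):

```python
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

# Question encoder and context (doc) encoder, both fine-tuned on Natural Questions.
q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

question = "who wrote the declaration of independence"
passage = "The Declaration of Independence was drafted by Thomas Jefferson."

q_emb = q_encoder(**q_tokenizer(question, return_tensors="pt")).pooler_output
p_emb = ctx_encoder(**ctx_tokenizer(passage, return_tensors="pt")).pooler_output

# DPR ranks passages by the dot product between question and passage embeddings.
score = torch.matmul(q_emb, p_emb.T)
print(score)
```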
Thanks a lot. So can I use these encoders to fine-tune RAG on a customized document setting, given that the question encoder also gets fine-tuned?
Yes you can fine-tune it on your documents. During RAG fine-tuning both the generator and the question encoder are updated.
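To make that concrete, here is a rough sketch of a single RAG fine-tuning step (not the official `finetune.py` script). It uses the dummy index so it runs without downloading the full wiki_dpr index; the question/answer pair is just an illustration. Only the question encoder and the generator receive gradients, since the context (doc) encoder lives in the pre-built index and is not part of the trainable model:

```python
import torch
from transformers import RagRetriever, RagTokenForGeneration, RagTokenizer

tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True
)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Encode the question with the question-encoder tokenizer and the target with the generator tokenizer.
inputs = tokenizer.question_encoder("who holds the record in 100m freestyle", return_tensors="pt")
labels = tokenizer.generator("michael phelps", return_tensors="pt").input_ids

outputs = model(input_ids=inputs["input_ids"], labels=labels)  # retrieval happens inside the forward pass
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```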
Thanks :). So, finally, what is the best way to arrange a customized set of documents?
You'll find all the info at https://github.com/huggingface/transformers/tree/master/examples/rag#finetuning :)
Amazing. Thanks a lot
I had a look at the fine-tuning script. It shows how to train on custom datasets. What I don't understand is how I should use my own set of documents rather than the Wikipedia dump.
Oh I see. In that case you have to build the RAG knowledge source. We haven't released a code example for that yet, but we're discussing it in #7462
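For anyone looking for a starting point before an official example lands, a rough sketch of building such a knowledge source with the `datasets` library and the DPR context encoder might look like the following (the column names, example passages, and batch size are illustrative assumptions; `add_faiss_index` requires faiss to be installed):

```python
import torch
from datasets import Dataset
from transformers import DPRContextEncoder, DPRContextEncoderTokenizerFast

ctx_tokenizer = DPRContextEncoderTokenizerFast.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base").eval()

# Your own passages, arranged as "title" and "text" columns.
dataset = Dataset.from_dict({
    "title": ["My doc 1", "My doc 2"],
    "text": ["First passage about my domain.", "Second passage about my domain."],
})

def embed(batch):
    # Encode each (title, text) pair with the DPR context (doc) encoder.
    inputs = ctx_tokenizer(batch["title"], batch["text"], truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        embeddings = ctx_encoder(**inputs).pooler_output
    return {"embeddings": embeddings.numpy()}

dataset = dataset.map(embed, batched=True, batch_size=16)
dataset.add_faiss_index(column="embeddings")  # dense index used for retrieval
```

Such an indexed dataset could then, in principle, stand in for the wiki_dpr index used by the retriever; see #7462 for the ongoing discussion.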
Ok will follow it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.