Transformers: How can we replace the Wikipedia data with our own custom data for knowledge retrieval in the RAG model, and what format should the retrieval data have?

Created on 12 Oct 2020 · 4 comments · Source: huggingface/transformers

โ“ Questions & Help

Labels: wontfix

All 4 comments

I am also experimenting with the RAG model and trying to understand how to replace the Wikipedia data with my own custom data. From transformers/model-cards/facebook/rag-sequence-nq I see that the training dataset is wiki_dpr, so I loaded it with the following:
from datasets import load_dataset
dataset = load_dataset("wiki_dpr")
The dataset is loaded with Arrow into RAM (it's pretty big, 75 GB). I was wondering: must the custom dataset have the same format as wiki_dpr? A tutorial on how to replace the Wikipedia dataset with a custom one would be very helpful. Thank you :D
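As a rough illustration of the "same format" question: the DPR Wikipedia passages are articles split into 100-word chunks, and, as far as I can tell, wiki_dpr stores them with `id`, `text`, `title` and `embeddings` columns. The stdlib-only sketch below shows one way to turn your own (title, text) documents into records with the same text columns. The helper names `split_into_passages` and `build_passage_records` are hypothetical, and the `embeddings` column is deliberately left out: in a real setup it would come from encoding each passage with a DPR context encoder.

```python
from typing import Dict, Iterable, List, Tuple


def split_into_passages(text: str, words_per_passage: int = 100) -> List[str]:
    """Split a document into consecutive chunks of at most `words_per_passage` words."""
    words = text.split()
    return [
        " ".join(words[i : i + words_per_passage])
        for i in range(0, len(words), words_per_passage)
    ]


def build_passage_records(
    documents: Iterable[Tuple[str, str]], words_per_passage: int = 100
) -> List[Dict[str, str]]:
    """Turn (title, text) pairs into wiki_dpr-style records with id/text/title fields.

    Hypothetical helper: the `embeddings` field is not computed here.
    """
    records: List[Dict[str, str]] = []
    for title, text in documents:
        for passage in split_into_passages(text, words_per_passage):
            # Ids are stored as strings and numbered globally across documents.
            records.append({"id": str(len(records)), "text": passage, "title": title})
    return records


if __name__ == "__main__":
    docs = [
        ("Doc A", " ".join(f"a{i}" for i in range(250))),
        ("Doc B", "a short document"),
    ]
    for record in build_passage_records(docs):
        print(record["id"], record["title"], len(record["text"].split()))
```

With the toy documents above, "Doc A" (250 words) becomes three passages of 100, 100 and 50 words, and "Doc B" becomes a single short passage.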

I think @lhoestq is working on this right now?

Yes indeed. I'll create the PR later today to allow users to use their own data. I'll also add code examples.
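Whatever the passages are, the retrieval step in RAG scores them by the inner product between a question embedding and each passage's precomputed embedding (DPR-style maximum inner product search, normally backed by a FAISS index). The brute-force sketch below only illustrates that lookup over a few records; the function `top_k_passages` and the 3-dimensional toy vectors are hypothetical, and a real setup would use DPR encoders and a FAISS index instead.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k_passages(
    question_emb: Sequence[float], records: List[Dict], k: int = 2
) -> List[Dict]:
    """Return the k records whose `embeddings` have the largest inner product with the question.

    Hypothetical brute-force stand-in for a FAISS inner-product index.
    """
    ranked = sorted(records, key=lambda r: dot(question_emb, r["embeddings"]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy records in the id/text/title/embeddings layout, with made-up vectors.
    records = [
        {"id": "0", "title": "Doc A", "text": "passage A", "embeddings": [1.0, 0.0, 0.0]},
        {"id": "1", "title": "Doc B", "text": "passage B", "embeddings": [0.0, 1.0, 0.0]},
        {"id": "2", "title": "Doc C", "text": "passage C", "embeddings": [0.7, 0.7, 0.0]},
    ]
    question = [0.9, 0.1, 0.0]
    for record in top_k_passages(question, records, k=2):
        print(record["id"], record["title"])
```

Here the question vector scores 0.9 against Doc A, about 0.7 against Doc C and 0.1 against Doc B, so the two retrieved passages are Doc A and Doc C.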

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.