Transformers: RAG - how to precompute custom document index?

Created on 29 Sep 2020 · 13 Comments · Source: huggingface/transformers

I was wondering whether there is a code snippet or blog post showing how to load your own documents and index them so that they can be used by the RAG retriever.

Cheers!

All 13 comments

Second this.

https://github.com/deepset-ai/haystack may be useful to you. They leverage huggingface and have a DPR implementation with an end-to-end example. I wouldn't be surprised to see RAG implemented there soon.

@Weilin37 Thanks. I'm also looking at the Faiss docs now (https://github.com/facebookresearch/faiss/wiki/Faiss-indexes).

@lhoestq can maybe help here as well

Yep I'm thinking of adding a script in examples/rag that shows how to create an indexed dataset for RAG.
I'll let you know how it goes
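
Until such a script exists, the shape of an "indexed dataset" can be illustrated with a toy example. The sketch below is not the script mentioned above and does not use the RAG or Faiss APIs: `build_vocab`, `toy_embed`, `build_index`, and `retrieve` are hypothetical helpers, and the bag-of-words embedding is only a stand-in for a real DPR context encoder. It assumes each passage is stored as a row with `title`, `text`, and `embeddings` fields, and that retrieval is a maximum inner product search over the precomputed embeddings.

```python
import math


def build_vocab(documents):
    """Map every token seen in the corpus to a vector position (toy assumption)."""
    vocab = {}
    for title, text in documents:
        for token in (title + " " + text).lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def toy_embed(text, vocab):
    """Hypothetical stand-in for a DPR encoder: L2-normalized token counts."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(documents, vocab):
    """Precompute one embedding per passage and keep it next to title and text."""
    return [
        {"title": title, "text": text, "embeddings": toy_embed(title + " " + text, vocab)}
        for title, text in documents
    ]


def retrieve(index, question, vocab, k=2):
    """Exact maximum inner product search; Faiss would replace this loop at scale."""
    q = toy_embed(question, vocab)
    scored = [(sum(a * b for a, b in zip(q, row["embeddings"])), row) for row in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


documents = [
    ("Faiss", "faiss is a library for efficient similarity search over dense vectors"),
    ("RAG", "retrieval augmented generation combines a retriever with a generator"),
    ("Haystack", "haystack is a framework for building question answering pipelines"),
]
vocab = build_vocab(documents)
index = build_index(documents, vocab)
top = retrieve(index, "library for similarity search", vocab, k=1)
print(top[0][1]["title"])
```

In a real setup the embeddings would come from a trained encoder, and the list scan would be replaced by a Faiss index built over the same vectors.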

@lhoestq Could you please let me know how we can index custom datasets? I'd appreciate your help on this.

@lhoestq I have a bunch of documents I want to run Q&A over, and the config currently says:
dataset (str, optional, defaults to "wiki_dpr") – A dataset identifier of the indexed dataset on HuggingFace AWS bucket (list all available datasets and ids using datasets.list_datasets()).
So how can we create an indexed file and pass it to the pretrained model for evaluation?

Yes, right... We'll have to edit the RagRetriever and the HfIndex to accept custom ones.
If you want to give it a try in the meantime, feel free to do so :)

Any progress on this @lhoestq @patrickvonplaten ? Awesome work guys :)

@tholor @Timoeller Do you reckon you guys could integrate this work into haystack?

@aced125 Yep, we will integrate RAG in Haystack soon (https://github.com/deepset-ai/haystack/issues/443).

Any progress on this @lhoestq @patrickvonplaten ? Awesome work guys :)

You can expect a PR by tomorrow

Awesome thanks everyone @tholor @lhoestq @patrickvonplaten !!!!

Thank you @lhoestq. I really appreciate you getting back so quickly on this issue.
