transformers version: 3.3.0
@sshleifer
RAG model is not on the list, but this is summarization related
-->
Model I am using: RAG
The problem arises when using:
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)
The task I am working on is:
The model couldn't load, so no task was performed.
## To reproduce
Steps to reproduce the behavior:
1. Run the code:
```python
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration
import torch
tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True)
# initialize with RagRetriever to do everything in one forward call
model = RagSequenceForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)
```
This raises a NameError: `load_dataset` is not defined.
NameError Traceback (most recent call last)
<ipython-input-6-752205d4a1c8> in <module>
3
4 tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
----> 5 retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True)
6 # initialize with RagRetriever to do everything in one forward call
7 model = RagSequenceForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)
/mnt/disks/nlp/env_nlp_main/lib/python3.7/site-packages/transformers/retrieval_rag.py in from_pretrained(cls, retriever_name_or_path, **kwargs)
307 generator_tokenizer = rag_tokenizer.generator
308 return cls(
--> 309 config, question_encoder_tokenizer=question_encoder_tokenizer, generator_tokenizer=generator_tokenizer
310 )
311
/mnt/disks/nlp/env_nlp_main/lib/python3.7/site-packages/transformers/retrieval_rag.py in __init__(self, config, question_encoder_tokenizer, generator_tokenizer)
287 config.retrieval_vector_size,
288 config.index_path,
--> 289 config.use_dummy_dataset,
290 )
291 )
/mnt/disks/nlp/env_nlp_main/lib/python3.7/site-packages/transformers/retrieval_rag.py in __init__(self, dataset_name, dataset_split, index_name, vector_size, index_path, use_dummy_dataset)
218
219 logger.info("Loading passages from {}".format(self.dataset_name))
--> 220 self.dataset = load_dataset(
221 self.dataset_name, with_index=False, split=self.dataset_split, dummy=self.use_dummy_dataset
222 )
NameError: name 'load_dataset' is not defined
Try with `pip install transformers datasets faiss-cpu psutil` (or see the requirements.txt file).
Had the same issue and it fixed it for me.
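To check whether the packages from that command are importable before loading the retriever, something like the sketch below may help. It assumes the NameError comes from `datasets` not being installed, so that `load_dataset` never gets imported inside `retrieval_rag.py`. The helper name `missing_modules` and the module list are illustrative and not part of transformers; `faiss` is the import name of the `faiss-cpu` package.

```python
import importlib.util

# Assumption: import names for the packages in the pip command above
# (the faiss-cpu distribution is imported as `faiss`).
RAG_OPTIONAL_MODULES = ("datasets", "faiss", "psutil")


def missing_modules(names=RAG_OPTIONAL_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        # These are import names; install the matching pip packages.
        print("Missing optional dependencies:", ", ".join(missing))
    else:
        print("datasets, faiss and psutil were all found.")
```

If the script reports `datasets` as missing, installing it and restarting the notebook kernel is likely the first thing to try.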