Transformers: [RAG] RagSequenceForGeneration should not load "facebook/rag-token-nq" and RagTokenForGeneration also should not load "facebook/rag-sequence-nq"

Created on 15 Oct 2020 · 7 comments · Source: huggingface/transformers

Environment info

  • transformers version: 3.3.1
  • Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.6.0+cu101 (False)
  • Tensorflow version (GPU?): 2.3.0 (False)
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help


@patrickvonplaten

Information

Model I am using (Bert, XLNet ...): RAG

The problem arises when using:

  • [X] the official example scripts: (give details below)

The task I am working on is:

  • [X] an official GLUE/SQuAD task: (give the name)

To reproduce

The following usage of the token and sequence models should not be allowed, as it may give unintended results in the forward pass:

# RagSequenceForGeneration with "facebook/rag-token-nq"
model = RagSequenceForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)

# RagTokenForGeneration with "facebook/rag-sequence-nq"
model = RagTokenForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

Also, please correct the example at https://huggingface.co/transformers/master/model_doc/rag.html#ragsequenceforgeneration

Expected behavior


The above usage should throw an exception because the two models are incompatible with each other.

All 7 comments

The model weights are actually 1:1 compatible with each other, so I see no reason why we should throw an exception here.
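For intuition, here is a toy sketch (hypothetical classes, not the real transformers API): checkpoint loading only matches parameter names and shapes, so two classes with an identical parameter layout can load the same checkpoint even though they implement different generation behaviour.

```python
# Toy sketch (hypothetical classes, NOT the real transformers API): loading
# only matches parameter names/shapes, so two classes with the same layout
# can share one checkpoint while generating differently.

class TokenStyleModel:
    def __init__(self, state_dict):
        self.state_dict = dict(state_dict)   # same parameter names/shapes

    def generate(self):
        return "token-level decoding"


class SequenceStyleModel:
    def __init__(self, state_dict):
        self.state_dict = dict(state_dict)   # identical layout -> loads fine

    def generate(self):
        return "sequence-level decoding"


checkpoint = {"encoder.weight": [1.0, 2.0], "decoder.weight": [3.0]}

a = TokenStyleModel(checkpoint)
b = SequenceStyleModel(checkpoint)

assert a.state_dict == b.state_dict          # same weights loaded...
print(a.generate(), "|", b.generate())       # ...different generate() behaviour
```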

Hi Patrick, I also believe there are typos in the examples:

On the "sequence"-based model card, https://huggingface.co/facebook/rag-sequence-nq , the examples use "token" arguments, e.g.

retriever = RagRetriever.from_pretrained("facebook/rag-token-nq", index_name="exact", use_dummy_dataset=True) 
model = RagSequenceForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever) 

@patrickvonplaten yes, I agree with you. I am closing this.

@patrickvonplaten

I am seeing very weird behaviour: various RAG generator and checkpoint combinations give me very different outputs, and I am not able to understand why.

Check the output of the generators for "What is capital of Germany?":

!pip install git+https://github.com/huggingface/transformers.git
!pip install datasets
!pip install faiss-cpu
!pip install torch torchvision

from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration, RagSequenceForGeneration
import torch
import faiss


tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True)


input_dict = tokenizer.prepare_seq2seq_batch("What is capital of Germany?", return_tensors="pt")
input_ids = input_dict["input_ids"]

# RagTokenForGeneration with "facebook/rag-token-nq"
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)
generated_ids = model.generate(input_ids=input_ids)
generated_string = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print("Result of model = ", generated_string)

# RagSequenceForGeneration with "facebook/rag-sequence-nq"
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)
generated_ids = model.generate(input_ids=input_ids)
generated_string = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print("Result of model = ", generated_string)

# RagSequenceForGeneration with "facebook/rag-token-nq"
model = RagSequenceForGeneration.from_pretrained("facebook/rag-token-nq", retriever=retriever)
generated_ids = model.generate(input_ids=input_ids)
generated_string = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print("Result of model = ", generated_string)

# RagTokenForGeneration with "facebook/rag-sequence-nq"
model = RagTokenForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)
generated_ids = model.generate(input_ids=input_ids)
generated_string = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print("Result of model = ", generated_string)

The output of the above run is (the behaviour is consistent):

Result of model =  [' german capital']
Result of model =  ['']
Result of model =  [' munich']
Result of model =  [' germany']

Hi Patrick, I also believe there are typos regarding the examples [quoted above]

Should be fixed - thanks :-)
https://github.com/huggingface/transformers/blob/master/model_cards/facebook/rag-sequence-nq/README.md

Hey @lalitpagaria, the models differ in how they generate answers, so the results are not unexpected :-) If you take a closer look at the code, you can see that both models expect exactly the same weights but have different generate() functions.
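The difference can be seen in a toy calculation of the two marginalization schemes from the RAG paper: RAG-Token marginalizes over the retrieved documents at every token step, while RAG-Sequence scores the whole sequence under each document and marginalizes once. The numbers below are made up purely for illustration:

```python
# Toy numbers (made up) illustrating the two marginalization schemes for a
# 2-token answer over 2 retrieved documents.

p_doc = [0.6, 0.4]        # retrieval probabilities p(z|x)
p_tok = [                 # per-document token probabilities p(y_t | x, z, y_<t)
    [0.9, 0.8],           # document 0
    [0.2, 0.1],           # document 1
]

# RAG-Token: marginalize over documents at EACH token step, then take the product.
p_token_model = 1.0
for t in range(2):
    p_token_model *= sum(p_doc[z] * p_tok[z][t] for z in range(2))

# RAG-Sequence: score the WHOLE sequence under each document, marginalize once.
p_seq_model = sum(p_doc[z] * p_tok[z][0] * p_tok[z][1] for z in range(2))

print(p_token_model)   # ~0.3224
print(p_seq_model)     # ~0.44
```

The same checkpoint therefore yields different answers depending on which class decodes it.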

Thanks @patrickvonplaten
I will play with a few parameters of RagConfig.
