Hi, I'm trying to use 'fmikaelian/flaubert-base-uncased-squad' for question answering. I understand that I should load the model and the tokenizer, but I'm not sure how to do this.
My code so far is basically:
```python
from transformers import pipeline

nlp = pipeline('question-answering',
               model='fmikaelian/flaubert-base-uncased-squad',
               tokenizer='fmikaelian/flaubert-base-uncased-squad')
```
Most probably this can be solved with a two-liner.
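Something like this sketch, perhaps, assuming the checkpoint can also be loaded explicitly through the Auto classes:

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Assumption: the community checkpoint resolves through the Auto classes
model = AutoModelForQuestionAnswering.from_pretrained('fmikaelian/flaubert-base-uncased-squad')
tokenizer = AutoTokenizer.from_pretrained('fmikaelian/flaubert-base-uncased-squad')
```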
Many thanks
A link to the original question on Stack Overflow:
https://stackoverflow.com/questions/60287465/pipeline-loading-models-and-tokenizers
Also cc'ing @fmikaelian on this for information :)
Apologies for the careless mistake, @fmikaelian.
Hi, other than the careless mistake, I'm trying to understand why I cannot load any model from the transformers S3 repo. I have tried:
1) `from transformers import FlaubertModel, FlaubertTokenizer`
2) `from transformers import CamembertTokenizer`
3) `from transformers import CamembertModel`
4) `from transformers import BertModel` followed by `model = BertModel.from_pretrained('bert-base-uncased')`
Only the fourth option triggered the download process. All the other options return:

```
ImportError: cannot import name 'CamembertModel'
```
I was wondering if there is an issue since I'm using conda on a Windows PC.
Many thanks for your help.
I tried to update transformers with conda, but that did not work. I also tried pip install, but I'm also getting errors:
File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\...\lib\site-packages\transformers\configuration_utils.py", line 145, in from_pretrained
raise EnvironmentError(msg)
OSError: Model name 'flaubert-base-uncased-squad' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed 'flaubert-base-uncased-squad' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
As pointed out in my Stack Overflow answer, I suspect a versioning conflict. I successfully managed to load the pipeline in 2.5.0, but had errors in 2.4.1 (not quite the same as @rcontesti, but similar enough for me to assume problems with an older version).
Do you have torch installed in your environment? That might explain why you can't import CamembertModel.
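A quick way to check both, as a minimal sketch:

```python
import torch
import transformers

# CamembertModel is a PyTorch class, so it is only importable when torch is
# installed, and older transformers releases did not export it at all; if the
# version is old, `pip install --upgrade transformers` should bring it in.
print(torch.__version__)
print(transformers.__version__)
```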
The error

```
OSError: Model name 'flaubert-base-uncased-squad' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed 'flaubert-base-uncased-squad' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
```

means you're trying to load a FlauBERT checkpoint into BERT. Could you share the code that raised the last error, so that we can try to reproduce it?
Guys, thanks so much for your answers. I was able to solve the version problem, but now I'm running into a different problem (should I open a new thread?):
I'm currently using:

```python
model_ = transformers.FlaubertForQuestionAnswering
tokenizer_ = transformers.FlaubertTokenizer
```
But when I place them into the pipeline:

```python
nlp = pipeline('question-answering',
               model=model_,
               tokenizer=tokenizer_)
```
I'm getting the following error:
```
Traceback (most recent call last):
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\transformers\data\processors\squad.py", line 105, in squad_convert_example_to_features
    sub_tokens = tokenizer.tokenize(token)
TypeError: tokenize() missing 1 required positional argument: 'text'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "question_extraction.py", line 61, in <module>
    answer, score=question_extraction(text, question_, model_, tokenizer_, language_, verbose= True)
  File "question_extraction.py", line 44, in question_extraction
    output=nlp({'question':question, 'context': text})
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\transformers\pipelines.py", line 802, in __call__
    for example in examples
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\socgen_nlp\lib\site-packages\transformers\pipelines.py", line 802, in <listcomp>
    for example in examples
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\transformers\data\processors\squad.py", line 316, in squad_convert_examples_to_features
    desc="convert squad examples to features",
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\tqdm\std.py", line 1097, in __iter__
    for obj in iterable:
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\multiprocessing\pool.py", line 320, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\multiprocessing\pool.py", line 735, in next
    raise value
TypeError: tokenize() missing 1 required positional argument: 'text'
convert squad examples to features: 0%|
```
You need to initialize your model and tokenizer with a checkpoint. For example, instead of

```python
model_ = transformers.FlaubertForQuestionAnswering
tokenizer_ = transformers.FlaubertTokenizer
```

you would specify a FlauBERT checkpoint:

```python
model_ = transformers.FlaubertModel.from_pretrained("fmikaelian/flaubert-base-uncased-squad")
tokenizer_ = transformers.FlaubertTokenizer.from_pretrained("fmikaelian/flaubert-base-uncased-squad")
```
I chose a community checkpoint that was trained using question answering. You can check all available FlauBERT models here.
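Putting this together, the pipeline then receives instances rather than bare classes. A minimal sketch (although, as it turns out further down, this particular checkpoint still won't work with the QA pipeline):

```python
from transformers import pipeline, FlaubertModel, FlaubertTokenizer

# Load instances from the checkpoint instead of passing the classes themselves
model_ = FlaubertModel.from_pretrained("fmikaelian/flaubert-base-uncased-squad")
tokenizer_ = FlaubertTokenizer.from_pretrained("fmikaelian/flaubert-base-uncased-squad")

nlp = pipeline('question-answering', model=model_, tokenizer=tokenizer_)
```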
Once again many thanks @LysandreJik for the help. I proceeded as suggested, and now when I try to put both the tokenizer and the model into the pipeline I run into the following error:
```
Traceback (most recent call last):
  File "question_extraction.py", line 72, in <module>
    answer, score=question_extraction(text, question_, model_, tokenizer_, language_, verbose= True)
  File "question_extraction.py", line 55, in question_extraction
    output=nlp({'question':question, 'context': text})
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\transformers\pipelines.py", line 818, in __call__
    start, end = self.model(**fw_args)
ValueError: not enough values to unpack (expected 2, got 1)
```
It seems like the start and end values the model returns are not a two-element tuple, or something like that.
I updated the code so that it loads a previously saved model:
```python
# MODELS holds the path to the previously saved checkpoint
tokenizer_ = FlaubertTokenizer.from_pretrained(MODELS)
model_ = FlaubertModel.from_pretrained(MODELS)

def question_extraction(text, question, model, tokenizer, language="French", verbose=False):
    if language == "French":
        nlp = pipeline('question-answering',
                       model=model,
                       tokenizer=tokenizer)
    else:
        nlp = pipeline('question-answering')
    output = nlp({'question': question, 'context': text})
    # the pipeline returns a dict, so read the fields by key
    answer, score = output['answer'], output['score']
    if verbose:
        print("Q:", question, "\n",
              "A:", answer, "\n",
              "Confidence (%):", "{0:.2f}".format(score * 100))
    return answer, score

if __name__ == "__main__":
    question_ = "Quel est le montant de la garantie?"
    language_ = "French"
    text = "le montant de la garantie est € 1000"
    answer, score = question_extraction(text, question_, model_, tokenizer_, language_, verbose=True)
```
But now I'm getting an unpacking error:
```
C:\...\NLP\src>python question_extraction.py
OK
OK
convert squad examples to features: 100%|████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.66it/s]
add example index and unique id: 100%|███████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "question_extraction.py", line 77, in <module>
    answer, score=question_extraction(text, question_, model_, tokenizer_, language_, verbose= True)
  File "question_extraction.py", line 60, in question_extraction
    output=nlp({'question':question, 'context': text})
  File "C:\...\transformers\pipelines.py", line 818, in __call__
    start, end = self.model(**fw_args)
ValueError: not enough values to unpack (expected 2, got 1)
```
Hi @rcontesti, I've investigated further and found a few issues. First of all, the checkpoint you're trying to load is fmikaelian/flaubert-base-uncased-squad, which unfortunately cannot be used by pipelines.
This is because this model was fine-tuned with FlaubertForQuestionAnswering instead of FlaubertForQuestionAnsweringSimple, and only the latter can be used by pipelines. Since it was fine-tuned leveraging a different architecture for the QA head, it, unfortunately, won't be usable by pipelines. The usage example on the models page is misleading because of that (cc @fmikaelian).
Unfortunately, there is no French model that can be used with the pipelines, so you would need to do a custom inference leveraging the model. We don't have any examples showcasing how to leverage XLNet/XLM/FlaubertForQuestionAnswering, but it is on our roadmap.
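To give a rough idea of what such custom inference could look like, here is a sketch using the simple (BERT-style) QA head; the checkpoint name is hypothetical, since no FlauBERT model fine-tuned with this head exists yet:

```python
import torch
from transformers import FlaubertForQuestionAnsweringSimple, FlaubertTokenizer

# "some-flaubert-qa-checkpoint" is a placeholder, not a real model
model = FlaubertForQuestionAnsweringSimple.from_pretrained("some-flaubert-qa-checkpoint")
tokenizer = FlaubertTokenizer.from_pretrained("some-flaubert-qa-checkpoint")

question = "Quel est le montant de la garantie?"
context = "Le montant de la garantie est € 1000."

inputs = tokenizer.encode_plus(question, context, return_tensors="pt")
start_logits, end_logits = model(**inputs)  # the simple head returns start/end logits
start = int(torch.argmax(start_logits))
end = int(torch.argmax(end_logits)) + 1
answer = tokenizer.decode(inputs["input_ids"][0][start:end].tolist())
print(answer)
```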
@LysandreJik many thanks for your answer. It was very clarifying.
Some follow-up questions on my side:
Many thanks!!!
- You should be able to use FlaubertForQuestionAnsweringSimple with pipelines; the issue is that there is currently no model fine-tuned on QA for this model.
- You should be able to use the CamembertForQuestionAnswering model with pipelines I believe, but unfortunately there is no model fine-tuned on QA for this model either.
- Examples showcasing XLNet/XLM/FlaubertForQuestionAnswering and their differing architecture are planned as well.

@rcontesti @LysandreJik
I will fine-tune FlaubertForQuestionAnsweringSimple and CamembertForQuestionAnswering on French QA in the next few days and will let you know if we can use the pipeline with those.
@fmikaelian, @LysandreJik
Many thanks for the help. Eventually I could train it myself; I haven't used PyTorch in a year, but if you could point me to a good dataset, I could do the training. Many thanks!
@rcontesti @LysandreJik
I fine-tuned FlaubertForQuestionAnsweringSimple on FQuAD by editing run_squad.py using the same approach as #2746, but still got `ValueError: not enough values to unpack (expected 2, got 1)` when using the model with a pipeline.
I also fine-tuned CamembertForQuestionAnswering on FQuAD and French-SQuAD, and pipelines are working :-]
```python
from transformers import pipeline

nlp = pipeline('question-answering', model='fmikaelian/camembert-base-squad', tokenizer='fmikaelian/camembert-base-squad')

nlp({
    'question': "Qui est Claude Monet?",
    'context': "Claude Monet, né le 14 novembre 1840 à Paris et mort le 5 décembre 1926 à Giverny, est un peintre français et l'un des fondateurs de l'impressionnisme."
})
```

```
{'answer': 'un peintre français',
 'end': 106,
 'score': 0.498404793881182,
 'start': 87}
```
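One small usage note, assuming the same nlp object as above: the pipeline returns a plain dict, so the fields are read by key:

```python
result = nlp({'question': "Qui est Claude Monet?",
              'context': "Claude Monet était un peintre français."})
print(result['answer'], "{0:.2f}%".format(result['score'] * 100))
```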
Model links:
Will open a PR for the model cards (#3089).
@fmikaelian That's really cool, thanks for taking the time to fine-tune those models! I'll look into the error with the pipeline ASAP, I'm pretty sure I know where it comes from.
Really cool to have the first community model for question answering in French!
Hi @fmikaelian
I just installed transformers from source, and it seems the model is still not there:

```
Model name 'fmikaelian/camembert-base-squad' was not found in model name list
```
I also tried to download it from S3, but it does not seem to be there either:

```
OSError: Model name '../models/fmikaelian/camembert-base-squad' was not found in model name list. We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/../models/fmikaelian/camembert-base-squad/config.json' was a path, a model identifier, or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
```
Would you mind sharing the S3 paths? I couldn't get them.
The models are on the S3. What command did you use? Why is there "../" in your model name?
The following works:

```python
from transformers import CamembertModel

model = CamembertModel.from_pretrained("fmikaelian/camembert-base-squad")
```
The following also works:

```python
from transformers import pipeline

nlp = pipeline("question-answering", model="fmikaelian/camembert-base-squad", tokenizer="fmikaelian/camembert-base-squad")
```
@LysandreJik, it's working now. Many thanks!