Transformers: Pipeline Loading Models and Tokenizers

Created on 18 Feb 2020 · 20 comments · Source: huggingface/transformers

❓ Questions & Help

Details

Hi, I'm trying to use 'fmikaelian/flaubert-base-uncased-squad' for question answering. I understand that I need to load both the model and the tokenizer, but I'm not sure how to do this.

My code so far is basically:

from transformers import pipeline

nlp = pipeline('question-answering',
               model='fmikaelian/flaubert-base-uncased-squad',
               tokenizer='fmikaelian/flaubert-base-uncased-squad')

Most probably this can be solved with a two-liner.

Many thanks

A link to original question on Stack Overflow:
https://stackoverflow.com/questions/60287465/pipeline-loading-models-and-tokenizers

Pipeline

rcontesti

All 20 comments

Also cc'ing @fmikaelian on this for information :)

julien-c on 20 Feb 2020

Apologies for the careless mistake, @fmikaelian.

rcontesti on 20 Feb 2020

Hi, careless mistake aside, I'm trying to understand why I cannot load any model from the transformers S3 repo. I have tried:

1) from transformers import FlaubertModel, FlaubertTokenizer

2) from transformers import CamembertTokenizer

3) from transformers import CamembertModel

4) from transformers import BertModel
   model = BertModel.from_pretrained('bert-base-uncased')

Only the fourth option triggered the download process. All the other options return:
"ImportError: cannot import name 'CamembertModel'"

I was wondering whether the problem is that I'm using conda on a Windows PC.

Many thanks for your help.

rcontesti on 24 Feb 2020

I tried to update transformers with conda, but that did not work. I also tried installing with pip, but I'm getting errors there too:

File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\...\lib\site-packages\transformers\configuration_utils.py", line 145, in from_pretrained
    raise EnvironmentError(msg)
OSError: Model name 'flaubert-base-uncased-squad' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed 'flaubert-base-uncased-squad' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

rcontesti on 24 Feb 2020

As pointed out in my Stack Overflow answer, I suspect a versioning conflict. I successfully managed to load the pipeline in 2.5.0, but had errors in 2.4.1 (not quite the same as @rcontesti's, but similar enough for me to assume problems with an older version).

dennlinger on 24 Feb 2020

Do you have torch installed in your environment? That might explain why you can't import CamembertModel.

The error

OSError: Model name 'flaubert-base-uncased-squad' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased). We assumed 'flaubert-base-uncased-squad' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

means you're trying to load a FlauBERT checkpoint with a BERT class. Could you share the code that raised the last error so that we may try to reproduce it?

LysandreJik on 24 Feb 2020
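Two checks come up in the replies above: which transformers version is installed (2.4.1 vs 2.5.0 made a difference here) and whether PyTorch is present, since a missing torch may explain why `CamembertModel` can't be imported. A minimal standard-library sketch for checking both without importing either package; the helper names `module_available` and `installed_version` are hypothetical, not part of transformers:

```python
import importlib.util
from importlib import metadata


def module_available(name):
    """True if a top-level module can be found, without actually importing it."""
    return importlib.util.find_spec(name) is not None


def installed_version(dist_name):
    """Installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # 2.4.1 vs 2.5.0 made a difference in this thread
    print("transformers version:", installed_version("transformers"))
    # PyTorch model classes may be unavailable when torch is missing
    print("torch available:", module_available("torch"))
```

Running it prints the installed transformers version (or `None`) and whether `torch` can be found in the active conda environment.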

Guys, thank you so much for your answers. I was able to solve the version problem, but now I'm running into a different problem (should I open a new thread?):

I'm currently using:

import transformers
from transformers import pipeline

model_ = transformers.FlaubertForQuestionAnswering
tokenizer_ = transformers.FlaubertTokenizer

But when I put them into a pipeline:

nlp = pipeline('question-answering',
               model=model_,
               tokenizer=tokenizer_)

I'm getting the following error:

Traceback (most recent call last):
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\transformers\data\processors\squad.py", line 105, in squad_convert_example_to_features
    sub_tokens = tokenizer.tokenize(token)
TypeError: tokenize() missing 1 required positional argument: 'text'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "question_extraction.py", line 61, in <module>
    answer, score=question_extraction(text, question_, model_, tokenizer_, language_, verbose= True)
  File "question_extraction.py", line 44, in question_extraction
    output=nlp({'question':question, 'context': text})
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\transformers\pipelines.py", line 802, in __call__
    for example in examples
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\socgen_nlp\lib\site-packages\transformers\pipelines.py", line 802, in <listcomp>
    for example in examples
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\transformers\data\processors\squad.py", line 316, in squad_convert_examples_to_features
    desc="convert squad examples to features",
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\tqdm\std.py", line 1097, in __iter__
    for obj in iterable:
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\multiprocessing\pool.py", line 320, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\multiprocessing\pool.py", line 735, in next
    raise value
TypeError: tokenize() missing 1 required positional argument: 'text'
convert squad examples to features:   0%|

rcontesti on 24 Feb 2020

You need to initialize your model and tokenizer with a checkpoint, for example instead of

model_=transformers.FlaubertForQuestionAnswering
tokenizer_ = transformers.FlaubertTokenizer

You would specify a flaubert checkpoint:

model_ = transformers.FlaubertModel.from_pretrained("fmikaelian/flaubert-base-uncased-squad")
tokenizer_ = transformers.FlaubertTokenizer.from_pretrained("fmikaelian/flaubert-base-uncased-squad")

I chose a community checkpoint that was fine-tuned for question answering. You can check all available FlauBERT models here.

LysandreJik on 24 Feb 2020
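Why does passing the class produce `tokenize() missing 1 required positional argument: 'text'`? When `tokenize` is called on the class instead of on an instance, the word being tokenized gets bound to `self`, which leaves `text` unfilled. The sketch below reproduces this with a hypothetical `ToyTokenizer`; it mimics only the calling convention, not the real FlauBERT tokenizer.

```python
class ToyTokenizer:
    """Hypothetical stand-in; mimics only the calling convention of a tokenizer."""

    def tokenize(self, text):
        return text.lower().split()

    @classmethod
    def from_pretrained(cls, name):
        # A real from_pretrained would load vocabulary files for `name`;
        # this toy version just returns a fresh instance.
        return cls()


def tokenize_word(tokenizer, token):
    # Same call shape as squad.py in the traceback: tokenizer.tokenize(token)
    return tokenizer.tokenize(token)


def demo():
    # Passing the class: "Bonjour" is bound to `self`, so `text` is missing
    try:
        tokenize_word(ToyTokenizer, "Bonjour")
        class_error = None
    except TypeError as err:
        class_error = str(err)
    # Passing an instance created by from_pretrained works as intended
    tokens = tokenize_word(ToyTokenizer.from_pretrained("toy-checkpoint"), "Bonjour")
    return class_error, tokens


if __name__ == "__main__":
    print(demo())
```

This is why the fix is to call `from_pretrained(...)` and pass the resulting objects to `pipeline`, not the bare classes.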

Once again, many thanks @LysandreJik for the help. I proceeded as suggested, and now when I try to put both the tokenizer and the model into the pipeline I run into the following error:

Traceback (most recent call last):
  File "question_extraction.py", line 72, in <module>
    answer, score=question_extraction(text, question_, model_, tokenizer_, language_, verbose= True)
  File "question_extraction.py", line 55, in question_extraction
    output=nlp({'question':question, 'context': text})
  File "C:\Users\Ruben Contesti\AppData\Local\Continuum\Anaconda3\envs\..\lib\site-packages\transformers\pipelines.py", line 818, in __call__
    start, end = self.model(**fw_args)
ValueError: not enough values to unpack (expected 2, got 1)

It seems like the start and end values I'm getting back aren't a tuple of two, or something like that.

rcontesti on 25 Feb 2020

I updated the code so that it loads a previously saved model:

from transformers import FlaubertModel, FlaubertTokenizer, pipeline

# MODELS: path to the directory holding the previously saved model
tokenizer_ = FlaubertTokenizer.from_pretrained(MODELS)
model_ = FlaubertModel.from_pretrained(MODELS)

def question_extraction(text, question, model, tokenizer, language="French", verbose=False):

    if language == "French":
        nlp = pipeline('question-answering', model=model, tokenizer=tokenizer)
    else:
        nlp = pipeline('question-answering')

    output = nlp({'question': question, 'context': text})

    # The pipeline returns a dict, so use key access rather than attributes
    answer, score = output['answer'], output['score']

    if verbose:
        print("Q:", question, "\n",
              "A:", answer, "\n",
              "Confidence (%):", "{0:.2f}".format(score * 100))

    return answer, score

if __name__ == "__main__":
    question_ = "Quel est le montant de la garantie?"
    language_ = "French"
    text = "le montant de la garantie est € 1000"

    answer, score = question_extraction(text, question_, model_, tokenizer_, language_, verbose=True)

But now I'm getting an unpacking error:

C:\...\NLP\src>python question_extraction.py
OK
OK
convert squad examples to features: 100%|████████████████████████████████████████████████| 1/1 [00:00<00:00,  4.66it/s]
add example index and unique id: 100%|███████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "question_extraction.py", line 77, in <module>
    answer, score=question_extraction(text, question_, model_, tokenizer_, language_, verbose= True)
  File "question_extraction.py", line 60, in question_extraction
    output=nlp({'question':question, 'context': text})
  File "C:\...\transformers\pipelines.py", line 818, in __call__
    start, end = self.model(**fw_args)
ValueError: not enough values to unpack (expected 2, got 1)

rcontesti on 25 Feb 2020
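The failing statement in the traceback is `start, end = self.model(**fw_args)`, so the pipeline needs the model to return exactly two items (start and end scores). A model without a QA head, such as the `FlaubertModel` loaded in the code above, plausibly returns a single-element tuple, which would match "expected 2, got 1". The two toy functions below are hypothetical stand-ins that reproduce the error without loading any weights:

```python
def encoder_only_model(**inputs):
    """Hypothetical model without a QA head: returns a 1-tuple."""
    return ("last_hidden_state",)


def qa_head_model(**inputs):
    """Hypothetical model with a QA head: returns (start_scores, end_scores)."""
    return ([0.1, 2.0, 0.3], [0.0, 0.2, 1.5])


def unpack_like_pipeline(model, **fw_args):
    """Run the pipeline's unpacking step; return the error message, or None on success."""
    try:
        start, end = model(**fw_args)  # same statement as in the traceback
    except ValueError as err:
        return str(err)
    return None
```

`unpack_like_pipeline(encoder_only_model)` returns exactly the message from the traceback, while `qa_head_model` unpacks cleanly.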

Hi @rcontesti, I've investigated further and found a few issues. First of all, the checkpoint you're trying to load is fmikaelian/flaubert-base-uncased-squad, which unfortunately cannot be used by pipelines.

This is because this model was fine-tuned with FlaubertForQuestionAnswering instead of FlaubertForQuestionAnsweringSimple, and only the latter can be used by pipelines. Because the QA head of the former has a different architecture, this checkpoint unfortunately won't work with pipelines. The usage example on the model page is misleading because of that (cc @fmikaelian).

Unfortunately, there is currently no French model that can be used with the pipelines, so you would need to run custom inference with the model yourself. We don't have any examples yet showing how to use XLNet/XLM/FlaubertForQuestionAnswering, but it is on our roadmap.

LysandreJik on 25 Feb 2020

@LysandreJik many thanks for your answer. It was very clarifying.

Some follow-up questions on my side:

1) If I use FlaubertForQuestionAnsweringSimple, can I use pipelines? If so, could you show me how?
2) Is it also the case that I cannot use CamemBERT for QA?
3) I guess that because the architectures differ, there is no quick hack to adapt it to pipelines. Am I getting that right?
4) If I were to do custom inference without pipelines, using only PyTorch, would you mind pointing me to resources for doing so?

Many thanks!!!

rcontesti on 25 Feb 2020

1) You can indeed use FlaubertForQuestionAnsweringSimple with pipelines; the issue is that there is currently no model fine-tuned on QA for this architecture.
2) You could also use the CamembertForQuestionAnswering model with pipelines, I believe, but unfortunately there is no model fine-tuned on QA for this architecture either.
3) Indeed, we should add these down the line, but it is not very high on our priority list right now cc @mfuntowicz
4) Yes, I'm currently working on some examples that should be merged sometime today. I'll look into XLNet/XLM/FlaubertForQuestionAnswering and their differing architecture as well.

LysandreJik on 25 Feb 2020
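For question 4 (custom inference without pipelines), the core step after running a QA model is picking the answer span from per-token start and end scores. A simplified sketch, assuming you already have those two score lists (what the pipeline's `start, end = ...` unpacking expects) plus the matching list of tokens; `best_span` and `span_text` are hypothetical helpers, and the library's own decoding does more, for example mapping tokens back to character offsets in the context:

```python
def best_span(start_scores, end_scores, max_answer_len=30):
    """Return (start_idx, end_idx, score) for the best span.

    The span score is start_scores[i] + end_scores[j]; only spans with
    i <= j and at most max_answer_len tokens are considered.
    """
    best = (0, 0, float("-inf"))
    for i, s in enumerate(start_scores):
        # j ranges over valid end positions for a span starting at i
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best


def span_text(tokens, start_idx, end_idx):
    # end_idx is inclusive; real subword tokens would need proper detokenization
    return " ".join(tokens[start_idx:end_idx + 1])
```

Restricting the search to `start <= end` avoids the inverted spans you can get by taking the two argmaxes independently. With the sentence from the earlier question split into words, scores peaking at "€" (start) and "1000" (end) would yield "€ 1000".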

@rcontesti @LysandreJik

I will fine-tune FlaubertForQuestionAnsweringSimple and CamembertForQuestionAnswering on French QA in the next few days and let you know whether we can use the pipeline with those.

fmikaelian on 26 Feb 2020

@fmikaelian, @LysandreJik

Many thanks for the help. Alternatively, I could train it myself. I haven't used PyTorch in a year, but if you could point me to a good dataset, I could do the training. Many thanks!

rcontesti on 28 Feb 2020

@rcontesti @LysandreJik

I fine-tuned FlaubertForQuestionAnsweringSimple on FQuAD, by editing run_squad.py using the same approach as #2746, but still got ValueError: not enough values to unpack (expected 2, got 1) when using the model with a pipeline.

I also fine-tuned CamembertForQuestionAnswering on FQuAD and French-SQuAD, and pipelines are working :-]

from transformers import pipeline

nlp = pipeline('question-answering', model='fmikaelian/camembert-base-squad', tokenizer='fmikaelian/camembert-base-squad')

nlp({
    'question': "Qui est Claude Monet?",
    'context': "Claude Monet, né le 14 novembre 1840 à Paris et mort le 5 décembre 1926 à Giverny, est un peintre français et l’un des fondateurs de l'impressionnisme."
})

{'answer': 'un peintre français',
 'end': 106,
 'score': 0.498404793881182,
 'start': 87}

Model links:

Will open a PR for the model cards (#3089).

fmikaelian on 2 Mar 2020
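In the CamemBERT result above, `start` and `end` are character offsets into `context` (end exclusive), and the result is a plain dict, so fields are read with `output['answer']` rather than `output.answer`. A quick check using the exact context and output from this comment; `format_answer` is a hypothetical helper:

```python
context = ("Claude Monet, né le 14 novembre 1840 à Paris et mort le 5 décembre 1926 "
           "à Giverny, est un peintre français et l’un des fondateurs de l'impressionnisme.")

# Output dict exactly as printed in the comment above
result = {'answer': 'un peintre français', 'end': 106,
          'score': 0.498404793881182, 'start': 87}


def format_answer(output):
    # Key access: the pipeline result is a dict, not an object with attributes
    return "A: {} | Confidence (%): {:.2f}".format(output['answer'], output['score'] * 100)


# start/end slice the context string; end is exclusive
print(context[result['start']:result['end']])
print(format_answer(result))
```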

@fmikaelian That's really cool, thanks for taking the time to fine-tune those models! I'll look into the error with the pipeline ASAP, I'm pretty sure I know where it comes from.

Really cool to have the first community model for question answering in French!

LysandreJik on 2 Mar 2020

Hi @fmikaelian

I just installed transformers from source, and it seems the model is still not there:

Model name 'fmikaelian/camembert-base-squad' was not found in model name list

Also tried to download from S3 but it also does not seem to be there:

OSError: Model name '../models/fmikaelian/camembert-base-squad' was not found in model name list. We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/../models/fmikaelian/camembert-base-squad/config.json' was a path, a model identifier, or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

Would you mind sharing the S3 paths? I couldn't find them.

rcontesti on 4 Mar 2020

The models are on the S3. What command did you use? Why is there "../" in your model name?

The following works:

from transformers import CamembertModel
model = CamembertModel.from_pretrained("fmikaelian/camembert-base-squad")

The following also works:

from transformers import pipeline
nlp = pipeline("question-answering", model="fmikaelian/camembert-base-squad", tokenizer="fmikaelian/camembert-base-squad")

LysandreJik on 5 Mar 2020
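The "../" in the name explains the earlier error: going by the message, a name that isn't an existing local directory is treated as a model identifier and pasted verbatim into an S3 URL, so '../models/...' becomes a meaningless ".../bert/../models/..." address. A simplified, hypothetical model of that lookup, reconstructed only from the error text (not the library's code); `config_location` and `S3_PREFIX` are illustrative names:

```python
import os

# Prefix copied from the OSError message quoted above
S3_PREFIX = "https://s3.amazonaws.com/models.huggingface.co/bert/"


def config_location(name_or_path):
    """Hypothetical, simplified lookup; not the transformers implementation.

    An existing local directory is used as-is; anything else is appended
    verbatim to the remote prefix, "../" included.
    """
    if os.path.isdir(name_or_path):
        return os.path.join(name_or_path, "config.json")
    return S3_PREFIX + name_or_path + "/config.json"


if __name__ == "__main__":
    print(config_location("../models/fmikaelian/camembert-base-squad"))
    print(config_location("fmikaelian/camembert-base-squad"))
```

So either pass the bare identifier `fmikaelian/camembert-base-squad`, as in the working examples above, or a path to a local directory that actually exists and contains `config.json`.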

@LysandreJik, it's working now. Many thanks!

rcontesti on 5 Mar 2020
