Transformers: ONNX-converted model has a different output shape than the original (finetuned) model

Created on 7 Jun 2020 · 11 comments · Source: huggingface/transformers

🐛 Bug

Information

Model I am using (Bert, XLNet ...): mrm8488/distilroberta-base-finetuned-sentiment from the hub

Language I am using the model on (English, Chinese ...): English

The problem arises when using:

  • [ ] the official example scripts: (give details below)
  • [X] my own modified scripts: (give details below)

I use the 04-onnx-export.ipynb notebook and have only changed the model name and the tokenizer (see the code in the "To reproduce" section below).

The issue appeared with every finetuned model I tried, whether for classification or multiple-choice questions.

The tasks I am working on is:

  • [ ] my own task or dataset: (give details below)
  • [X] an official GLUE/SQUaD task: classification

To reproduce

Steps to reproduce the behavior:

Import AutoTokenizer and AutoModelForSequenceClassification, and change the tokenizer and model names. The section we are interested in:

# ...
!rm -rf onnx/
from transformers.convert_graph_to_onnx import convert

# Handles all the above steps for you
convert(framework="pt", model="mrm8488/distilroberta-base-finetuned-sentiment", output="onnx/bert-base-cased.onnx", opset=11)
# ...

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("mrm8488/distilroberta-base-finetuned-sentiment")
# create_model_for_provider is the helper defined in the notebook (its full definition appears later in this thread)
cpu_model = create_model_for_provider("onnx/bert-base-cased.onnx", "CPUExecutionProvider")

# Inputs are provided through numpy array
model_inputs = tokenizer.encode_plus("My name is Bert", return_tensors="pt")
inputs_onnx = {k: v.cpu().detach().numpy() for k, v in model_inputs.items()}

# Run the model (None = get all the outputs)
sequence, pooled = cpu_model.run(None, inputs_onnx)

# Print information about outputs

print(f"Sequence output: {sequence.shape}, Pooled output: {pooled.shape}")

pytorch_model = AutoModelForSequenceClassification.from_pretrained("mrm8488/distilroberta-base-finetuned-sentiment")
a, = pytorch_model(**model_inputs)  # single output when no labels are passed: the classification logits
print(f"finetune non onnx pytorch model output: {a.shape}")
# ...

Expected behavior

I was expecting the onnx output shape to be the same as the non-converted model's output shape, but that's not the case:

Sequence output: (1, 6, 768), Pooled output: (1, 768)
finetune non onnx pytorch model output: torch.Size([1, 6])

It is as if the last layer of the model, the one related to the classification task, is not kept in the onnx export.
Does that make sense? @mfuntowicz

Environment info

Google Colab with a GPU

All 11 comments

Facing the same problem with a BERT model fine-tuned on sequence classification and would love to get an answer :)

This seems to be related to this issue.

As @hrsmanian points out, it seems that in convert_graph_to_onnx.py the model is currently converted by default to a 'feature-extraction' version, where the classification layer is discarded. Changing the pipeline type (line 108 of the py file) to 'ner' seems to have worked in @hrsmanian's case.
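
For reference: in more recent versions of transformers, convert exposes a pipeline_name argument, so the pipeline type can be chosen without editing the file (older versions hard-code feature-extraction). A hedged sketch, assuming a version that has the argument and an illustrative output path:

from pathlib import Path
from transformers.convert_graph_to_onnx import convert

# pipeline_name selects which pipeline head is exported; older versions of
# convert_graph_to_onnx.py do not have this argument and always use "feature-extraction".
convert(
    framework="pt",
    model="mrm8488/distilroberta-base-finetuned-sentiment",
    output=Path("onnx/distilroberta-sentiment.onnx"),
    opset=11,
    pipeline_name="sentiment-analysis",
)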

In the case of binary classification, I tried changing the pipeline type to 'sentiment-analysis' (my model is a binary BertForSequenceClassification) but got a ValueError (ValueError: not enough values to unpack (expected 2, got 1)) when trying to run the session. I used simpletransformers (which is based on this repo) to do binary classification with BERT, and followed the instructions for conversion and inference from the blog post.

Let me know if you see what the problem is @mfuntowicz :)

Actually, I managed to make it work.

The problem was that the shape of the session.run output changed, so writing
output, pooled = session.run(None, tokens) no longer worked.

When only writing output = session.run(None, tokens), it works and I get the classification scores.


Hope that helps :)
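
For completeness, a minimal sketch of the working call, assuming the session and the numpy-encoded tokens from the comment above (the graph exported with the classification pipeline has a single output, the logits):

import numpy as np

logits, = session.run(None, tokens)                               # shape: (batch_size, num_labels)
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)    # softmax over the label axis
print(probs.argmax(axis=-1))                                      # predicted class per example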

@manueltonneau You're right, we're currently enforcing the feature-extraction pipeline because not all our pipelines are compatible with the ONNX graph representation.

I'll have a look asap to identify which pipelines are compatible and which are not, so that we can add the possibility to export other kinds of pipelines through the script.

Tks @manueltonneau, works for me too! Btw you may prefer output, = ... to avoid the list :-)
@mfuntowicz would it be possible to have a pipeline for the multichoice task (and a related onnx converter too, if this is onnx compatible)? Not sure why it doesn't exist yet btw, as all the models I have used support the task.

It might be possible for pipelines such as token classification and sequence classification to be exportable out of the box. These pipelines generally just add a projection layer on top of the model, followed by an argmax. All of these operators are natively supported by ONNX.

For more complex pipelines such as qa or generation, ONNX might not support all the operators used in the post-processing steps (i.e. _sampling_, _answer span extraction_), which would make it impossible to export the model to ONNX.
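
To illustrate this with the checkpoint from the original report: the sequence-classification model is the encoder plus a small projection head, which is exactly what the feature-extraction export drops. The snippet below is only an illustration of the model structure:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("mrm8488/distilroberta-base-finetuned-sentiment")
# For RoBERTa-style models the classification head is just dense -> dropout -> out_proj
# (a Linear projecting the encoder output to num_labels); these ops are all ONNX-friendly.
print(model.classifier)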

This is very good news!
So in theory a multichoice pipeline should work, as it's just a projection like classification but with a different shape, am I right? Would it be possible for your team to support this task in the pipelines?

I have another question. Looking at the convert function code, the dummy input used to trace the model with TorchScript is:

    tokens = nlp.tokenizer.encode_plus("This is a sample output", return_tensors=framework)

My understanding is that onnx relies on TorchScript tracing, and tracing only sees a fixed input length.
Doc here

@mfuntowicz Does that mean that the onnx model truncates all inputs to fewer than 10 tokens?
@manueltonneau On your model, are the onnx predictions the same as the pytorch ones (for the same input)?
My model is based on the multichoice task and it doesn't work (it compiles but the predictions are wrong). I don't know if it's because of some input truncation or just because of the task.

@pommedeterresautee You're right about how PyTorch & ONNX interact together: ONNX leverages the tracing provided by PyTorch to construct the ONNX IR.

However, on the input side, it should not truncate anything, because convert_graph_to_onnx.py exports the inputs with the sequence axis being dynamic:

# Generate input names & axes
input_vars = list(tokens.keys())
input_dynamic_axes = {k: build_shape_dict(v, True, seq_len) for k, v in tokens.items()}

You can set a breakpoint on this line and see the actual axes being dynamic (input and output).

If you find any inconsistent behaviour, we can dig further to understand why dynamic axes are not correctly exported in your case 👍
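
As a complement to the breakpoint suggestion: one way to confirm the exported axes are dynamic is to load the graph with the onnx package and print the input shapes. This is a sketch assuming the file produced by the notebook above:

import onnx

onnx_model = onnx.load("onnx/bert-base-cased.onnx")
for inp in onnx_model.graph.input:
    # dynamic axes show up as symbolic names (dim_param) instead of fixed integers (dim_value)
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)
# e.g. input_ids ['batch', 'sequence'] -> the sequence axis is not fixed to the dummy input length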

First, I tried a long sequence on the classification task and it works (the results are the same).
Anyway, tks @mfuntowicz for the clear explanation.

Not a big surprise: the converter doesn't work when the task is multichoice and the pipeline used in the converter is "sentiment-analysis" (because a multichoice pipeline doesn't exist).

  • What can I do to get support for a multichoice task pipeline and check if onnx works in this setup?

Code to reproduce

import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice
from transformers.convert_graph_to_onnx import convert
from onnxruntime import InferenceSession, SessionOptions, get_all_providers


def create_model_for_provider(model_path: str, provider: str) -> InferenceSession:
    assert provider in get_all_providers(), f"provider {provider} not found, {get_all_providers()}"

    # A few properties that might have an impact on performance (provided by MS)
    options = SessionOptions()
    options.intra_op_num_threads = 1

    # Load the model as a graph and prepare the CPU backend
    return InferenceSession(model_path, options, providers=[provider])


tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base', use_fast=False)

model = AutoModelForMultipleChoice.from_pretrained(pretrained_model_name_or_path="output/xlm-r")
device = torch.device(device='cuda')
model.to(device)
model.eval()

convert(framework="pt",
        model="output/xlm-r",
        tokenizer='xlm-roberta-base',
        output="output/onnx/xlm-r.onnx",
        opset=11)
model_onnx = create_model_for_provider("output/onnx/xlm-r.onnx", "CUDAExecutionProvider")


inputs = tokenizer.encode_plus("hello les amis, comment allez vous ? Moi pas mal", "je vais très bien")

torch_inputs = {k: torch.tensor([[v, v]], dtype=torch.long).to(device) for k, v in inputs.items()}
output_pytorch = model(**torch_inputs)
inputs_onnx = {k: v.cpu().detach().numpy() for k, v in torch_inputs.items()}

sequence, = model_onnx.run(None, inputs_onnx)

It crashes with:

Traceback (most recent call last):
  File "/home/geantvert/.local/share/virtualenvs/***/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-11-f614fb04d5d2>", line 7, in <module>
    sequence, = model_onnx.run(None, inputs_onnx)
  File "/home/geantvert/.local/share/virtualenvs/***/lib/python3.8/site-packages/onnxruntime/capi/session.py", line 111, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid rank for input: input_ids Got: 3 Expected: 2 Please fix either the inputs or the model.
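
The error is consistent with the pipeline used for the export: the graph was traced through the sentiment-analysis pipeline with rank-2 input_ids of shape (batch, seq_len), while AutoModelForMultipleChoice expects rank-3 inputs of shape (batch, num_choices, seq_len), hence "Got: 3 Expected: 2". While there is no multiple-choice pipeline, one possible workaround is to bypass convert and call torch.onnx.export directly with a rank-3 dummy input. The sketch below is an assumption, not the official converter; the wrapper module, the two-choice dummy and the output path are illustrative:

import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice


class MultipleChoiceWrapper(torch.nn.Module):
    """Thin wrapper so the exported graph takes (input_ids, attention_mask) and returns only the logits."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids, attention_mask):
        return self.model(input_ids=input_ids, attention_mask=attention_mask)[0]


tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base", use_fast=False)
model = AutoModelForMultipleChoice.from_pretrained("output/xlm-r")
model.eval()

# Rank-3 dummy input: (batch, num_choices, seq_len). Two identical choices are enough for tracing.
enc = tokenizer.encode_plus("hello les amis, comment allez vous ?", "je vais très bien", return_tensors="pt")
input_ids = enc["input_ids"].unsqueeze(1).repeat(1, 2, 1)
attention_mask = enc["attention_mask"].unsqueeze(1).repeat(1, 2, 1)

torch.onnx.export(
    MultipleChoiceWrapper(model),
    (input_ids, attention_mask),
    "output/onnx/xlm-r-multichoice.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "choice", 2: "sequence"},
        "attention_mask": {0: "batch", 1: "choice", 2: "sequence"},
        "logits": {0: "batch", 1: "choice"},
    },
    opset_version=11,
)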

@manueltonneau On your model, are the onnx predictions the same as the pytorch ones (for the same input)?

Sorry for the late reply @pommedeterresautee. I did three tests and the predictions are almost exactly the same for all three.

