Describe the bug
I use the onnxruntime/onnxruntime/python/tools/transformers/benchmark_gpt2.py script to export GPT2-XL (1.5B) to ONNX, apply optimizations, and benchmark it:
```
python benchmark_gpt2.py \
  --model_name "gpt2-xl" \
  --cache_dir "./cache_models" \
  --onnx_dir="./gpt2_xl_onnx_past" \
  --test_times 10 \
  --precision fp16 \
  --optimize_onnx \
  --use_gpu \
  --batch_sizes "1" \
  --past_sequence_lengths 1000 \
  --result_csv gpt2_results.csv
```
When I use it for gpt2-large, it works without a problem. When I switch model_name to gpt2-xl, the log shows that optimizations are being applied, but the script fails to save the optimized model to disk:
```
...
Output model to ./gpt2_xl_onnx_past/_past_fp16.onnx
Traceback (most recent call last):
  File "benchmark_gpt2.py", line 258, in <module>
    main()
  File "benchmark_gpt2.py", line 152, in main
    model.config.num_attention_heads, model.config.hidden_size)
  File "/home/jupyter/onnxruntime/onnxruntime/python/tools/transformers/gpt2_helper.py", line 252, in optimize_onnx
    m.save_model_to_file(optimized_model_path)
  File "/home/jupyter/onnxruntime/onnxruntime/python/tools/transformers/onnx_model.py", line 668, in save_model_to_file
    save_model(self.model, output_path, format=None)
  File "/opt/conda/lib/python3.7/site-packages/onnx/__init__.py", line 186, in save_model
    s = _serialize(proto)
  File "/opt/conda/lib/python3.7/site-packages/onnx/__init__.py", line 67, in _serialize
    result = proto.SerializeToString()
ValueError: Message ONNX_REL_1_7.ModelProto exceeds maximum protobuf size of 2GB: 3276141735
```
I also added use_external_data_format=True to the torch.onnx.export() call in gpt2_helper.py, expecting it to help, but it did not. I can't use the other scripts (such as benchmark.py) because I need GPT2 to be exported with past state support.
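For reference, the size reported in the error can be compared against the protobuf ceiling directly. The snippet below is only an illustrative sketch: `PROTOBUF_LIMIT_BYTES` and `exceeds_protobuf_limit` are hypothetical names, the limit is assumed to be 2^31 - 1 bytes, and the parameter count is the rough 1.5B figure quoted above.

```python
# Assumed ceiling for a single serialized protobuf message (2GB).
PROTOBUF_LIMIT_BYTES = 2**31 - 1

# Size taken from the ValueError in the traceback above.
reported_size = 3276141735


def exceeds_protobuf_limit(num_bytes):
    """True if a message of this size cannot be serialized as one protobuf."""
    return num_bytes > PROTOBUF_LIMIT_BYTES


# Rough estimate of the weights alone: ~1.5e9 parameters at 2 bytes each (fp16).
approx_fp16_bytes = int(1.5e9) * 2

print(exceeds_protobuf_limit(reported_size))      # the reported model size
print(exceeds_protobuf_limit(approx_fp16_bytes))  # the weights-only estimate
```

Under these assumptions the fp16 weights alone already exceed the limit, so the tensor data has to live outside the main protobuf file for the save to succeed.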
Urgency
I'm blocked on my current GPT2 deployment project because of this issue. Without ONNX optimizations, the model is approximately 4x more expensive and slower in our production environment.
System information
use_external_data_format flag in torch.onnx.export())
To Reproduce
1. Add use_external_data_format=True to torch.onnx.export() method in gpt2_helper.py
2. Add gpt2-xl to PRETRAINED_MODELS list in benchmark_gpt2.py
3. Run:
```
python benchmark_gpt2.py \
  --model_name "gpt2-xl" \
  --cache_dir "./cache_models" \
  --onnx_dir="./gpt2_xl_onnx_past" \
  --test_times 10 \
  --precision fp16 \
  --optimize_onnx \
  --use_gpu \
  --batch_sizes "1" \
  --past_sequence_lengths 1000 \
  --result_csv gpt2_results.csv
```
Expected behavior
gpt2-xl is benchmarked and exported to ONNX without any error.
According to onnx.save_model, if you pass a path as 'f', it will save external tensors, which avoids the 2GB limit. Could you give onnx_model_path a folder name and see what happens?
@liqunfu unfortunately it didn't seem to help. I edited the onnx_model.py file so that the save_model_to_file function passes a folder name instead of a file path to onnx.save_model:
```
def save_model_to_file(self, output_path):
    output_folder = os.path.dirname(output_path)
    logger.info(f"Output model to {output_path}")
    logger.info(f"Output model to folder {output_folder}")
    if output_path.endswith(".json"):
        assert isinstance(self.model, ModelProto)
        with open(output_path, "w") as out:
            out.write(str(self.model))
    else:
        save_model(self.model, output_folder, format=None)
        #external_data_helper.convert_model_to_external_data(self.model, all_tensors_to_one_file=True, location = output_path + ".data")
        #with open(output_path, "wb") as out:
        #    out.write(self.model.SerializeToString())
```
I also made sure f is the second parameter in save_model:
```
save_model(proto, f, format=None)
```
According to the log, this method is indeed called and `output_folder` is indeed a folder, but it didn't help:
```
Output model to ./gpt2_xl_onnx_past/_past_fp16.onnx
Output model to folder ./gpt2_xl_onnx_past
Traceback (most recent call last):
  File "benchmark_gpt2.py", line 258, in <module>
    main()
  File "benchmark_gpt2.py", line 152, in main
    model.config.num_attention_heads, model.config.hidden_size)
  File "/home/jupyter/onnxruntime/onnxruntime/python/tools/transformers/gpt2_helper.py", line 252, in optimize_onnx
    m.save_model_to_file(optimized_model_path)
  File "/home/jupyter/onnxruntime/onnxruntime/python/tools/transformers/onnx_model.py", line 674, in save_model_to_file
    save_model(self.model, output_folder, format=None)
  File "/opt/conda/lib/python3.7/site-packages/onnx/__init__.py", line 186, in save_model
    s = _serialize(proto)
  File "/opt/conda/lib/python3.7/site-packages/onnx/__init__.py", line 67, in _serialize
    result = proto.SerializeToString()
ValueError: Message ONNX_REL_1_7.ModelProto exceeds maximum protobuf size of 2GB: 3276141735
```
@klimentij, I can reproduce the problem. I will try modifying the save_model_to_file function and let you know when there is progress.
@klimentij, it seems that the following change could help export a large model to ONNX:
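```
def save_model_to_file(self, output_path):
    from pathlib import Path
    # Assumes onnx_model.py already imports external_data_helper and save_model from onnx.
    # Move tensor data into a sibling ".data" file so the main protobuf stays under 2GB.
    external_data_helper.convert_model_to_external_data(
        self.model, all_tensors_to_one_file=True, location=Path(output_path).name + ".data")
    save_model(self.model, output_path)
```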
The output model will contain two files like name.onnx and name.onnx.data. I'll send a pull request later after more testing.
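To confirm that an export along these lines produced both files, a small check like the following may help. It is a sketch only: `check_export` and `PROTOBUF_LIMIT_BYTES` are hypothetical names that do not exist in the onnxruntime tools, and it assumes the `.data` file sits next to the `.onnx` file with the naming described above.

```python
import os

# Assumed ceiling for a single serialized protobuf message (2GB).
PROTOBUF_LIMIT_BYTES = 2**31 - 1


def check_export(onnx_path):
    """Return {path: size_in_bytes} for the model file and its external data file."""
    data_path = onnx_path + ".data"  # naming from the thread: name.onnx.data
    sizes = {}
    for path in (onnx_path, data_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"expected export file is missing: {path}")
        sizes[path] = os.path.getsize(path)
    # With the weights stored externally, the graph file itself should be small.
    if sizes[onnx_path] > PROTOBUF_LIMIT_BYTES:
        raise ValueError(f"{onnx_path} is still above the 2GB protobuf limit")
    return sizes
```

Calling `check_export("./gpt2_xl_onnx_past/_past_fp16.onnx")` after the export would then either return both file sizes or raise with the name of the missing file.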
The model is very large, so the benchmark will hit an out-of-memory exception when both the PyTorch model and the ONNX model are loaded on a V100 GPU (with 16GB memory). After the ONNX model is exported, using only the ONNX model might avoid the problem.
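One way to follow the ONNX-only suggestion is to run inference from a separate process that never loads the PyTorch model. The sketch below is an assumption-laden illustration, not part of benchmark_gpt2.py: `pick_providers` and `create_session` are hypothetical helpers, and passing `providers` to `InferenceSession` assumes an onnxruntime release that accepts that keyword.

```python
def pick_providers(use_gpu):
    """Execution providers in priority order; CPU is kept as a fallback."""
    if use_gpu:
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


def create_session(onnx_path, use_gpu=True):
    # Imported lazily so onnxruntime is only loaded when a session is created.
    import onnxruntime

    # Assumption: this onnxruntime version accepts the `providers` keyword.
    return onnxruntime.InferenceSession(onnx_path, providers=pick_providers(use_gpu))
```

For a model saved with external data, the `.data` file most likely needs to stay in the same directory as the `.onnx` file, since the stored location is relative to the model file.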
Thank you @tianleiwu! I managed to export it to .onnx and .onnx.data files using the edit you suggested.