Describe the bug
Using a simple model with a single convolution operation, I get the same inference results from PyTorch and ONNX Runtime's Python API. However, when I load the same image for inference with your C API, I cannot reproduce those results despite (as far as I can tell) passing the exact same pixel values to the model.
This seems similar to https://github.com/microsoft/onnxruntime/issues/1204, but unfortunately the only solution proposed there was proper normalization. In my case, the normalization is explicitly controlled in both the Python and C APIs and can be shown to produce identical pixel values before they are fed to the model.
Urgency
None
System information
To Reproduce
8 steps total - it just looks long because of the included scripts and sample output!
1) Build from commit a02638e
./build.sh --config Debug --enable_pybind --build_wheel --build_shared_lib --parallel
2) Install the Python wheel
pip install build/Linux/Debug/dist/onnxruntime-1.2.0-cp36-cp36m-linux_x86_64.whl
3) pip install other Python requirements
numpy==1.18.1
opencv-python==4.2.0
onnx==1.6.0
torch==1.4.0
4) Add the two following scripts to a folder
ort-capi.py
import os
import random
import cv2
import numpy as np
import onnx
import onnxruntime
import torch
import torch.onnx
import torch.nn as nn
def load_image(img_path):
    """Load image data to PyTorch tensor
    :param str img_path: Filepath to image data on disk
    :return torch.tensor img: Shape ``(C, H, W)``
    """
    loaded = cv2.imread(img_path)
    img = torch.tensor(loaded).permute(2, 0, 1) / 255.0
    print('Python first 3 pixels')
    for channel in range(3):
        print(f'{loaded.flatten()[channel]} --> {img[channel, 0, 0]:.6f}')
    return img
def seed_everything(seed=1234):
    """Control all random seeds that could potentially be used by PyTorch"""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()
class ReduceDims(nn.Module):
    def __init__(self):
        super(ReduceDims, self).__init__()
        self.conv = nn.Conv2d(3, 3, (32, 64), (32, 64))
    def forward(self, image):
        """Convolve with large filter to shrink spatial dimensions
        :param torch.tensor image: Shape ``(N, C, H, W)`` in BGR order
        :return:
        """
        return self.conv(image)
if __name__ == '__main__':
    # Export ONNX model
    seed_everything()
    model = ReduceDims()
    image = load_image('crop.jpg').unsqueeze(0)
    torch_out = model(image)
    onnx_path = 'simple.onnx'
    torch.onnx.export(model, image, onnx_path, input_names=['images'], output_names=['conv'])
    # Run ORT inference
    onnx_model = onnx.load(onnx_path)
    onnx.checker.check_model(onnx_model)
    ort_session = onnxruntime.InferenceSession(onnx_path)
    ort_inputs = {'images': to_numpy(image)}
    ort_outs = ort_session.run(None, ort_inputs)
    # Compare and print results
    np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03)
    print(f'PyTorch: {torch_out.shape} \n{torch_out}')
    print(f'ORT-Py: {ort_outs[0].shape} \n{ort_outs[0]}')
ort-decode.cpp (adapted with minimal changes from the C API sample)
#include <assert.h>
#include <onnxruntime_c_api.h>
#include <cmath>
#include <stdlib.h>
#include <stdio.h>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc/imgproc.hpp>
const OrtApi* g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);
void CheckStatus(OrtStatus* status) {
    if (status != NULL) {
        const char* msg = g_ort->GetErrorMessage(status);
        fprintf(stderr, "%s\n", msg);
        g_ort->ReleaseStatus(status);
        exit(1);
    }
}
int main(int argc, char* argv[]) {
    // Initialize environment (one per process) to maintain thread pools and other state info
    OrtEnv* env;
    CheckStatus(g_ort->CreateEnv(ORT_LOGGING_LEVEL_WARNING, "test", &env));
    // Initialize session options if needed
    OrtSessionOptions* session_options;
    CheckStatus(g_ort->CreateSessionOptions(&session_options));
    g_ort->SetIntraOpNumThreads(session_options, 1);
    // Sets graph optimization level
    g_ort->SetSessionGraphOptimizationLevel(session_options, ORT_ENABLE_BASIC);
    // Create session and load model into memory
    OrtSession* session;
    const char* model_path = "simple.onnx";
    printf("Using Onnxruntime C API\n");
    CheckStatus(g_ort->CreateSession(env, model_path, session_options, &session));
    // Print model input layer (node names, types, shape etc.)
    size_t num_input_nodes;
    OrtStatus* status;
    OrtAllocator* allocator;
    CheckStatus(g_ort->GetAllocatorWithDefaultOptions(&allocator));
    // Print number of model input nodes
    status = g_ort->SessionGetInputCount(session, &num_input_nodes);
    std::vector<const char*> input_node_names(num_input_nodes);
    std::vector<int64_t> input_node_dims;
    printf("Number of inputs = %zu\n", num_input_nodes);
    // Iterate over all input nodes and print names/types/shapes
    std::vector<char*> input_names;
    for (size_t i = 0; i < num_input_nodes; i++) {
        char* input_name;
        status = g_ort->SessionGetInputName(session, i, allocator, &input_name);
        printf("Input %zu : name=%s\n", i, input_name);
        input_node_names[i] = input_name;
        input_names.push_back(input_name);
        OrtTypeInfo* typeinfo;
        status = g_ort->SessionGetInputTypeInfo(session, i, &typeinfo);
        const OrtTensorTypeAndShapeInfo* tensor_info;
        CheckStatus(g_ort->CastTypeInfoToTensorInfo(typeinfo, &tensor_info));
        ONNXTensorElementDataType type;
        CheckStatus(g_ort->GetTensorElementType(tensor_info, &type));
        printf("Input %zu : type=%d\n", i, type);
        size_t num_dims;
        CheckStatus(g_ort->GetDimensionsCount(tensor_info, &num_dims));
        printf("Input %zu : num_dims=%zu\n", i, num_dims);
        input_node_dims.resize(num_dims);
        g_ort->GetDimensions(tensor_info, (int64_t*)input_node_dims.data(), num_dims);
        for (size_t j = 0; j < num_dims; j++) {
            printf("Input %zu : dim %zu=%jd\n", i, j, input_node_dims[j]);
        }
        g_ort->ReleaseTypeInfo(typeinfo);
    }
    // Load image data to array
    size_t input_tensor_size = 68 * 136 * 3; // eventually use OrtGetTensorShapeElementCount() to get official size!
    float input_tensor_values[input_tensor_size];
    std::vector<const char*> output_node_names = {"conv"};
    cv::Mat image_bgr = cv::imread("crop.jpg", cv::IMREAD_COLOR);
    if (!image_bgr.isContinuous()) {
        image_bgr = image_bgr.clone();
    }
    for (size_t i = 0; i < input_tensor_size; i++) {
        input_tensor_values[i] = image_bgr.data[i] / 255.0;
    }
    // Create input tensor object from data values
    OrtMemoryInfo* memory_info;
    CheckStatus(g_ort->CreateCpuMemoryInfo(OrtArenaAllocator, OrtMemTypeDefault, &memory_info));
    OrtValue* input_tensor = NULL;
    CheckStatus(g_ort->CreateTensorWithDataAsOrtValue(memory_info, input_tensor_values, input_tensor_size * sizeof(float), input_node_dims.data(), 4, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, &input_tensor));
    int is_tensor;
    CheckStatus(g_ort->IsTensor(input_tensor, &is_tensor));
    assert(is_tensor);
    g_ort->ReleaseMemoryInfo(memory_info);
    float* input_ort;
    CheckStatus(g_ort->GetTensorMutableData(input_tensor, (void**)&input_ort));
    printf("ORT first 3 pixels\n");
    for (int i=0; i<3; i++) {
        printf("%f\n", input_ort[i]);
    }
    // Run inference
    std::vector<OrtValue*> outputs(output_node_names.size());
    CheckStatus(g_ort->Run(session, NULL, input_node_names.data(), (const OrtValue* const*)&input_tensor, 1, output_node_names.data(), output_node_names.size(), outputs.data()));
    CheckStatus(g_ort->IsTensor(outputs[0], &is_tensor));
    assert(is_tensor);
    // Get pointer to output tensor float values
    float* conv;
    CheckStatus(g_ort->GetTensorMutableData(outputs[0], (void**)&conv));
    printf("ORT-C outputs\n");
    for (int i=0; i<12; i++) {
        printf("%f\n", conv[i]);
    }
    g_ort->ReleaseValue(outputs[0]);
    g_ort->ReleaseValue(input_tensor);
    g_ort->ReleaseSession(session);
    g_ort->ReleaseSessionOptions(session_options);
    g_ort->ReleaseEnv(env);
    for (int i=0; i<input_names.size(); i++) {
        free(input_names[i]);
    }
    printf("Done!\n");
    return 0;
}
5) Download the sample image (68x136 pixels) from the screenshots section below and save it as crop.jpg in the same folder as the Python and C++ scripts from above
6) Run the Python script: python3 ort-capi.py. This should generate a simple.onnx file in the current directory and output the following results
Python first 3 pixels
43 --> 0.168627
44 --> 0.172549
40 --> 0.156863
Called inference_session.cc:1054
PyTorch: torch.Size([1, 3, 2, 2])
tensor([[[[-0.2275, -0.2554],
[-0.2656, -0.0904]],
[[ 0.2442, 0.2813],
[ 0.1520, 0.1503]],
[[ 0.4388, 0.4080],
[ 0.4567, 0.6156]]]], grad_fn=<MkldnnConvolutionBackward>)
ORT-Py: (1, 3, 2, 2)
[[[[-0.22749133 -0.25542438]
[-0.26560846 -0.09042263]]
[[ 0.24415745 0.2812824 ]
[ 0.15198623 0.15028757]]
[[ 0.43876082 0.40797257]
[ 0.45668074 0.61562276]]]]
7) Compile the C++ program: g++ ort-decode.cpp -g -o ort-decode -I . -lonnxruntime -lopencv_core -lopencv_imgcodecs -lopencv_imgproc (you may need to apt install libopencv-dev for the OpenCV includes to work)
8) Run the C++ executable: ./ort-decode which will print
Using Onnxruntime C API
Number of inputs = 1
Input 0 : name=images
Input 0 : type=1
Input 0 : num_dims=4
Input 0 : dim 0=1
Input 0 : dim 1=3
Input 0 : dim 2=68
Input 0 : dim 3=136
ORT first 3 pixels
0.168627
0.172549
0.156863
Called inference_session.cc:1054
ORT-C outputs
-0.152388
-0.278834
-0.122822
-0.277305
0.098833
-0.003909
-0.030183
0.163025
0.328772
0.277781
0.425203
0.529667
Done!
Expected behavior
The outputs from PyTorch, ORT-Py, and ORT-C should all be identical. As demonstrated by the Python/C++ scripts, the pixel values passed to the model are the same, yet the C API comes back with a different set of 12 values for the (1, 3, 2, 2) shaped output tensor
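To put a number on the mismatch, here is a minimal sketch that compares the two sets of 12 values. The arrays are copied by hand from the ORT-Py and ORT-C printouts in steps 6 and 8; the variable names are mine and not part of either script.

```python
import numpy as np

# Values copied from the ORT-Py output in step 6, flattened in row-major order.
ort_py = np.array([-0.22749133, -0.25542438, -0.26560846, -0.09042263,
                   0.24415745, 0.2812824, 0.15198623, 0.15028757,
                   0.43876082, 0.40797257, 0.45668074, 0.61562276],
                  dtype=np.float32)

# Values copied from the ORT-C output in step 8, in the order they were printed.
ort_c = np.array([-0.152388, -0.278834, -0.122822, -0.277305,
                  0.098833, -0.003909, -0.030183, 0.163025,
                  0.328772, 0.277781, 0.425203, 0.529667],
                 dtype=np.float32)

# Largest element-wise gap, and whether the two agree at the tolerance used in ort-capi.py.
max_abs_diff = float(np.max(np.abs(ort_py - ort_c)))
matches = bool(np.allclose(ort_py, ort_c, rtol=1e-3))
print(f"max abs diff: {max_abs_diff:.6f}, within rtol=1e-3: {matches}")
```

The gap is far larger than float rounding would explain, which points at the input data rather than at the convolution kernel.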
It would be very helpful to have a C API sample script that shows loading of actual image data since the current sample only uses dummy pixels at line 129. If you cannot directly help with loading image data so that the C and Python APIs return the same results, I would really appreciate guidance on how to better debug following my approach in the "Additional context" section
Screenshots
Sample image to be used as crop.jpg

Additional context
My assumption is that this is a user error related to passing the pixel values to the C API, so I have tried unsuccessfully to compile in debug prints to identify the exact values used in the Run(session, ...) call on ort-decode.cpp:113. My thought was to adapt the Run function definition in inference_session.cc:1054 to print the values of the feeds parameter. However, I am not familiar enough with the ORT library to know the appropriate method for this. The closest I got was
const OrtApi* g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);
Status InferenceSession::Run(const RunOptions& run_options, const std::vector<std::string>& feed_names,
                             const std::vector<OrtValue>& feeds, const std::vector<std::string>& output_names,
                             std::vector<OrtValue>* p_fetches) {
    std::cout << "At inference_session.cc:1048" << std::endl;
    float* input_data;
    g_ort->GetTensorMutableData(&feeds[0], (void**)&input_data);
    ... rest of the function unchanged ...
However, this fails to compile because feeds is declared as a constant parameter in the function prototype, so I am stuck on how to access the underlying data of the input OrtValue
onnxruntime/onnxruntime/core/session/inference_session.cc:
In member function ‘onnxruntime::common::Status onnxruntime::InferenceSession::Run(const RunOptions&, const std::vector<std::__cxx11::basic_string<char> >&, const std::vector<OrtValue>&, const std::vector<std::__cxx11::basic_string<char> >&, std::vector<OrtValue>*)’:
onnxruntime/onnxruntime/core/session/inference_session.cc:1050:31:
error: invalid conversion from ‘const value_type* {aka const OrtValue*}’ to ‘OrtValue*’ [-fpermissive]
g_ort->GetTensorMutableData(&feeds[0], (void**)&input_data);
I believe I figured out the proper way to pass pixel data from OpenCV to your C API. To start, I compiled in extra print statements to inference_session.cc at the beginning of the InferenceSession::Run function defined on line 1054
Status InferenceSession::Run(const RunOptions& run_options, const std::vector<std::string>& feed_names,
                             const std::vector<OrtValue>& feeds, const std::vector<std::string>& output_names,
                             std::vector<OrtValue>* p_fetches) {
    std::cout << "Input data from inference_session.cc:1054" << std::endl;
    // From onnxruntime/core/framework/execution_frame.cc:195
    std::vector<std::reference_wrapper<const TensorShape>> input_shapes;
    input_shapes.reserve(feeds.size());
    for (const auto& feed : feeds) {
        auto& tensor = feed.Get<Tensor>();
        std::cout << tensor.Shape() << std::endl;
        input_shapes.push_back(std::cref(tensor.Shape()));
        const float* data = tensor.Data<float>();
        for (uint32_t i = 0; i < 6; i++)
            std::cout << "data " << i << ": " << data[i] * 255 << std::endl;
    }
If you compare the C and Python runs with this added debug info, it is clear they are receiving different pixel values. In particular, when running ort-capi.py, the pixel data comes through in _channel_ blocks: all rows and columns of the blue channel, then green, then red (indices are [row, column] for an H x W image)
blue[0, 0], blue[0, 1], blue[0, 2], ... blue[H-1, W-1],
green[0, 0], green[0, 1], ... green[H-1, W-1],
red[0, 0], ... red[H-1, W-1]
In comparison, the data attribute of a cv::Mat object gives you the pixels in _spatial_ blocks: all channels for each spatial location in sequence
blue[0, 0], green[0, 0], red[0, 0],
blue[0, 1], green[0, 1], red[0, 1],
...
blue[H-1, W-1], green[H-1, W-1], red[H-1, W-1]
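The two orderings above can be seen side by side on a tiny synthetic image. This is only an illustrative sketch in NumPy: the 2x2 size and the value encoding (100 * channel + 10 * row + column) are assumptions chosen so that each number reveals where it came from.

```python
import numpy as np

H, W, C = 2, 2, 3

# Synthetic H x W x C image: value = 100*channel + 10*row + column.
hwc = np.zeros((H, W, C), dtype=np.int64)
for h in range(H):
    for w in range(W):
        for c in range(C):
            hwc[h, w, c] = 100 * c + 10 * h + w

# Spatial blocks: the order in which an interleaved (HWC) buffer stores values.
interleaved = hwc.ravel()

# Channel blocks: the order an NCHW model input expects.
planar = hwc.transpose(2, 0, 1).ravel()

print(interleaved.tolist())
print(planar.tolist())
```

Both arrays hold the same 12 numbers; only their positions in the flat buffer differ, which is enough to change the result of a convolution.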
The solution is to build a flat array of channel blocks in ort-decode.cpp, as shown in this OpenCV forum post
// Create expected ORT format (channel blocks by row): https://answers.opencv.org/question/64837
cv::Mat image_bgr = cv::imread("crop.jpg", cv::IMREAD_COLOR);
std::vector<cv::Mat> channels;
cv::split(image_bgr, channels);
cv::Mat image_by_channel;
for (size_t i=0; i<channels.size(); i++)
    image_by_channel.push_back(channels[i]);
if (!image_by_channel.isContinuous())
    image_by_channel = image_by_channel.clone();
for (size_t i = 0; i < input_tensor_size; i++)
    input_tensor_values[i] = image_by_channel.data[i] / 255.0;
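As a sanity check on the idea behind this fix, the following NumPy sketch mirrors the split-and-stack step and compares it with the permute(2, 0, 1) used in ort-capi.py. It is an analogue, not the OpenCV code itself: the random 4x5 image stands in for cv2.imread output, and np.concatenate plays the role of cv::split followed by push_back.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for an H x W x 3 uint8 image as returned by cv2.imread.
hwc = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

# Analogue of cv::split + push_back: stack the three H x W planes
# vertically into a single (3*H) x W array.
stacked = np.concatenate([hwc[:, :, c] for c in range(hwc.shape[2])], axis=0)

# Analogue of torch.tensor(loaded).permute(2, 0, 1) from ort-capi.py.
chw = hwc.transpose(2, 0, 1)

# Both routes should yield the same flat buffer.
same = bool(np.array_equal(stacked.ravel(), chw.ravel()))
print(stacked.shape, chw.shape, same)
```

If the two flat buffers agree, the stacked planes are laid out exactly like the (C, H, W) tensor the Python script feeds to the model.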
I think it would be very helpful for other users to add the above code into your C API sample script. Perhaps also some tutorials on how to access OrtValue and Tensor objects? It took me quite a while to find the appropriate class methods to print the internal values from InferenceSession::Run, and that information was key to debugging the proper pixel ordering
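A likely reason the "first 3 pixels" printouts in steps 6 and 8 agreed even though the buffers differed: the Python script prints img[channel, 0, 0], while the C++ program prints input_ort[0..2]. The sketch below works out which flat offsets each check reads, assuming the 68x136 image from the issue; the helper names nchw_offset and hwc_offset are hypothetical.

```python
H, W, C = 68, 136, 3  # crop.jpg dimensions reported by the C API

def nchw_offset(c, h, w):
    """Flat offset of element (c, h, w) in a channel-block (CHW) buffer."""
    return c * H * W + h * W + w

def hwc_offset(h, w, c):
    """Flat offset of element (h, w, c) in an interleaved (HWC) buffer."""
    return (h * W + w) * C + c

# Offsets read by the Python check: img[channel, 0, 0] for channel 0..2.
python_check = [nchw_offset(c, 0, 0) for c in range(C)]

# Offsets of B, G, R for pixel (0, 0) in the interleaved buffer, which is
# what input_ort[0], input_ort[1], input_ort[2] held in the C++ program.
c_check = [hwc_offset(0, 0, c) for c in range(C)]

print(python_check, c_check)
```

Both checks print the B, G, R values of pixel (0, 0), so they match, but the model reads offsets 0, 1, 2 of its input as blue[0, 0], blue[0, 1], blue[0, 2]. Printing the first few values of the flat buffer on both sides would have exposed the difference.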
@addisonklinke seems like this might be an issue of NHWC vs NCHW order? The python example does torch.tensor(loaded).permute(2, 0, 1) after the cv image load to swap the order but it doesn't look like the C example does so.
Thanks @addisonklinke for the investigation, and @pranav-prakash is spot on: it is NHWC vs NCHW. This is not specific to the API; it is the data order required by the ONNX model itself. If the ONNX model requires input data in NCHW order, then the user is required to provide it in that order. If an intermediate image-parsing library (OpenCV here) returns data in a different order than the model expects, the raw data must be pre-processed into the expected order first.
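A minimal sketch of such a pre-processing step in NumPy, assuming an H x W x 3 uint8 image as OpenCV returns it and a model that takes a float32 NCHW batch scaled to [0, 1], as in this issue. The function name hwc_to_nchw is hypothetical.

```python
import numpy as np

def hwc_to_nchw(image_u8):
    """Convert an H x W x C uint8 image to a contiguous 1 x C x H x W float32 batch."""
    # Move channels first, then scale to [0, 1] as ort-capi.py does.
    chw = image_u8.transpose(2, 0, 1).astype(np.float32) / 255.0
    # Add the batch dimension and force a C-contiguous buffer, since a
    # transposed view does not store its data in channel-block order.
    return np.ascontiguousarray(chw[np.newaxis, ...])

# Synthetic stand-in for the 68 x 136 crop.jpg used above.
image = np.arange(68 * 136 * 3, dtype=np.uint8).reshape(68, 136, 3)
batch = hwc_to_nchw(image)
print(batch.shape, batch.dtype, batch.flags["C_CONTIGUOUS"])
```

The resulting array could be passed as the 'images' input in the Python script, and its flat buffer has the layout that the C API tensor would need to have as well.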
Closing as this is no longer an issue. Thanks for raising the issue.