Describe the bug
I use DynamicSlice (a result of using torch.jit.script for my CenterCrop).
And even though I change the input dimension to ?:
model = onnx.load("model.onnx")
model.graph.input[0].type.tensor_type.shape.dim[2].dim_param = '?'
model.graph.input[0].type.tensor_type.shape.dim[3].dim_param = '?'
onnx.save(model, "model.onnx")
For some input images I get this warning:
2019-04-23 13:52:11.9230541 [W:onnxruntime:CSharpOnnxRuntime, execution_frame.cc:283 onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper] For mlvalue with index: 128, block in memory pattern size is: 1032192 but the actually size is: 688128, fall back to default allocation behavior
First, "the actually size is" is not a correct sentence.
Second, how can I find out which value the mlvalue with index 128 is, so that I can manually change its dimension to ? and avoid the warning?
Third, I only see this warning after running inference on a few images, so it seems that Run is not stateless, which worries me more than the warning itself. That is, I do not see the warning if I run inference only once on any image, but I do see it after a few runs when the sizes change. How is this thread-safe?
System information
To Reproduce
After inference is done on few images I see this warning
Expected behavior
To be stateless, and to work with dynamic input sizes
To Reproduce
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as rt
Use the attached onnx model boz.zip, or define these modules:
@torch.jit.script
def center_slice_helper(x, h_offset, w_offset, h_end, w_end):
    return x[:, :, h_offset:h_end, w_offset:w_end]

class CenterCrop(nn.Module):
    def __init__(self, crop_size):
        """Crop from the center of a 4d tensor
        Input shape can be dynamic
        :param crop_size: the center crop size
        """
        super(CenterCrop, self).__init__()
        self.crop_size = crop_size

    def extra_repr(self):
        """Extra information
        """
        return 'crop_size={}'.format(
            self.crop_size
        )

    def forward(self, x):
        h_offset = (x.shape[2] - self.crop_size) / 2
        w_offset = (x.shape[3] - self.crop_size) / 2
        h_end = h_offset + self.crop_size
        w_end = w_offset + self.crop_size
        return center_slice_helper(x, h_offset, w_offset, h_end, w_end)

class Add(nn.Module):
    def __init__(self):
        """Add
        """
        super(Add, self).__init__()

    def forward(self, x):
        return x.float().sum()
and export:
model = nn.Sequential(CenterCrop(224), Add())
dummy_input = torch.randn(1, 3, 300, 256, device='cpu').byte()
torch.onnx.export(model, dummy_input, "boz.onnx", verbose=True, input_names=['data'], output_names=["output"])
# this does not change the result but is in general necessary for dynamic sizes
import onnx
model = onnx.load("boz.onnx")
model.graph.input[0].type.tensor_type.shape.dim[2].dim_param = '?'
model.graph.input[0].type.tensor_type.shape.dim[3].dim_param = '?'
onnx.save(model, "boz.onnx")
Now to test:
im2 = np.random.rand(1, 3, 384, 256).astype('uint8')
im = np.random.rand(1, 3, 256, 384).astype('uint8')
First:
sess = rt.InferenceSession("boz.onnx")
output = sess.run(['output'], {'data': im})[0]
output = sess.run(['output'], {'data': im2})[0]
Results in this in the second line:
2019-04-23 15:14:50.442041699 [W:onnxruntime:Default, execution_frame.cc:283 AllocateMLValueTensorSelfOwnBufferHelper] For mlvalue with index: 26, block in memory pattern size is: 258048 but the actually size is: 172032, fall back to default allocation behavior
And reversing the order
sess = rt.InferenceSession("boz.onnx")
output = sess.run(['output'], {'data': im2})[0]
output = sess.run(['output'], {'data': im})[0]
Results in this warning, again on the second call:
2019-04-23 15:23:44.705329709 [W:onnxruntime:Default, execution_frame.cc:283 AllocateMLValueTensorSelfOwnBufferHelper] For mlvalue with index: 26, block in memory pattern size is: 172032 but the actually size is: 258048, fall back to default allocation behavior
This shows that the order matters, and Run is not stateless. What about thread-safety?
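The two sizes in the warnings above are consistent with a uint8 tensor that has already been cropped to 224 rows but still has its full width. This reading is inferred from the numbers only; the log does not name the tensor, and the helper name below is made up for illustration. A minimal arithmetic check:

```python
def height_cropped_nbytes(shape, crop_size=224, itemsize=1):
    """Bytes of an NCHW tensor after cropping only the height to crop_size.

    Hypothetical helper: it assumes the mismatching buffer is the
    intermediate result of slicing along H before slicing along W.
    itemsize=1 corresponds to uint8 inputs.
    """
    n, c, _h, w = shape
    return n * c * crop_size * w * itemsize


im_shape = (1, 3, 256, 384)   # shape of `im` in the report
im2_shape = (1, 3, 384, 256)  # shape of `im2` in the report

size_im = height_cropped_nbytes(im_shape)    # 3 * 224 * 384
size_im2 = height_cropped_nbytes(im2_shape)  # 3 * 224 * 256
print(size_im, size_im2)
```

The printed values match the pair 258048 and 172032 from the log, which would explain why the size recorded on the first call does not fit the second call whenever the input width changes.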
It's not stateless. By design it should be thread-safe.
Hi @souptc, could you answer his question about the memory planner?
Yes. In onnxruntime, to optimize memory allocation, we use a memory planner that traces the allocations made during a run.
When it was designed, ONNX did not have any dynamic ops, which meant the memory allocation plan could be determined from the input shapes alone. So once we hit a later request with the same shape as a previous request, we try to reuse the previous memory allocation pattern.
After those dynamic ops were introduced, this is no longer true: the memory size depends not only on the input shape but also on the data. The warning you saw appears because the runtime detected that the actual memory request does not match the previous memory plan, so it falls back to requesting a new block of memory.
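The behavior described above can be pictured with a toy model. This is only an illustrative sketch, not onnxruntime's implementation: the class name, the shape-keyed dictionary, and the "traced"/"planned"/"fallback" labels are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


class ToyMemoryPlanner:
    """Toy shape-keyed allocation cache (not onnxruntime's actual code)."""

    def __init__(self) -> None:
        # Input shape -> block sizes (bytes) recorded on the first run.
        self.patterns: Dict[Shape, List[int]] = {}

    def run(self, input_shape: Shape, requested_sizes: List[int]) -> List[str]:
        """Return one decision per requested block."""
        pattern = self.patterns.get(input_shape)
        if pattern is None:
            # First request with this shape: record what was allocated.
            self.patterns[input_shape] = list(requested_sizes)
            return ["traced"] * len(requested_sizes)
        decisions = []
        for i, size in enumerate(requested_sizes):
            if i < len(pattern) and pattern[i] == size:
                decisions.append("planned")   # reuse the recorded block
            else:
                decisions.append("fallback")  # size mismatch: allocate fresh
        return decisions


planner = ToyMemoryPlanner()
shape = (1, 3, 256, 384)
first = planner.run(shape, [294912, 258048])   # pattern is recorded
second = planner.run(shape, [294912, 258048])  # same sizes: pattern reused
third = planner.run(shape, [294912, 172032])   # data-dependent size differs
print(first, second, third)
```

In this sketch a mismatch only changes where a block comes from, which mirrors the point that the warning signals a fallback allocation rather than a wrong result.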
We should disable the memory planner for those models with dynamic ops.
With it enabled, does it affect thread-safety in the current code? If it does, can I disable it manually?
Also, as a suggestion: can it be disabled for only part of the graph? For example, the network may start with a dynamic part (e.g. before the CenterCrop) but become fixed-size afterwards (e.g. after CenterCrop has cropped the input to 224 in the attached model).
If it is enabled, it does not affect thread-safety: each request has its own allocation strategy. It may waste some allocated memory, but it has no impact on functional correctness.
We used to have a session option that let the user choose to disable it manually, but it was later removed. @snnn, maybe we should add it back?
Yes, we could investigate more flexible strategies like the one you mentioned, sharing the memory pattern for a subgraph. It depends on how much gain we could get from this approach.
@souptc, it's already added back. #872
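For the Python API, a minimal sketch of turning the memory pattern off might look like the following. It assumes the `enable_mem_pattern` attribute on `SessionOptions` found in recent onnxruntime Python releases; older builds may expose the option differently. The helper name is made up for illustration.

```python
def make_session_without_mem_pattern(model_path):
    """Create an InferenceSession with memory-pattern reuse disabled.

    Hypothetical helper; assumes SessionOptions.enable_mem_pattern exists
    in the installed onnxruntime version.
    """
    # Imported lazily so the helper can be defined without the package.
    import onnxruntime as rt

    options = rt.SessionOptions()
    options.enable_mem_pattern = False  # do not reuse recorded allocation patterns
    return rt.InferenceSession(model_path, options)
```

Calling `make_session_without_mem_pattern("boz.onnx")` and then running `im` and `im2` in either order would be one way to check whether the warning goes away.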
Thanks