Onnxruntime: mlvalue default allocation warning, not stateless

Created on 24 Apr 2019 · 8 Comments · Source: microsoft/onnxruntime

Describe the bug
I use DynamicSlice (a result of using torch.jit.script for my CenterCrop).

And even though I change the input dimension to ?:

model = onnx.load("model.onnx")
model.graph.input[0].type.tensor_type.shape.dim[2].dim_param = '?'
model.graph.input[0].type.tensor_type.shape.dim[3].dim_param = '?'
onnx.save(model, "model.onnx")

For some input images I get this warning:

2019-04-23 13:52:11.9230541 [W:onnxruntime:CSharpOnnxRuntime, execution_frame.cc:283 onnxruntime::ExecutionFrame::AllocateMLValueTensorSelfOwnBufferHelper] For mlvalue with index: 128, block in memory pattern size is: 1032192 but the actually size is: 688128, fall back to default allocation behavior

First, "the actually size is" is not a correct sentence.
Second, how can I see which mlvalue has index 128, so that I can manually change its dimension to ? and avoid the warning?
Third, I only see this warning after inference has been run on a few images, so it seems that Run is not stateless, which worries me more than the warning itself. That is, I do not see the warning if I run inference once on any image, but I do see it after a few runs when the sizes change. How is this thread-safe?

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16, and Windows
ONNX Runtime installed from (source or binary): pip/nuget
ONNX Runtime version: 0.3.0
Python version: 0.3.5
Visual Studio version (if applicable): 2015

To Reproduce
After inference is done on a few images, I see this warning.

Expected behavior
To be stateless, and to work with dynamic input sizes

Source

dashesy

All 8 comments

To Reproduce

import numpy as np
import torch
import torch.nn as nn
import onnxruntime as rt

Use the attached onnx model boz.zip, or define these modules:

@torch.jit.script
def center_slice_helper(x, h_offset, w_offset, h_end, w_end):
    return x[:, :, h_offset:h_end, w_offset:w_end]


class CenterCrop(nn.Module):
    def __init__(self, crop_size):
        """Crop from the center of a 4d tensor
        Input shape can be dynamic
        :param crop_size: the center crop size
        """
        super(CenterCrop, self).__init__()
        self.crop_size = crop_size

    def extra_repr(self):
        """Extra information
        """
        return 'crop_size={}'.format(
            self.crop_size
        )

    def forward(self, x):
        h_offset = (x.shape[2] - self.crop_size) / 2
        w_offset = (x.shape[3] - self.crop_size) / 2
        h_end = h_offset + self.crop_size
        w_end = w_offset + self.crop_size
        return center_slice_helper(x, h_offset, w_offset, h_end, w_end)


class Add(nn.Module):
    def __init__(self):
        """Add
        """
        super(Add, self).__init__()

    def forward(self, x):
        return x.float().sum()

and export:

model = nn.Sequential(CenterCrop(224), Add())
dummy_input = torch.randn(1, 3, 300, 256, device='cpu').byte()
torch.onnx.export(model, dummy_input, "boz.onnx", verbose=True, input_names=['data'], output_names=["output"])

# this does not change the result but is in general necessary for dynamic sizes

import onnx

model = onnx.load("boz.onnx")
model.graph.input[0].type.tensor_type.shape.dim[2].dim_param = '?'
model.graph.input[0].type.tensor_type.shape.dim[3].dim_param = '?'
onnx.save(model, "boz.onnx")

Now to test:

im2 = np.random.rand(1, 3, 384, 256).astype('uint8')
im = np.random.rand(1, 3, 256, 384).astype('uint8')

First:

sess = rt.InferenceSession("boz.onnx")

output =  sess.run(['output'], {'data': im})[0]
output =  sess.run(['output'], {'data': im2})[0]

Results in this in the second line:

2019-04-23 15:14:50.442041699 [W:onnxruntime:Default, execution_frame.cc:283 AllocateMLValueTensorSelfOwnBufferHelper] For mlvalue with index: 26, block in memory pattern size is: 258048 but the actually size is: 172032, fall back to default allocation behavio

And reversing the order

sess = rt.InferenceSession("boz.onnx")

output =  sess.run(['output'], {'data': im2})[0]
output =  sess.run(['output'], {'data': im})[0]

Results in this error also in the second line!

2019-04-23 15:23:44.705329709 [W:onnxruntime:Default, execution_frame.cc:283 AllocateMLValueTensorSelfOwnBufferHelper] For mlvalue with index: 26, block in memory pattern size is: 172032 but the actually size is: 258048, fall back to default allocation behavior

This shows that the order matters, and Run is not stateless. What about thread-safety?
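The two sizes in the warnings can be reproduced by hand. This is a rough check, assuming mlvalue 26 is the uint8 intermediate left after the height has been cropped to 224 but before the width is cropped. That is inferred from the numbers only, not confirmed from the graph, and `bytes_after_height_crop` is a made-up helper name:

```python
def bytes_after_height_crop(shape, crop_size=224, itemsize=1):
    """Byte size of an NCHW tensor after only H has been cropped to crop_size.

    itemsize=1 matches the uint8 inputs used in the repro above.
    """
    n, c, _, w = shape
    return n * c * crop_size * w * itemsize


im_shape = (1, 3, 256, 384)
im2_shape = (1, 3, 384, 256)

print(bytes_after_height_crop(im_shape))   # 258048
print(bytes_after_height_crop(im2_shape))  # 172032
```

The sizes in the original report (1032192 and 688128) are exactly four times these, which would fit a 4-byte element type on the same shapes, though nothing in the log says so.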

dashesy on 24 Apr 2019

It's not stateless. By design it should be thread-safe.
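"Not stateless" and "thread-safe" are not in conflict. A toy sketch of how both can hold at once, assuming a session that keeps a shared cache behind a lock while every call works on its own buffers; `ToySession` is purely illustrative and is not onnxruntime code:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToySession:
    """Toy model of a session that keeps state across runs; not onnxruntime code."""

    def __init__(self):
        self._lock = threading.Lock()
        self.seen_shapes = set()  # shared state that survives between runs

    def run(self, data):
        shape = (len(data),)
        with self._lock:          # guard the shared cache
            self.seen_shapes.add(shape)
        scratch = list(data)      # per-call buffer, never shared between threads
        return sum(scratch)


sess = ToySession()
inputs = [[1] * n for n in (3, 5, 3, 7)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(sess.run, inputs))

print(results)                   # [3, 5, 3, 7]
print(sorted(sess.seen_shapes))  # [(3,), (5,), (7,)]
```

The cache changes from run to run, so the session is stateful, but each result depends only on that call's input.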

snnn on 24 Apr 2019

👍2

Hi @souptc, could you answer his question about the memory planner?

snnn on 24 Apr 2019

Yes. In onnxruntime, to optimize memory allocation, we use a memory planner that traces the allocations.
When it was written, ONNX did not have any dynamic ops, which meant the memory allocation plan could be determined from the input shapes alone. So once we hit a later request with the same shape as a previous request, we try to reuse the previous memory allocation pattern.
But since those dynamic ops were introduced, this no longer holds: the memory size depends not only on the input shape but also on the data. The warning you saw appears because the runtime detected that the actual memory request does not match the previous memory plan, so it falls back to requesting a new block of memory.
We should disable the memory planner for those models with dynamic ops.
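The behaviour described above can be pictured with a toy model. This is a sketch of the idea only, assuming patterns are keyed by input shape as the comment says; `ToyPlanner` and its fields are invented for illustration and do not reflect onnxruntime's actual data structures:

```python
class ToyPlanner:
    """Toy stand-in for a memory-pattern cache; not onnxruntime's implementation."""

    def __init__(self):
        self.patterns = {}   # input shape -> {value index: planned byte size}
        self.fallbacks = []  # (value index, planned size, actual size)

    def run(self, input_shape, actual_sizes):
        """Simulate one run; actual_sizes maps value index -> bytes really needed."""
        plan = self.patterns.get(input_shape)
        if plan is None:
            # First time this input shape is seen: record what was allocated.
            self.patterns[input_shape] = dict(actual_sizes)
            return
        for index, size in actual_sizes.items():
            if plan.get(index) != size:
                # The plan is stale for this value: allocate it the default way.
                self.fallbacks.append((index, plan.get(index), size))


planner = ToyPlanner()
shape = (1, 3, 256, 384)
planner.run(shape, {26: 258048})  # first run records the pattern
planner.run(shape, {26: 258048})  # same sizes: the pattern is reused
planner.run(shape, {26: 172032})  # a data-dependent op produced a different size
print(planner.fallbacks)          # [(26, 258048, 172032)]
```

In this picture the fallback costs only the benefit of the precomputed plan; the tensor is still allocated with the size it actually needs.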

souptc on 24 Apr 2019

We should disable the memory planner for those models with dynamic ops.

With it enabled, does it affect thread-safety in the current code? If it does, can I disable it manually?

Also, as a suggestion: can it be disabled after certain ops? For example, the network may start with a dynamic part (e.g. before the CenterCrop) but become fixed-size afterwards (as it does after CenterCrop has cropped the input to 224 in the attached model).

dashesy on 24 Apr 2019

If it is enabled, it does not affect thread-safety: each request has its own allocation strategy. It may waste some allocated resources, but it has no impact on functional correctness.
We used to have a session option that let the user disable it manually, but it was later removed. @snnn, maybe we should add it back?

Yes, we could investigate more flexible strategies like the one you mentioned, which share the memory pattern for a subgraph. It depends on how much gain we could get from this approach.

souptc on 27 Apr 2019

👍1

@souptc, it's already added back. #872
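In recent releases of the Python package, a switch of this kind is exposed as `SessionOptions.enable_mem_pattern`. A minimal sketch, assuming that attribute is the option being referred to here and that the installed version has it (0.3.0 from the report may not); `make_session` is a hypothetical helper name:

```python
def make_session(model_path, use_mem_pattern=False):
    """Create an InferenceSession with the memory-pattern optimization toggled.

    Assumes an onnxruntime release that has SessionOptions.enable_mem_pattern.
    """
    import onnxruntime as rt  # imported here so the helper can be defined without it

    opts = rt.SessionOptions()
    opts.enable_mem_pattern = use_mem_pattern
    return rt.InferenceSession(model_path, sess_options=opts)


# Usage, with the model from the repro above:
# sess = make_session("boz.onnx")
# output = sess.run(['output'], {'data': im})[0]
```

With the pattern disabled, each run should allocate intermediates directly, so the size-mismatch warning would not be expected to appear.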

snnn on 27 Apr 2019

👍2

Thanks

dashesy on 27 Apr 2019
