One: [onert] Reducing copies of models' inputs/outputs

Created on 21 Apr 2020 · 7 Comments · Source: Samsung/ONE

For the scenario #152 let's find out how many copies are performed.

The current nnfw_api (oneapi) requires users to pass input/output tensor memory.

Prepare input buffer
Run Model 1
  1. Copy input data from input buffer
  2. Run the model
  3. Copy output data to output buffer
Run Model 2 (the output buffer above is the input buffer below)
  1. Copy input data from input buffer
  2. Run the model
  3. Copy output data to output buffer

Note: It turns out that there is not much difference between this scenario and running just one model.
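The flow above can be sketched as a small simulation that counts the copies. This is only an illustration: `ToyModel`, `run_pipeline`, and the "add one" placeholder operation are hypothetical names made up for this sketch and are not part of nnfw_api.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a loaded model; NOT part of nnfw_api.
struct ToyModel {
  std::vector<float> internal_in;   // runtime-owned input tensor
  std::vector<float> internal_out;  // runtime-owned output tensor
  std::size_t copies = 0;           // number of buffer copies performed

  explicit ToyModel(std::size_t n) : internal_in(n), internal_out(n) {}

  // Mimics the as-is flow: copy in, run, copy out.
  void run(const std::vector<float>& user_in, std::vector<float>& user_out) {
    assert(user_in.size() == internal_in.size());
    assert(user_out.size() == internal_out.size());
    // 1. Copy input data from the user's input buffer.
    std::memcpy(internal_in.data(), user_in.data(), user_in.size() * sizeof(float));
    ++copies;
    // 2. "Run the model" (placeholder operation: add one to each element).
    for (std::size_t i = 0; i < internal_in.size(); ++i)
      internal_out[i] = internal_in[i] + 1.0f;
    // 3. Copy output data to the user's output buffer.
    std::memcpy(user_out.data(), internal_out.data(), user_out.size() * sizeof(float));
    ++copies;
  }
};

// Runs two models back to back and returns the total number of copies.
std::size_t run_pipeline(const std::vector<float>& input, std::vector<float>& result) {
  ToyModel model1(input.size());
  ToyModel model2(input.size());
  std::vector<float> intermediate(input.size());
  model1.run(input, intermediate);   // the output buffer of model 1 ...
  model2.run(intermediate, result);  // ... is the input buffer of model 2
  return model1.copies + model2.copies;
}
```

Each model contributes one input copy and one output copy, so chaining two models costs four copies in total, which is twice the cost of running a single model.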

Ideas

To get rid of all 4 copies above do either one of these:

Revise API setInput/setOutput to get the internal memory buffer
Revise the runtime to allow using memory from outside

Constraints

This optimization is not always possible, for example when:

There is a layout mismatch
The internal tensor has padding (alignment)
GPU memory is used
And some more...
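A minimal sketch of how such a check might look, covering only the three constraints listed above (the list is explicitly not exhaustive). `Layout`, `MemoryKind`, `BufferDesc`, and `can_share_buffer` are hypothetical names for illustration, and the user buffer is assumed to be densely packed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical descriptors; none of these types exist in the runtime.
enum class Layout { NHWC, NCHW };
enum class MemoryKind { Host, Gpu };

struct BufferDesc {
  Layout layout;
  std::size_t row_bytes;     // bytes of actual data per row
  std::size_t stride_bytes;  // bytes between consecutive rows (> row_bytes if padded)
  MemoryKind memory;
};

// Returns true if the internal tensor could alias the user buffer directly.
// Assumption: the user buffer itself is densely packed (no padding).
bool can_share_buffer(const BufferDesc& user, const BufferDesc& internal) {
  assert(internal.stride_bytes >= internal.row_bytes);
  if (user.layout != internal.layout) return false;               // layout mismatch
  if (internal.stride_bytes != internal.row_bytes) return false;  // internal padding
  if (internal.memory == MemoryKind::Gpu) return false;           // GPU memory is used
  return true;
}
```

When the check fails, the runtime would fall back to the copy it performs today, so the optimization only removes copies in the cases where the buffers are compatible.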

Possible solutions w/ pros and cons

Based on the Ideas above, let's think about their feasibility and pros/cons.

1. Revise API setInput/setOutput to get the internal memory buffer

Feasibility
- Looks feasible
Considerations
- If we support multiple async runs for a single session, it will be much more complicated
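To make option 1 concrete, here is a rough sketch of what "getting the internal memory buffer" could look like from the user's side. `ToySession`, `BufferView`, `input_buffer`, and `output_buffer` are hypothetical names invented for this sketch; the real API shape is not decided in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view onto runtime-owned memory; NOT an existing API type.
struct BufferView {
  float* data;
  std::size_t size;  // number of elements
};

// Hypothetical session that exposes its internal input/output tensors.
class ToySession {
public:
  explicit ToySession(std::size_t n) : in_(n), out_(n) {}

  // The user writes inputs directly into runtime-owned memory (no input copy).
  BufferView input_buffer() { return {in_.data(), in_.size()}; }

  // The user reads results directly from runtime-owned memory (no output copy).
  BufferView output_buffer() { return {out_.data(), out_.size()}; }

  // Placeholder "model": doubles each element.
  void run() {
    for (std::size_t i = 0; i < in_.size(); ++i) out_[i] = in_[i] * 2.0f;
  }

private:
  std::vector<float> in_;
  std::vector<float> out_;
};
```

In this sketch there is exactly one input buffer and one output buffer per session. That is where the async consideration above would bite: concurrent runs on the same session would all see the same internal buffers, so each run would need its own set or some form of synchronization.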

2. Revise the runtime to allow using memory from outside

Let's define a couple of terms.

Runtime MM (memory manager): The runtime has a memory manager for all tensors and it passes memory handles to backends (what we want to do here)
Backend MM: Each backend has its own memory management system (as-is)

Feasibility
- Need to check this is available in ACL backends
Considerations
- Do we need to support both Backend MM and Runtime MM?
- If we still keep the as-is API, the user must provide output buffers that are large enough, since we don't know the exact size of output tensors for dynamic models (same issue as with the Android NN API)
- This is also mandatory if we want to optimize copying between backends (and possibly between subgraphs)
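As a rough illustration of option 2, the sketch below shows a tensor that either owns its memory or borrows memory handed in from outside, with a backend kernel that only ever sees the handle. `ToyTensor` and `toy_backend_add_one` are hypothetical names; whether real backends (ACL in particular) can accept external memory like this is exactly the open feasibility question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical tensor: owns its memory (as-is) or borrows external memory.
class ToyTensor {
public:
  // Runtime-allocated memory.
  explicit ToyTensor(std::size_t n)
      : owned_(new float[n]()), data_(owned_.get()), size_(n) {}

  // Memory provided from outside, e.g. a user buffer; no ownership is taken.
  ToyTensor(float* external, std::size_t n) : data_(external), size_(n) {}

  float* data() { return data_; }
  const float* data() const { return data_; }
  std::size_t size() const { return size_; }
  bool owns_memory() const { return owned_ != nullptr; }

private:
  std::unique_ptr<float[]> owned_;  // null when the memory is borrowed
  float* data_;
  std::size_t size_;
};

// Hypothetical backend kernel: works on whatever memory the handle points to.
void toy_backend_add_one(const ToyTensor& in, ToyTensor& out) {
  assert(in.size() == out.size());
  for (std::size_t i = 0; i < in.size(); ++i) out.data()[i] = in.data()[i] + 1.0f;
}
```

Because the kernel writes straight into the borrowed buffer, the result lands in user memory without a separate output copy. The same mechanism would let one model's output tensor serve as the next model's input tensor.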

type/discussion


wateret

All 7 comments

Revise the runtime to allow using memory from outside

@wateret, in short, apart from the constraint cases, isn't all of this solved if only this single feature is supported?

lemmaa on 21 Apr 2020

@lemmaa Yes, as I stated, "do either one of these". However, that requires a huge amount of work. I will update the "Possible solutions" section soon.

wateret on 22 Apr 2020


Need to check this is available in ACL backends

If you plan to use tensor memory allocated by our runtime as the inputs/outputs of models, there are two issues.

Padding
If models are compiled separately, the padding of prev model's output and next model's input can be different.
Dynamic tensor
If the shape of the inputs changes on every execution, our runtime must sequentially call the configure methods of all layers and re-plan the memory size of tensors on every execution.

ragmani on 22 Apr 2020

@ragmani

Padding
If models are compiled separately, the padding of prev model's output and next model's input can be different.

Yes, just like I mentioned in Constraints section. This optimization is not always possible. For that case we may need to insert a conversion operation. The thing is, by doing "Revise the runtime to allow using memory from outside", we give the runtime a chance not to copy inputs. If the conditions (like no padding) are not met, we must still perform the copy. Right now, by contrast, we always copy inputs.

Dynamic tensor
If the shape of the inputs changes on every execution, our runtime must sequentially call the configure methods of all layers and re-plan the memory size of tensors on every execution.

As we discussed offline some time ago, yes, we will do that. It is ACL(CL)'s nature, so I don't see any other solution.

wateret on 23 Apr 2020

@wateret

Yes, just like I mentioned in Constraints section. This optimization is not always possible.

If the inputs/outputs are dynamic tensors, their padding can change on every execution. In other words, some tensors may have padding in the second execution even if they didn't have padding in the first. We also have to consider this in order to support reducing copies between models.

ragmani on 23 Apr 2020


@ragmani I see, that means we have another constraint: dynamic tensors are not supported. Or did I misunderstand you?

wateret on 23 Apr 2020

@wateret
That's not what I mean. If a tensor does not have padding, there is no problem. So I think you can apply the copy reduction to dynamic tensors of the cpu backend.

ragmani on 23 Apr 2020

