For scenario #152, let's find out how many copies are performed.
The current nnfw_api (oneapi) requires users to pass input/output tensor memory.
It turns out that there is not much difference between this scenario and running just one model.
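To make the copy counting concrete, here is a minimal sketch of two models chained through user-provided buffers. `MockSession`, `set_input`, `set_output`, and `run_chain` are hypothetical stand-ins and are not the real nnfw_api; the sketch assumes each run copies the user input into a backend-owned tensor and copies the backend output back into the user buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a runtime session; NOT the real nnfw API.
struct MockSession {
  std::vector<float> internal_in;   // backend-owned input tensor
  std::vector<float> internal_out;  // backend-owned output tensor
  const float *user_in = nullptr;   // user-provided input buffer
  float *user_out = nullptr;        // user-provided output buffer
  std::size_t copies = 0;           // number of buffer copies performed

  explicit MockSession(std::size_t n) : internal_in(n), internal_out(n) {}

  void set_input(const float *buf) { user_in = buf; }
  void set_output(float *buf) { user_out = buf; }

  void run() {
    assert(user_in != nullptr && user_out != nullptr);
    const std::size_t bytes = internal_in.size() * sizeof(float);
    // Copy 1: user input buffer -> backend tensor.
    std::memcpy(internal_in.data(), user_in, bytes);
    ++copies;
    // Placeholder "inference": add 1 to every element.
    for (std::size_t i = 0; i < internal_in.size(); ++i)
      internal_out[i] = internal_in[i] + 1.0f;
    // Copy 2: backend tensor -> user output buffer.
    std::memcpy(user_out, internal_out.data(), bytes);
    ++copies;
  }
};

// Runs model A, then model B, feeding A's output buffer to B as input.
// Returns the total number of copies across both sessions.
inline std::size_t run_chain(const std::vector<float> &in, std::vector<float> &out) {
  assert(in.size() == out.size());
  std::vector<float> mid(in.size());  // user-owned intermediate buffer
  MockSession a(in.size()), b(in.size());
  a.set_input(in.data());
  a.set_output(mid.data());
  b.set_input(mid.data());
  b.set_output(out.data());
  a.run();
  b.run();
  return a.copies + b.copies;
}
```

Under these assumptions, chaining two sessions performs two copies per model, which is why the chained case looks like the single-model case repeated.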
To get rid of all 4 copies above do either one of these:
This optimization is not always possible if:
Based on the ideas above, let's think about their feasibility and pros/cons.
Let's define a couple of terms.
Runtime MM (memory manager): The runtime has a memory manager for all tensors and passes memory handles to backends (what we want to do here)
Backend MM: Each backend has its own memory management system (as-is)
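The "Runtime MM" idea can be sketched as a single owner of tensor memory that hands out handles, so that two backends bound to the same tensor see the same bytes. All names here (`RuntimeMemoryManager`, `MemoryHandle`, `MockBackend`, `claim`, `allocate`, `handle`, `bind`) are hypothetical and chosen for illustration; alignment and lifetime planning are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical handle passed from the runtime to a backend.
struct MemoryHandle {
  std::uint8_t *data;
  std::size_t size;
};

// "Runtime MM": a single owner for all tensor memory.
class RuntimeMemoryManager {
public:
  // Planning phase: reserve `size` bytes for tensor `id`.
  void claim(int id, std::size_t size) {
    assert(!allocated_);
    blocks_[id] = {total_, size};
    total_ += size;
  }

  // Allocate one arena that covers every claimed tensor.
  void allocate() {
    arena_.resize(total_);
    allocated_ = true;
  }

  // Hand out a view into the arena; no per-backend allocation happens.
  MemoryHandle handle(int id) {
    assert(allocated_);
    const Block &blk = blocks_.at(id);
    return {arena_.data() + blk.offset, blk.size};
  }

private:
  struct Block {
    std::size_t offset;
    std::size_t size;
  };
  std::unordered_map<int, Block> blocks_;
  std::vector<std::uint8_t> arena_;
  std::size_t total_ = 0;
  bool allocated_ = false;
};

// A backend that uses runtime-provided memory instead of allocating its own.
struct MockBackend {
  MemoryHandle tensor{nullptr, 0};
  void bind(MemoryHandle h) { tensor = h; }
};
```

In the "Backend MM" (as-is) arrangement, each `MockBackend` would own its buffer, and moving a tensor between backends would need a copy.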
- Revise the runtime to allow using memory from outside
@wateret, in short, apart from the constrained cases, isn't all of this solved if only this single feature is supported?
@lemmaa Yes, as I stated, "do either one of these". However, that is a huge amount of work. I will update the "Possible solutions" section soon.
Need to check whether this is available in the ACL backends.
If you plan to use the inputs/outputs of models as tensor memory allocated by our runtime, there are two issues.
@ragmani
- Padding
If models are compiled separately, the padding of the previous model's output and the next model's input can be different.
Yes, just like I mentioned in Constraints section. This optimization is not always possible. For that case we may need to insert a conversion operation. The point is that by doing "Revise the runtime to allow using memory from outside", we give the runtime a chance not to copy inputs. If the conditions (such as no padding) are not met, we must still perform a copy. As it stands now, we always copy inputs.
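The "copy only when the conditions are not met" behavior described in this reply can be sketched as follows. `Layout`, `BoundInput`, and `bind_input` are hypothetical names, and the sketch assumes a simple 2-D tensor whose padding is expressed as a row stride.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical 2-D layout: row_stride > cols means each row is padded.
struct Layout {
  std::size_t rows;
  std::size_t cols;
  std::size_t row_stride;  // elements between the starts of consecutive rows
};

inline bool same_layout(const Layout &a, const Layout &b) {
  return a.rows == b.rows && a.cols == b.cols && a.row_stride == b.row_stride;
}

struct BoundInput {
  const float *data;  // memory the backend will read from
  bool copied;        // true if the fallback copy was needed
};

// Uses the external buffer directly when the layouts match. Otherwise it
// converts the data row by row into `staging`, laid out as `want` requires.
inline BoundInput bind_input(const float *ext, const Layout &ext_layout,
                             const Layout &want, std::vector<float> &staging) {
  assert(ext_layout.rows == want.rows && ext_layout.cols == want.cols);
  assert(ext_layout.row_stride >= ext_layout.cols);
  assert(want.row_stride >= want.cols);
  if (same_layout(ext_layout, want))
    return {ext, false};  // zero-copy path
  staging.assign(want.rows * want.row_stride, 0.0f);
  for (std::size_t r = 0; r < want.rows; ++r)
    std::memcpy(staging.data() + r * want.row_stride,
                ext + r * ext_layout.row_stride, want.cols * sizeof(float));
  return {staging.data(), true};  // conversion (copy) path
}
```

With matching layouts the backend reads the caller's memory in place; with mismatched padding the result is no worse than the current always-copy behavior.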
- Dynamic tensor
If the shape of the inputs changes at every execution, our runtime must sequentially call the `configure` methods of all layers and plan the memory size of tensors at every execution.
As we discussed offline some time ago, yes, we will do that. It is the nature of ACL (CL), so I don't see any other solution.
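A minimal sketch of that reconfiguration step, assuming a hypothetical `MockExecutor` whose layers each derive an output shape from their input shape. None of these names come from the runtime or from ACL. The early return when the shape is unchanged is an addition of this sketch, not something stated in the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

inline std::size_t num_elements(const Shape &s) {
  std::size_t n = 1;
  for (std::size_t d : s) n *= d;
  return n;
}

// Hypothetical layer: configure() derives the output shape from the input.
struct MockLayer {
  std::size_t out_channels;  // replaces the last dimension of the input
  Shape output_shape;
  void configure(const Shape &input) {
    assert(!input.empty());
    output_shape = input;
    output_shape.back() = out_channels;
  }
};

struct MockExecutor {
  std::vector<MockLayer> layers;
  Shape last_input;
  std::size_t planned_bytes = 0;
  std::size_t reconfigure_count = 0;

  // Calls configure() on every layer in order and re-plans tensor memory
  // whenever the input shape differs from the previous execution.
  void prepare(const Shape &input) {
    if (input == last_input) return;  // shape unchanged: keep the old plan
    Shape current = input;
    planned_bytes = num_elements(input) * sizeof(float);
    for (MockLayer &layer : layers) {
      layer.configure(current);
      current = layer.output_shape;
      planned_bytes += num_elements(current) * sizeof(float);
    }
    last_input = input;
    ++reconfigure_count;
  }
};
```

If the input shape really changes at every execution, `prepare` does the full sequential pass each time, which is the cost being accepted here.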
@wateret
Yes, just like I mentioned in Constraints section. This optimization is not always possible.
If inputs/outputs are dynamic tensors, those tensors' padding can change at every execution. In other words, some tensors may have padding in the second execution even if they had none in the first. We also have to take this into account to support reducing copies between models.
@ragmani I see, that means we have another constraint: dynamic tensors are not supported. Or did I misunderstand you?
@wateret
That's not what I mean. If a tensor does not have padding, there is no problem. So I think you can apply the copy reduction to dynamic tensors of the cpu backend.