One: [onert] Discussion : Introduce option to select kernel type in cpu backend

Created on 2 Nov 2020 · 19 Comments · Source: Samsung/ONE

Issue

In most cases, onert uses the single fastest kernel for each operation in the cpu backend. For example, Conv2D (FP32) uses Eigen, and the FullyConnected operation with quantized weights uses the ruy library.

This policy has worked well until now, but I found a counter-example. When I tried the ruy library for the FullyConnected (FP32) layer, a benchmark showed that ruy is faster than the current kernel only in some cases. https://github.com/Samsung/ONE/issues/4482#issuecomment-704798829

Some models become 3x faster, while others become 3x slower.

Internal issue for profile result : https://github.sec.samsung.net/STAR/nnfw/issues/11818

I think it is better to support multiple kernels for one operation in onert and use one of them depending on each model.

Suggestion

Introduce an OP_KERNEL_MAP environment variable to select the kernel type
- This variable is for testing only, and its format is the same as OP_BACKEND_TYPE
- ex) OP_KERNEL_MAP="2=ruy;5=neon"
- Operation 2 uses the ruy library and operation 5 uses the neon library
onert uses the selected kernel type if it is given by OP_KERNEL_MAP
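
To make the proposed format concrete, here is a minimal Python sketch of how a value such as "2=ruy;5=neon" could be parsed. It is only an illustration: the function name `parse_op_kernel_map` is hypothetical, and onert itself is written in C++, so this is not the runtime's actual parsing code.

```python
from typing import Dict


def parse_op_kernel_map(value: str) -> Dict[int, str]:
    """Parse 'op_index=kernel' pairs separated by ';' into a dict.

    Hypothetical helper for illustration; not part of onert.
    """
    mapping: Dict[int, str] = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty value or a trailing ';'
        op_index, sep, kernel = entry.partition("=")
        if not sep or not kernel.strip():
            raise ValueError(f"malformed entry: {entry!r}")
        mapping[int(op_index)] = kernel.strip()
    return mapping


kernel_map = parse_op_kernel_map("2=ruy;5=neon")
print(kernel_map)  # {2: 'ruy', 5: 'neon'}
```

Operations that do not appear in the map would simply keep the backend's default kernel.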

Any suggestions are welcome!

/cc @Samsung/nnfw

typdiscussion

periannath

All 19 comments

Since there can be lots of ops, we could think of another way, like ruy=2,3,10,11;neon=5,6,7
I don't know which would be better...

seanshpark on 2 Nov 2020

Just writing down what comes to mind.

> Since there can be lots of ops, we could think of another way, like ruy=2,3,10,11;neon=5,6,7
> I don't know which would be better...

I have a similar idea. We might think of it as extending the way we use it now. If acl_cl alone means mapping acl_cl to all operations, then acl_cl=2,3,4,11 or acl_cl=2-4,11 could map it to only part of them in the same manner. Anyway, it's just one opinion. :)
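
A rough Python sketch of the kernel-first notation, including the range form such as 2-4, might look like the following. The function name `parse_kernel_to_ops` is hypothetical and the range syntax is only the idea floated in this comment, not an existing onert feature.

```python
from typing import Dict


def parse_kernel_to_ops(value: str) -> Dict[int, str]:
    """Expand 'kernel=op,op,first-last;...' into an op-index -> kernel dict.

    Hypothetical helper for illustration; not part of onert.
    """
    op_to_kernel: Dict[int, str] = {}
    for group in value.split(";"):
        group = group.strip()
        if not group:
            continue
        kernel, sep, ops = group.partition("=")
        if not sep:
            raise ValueError(f"malformed group: {group!r}")
        for item in ops.split(","):
            first, dash, last = item.strip().partition("-")
            low = int(first)
            high = int(last) if dash else low  # a single index is a range of one
            for op_index in range(low, high + 1):
                op_to_kernel[op_index] = kernel.strip()
    return op_to_kernel


print(parse_kernel_to_ops("ruy=2-4,11;neon=5,6,7"))
```

Both notations end up as the same per-operation table, so the choice between them is mostly about readability and consistency with the other option variables.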

> Introduce an OP_KERNEL_MAP environment variable to select the kernel type
>
> This variable is for testing only, and its format is the same as OP_BACKEND_TYPE

Do we need to limit this to testing only? Isn't the purpose of the test eventually to find the best combination of kernels for each model? Once the optimal combination is found, it would be good to record it in the manifest file of the NN package so that the runtime could use it as a hint. (Of course, if this is far from the original purpose of this issue, it could be discussed separately.)

> ex) OP_KERNEL_MAP="2=ruy;5=neon"
>
> Operation 2 uses the ruy library and operation 5 uses the neon library

Trivial, but we also have to think about the meaning of ;. Although it may not have been intended, in usage so far ; has identified fall-back targets. Would it mean the same fall-back here? Wouldn't it be better to use different symbols for different purposes? For example, : or so.

lemmaa on 2 Nov 2020

We also need to consider control flow.

FYI,
In the following code, mul's input shape changes inside the while loop:
the input shape for the first call of mul is small, but the one for the 100th call becomes huge.
So we should also consider a format like "n,m=ruy", where n is the operation index and m means the m-th call of operation n.

x = some tensor of shape [2, 2]
WHILE 100 times (input of WHILE is x)
  y = concat (x, x)
  z = mul(y, y)
  (output of WHILE op is z, which will be used as input x in next round of WHILE op)
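
The growth can be traced with a few lines of Python. This sketch assumes concat joins along axis 0 and mul is elementwise; the pseudocode above does not state the axis, so that part is an assumption, and `shapes_in_while` is a hypothetical helper name.

```python
def shapes_in_while(initial_shape, iterations):
    """Yield the input shape that mul sees on each iteration of the loop."""
    rows, cols = initial_shape
    for _ in range(iterations):
        rows *= 2           # y = concat(x, x): assumed to double axis 0
        yield (rows, cols)  # z = mul(y, y) is elementwise, so z keeps y's shape


shapes = list(shapes_in_while((2, 2), 10))
print(shapes[0])   # (4, 2)
print(shapes[-1])  # (2048, 2)
```

Under this assumption the row count doubles on every iteration, so a kernel that is the best choice for the first call is not necessarily the best choice for later calls.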

hyunsik-yoon on 2 Nov 2020

@seanshpark @lemmaa Thanks! ruy=2,3,10,11;neon=5,6,7 looks better. Some models have a lot of operations, and this representation seems more suitable to support them.

periannath on 3 Nov 2020

@lemmaa

> Do we need to limit this to testing only? Isn't the purpose of the test eventually to find the best combination of kernels for each model? Once the optimal combination is found, it would be good to record it in the manifest file of the NN package so that the runtime could use it as a hint. (Of course, if this is far from the original purpose of this issue, it could be discussed separately.)

I called this variable testing-only because it cannot be used in applications. We need another way, such as a manifest entry in the nnpackage or an nnfw API, to support kernel selection if we want to use it in an application.

> Trivial, but we also have to think about the meaning of ;. Although it may not have been intended, in usage so far ; has identified fall-back targets. Would it mean the same fall-back here? Wouldn't it be better to use different symbols for different purposes? For example, : or so.

I didn't think about fall-back targets. This variable just assumes that the selected kernel is available. ; is used as the separator between operations. : and ; look the same to me.

periannath on 3 Nov 2020

@hyunsik-yoon Wow... That's something I never thought of.
Defining the environment variable as "n,m=ruy" can handle the WHILE operation, but it can't handle the IF operation. Considering control flow looks way more complicated than expected.

periannath on 3 Nov 2020

👀1

@periannath , agreed, it would be nice to start simply for experimental purposes. It's a good idea to leave notes on the assumptions, to avoid unnecessary confusion and to help future improvement. :)

> I called this variable testing-only because it cannot be used in applications. We need another way, such as a manifest entry in the nnpackage or an nnfw API, to support kernel selection if we want to use it in an application.

I think a little differently. Although this starts out for experimental purposes, it can be applied to actual applications by adding a simple spec to the NN package. Of course, it's not perfect. It would be enough to add the many configurations now passed as environment variables on the command line to the manifest of the NN package as a single line, so that the runtime can interpret them. (Of course, this is separate from this issue. It's just a shared thought, so don't mind.)

> I didn't think about fall-back targets. This variable just assumes that the selected kernel is available. ; is used as the separator between operations. : and ; look the same to me.

Yes. It's a trivial matter right now. I only hope that our notation will have a unified meaning in the future. :)

lemmaa on 3 Nov 2020

@periannath @lemmaa

Looks good for testing purposes. But I also see @lemmaa 's comment:

> I think a little differently. Although this starts out for experimental purposes, it can be applied to actual applications by adding a simple spec to the NN package.

And I would like us to think about these.

Create a ruy backend - then we can reuse OP_BACKEND_{OPNAME} (This is the most proper way according to my initial design)
Can we make cpu backend KernelGenerator smart enough to choose the better kernel considering the op's params and operands?

IMHO, as we already have two places to manipulate kernel selection, I would rather not introduce another kernel-selection mechanism, for the sake of simplicity.

And one more thing I came up with:

cpu backend is not part of onert; it is just the same as any other backend, so it may not be present in some environments. So I'm not sure if it can be included in the NN Package spec.
- Maybe we could have it as hints, so the runtime can ignore it if unavailable
I'm not sure what this exactly means - OP_KERNEL_MAP="2=ruy;5=neon", as ruy is a kernel and neon is a backend

wateret on 3 Nov 2020

About the syntax "n,m=ruy": I don't mind any syntax, but I wish all of them were consistent.

; is used as the separator for all option variables, so if it is changed in one option we should change all of them
OP_KERNEL_MAP and OP_BACKEND_MAP should have an identical syntax
As you all mentioned, it does not work correctly for graphs that have subgraphs (say we have 2=cpu, then every operation with ID 2 in all subgraphs is going to be assigned to cpu)

wateret on 3 Nov 2020

@wateret

> Create a ruy backend - then we can reuse OP_BACKEND_{OPNAME} (This is the most proper way according to my initial design)

I thought about that design. AFAIK, much code, such as KernelGenerator, StaticTensorManager, and TensorBuilder, would have to be duplicated from the cpu backend to create ruy as a new backend. And it's not only ruy; other libraries such as XNNPACK could be used too. Every time a backend is added, the total amount of code and the effort to manage it will increase.

> Can we make cpu backend KernelGenerator smart enough to choose the better kernel considering the op's params and operands?

We could create a heuristic to select the proper kernel considering the op's params and operands. However, that heuristic would be derived by profiling kernels on some specific devices, so it may not work on other devices. If we have profile data, we can choose the kernels that produce the fastest inference time on the target device.

> cpu backend is not part of onert; it is just the same as any other backend, so it may not be present in some environments. So I'm not sure if it can be included in the NN Package spec.
>
> Maybe we could have it as hints, so the runtime can ignore it if unavailable

I didn't think about environments where the cpu backend is not available. As you said, treating it as a hint would be better.

> I'm not sure what this exactly means - OP_KERNEL_MAP="2=ruy;5=neon", as ruy is a kernel and neon is a backend

I use neon to mean the neon kernels of the cpu backend. Some kernels use neon instructions and others use plain C++ code only. I just want to distinguish the two kernel types in the current cpu backend.

periannath on 4 Nov 2020

I'd like to share a short note about yesterday's offline discussion with @periannath:

// x and y are inputs, and model output is 'mul(x, y)'
nnfw_prepare()
nnfw_set_input_tensorinfo( x, to some shape)
nnfw_set_input_tensorinfo( y, to some shape)
nnfw_run()

In some cases x and y are small; in other cases they are large. In such cases, the exact shapes are known only at runtime, so kernels can be selected only at runtime.
(We need some additional kernel-mapping table that can be used at runtime.)
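
One possible shape for such a runtime table, sketched in Python under loud assumptions: the table layout, the kernel names "cpp" and "ruy" as table values, and the 4096-element threshold are all made up for illustration and are not measured or taken from onert.

```python
from math import prod  # requires Python 3.8+

# Hypothetical table: operation index -> (max element count, kernel) pairs,
# sorted by ascending limit. The threshold is a placeholder, not profile data.
RUNTIME_KERNEL_TABLE = {
    0: [(4096, "cpp"), (float("inf"), "ruy")],
}


def select_kernel(op_index, input_shape, default="cpp"):
    """Pick a kernel once the real input shape is known at run time."""
    elements = prod(input_shape)
    for max_elements, kernel in RUNTIME_KERNEL_TABLE.get(op_index, []):
        if elements <= max_elements:
            return kernel
    return default  # no table entry for this operation


print(select_kernel(0, (8, 8)))        # cpp
print(select_kernel(0, (1024, 1024)))  # ruy
```

The lookup would have to run after nnfw_set_input_tensorinfo has fixed the shapes, which is why a map resolved only at compile time is not enough for this case.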

hyunsik-yoon on 4 Nov 2020

👀1

@periannath , is your thought like this?

* `OP_BACKEND_TYPE` lets the scheduler choose from the given items; its value contains the candidate backends to choose from
* `OP_KERNEL_MAP` specifies which `cpu` backend kernel to use for a specific operator; if the scheduler does not choose `cpu`, the value is ignored

If OP_KERNEL_MAP is not just for the cpu backend, then we can think of a specific kernel name for each operator

* Do we have an implementation for this in the current scheduler, or are you going to introduce such logic?

seanshpark on 4 Nov 2020

@periannath @lemmaa

What concerns me the most is whether we can meet these conditions:

No duplications for a same feature (many places to control kernel selection, described https://github.com/Samsung/ONE/issues/4869#issuecomment-721130880)
Simple and flawless logic (described below)

For details:

cpu backend is just a backend like the others. onert core does not handle it specially. (There is some code that requires cpu backend, but it is a temporary workaround)
- It is a separate module, an independent library from onert core
- We could change how we treat cpu backend if necessary, though
cpu backend implementation can change
- cpu backend could change its implementation. e.g.) it uses a ruy kernel now, but that could be removed later
- For the same codebase, the compilation result could be different. e.g.) depending on a compiler flag like #ifdef USE_RUY_GEMV
- This is why I would have it as just hints (if we have to introduce it)
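
Treating the map as hints could be as small as the following Python sketch. The function `resolve_kernel` and the idea of an "available kernels" set reported by the backend build are hypothetical, introduced only to illustrate the fallback behavior.

```python
def resolve_kernel(op_index, hints, available, default):
    """Return the hinted kernel if this build provides it, else the default.

    hints:     op index -> requested kernel name (e.g. parsed from OP_KERNEL_MAP)
    available: kernel names this backend build actually contains (assumed input)
    """
    hinted = hints.get(op_index)
    return hinted if hinted in available else default


hints = {2: "ruy", 5: "neon"}
# A build compiled without the ruy kernel silently ignores the hint for op 2.
print(resolve_kernel(2, hints, available={"neon"}, default="neon"))  # neon
print(resolve_kernel(5, hints, available={"neon"}, default="cpp"))   # neon
```

With this behavior, a hint written for one build never causes a failure on another build; it just has no effect there.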

wateret on 4 Nov 2020

@seanshpark

> @periannath , is your thought like this?
>
> * `OP_BACKEND_TYPE` lets the scheduler choose from the given items; its value contains the candidate backends to choose from
>
> * `OP_KERNEL_MAP` specifies which `cpu` backend kernel to use for a specific operator; if the scheduler does not choose `cpu`, the value is ignored

Yes, it is.

> * Do we have an implementation for this in the current scheduler, or are you going to introduce such logic?

We don't have any logic to choose the kernel type in the current scheduler. When this option is introduced, I will use it in KernelGenerator to choose the kernel of FullyConnected layers in some models.

periannath on 4 Nov 2020

@periannath

> Create a ruy backend - then we can reuse OP_BACKEND_{OPNAME} (This is the most proper way according to my initial design)
>
> I thought about that design. AFAIK, much code, such as KernelGenerator, StaticTensorManager, and TensorBuilder, would have to be duplicated from the cpu backend to create ruy as a new backend. And it's not only ruy; other libraries such as XNNPACK could be used too. Every time a backend is added, the total amount of code and the effort to manage it will increase.

Agree. With the current backend API, there will be a lot of work. But what if we simplified the backend API? (I have been discussing this with @YongseopKim; I will raise an issue soon.)

> Can we make cpu backend KernelGenerator smart enough to choose the better kernel considering the op's params and operands?

Just to clarify this, I meant KernelGenerator and ***Layer (for dynamic tensor cases).

wateret on 4 Nov 2020

> * Do we have an implementation for this in the current scheduler, or are you going to introduce such logic?
>
> We don't have any logic to choose the kernel type in the current scheduler. When this option is introduced, I will use it in KernelGenerator to choose the kernel of FullyConnected layers in some models.

As I see it now, introducing a kernel-choosing API would also complicate automatic scheduler implementations.

wateret on 4 Nov 2020

😕1

@wateret

> Agree. With the current backend API, there will be a lot of work. But what if we simplified the backend API? (I have been discussing this with @YongseopKim; I will raise an issue soon.)

If the backend implementation is simplified, it would be better to use the current backend API. Then we can reuse HEScheduler for kernel scheduling.

Added) I have one question. Can we change the backend assigned to an operation during runtime? We may need to change the kernel assigned to an operation at runtime, depending on runtime status.

periannath on 5 Nov 2020

> Can we change the backend assigned to an operation during runtime? We may need to change the kernel assigned to an operation at runtime, depending on runtime status.

No. 😢 We would have to implement the whole infra for that.

There is an old thread in our old repo - https://github.sec.samsung.net/STAR/nnfw/issues/1197#issuecomment-90097

1. CPU/GPU static scheduling + static execution (Goal by end of June)
2. CPU/GPU adaptive scheduling + static execution
3. CPU/GPU static/adaptive scheduling + CPU/GPU co-execution
4. CPU/GPU adaptive scheduling + CPU/GPU adaptive execution

What you are describing is 2 and 4, and those have never been discussed since then.

wateret on 5 Nov 2020

👍2

Let's summarize this discussion here.

Format of `OP_KERNEL_MAP`

1. Op = Kernel : 2=ruy;5=neon
2. Kernel = Op : ruy=2,3,10,11;neon=5,6,7

I think option 1 would be better for consistency https://github.com/Samsung/ONE/issues/4869#issuecomment-721136372

How to implement new kernel?

As kernel inside cpu backend
- Easier implementation of runtime kernel selection
  - Kernel selection inside KernelGenerator or *Layer
- Duplications for a same feature
As new backend
- Reuse of current backend API
- No runtime backend assignment (Need to implement)

We're going to introduce some kernels as backends. (RUY backend #4863 and XNNPACK backend #4968)
Maybe it would be better to support infra for runtime backend switching.

periannath on 10 Nov 2020

👍3
