One: [luci-interpreter] Operators to enable

Created on 20 May 2020 · 15 Comments · Source: Samsung/ONE

This issue tracks the status of operators supported in luci-interpreter. I've listed the operators used in three popular models: MobileNetV2 (M), InceptionV3 (I), and ResNet50 (R). ResNet is a top priority for post-training quantization (#696).

Most of the operators in the list are already supported in the draft of luci-interpreter (#205), but I found that Mean and Pad are not implemented yet. For those who want to contribute to luci-interpreter, it would be most helpful to support these two first.

| Operators | Status (float) | Status (u8) | Model |
| ------------- | ------------- | ------------- | ------------- |
| Add | O @s-barannikov | O @s-barannikov | R, M |
| AvgPool2D | O @s-barannikov | O @s-barannikov | M, I |
| Concat | O @s-barannikov | O @s-barannikov | I |
| Conv2D | O @s-barannikov | O @s-barannikov | R, M, I |
| DepthwiseConv2D | O @s-barannikov | O @s-barannikov | M |
| FullyConnected | O @s-barannikov | | I |
| MaxPool2D | O @s-barannikov | O @s-barannikov | R, I |
| Mean | @karthik-pen (#1669) | @karthik-pen (#1669) | R |
| Pad | O @s-barannikov (#1509) | O @s-barannikov (#1509) | R |
| Reshape | O @s-barannikov | O @s-barannikov | M |
| Softmax | O @s-barannikov | | M, I |
| ArgMax | @struss (#1691) | @struss (#1691) | R |

Update 2020/05/27: ArgMax operator was added to the table.

jinevening

👍3 ❤1

All 15 comments

@jinevening Can I take up the implementation of Mean?

karthik-pen on 22 May 2020

@karthik-pen Sure. Please add me, @binarman , and @s-barannikov as reviewers.

You can learn how to add kernels from @s-barannikov's PRs titled "[luci-interpreter] Add ~~ kernel" (link). Since we're using tflite's kernels, please see the tflite kernel implementation for Mean.
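
For readers unfamiliar with the operator: in ResNet50, Mean typically plays the role of global average pooling, averaging each channel over the spatial axes. A minimal pure-Python sketch of that reduction follows. It is not the tflite or luci-interpreter kernel; the function name `mean_hw` and the nested-list NHWC layout are assumptions made only for illustration.

```python
def mean_hw(tensor):
    """Average an NHWC tensor (nested lists) over H and W.

    Returns a list of shape [N][C]. Hypothetical helper, not a real kernel.
    """
    result = []
    for image in tensor:  # iterate over the batch dimension
        height, width = len(image), len(image[0])
        channels = len(image[0][0])
        sums = [0.0] * channels
        for row in image:
            for pixel in row:
                for c, value in enumerate(pixel):
                    sums[c] += value
        result.append([s / (height * width) for s in sums])
    return result


# One 2x2 image with two channels.
sample = [[[[1.0, 10.0], [3.0, 20.0]],
           [[5.0, 30.0], [7.0, 40.0]]]]
print(mean_hw(sample))  # [[4.0, 25.0]]
```

A real kernel would also handle arbitrary reduction axes, `keep_dims`, and the quantized (u8) path, none of which this sketch covers.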

jinevening on 22 May 2020

👍1

I will implement Pad.

s-barannikov on 22 May 2020

👍1

@karthik-pen, @s-barannikov, could you write your GitHub ID in the appropriate cell of the table and mark it as in progress? :)

lemmaa on 25 May 2020

Execution time of each model in luci-interpreter
(measured on an Ubuntu 18.04 x86 desktop equipped with an i7-9700 at 3.0 GHz)

Model | Debug mode | Release mode
-- | -- | --
InceptionV3 | 114 sec | ~~5.2~~ 0.8 sec
MobileNetV2 | 6.9 sec | ~~0.4~~ 0.1 sec
ResNet50 | 24 sec | 0.7 sec

~~(ResNet50 will be updated later)~~ ResNet50 result was added 2020/06/03

luci-interpreter will run ~1,000 data samples to profile the moving average of min/max values for post-training quantization (#696). For InceptionV3, the execution time for 1,000 samples is ~5,200 sec (~1 hour 27 min). Even considering the time to record min/max values, I expect the profiling to finish within several hours.
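
As a rough illustration of what profiling a moving average of min/max values can look like, here is a minimal sketch. It assumes an exponential moving average with a hypothetical `momentum` parameter; the class name `MinMaxProfiler` is invented for this example, and the actual recording scheme planned in #696 may differ.

```python
class MinMaxProfiler:
    """Tracks a moving average of per-sample min and max activation values.

    Assumption: exponential moving average; not the actual #696 design.
    """

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.min = None
        self.max = None

    def update(self, values):
        """Fold the min/max of one sample's activations into the averages."""
        sample_min, sample_max = min(values), max(values)
        if self.min is None:
            # The first sample initializes the running statistics.
            self.min, self.max = sample_min, sample_max
        else:
            m = self.momentum
            self.min = m * self.min + (1 - m) * sample_min
            self.max = m * self.max + (1 - m) * sample_max


profiler = MinMaxProfiler(momentum=0.5)
profiler.update([0.0, 2.0])
profiler.update([-2.0, 4.0])
print(profiler.min, profiler.max)  # -1.0 3.0
```

Averaging the per-sample extremes, rather than taking the global min/max, keeps a single outlier sample from stretching the quantization range.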

Update (2020/05/26): Thanks to @s-barannikov 's work (#1438), the execution time for InceptionV3 was reduced from 5.2s to 0.8s. Now, the profiling may take just ~15 minutes 👍 .
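
The estimates above are simple multiplications of the per-sample time by the number of samples. A small sketch of that arithmetic, using the release-mode numbers from the table (the helper names are hypothetical, and the cost of recording min/max values is ignored):

```python
def profiling_seconds(seconds_per_sample, num_samples=1000):
    """Total inference time, ignoring the cost of recording min/max values."""
    return seconds_per_sample * num_samples


def as_hours_minutes(seconds):
    """Convert seconds to (hours, minutes), rounded to the nearest minute."""
    total_minutes = round(seconds / 60)
    return divmod(total_minutes, 60)


before = profiling_seconds(5.2)  # InceptionV3, release mode, before #1438
after = profiling_seconds(0.8)   # InceptionV3, release mode, after #1438
print(as_hours_minutes(before))  # (1, 27)
print(as_hours_minutes(after))   # (0, 13)
```

The ~15 minute figure leaves a little headroom over the raw 13 minutes for the min/max bookkeeping.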

jinevening on 25 May 2020

👍3 🎉1 😄1

From what I've seen, most of the time is spent in Conv2D. At the very least, that operation needs to use the optimized kernel instead of the reference one, but doing so requires some additional steps: allocating an Im2Col tensor and possibly creating a CpuContext to enable parallelism.
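
To illustrate why an Im2Col tensor has to be allocated: im2col-based kernels unroll every input patch into a row of a scratch buffer, so that the convolution turns into a matrix product. The sketch below shows the idea in pure Python for a single channel with stride 1 and no padding. It is an illustration only, not the tflite optimized kernel, and the function names are hypothetical.

```python
def im2col(image, kh, kw):
    """Unroll each kh x kw patch of a 2-D image into one row of a matrix."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_im2col(image, kernel):
    """2-D convolution (cross-correlation, as in DL frameworks) via im2col."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)  # the extra scratch buffer
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Matrix-vector product: one dot product per output pixel.
    flat = [sum(p * k for p, k in zip(patch, flat_kernel))
            for patch in patches]
    return [flat[r * out_w:(r + 1) * out_w] for r in range(out_h)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(conv2d_im2col(image, kernel))  # [[6, 8], [12, 14]]
```

The scratch matrix holds `out_h * out_w` rows of `kh * kw` values each, which is the extra memory cost traded for a faster matrix product.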

s-barannikov on 25 May 2020

@jinevening BTW Is it the time spent on "interpret()" call only?

s-barannikov on 25 May 2020

> BTW Is it the time spent on "interpret()" call only?

@s-barannikov No, I measured the total time spent running luci_eval_tester. This includes the time to read the input data from the file, run the interpreter, and write the output to the file.

$ time build/release/compiler/luci-value-test/tester/luci_eval_tester build/release/compiler/luci-value-test/inception_v3.circle 1 build/release/compiler/luci-value-test/inception_v3.circle.input build/release/compiler/luci-value-test/inception_v3.circle.output

real    0m0.773s
user    0m0.633s
sys     0m0.140s
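
To separate the interpreter time from file I/O, each stage could be timed on its own. The sketch below shows the general pattern with `time.perf_counter`; the helper `time_stages` and the stage callables are hypothetical stand-ins, not code from luci_eval_tester.

```python
import time


def time_stages(stages):
    """Run (name, callable) pairs in order; return elapsed seconds per stage."""
    timings = {}
    for name, func in stages:
        start = time.perf_counter()
        func()
        timings[name] = time.perf_counter() - start
    return timings


# Placeholder workloads standing in for the real read/interpret/write steps.
timings = time_stages([
    ("read_input", lambda: None),
    ("interpret", lambda: sum(range(100000))),
    ("write_output", lambda: None),
])
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```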

jinevening on 26 May 2020

> real 0m0.773s

Looks better!

@jinevening did you try running the network in multithreaded mode?

@s-barannikov how can parallel execution be enabled?

binarman on 26 May 2020

> @jinevening did you try running the network in multithreaded mode?

No. I'm not sure whether the Conv2D kernel runs with multiple threads.

jinevening on 26 May 2020

> @s-barannikov how can parallel execution be enabled?

There is no way to control it yet. The quantized op is always parallelized, and the floating-point kernel is single-threaded.

s-barannikov on 26 May 2020

@karthik-pen Can you tell me the progress of Mean?

jinevening on 29 May 2020

> @karthik-pen Can you tell me the progress of Mean?

I will post the draft in some time. Sorry for the delay. Here it is: #1665

Edit: Created new PR #1669 for this.

karthik-pen on 29 May 2020

All operators in ResNet50, InceptionV3, MobileNetV2 are now supported 👏 .

Look at the above table to see the execution time of each model.

Thanks! @s-barannikov @karthik-pen @struss

jinevening on 3 Jun 2020

🎉5

I'm closing this issue because all of the target operators are supported and no further issues have been raised.

jinevening on 10 Jun 2020

🎉4
