One: [luci-interpreter] Operators to enable

Created on 20 May 2020 · 15 Comments · Source: Samsung/ONE

This issue tracks the status of operators supported in luci-interpreter. I've listed the operators used in three popular models: MobileNetV2 (M), InceptionV3 (I), and ResNet50 (R). ResNet is a top priority for post-training quantization (#696).

Most of the operators in the list are already supported in the draft of luci-interpreter (#205), but I found that Mean and Pad were not implemented yet. For those who want to contribute to luci-interpreter, it would be more helpful to support them first.

| Operators | Status (float) | Status (u8) | Model |
| ------------- | ------------- | ------------- | ------------- |
| Add | O @s-barannikov | O @s-barannikov | R, M |
| AvgPool2D | O @s-barannikov | O @s-barannikov | M, I |
| Concat | O @s-barannikov | O @s-barannikov | I |
| Conv2D | O @s-barannikov | O @s-barannikov | R, M, I |
| DepthwiseConv2D | O @s-barannikov | O @s-barannikov | M |
| FullyConnected | O @s-barannikov | | I |
| MaxPool2D | O @s-barannikov | O @s-barannikov | R, I |
| Mean | @karthik-pen (#1669) | @karthik-pen (#1669) | R |
| Pad | O @s-barannikov (#1509) | O @s-barannikov (#1509) | R |
| Reshape | O @s-barannikov | O @s-barannikov | M |
| Softmax | O @s-barannikov | | M, I |
| ArgMax | @struss (#1691) | @struss (#1691) | R |

Update 2020/05/27: ArgMax operator was added to the table.
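For readers who want to query the table per model, here is a minimal sketch. The dictionary is transcribed from the table in this issue; the function name `uncovered_ops` is hypothetical, and treating any non-empty status cell as "covered" (done or assigned) is an assumption of this sketch, not something the issue states.

```python
# Operator table transcribed from this issue:
# name -> (float cell filled, u8 cell filled, models that use the operator).
# Assumption: any non-empty status cell counts as "covered" (done or assigned).
OPERATORS = {
    "Add": (True, True, "RM"),
    "AvgPool2D": (True, True, "MI"),
    "Concat": (True, True, "I"),
    "Conv2D": (True, True, "RMI"),
    "DepthwiseConv2D": (True, True, "M"),
    "FullyConnected": (True, False, "I"),
    "MaxPool2D": (True, True, "RI"),
    "Mean": (True, True, "R"),
    "Pad": (True, True, "R"),
    "Reshape": (True, True, "M"),
    "Softmax": (True, False, "MI"),
    "ArgMax": (True, True, "R"),
}


def uncovered_ops(model, dtype):
    """Return the operators used by `model` whose `dtype` status cell is empty."""
    column = {"float": 0, "u8": 1}[dtype]
    return sorted(
        name
        for name, row in OPERATORS.items()
        if model in row[2] and not row[column]
    )


for model in "RMI":
    print(model, "float:", uncovered_ops(model, "float"), "u8:", uncovered_ops(model, "u8"))
```

Under that assumption, the only empty cells are in the u8 column, for FullyConnected and Softmax.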

All 15 comments

@jinevening Can I take up the implementation of Mean?

@karthik-pen Sure. Please add me, @binarman , and @s-barannikov as reviewers.

You can learn how to add kernels from @s-barannikov's PRs titled "[luci-interpreter] Add ~~ kernel". Since we're using tflite's kernels, please see the tflite kernel implementation for Mean.

I will implement Pad.

@karthik-pen, @s-barannikov, how about writing your GitHub ID in the appropriate cell of the table to mark it as in progress? :)

Execution time of each model in luci-interpreter
(measured on an Ubuntu 18.04 x86 desktop with an i7-9700 3.0 GHz CPU)

| Model | Debug mode | Release mode |
| -- | -- | -- |
| InceptionV3 | 114 sec | 0.8 sec (previously 5.2 sec) |
| MobileNetV2 | 6.9 sec | 0.1 sec (previously 0.4 sec) |
| ResNet50 | 24 sec | 0.7 sec |

Update (2020/06/03): The ResNet50 result was added.

luci-interpreter will run ~1,000 data samples to profile moving avg of min/max values for post-training quantization #696. For InceptionV3, the execution time for 1,000 data is ~5200 sec (~1 hour 27 min). Even considering the time to record min/max values, I expect that the profiling would finish within several hours.
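As a rough illustration of what "profiling a moving average of min/max values" can look like, here is a minimal sketch. The issue only says that a moving average is used; the exponential form, the `momentum` value of 0.9, and the function name `update_moving_minmax` are assumptions made for this example and are not taken from luci-interpreter.

```python
def update_moving_minmax(state, batch, momentum=0.9):
    """Fold one sample's activation values into a running (min, max) estimate.

    `state` is None before the first sample, otherwise a (min, max) tuple.
    The exponential update and the default momentum are illustrative assumptions.
    """
    batch_min, batch_max = min(batch), max(batch)
    if state is None:
        # The first sample initializes the running estimate directly.
        return (batch_min, batch_max)
    running_min, running_max = state
    return (
        momentum * running_min + (1.0 - momentum) * batch_min,
        momentum * running_max + (1.0 - momentum) * batch_max,
    )


# Hypothetical activation values for three data samples.
state = None
for sample in ([0.1, 0.5, -0.2], [0.3, 0.9, -0.4], [0.0, 0.7, -0.1]):
    state = update_moving_minmax(state, sample)
print(state)
```

Compared with taking the absolute min/max over all samples, an averaged range is less sensitive to a single outlier sample, which is the usual reason for choosing it when deriving quantization parameters.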

Update (2020/05/26): Thanks to @s-barannikov's work (#1438), the execution time for InceptionV3 was reduced from 5.2 s to 0.8 s. Profiling may now take just ~15 minutes 👍.
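The estimates in this comment are simple multiplication of the per-sample time by the number of samples. A small sketch of that arithmetic, using the InceptionV3 timings quoted in this thread (the helper names are hypothetical):

```python
def profiling_seconds(seconds_per_sample, num_samples=1000):
    """Inference-only time for a profiling run, ignoring min/max recording overhead."""
    return seconds_per_sample * num_samples


def format_duration(seconds):
    """Format a duration in seconds as hours, minutes, and seconds."""
    minutes, secs = divmod(round(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Per-sample InceptionV3 times from this thread, before and after #1438.
for label, per_sample in (("before #1438", 5.2), ("after #1438", 0.8)):
    print(label, format_duration(profiling_seconds(per_sample)))
```

The first case gives roughly 1 hour 27 minutes and the second a little over 13 minutes of pure inference, which is in line with the ~15 minutes quoted once some time for recording min/max values is allowed for.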

From what I've seen, most of the time is spent in Conv2D. At least that operation needs to use an optimized kernel (instead of the reference one), but that requires some additional steps: allocating an Im2Col tensor and possibly creating a CpuContext to enable parallelism.
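To show why an optimized Conv2D needs an extra Im2Col tensor, here is a minimal pure-Python sketch of the general im2col technique. It is not tflite's implementation: it handles only a single channel, stride 1, and no padding, and all function names are hypothetical. The idea is that every input patch is copied into one row of a scratch matrix so that the convolution becomes a matrix product, which optimized kernels can hand to a tuned, possibly multithreaded, GEMM routine.

```python
def conv2d_reference(image, kernel):
    """Direct 'valid' 2D convolution (cross-correlation), stride 1, single channel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def im2col(image, kh, kw):
    """Unroll every kh x kw patch into one row of a scratch matrix."""
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [image[y + i][x + j] for i in range(kh) for j in range(kw)]
        for y in range(out_h)
        for x in range(out_w)
    ]


def conv2d_im2col(image, kernel):
    """Same convolution expressed as (patch matrix) x (flattened kernel)."""
    kh, kw = len(kernel), len(kernel[0])
    out_w = len(image[0]) - kw + 1
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)  # the extra scratch tensor
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel)) for patch in patches]
    # Reshape the flat result back into out_h rows of out_w values.
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


image = [[1, 2, 3, 0], [4, 5, 6, 1], [7, 8, 9, 2]]
kernel = [[1, 0], [0, -1]]
assert conv2d_reference(image, kernel) == conv2d_im2col(image, kernel)
```

The trade-off is memory for speed: the patch matrix duplicates overlapping input values, which is why the scratch tensor has to be allocated up front.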

@jinevening BTW Is it the time spent on "interpret()" call only?

> BTW Is it the time spent on "interpret()" call only?

@s-barannikov No, I measured the total time spent running luci_eval_tester. This includes the time to read the input data from the file, run the interpreter, and write the output to the file.

$ time build/release/compiler/luci-value-test/tester/luci_eval_tester build/release/compiler/luci-value-test/inception_v3.circle 1 build/release/compiler/luci-value-test/inception_v3.circle.input build/release/compiler/luci-value-test/inception_v3.circle.output

real    0m0.773s
user    0m0.633s
sys     0m0.140s
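The `time` command reports the whole process, so file I/O and interpretation are lumped together. To separate them, each phase can be timed individually. The following is a generic sketch of that approach; the three callables are hypothetical stand-ins and do not correspond to actual luci_eval_tester or luci-interpreter functions.

```python
import time


def timed_phases(read_input, interpret, write_output):
    """Run the three phases once and return the seconds spent in each."""
    timings = {}

    start = time.perf_counter()
    data = read_input()
    timings["read"] = time.perf_counter() - start

    start = time.perf_counter()
    result = interpret(data)
    timings["interpret"] = time.perf_counter() - start

    start = time.perf_counter()
    write_output(result)
    timings["write"] = time.perf_counter() - start

    return timings


# Hypothetical stand-ins for reading input, running the model, and writing output.
sink = []
timings = timed_phases(
    read_input=lambda: list(range(100_000)),
    interpret=lambda data: sum(v * v for v in data),
    write_output=sink.append,
)
for phase, seconds in timings.items():
    print(f"{phase}: {seconds:.6f} s")
```

With per-phase numbers, it becomes clear whether the interpreter itself or the surrounding I/O dominates the wall-clock time.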

> real 0m0.773s

Looks better!

@jinevening Did you try running the network in multithreaded mode?

@s-barannikov how to enable parallel execution?

> @jinevening Did you try running the network in multithreaded mode?

No. I'm not sure whether the Conv2D kernel runs with multiple threads.

> @s-barannikov how to enable parallel execution?

There is no way to control it yet. Quantized ops are always parallelized, and floating-point kernels are single-threaded.

@karthik-pen Can you tell me the progress of Mean?

> @karthik-pen Can you tell me the progress of Mean?

I will post the draft in some time. Sorry for the delay. Here it is: #1665

Edit: Created new PR #1669 for this.

All operators in ResNet50, InceptionV3, and MobileNetV2 are now supported 👏.

Look at the above table to see the execution time of each model.

Thanks! @s-barannikov @karthik-pen @struss

I'm closing this issue because all of the target operators are supported and no further issues have been raised.
