One: [onert] Optimize PermuteLayer of controlflow

Created on 24 Sep 2020 · 5 Comments · Source: Samsung/ONE

Currently, PermuteLayer performs somewhat poorly for two reasons.

Problems

  1. PermuteLayer does not use multi-threading.
  2. To access the memory of an "arm_compute::ICLTensor", the runtime must call arm_compute::ICLTensor::map(). Our runtime always calls arm_compute::ICLTensor::map() with the blocking flag set to true.

How to resolve the above problems

This task optimizes PermuteLayer of controlflow by resolving the two problems above.

  • [x] 1. Make PermuteLayer use multi-threading via the thread pool of ruy::Context.
  • [x] 2. Make our runtime call clEnqueueWriteBuffer() or clEnqueueReadBuffer() instead of arm_compute::ICLTensor::map() when the output or input of PermuteLayer is a tensor of the acl_cl backend.
  • [x] 3. Cache the offsets of tensors in PermuteLayer.
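
As a rough illustration of the first item, the sketch below splits one contiguous copy into chunks and copies each chunk on its own worker. It is only a sketch under stated assumptions: `parallel_copy` is a hypothetical name, and plain `std::thread` stands in for the thread pool of ruy::Context, whose API is not shown in this issue.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper, not the onert implementation.
// Copies `size` bytes from `src` to `dst` by splitting the range into
// contiguous chunks and copying each chunk on a separate std::thread.
// std::thread is a stand-in for the ruy::Context thread pool.
void parallel_copy(const unsigned char *src, unsigned char *dst, std::size_t size,
                   unsigned num_threads)
{
  num_threads = std::max(1u, num_threads);
  // Round up so that num_threads chunks always cover the whole range.
  const std::size_t chunk = (size + num_threads - 1) / num_threads;

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t)
  {
    const std::size_t begin = t * chunk;
    if (begin >= size)
      break; // nothing left for this worker
    const std::size_t len = std::min(chunk, size - begin);
    workers.emplace_back([=] { std::memcpy(dst + begin, src + begin, len); });
  }
  for (auto &w : workers)
    w.join();
}
```

Spawning threads per call, as done here for brevity, has a cost that a reusable pool avoids, which is presumably one reason to reuse an existing pool instead.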

Draft PR : #4395

/cc @Samsung/nnfw_committers @periannath

areonert sprintask

Most helpful comment

Effects

  1. Make PermuteLayer use multi-threading via the thread pool of ruy::Context.

Depending on conditions such as the device or the model, multi-threading can either improve or worsen performance, so the number of threads must be chosen carefully.
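
The benchmark commands in this comment select the thread count through the `RUY_THREADS` environment variable. A minimal sketch of reading such a variable with a safe fallback is shown below; `thread_count_from_env` is a hypothetical helper and the upper bound of 1024 is an arbitrary assumption, so this says nothing about how onert itself parses the variable.

```cpp
#include <cstdlib>

// Hypothetical helper, not the onert implementation.
// Reads a positive thread count from the environment variable `name`.
// Returns `fallback` when the variable is unset, is not a plain decimal
// number, or is outside the range [1, 1024] (the bound is an assumption).
int thread_count_from_env(const char *name, int fallback)
{
  const char *value = std::getenv(name);
  if (value == nullptr)
    return fallback;

  char *end = nullptr;
  const long parsed = std::strtol(value, &end, 10);
  // Reject empty strings, trailing garbage, and out-of-range values.
  if (end == value || *end != '\0' || parsed < 1 || parsed > 1024)
    return fallback;
  return static_cast<int>(parsed);
}
```

For example, `thread_count_from_env("RUY_THREADS", 1)` would fall back to a single thread when the variable is not set.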

  2. Make our runtime call clEnqueueWriteBuffer() or clEnqueueReadBuffer() instead of arm_compute::ICLTensor::map() when the output or input of PermuteLayer is a tensor of the acl_cl backend.

This improves performance for models that create multiple PermuteLayers, such as models with many inputs.
It also improves performance when a PermuteLayer is located in the middle of a model.

  3. Cache the offsets of tensors in PermuteLayer.

This improves performance for models that have large inputs and outputs with padding.
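
To illustrate why caching offsets helps with padded tensors, the sketch below precomputes the byte offset of every row once and reuses the table on each copy. The `Layout` and `CachedRowCopy` types are hypothetical and cover only a simple 2-D row-padded case; the actual PermuteLayer handles more general shapes and layouts.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of a 2-D tensor whose rows may be padded.
struct Layout
{
  std::size_t rows;
  std::size_t row_bytes;    // payload bytes per row
  std::size_t stride_bytes; // distance between row starts (>= row_bytes)
};

// Hypothetical helper, not the onert implementation.
// Computes per-row source and destination offsets once in the constructor,
// so that run() only performs the memcpy calls.
// Assumes both layouts have the same `rows` and `row_bytes`.
class CachedRowCopy
{
public:
  CachedRowCopy(const Layout &src, const Layout &dst) : _row_bytes(src.row_bytes)
  {
    _src_offsets.reserve(src.rows);
    _dst_offsets.reserve(src.rows);
    for (std::size_t r = 0; r < src.rows; ++r)
    {
      _src_offsets.push_back(r * src.stride_bytes);
      _dst_offsets.push_back(r * dst.stride_bytes);
    }
  }

  // Copies the payload of every row, skipping padding on both sides.
  void run(const unsigned char *src, unsigned char *dst) const
  {
    for (std::size_t i = 0; i < _src_offsets.size(); ++i)
      std::memcpy(dst + _dst_offsets[i], src + _src_offsets[i], _row_bytes);
  }

private:
  std::size_t _row_bytes;
  std::vector<std::size_t> _src_offsets;
  std::vector<std::size_t> _dst_offsets;
};
```

The saving grows with the number of rows and with how often the same layer runs, since the offset arithmetic is paid once instead of on every execution.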

  • Performance of each model on odroid-xu4 with acl_cl backend

| model | before (d1886fab6a3bcbcef7a7cdd8a547c538b1574506) | after (#4395) | thread count | improvement rate: (before - after) / before * 100 |
|:--:|:--:|:--:|:--:|:--:|
| d1 | 21.417 ms | 19.612 ms | 1 | 8.4 % |
| d2 | 7.333 ms | 6.636 ms | 1 | 9.5 % |
| d3 | 19.513 ms | 17.487 ms | 1 | 10.4 % |
| e1 | 28.082 ms | 27.465 ms | 1 | 2.2 % |
| e2 | 41.323 ms | 37.822 ms | 1 | 8.5 % |
| mobilenet | 73.013 ms | 71.547 ms | 1 | 2.0 % |
| inception | 551.272 ms | 550.894 ms | 1 | 0.07 % |
| p1 | 56.565 ms | 55.283 ms | 1 | 2.3 % |
| p2 | 7.210 ms | 6.713 ms | 1 | 6.9 % |

  • Log of performance of each model before patches
$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage d1/ -w 10 -r 1000
Package Filename d1/
===================================
MODEL_LOAD   takes 22.722 ms
PREPARE      takes 1346.501 ms
EXECUTE      takes 21.419 ms
- MEAN     :  21.419 ms
- MAX      :  22.681 ms
- MIN      :  20.760 ms
- GEOMEAN  :  21.417 ms
===================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage d2/ -w 10 -r 1000
Package Filename d2/
===================================
MODEL_LOAD   takes 66.378 ms
PREPARE      takes 1295.190 ms
EXECUTE      takes 7.334 ms
- MEAN     :  7.334 ms
- MAX      :  7.760 ms
- MIN      :  6.935 ms
- GEOMEAN  :  7.333 ms
===================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage d3/ -w 10 -r 1000
Package Filename d3/
===================================
MODEL_LOAD   takes 792.003 ms
PREPARE      takes 2330.235 ms
EXECUTE      takes 19.516 ms
- MEAN     :  19.516 ms
- MAX      :  20.427 ms
- MIN      :  18.736 ms
- GEOMEAN  :  19.513 ms
==================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage e1/ -w 10 -r 1000
Package Filename e1/
===================================
MODEL_LOAD   takes 220.728 ms
PREPARE      takes 1736.628 ms
EXECUTE      takes 28.084 ms
- MEAN     :  28.084 ms
- MAX      :  30.076 ms
- MIN      :  27.246 ms
- GEOMEAN  :  28.082 ms
===================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage e2/ -w 10 -r 1000
Package Filename e2/
===================================
MODEL_LOAD   takes 1598.363 ms
PREPARE      takes 2167.587 ms
EXECUTE      takes 41.329 ms
- MEAN     :  41.329 ms
- MAX      :  48.997 ms
- MIN      :  40.170 ms
- GEOMEAN  :  41.323 ms
===================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage mobilenet_v2_1.0_224 -w 10 -r 1000
Package Filename mobilenet_v2_1.0_224
===================================
MODEL_LOAD   takes 42.793 ms
PREPARE      takes 5378.029 ms
EXECUTE      takes 73.013 ms
- MEAN     :  73.013 ms
- MAX      :  81.730 ms
- MIN      :  72.660 ms
- GEOMEAN  :  73.013 ms
===================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage inceptionv3_slim_2016 -w 10 -r 1000
Package Filename inceptionv3_slim_2016
===================================
MODEL_LOAD   takes 256.788 ms
PREPARE      takes 12407.476 ms
EXECUTE      takes 551.273 ms
- MEAN     :  551.273 ms
- MAX      :  553.389 ms
- MIN      :  549.208 ms
- GEOMEAN  :  551.272 ms
===================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage p1 -w 10 -r 1000
Package Filename p1
===================================
MODEL_LOAD   takes 11.074 ms
PREPARE      takes 5193.269 ms
EXECUTE      takes 56.566 ms
- MEAN     :  56.566 ms
- MAX      :  58.516 ms
- MIN      :  56.057 ms
- GEOMEAN  :  56.565 ms
===================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage p2 -w 10 -r 1000
Package Filename p2
===================================
MODEL_LOAD   takes 5.239 ms
PREPARE      takes 1711.113 ms
EXECUTE      takes 7.211 ms
- MEAN     :  7.211 ms
- MAX      :  8.856 ms
- MIN      :  6.879 ms
- GEOMEAN  :  7.210 ms
===================================
  • Log of performance of each model after patches
$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage d1/ -w 10 -r 1000
Package Filename d1/
===================================
MODEL_LOAD   takes 22.913 ms
PREPARE      takes 1334.244 ms
EXECUTE      takes 19.613 ms
- MEAN     :  19.613 ms
- MAX      :  20.682 ms
- MIN      :  19.278 ms
- GEOMEAN  :  19.612 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage d2/ -w 10 -r 1000
Package Filename d2/
===================================
MODEL_LOAD   takes 7.850 ms
PREPARE      takes 1298.734 ms
EXECUTE      takes 6.637 ms
- MEAN     :  6.637 ms
- MAX      :  7.344 ms
- MIN      :  6.437 ms
- GEOMEAN  :  6.636 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage d3/ -w 10 -r 1000
Package Filename d3/
===================================
MODEL_LOAD   takes 30.664 ms
PREPARE      takes 2324.531 ms
EXECUTE      takes 17.488 ms
- MEAN     :  17.488 ms
- MAX      :  19.223 ms
- MIN      :  17.019 ms
- GEOMEAN  :  17.487 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage e1/ -w 10 -r 1000
Package Filename e1/
===================================
MODEL_LOAD   takes 14.761 ms
PREPARE      takes 1743.793 ms
EXECUTE      takes 27.471 ms
- MEAN     :  27.471 ms
- MAX      :  31.900 ms
- MIN      :  26.469 ms
- GEOMEAN  :  27.465 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage e2/ -w 10 -r 1000
Package Filename e2/
===================================
MODEL_LOAD   takes 56.136 ms
PREPARE      takes 2170.272 ms
EXECUTE      takes 37.826 ms
- MEAN     :  37.826 ms
- MAX      :  40.546 ms
- MIN      :  36.727 ms
- GEOMEAN  :  37.822 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage mobilenet_v2_1.0_224 -w 10 -r 1000
Package Filename mobilenet_v2_1.0_224
===================================
MODEL_LOAD   takes 47.294 ms
PREPARE      takes 5404.024 ms
EXECUTE      takes 71.547 ms
- MEAN     :  71.547 ms
- MAX      :  73.341 ms
- MIN      :  71.255 ms
- GEOMEAN  :  71.547 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage inceptionv3_slim_2016 -w 10 -r 1000
Package Filename inceptionv3_slim_2016
===================================
MODEL_LOAD   takes 263.214 ms
PREPARE      takes 12379.828 ms
EXECUTE      takes 550.894 ms
- MEAN     :  550.894 ms
- MAX      :  551.998 ms
- MIN      :  548.923 ms
- GEOMEAN  :  550.894 ms
===================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage p1 -w 10 -r 1000
Package Filename p1
===================================
MODEL_LOAD   takes 8.615 ms
PREPARE      takes 5191.016 ms
EXECUTE      takes 55.285 ms
- MEAN     :  55.285 ms
- MAX      :  68.132 ms
- MIN      :  54.630 ms
- GEOMEAN  :  55.283 ms
===================================

$ BACKENDS="acl_cl" ./Product/armv7l-linux.release/out/bin/nnpackage_run --nnpackage p2 -w 10 -r 1000
Package Filename p2
===================================
MODEL_LOAD   takes 8.182 ms
PREPARE      takes 1713.505 ms
EXECUTE      takes 6.714 ms
- MEAN     :  6.714 ms
- MAX      :  7.572 ms
- MIN      :  6.421 ms
- GEOMEAN  :  6.713 ms
===================================

  • Performance of each model on odroid-n2 with acl_cl backend

| model | before (d1886fab6a3bcbcef7a7cdd8a547c538b1574506) | after (#4395) | thread count | improvement rate: (before - after) / before * 100 |
|:--:|:--:|:--:|:--:|:--:|
| d1 | 8.157 ms | 6.929 ms | 1 | 15.1 % |
| d2 | 4.614 ms | 3.154 ms | 2 | 31.6 % |
| d3 | 10.186 ms | 8.664 ms | 1 | 14.9 % |
| e1 | 12.780 ms | 12.245 ms | 2 | 4.2 % |
| e2 | 19.102 ms | 18.791 ms | 1 | 1.6 % |
| mobilenet | 40.779 ms | 40.727 ms | 1 | 0.13 % |
| inception | 365.441 ms | 366.077 ms | 1 | -0.2 % |
| p1 | 41.660 ms | 36.384 ms | 1 | 12.7 % |
| p2 | 6.633 ms | 5.863 ms | 1 | 11.6 % |

  • Log of performance of each model before patches
$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage d1/ -w 10 -r 1000
Package Filename d1/
===================================
MODEL_LOAD   takes 11.712 ms
PREPARE      takes 2270.909 ms
EXECUTE      takes 8.191 ms
- MEAN     :  8.191 ms
- MAX      :  21.204 ms
- MIN      :  7.235 ms
- GEOMEAN  :  8.157 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage d2/ -w 10 -r 1000
Package Filename d2/
===================================
MODEL_LOAD   takes 2.745 ms
PREPARE      takes 2438.955 ms
EXECUTE      takes 4.625 ms
- MEAN     :  4.625 ms
- MAX      :  7.464 ms
- MIN      :  3.201 ms
- GEOMEAN  :  4.614 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage d3/ -w 10 -r 1000
Package Filename d3/
===================================
MODEL_LOAD   takes 11.728 ms
PREPARE      takes 4084.234 ms
EXECUTE      takes 10.203 ms
- MEAN     :  10.203 ms
- MAX      :  19.850 ms
- MIN      :  8.762 ms
- GEOMEAN  :  10.186 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage e1/ -w 10 -r 1000
Package Filename e1/
===================================
MODEL_LOAD   takes 6.707 ms
PREPARE      takes 1983.457 ms
EXECUTE      takes 12.803 ms
- MEAN     :  12.803 ms
- MAX      :  24.841 ms
- MIN      :  11.472 ms
- GEOMEAN  :  12.780 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage e2/ -w 10 -r 1000
Package Filename e2/
===================================
MODEL_LOAD   takes 20.181 ms
PREPARE      takes 2355.564 ms
EXECUTE      takes 19.121 ms
- MEAN     :  19.121 ms
- MAX      :  29.254 ms
- MIN      :  18.055 ms
- GEOMEAN  :  19.102 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage mobilenet_v2_1.0_224 -w 10 -r 1000
Package Filename mobilenet_v2_1.0_224
===================================
MODEL_LOAD   takes 17.108 ms
PREPARE      takes 4341.270 ms
EXECUTE      takes 40.813 ms
- MEAN     :  40.813 ms
- MAX      :  58.547 ms
- MIN      :  39.492 ms
- GEOMEAN  :  40.779 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage inceptionv3_slim_2016 -w 10 -r 1000
Package Filename inceptionv3_slim_2016
===================================
MODEL_LOAD   takes 76.632 ms
PREPARE      takes 11707.544 ms
EXECUTE      takes 365.472 ms
- MEAN     :  365.472 ms
- MAX      :  384.435 ms
- MIN      :  359.136 ms
- GEOMEAN  :  365.441 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage p1 -w 10 -r 1000
Package Filename p1
===================================
MODEL_LOAD   takes 2.802 ms
PREPARE      takes 5363.464 ms
EXECUTE      takes 41.919 ms
- MEAN     :  41.919 ms
- MAX      :  63.400 ms
- MIN      :  36.434 ms
- GEOMEAN  :  41.660 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage p2 -w 10 -r 1000
Package Filename p2
===================================
MODEL_LOAD   takes 1.851 ms
PREPARE      takes 2091.100 ms
EXECUTE      takes 6.669 ms
- MEAN     :  6.669 ms
- MAX      :  22.189 ms
- MIN      :  5.628 ms
- GEOMEAN  :  6.633 ms
===================================
  • Log of performance of each model after patches
$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage d1/ -w 10 -r 1000
Package Filename d1/
===================================
MODEL_LOAD   takes 6.152 ms
PREPARE      takes 2261.286 ms
EXECUTE      takes 6.966 ms
- MEAN     :  6.966 ms
- MAX      :  19.381 ms
- MIN      :  5.963 ms
- GEOMEAN  :  6.929 ms
===================================

$ RUY_THREADS=2 BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage d2/ -w 10 -r 1000
Package Filename d2/
===================================
MODEL_LOAD   takes 1.450 ms
PREPARE      takes 2438.586 ms
EXECUTE      takes 3.182 ms
- MEAN     :  3.182 ms
- MAX      :  5.737 ms
- MIN      :  2.640 ms
- GEOMEAN  :  3.154 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage d3/ -w 10 -r 1000
Package Filename d3/
===================================
MODEL_LOAD   takes 7.580 ms
PREPARE      takes 4088.470 ms
EXECUTE      takes 8.676 ms
- MEAN     :  8.676 ms
- MAX      :  12.090 ms
- MIN      :  6.934 ms
- GEOMEAN  :  8.664 ms
===================================

$ RUY_THREADS=2 BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage e1/ -w 10 -r 1000
Package Filename e1/
===================================
MODEL_LOAD   takes 3.468 ms
PREPARE      takes 1972.334 ms
EXECUTE      takes 12.249 ms
- MEAN     :  12.249 ms
- MAX      :  15.098 ms
- MIN      :  11.463 ms
- GEOMEAN  :  12.245 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage e2/ -w 10 -r 1000
Package Filename e2/
===================================
MODEL_LOAD   takes 18.158 ms
PREPARE      takes 2331.893 ms
EXECUTE      takes 18.803 ms
- MEAN     :  18.803 ms
- MAX      :  31.326 ms
- MIN      :  17.971 ms
- GEOMEAN  :  18.791 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage mobilenet_v2_1.0_224 -w 10 -r 1000
Package Filename mobilenet_v2_1.0_224
===================================
MODEL_LOAD   takes 12.159 ms
PREPARE      takes 4346.867 ms
EXECUTE      takes 40.760 ms
- MEAN     :  40.760 ms
- MAX      :  61.887 ms
- MIN      :  39.568 ms
- GEOMEAN  :  40.727 ms
===================================

$ RUY_THREADS=1 BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage inceptionv3_slim_2016 -w 10 -r 1000
Package Filename inceptionv3_slim_2016
===================================
MODEL_LOAD   takes 70.042 ms
PREPARE      takes 11668.927 ms
EXECUTE      takes 366.110 ms
- MEAN     :  366.110 ms
- MAX      :  386.276 ms
- MIN      :  359.115 ms
- GEOMEAN  :  366.077 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage p1 -w 10 -r 1000
Package Filename p1
===================================
MODEL_LOAD   takes 2.814 ms
PREPARE      takes 5346.621 ms
EXECUTE      takes 36.441 ms
- MEAN     :  36.441 ms
- MAX      :  55.867 ms
- MIN      :  34.255 ms
- GEOMEAN  :  36.384 ms
===================================

$ BACKENDS="acl_cl" ./Product/aarch64-linux.release/out/bin/nnpackage_run --nnpackage ../p2 -w 10 -r 1000
Package Filename ../p2
===================================
MODEL_LOAD   takes 3.450 ms
PREPARE      takes 2094.042 ms
EXECUTE      takes 5.880 ms
- MEAN     :  5.880 ms
- MAX      :  7.869 ms
- MIN      :  4.340 ms
- GEOMEAN  :  5.863 ms
===================================

All 5 comments

I have one question. Can we benefit from vectorization when we copy memory?

@hyunsik-yoon

> I have one question. Can we benefit from vectorization when we copy memory?

I think it's a good point, but I don't plan to use it yet, for the following reason.
We could use vectorization (SIMD), but I'm not sure it would always improve performance. Using SIMD means issuing explicit load and store instructions, and the result depends on how large and fast the caches of the target device are. In other words, copying memory with SIMD could be slower than memcpy(), contrary to our expectations.
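
One way to settle this kind of question on a given device is to measure it. The sketch below is a minimal timing harness, not a rigorous benchmark: `loop_copy` and `mean_copy_time_us` are hypothetical names, and the element-wise loop only stands in for a hand-written SIMD copy, since the compiler may or may not vectorize it.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Element-wise copy; an optimizing compiler may auto-vectorize this loop.
void loop_copy(const float *src, float *dst, std::size_t count)
{
  for (std::size_t i = 0; i < count; ++i)
    dst[i] = src[i];
}

// Calls `copy` `repeats` times and returns the mean wall-clock time
// per call in microseconds.
template <typename CopyFn>
double mean_copy_time_us(CopyFn copy, int repeats)
{
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeats; ++i)
    copy();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::micro>(stop - start).count() / repeats;
}
```

Passing one lambda that calls `std::memcpy` and another that calls `loop_copy` on the same buffers gives two numbers to compare. Buffer sizes should match the tensors of interest, because the outcome depends on whether the data fits in cache.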

  1. Make our runtime call arm_compute::ICLTensor::map() with the blocking flag value chosen case by case.

I tried replacing map() with clEnqueueMapBuffer() only for writing to CLTensors, but it failed because there is no way to get the memory buffer without mapping. The function always maps the data to the buffer, no matter which of the available flags I tried. Since the mapping cannot be avoided, there is no way to prevent mismatched results.
So I will try clEnqueueWriteBuffer() instead of clEnqueueMapBuffer().

Done.
