One: [onert] Fix duplicated allocations for static tensors that turn dynamic

Created on 23 Jun 2020 · 9 Comments · Source: Samsung/ONE

If a tensor is static at first but becomes dynamic during a run, it ends up allocated by both the static and the dynamic memory managers, which wastes memory. However, deallocating the static memory in those cases is not easy, because that memory space is likely shared with other tensors whose lifetimes do not overlap.
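To illustrate why a single static tensor cannot simply be freed, here is a minimal sketch of a lifetime-based static planner. It is not onert's planner: the `Tensor` class, the `plan_static_memory` function, and the greedy first-fit strategy are all assumptions made for illustration. The point it shows is that tensors with non-overlapping lifetimes end up at the same offset inside one shared chunk.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size: int       # bytes
    first_use: int  # index of the first operation that touches the tensor
    last_use: int   # index of the last operation that touches the tensor


def plan_static_memory(tensors):
    """Greedy first-fit planner (hypothetical): returns ({name: offset}, total_bytes)."""
    placed = []   # list of (offset, tensor) already planned
    offsets = {}
    for t in sorted(tensors, key=lambda x: x.first_use):
        # Only tensors whose lifetimes overlap with t block a memory region.
        busy = sorted(
            (off, off + p.size)
            for off, p in placed
            if not (p.last_use < t.first_use or t.last_use < p.first_use)
        )
        offset = 0
        for start, end in busy:
            if offset + t.size <= start:
                break  # t fits in the gap before this busy region
            offset = max(offset, end)
        offsets[t.name] = offset
        placed.append((offset, t))
    total = max(off + t.size for off, t in placed)
    return offsets, total


# Three 400-byte tensors in a chain: "a" is dead before "c" is born.
tensors = [
    Tensor("a", 400, 0, 1),
    Tensor("b", 400, 1, 2),
    Tensor("c", 400, 2, 3),
]
offsets, total = plan_static_memory(tensors)
print(offsets, total)
```

In this toy plan, "a" and "c" share offset 0, so the chunk is 800 bytes instead of 1200. If "a" later turned dynamic, its region could not be released without also taking away the memory that "c" relies on.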

sprintask

wateret

👍1

All 9 comments

FYI, some scenarios (they are similar) would be:

case 1 (input reshaping)

at compilation time, #0, #1 are allocated.

#0 = placeholder(shape=[100, 100])
#1 = relu(#0)

but after nnfw_prepare(), if we call

nnfw_set_input_tensorinfo(#0, [2, 2])

the static memory is still there, but #0 and #1 are now treated as dynamic memory.

case 2 (setting unknown dim)

at compilation time, #0, #1, #2 are allocated (None is treated as dim == 1 in tflite)

#0 = placeholder(shape=[None, 100])
#1 = placeholder(shape=[None, 200])
#2 = add(#0, #1)

but after nnfw_prepare(), if we call

nnfw_set_input_tensorinfo(#0, [200, 100])
nnfw_set_input_tensorinfo(#1, [200, 200])

the static memory is still there, but #0, #1 and #2 are now treated as dynamic memory.
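As a rough worked example of how much memory stays behind in these two scenarios, the sketch below computes buffer sizes for tensor #0. The helper `tensor_bytes` is hypothetical, and the 4-byte element size (float32) is an assumption; the thread does not state the data type.

```python
from math import prod


def tensor_bytes(shape, elem_size=4):
    """Bytes for one tensor; an unknown dim (None) is counted as 1, as in case 2.

    elem_size=4 assumes float32, which is an assumption for this example.
    """
    return prod(1 if d is None else d for d in shape) * elem_size


# Case 1: #0 is planned statically as [100, 100], then reshaped to [2, 2].
static_case1 = tensor_bytes([100, 100])   # still held by the static manager
dynamic_case1 = tensor_bytes([2, 2])      # newly allocated by the dynamic manager

# Case 2: #0 is planned statically as [None, 100], then set to [200, 100].
static_case2 = tensor_bytes([None, 100])
dynamic_case2 = tensor_bytes([200, 100])

print(static_case1, dynamic_case1)  # 40000 16
print(static_case2, dynamic_case2)  # 400 80000
```

Under these assumptions, case 1 keeps 40000 unused static bytes for a tensor that now needs 16, while in case 2 the static reservation is small but still unused.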

hyunsik-yoon on 24 Jun 2020

👍1

I think we can approach those cases differently.

case 1

From my observation, there does not seem to be a perfect solution for this, because any tensor that was static at first can become dynamic. Static tensors share memory space with other tensors, so deallocating just one of them is not possible.

Here are some candidates I came up with:

(#151 is a prerequisite.)

Make all tensors dynamic UNLESS all tensors in the model have fixed shapes and the input shape does not change
Make the static memory allocator allocate tensors individually (just like the dynamic allocator), so that static tensors can then be freed
...

case 2

I think this is solvable, as we know that we are not using the static tensors. For those, we should make them dynamic.

wateret on 30 Jun 2020

I ran a benchmark with one of our test models.

1) Running a model with --shape_run option (all dynamic tensors):

EXECUTE      takes 15.488 ms
- MEAN     :  15.488 ms
- MAX      :  16.377 ms
- MIN      :  15.256 ms
- GEOMEAN  :  15.484 ms

2) Running a model _without_ --shape_run option (all static tensors):

EXECUTE      takes 12.305 ms
- MEAN     :  12.305 ms
- MAX      :  12.413 ms
- MIN      :  12.189 ms
- GEOMEAN  :  12.305 ms

It seems that using dynamic tensors is slower due to memory allocation and deallocation.

I think we should _use static tensor as much as possible_.
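For reference, the gap between the two runs can be quantified from the MEAN values reported above. This is plain arithmetic on the posted numbers; the variable names are only for illustration.

```python
# MEAN execution times copied from the two benchmark runs above.
mean_dynamic_ms = 15.488  # with --shape_run (all dynamic tensors)
mean_static_ms = 12.305   # without --shape_run (all static tensors)

slowdown = mean_dynamic_ms / mean_static_ms
extra_ms = mean_dynamic_ms - mean_static_ms

print(f"{slowdown:.3f}x, +{extra_ms:.3f} ms per run")
```

On this model the all-dynamic run is roughly 26% slower in mean execution time, about 3.2 ms per run.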

how to run benchmark:

BACKENDS=cpu ./Product/out/bin/nnpackage_run test_model/package/dir -r 20 -w 10

hyunsik-yoon on 6 Jul 2020

👍1

I tried to list the cases in more detail:

(Internally, case 1 and case 2 are handled the same way.)

case A) `nnfw_set_input_tensorinfo()` is not called at all

all static tensors. good.

case B) calling `nnfw_set_input_tensorinfo()` before `nnfw_prepare()`

all static tensors. good.

nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_prepare()  ---> static tensors are allocated.
nnfw_run()
nnfw_run()
...

case C) calling `nnfw_set_input_tensorinfo()` after `nnfw_prepare()`

nnfw_prepare()  ---> currently, static tensors are allocated. not good.
nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_run() --> dynamic tensors are allocated and freed
nnfw_run() --> dynamic tensors are allocated and freed
...

Would it be good to change the behavior to the following?

nnfw_prepare()  ---> do nothing (only memory for const is allocated)
nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_run() --> statically and also dynamically allocate memory
nnfw_run() --> reuse statically allocated memory + dynamic tensors are allocated / freed
...

case D) calling `nnfw_set_input_tensorinfo()` whenever `nnfw_run()` is called

this seems the same as case C).

nnfw_prepare()  ---> currently, static tensors are allocated. not good.
nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_run() --> dynamic tensors are allocated and freed
nnfw_set_input_tensorinfo(0, [3, 2, 3])
nnfw_run() --> dynamic tensors are allocated and freed
...

case E) weird calling

nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_prepare()  ---> all static tensors! seems good!
nnfw_set_input_tensorinfo(0, [3, 2, 4]) --> hmm???
nnfw_run() --> dynamic tensors are allocated and freed
...

hyunsik-yoon on 6 Jul 2020

Let me try to fix it as follows (thanks @wateret):

at the first nnfw_run()
- perform memory planning for static tensors and remember the plan
- and perform allocations for dynamic tensors...
at the second nnfw_run()
- reuse the static memory plan (only one chunk of allocation is needed)
- and perform allocations for dynamic tensors...

hyunsik-yoon on 7 Jul 2020

Cleared the milestone, since it was only the plan for when I work on it.

wateret on 7 Jul 2020

Related - #151

wateret on 14 Jul 2020

The current approach in #3122 (for the CPU backend) is:

compile time
- For const tensors, perform plan, prepare, allocate
at runtime
- the 1st run
  - at the beginning, for static (non-const) tensors, perform plan, prepare, allocate
  - dynamic tensors will be allocated / deallocated when a kernel runs
- from the 2nd run onward
  - if nnfw_set_input_tensorinfo() was _not_ called for the 2nd run,
    - static tensors allocated at the 1st run are re-used
  - else
    - for static (non-const) tensors, perform plan, prepare, allocate
  - dynamic tensors will be allocated / deallocated when a kernel runs
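The flow listed above can be sketched as a small state machine. This is a toy model only, not onert code: the `Session` class, its `log` list, and the `static_ready` flag are hypothetical, and the method names merely mirror `nnfw_prepare()`, `nnfw_set_input_tensorinfo()` and `nnfw_run()`.

```python
class Session:
    """Toy model of the allocation flow (hypothetical, not onert's classes)."""

    def __init__(self):
        self.log = []
        self.static_ready = False  # True once a valid static plan exists

    def prepare(self):
        # Compile time: only const tensors get memory.
        self.log.append("const: plan, prepare, allocate")

    def set_input_tensorinfo(self, index, shape):
        # Assumption: any input shape change invalidates the static plan.
        self.static_ready = False

    def run(self):
        if self.static_ready:
            self.log.append("static: reuse")
        else:
            self.log.append("static: plan, prepare, allocate")
            self.static_ready = True
        # Dynamic tensors are handled while each kernel runs.
        self.log.append("dynamic: allocate/deallocate per kernel")


s = Session()
s.prepare()
s.run()                                # 1st run: static tensors are planned
s.run()                                # 2nd run: static tensors are re-used
s.set_input_tensorinfo(0, [3, 2, 3])
s.run()                                # shape changed: static tensors re-planned
static_steps = [entry for entry in s.log if entry.startswith("static")]
print(static_steps)
```

In this model, static memory is never planned before the input shapes are known, so the duplicated allocation from case C) above does not occur; the cost is one extra planning step whenever the input shape changes.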

hyunsik-yoon on 15 Jul 2020

👍1

updated: https://github.com/Samsung/ONE/issues/2525#issuecomment-658501660

hyunsik-yoon on 23 Jul 2020
