One: [onert] Fix duplicated allocations for static tensors that turn dynamic

Created on 23 Jun 2020  ·  9 Comments  ·  Source: Samsung/ONE

If a tensor is static at first but becomes dynamic during a run, it is allocated by both the static and the dynamic memory managers, which wastes memory. However, deallocating the static memory in those cases is not easy, because its memory space is likely to be shared with other tensors whose lifetimes do not overlap.
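To illustrate why the shared static space is hard to give back, here is a minimal sketch of a lifetime-based planner. It is a toy first-fit planner, not the onert memory planner; the `Tensor` class, `plan_static`, and the sizes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size: int       # bytes
    first_use: int  # index of the first operation that touches the tensor
    last_use: int   # index of the last operation that touches the tensor


def plan_static(tensors):
    """Greedy first-fit: tensors with non-overlapping lifetimes may share offsets."""
    placed = []   # (tensor, offset) pairs already planned
    offsets = {}
    for t in sorted(tensors, key=lambda t: t.first_use):
        # Byte ranges held by tensors that are alive at the same time as t
        busy = sorted(
            (off, off + p.size)
            for p, off in placed
            if not (p.last_use < t.first_use or t.last_use < p.first_use)
        )
        offset = 0
        for start, end in busy:
            if offset + t.size <= start:
                break                  # t fits in the gap before this range
            offset = max(offset, end)  # otherwise skip past the range
        placed.append((t, offset))
        offsets[t.name] = offset
    arena_size = max(off + t.size for t, off in placed)
    return offsets, arena_size


tensors = [
    Tensor("a", 400, 0, 1),
    Tensor("b", 400, 1, 2),
    Tensor("c", 400, 2, 3),  # "a" is dead by now, so "c" can reuse its bytes
]
offsets, arena_size = plan_static(tensors)
print(offsets, arena_size)
```

In this sketch "a" and "c" end up at the same offset of one arena. If "a" later turns dynamic, its bytes cannot simply be freed, because "c" still relies on them.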

sprintask

All 9 comments

FYI, some scenarios (they are similar) would be:

case 1 (input reshaping)

at compilation time, #0, #1 are allocated.

#0 = placeholder(shape=[100, 100])
#1 = relu(#0)

but after nnfw_prepare(), if we call

nnfw_set_input_tensorinfo(#0, [2, 2])

the static memory is still there, but #0 and #1 are now treated as dynamic.

case 2 (setting unknown dim)

at compilation time, #0, #1, #2 are allocated (None is treated as dim == 1 in tflite)

#0 = placeholder(shape=[None, 100])
#1 = placeholder(shape=[None, 200])
#2 = add(#0, #1)

but after nnfw_prepare(), if we call

nnfw_set_input_tensorinfo(#0, [200, 100])
nnfw_set_input_tensorinfo(#1, [200, 200])

the static memory is still there, but #0, #1 and #2 are now treated as dynamic.
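A rough back-of-the-envelope sketch of what case 1 costs. It assumes float32 tensors (4 bytes per element) and that #0 and #1 do not share bytes in the static plan, since both are alive while relu runs. Neither assumption is stated in the issue, so treat the numbers as illustrative only.

```python
from math import prod

BYTES_PER_ELEMENT = 4  # assumption: float32 tensors


def nbytes(shape):
    """Size in bytes of a dense tensor with the given shape."""
    return prod(shape) * BYTES_PER_ELEMENT


# Case 1: #0 = placeholder([100, 100]), #1 = relu(#0)
static_bytes = nbytes([100, 100]) + nbytes([100, 100])  # planned at compile time
dynamic_bytes = nbytes([2, 2]) + nbytes([2, 2])          # after reshaping #0 to [2, 2]

resident_bytes = static_bytes + dynamic_bytes  # both allocations coexist
print(f"needed: {dynamic_bytes} B, resident: {resident_bytes} B, wasted: {static_bytes} B")
```

Under these assumptions the run needs 32 bytes for the two tensors but keeps 80032 bytes resident. The same reasoning applies to case 2.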

I think we can approach those cases differently.

case 1

From my observation, there does not seem to be a perfect solution for this, because any tensor that was static at first can become dynamic. However, static tensors share memory space with other tensors, so deallocating just one of them is not possible.

Here are some candidates I came up with:

(#151 is a prerequisite.)

  1. Make all tensors dynamic UNLESS all tensors in the model have fixed shapes and the input shape does not change
  2. Make the static memory allocator allocate tensors individually (just like the dynamic allocator), so that static tensors can be freed
  3. ...
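Candidate 2 could look roughly like the sketch below. `PerTensorAllocator` is a hypothetical name, not an onert class, and the float32 [100, 100] sizes are assumed for illustration. The trade-off is that tensors no longer share bytes, so peak static memory may grow.

```python
class PerTensorAllocator:
    """Hypothetical allocator: every static tensor owns a private buffer."""

    def __init__(self):
        self.buffers = {}

    def allocate(self, name, size):
        self.buffers[name] = bytearray(size)

    def release(self, name):
        # Safe to free: no other tensor aliases this buffer
        self.buffers.pop(name, None)

    def resident_bytes(self):
        return sum(len(buf) for buf in self.buffers.values())


alloc = PerTensorAllocator()
for name in ("#0", "#1"):
    alloc.allocate(name, 100 * 100 * 4)  # assumption: float32, shape [100, 100]

before = alloc.resident_bytes()
# Both tensors turn dynamic after the input is reshaped, so their static buffers can go
alloc.release("#0")
alloc.release("#1")
after = alloc.resident_bytes()
print(before, after)
```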

case 2

I think this is solvable, as we know that we are not using the static tensors. For those, we should make them dynamic.

I ran a benchmark with one of our test models.

1) Running a model with --shape_run option (all dynamic tensors):

EXECUTE      takes 15.488 ms
- MEAN     :  15.488 ms
- MAX      :  16.377 ms
- MIN      :  15.256 ms
- GEOMEAN  :  15.484 ms

2) Running a model _without_ --shape_run option (all static tensors):

EXECUTE      takes 12.305 ms
- MEAN     :  12.305 ms
- MAX      :  12.413 ms
- MIN      :  12.189 ms
- GEOMEAN  :  12.305 ms

It seems that using dynamic tensors is slower due to memory allocation and freeing.

I think we should _use static tensors as much as possible_.

How to run the benchmark:

BACKENDS=cpu ./Product/out/bin/nnpackage_run test_model/package/dir -r 20 -w 10
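For reference, the gap between the two MEAN values above works out as follows. This is plain arithmetic on the reported numbers for one model, so it should not be read as a general result.

```python
dynamic_mean_ms = 15.488  # run with --shape_run (all dynamic tensors)
static_mean_ms = 12.305   # run without --shape_run (all static tensors)

extra_ms = dynamic_mean_ms - static_mean_ms
slowdown_pct = extra_ms / static_mean_ms * 100
print(f"{extra_ms:.3f} ms extra per run, {slowdown_pct:.1f}% slower")
```

That is about 3.2 ms per run, or roughly 26% slower, for the all-dynamic configuration.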

I tried to list the cases in more detail:

(internally, case 1 and case 2 are handled the same way)

case A) nnfw_set_input_tensorinfo() is not called at all

  • all static tensors. good.

case B) calling nnfw_set_input_tensorinfo() before nnfw_prepare()

  • all static tensors. good.

nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_prepare()  ---> static tensors are allocated.
nnfw_run()
nnfw_run()
...

case C) calling nnfw_set_input_tensorinfo() after nnfw_prepare()

nnfw_prepare()  ---> currently, static tensors are allocated. not good.
nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_run() --> dynamic tensors are allocated and freed
nnfw_run() --> dynamic tensors are allocated and freed
...

Would it be good to make the behavior like the following?

nnfw_prepare()  ---> do nothing (only memory for const is allocated)
nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_run() --> statically and also dynamically allocate memory
nnfw_run() --> reuse statically allocated memory + dynamic tensors are allocated / freed
...

case D) calling nnfw_set_input_tensorinfo() whenever nnfw_run() is called

this seems to be the same as case C).

nnfw_prepare()  ---> currently, static tensors are allocated. not good.
nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_run() --> dynamic tensors are allocated and freed
nnfw_set_input_tensorinfo(0, [3, 2, 3])
nnfw_run() --> dynamic tensors are allocated and freed
...

case E) weird calling

nnfw_set_input_tensorinfo(0, [1, 2, 3])
nnfw_prepare()  ---> all static tensors! seems good!
nnfw_set_input_tensorinfo(0, [3, 2, 4]) --> hmm???
nnfw_run() --> dynamic tensors are allocated and freed
...
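Cases A to E can be summarized with a small classifier over the call order. This is only a model of the behavior described above, not runtime code: `classify` is a hypothetical helper and the calls are represented as plain strings.

```python
SET, PREPARE, RUN = "nnfw_set_input_tensorinfo", "nnfw_prepare", "nnfw_run"


def classify(calls):
    """Return 'static' or 'dynamic' for a sequence of API call names.

    A shape change is absorbed by static planning only if it arrives
    before nnfw_prepare(); afterwards the tensors are treated as dynamic.
    """
    prepared = False
    mode = "static"
    for call in calls:
        if call == PREPARE:
            prepared = True
        elif call == SET and prepared:
            mode = "dynamic"
    return mode


cases = {
    "A": [PREPARE, RUN, RUN],
    "B": [SET, PREPARE, RUN, RUN],
    "C": [PREPARE, SET, RUN, RUN],
    "D": [PREPARE, SET, RUN, SET, RUN],
    "E": [SET, PREPARE, SET, RUN],
}
results = {name: classify(calls) for name, calls in cases.items()}
print(results)
```

Only A and B stay fully static. C, D and E all end up paying for the static plan made at nnfw_prepare() and for dynamic allocation at run time.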

Let me try to fix it as follows (thanks @wateret):

  • at the first nnfw_run()

    • perform static tensor memory planning

    • and perform allocations for dynamic tensors...

  • at the second nnfw_run()

    • reuse static memory planning (one chunk of allocation is needed)

    • and perform allocations for dynamic tensors...

Cleared the milestone, since it was the plan from when I was working on it.

Related - #151

Current approach in #3122 is (for CPU backend)

  • compile time

    • For const tensors, perform plan, prepare, allocate

  • at runtime

    • the 1st run

      • at the beginning, for static (non-const) tensors, perform plan, prepare, allocate

      • dynamic tensors will be allocated / deallocated when a kernel runs

    • from the 2nd run onward

      • if nnfw_set_input_tensorinfo() was _not_ called for the 2nd run,

        • static tensors allocated at the 1st run are re-used

      • else

        • for static (non-const) tensors, perform plan, prepare, allocate

      • dynamic tensors will be allocated / deallocated when a kernel runs
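The per-run decision in the list above can be sketched as a small state machine. `StaticPlanCache` and its fields are hypothetical names for illustration; this is not the code from #3122, and the actual plan / prepare / allocate work is reduced to a counter.

```python
class StaticPlanCache:
    """Toy model of the per-run decision; not the actual onert implementation."""

    def __init__(self):
        self.planned = False        # has a static (non-const) plan been made yet?
        self.shape_changed = False  # set by set_input_tensorinfo(), consumed by run()
        self.plan_count = 0

    def set_input_tensorinfo(self):
        self.shape_changed = True

    def run(self):
        if not self.planned or self.shape_changed:
            # plan, prepare, allocate static (non-const) tensors
            self.plan_count += 1
            self.planned = True
            self.shape_changed = False
            action = "plan"
        else:
            # static tensors allocated by an earlier run are re-used
            action = "reuse"
        # dynamic tensors would be allocated / deallocated per kernel here
        return action


session = StaticPlanCache()
actions = [session.run(), session.run()]
session.set_input_tensorinfo()
actions += [session.run(), session.run()]
print(actions, session.plan_count)
```

In this model the static plan is rebuilt only on the first run and on runs that follow a shape change, so repeated runs with an unchanged input shape keep the cheaper static path.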


