One: How can I get ruy to use multiple cores?

Created on 24 Sep 2020 · 4 Comments · Source: Samsung/ONE

I tried to optimize PermuteLayer using the ruy library in draft #4395.
I checked that it works with multithreading, but it uses only one core.
How can I get ruy to use multiple cores?
cc @periannath

help wanted


All 4 comments

@ragmani Set the environment variable RUY_THREADS to the desired value, or change the default value of RUY_THREADS in runtime/onert/core/include/util/Config.lst.
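For reference, here is a minimal C++ sketch of how a RUY_THREADS-style setting can be resolved: read the variable and fall back to a default when it is unset or invalid. `ResolveThreadCount` and `ThreadCountFromEnv` are hypothetical helpers, not onert's actual config code (onert declares its defaults in Config.lst). A resolved value like this would typically be handed to ruy through `ruy::Context::set_max_num_threads()`.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parse a thread count such as the value of RUY_THREADS.
// Returns default_threads when the value is missing, not a whole number,
// smaller than 1, or too large to fit in an int.
int ResolveThreadCount(const char* value, int default_threads) {
  assert(default_threads >= 1);
  if (value == nullptr || *value == '\0') return default_threads;
  char* end = nullptr;
  errno = 0;
  const long parsed = std::strtol(value, &end, 10);
  if (errno != 0 || *end != '\0' || parsed < 1 || parsed > INT_MAX) {
    return default_threads;
  }
  return static_cast<int>(parsed);
}

// Hypothetical wrapper that reads the variable from the process environment.
int ThreadCountFromEnv(int default_threads) {
  return ResolveThreadCount(std::getenv("RUY_THREADS"), default_threads);
}
```

In this sketch, running with `RUY_THREADS=8` makes `ThreadCountFromEnv(1)` return 8, while an unset or malformed value keeps the default.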

@periannath

I set the environment variable, e.g. RUY_THREADS=8. Multithreading works, but it still uses only one core :cry:
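"Multithreaded but on one core" can be checked directly. The sketch below assumes Linux with glibc, where `sched_getcpu()` reports the CPU the calling thread is currently running on. `DistinctCpusUsed` is a hypothetical helper that starts a few busy threads and collects the CPU ids they finished on.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // sched_getcpu() is a GNU extension
#endif
#include <sched.h>

#include <cassert>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical diagnostic (Linux/glibc only): start num_threads busy workers
// and record which CPU each one was running on when it finished spinning.
std::set<int> DistinctCpusUsed(int num_threads) {
  assert(num_threads >= 1);
  std::set<int> cpus;
  std::mutex mu;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&cpus, &mu] {
      // Spin for a while so the scheduler has a reason to spread threads out.
      volatile unsigned long sink = 0;
      for (unsigned long i = 0; i < 50000000UL; ++i) sink = sink + i;
      const int cpu = sched_getcpu();  // returns -1 on failure
      if (cpu >= 0) {
        std::lock_guard<std::mutex> lock(mu);
        cpus.insert(cpu);
      }
    });
  }
  for (auto& w : workers) w.join();
  return cpus;
}
```

This is only a rough signal: the scheduler may legitimately keep threads on one core. Still, if plain threads spread across several CPUs while the ruy-based kernel shows a single busy core, that suggests the limitation lies in how the kernel dispatches its work.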

@ragmani The hybrid FC layer runs on multiple cores with the RUY_THREADS option. I'm not sure why yours runs on a single core. :(

It seems some initialization code is missing in #4395. Maybe that missing code affects multi-core usage?

trmul.cc

  // Initialize per-thread state.
  const int thread_count = block_map.thread_count;
  const bool need_atomics = thread_count > 1;
  ctx->EnsureThreadSpecificResources(thread_count);
  for (int i = 0; i < thread_count; i++) {
    ctx->GetThreadSpecificTuningResolver(i)->SetTuning(ctx->explicit_tuning());
  }
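The snippet above follows a common pattern: make sure one state object exists per worker, initialize each one before any work is dispatched, and let worker `i` touch only slot `i`. The sketch below mirrors that shape with hypothetical names (`ToyContext`, `PerThreadState`, `RunSplit`). It is not ruy's API, only an illustration of the ensure-then-initialize step performed before work is split across threads.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for ruy's per-thread tuning resolver.
struct PerThreadState {
  bool explicit_tuning = false;
  long work_done = 0;
};

// Hypothetical context that owns one PerThreadState per worker thread.
class ToyContext {
 public:
  // Grow (never shrink) the pool so that indices [0, thread_count) are valid.
  void EnsureThreadSpecificResources(int thread_count) {
    while (static_cast<int>(states_.size()) < thread_count) {
      states_.push_back(std::make_unique<PerThreadState>());
    }
  }
  PerThreadState* GetThreadSpecificState(int i) {
    assert(i >= 0 && i < num_states());
    return states_[i].get();
  }
  int num_states() const { return static_cast<int>(states_.size()); }

 private:
  std::vector<std::unique_ptr<PerThreadState>> states_;
};

// Split `total` units of work across `thread_count` workers. Each worker
// writes only to its own PerThreadState, so no locking is needed.
long RunSplit(ToyContext& ctx, int thread_count, long total, bool tuning) {
  assert(thread_count >= 1);
  ctx.EnsureThreadSpecificResources(thread_count);
  for (int i = 0; i < thread_count; ++i) {
    PerThreadState* s = ctx.GetThreadSpecificState(i);
    s->explicit_tuning = tuning;
    s->work_done = 0;
  }
  std::vector<std::thread> workers;
  for (int i = 0; i < thread_count; ++i) {
    workers.emplace_back([&ctx, i, thread_count, total] {
      // Static block partition: worker i handles [begin, end).
      const long begin = total * i / thread_count;
      const long end = total * (i + 1) / thread_count;
      ctx.GetThreadSpecificState(i)->work_done = end - begin;
    });
  }
  for (auto& w : workers) w.join();
  long sum = 0;
  for (int i = 0; i < thread_count; ++i) {
    sum += ctx.GetThreadSpecificState(i)->work_done;
  }
  return sum;
}
```

Because the pool only grows, a later call with fewer threads reuses the existing slots; calling `GetThreadSpecificState` before `EnsureThreadSpecificResources` would trip the assertion instead of silently using missing state.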

@periannath
I confirm that it now runs on multiple cores. Thanks for the information.
