I've observed that running an NN model on heterogeneous devices with a quantization profile produced earlier on a single device (e.g. CPU) does not always work as expected. After heterogeneous partitioning is applied, some nodes are not found in the saved quantization profile and thus cannot be quantized.
I had a brief discussion with @jfix71 and @beicy about this issue and how to solve it.
The solution we discussed looks something like this:
1) The Partitioner should perform its backend-kind-based partitioning as usual and create multiple partitions
2) Then, if -dump-profile mode is used, the Partitioner should assign CPU to the partitions created in the previous step and do an early exit
3) The provisioner would then call the compiler for each of those partitions in -dump-profile mode, which would instrument the graph to collect quantization profiles for each of the partitions
4) After the run, the quantization profiles for each partition will be produced and dumped (as a single profile or as multiple profiles)
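To make the proposed flow concrete, here is a minimal Python sketch of steps 1-4. It is not Glow code: `Partition`, `partition`, `collect_profiles`, the backend name `NPU`, the node names, and the "group consecutive nodes by preferred backend" heuristic are all hypothetical stand-ins chosen for illustration. The only point it tries to show is that profiling mode keeps the heterogeneous partition boundaries and merely reassigns every partition to CPU.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    backend: str
    nodes: list = field(default_factory=list)


def pick_backend(op, supported):
    """Return the first backend (in preference order) that supports `op`."""
    for backend, ops in supported.items():
        if op in ops:
            return backend
    raise ValueError(f"no backend supports {op}")


def partition(nodes, supported, dump_profile=False):
    """Step 1: group consecutive nodes by backend kind (hypothetical heuristic).
    Step 2: in profiling mode, keep the boundaries but assign CPU and exit early."""
    partitions = []
    for name, op in nodes:
        backend = pick_backend(op, supported)
        if not partitions or partitions[-1].backend != backend:
            partitions.append(Partition(f"p{len(partitions)}", backend))
        partitions[-1].nodes.append(name)
    if dump_profile:
        for p in partitions:
            p.backend = "CPU"
        return partitions  # early exit: skip any later partitioning passes
    return partitions


def collect_profiles(partitions, observed):
    """Steps 3-4: record a (min, max) range per node of every partition and
    merge the results into a single profile dictionary."""
    profile = {}
    for p in partitions:
        for node in p.nodes:
            values = observed[node]
            profile[node] = (min(values), max(values))
    return profile


# Hypothetical three-node graph and backend capabilities.
nodes = [("conv1", "Conv"), ("relu1", "Relu"), ("fc1", "FC")]
supported = {"NPU": {"Conv", "FC"}, "CPU": {"Conv", "Relu", "FC"}}

profiling_parts = partition(nodes, supported, dump_profile=True)
print([p.backend for p in profiling_parts])  # ['CPU', 'CPU', 'CPU']

observed = {"conv1": [-1.0, 0.5], "relu1": [0.0, 0.5], "fc1": [-4.0, 3.0]}
profile = collect_profiles(profiling_parts, observed)
print(sorted(profile))  # ['conv1', 'fc1', 'relu1']
```

Whether the per-partition profiles are merged into one file, as here, or dumped separately is left open by the proposal; the sketch picks a single merged dictionary only because it is the shorter option.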
Later on, to run the quantized model, one would use the usual command line with -load-profile, and this time the partitioner would perform the real heterogeneous partitioning as usual. Since all the nodes in the different partitions were recorded in the profile, they should all be found in the profile this time and the quantization process should succeed.
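The load-time check can be illustrated with a small Python sketch. Everything in it is hypothetical: the function `missing_from_profile`, the node names, and in particular the assumption that the nodes missing from a single-device profile are ones introduced at partition boundaries (modelled here by `conv1_save`). It only demonstrates the lookup failure described above and why a profile collected over the same partitions avoids it.

```python
def missing_from_profile(partitions, profile):
    """Return the node names in any partition that have no profile entry."""
    return [
        node
        for nodes in partitions.values()
        for node in nodes
        if node not in profile
    ]


# Hypothetical partitioned graph; "conv1_save" stands for a node that only
# exists after partitioning.
partitions = {"p0": ["conv1", "conv1_save"], "p1": ["relu1", "fc1"]}

# Profile collected on the unpartitioned graph on a single device.
single_device_profile = {
    "conv1": (-1.0, 1.0),
    "relu1": (0.0, 1.0),
    "fc1": (-4.0, 4.0),
}

# Profile collected per partition, so the boundary node was recorded too.
per_partition_profile = {**single_device_profile, "conv1_save": (-1.0, 1.0)}

print(missing_from_profile(partitions, single_device_profile))  # ['conv1_save']
print(missing_from_profile(partitions, per_partition_profile))  # []
```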
cc @rdzhabarov