The latest version of the Glow compiler provides some level of quantization support.
The quantization process consists of the following steps:
1) Instrument the original floating-point graph with the special Quantization Profile node.
2) Run inference on the instrumented graph many times with good, representative inputs (see the dump_profile flag).
During this step Glow automatically captures the distribution of all activations in the compute graph. For a given node output, the distribution records the min/max values seen as well as a detailed frequency count per floating-point range within the [min, max] interval. The Quantization Profile node keeps 2000 such floating-point ranges.
As a result of the profiling procedure, Glow generates Scale and Offset parameters for all activations and dumps them to a file (note that Glow currently uses linear quantization). Scale is a positive fp32 number, while Offset is an int32. Note that Glow does not use the distribution of floating-point values; it relies only on the min and max values (this is a separate issue and out of scope here).
3) Transform the computation graph based on the captured profile and the specific execution backend. Note that not all nodes are quantized, since not every backend supports every quantized op. See CPU backend op support here, for example.
4) Execute the quantized graph. See the Interpreter implementation here.
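The profiling and parameter-selection steps can be sketched as follows. This is a minimal Python illustration, not Glow's C++ implementation: `Profile` and `choose_asymmetric_params` are hypothetical names, only min/max are tracked (the 2000-range histogram is omitted because, as noted, Glow relies only on min and max), and widening the range to include 0.0 is an assumption made so that zero is exactly representable.

```python
import math

INT8_MIN, INT8_MAX = -128, 127


class Profile:
    """Hypothetical stand-in for a Quantization Profile node: tracks min/max only."""

    def __init__(self):
        self.min = math.inf
        self.max = -math.inf

    def observe(self, values):
        # Called once per inference run with the activations seen at one node output.
        self.min = min(self.min, min(values))
        self.max = max(self.max, max(values))


def choose_asymmetric_params(fmin, fmax):
    """Map [fmin, fmax] onto int8 [-128, 127] using fp32 = scale * (q - offset)."""
    # Assumption: widen the range to contain 0.0 so zero maps to an exact integer code.
    fmin, fmax = min(fmin, 0.0), max(fmax, 0.0)
    if fmax == fmin:  # degenerate all-zero range
        return 1.0, 0
    scale = (fmax - fmin) / (INT8_MAX - INT8_MIN)
    # Solve fmin = scale * (INT8_MIN - offset) for offset, rounded to an integer.
    offset = round(INT8_MIN - fmin / scale)
    return scale, offset


def quantize(x, scale, offset):
    # Inverse of fp32 = scale * (q - offset), clamped to the int8 range.
    q = round(x / scale + offset)
    return max(INT8_MIN, min(INT8_MAX, q))


def dequantize(q, scale, offset):
    return scale * (q - offset)


# Usage: two "inference runs" with stand-in activation values for one node output.
profile = Profile()
profile.observe([0.1, 0.9, 2.0])
profile.observe([0.0, 2.55, 1.3])
scale, offset = choose_asymmetric_params(profile.min, profile.max)
print(round(scale, 4), offset)  # 0.01 -128
```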
Some backends are tied to a specific quantization schema, e.g., symmetric quantization.
The difference between symmetric and asymmetric quantization is that the "offset" parameter equals 0 in symmetric linear quantization. In this case dequantization follows the formula fp32_number = scale * quantized_number (versus fp32_number = scale * (quantized_number - offset) in the asymmetric case).
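To make the difference concrete, here is a small Python sketch with hypothetical helper names, assuming int8 storage and, for the symmetric case, a scale that maps max(|min|, |max|) to 127. For a non-negative activation range (e.g., a ReLU output profiled as [0, 6]) the symmetric grid must also cover [-6, 0), so roughly half of the int8 codes go unused and the step size about doubles; this is one source of accuracy loss.

```python
INT8_MAX = 127


def symmetric_params(fmin, fmax):
    """Symmetric linear quantization: offset is fixed at 0, so fp32 = scale * q."""
    # Assumption: scale is chosen so that max(|fmin|, |fmax|) maps to 127.
    bound = max(abs(fmin), abs(fmax))
    scale = bound / INT8_MAX if bound > 0 else 1.0
    return scale, 0


def asymmetric_params(fmin, fmax):
    """Asymmetric: fp32 = scale * (q - offset), range widened to include 0."""
    fmin, fmax = min(fmin, 0.0), max(fmax, 0.0)
    if fmax == fmin:
        return 1.0, 0
    scale = (fmax - fmin) / 255
    return scale, round(-128 - fmin / scale)


def dequantize(q, scale, offset):
    return scale * (q - offset)


# Non-negative range, e.g. a ReLU output profiled as [0, 6].
sym_scale, _ = symmetric_params(0.0, 6.0)
asym_scale, asym_offset = asymmetric_params(0.0, 6.0)
print(sym_scale / asym_scale)  # about 2: symmetric spends half its codes on negatives
```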
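The issue could be tackled in two steps:
1) Introduce the symmetric schema and deal with the int8 activations (accuracy loss)
2) Introduce int8, uint8 activations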
Please discuss this issue here before implementation.
cc: @qcolombet
For starters, I was planning to introduce some code, conditionally executed via a command-line option, that forces the offset to be zero.
That way we can start playing with it and see how it impacts the accuracy.
@rdzhabarov How does that sound?
@qcolombet I'm fine with that as long as it is easily testable.
Another way would be to introduce quantizationSchema (or something like that) and pass it to the chooseQuantizationParams method. The schema itself could be controlled through a loader command-line option.
> Another way would be to introduce quantizationSchema (or something like that) and pass it to the chooseQuantizationParams method.
@rdzhabarov Works for me.
Do you think it should be more complicated than just an enum today?
In particular, do we want to support the Cartesian product of (symmetric, asymmetric) x (zero offset, any offset)?
> Do you think it should be more complicated than just an enum today?
Enum for symmetric, asymmetric seems to be fine.
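A minimal sketch of the enum-based design being discussed, in Python rather than Glow's C++. `Schema`, `choose_quantization_params`, `parse_schema`, and the `--quantization-schema` flag are hypothetical stand-ins for the proposed quantizationSchema, the chooseQuantizationParams method, and the loader command-line option; they are not Glow's actual signatures.

```python
import argparse
from enum import Enum


class Schema(Enum):
    """Hypothetical stand-in for the proposed quantizationSchema enum."""
    ASYMMETRIC = "asymmetric"
    SYMMETRIC = "symmetric"


def choose_quantization_params(fmin, fmax, schema=Schema.ASYMMETRIC):
    """Return (scale, offset) for int8 with fp32 = scale * (q - offset)."""
    if schema is Schema.SYMMETRIC:
        # Offset pinned to 0; the larger magnitude maps to 127.
        bound = max(abs(fmin), abs(fmax))
        return (bound / 127 if bound > 0 else 1.0), 0
    # Asymmetric: widen to include 0, spread [fmin, fmax] over all 256 codes.
    fmin, fmax = min(fmin, 0.0), max(fmax, 0.0)
    if fmax == fmin:
        return 1.0, 0
    scale = (fmax - fmin) / 255
    return scale, round(-128 - fmin / scale)


def parse_schema(argv):
    # Hypothetical loader flag selecting the schema from the command line.
    parser = argparse.ArgumentParser()
    parser.add_argument("--quantization-schema",
                        choices=[s.value for s in Schema],
                        default=Schema.ASYMMETRIC.value)
    return Schema(parser.parse_args(argv).quantization_schema)


schema = parse_schema(["--quantization-schema", "symmetric"])
print(schema, choose_quantization_params(-2.0, 4.0, schema))
```

Keeping the schema as an explicit argument means the profile itself stays schema-agnostic; only the parameter choice changes.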
I was looking at the uses of chooseQuantizationParams, and some of them happen during our lowering process. Most of them request symmetric ranges, so technically it is not a big deal if we don't provide a schema there, but one asks for an asymmetric range: lowerQuantizedSigmoidNode.
@rdzhabarov Any ideas on how we should tell those nodes which schema to use?
It seems weird to me to have to surface the quantization schema in the lowering API.
Sent a PR for the plumbing to the loader: #1324.
The idea behind the quantized sigmoid node is that the CPU/Interpreter implementations are based on a prebuilt table that maps quantized input to quantized output (the same holds for quantized tanh). The table contains an (input -> output) mapping for inputs within a specific floating-point range; that's why we need to select {S,O} to make sure the input is within that range.
So if a backend does not want to use that mapping mechanism, it will reject lowering of the quantized sigmoid/tanh. For the default case (CPU/Interpreter), we'll make sure to restrict the inputs of those nodes to some floating-point range, which is why we need to find the {S,O} that corresponds to that range. I think it does not matter which schema is used in the lowering of those nodes.
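The table-based approach can be sketched in Python as follows (an illustration, not Glow's implementation). Assumptions: int8 storage, an input {S,O} chosen so the int8 range covers a fixed floating-point interval, here [-6, 6] picked arbitrarily, and an output {S,O} covering [0, 1]. Each of the 256 possible quantized inputs is dequantized, passed through the float sigmoid, and requantized, so at run time the op reduces to a table index. All function names are hypothetical.

```python
import math

INT8_MIN, INT8_MAX = -128, 127


def params_for_range(fmin, fmax):
    # {S, O} such that [fmin, fmax] spans the int8 range; fp32 = S * (q - O).
    scale = (fmax - fmin) / (INT8_MAX - INT8_MIN)
    return scale, round(INT8_MIN - fmin / scale)


def quantize(x, scale, offset):
    return max(INT8_MIN, min(INT8_MAX, round(x / scale + offset)))


def build_sigmoid_table(in_params, out_params):
    """Map every int8 input code to an int8 output code."""
    in_s, in_o = in_params
    out_s, out_o = out_params
    table = []
    for q in range(INT8_MIN, INT8_MAX + 1):
        x = in_s * (q - in_o)              # dequantize the input code
        y = 1.0 / (1.0 + math.exp(-x))     # float sigmoid
        table.append(quantize(y, out_s, out_o))
    return table


# Assumed fixed ranges: input restricted to [-6, 6], output to [0, 1].
in_params = params_for_range(-6.0, 6.0)
out_params = params_for_range(0.0, 1.0)
table = build_sigmoid_table(in_params, out_params)


def quantized_sigmoid(q):
    # Run-time lookup: shift the int8 code into a 0-based table index.
    return table[q - INT8_MIN]
```

Because the table is indexed by the raw input code, it is only valid for the input {S,O} it was built with, which is why lowering has to pin that {S,O}.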
> Introduce the symmetric schema and deal with the int8 activations (accuracy loss)
We're done with that step thanks to #1324.
For the second step, "Introduce int8, uint8 activations", @nadavrot pointed out that we could use Int8 as a canonical representation. Indeed, UInt8 with a zero offset is equal to Int8 with an offset of 128. More generally, UInt8 with whatever offset O is equal to Int8 with offset O + 128. Therefore, using only Int8, we simply fail to capture the small portion of what UInt8 would cover when O + 128 overflows, and in my opinion that's not worth adding a new UInt8Q type.
Now, for the motivating use case, UInt8 with offset 0, maybe we could get away with adding a new schema "symmetric with uint8 range", which would allow both offset == 0 (regular symmetric int8) and offset == 128 ("symmetric" uint8).
@rdzhabarov what do you think?
> Indeed, UInt8 with a zero offset is equal to Int8 with an offset of 128.
I assume int8 with -128 offset.
With int8 values in [-128, 127], we would need to add 128 to shift the interval to [0, 255] (given that the +128 is computed on the int32 type). This would result in the offset being reduced by 128 (given fp = scale * (q - offset)).
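The shift can be checked exhaustively with a few lines of Python (illustrative only; the helper name and the scale value are arbitrary). For every uint8 code and a sample of uint8 offsets, storing q_i8 = q_u8 - 128 with offset O_i8 = O_u8 - 128 dequantizes to exactly the same value under fp = scale * (q - offset), because the difference q - offset is unchanged.

```python
def dequantize(q, scale, offset):
    return scale * (q - offset)


def uint8_to_int8(q_u8, offset_u8):
    # Subtracting 128 from both the code and the offset leaves (q - offset) unchanged.
    return q_u8 - 128, offset_u8 - 128


scale = 0.05  # arbitrary positive scale for the check
for offset_u8 in (0, 10, 128, 255):
    for q_u8 in range(256):
        q_i8, offset_i8 = uint8_to_int8(q_u8, offset_u8)
        assert -128 <= q_i8 <= 127
        assert dequantize(q_u8, scale, offset_u8) == dequantize(q_i8, scale, offset_i8)

# uint8 with offset 0 is int8 with offset -128:
print(uint8_to_int8(0, 0), uint8_to_int8(255, 0))  # (-128, -128) (127, -128)
```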
> Now, for the motivating use case, UInt8 with offset 0, maybe we could get away with adding a new schema "symmetric with uint8 range"
From the software layer this is OK, but how would it translate to the hardware which does not support non-0 offset?
> I assume int8 with -128 offset.
Exactly! I keep forgetting that the offset is subtracted, not added :).
> From the software layer this is OK, but how would it translate to the hardware which does not support non-0 offset
I was expecting this to be sorted out during lowering by each backend. Given that this would be a specific schema, I would assume the related support is in place before someone requests it.
Admittedly, I haven't sorted out the details; I wanted to check first whether it was somewhat reasonable. It sounds like it is, so let us think about what it means in practice.
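One possible reading of the "symmetric with uint8 range" schema, sketched in Python under stated assumptions (not necessarily what #1444 implements; the function name is hypothetical). With Int8 as the canonical type, the schema only ever produces two offsets: -128 (uint8 range with a zero uint8 offset) when the profiled range is non-negative, and 0 (regular symmetric int8) otherwise. The symmetric branch assumes the larger magnitude maps to 127.

```python
def symmetric_with_uint8_params(fmin, fmax):
    """Return (scale, offset) for int8 storage with fp32 = scale * (q - offset).

    Hypothetical schema: offset is restricted to 0 (symmetric int8) or -128
    (uint8 range with a zero uint8 offset, re-expressed as int8).
    """
    if fmin >= 0.0:
        # Non-negative data: use all 256 codes for [0, fmax]; q = -128 represents 0.0.
        scale = fmax / 255 if fmax > 0 else 1.0
        return scale, -128
    # Data with negative values: regular symmetric int8, offset 0.
    bound = max(abs(fmin), abs(fmax))
    return bound / 127, 0


for lo, hi in ((0.0, 6.0), (-1.0, 3.0)):
    scale, offset = symmetric_with_uint8_params(lo, hi)
    print((lo, hi), "->", round(scale, 5), offset)
```

Since the offset can take only these two fixed values, a backend that lacks general non-zero offsets would presumably need to handle just the -128 case during lowering.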
uint8 + symmetric schema support: https://github.com/pytorch/glow/pull/1444
Since #1444 landed, do we have anything else left here?
Looks like we covered necessary parts! Thanks for working on this!