Glow: [quantization] Add symmetric quantization schema support to Glow

Created on 24 Jul 2018 · 14 comments · Source: pytorch/glow

Current state

The latest version of the Glow compiler provides a certain level of quantization support.

The quantization process consists of the following steps:
1) Instrument the original floating-point graph with the special Quantization Profile node.

2) Run inference on the instrumented graph many times with representative inputs (see the dump_profile flag).
During this step, Glow automatically captures the distribution of every activation in the compute graph. The distribution contains the min/max values seen at a given output of a graph node, as well as a detailed frequency count per floating-point range within the [min, max] interval. The Quantization Profile node keeps 2000 such floating-point ranges.
As a result of the profiling procedure, Glow generates Scale and Offset parameters for all activations and dumps them to a file (note that Glow currently uses linear quantization). Scale is a positive fp32 number and Offset is an int32. Note that Glow does not use the full distribution of floating-point numbers; it relies only on the min and max values (this is a separate issue and out of scope here).

3) Transform the computation graph based on the captured profile and the specific execution backend. Note that not all nodes are quantized, because not every backend supports every quantized op. See the CPU backend op support here for an example.

4) Perform computation of the quantized graph. See the Interpreter implementation here.
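
The parameter generation in step 2 can be sketched roughly as follows. This is a minimal illustration, assuming int8 activations and the convention fp32_number = scale * (quantized_number - offset); the names QuantParams and paramsFromMinMax are hypothetical and are not Glow's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for the {Scale, Offset} pair stored per activation.
struct QuantParams {
  float scale;    // positive fp32 step size
  int32_t offset; // int32 offset (zero point)
};

// Derive int8 parameters from a profiled [min, max] range (assumed scheme).
QuantParams paramsFromMinMax(float min, float max) {
  assert(min <= max);
  // Widen the range so that 0.0 is always representable.
  min = std::min(min, 0.0f);
  max = std::max(max, 0.0f);

  const float qmin = -128.0f;
  const float qmax = 127.0f;
  float scale = (max - min) / (qmax - qmin);
  if (scale == 0.0f) {
    scale = 1.0f; // degenerate all-zero range
  }

  // With q = round(fp / scale) + offset, pick the offset so min maps to qmin.
  float zeroPoint = qmin - min / scale;
  zeroPoint = std::max(qmin, std::min(qmax, zeroPoint));
  return {scale, static_cast<int32_t>(std::lround(zeroPoint))};
}
```

For a profiled range of [0, 127.5] this yields a scale of 0.5 and an offset of -128, so the quantized value -128 dequantizes back to 0.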

What needs to be enhanced

There are backends that are tied to a specific quantization schema, e.g., symmetric quantization.
The difference between symmetric and asymmetric quantization is that the "offset" parameter equals 0 in symmetric linear quantization. In this case, dequantization follows the formula fp32_number = scale * quantized_number, whereas the asymmetric case uses fp32_number = scale * (quantized_number - offset).
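
To make the two formulas concrete, here is a minimal sketch of linear int8 quantization and dequantization. The function names are hypothetical and are not taken from Glow; only the formulas come from the description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Asymmetric quantization: q = round(fp / scale) + offset, clamped to int8.
int8_t quantize(float fp, float scale, int32_t offset) {
  assert(scale > 0.0f);
  long q = std::lround(fp / scale) + offset;
  q = std::max(-128L, std::min(127L, q));
  return static_cast<int8_t>(q);
}

// Asymmetric dequantization: fp = scale * (q - offset).
float dequantize(int8_t q, float scale, int32_t offset) {
  return scale * static_cast<float>(static_cast<int32_t>(q) - offset);
}

// Symmetric dequantization is the same formula with the offset fixed at 0.
float dequantizeSymmetric(int8_t q, float scale) {
  return dequantize(q, scale, 0);
}
```

With scale 0.5, the quantized value 10 dequantizes to 5.0 under the symmetric schema and to 69.0 under an asymmetric schema with offset -128.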

Requirement for symmetric quantization

We need to make sure that the accuracy loss is comparable to the asymmetric case on Resnet50. Important note: activations are currently signed int8, so with symmetric quantization the outputs of Sigmoid/ReLU, which are never negative, will effectively use only 7 bits out of 8 (which is undesirable).
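
The 7-bit effect can be checked with a small sketch that counts how many int8 codes a value range actually spans under each schema. The helper names and the exact way the scale is derived here are assumptions made for illustration, not Glow's implementation; the sketch also assumes a non-empty range (max > min).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Round and clamp a float to an int8 code, using fp = scale * (q - offset).
int32_t toCode(float fp, float scale, int32_t offset) {
  long q = std::lround(fp / scale) + offset;
  return static_cast<int32_t>(std::max(-128L, std::min(127L, q)));
}

// Count the int8 codes spanned by values in [min, max].
int codesUsed(float min, float max, float scale, int32_t offset) {
  assert(min <= max && scale > 0.0f);
  return toCode(max, scale, offset) - toCode(min, scale, offset) + 1;
}

// Symmetric: offset is 0 and the range is widened to [-absMax, absMax].
int codesUsedSymmetric(float min, float max) {
  float absMax = std::max(std::fabs(min), std::fabs(max));
  return codesUsed(min, max, absMax / 127.0f, 0);
}

// Asymmetric: the offset shifts the codes so that min maps to -128.
int codesUsedAsymmetric(float min, float max) {
  float scale = (max - min) / 255.0f;
  int32_t offset = -128 - static_cast<int32_t>(std::lround(min / scale));
  return codesUsed(min, max, scale, offset);
}
```

For a ReLU-like output range of [0, 6], the symmetric variant spans 128 codes (0 to 127, i.e. 7 bits), while the asymmetric variant spans all 256.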

The issue could be tackled in two steps:

1) Introduce the symmetric schema and deal with the int8 activations (accuracy loss).
2) Introduce int8 and uint8 activations. Check whether a specific backend can handle int8/uint8 activations for a given op (this way we could onboard backends gradually). This is a bigger change and would require a design discussion here.

Please discuss this issue here before implementation.
cc: @qcolombet

rdzhabarov

👍1

All 14 comments

For starters, I was planning to introduce some code, conditionally executed via a command-line option, that forces the offset to be zero.

That way we can start playing with it and see how it impacts the accuracy.

@rdzhabarov How does that sound?

qcolombet on 24 Jul 2018

@qcolombet I'm fine as long as it could be easily testable.

Another way would be to introduce quantizationSchema (or something like that) and pass it to the chooseQuantizationParams method. The schema itself can be controlled through the loader command line option.

rdzhabarov on 25 Jul 2018

Another way would be to introduce quantizationSchema (or something like that) and pass it to the chooseQuantizationParams method.

@rdzhabarov Works for me.

Do you think it should be more complicated than just an enum today?
In particular, do we want to support the cartesian product of (symmetric, asymmetric) x (zero offset, any offset)?

qcolombet on 25 Jul 2018

Do you think it should be more complicated than just an enum today?

Enum for symmetric, asymmetric seems to be fine.
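
A rough sketch of what such an enum could look like when passed into the parameter selection. Everything here is hypothetical (the Schema enum, the chooseParams function, and the way the symmetric scale is derived); it only illustrates the idea of forcing the offset to zero under a symmetric schema and is not the code from any PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical schema selector, e.g. set from a loader command-line option.
enum class Schema { Asymmetric, Symmetric };

struct QuantParams {
  float scale;
  int32_t offset;
};

// Hypothetical int8 parameter selection that honors the requested schema.
QuantParams chooseParams(float min, float max, Schema schema) {
  assert(min <= max);
  // Always keep 0.0 inside the representable range.
  min = std::min(min, 0.0f);
  max = std::max(max, 0.0f);

  if (schema == Schema::Symmetric) {
    // Widen to [-absMax, absMax] so that offset 0 is exact.
    float absMax = std::max(-min, max);
    float scale = absMax > 0.0f ? absMax / 127.0f : 1.0f;
    return {scale, 0};
  }

  float scale = (max - min) / 255.0f;
  if (scale == 0.0f) {
    scale = 1.0f; // degenerate all-zero range
  }
  // Choose the offset so that min maps to the lowest int8 code.
  int32_t offset = -128 - static_cast<int32_t>(std::lround(min / scale));
  return {scale, offset};
}
```

Under these assumptions, the range [-3, 63.5] gives scale 0.5 and offset 0 with the symmetric schema, while [0, 127.5] gives scale 0.5 and offset -128 with the asymmetric one.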

rdzhabarov on 25 Jul 2018

👍1

I was looking at the uses of chooseQuantizationParams, and some of them happen during our lowering process. Most of them request symmetric ranges, so technically it's not a big deal if we don't provide a schema there, but one asks for an asymmetric range: lowerQuantizedSigmoidNode.

@rdzhabarov Any ideas on how we should tell those nodes which schema to use?

It seems weird to me to have to surface the quantization schema in the lowering API.

qcolombet on 25 Jul 2018

Sent a PR for the plumbing to the loader: #1324.

qcolombet on 25 Jul 2018

👍1

The idea behind the quantized sigmoid node is that the CPU/Interpreter implementations are based on a prebuilt table that maps quantized input to quantized output (the same goes for quantized tanh). The table contains a mapping (input -> output) for inputs within a specific floating-point range, which is why we need to select {S,O} so that the input falls within that range.

So if a backend does not want to use that mapping mechanism, it will reject lowering of the quantized sigmoid/tanh. For the default case, CPU/Backend, we'll make sure to restrict the inputs of those nodes to some floating-point range, which is why we need to find out which {S,O} corresponds to that floating-point range. I think it does not matter which schema is used in the lowering of those nodes.
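
As an illustration of the table-based approach, here is a minimal sketch that prebuilds a 256-entry int8 -> int8 sigmoid table for fixed input and output {S,O} pairs. The names and the example parameters are assumptions chosen for the sketch, not the values Glow uses.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

struct QuantParams {
  float scale;
  int32_t offset;
};

// Build a table indexed by (q_in + 128) that holds the quantized sigmoid.
std::array<int8_t, 256> buildSigmoidTable(QuantParams in, QuantParams out) {
  assert(in.scale > 0.0f && out.scale > 0.0f);
  std::array<int8_t, 256> table{};
  for (int32_t q = -128; q <= 127; ++q) {
    // Dequantize the input code, apply sigmoid, requantize the result.
    float x = in.scale * static_cast<float>(q - in.offset);
    float y = 1.0f / (1.0f + std::exp(-x));
    long qo = std::lround(y / out.scale) + out.offset;
    qo = std::max(-128L, std::min(127L, qo));
    table[q + 128] = static_cast<int8_t>(qo);
  }
  return table;
}

// At inference time the op is a single table lookup.
int8_t quantizedSigmoid(const std::array<int8_t, 256> &table, int8_t q) {
  return table[static_cast<int32_t>(q) + 128];
}
```

Because the table is built for one specific input {S,O}, the input of the node has to be rescaled to exactly those parameters before the lookup, which is the restriction described above.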

rdzhabarov on 25 Jul 2018

👍1

Introduce the symmetric schema and deal with the int8 activations (accuracy loss)

We're done with that step thanks to #1324.

qcolombet on 26 Jul 2018

For the second step, "Introduce int8, uint8 activations", @nadavrot pointed out that we could use Int8 as a canonical representation. Indeed, UInt8 with a zero offset is equal to Int8 with an offset of 128. More generally UInt8 with whatever offset O is equal to Int8 with offset O + 128. Therefore using only Int8, we simply fail to capture a small portion of what UInt8 would cover when O + 128 overflows and in my opinion, that's not worth adding a new UInt8Q type.

Now, for the motivating use case, UInt8 with offset 0, maybe we could get away with adding a new schema "symmetric with uint8 range", that would allow both offset == 0 (regular symmetric int8) and offset == 128 ("symmetric" uint8).

@rdzhabarov what do you think?

qcolombet on 7 Aug 2018

Indeed, UInt8 with a zero offset is equal to Int8 with an offset of 128.

I assume int8 with -128 offset.
Given int8 values in [-128, 127], we would need to add 128 (computed in int32) to shift the interval to [0, 255], which means the offset is reduced by 128 (given fp = scale * (q - offset)).
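
A small sketch of that equivalence, assuming fp = scale * (q - offset): shifting both the stored value and the offset by 128 leaves the dequantized number unchanged. The type and function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// An int8 value together with its offset (hypothetical helper type).
struct Int8Form {
  int8_t q;
  int32_t offset;
};

// Re-express a uint8 value/offset pair in int8: both shift down by 128.
Int8Form uint8ToInt8(uint8_t q, int32_t offset) {
  return {static_cast<int8_t>(static_cast<int32_t>(q) - 128), offset - 128};
}

// fp = scale * (q - offset), with q widened to int32 first.
float dequantize(int32_t q, float scale, int32_t offset) {
  return scale * static_cast<float>(q - offset);
}
```

For example, uint8 value 200 with offset 0 becomes int8 value 72 with offset -128, and both dequantize to 100.0 at scale 0.5.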

Now, for the motivating use case, UInt8 with offset 0, maybe we could get away with adding a new schema "symmetric with uint8 range"

From the software layer this is OK, but how would it translate to the hardware which does not support non-0 offset?

rdzhabarov on 8 Aug 2018

👍1

I assume int8 with -128 offset.

Exactly! I keep forgetting that the offset is subtracted not added :).

From the software layer this is OK, but how would it translate to the hardware which does not support non-0 offset

I was expecting this to be sorted out during lowering by each backend. Given that it would be a specific schema, I would assume that the related support is in place before someone requests it.

Admittedly I haven’t sorted out the details, I wanted to check first if it was somewhat reasonable. It sounds like it is so let us think what it practically means.

qcolombet on 8 Aug 2018

👍1

uint8 + symmetric schema support: https://github.com/pytorch/glow/pull/1444

rdzhabarov on 11 Aug 2018

Since #1444 landed, do we have anything else left here?

qcolombet on 14 Aug 2018

❤1 👍1

Looks like we covered the necessary parts! Thanks for working on this!

rdzhabarov on 14 Aug 2018
