One: [circle-quantizer] Support requantization from int8 tflite to uint8 circle

Created on 12 Aug 2020 · 8 comments · Source: Samsung/ONE

We'll support requantization between models quantized with different schemes. This will enable us to:
1) run models quantized by other frameworks on uint8-based hardware.
2) use test data generated by other frameworks in ONE.

In this issue, we support requantization from int8 tflite (quantized by TF 2.x) to uint8 circle.

Workflow

int8 tflite -> int8 circle (by tflite2circle) -> uint8 circle (by circle-quantizer)

Requantization (int8 -> uint8)

TF 2.x int8 quantization
activation: range -128~127, type int8
weights: range -127~127, type int8
bias: type int32

ONE uint8 quantization
activation: range 0~255, type uint8
weights: range 0~255, type uint8
bias: type int32

For requantization, we shift the zero points and the weights of the source model (int8 tflite) by 128 while keeping the scales unchanged.
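A minimal Python sketch of this shift, assuming a single per-tensor scale (with channel-wise quantization, each channel's scale would simply be kept the same way). The function name `requantize_int8_to_uint8` and the sample values are hypothetical; this is not circle-quantizer's actual code:

```python
INT8_TO_UINT8_OFFSET = 128  # maps int8 [-128, 127] onto uint8 [0, 255]


def requantize_int8_to_uint8(values, zero_point, scale):
    """Shift int8 quantized values and their zero point by 128; keep the scale as-is."""
    if any(v < -128 or v > 127 for v in values):
        raise ValueError("input is not int8")
    shifted = [v + INT8_TO_UINT8_OFFSET for v in values]
    return shifted, zero_point + INT8_TO_UINT8_OFFSET, scale


# Symmetric int8 weights: zero point 0, values in [-127, 127] (sample data).
w_u8, w_zp, w_scale = requantize_int8_to_uint8([-127, -1, 0, 64, 127], 0, 0.02)
print(w_u8, w_zp, w_scale)  # [1, 127, 128, 192, 255] 128 0.02

# Asymmetric int8 activations: a non-zero zero point is shifted the same way.
a_u8, a_zp, _ = requantize_int8_to_uint8([-128, 127], -3, 0.1)
print(a_u8, a_zp)  # [0, 255] 125
```

Because the scale never changes, only integer additions are involved, so no rounding error is introduced by the conversion.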

jinevening

@jinevening Just out of curiosity, what is the purpose of quantizing with TF 2.x? I thought circle-quantizer would do that.

mhs4670go on 12 Aug 2020

Just out of curiosity, what is the purpose of quantizing with TF 2.x?

The main reason is to generate test data (inference results) for CWQ (channel-wise quantized) uint8 models. Since there is no proven kernel for CWQ uint8 models, we're going to generate the test data by running the CWQ int8 model on the TFLite interpreter and converting the results to uint8.

Another reason is to run models quantized by TF 2.x on hardware based on uint8 quantization.

jinevening on 12 Aug 2020

@jinevening

For requantization, we shift the zero points and the weights of the source model (int8 tflite) by 128 while keeping the scales unchanged.

Is this transformation correct? Shouldn't zero_point == 0 in the int8 case (isn't it symmetric)?

s-barannikov on 12 Aug 2020

Is this transformation correct? Shouldn't zero_point == 0 in the int8 case (isn't it symmetric)?

I think this is correct (please tell me if it is wrong).

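In a TFLite int8 model, the zero point of the weights is always zero, as you mentioned.

For requantization (int8 -> uint8), we add 128 to the weight values to shift the int8 weights (-127~127) into the uint8 range (1~255). (0 is not used, because symmetric int8 uses only 255 of the 256 possible values.) The zero point must also be shifted, because the range changes.

This can be shown with simple math.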

real value = scale * (Q_i8 - zp_i8) = scale * (Q_u8 - zp_u8)

where,
scale: int8 scale (same as the uint8 scale)
Q_i8 : int8 quantized weights
zp_i8 : int8 zero point
Q_u8 : uint8 quantized weights
zp_u8 : uint8 zero point

Since we decided to keep the scale of the source model (int8), the scales are identical. Then, the equation becomes

Q_i8 - zp_i8 = Q_u8 - zp_u8

As Q_u8 = Q_i8 + 128 (to shift the range) and zp_i8 = 0 (symmetric quant), the equation becomes

Q_i8 - 0 = Q_i8 + 128 - zp_u8

Therefore,

zp_u8 = 128 for weights.

For activations, zp_i8 is generally non-zero (asymmetric quantization). In this case, the same derivation gives zp_u8 = 128 + zp_i8.
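The derivation can also be checked numerically. This hypothetical Python helper (not part of ONE) walks every int8 value in a range, shifts it by 128, and confirms that Q_i8 - zp_i8 == Q_u8 - zp_u8, so the dequantized real values are identical. The activation zero point of -10 is an arbitrary example:

```python
def dequantize(q, zero_point, scale):
    # real value = scale * (Q - zp)
    return scale * (q - zero_point)


def check_requantization(zp_i8, q_min, q_max, scale):
    """Verify scale*(Q_i8 - zp_i8) == scale*(Q_u8 - zp_u8) for all Q_i8 in [q_min, q_max]."""
    zp_u8 = zp_i8 + 128
    for q_i8 in range(q_min, q_max + 1):
        q_u8 = q_i8 + 128
        assert 0 <= q_u8 <= 255, "shifted value must fit in uint8"
        # The integer differences match exactly, so the real values match too.
        assert q_i8 - zp_i8 == q_u8 - zp_u8
        assert dequantize(q_i8, zp_i8, scale) == dequantize(q_u8, zp_u8, scale)
    return zp_u8


print(check_requantization(0, -127, 127, 0.02))    # weights (symmetric): 128
print(check_requantization(-10, -128, 127, 0.05))  # activations, zp_i8 = -10: 118
```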

jinevening on 12 Aug 2020

@jinevening Thank you for the detailed explanation. I misread the description: I thought you wanted to convert uint8 -> int8 (i.e., the other way around), which is not possible if zp_u8 != 127 (or 128), when we are talking about symmetric i8.
Sorry for the noise.

The int8 -> uint8 transformation seems reasonable, although requantizing the model could increase accuracy (symmetric quantization does not use the whole integer range because zero_point has to be zero).

s-barannikov on 12 Aug 2020

Sorry for the noise.

No problem. :) I wanted to write down the details somewhere, and this issue turned out to be the right place.

although requantizing the model could increase accuracy (symmetric quantization does not use the whole integer range because zero_point has to be zero).

Agreed. If we use the full range (0~255) instead of 1~255, requantization may increase accuracy. However, it would change the scales, which makes things complicated. I'd like to leave this as future work.

jinevening on 12 Aug 2020

If we use the full range (0~255) instead of 1~255, requantization may increase accuracy.

Not only that.
Let's say we have min = 1.000 and max = 1.270.
With i8 symmetric quantization, the quantized values will be in the range [100, 127], which is only ~10% of the whole range [-127; 127]. Converting from i8 symm to u8 asymm by adding 128 to the lower / upper bounds will not increase coverage; it will remain ~10%.
As you mentioned, we could not only _shift_ but also _rescale_ the values, but this won't increase precision, and it could even get worse because of rounding errors.
The only way to fully utilize the u8 range in this case is to requantize the model, e.g., take the _float_ model, gather min / max statistics, compute the u8 scale and zero_point, etc.
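The coverage argument can be reproduced with a short Python sketch. It assumes round-to-nearest with clamping and a per-tensor symmetric scale of max(|min|, |max|) / 127; the helper names are illustrative only:

```python
def symmetric_int8_scale(min_val, max_val):
    # Symmetric int8 quantization: zero point is 0 and values lie in [-127, 127].
    return max(abs(min_val), abs(max_val)) / 127


def quantize(x, scale, zero_point, q_min, q_max):
    # Round to the nearest step, then clamp into the integer range.
    return max(q_min, min(q_max, round(x / scale) + zero_point))


min_val, max_val = 1.000, 1.270
scale = symmetric_int8_scale(min_val, max_val)  # about 0.01
lo_i8 = quantize(min_val, scale, 0, -127, 127)
hi_i8 = quantize(max_val, scale, 0, -127, 127)
used = hi_i8 - lo_i8 + 1  # number of distinct levels the data can reach

# Shifting by 128 moves the interval but does not widen it.
lo_u8, hi_u8 = lo_i8 + 128, hi_i8 + 128

print(lo_i8, hi_i8, used)                              # 100 127 28
print(f"{used / 255:.1%} of the i8 symm range")        # 11.0% of the i8 symm range
print(lo_u8, hi_u8, f"{used / 256:.1%} of u8 range")   # 228 255 10.9% of u8 range
```

Only 28 levels are ever used before and after the shift, which is the "coverage stays the same" point.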

s-barannikov on 12 Aug 2020

With i8 symmetric quantization, the quantized values will be in the range [100, 127], which is only ~10% of the whole range [-127; 127]. Converting from i8 symm to u8 asymm by adding 128 to the lower / upper bounds will not increase coverage; it will remain ~10%.

Right. :) Our implementation will give the converted uint8 model the same accuracy as the source int8 model.

we could not only shift but also rescale the values, but this won't increase precision, and it could even get worse because of rounding errors.

Well, I'm not sure about this. Rescaling (decreasing the scale from (max-min)/254 to (max-min)/255) may reduce quantization errors because it can express a given range (min~max) with more numbers (256 rather than 255). The actual errors may depend on the distribution of the weight values, but precision should improve in general.

The precision gain would be very small in the i8 symm -> u8 asymm conversion, but it can be huge when requantizing to more bits, e.g., i8 symm -> i16 symm (using 2^16 - 1 numbers rather than 255 to express a given range of floating-point values).
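A rough Python sketch of the step-size argument: for a fixed float range, more integer levels mean a smaller step and a smaller worst-case rounding error (half a step). The [-1, 1] range is hypothetical, and the bound assumes quantizing directly from float values; re-rounding already-quantized int8 values adds its own error, which is the concern raised earlier in this thread:

```python
def step_size(min_val, max_val, num_levels):
    # num_levels evenly spaced integers divide [min_val, max_val] into num_levels - 1 steps.
    return (max_val - min_val) / (num_levels - 1)


min_val, max_val = -1.0, 1.0  # hypothetical float range of a weight tensor

schemes = [
    ("i8 symm", 255),         # -127..127
    ("u8 asymm", 256),        # 0..255
    ("i16 symm", 2**16 - 1),  # -32767..32767
]

for name, levels in schemes:
    step = step_size(min_val, max_val, levels)
    # With round-to-nearest, a value is off by at most half a step.
    print(f"{name:9s} levels={levels:6d} step={step:.3e} max_err={step / 2:.3e}")
```

The i8 -> u8 gain is only 254/255 of the original step, while moving to i16 shrinks the step by roughly 258x.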

jinevening on 13 Aug 2020
