Glow: [Quantization of operators]

Created on 22 May 2020 · 29 Comments · Source: pytorch/glow

Hi,
After visualizing graph.pdf, I cannot find Batch Normalization in it. How does Glow handle the BN operator when it quantizes the ResNet50 model? Is batch normalization merged into the convolution operator? I also see CPUMaxSplat in the graph; is it equivalent to the ReLU in the original computation graph?

ybai62868

Most helpful comment

@ybai62868 The main goal of the operation is this: given an integer N and a float F, how can we approximate the product N x F (an integer times a float) using only integer operations?
The logic approximates the operation as:
N x F -> ((N >> pre) * scale) >> post, where:

pre is a pre-shift value (right shift)
scale is a scale integer multiplier
post is a post-shift value (right shift)

Basically, the float number F is approximated as F ≈ scale / 2^(pre + post).
There are a lot of details in how you choose pre, post, and scale, because there is one main tradeoff: increasing approximation accuracy versus avoiding overflow. For example, F is approximated best when both scale and post are large (a float is better approximated by a fraction with a large numerator and denominator), but a large scale increases the chance of overflow in the intermediate product.
To gain a better understanding, run some simulations in Python and compare the approximation against a pure floating-point reference.
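A minimal sketch of such a simulation follows. It uses only the formula quoted above; the helper names (approx_mul, pick_scale, max_abs_error) are made up for illustration, and choosing scale by plain rounding is an assumption, not necessarily how Glow picks it.

```python
def approx_mul(n, pre, scale, post):
    """Integer-only approximation of n * F, with F ~ scale / 2**(pre + post)."""
    return ((n >> pre) * scale) >> post


def pick_scale(f, pre, post):
    """Hypothetical helper: integer scale closest to f for fixed shifts."""
    return round(f * 2 ** (pre + post))


def max_abs_error(f, pre, post, values):
    """Worst absolute error versus the floating-point product over `values`."""
    scale = pick_scale(f, pre, post)
    return max(abs(approx_mul(n, pre, scale, post) - n * f) for n in values)


if __name__ == "__main__":
    f = 0.0064                      # arbitrary float multiplier for the demo
    values = range(-20000, 20001, 7)
    for post in (8, 16, 24):
        scale = pick_scale(f, 0, post)
        err = max_abs_error(f, 0, post, values)
        # Python integers never overflow, so check the int32 limit explicitly.
        peak = max(abs(n * scale) for n in values)
        print(f"post={post:2d} scale={scale:7d} "
              f"max_abs_error={err:.3f} fits_int32={peak < 2 ** 31}")
```

Raising post (and with it scale) shrinks the error, while the intermediate product n * scale grows toward the int32 limit, which is the tradeoff described above.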

mciprian13 on 9 Jun 2020

👍3

All 29 comments

thanks a lot

ybai62868 on 22 May 2020

@ybai62868 BatchNorm is fused into Convolution. CPUMaxSplat is the same as ReLU.
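For intuition, the standard BatchNorm folding arithmetic can be sketched as below. This is the textbook formula, not a copy of Glow's optimizer code; all function names are hypothetical, and max_splat assumes that MaxSplat is an elementwise max against a broadcast constant.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm parameters into conv weights and bias.

    weights[c] is the flat list of filter values for output channel c.
    """
    folded_w, folded_b = [], []
    for c, w_c in enumerate(weights):
        k = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * k for w in w_c])
        folded_b.append((bias[c] - mean[c]) * k + beta[c])
    return folded_w, folded_b


def conv_at_one_position(weights, bias, patch):
    """Convolution output at one spatial position: dot product plus bias."""
    return [sum(w * x for w, x in zip(w_c, patch)) + b
            for w_c, b in zip(weights, bias)]


def batchnorm(values, gamma, beta, mean, var, eps=1e-5):
    """Reference per-channel BatchNorm in inference mode."""
    return [(v - m) / math.sqrt(s + eps) * g + b
            for v, g, b, m, s in zip(values, gamma, beta, mean, var)]


def max_splat(values, splat):
    """Assumed MaxSplat semantics: elementwise max with a constant."""
    return [max(v, splat) for v in values]


if __name__ == "__main__":
    w = [[1.0, -2.0], [0.5, 0.25]]
    b = [0.1, -0.3]
    gamma, beta = [1.5, 0.8], [0.2, -0.1]
    mean, var = [0.05, 0.3], [0.9, 1.7]
    patch = [2.0, 3.0]
    separate = batchnorm(conv_at_one_position(w, b, patch), gamma, beta, mean, var)
    fw, fb = fold_batchnorm(w, b, gamma, beta, mean, var)
    fused = conv_at_one_position(fw, fb, patch)
    print(separate, fused, max_splat(fused, 0.0))
```

Because the fused convolution produces the same values as convolution followed by BatchNorm, no separate BN node needs to remain in the graph.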

mciprian13 on 22 May 2020

@mciprian13 Hi, do you know where I can find the code that merges Conv and BN? Thanks a lot. I want to know more details about it.

ybai62868 on 25 May 2020

@ybai62868 See this function, which contains the core of the optimization: https://github.com/pytorch/glow/blob/dae7a208ddbbf9ca94226bba79a611a63c227395/lib/Optimizer/GraphOptimizer/GraphOptimizer.cpp#L1588

jfix71 on 25 May 2020

Thanks a lot. After I get the data00xx files, I want to understand the contents of each one. For example, suppose we have the following convolution:
Type: convolution
Name: gpu_0_res2_0_branch2a__5
[0] Dest: data0012.txt i8[S:0.0255 O:-128][0.000,6.505]<1 x 56 x 56 x 64>
[1] Src: data0013.txt i8[S:0.0341 O:-128][0.000,8.703]<1 x 56 x 56 x 64>
[2] Filter: data0014.txt i8[S:0.0048 O:22][-0.718,0.502]<64 x 1 x 1 x 64>
[3] Bias: data0015.txt i32[S:0.0002 O:0][-350745.750,350745.750]<64>
Where can I find the code or formula that computes data0012.txt from data0013.txt, data0014.txt, and data0015.txt?

ybai62868 on 29 May 2020

You can find the code here:
https://github.com/pytorch/glow/blob/master/lib/Backends/CPU/libjit/libjit_conv.cpp

mciprian13 on 29 May 2020

@mciprian13 Thanks a lot. I found the function named libjit_quantized_convolution_generic. What do the parameters "depthUnroll", "biasPre", "biasPost", "outPre", and "outPost" mean?

ybai62868 on 29 May 2020

@ybai62868
depthUnroll - is a loop unrolling factor along the depth dimension of the convolution
biasPre/biasPost/biasScale - are quantization parameters for the bias operand
outPre/outPost/outScale - are quantization parameters for the output operand
These quantization parameters are derived from the input/filter/bias/output scale parameters by the logic here:
https://github.com/pytorch/glow/blob/952b1420610dff777fff4835b6d62557881fce1a/lib/LLVMIRCodeGen/LLVMIRGen.cpp#L1931-L1948
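As a rough sketch of what "derived from the scales" could mean: the int32 accumulator of a quantized convolution carries the scale inScale * filterScale, so the bias and the output each need one float ratio, and each ratio is then approximated by a pre/post/scale triple. The function below computes only those float ratios. This is my reading of the arithmetic, not a copy of the linked code, and the function name is hypothetical.

```python
def rescale_ratios(in_scale, filter_scale, bias_scale, out_scale):
    """Float ratios that the integer pre/post/scale triples stand in for."""
    acc_scale = in_scale * filter_scale   # scale of the int32 accumulator
    bias_ratio = bias_scale / acc_scale   # bias domain -> accumulator domain
    out_ratio = acc_scale / out_scale     # accumulator domain -> output domain
    return bias_ratio, out_ratio


if __name__ == "__main__":
    # Scales as printed in the dump earlier in the thread. The printout is
    # rounded to four decimals, so these ratios are only approximate.
    bias_ratio, out_ratio = rescale_ratios(0.0341, 0.0048, 0.0002, 0.0255)
    print(f"bias_ratio={bias_ratio:.4f} out_ratio={out_ratio:.6f}")
```

The two ratios play the role of the float F in the N x F approximation, once for the bias and once for the output.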

mciprian13 on 29 May 2020

@mciprian13 Thanks a lot. For example,
[3] Bias: data0015.txt i32[S:0.0002 O:0][-350745.750,350745.750]<64>
That would mean biasPre equals -350745.750, biasPost equals 350745.750, and biasScale equals 0.0002.
Am I right?

ybai62868 on 29 May 2020

@ybai62868 No. The format of the print is [S:Scale O:Offset][MinimumValue,MaximumValue]. So in that case:
Scale=0.0002
Offset=0
Minimum = -350745.750
Maximum = 350745.750
The BiasPre/BiasPost/BiasScale and OutPre/OutPost/OutScale values are not printed anywhere; they are derived only while Glow runs, at the point where the call to the convolution kernel is generated.
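To make the print format concrete, here is a small sketch that parses one dump line and maps quantized values back to floats with real = scale * (q - offset). The regular expression and function names are hypothetical and written only against the lines quoted in this thread.

```python
import re

# Matches e.g. "i8[S:0.0255 O:-128][0.000,6.505]<1 x 56 x 56 x 64>".
_TYPE_RE = re.compile(
    r"(?P<elem>i\d+)\[S:(?P<scale>[-\d.]+) O:(?P<offset>-?\d+)\]"
    r"\[(?P<lo>[-\d.]+),(?P<hi>[-\d.]+)\]<(?P<dims>[^>]+)>"
)


def parse_type(text):
    """Extract element type, scale, offset, range and dims from a dump line."""
    m = _TYPE_RE.search(text)
    if m is None:
        raise ValueError(f"no quantized type found in: {text!r}")
    return {
        "elem": m["elem"],
        "scale": float(m["scale"]),
        "offset": int(m["offset"]),
        "range": (float(m["lo"]), float(m["hi"])),
        "dims": tuple(int(d) for d in m["dims"].split("x")),
    }


def dequantize(q, scale, offset):
    """Map a quantized integer back to its real value."""
    return scale * (q - offset)


if __name__ == "__main__":
    line = "[0] Dest: data0012.txt i8[S:0.0255 O:-128][0.000,6.505]<1 x 56 x 56 x 64>"
    info = parse_type(line)
    lo = dequantize(-128, info["scale"], info["offset"])
    hi = dequantize(127, info["scale"], info["offset"])
    print(info, lo, hi)
```

For the Dest tensor, the int8 limits -128 and 127 map to about 0.0 and 6.5, which agrees with the printed [0.000,6.505] range up to the rounding of the printed scale.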

mciprian13 on 29 May 2020

@mciprian13 If I want to get the values of BiasPre/BiasPost/BiasScale and OutPre/OutPost/OutScale, what should I do?

ybai62868 on 29 May 2020

@ybai62868 Put some printfs in the section of the code where they are computed (I showed you where), recompile Glow, and compile the model again.

mciprian13 on 29 May 2020

Another question about quantization: which kernel does Glow use by default, "libjit_quantized_conv2d_generic" or "libjit_channelwise_quantized_conv2d_generic"?

ybai62868 on 1 Jun 2020

The libjit_quantized_conv2d_generic kernel is used by default. The other kernel libjit_channelwise_quantized_conv2d_generic is used only if you specify the CLI flag enable-channelwise when compiling the bundle with model-compiler.

mciprian13 on 2 Jun 2020

@mciprian13 Thanks a lot. BTW, when I use libjit_quantized_conv2d_generic, where can I find the value of the parameter "depthUnroll"?

void libjit_quantized_conv2d_generic(
ElemTy *outW, const ElemTy *inW, const ElemTy *filterW,
const BiasElemTy *biasW, const dim_t *outWdims, const dim_t *inWdims,
const dim_t *filterWdims, const dim_t *biasWdims, const dim_t *kernelSizes,
const dim_t *strides, const dim_t *pads, dim_t group, int32_t outOffset,
int32_t inOffset, int32_t filterOffset, int32_t biasOffset, int32_t biasPre,
int32_t biasPost, int32_t biasScale, int32_t outPre, int32_t outPost,
int32_t outScale, unsigned depthUnroll, dim_t dilation)
I can find all of the other parameters in the function's parameter list, but I can't find where "depthUnroll" is defined.

ybai62868 on 2 Jun 2020

@ybai62868 It is set near the place where the derived quantization parameters are computed:
https://github.com/pytorch/glow/blob/349079dd18664214787517cbd0abd8241592404d/lib/LLVMIRCodeGen/LLVMIRGen.cpp#L1908

mciprian13 on 2 Jun 2020

@mciprian13 Thanks a lot. I now understand all of the parameters of libjit_quantized_conv2d_generic. However, it seems that I cannot use printfs in that section of the code to get the values of biasPost, biasScale, and so on, which I need to re-implement the convolution result in data0012.txt. Can you give me some suggestions? Thanks again.

ybai62868 on 2 Jun 2020

@mciprian13 Hi! Thanks for your help. I have successfully obtained the values of biasPost/biasPre/biasScale. Another question: in what order is the data stored in data0012.txt? The 3-D or 4-D tensor has to be flattened to 1-D, so which layout is used?

ybai62868 on 5 Jun 2020

@ybai62868 The data is stored in binary format, NHWC layout.
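A short sketch of what NHWC ordering implies for the flat index, with the channel index varying fastest. The helper names are hypothetical, and decode_int8 assumes the payload is raw int8 values with no header, which you should verify against your own dump files.

```python
import array


def nhwc_index(n, h, w, c, dims):
    """Flat offset of element (n, h, w, c) in a tensor with dims (N, H, W, C)."""
    _, height, width, channels = dims
    return ((n * height + h) * width + w) * channels + c


def decode_int8(raw):
    """Interpret raw bytes as signed 8-bit values (assumed headerless)."""
    return array.array("b", raw)


if __name__ == "__main__":
    dims = (1, 2, 2, 3)
    raw = bytes(range(12))          # stand-in for the contents of a dump file
    data = decode_int8(raw)
    print(data[nhwc_index(0, 1, 0, 2, dims)])
```

For the Dest tensor with dims <1 x 56 x 56 x 64>, consecutive values in the file would therefore be the 64 channels of one pixel, followed by the next pixel in the same row.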

mciprian13 on 5 Jun 2020

Hi @mciprian13, I want to get the value of this variable. I inserted cout statements into the following segment of the code, but I can't find any output for this variable when I compile Glow in /build using ninja all.
dim_t inChannels = inWdims[3];
dim_t outChannels = outWdims[3];
dim_t inCperG = inChannels / group;
dim_t outCperG = outChannels / group;
dim_t pad_t = pads[0];
dim_t pad_l = pads[1];
dim_t stride_h = strides[0];
size_t stride_w = strides[1];
size_t kernel_h = kernelSizes[0];
size_t kernel_w = kernelSizes[1];
// For each input in the batch:
for (size_t n = 0; n < inWdims[0]; n++) {
  // For each group of input channels:
  for (size_t g = 0; g < group; g++) {

    // For each output channel in the group. Process 'depthUnroll' output
    // layers together.
    for (size_t d = g * outCperG; d < (g + 1) * outCperG; d += depthUnroll) {
      // For each convolution 'jump' in the input tensor:
      ssize_t x = -(ssize_t)pad_t;
      for (size_t ax = 0; ax < outWdims[1]; x += stride_h, ax++) {
        ssize_t y = -(ssize_t)pad_l;
        for (size_t ay = 0; ay < outWdims[2]; y += stride_w, ay++) {
          int32_t sum[depthUnroll];

          // Debug output inserted for this question:
          std::cout << "depthUnroll:" << depthUnroll << std::endl;
          std::cout << "biasSum: " << std::endl;
          for (unsigned i = 0; i < depthUnroll; i++) {
            // Scale the bias to match the scale of the matrix multiplication.
            sum[i] = libjit_scale_i32i8((int32_t)biasW[d + i] - biasOffset,
                                        biasPre, biasPost, biasScale, 0);
            std::cout << sum[i] << std::endl;
          }
          std::cout << std::endl;
          std::cout << std::endl;


          // For each element in the convolution-filter:
          for (size_t fx = 0; fx < kernel_h; fx++) {
            for (size_t fy = 0; fy < kernel_w; fy++) {
              ssize_t ox = x + fx * dilation;

Do you know what I should do to see the output for this variable?

ybai62868 on 5 Jun 2020

@ybai62868 I think I was clear enough. Put printfs/couts in the following section of the code:
https://github.com/pytorch/glow/blob/952b1420610dff777fff4835b6d62557881fce1a/lib/LLVMIRCodeGen/LLVMIRGen.cpp#L1931-L1948
and NOT in the kernel itself.
Can you remind me what you want to do and why?

mciprian13 on 5 Jun 2020

Yes. I have successfully obtained the values of biasPre/biasPost/biasScale in LLVMIRGen, the way you told me.
I use the data0xxx.txt files and all of the parameters I extracted from LLVMIRGen, which include int32_t filterOffset, int32_t biasOffset, int32_t biasPre, int32_t biasPost, int32_t biasScale, int32_t outPre, int32_t outPost, and int32_t outScale, together with the functions in https://github.com/pytorch/glow/blob/22f0e125ce623f2c088cb272b55ae10612c151fe/lib/Backends/CPU/libjit/libjit_defs.h and https://github.com/pytorch/glow/blob/22f0e125ce623f2c088cb272b55ae10612c151fe/lib/Backends/CPU/libjit/libjit_conv.cpp
Even when I follow all of the equations and parameters in these files, I cannot reproduce the values I extracted from Glow (for example, for a conv layer). I want to know where I made the mistake.

ybai62868 on 5 Jun 2020

I use this Python code to reproduce gpu_0_res2_0_branch2a__5:
https://paste.ubuntu.com/p/hBNkgpHnm3/
Maybe this makes it clearer what I am trying to do.
@mciprian13 Thanks again.

ybai62868 on 5 Jun 2020

@ybai62868 So you want to reproduce the results in Python? Why is that of interest to you?

mciprian13 on 5 Jun 2020

Because I want to get a deep understanding of how Glow does the quantization ...

ybai62868 on 5 Jun 2020

@ybai62868 You can first try to reproduce the results using a simple C application. Extract the convolution kernel from libjit and call it with the data/parameters you extracted. After you reproduce the results in C, you can move to Python.
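Once the C version matches, a per-element Python model could look like the sketch below. The parameter names come from the kernel signature quoted earlier in the thread. The round-to-nearest term in scale_i32 and the int8 clipping are assumptions about what the libjit helpers do, so check them against libjit_defs.h before trusting the numbers.

```python
def scale_i32(value, pre, post, scale, offset, round_nearest=True):
    """((value >> pre) * scale) >> post, then add offset.

    ASSUMPTION: rounding to nearest before the post-shift; pass
    round_nearest=False to get the plain formula from this thread.
    """
    rtn = (1 << (post - 1)) if (round_nearest and post > 0) else 0
    return ((((value >> pre) * scale) + rtn) >> post) + offset


def clip_i8(value):
    """Saturate to the int8 range (assumed behaviour of the final clip)."""
    return max(-128, min(127, value))


def quantized_dot(inputs, filters, bias, p):
    """One output element of a quantized convolution.

    inputs/filters: the int8 values under the filter window, in the same order.
    p: dict holding the integer parameters named as in the kernel signature.
    """
    # Bring the bias into the accumulator's scale first.
    acc = scale_i32(bias - p["biasOffset"],
                    p["biasPre"], p["biasPost"], p["biasScale"], 0)
    # Accumulate offset-corrected products.
    for x, f in zip(inputs, filters):
        acc += (x - p["inOffset"]) * (f - p["filterOffset"])
    # Rescale the accumulator to the output scale and saturate.
    return clip_i8(scale_i32(acc, p["outPre"], p["outPost"],
                             p["outScale"], p["outOffset"]))


if __name__ == "__main__":
    params = {  # toy values, not taken from a real model
        "inOffset": -128, "filterOffset": 0, "biasOffset": 0,
        "biasPre": 0, "biasPost": 0, "biasScale": 1,
        "outPre": 0, "outPost": 1, "outScale": 1, "outOffset": -128,
    }
    print(quantized_dot([-118, -124], [2, 3], 5, params))
```

Comparing a handful of output elements this way, before running the whole tensor, makes it easier to tell whether a mismatch comes from the rescaling step, the offsets, or the data layout.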

mciprian13 on 5 Jun 2020

@mciprian13 Thanks a lot. I will try it in C++.

ybai62868 on 5 Jun 2020

@mciprian13 Hi, I have used C++ code to reproduce the quantized Conv2D in Glow. But I don't have a good understanding of QuantizationTransform32To8. It involves some shifts and a scale that I can't understand.
All of the code related to this part seems to be here:
https://github.com/pytorch/glow/blob/22f0e125ce623f2c088cb272b55ae10612c151fe/lib/Quantization/Base/Base.cpp#L180-L305

Can you give me some suggestions on how to understand this operation?

ybai62868 on 8 Jun 2020