In our previous quantization procedure, we forced the output of the following quantized node kinds to have the same quantization params as the input:
Kinded::Kind::LocalResponseNormalizationNodeKind
Kinded::Kind::SliceNodeKind
Kinded::Kind::ReshapeNodeKind
Kinded::Kind::TopKNodeKind
Kinded::Kind::GatherNodeKind
Kinded::Kind::MaxPoolNodeKind
However, as @tlepley-cadence suggested, this is generally an accuracy vs. speed tradeoff that should be left to the backend designer. We would like to evaluate those ops and leave some or all of the decision to the backends.
With the current approach of forcing the same scale/offset for both input and output at quantization time, we have sometimes seen this rule broken by later optimizations.
@tlepley-cadence @beicy I think that it would be a good idea to write a short proposal for the desired semantics. We need to decide on a set of rules and make sure that the optimizer and backends conform to these rules. Thierry mentioned that it would be a good idea to allow nodes to have different input and output types in the early stages of the pipeline and let the backends force equal quantization parameters. There could be other options. Could you come up with a proposal for the preferred semantics in the compiler?
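To make the "conform to these rules" part concrete, here is a minimal sketch of a check that could run after optimizations and report nodes whose output params no longer match the input. The `Kind`, `QuantParams`, and `Node` types and the `findBrokenSameParamsRule` function are hypothetical, simplified stand-ins for illustration, not the compiler's actual classes or API.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-ins for the compiler's node kinds and quantized types.
enum class Kind {
  LocalResponseNormalization,
  Slice,
  Reshape,
  TopK,
  Gather,
  MaxPool,
  AvgPool
};

struct QuantParams {
  float scale;
  int32_t offset;
};

struct Node {
  Kind kind;
  QuantParams in;
  QuantParams out;
};

// Returns the indices of nodes that are supposed to keep the same
// scale/offset on input and output but no longer do.
std::vector<std::size_t> findBrokenSameParamsRule(const std::vector<Node> &nodes) {
  // The kinds listed at the top of this issue; AvgPool is deliberately absent.
  static const std::set<Kind> mustMatch = {
      Kind::LocalResponseNormalization, Kind::Slice,  Kind::Reshape,
      Kind::TopK,                       Kind::Gather, Kind::MaxPool};

  std::vector<std::size_t> broken;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    const Node &n = nodes[i];
    const bool same =
        n.in.scale == n.out.scale && n.in.offset == n.out.offset;
    if (mustMatch.count(n.kind) != 0 && !same) {
      broken.push_back(i);
    }
  }
  return broken;
}
```

Whether such a check belongs in a verifier or in each backend depends on the semantics we end up choosing.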
@nadavrot Agreed. I haven't yet checked all the nodes for which we currently force the input and output to have the same quantization type. But we discussed avgpool and maxpool before and concluded that it is reasonable for maxpool to keep the same type, while avgpool may have a different type. I will investigate the rest of the nodes and come up with a proposal.
Right now the constraints basically match whatever the interpreter does.
Going in the direction of more flexibility on that front probably means we need a legalization step to tweak these constraints to match the actual backend we are targeting.
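As a rough illustration of what such a legalization step might look like, the sketch below asks a backend-supplied predicate which node kinds need matching params and rewrites only those nodes. Everything here (`Kind`, `QuantParams`, `Node`, `SameParamsPredicate`, `legalizeQuantParams`) is a hypothetical simplification, not an existing interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for the compiler's node kinds and quantized types.
enum class Kind { MaxPool, AvgPool, Reshape, Slice };

struct QuantParams {
  float scale;
  int32_t offset;
  bool operator==(const QuantParams &o) const {
    return scale == o.scale && offset == o.offset;
  }
};

struct Node {
  Kind kind;
  QuantParams input;
  QuantParams output;
};

// Hypothetical backend hook: true if the backend requires the node's output
// to share the input's scale/offset.
using SameParamsPredicate = std::function<bool(Kind)>;

// For every node the backend constrains, overwrite the output params with the
// input params. Returns the number of nodes that were changed.
// A real pass would probably also insert a rescale after the node so that
// consumers still see the params they expect; that part is omitted here.
std::size_t legalizeQuantParams(std::vector<Node> &nodes,
                                const SameParamsPredicate &requiresSame) {
  assert(requiresSame && "backend predicate must be callable");
  std::size_t changed = 0;
  for (Node &n : nodes) {
    if (requiresSame(n.kind) && !(n.output == n.input)) {
      n.output = n.input;
      ++changed;
    }
  }
  return changed;
}
```

A backend that only cares about maxpool could pass `[](Kind k) { return k == Kind::MaxPool; }`, while the interpreter could keep a predicate that matches its current constraints.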
Checked the list of nodes. For the nodes in this list:
Kinded::Kind::LocalResponseNormalizationNodeKind
Kinded::Kind::SigmoidNodeKind
Kinded::Kind::SliceNodeKind
Kinded::Kind::ReshapeNodeKind
Kinded::Kind::TanhNodeKind
Kinded::Kind::TopKNodeKind
Kinded::Kind::GatherNodeKind
Kinded::Kind::MaxPoolNodeKind
The output range is contained in the input range. Therefore, I don't think there is any accuracy loss if we force the output to have the same quantization params as the input. On the other hand, I don't think using different params would gain us any accuracy or performance.
Therefore, for the nodes in the list, I think it is OK to leave them as they are. For the rest of the nodes, which are not in the list, the backends can decide whether to force the same quantization params. This won't affect our current design.
Based on my previous comment, I think it is OK to close this issue. Please let me know if you have any suggestions! Thanks! @tlepley-cadence @nadavrot @qcolombet @rdzhabarov
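The maxpool case can be checked with a small sketch. It assumes a plain affine int8 scheme (round to nearest, then clamp); the helper names `quantize`, `dequantize`, and `quantizedMax` are made up for this example. Because this quantization is monotonic, taking the max on already-quantized values with the input's scale/offset gives the same result as quantizing the float max with those same params.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Affine int8 quantization: q = clamp(round(x / scale) + offset, -128, 127).
int8_t quantize(float x, float scale, int32_t offset) {
  assert(scale > 0.0f);
  long q = std::lround(x / scale) + offset;
  q = std::max<long>(-128, std::min<long>(127, q));
  return static_cast<int8_t>(q);
}

// Inverse mapping back to float.
float dequantize(int8_t q, float scale, int32_t offset) {
  return scale * static_cast<float>(static_cast<int32_t>(q) - offset);
}

// Max over one pooling window, computed on quantized values and reusing the
// input's scale/offset for the output.
int8_t quantizedMax(const std::vector<float> &window, float scale,
                    int32_t offset) {
  assert(!window.empty());
  int8_t best = quantize(window[0], scale, offset);
  for (float x : window) {
    best = std::max(best, quantize(x, scale, offset));
  }
  return best;
}
```

For any window, `quantizedMax(window, s, o)` equals `quantize(*std::max_element(window.begin(), window.end()), s, o)`, so nothing is lost by sharing the params. The same argument does not carry over to avgpool, where the average is generally not one of the input values.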