Glow: [RFC] isOpSupported() redesign?

Created on 18 Dec 2018  路  12Comments  路  Source: pytorch/glow

I'm not sure the isOpSupported() API makes sense as currently specified -- I think backends need to decide if an op is supported based on the expected precision of all inputs and outputs. Perhaps the API should take in the Node itself and not the Kind+ElemKind. That way the backend can look at all of the node's inputs/outputs.

Additionally, for backends that use the Instruction-level IR, I think we should consider adding a pass over the final Instructions with another version of isOpSupported() for Instructions. It's possible for a bug to exist during the compilation phases (lowering to other Nodes, and then lowering via IRGen, and many graph/instruction optimization passes) that results in the final Instruction list not being supported by the backend. This should be caught as early as possible during compilation and return a compilation error up the stack.

Most helpful comment

I believe we would want to go actually further in the redesign.

  1. Instead of returning a boolean, I think it would be interesting to return a cost, where UINT_MAX (or whatever) would mean not supported
  2. Instead of just an enum, I believe we want the actual glow type, so that we get the shape of the different operands as well.

The benefit of going all the way down to those details is that this API could be used for ONNXIFI and the partitioner.

Anyway, at the very least, I would expect these two use cases to share some logic and I wanted to point it out.

cc: @beicy.

All 12 comments

Definitely agree with the first point, the current API doesn't fully make sense to me because I doubt that op compatibility can be decided from a single ElemKind in all cases. The ONNXIFI interface currently just passes FloatTy to isOpSupported() instead of inspecting each ElemKind which isn't great. Like you said, maybe the op should be passed, maybe just a set of all ElemKinds in the op could be passed and isOpSupported() could just check to see that it supports all of them?

To your second point, it sound like a good idea to catch issues that could occur during compilation but this raises a point that it will be pretty difficult to look at a graph (like one passed from onnxifi) and decide if the backend will support all ops in the graph without actually compiling to low level IR and checking with this new isOpSupported(). It would be ideal if we could do enough checks on the high level IR to be confident that the low level IR will run on the backend without issue but that might not be possible.

maybe just a set of all ElemKinds in the op could be passed and isOpSupported() could just check to see that it supports all of them

I still think it's necessary to check what the precision/types of all inputs/outputs are for each op. E.g. for TopK, we have two outputs: Values with its associated supported types (FloatTy, Int8Ty, etc.), and Indices with its associated supported types (Int32ITy, Int64ITy, etc.). So a single set isn't sufficient.

It would be ideal if we could do enough checks on the high level IR to be confident that the low level IR will run on the backend without issue but that might not be possible.

Yeah, I think that's what we should strive for. Like I said I would consider it more of a bug in the lowering/IRGen/optimization stack if the backend says it supports the high-level Nodes, but then says it cannot support the compiled into low-level Instructions. And right now we don't check this and would only find out during inference execution -- e.g. in the Interpreter, if we attempt to execute an Instruction that has no kernel implementation, we'll hit an assertion error in InterpreterFunction::execute(). Or if we attempt to execute a kernel with unsupported types, then we would hit an assertion error inside a kernel's implementation when getting a handle of a Tensor with an incorrect type.

An additional wrinkle here is that we also use this API to determine if a node can be quantized on this backend, so we might not have a physical node to pass in. Maybe we actually need two distinct APIs here; and isOpSupported, and a canBeQuantized. Of course that doesn't necessarily cover the (pretty common) case of a node that _must_ be quantized, like GEMMs compiled for an int8 matrix multiplier.

Totally, original idea of isOpSupported was to invoke that during the quantization phase and make that slightly more generic rather than just canBeQuantized.
But it seems that it's not generic enough to cover all use cases, including compatibility mode for ONNXIFI.

I agree we should introduce isOpSupported with the node passed in which will be used in the ONNXIFI code path and canBeQuantized with the specified quantization precision needed for graph transformation/quantization.

I still don't have an opinion for what the right solution here, but I want to share some ideas that we talked about in the past.

isOpSupported() is a form of 'legalization'. Our input neural network may contain a nodes that are not supported (legal) on this hardware and the compiler has to translate the graph to a legal set of opcodes (legalize it).

The quantizer, optimizer and lowering phases need to ask the backend if it can handle some operation. This happens both when inspecting an existing node and before creating a new node. So, in theory the API that accepts some InstructionKind and ElementKind should be preferable. Except, that some operators don't fit into this API. For example, conversion nodes that convert between Float to Ints don't fit naturally because you need to select either Float or Int as the element type that you pass in. BTW, LLVM also uses the enum-based design and they also have issues with the strange conversion nodes.

@jfix71 , do you have an example of where the current API is problematic? Is it related to conversion nodes or other nodes?

isOpSupported() is a form of 'legalization'. Our input neural network may contain a nodes that are not supported (legal) on this hardware and the compiler has to translate the graph to a legal set of opcodes (legalize it).

In this case, if something is not supported on the hardware directly but we know compiler will make sure this op will be transformed in such a way it could be supported by HW we should just flag op as supported. The idea of the API is to let know if we'll be able to execute given op with given input/output types on hardware, whether it's directly or through some compiler transformation it's not important.

What would we consider as an ElemType in case of isOpSupported() for nodes with multiple inputs.

do you have an example of where the current API is problematic?

For example, Gather operator (several inputs) has indices which might be int32 or int64. Up until recently we had support for only int64, but were not able to communicate that int32 is not supported on ONNXIFI level.

For canBeConverted(), there are also cases where we need to know the different ElemKind for each input. For example, we currently only support Int32QTy for the bias in the Interpreter's Conv kernel, while the other operands must be Int8QTy. And BatchedAdd supports both Int8QTy and Int32QTy for slice, while other operands must be Int8QTy. Etc.

So to me it seems the only difference between canBeConverted() and isOpSupported() is that isOpSupported() is used with a Node that already exists, no? If so, then I think we could keep them as a single API, with signature:

bool isOpSupported(Kinded::Kind opKind, 
                   ArrayRef<ElemKind> inElemKinds,
                   ArrayRef<ElemKind> outElemKinds);

Then if you want to call isOpSupported() with a Node that already exists then just get the inElemKinds/outElemKinds from the Node and pass them in (could be done with a helper function).

One downside with the above proposed API that only uses ElemKinds is that you have to know the index of the input/output you'd like to check in inElemKinds/outElemKinds. E.g. to check if a Conv's bias is some supported ElemKind you need to know which idx it is in the ConvNode to get the right ElemKind from inElemKinds. We have a few places where we do hardcode these input/output indices, e.g. to use with setNthInput() etc., but it's bad practice.

I think we would be better off having these indices autogenerated regardless of if we use the above API. I.e. have the NodeBuilder generate an enum in each Node for its input/output indices -- so e.g. we'd have ConvolutionNode::BiasIdx to use with setNthInput(), or in inElemKinds, etc.

I believe we would want to go actually further in the redesign.

  1. Instead of returning a boolean, I think it would be interesting to return a cost, where UINT_MAX (or whatever) would mean not supported
  2. Instead of just an enum, I believe we want the actual glow type, so that we get the shape of the different operands as well.

The benefit of going all the way down to those details is that this API could be used for ONNXIFI and the partitioner.

Anyway, at the very least, I would expect these two use cases to share some logic and I wanted to point it out.

cc: @beicy.

Yet one more weird wrinkle here, which @jackm321 and I were discussing earlier today: we could have a situation where an op is unsupported, but is frequently optimized away, so we maybe don't want onnxifi to give up on it (I hit this with batchnorm, although there's no reason I _couldn't_ implement batchnorm, so maybe it's not a fundamental problem.)

@bertmaher Don't we lower the graph here before checking if it's supported? I wouldn't expect you to need to support batchnorm since it's lowered by default, unless you don't support some of its lowered nodes.

But yeah there is a potential issue with the optimizations. Perhaps we should call ExecutionEngine::optimizeFunction() on the graph instead of just lowering it. Of course if we were to call it again later on during compilation it would replicate a lot of work, so perhaps we could keep the optimized function around if we're going to use it? Not sure how feasible that is with the current design.

Still, I think this issue is a bit orthogonal to redesigning isOpSupported().

@qcolombet Good idea! I think we should also have a cost function for a subgraph, since e.g. we could we could see a Conv followed by a ReLU, and they have some cost individually, but some architectures could fuse the ReLU into the Conv at no cost. Though if we call ExecutionEngine::optimizeFunction() prior to checking this function then perhaps it would be fused anyway. But still, perhaps the subgraph cost function could call into the new isOpSupported() to help calculate the overall cost of the subgraph.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

rdzhabarov picture rdzhabarov  路  4Comments

qcolombet picture qcolombet  路  5Comments

s-peryt picture s-peryt  路  3Comments

georgeokelly picture georgeokelly  路  4Comments

tkclimb picture tkclimb  路  4Comments