Glow: Support for optional inputs/operands in nodes/instructions?

Created on 5 Jun 2019  路  13Comments  路  Source: pytorch/glow

Glow generally tries to limit the amount of mandatory nodes that need to be supported by each backend. This allows for faster bootstrapping of new backends. And it limits the complexity.

There are some nodes which can be expressed in terms of other nodes, but most backends would like to use a different form of them:

  • For example, there is FullyConnectedNode and MatMulNode. The difference is that MatMuNodel does not have a bias parameter. Typically, Glow would lower FullyConnectedNode into MatMulNode followed by a BatchedAddNode.

    • Another example would be the SparseLengthsWeightedSum node. SparseLengthsSum operation would be always lowered by the loader into SparseLengthsWeightedSum where the weights are all 1.0. This way Glow doesn't need to define a dedicated SparseLengthsSum node. (in reality there are 3 different SparseLengthsWeightedSum-based operations and thus Glow saves on definitions of 3 non-weighted variants of these operations). But some backends actually want to be able to use a non-weighted version SparseLengthsSum kernels because they are more efficient. To do that, the backends may need to define backend-specific nodes or instructions to represent those non-weighted versions of the operation and eventually introduce custom graph transforms or IRGen changes (see e.g. #3039 for a discussion about this topic).

One of the problems with these examples is the fact that essentially one of the inputs/operands (e.g. bias in FC or weights in SLWS) is optional in the alternative versions of these operations. E.g. FC without a bias becomes MatMul. And SparseLengthsWeightedSum without weights becomes SparseLengthsSum. But it is currently impossible to express an optional input/operand and thus a new node or instruction is defined today.

What if Glow would allow for a definition of optional inputs/operands? This idea was mentioned today in discussions with @bertmaher and @artemrakhov about these topics, so I decided to capture it in this issue.

With optional inputs/operands, the number of nodes/instructions definitions does not need to be increased in many cases, but some special handling of these optional inputs/operands needs to be introduced.

Specifically, the major changes would be something like this:

  • We could imagine that NodeGen/InstrGen could mark such operands in a special way to indicate that they can be optional and then the auto-generated code for nodes and instructions would have assertions to check of all the mandatory inputs/operands are provided, but would relax those checks for optional inputs/operands.
  • Of course, all the places working with such inputs/operands would need to be inspected/updated to be prepared to work with a missing input/operand. We have some examples of similar places already, e.g. those ones working with Concat nodes and the like, who have essentially a variadic number of operands.

Most helpful comment

Also cc @mortzur who suggested that SLWS and SLS could be represented as a single optionally-weighted node type.

All 13 comments

Also cc @mortzur who suggested that SLWS and SLS could be represented as a single optionally-weighted node type.

I'd like to add, that optional operands will not complicate existing loaders. And backends, that don't support non-weighted SLS, can remain unchanged.

We can have loaders always adding Splat(1) for weights, as they are now. And then we'll have a separate optimization pass, which would pattern-matches weights=1 and (if backends support it) replace weights with NULL.

I like this idea. Some random questions/thoughts:

  • If a Node has an optional input, what do we return from Node::getNumInputs()? This seems like it could be confusing whether or not we count optional inputs. IMO we should still return maximum number of inputs, i.e. not subtract off optional inputs if they aren't present. Otherwise things would get super messy if we treat Nodes with optional inputs like they have variadic number of inputs like Concat has.

  • I don't really like the idea of just setting an input to be a nullptr if it's optional and not present/used. IMO it would be safest to make optional inputs explicitly different from required inputs. Perhaps we could use something similar to llvm::Expected<NodeValue> to return these optional inputs, so we are forced to update all locations where optional inputs are used, and it's very clear what's going on.

  • This change is going to add more complexity/cognitive load in reasoning about our IR, almost definitely resulting in new bugs as we transition things, and as we write new code at least in the short/medium term as we adjust to this new functionality.

    • The upside here of not needing many new Nodes is probably worth the pain right now anyway. But I think we should try out a beta version of this on a Node and think really hard about what subtle issues we might be overlooking.

I'm having the same situation with MaxPool, the tradeoff is between having more kinds nodes or making sure that every backend is aware of optional inputs/outputs and can handle them correctly. Having clearly defined mechanisms for optionals would definitely solve this.

@jfix71

If a Node has an optional input, what do we return from Node::getNumInputs()? This seems like it could be confusing whether or not we count optional inputs. IMO we should still return maximum number of inputs, i.e. not subtract off optional inputs if they aren't present. Otherwise things would get super messy if we treat Nodes with optional inputs like they have variadic number of inputs like Concat has.

+1. I was thinking along the same lines, so we are on the same page.

I don't really like the idea of just setting an input to be a nullptr if it's optional and not present/used. IMO it would be safest to make optional inputs explicitly different from required inputs. Perhaps we could use something similar to llvm::Expected to return these optional inputs, so we are forced to update all locations where optional inputs are used, and it's very clear what's going on.

Sounds reasonable. But I don't have a strong opinion on this.

To piggy back on this discussion a bit, we also have the need for nodes with a variable number of outputs.

@jfix71

Per your suggestion, Node::getNthInput should return an optional as one of the inputs may be optional. And for specific nodes,get<NamedInput> would return an optional if this corresponding input is declared as optional.

But what do you suggest for Instructions? We have llvm::ArrayRef<Operand> Instruction::getOperands(). Should it return llvm::ArrayRef<std::Optional<Operand>>? Or maybe the type should stay unchanged but the returned Operand looks special, e.g. it may have a Value * which is nullptr or points to a special sentinel value? Do you have any other alternatives to consider?

What is the main purpose of minimizing the number of Glow node types?
It seems it should be minimizing duplicated code in order to avoid errors-duplication and make maintenance easier.

This creates the motivation to have a single umbrella node which has different flavors according to provided input arguments.
All of the input arguments can be listed as fields in the umbrella node type class.
This also means that the argument-dependent inputs/outputs are also represented as fields in this class. This way verification and counting the number of input seem simple using the provided arguments.

At the other end of the process we should make sure that redundant structs are not allocated in the backend IR / nodes and that the specific correct kernel is selected. This means that the backend should query about the specific flavor of the op.

What would be the cons for this approach?

@opti-mix @jfix71 @bertmaher

Isn't the case of FC/MatMul a little bit different from the SLS/SLWS (or maxPool with argmax)?
Adding a bias translates to 'Add' op which is a 'building block' op in Glow's language. This is different from 'weights' in SLS which cannot be easily separated to 2 'building block' ops following each other.

Isn't the case of FC/MatMul a little bit different from the SLS/SLWS (or maxPool with argmax)?
Adding a bias translates to 'Add' op which is a 'building block' op in Glow's language. This is different from 'weights' in SLS which cannot be easily separated to 2 'building block' ops following each other.

If we would support optional inputs, we could get rid of MatMul and have only FC (I'm not saying that is is necessarily a very good idea). MatMul would be then modeled as FC without bias (or with a zero bias). Thus you wouldn't be able to express FC as a composition of two building block operations, which makes it similar to the SLS example.

In general, this issue has something to do with the following question:

  • On the one hand, we try to keep the number of nodes/instructions as small as possible (especially if nodes/instructions are very similar), as it introduces less code redundancy, makes adding new backends easier, etc. It is a smaller surface to be covered.
  • On the other hand, by introducing more complex nodes and instructions, e.g. with optional inputs or outputs, with some custom attributes depending on other inputs and so on, we increase the complexity of handling such instructions in the compiler. We may need now to check what is present and what is not present and special case the handling accordingly. This makes the code more complex and also increases the amount of code to be written to handle such nodes/instructions.

It seems like there is always a trade off in such cases and we should in each case try to see if pros outweigh the cons.

IMHO, overall, we should try to keep nodes/instructions simple unless there are good reasons to consider introducing more complex forms of nodes/instructions (e.g. significant reduction of a boiler-plate code in the compiler, simplification of some passes, etc).

From pure infrastructure / code complexity point of view, there is another solution.

We can introduce a parameter: isWeighted. It is a statically known member, not input. If isWeighted=false, then we don't care what is Weights input. It could be anything, e.g. scalar tensor, or something special. All verification checks will be disabled.
Backend is free to do whatever it wants with it: e.g. can say that isWeighted=false is not supported, and we'll have optimization pass to turn isWeighted=false to true and insert fake Splat(1) of correct size.

With that solution, we won't need special NULL-handling logic. No need to support optional inputs, and write complicated compiler logic. All that weirdness will be hidden inside SLWS node and not spread across compiler.

Disadvantage is O(1) bytes of wasted memory for dummy Weights, and overall hacky logic of having garbage-placeholder.

@artemrakhov Yeah I'm warming up to this sort of idea. I think we need to do something special to handle the problem mentioned in https://github.com/pytorch/glow/pull/3039#issuecomment-498791933 though. Perhaps this could be the idea @opti-mix mentioned previously, of introducing a special "Optional" Node and using it as the input. I think it will end up being more elegant and less painful than requiring Nodes to return an llvm::Optional or nullptr etc. when getting an input.

In this design, in NodeGen we could also specify whether an input can be an Optional, and still have the "pass to turn isWeighted=false to true and insert fake Splat(1) of correct size". This would be specific for each Node, e.g. for SLWS -> SLS it would be Weights as Splat with Value 1.0, for MM -> FC it would be Bias as Splat with Value 0.0, etc. And we could add a verification pass to the Graph that checks that OptionalNodes are only used as inputs for Nodes that have specified an input is Optional.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

rdzhabarov picture rdzhabarov  路  4Comments

tlepley-cadence picture tlepley-cadence  路  4Comments

s-peryt picture s-peryt  路  3Comments

opti-mix picture opti-mix  路  4Comments

rdzhabarov picture rdzhabarov  路  4Comments