Glow: [Glow Runtime] Top Level Task

Created on 17 Nov 2018 · 1 Comment · Source: pytorch/glow

This is the top-level issue tracking all the work we plan to do so that the Glow runtime supports concurrent execution, pipelining, batching, and so on.

At a high level, the idea for the runtime is to be able to:

  • Enqueue inputs: Run input0, then run input1 as soon as the previous run is done, etc.
  • Slice the inputs into batches and run them transparently: Take N inputs and run them sequentially in batches of M (where M is the batch size of the compiled model and N is the actual run size).
  • Pipeline work across models: Run input1 on model M1, then run the result of M1 on M2 while running input2 on M1, etc.
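
The batching bullet above can be illustrated with a small sketch. It is written in Python for brevity and is not Glow's C++ API: the names `split_into_batches`, `run_all`, and `run_batch`, as well as the choice of `0` as the padding value, are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple


def split_into_batches(inputs: Sequence, batch_size: int,
                       pad_value=0) -> List[Tuple[list, int]]:
    """Split N inputs into batches of exactly `batch_size` (M).

    The last batch is padded with `pad_value` (an assumed placeholder).
    Each entry is (batch, valid_count) so padded results can be dropped.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = []
    for start in range(0, len(inputs), batch_size):
        chunk = list(inputs[start:start + batch_size])
        valid = len(chunk)
        chunk.extend([pad_value] * (batch_size - valid))
        batches.append((chunk, valid))
    return batches


def run_all(inputs: Sequence, batch_size: int,
            run_batch: Callable[[list], list]) -> list:
    """Run every batch sequentially and discard outputs of padded slots."""
    results = []
    for batch, valid in split_into_batches(inputs, batch_size):
        outputs = run_batch(batch)
        results.extend(outputs[:valid])
    return results
```

Here `run_batch` stands in for one run of the compiled model; the caller passes N inputs and gets N outputs back without seeing the padding.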

Among other things, the glow runtime will have to:

  • Manage input/output queues for each model (and communication with the devices)
  • Manage incoming models
  • Keep track of data dependencies and schedule next tasks to be done
  • Split inputs
  • Pad inputs
  • Dispatch workloads to devices
  • Keep track of the status of devices

Also, somewhat orthogonal to the runtime, but related, Glow will need to:

  • Determine what to run and where to run it (graph partitioning)
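
As a rough illustration of the per-model queues and of pipelining work across models (input2 runs on M1 while M1's result for input1 runs on M2), here is a minimal Python sketch using one worker thread per model. The names `stage_worker` and `run_pipeline` are hypothetical, and error handling is left out: a model that raises would stall this sketch.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the input stream


def stage_worker(model, in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Pull items from in_q, run the model, and push results to out_q."""
    while True:
        item = in_q.get()
        if item is _STOP:
            out_q.put(_STOP)  # forward shutdown to the next stage
            return
        out_q.put(model(item))


def run_pipeline(models, inputs) -> list:
    """Chain the models with queues so that the stages overlap in time."""
    queues = [queue.Queue() for _ in range(len(models) + 1)]
    workers = [
        threading.Thread(target=stage_worker,
                         args=(model, queues[i], queues[i + 1]))
        for i, model in enumerate(models)
    ]
    for worker in workers:
        worker.start()
    for item in inputs:
        queues[0].put(item)
    queues[0].put(_STOP)

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

With a single worker per stage and FIFO queues, results come out in input order.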

As a first step, we are properly splitting the compilation and runtime stages.
This work is tracked in:

#2040, #1967, #1953, #1951

Most helpful comment

Adding #2125 to the list. Work is being done on the HostManager. This is part of the new runtime design. The design has five major components: HostManager, Partitioner, Provisioner, DeviceManager, and Executor.

Partitioner:
This component is responsible for breaking up the provided network into subnetworks that can be run on multiple devices. It does its partitioning based on hardware constraints and heuristics to optimize execution time. It outputs a DAG to be used by the other components.
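
To make the idea concrete, here is a deliberately simple Python sketch of partitioning under one hardware constraint. It is not the actual Partitioner: it assumes the nodes arrive in topological order, uses per-device memory as the only constraint, and packs nodes greedily. The name `partition` and the data layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def partition(nodes: List[Tuple[str, int]],
              edges: List[Tuple[str, str]],
              capacity: int) -> Tuple[List[List[str]], Dict[int, Set[int]]]:
    """Greedily pack topologically ordered nodes into subnetworks.

    nodes:    (name, memory) pairs, assumed to be in topological order
    edges:    (producer, consumer) pairs between node names
    capacity: memory available on one device (the assumed constraint)

    Returns the subnetworks and a DAG mapping each subnetwork index to
    the set of subnetwork indices it depends on.
    """
    parts: List[List[str]] = []
    owner: Dict[str, int] = {}
    used = 0
    for name, mem in nodes:
        if mem > capacity:
            raise ValueError(f"node {name} does not fit on any device")
        if not parts or used + mem > capacity:
            parts.append([])  # start a new subnetwork
            used = 0
        parts[-1].append(name)
        owner[name] = len(parts) - 1
        used += mem

    dag: Dict[int, Set[int]] = {i: set() for i in range(len(parts))}
    for src, dst in edges:
        if owner[src] != owner[dst]:
            dag[owner[dst]].add(owner[src])
    return parts, dag
```

Because each subnetwork is a contiguous slice of a topological order, dependencies only point from later subnetworks to earlier ones, so the result is acyclic.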

DeviceManager:
The DeviceManager handles interactions with the device. The manager handles initializing the device, copying constants to the device and preparing the device for execution. It also handles unloading networks from the device.
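
The responsibilities listed here (initialize, load, run, unload) can be sketched as a small class. This is a Python illustration only and not the real DeviceManager interface: the class name, method names, and the memory-accounting model are all assumptions.

```python
class DeviceManagerSketch:
    """Hypothetical stand-in for a per-device manager."""

    def __init__(self, device_id: int, memory_bytes: int):
        self.device_id = device_id
        self.memory_bytes = memory_bytes
        self.initialized = False
        self.loaded = {}  # network name -> (constants size, callable)

    def init(self) -> None:
        # A real implementation would open and configure the device here.
        self.initialized = True

    def available_memory(self) -> int:
        return self.memory_bytes - sum(size for size, _ in self.loaded.values())

    def load_network(self, name: str, constants_bytes: int, fn) -> None:
        """Account for the constants copied to the device and keep the function."""
        if not self.initialized:
            raise RuntimeError("device not initialized")
        if name in self.loaded:
            raise ValueError(f"network {name} already loaded")
        if constants_bytes > self.available_memory():
            raise MemoryError(f"network {name} does not fit on device")
        self.loaded[name] = (constants_bytes, fn)

    def unload_network(self, name: str) -> None:
        del self.loaded[name]  # raises KeyError if the network is not loaded

    def run_network(self, name: str, inputs):
        return self.loaded[name][1](inputs)
```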

Provisioner:
The Provisioner takes the output from the Partitioner and assigns subnetworks to specific devices, updating the DAG with device assignments. The Provisioner handles the code generation part of compilation and calls into the DeviceManager to load the subnetworks onto the device.
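
The device-assignment step might look like the following Python sketch. It assumes a first-fit policy based on free device memory, which the issue does not specify, and it omits code generation and loading entirely. The function name `provision` and the dictionary layout of the DAG are hypothetical.

```python
from typing import Dict


def provision(dag: Dict[str, dict], device_memory: Dict[int, int]) -> Dict[str, dict]:
    """Annotate each subnetwork in the DAG with a device id (first fit).

    dag:           subnetwork name -> {"size": int, "deps": [names]}
    device_memory: device id -> free memory on that device
    """
    free = dict(device_memory)  # do not mutate the caller's view
    for name, node in dag.items():
        for device_id, available in free.items():
            if node["size"] <= available:
                node["device"] = device_id
                free[device_id] = available - node["size"]
                break
        else:
            raise MemoryError(f"no device can hold subnetwork {name}")
    return dag
```

A real provisioner would then compile each subnetwork for its assigned device and ask that device's manager to load it.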

Executor:
The Executor handles the execution of the network. It walks the DAG, executing each subnetwork once the subnetworks it depends on have finished.
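
Walking a DAG in dependency order is commonly done with Kahn's algorithm; the Python sketch below uses it as a stand-in, since the issue does not say how the Executor orders its work. The function name `execute` and the shape of `dag` and `run_fns` are assumptions, and everything runs sequentially on the host.

```python
from collections import deque
from typing import Callable, Dict, List


def execute(dag: Dict[str, List[str]],
            run_fns: Dict[str, Callable[[list], object]],
            network_input) -> Dict[str, object]:
    """Run each subnetwork once all of its dependencies have produced output.

    dag:     subnetwork name -> names of the subnetworks it depends on
    run_fns: subnetwork name -> callable taking a list of input values
    """
    pending = {name: len(deps) for name, deps in dag.items()}
    children: Dict[str, List[str]] = {name: [] for name in dag}
    for name, deps in dag.items():
        for dep in deps:
            children[dep].append(name)

    ready = deque(name for name, count in pending.items() if count == 0)
    outputs: Dict[str, object] = {}
    while ready:
        name = ready.popleft()
        deps = dag[name]
        # Root subnetworks read the network input; others read dependency outputs.
        inputs = [outputs[dep] for dep in deps] if deps else [network_input]
        outputs[name] = run_fns[name](inputs)
        for child in children[name]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(outputs) != len(dag):
        raise ValueError("dependency cycle detected")
    return outputs
```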

HostManager:
The HostManager is the container for the other components. It serves as the external interface, handling network init and run requests. The HostManager routes each request through the other components and holds the DAG for each network.
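
The routing role can be sketched as a thin container that owns the per-network DAGs and delegates to injected components. This is a Python illustration, not the HostManager from #2125: the class name, the `add_network` and `run_network` methods, and the callable signatures are all hypothetical.

```python
class HostManagerSketch:
    """Hypothetical container that routes requests through the components."""

    def __init__(self, partitioner, provisioner, executor):
        # Each component is injected as a plain callable for this sketch:
        #   partitioner(network) -> dag
        #   provisioner(dag)     -> dag annotated with device assignments
        #   executor(dag, input) -> result
        self._partitioner = partitioner
        self._provisioner = provisioner
        self._executor = executor
        self._dags = {}  # network name -> provisioned DAG

    def add_network(self, name, network) -> None:
        """Handle a network init request: partition, provision, store the DAG."""
        if name in self._dags:
            raise ValueError(f"network {name} already added")
        dag = self._partitioner(network)
        self._dags[name] = self._provisioner(dag)

    def run_network(self, name, inputs):
        """Handle a run request by handing the stored DAG to the executor."""
        if name not in self._dags:
            raise KeyError(name)
        return self._executor(self._dags[name], inputs)
```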
