Ignite: Ignite on multiple GPUs

Created on 11 Jun 2020 · 2 Comments · Source: pytorch/ignite

The Ignite quickstart is an exciting guide that shows the essentials of defining and training a simple model. But the provided examples that use multi-GPU training do not seem to follow the same simplicity.

Would it be difficult to do something as shown in the code snippet below?

trainer.run(train_loader, max_epochs=100, gpus=[0,1,2,3])

Is there a tutorial that shows how to train a model in a multi-GPU environment using Ignite in a similarly simple way?

question

Most helpful comment

@Ceceu thanks for asking! Currently, in the stable v0.3.0 release, we rely only on the native torch distributed API. An example of that can be found here. The user needs to manually set up the distributed process group, wrap the model with nn.parallel.DistributedDataParallel, and execute the script with the torch.distributed.launch tool, or use mp.spawn...

However, we aim to simplify this by providing a helper API that works on GPUs, TPUs, etc.
The API is still experimental and will be available with v0.4.0 (probably released next week).

The nightly version already provides part of the newer API, idist: https://pytorch.org/ignite/distributed.html#ignite-distributed

For a complete example of the newer API, please check out the same cifar10 example in the parallel_api branch.

HTH

All 2 comments

@vfdev-5,
This is great news.
