Hi @vfdev-5 ,
I found that Accuracy and several other metrics support all_reduce for distributed training, but it seems ROC_AUC doesn't have similar logic?
https://github.com/pytorch/ignite/blob/v0.3.0/ignite/contrib/metrics/roc_auc.py
As the comment says:
https://github.com/pytorch/ignite/blob/v0.3.0/ignite/metrics/epoch_metric.py#L20
How can I accumulate all the predictions and labels from all processes and compute AUC?
Thanks.
Hi @Nic-Ma
This is related to this issue : https://github.com/pytorch/ignite/issues/978 and what is required is to gather all stored data across the processes with all_gather. As #978 is hi-pri, we'll try to resolve it ASAP.
@Nic-Ma the change should look like this one : https://github.com/pytorch/ignite/pull/1229/files#diff-33e0dd900637729de926044f704c6065
EDIT: to reduce computations on CPU we can apply compute_fn only on a single rank and broadcast the result to all processes.
Hi @vfdev-5 ,
Thanks very much for sharing!
I will try to modify the AUC metrics in MONAI based on your patch.
Hi @vfdev-5 ,
I checked your PR, but it seems there is no ignite.distributed module in ignite v0.3.0.
So I wanted to use the native PyTorch APIs and submitted a draft PR: https://github.com/Project-MONAI/MONAI/pull/870.
I haven't used torch.distributed.gather before. Do you know where I can find a quick example to complete the ROCAUC?
Or maybe you can show me some sample code to replace the idist.all_gather in your PR?
Thanks in advance.
Hi @Nic-Ma ,
> I checked your PR, but seems there is no ignite.distributed module in ignite v0.3.0.
That's true, idist was introduced in v0.4.0.
> maybe you can show me some sample code to replace the idist.all_gather in your PR?
Thinking more about the implementation, there are some points to improve in the example I showed you. The idea is the following, in the compute method:
Proc:        p1                         p2
Data:   [d11....d1N]               [d21...d2N]
--- all gather ---
Data:   [d11...d2N]                [d11....d2N]
--- apply compute_fn ---
Proc:        p1                         p2
Data:   compute([d11...d2N]) -> res      -
--- broadcast ---
Proc:        p1                         p2
Data:       res                        res
Documentation on all_gather is here : https://pytorch.org/docs/stable/distributed.html#torch.distributed.all_gather
An example with the PyTorch API:
import torch
import torch.distributed as dist

tensor = ...  # data stored on the current process
# create placeholder to collect the data from all processes:
output = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
# run all gather
dist.all_gather(output, tensor)
# concat output list into tensor
output = torch.cat(output, dim=0)
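Putting the three steps from the diagram together (all gather, compute on one rank, broadcast) could look roughly like the sketch below. It is only an illustration under some assumptions: `binary_auc`, `gather_all`, `distributed_auc` and `run_single_process_demo` are hypothetical names, the pairwise AUC is a simple stand-in for whatever `compute_fn` the metric really uses, and every rank is assumed to hold the same number of samples so that the `zeros_like` placeholders have the right shape. The demo creates a one-process gloo group just so the snippet can run on its own.

```python
import os
import tempfile

import torch
import torch.distributed as dist


def binary_auc(scores: torch.Tensor, labels: torch.Tensor) -> float:
    """Stand-in compute_fn: fraction of (positive, negative) pairs ranked
    correctly, with ties counted as 0.5. Memory is O(n_pos * n_neg)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos.unsqueeze(1) - neg.unsqueeze(0)
    correct = (diff > 0).float().sum() + 0.5 * (diff == 0).float().sum()
    return (correct / (pos.numel() * neg.numel())).item()


def gather_all(tensor: torch.Tensor) -> torch.Tensor:
    """Collect `tensor` from every process and concatenate along dim 0.
    Assumes the tensor has the same shape on every rank."""
    output = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(output, tensor)
    return torch.cat(output, dim=0)


def distributed_auc(scores: torch.Tensor, labels: torch.Tensor) -> float:
    # Step 1: every process receives the full set of predictions and labels.
    all_scores = gather_all(scores)
    all_labels = gather_all(labels)
    # Step 2: only rank 0 applies the compute function.
    result = torch.zeros(1, dtype=torch.float64, device=scores.device)
    if dist.get_rank() == 0:
        result[0] = binary_auc(all_scores, all_labels)
    # Step 3: rank 0 sends the result to all other processes.
    dist.broadcast(result, src=0)
    return result.item()


def run_single_process_demo() -> float:
    """Run the pipeline in a world of size 1 (gloo backend, file-based init)."""
    init_file = os.path.join(tempfile.mkdtemp(), "dist_init")
    dist.init_process_group(
        "gloo", init_method=f"file://{init_file}", rank=0, world_size=1
    )
    try:
        scores = torch.tensor([0.1, 0.4, 0.35, 0.8])
        labels = torch.tensor([0, 0, 1, 1])
        return distributed_auc(scores, labels)
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print(run_single_process_demo())
```

In a real multi-process run the process group would already be initialized by the launcher, so only `distributed_auc` would be called from the metric's compute method. If ranks can hold different numbers of samples, the tensors would need to be padded to a common length before `all_gather`.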
PS: would you like some support to migrate MONAI to v0.4.X of ignite ?
Hi @vfdev-5 ,
Thanks so much for your detailed explanation!
I updated my PR according to your sample code.
We made many hacks for engine.state based on ignite v0.3.0, so it's not very straightforward to upgrade to v0.4.0.
But I want to upgrade the ignite version once I have finished all the recent MONAI v0.3 tasks (DDP, AMP, FL, IO, etc.).
Maybe once you release v0.4.2, all the distributed training features will be ready and it will be a good time for us to upgrade.
Thanks.
> Maybe when you guys released v0.4.2, all distributed training features are ready and it's a good time for us to upgrade.
@Nic-Ma sounds good !
@Nic-Ma what are your plans about FL support ?
We would be interested to provide some interfaces to that too...
> @Nic-Ma what are your plans about FL support ?
> We would be interested to provide some interfaces to that too...
Hi @vfdev-5 ,
As a first step, we will add some FL examples based on Clara FL & MONAI, and we may also add examples based on other FL frameworks.
We haven't finalized the design of an FL module in MONAI.
Thanks.
I'm closing this issue as answered. We have also closed #978.
Hi @vfdev-5 ,
May I know when you plan to release ignite v0.4.2?
I think I need to upgrade the dependency in MONAI soon, as I have already received requests.
Thanks.
Hi @Nic-Ma
I think a patch release will be done at the end of this month. v0.5.0 will take a bit more time...
> I think I need to upgrade the dependency in MONAI soon, already got requests.
For that, I suggest working with a nightly release and, once all incompatibilities are fixed, just switching to v0.4.2. What do you think?
Hi @vfdev-5 ,
We will have the MONAI Bootcamp on Sep 25th. If possible, I will try to upgrade MONAI to use the latest official ignite version.
Thanks.
OK, that's true, there is a MONAI Bootcamp. Let us try to make the release before Sep 25th then.
@Nic-Ma, I also applied to participate in this bootcamp some time ago. Any chance of having it accepted? The status is still pending... The dates are Sep 30 - Oct 02, right?
EDIT: GPU Bootcamp Application: Submission # 587
Hi @wyli,
Could you please help with @vfdev-5's Bootcamp issue?
Thanks.
Sure @vfdev-5, we'll send out the invitation soon :)
@wyli thank you !