I ran the PyTorch ImageNet example but got an error saying that a float has no detach() method. It seems that loss.item() returns a plain Python float, but I don't know how to fix that within the Horovod framework.
Can anyone help me? Thanks a lot!
mpirun -np 4 \
-H localhost:4 \
-bind-to none -map-by slot \
-x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH \
-mca pml ob1 -mca btl ^openib \
python main_hvd.py --train-dir /datasets/ILSVRC2012/images/train --val-dir /datasets/ILSVRC2012/images/val
Train Epoch #1: 0%| | 0/10010 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main_hvd.py", line 272, in <module>
    train(epoch)
  File "main_hvd.py", line 179, in train
    train_loss.update(loss.item())
  File "main_hvd.py", line 263, in update
    self.sum += hvd.allreduce(val.detach().cpu(), name=self.name)
AttributeError: 'float' object has no attribute 'detach'
My environment is:
Sorry about that, it's a bug. I've submitted #853 with a fix. In the meantime, you can replace train_loss.update(loss.item()) with train_loss.update(loss), so that update() receives a tensor (which has detach()) rather than a Python float.
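For anyone hitting the same traceback, here is a minimal sketch of a metric helper that makes the mistake easier to spot. It is not the code from main_hvd.py or from the fix in #853. The `allreduce` constructor argument is a hypothetical injection point (in a Horovod script you would pass `hvd.allreduce`), and the default simply returns its input so the class can be tried in a single process.

```python
class Metric:
    """Running average of a per-step value.

    `allreduce` is a hypothetical hook: pass hvd.allreduce in a Horovod
    script. The default is an identity function for single-process use.
    """

    def __init__(self, name, allreduce=None):
        self.name = name
        self.allreduce = allreduce if allreduce is not None else (lambda t, name=None: t)
        self.sum = 0.0
        self.n = 0

    def update(self, val):
        # loss.item() returns a plain Python float, which has no .detach().
        # Fail early with a message that points at the call site.
        if not hasattr(val, "detach"):
            raise TypeError(
                f"Metric.update() expects a tensor, got {type(val).__name__}; "
                "pass `loss`, not `loss.item()`"
            )
        reduced = self.allreduce(val.detach().cpu(), name=self.name)
        self.sum += float(reduced)  # a 0-dim tensor converts to a float
        self.n += 1

    @property
    def avg(self):
        # Return 0.0 before the first update to avoid dividing by zero.
        return self.sum / self.n if self.n else 0.0
```

With this sketch, the training loop would call `train_loss.update(loss)` on a metric built as `Metric("train_loss", allreduce=hvd.allreduce)`, and `loss.item()` would only be used for display, for example in the progress bar.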