Hi,
I was wondering something. Almost every explanation I find of writing training code for a PyTorch network involves these steps:
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
In the create_supervised_trainer() function, the last step, optimizer.zero_grad(), is not included. Is this on purpose, or is it handled elsewhere in the code?
I want to build my own trainer, but want to make sure whether the zero_grad is needed or not.
@nwschurink thanks for asking. optimizer.zero_grad() is called just above:
https://github.com/pytorch/ignite/blob/68d3ba1baa70d16d7bc35771538a3213300177c4/ignite/engine/__init__.py#L97
I want to build my own trainer, but want to make sure whether the zero_grad is needed or not.
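Yes, zero_grad is needed :) PyTorch adds new gradients to the existing ones on every backward() call, so they have to be reset once per optimizer step.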
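For a hand-written trainer, a minimal sketch of one update step could look like the following. It uses plain PyTorch only; the model, data shapes, learning rate, and the name train_step are assumptions made for illustration and are not taken from the Ignite source. The second half shows why the reset matters: two backward() calls without a reset in between leave a summed gradient.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy setup, only for illustration.
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(8, 3)
y = torch.randn(8, 1)


def train_step(x, y):
    """One update: reset gradients, forward, backward, optimizer step."""
    model.train()
    optimizer.zero_grad()  # clear gradients left over from the previous step
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    loss.backward()
    optimizer.step()
    return loss.item()


# Demonstration: without a reset, backward() adds to the stored gradient.
optimizer.zero_grad()
loss_fn(model(x), y).backward()
grad_once = model.weight.grad.clone()

loss_fn(model(x), y).backward()  # no zero_grad() in between
grad_twice = model.weight.grad.clone()

# Same parameters and same batch, so the second gradient is double the first.
print(torch.allclose(grad_twice, 2 * grad_once))

loss_value = train_step(x, y)
```

Calling zero_grad() at the start of the step or right after optimizer.step() is equivalent for plain training, as long as it happens exactly once per update.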
If you would like to perform gradient accumulation, it is possible to zero the gradients just after the step, as explained here:
https://pytorch.org/ignite/faq.html#gradients-accumulation
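As a rough sketch of that idea in plain PyTorch (not the FAQ's exact code), the loop below steps the optimizer only every few batches and zeroes the gradients right after each step. The value of accumulation_steps, the toy model, and the random batches are assumptions for illustration. Dividing the loss by accumulation_steps keeps the accumulated gradient on the scale of an average over the accumulated batches.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy setup, only for illustration.
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
accumulation_steps = 4  # assumed value; tune for your memory budget

# Eight small random batches standing in for a data loader.
batches = [(torch.randn(2, 3), torch.randn(2, 1)) for _ in range(8)]

optimizer_steps = 0
optimizer.zero_grad()  # start from clean gradients
for iteration, (x, y) in enumerate(batches, start=1):
    y_pred = model(x)
    # Scale the loss so the summed gradients average over the accumulated batches.
    loss = loss_fn(y_pred, y) / accumulation_steps
    loss.backward()  # gradients add up across iterations
    if iteration % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()  # reset only after the parameters were updated
        optimizer_steps += 1

print(optimizer_steps)
```

With 8 batches and accumulation_steps = 4, the optimizer steps twice. In an Ignite update function the same condition could be checked against engine.state.iteration instead of a manual counter.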
Ah, my mistake, I overlooked it!
It's like you read my mind, as I was indeed intending to use gradient accumulation. Since zero_grad is written there after the step function, I was expecting to find it after the step in the create_supervised_trainer() function as well.
Thanks for your help! 👍