For GANs or similar approaches, we may want optimizer A to step every batch while optimizer B might step every k batches.
This feature will enable this behavior.
Approach still needs to be scoped out. Open to suggestions here.
First, when defining configure_optimizers, instead of returning

    return [torch.optim.Adam(self.parameters(), lr=0.02)]

they could try

    optimizer = torch.optim.Adam(self.parameters(), lr=0.02)
    optimizer.skip_batch = 1
    return [optimizer]
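For example, a GAN module with separate generator and discriminator submodules might look like the sketch below (the submodule names are hypothetical, and skip_batch is the attribute proposed here, not an existing PyTorch or Lightning feature):

    import torch

    # sketch of the proposed usage inside a LightningModule
    def configure_optimizers(self):
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=0.0002)
        opt_g = torch.optim.Adam(self.generator.parameters(), lr=0.0002)
        opt_g.skip_batch = 1  # with the check below, the generator steps only every 2nd batch
        return [opt_d, opt_g]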
I'm sure that whoever wants to use this skipping feature would be comfortable adding a few lines.
To accommodate this, whenever self.optimizers = model.configure_optimizers() is called in trainer.py, you could just add the following:
    for optimizer in self.optimizers:
        try:
            optimizer.skip_batch  # the user already set a skip rate
        except AttributeError:
            optimizer.skip_batch = 0  # default: never skip
Basically, the first part checks whether the user manually defined the skip rate, and if not, sets it to 0 (never skip).
Later on, when calling optimizer.step(), you can replace it with

    if self.batch_nb % (optimizer.skip_batch + 1) == 0:
        optimizer.step()
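Putting both pieces together, the trainer-side logic might look roughly like this (just a sketch with hypothetical method names; the real trainer.py is organized differently):

    # hypothetical sketch of the trainer-side changes
    def init_optimizers(self, model):
        self.optimizers = model.configure_optimizers()
        for optimizer in self.optimizers:
            # default to 0 (never skip) if the user did not set a skip rate
            if not hasattr(optimizer, 'skip_batch'):
                optimizer.skip_batch = 0

    def run_optimizer_steps(self):
        # called once per batch, after backward()
        for optimizer in self.optimizers:
            if self.batch_nb % (optimizer.skip_batch + 1) == 0:
                optimizer.step()
                optimizer.zero_grad()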
I believe this should work with schedulers as well.
But then again, I don't know that much about PyTorch.
Nevertheless, this project looks quite exciting and I hope I can provide some help!
On another note, why would you want this feature in the first place? If it's so that optimizer A can learn "faster" than B, why not just multiply B's learning rate by 1/k? That way you take full advantage of all of the gradients while still having B optimize more slowly than A.
Good suggestion, but I wonder how adding properties to the optimizer might affect loading, saving, and training continuation.
It seems a bit hacky, so let's think of other alternatives as well? If this turns out to be the best way, then we can go with it.
I was thinking about maybe just allowing the configure_optimizers method to return another list with config stuff:

    return [opt_a, opt_b], [sched_a], [{'skip_batch': 2}]
Something like that. But I don't love this either, haha.
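If configure_optimizers returned a config list like that, the trainer could consume it along these lines (just a sketch of the idea, assuming one config dict per optimizer in matching order; none of this is an implemented API):

    # hypothetical unpacking of a third return value from configure_optimizers
    optimizers, schedulers, configs = model.configure_optimizers()

    for optimizer, config in zip(optimizers, configs):
        optimizer.skip_batch = config.get('skip_batch', 0)  # 0 = never skip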
I fail to understand how to implement a GAN-related training scheme in pytorch-lightning. Can you give me some examples?
@wheatdog @sidhanthholalkere see #106 for discussion. #107 for changes to support this.
Would these changes work for you?
@wheatdog @sidhanthholalkere this is on master now. Override optimizer_step to update any optimizer at arbitrary intervals.
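For reference, an override along these lines handles the GAN case (a sketch only; the exact signature of the optimizer_step hook has changed across Lightning versions, so check the docs for the version you are using):

    # in your LightningModule: step the generator every batch,
    # the discriminator only every 4th batch
    def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i):
        # optimizer_i == 0: generator
        if optimizer_i == 0:
            optimizer.step()
            optimizer.zero_grad()

        # optimizer_i == 1: discriminator
        if optimizer_i == 1:
            if batch_nb % 4 == 0:
                optimizer.step()
                optimizer.zero_grad()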
Hi, I think the following is related.
What if we just want to alternate optimizers during training? I mean cases where we, for example, use Adam for 5 epochs and then SGD for another 5 epochs.
How could we prevent training_step, which in that case has an optimizer_idx argument, from executing twice?
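To illustrate the intent, something like this sketch (reusing the optimizer_step override mentioned above; the hook signature may differ in your Lightning version):

    def configure_optimizers(self):
        opt_adam = torch.optim.Adam(self.parameters(), lr=1e-3)
        opt_sgd = torch.optim.SGD(self.parameters(), lr=1e-2)
        return [opt_adam, opt_sgd]

    def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i):
        # epochs 0-4 step Adam (index 0), epochs 5-9 step SGD (index 1), and so on
        active_idx = (epoch_nb // 5) % 2
        if optimizer_i == active_idx:
            optimizer.step()
        optimizer.zero_grad()

    # note: training_step is still called once per optimizer_idx here,
    # which is exactly the duplication I'd like to avoid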