For GANs or similar approaches, we may want optimizer A to step every batch while optimizer B might step every k batches.
This feature will enable this behavior.
Approach still needs to be scoped out. Open to suggestions here.
First, when defining configure_optimizers, instead of returning

    return [torch.optim.Adam(self.parameters(), lr=0.02)]

they could try

    optimizer = torch.optim.Adam(self.parameters(), lr=0.02)
    optimizer.skip_batch = 1
    return [optimizer]
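For example, a GAN module with separate generator and discriminator submodules might look like the sketch below (the submodule names are hypothetical, and skip_batch is the attribute proposed here, not an existing PyTorch or Lightning feature):

    import torch

    # sketch of the proposed usage inside a LightningModule
    def configure_optimizers(self):
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=0.0002)
        opt_g = torch.optim.Adam(self.generator.parameters(), lr=0.0002)
        opt_g.skip_batch = 1  # with the check below, the generator steps only every 2nd batch
        return [opt_d, opt_g]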
I'm sure that whoever wants to use this skipping feature would be comfortable adding a few lines.
To accommodate this, whenever self.optimizers = model.configure_optimizers() is called in trainer.py, you could just add the following:
    for optimizer in self.optimizers:
        try:
            optimizer.skip_batch  # the user already set a skip rate
        except AttributeError:
            optimizer.skip_batch = 0  # default: never skip
Basically, the first part checks whether the user manually defined the skip rate, and if not, sets it to 0 (never skip).
Later on, when calling optimizer.step(), you can replace it with

    if self.batch_nb % (optimizer.skip_batch + 1) == 0:
        optimizer.step()
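Putting both pieces together, the trainer-side logic might look roughly like this (just a sketch with hypothetical method names; the real trainer.py is organized differently):

    # hypothetical sketch of the trainer-side changes
    def init_optimizers(self, model):
        self.optimizers = model.configure_optimizers()
        for optimizer in self.optimizers:
            # default to 0 (never skip) if the user did not set a skip rate
            if not hasattr(optimizer, 'skip_batch'):
                optimizer.skip_batch = 0

    def run_optimizer_steps(self):
        # called once per batch, after backward()
        for optimizer in self.optimizers:
            if self.batch_nb % (optimizer.skip_batch + 1) == 0:
                optimizer.step()
                optimizer.zero_grad()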
I believe this should work with schedulers as well.
But then again, I don't know that much about PyTorch.
Nevertheless, this project looks quite exciting and I hope I can provide some help!
On another note, why would you want this feature in the first place? If it's so that optimizer A can learn "faster" than B, why not just multiply B's learning rate by 1/k? That way you take full advantage of all of the gradients while still having B optimize more slowly than A.
Good suggestion, but I wonder how adding properties to the optimizer might affect loading, saving, and training continuation.
It seems a bit hacky, so let's think of other alternatives as well? If this turns out to be the best way, then we can go with it.
I was thinking about maybe just allowing the configure_optimizers method to return another list with config stuff:

    return [opt_a, opt_b], [sched_a], [{'skip_batch': 2}]
Something like that. But I don't love this either, haha.
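If configure_optimizers returned a config list like that, the trainer could consume it along these lines (just a sketch of the idea, assuming one config dict per optimizer in matching order; none of this is an implemented API):

    # hypothetical unpacking of a third return value from configure_optimizers
    optimizers, schedulers, configs = model.configure_optimizers()

    for optimizer, config in zip(optimizers, configs):
        optimizer.skip_batch = config.get('skip_batch', 0)  # 0 = never skip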
I fail to understand how to implement a GAN-related training scheme in pytorch-lightning. Can you give me some examples?
@wheatdog @sidhanthholalkere see #106 for discussion. #107 for changes to support this.
Would these changes work for you?
@wheatdog @sidhanthholalkere this is on master now. Override optimizer_step to update any optimizer at arbitrary intervals.
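For reference, an override along these lines handles the GAN case (a sketch only; the exact signature of the optimizer_step hook has changed across Lightning versions, so check the docs for the version you are using):

    # in your LightningModule: step the generator every batch,
    # the discriminator only every 4th batch
    def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i):
        # optimizer_i == 0: generator
        if optimizer_i == 0:
            optimizer.step()
            optimizer.zero_grad()

        # optimizer_i == 1: discriminator
        if optimizer_i == 1:
            if batch_nb % 4 == 0:
                optimizer.step()
                optimizer.zero_grad()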
Hi, I think the following is related.
What if we just want to alternate optimizers during training? I mean cases where we, for example, use Adam for 5 epochs and then SGD for another 5 epochs.
How could we prevent training_step, which in that case has an optimizer_idx argument, from executing twice?
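To illustrate the intent, something like this sketch (reusing the optimizer_step override mentioned above; the hook signature may differ in your Lightning version):

    def configure_optimizers(self):
        opt_adam = torch.optim.Adam(self.parameters(), lr=1e-3)
        opt_sgd = torch.optim.SGD(self.parameters(), lr=1e-2)
        return [opt_adam, opt_sgd]

    def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i):
        # epochs 0-4 step Adam (index 0), epochs 5-9 step SGD (index 1), and so on
        active_idx = (epoch_nb // 5) % 2
        if optimizer_i == active_idx:
            optimizer.step()
        optimizer.zero_grad()

    # note: training_step is still called once per optimizer_idx here,
    # which is exactly the duplication I'd like to avoid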