Apex: How to use 2 different optimizers in one model?

Created on 20 Feb 2019 · 13 Comments · Source: NVIDIA/apex

Hi, I am trying to use 2 different optimizers in one model. For example, in an encoder-decoder model, the decoder uses the Adam optimizer while the encoder uses SGD. How do I use Apex to run the backward pass on the loss?

GAN

Lausannen

👍2

Most helpful comment

Yes, https://github.com/NVIDIA/apex/issues/163#issuecomment-465586715 is the correct way for now with Amp. With FP16_Optimizer, you can also wrap each optimizer instance individually.

Also, stay tuned for a new API that unifies the current Amp and FP16_Optimizer. My new API is tracked in branch api_refactor, although I don't have examples and documentation yet. I'll merge it into master by the end of February, and one of the examples will be a GAN with multiple losses and optimizers.

mcarilli on 21 Feb 2019

👍2

All 13 comments

I was about to open the exact same issue. I think this feature would be super useful, especially when combined with distributed training, because distributed will not let you perform updates on a subset of the parameters (i.e. only the encoder parameters), which currently forces you to have a different optimizer for the encoder and for the decoder. But Apex won't work with 2 optimizers.

glample on 20 Feb 2019

This looks relevant though: https://github.com/NVIDIA/apex/tree/master/apex/amp#multiple-optimizers-or-backward-passes

glample on 20 Feb 2019

Yes, https://github.com/NVIDIA/apex/issues/163#issuecomment-465586715 is the correct way for now with Amp. With FP16_Optimizer, you can also wrap each optimizer instance individually.

mcarilli on 21 Feb 2019

👍2

Thank you for your quick reply! @glample @mcarilli I will try Amp first, and I am looking forward to your new API.

Lausannen on 21 Feb 2019

How would I implement something like this:

optimizer1.zero_grad()
optimizer2.zero_grad()

y_hat = model2(model1(x))
# ...
loss1 = loss_fn1(y_hat, y)
loss1.backward(retain_graph=True)

optimizer1.step()  # has gradient contributions from loss1
# ...
loss2 = ComputeLoss2(model2)
loss2.backward()
# ...
optimizer2.step()  # has gradient contributions from both loss1 and loss2, but only applied to model2

gregjohnso on 28 Feb 2019

@gregjohnso Does optimizer1.step() act on model2 or only on model1? Also, why do you need retain_graph=True for loss1.backward? In that minimal sample I don't see where you are backwarding through loss1 again (or any of its subgraphs).

mcarilli on 14 Mar 2019

What I'm trying to do is have loss1 apply to model1 and model2, and loss2 apply only to model2.

optimizer1 acts only on model1
optimizer2 acts only on model2

I need retain_graph=True because otherwise the first backward/step would release buffers from the model2 subgraph, and loss2.backward() would throw an error.

Hope that clarifies what I'm doing.

gregjohnso on 14 Mar 2019
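
For reference, the control flow described above can be sketched in plain FP32 PyTorch, without Apex or any loss scaling. This is only an illustration under stated assumptions: the layer sizes, learning rates, and random data are made up, and since ComputeLoss2 is not defined in the thread, an L2 penalty on model2's parameters stands in for it as a hypothetical example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model1 = nn.Linear(4, 8)  # updated only by optimizer1
model2 = nn.Linear(8, 2)  # updated only by optimizer2
optimizer1 = torch.optim.SGD(model1.parameters(), lr=0.1)
optimizer2 = torch.optim.Adam(model2.parameters(), lr=1e-3)

x, y = torch.randn(16, 4), torch.randn(16, 2)

optimizer1.zero_grad()
optimizer2.zero_grad()

# loss1 flows through both models.
y_hat = model2(model1(x))
loss1 = nn.functional.mse_loss(y_hat, y)
loss1.backward()

# Snapshots used only to inspect the effect of the steps below.
model1_weight_before = model1.weight.detach().clone()
model2_grads_from_loss1 = [p.grad.clone() for p in model2.parameters()]

optimizer1.step()  # model1 sees gradients from loss1 only

# Hypothetical stand-in for ComputeLoss2(model2): an L2 penalty on model2's parameters.
# It builds its own graph, so loss1.backward() above did not need retain_graph=True.
loss2 = 0.01 * sum(p.pow(2).sum() for p in model2.parameters())
loss2.backward()  # accumulates into the .grad already filled by loss1

optimizer2.step()  # model2 sees gradients from loss1 + loss2
```

If loss2 instead reused y_hat or another intermediate activation from the first forward pass, the first backward call would need retain_graph=True, as described in the comment above.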

I still don't see how the line loss2 = ComputeLoss2(model2) makes sense, because model2 is a model, not an output or anything. Maybe that's a typo. But I understand your control flow. That is tricky to handle with what I have exposed currently (at least in a way that will also allow more general cases like yours). I'll let you know when I have a solution.

mcarilli on 15 Mar 2019

Hi, I noticed that you have updated your new API. When will the GAN example be available?

Lausannen on 15 Mar 2019

@mcarilli I'm facing a situation where I have a single model, with one embedding layer which is sparse. I have 2 optimizers: one for the sparse embeddings (optimizer_sparse), and one for every other parameter (optimizer_dense).

What I would usually do is:

loss.backward()
optimizer_dense.step()
optimizer_sparse.step()

Now I would like to do this, but in float16. Could the current API handle this situation?

glample on 16 Mar 2019
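
The sparse/dense setup described above might look like the following in plain FP32 PyTorch. This is a minimal sketch, not the float16 solution being asked for: TinyModel, its sizes, the learning rates, and the choice of SparseAdam and Adam are all assumptions made for illustration, and no Apex calls are involved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyModel(nn.Module):
    """Hypothetical model: one sparse embedding layer plus a dense projection."""

    def __init__(self, vocab_size=100, dim=16):
        super().__init__()
        # sparse=True makes the embedding weight receive sparse gradients.
        self.embedding = nn.Embedding(vocab_size, dim, sparse=True)
        self.proj = nn.Linear(dim, 1)

    def forward(self, tokens):
        # Average the token embeddings, then project to a scalar per example.
        return self.proj(self.embedding(tokens).mean(dim=1))


model = TinyModel()
# One optimizer per parameter group: SparseAdam handles sparse gradients,
# while a regular Adam handles every other parameter.
optimizer_sparse = torch.optim.SparseAdam(list(model.embedding.parameters()), lr=1e-2)
optimizer_dense = torch.optim.Adam(model.proj.parameters(), lr=1e-3)

tokens = torch.randint(0, 100, (8, 5))
targets = torch.randn(8, 1)

optimizer_dense.zero_grad()
optimizer_sparse.zero_grad()

# A single backward pass fills gradients for both parameter groups.
loss = nn.functional.mse_loss(model(tokens), targets)
loss.backward()

optimizer_dense.step()
optimizer_sparse.step()
```

The point of the sketch is that one backward pass feeds two optimizers, which is the case the reply below says the Amp API at the time did not cover.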

Not quite, as it assumes a backward pass is associated with a particular optimizer. The old API did as well. I know how I can support this in the future, as well as @gregjohnso's case, but I need a few days to implement it.

mcarilli on 16 Mar 2019

👍1

@mcarilli Thank you! I am looking forward to this, since it is important in my case.

Lausannen on 16 Mar 2019

Deduplicating to #179

mcarilli on 19 Mar 2019
