I think as_in_context should copy the attached gradient automatically as well. Minimal reproduction:
import mxnet as mx

ctx = mx.cpu()
a = mx.nd.ones((10, 10), ctx=ctx)
a.attach_grad()                 # gradient buffer is allocated on cpu(0)
print(a.context)
print(a.grad.context)

ctx = mx.gpu()
a = a.as_in_context(ctx)        # data is copied to gpu(0), the gradient is not
print(a.context)
print(a.grad.context)           # a.grad is now None -> AttributeError
cpu(0)
cpu(0)
gpu(0)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-16-deab22190603> in <module>()
      5 ctx = mx.gpu()
      6 a = a.as_in_context(ctx)
----> 7 print(a.context, a.grad.context)

AttributeError: 'NoneType' object has no attribute 'context'
as_in_context returns a new, copied NDArray object, but it does not copy the gradient.
If you need to copy the gradient, calling a.grad.as_in_context(ctx) works.
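For concreteness, a minimal sketch of that manual workaround (assumes a GPU is available; the variable names mirror the repro above):

import mxnet as mx

a = mx.nd.ones((10, 10), ctx=mx.cpu())
a.attach_grad()                      # gradient buffer lives on cpu(0)

ctx = mx.gpu()
old_grad = a.grad                    # keep a handle to the cpu(0) gradient
a = a.as_in_context(ctx)             # data is copied, the gradient is dropped
a.attach_grad()                      # allocate a fresh gradient buffer on gpu(0)
old_grad.copyto(a.grad)              # cross-device copy of the old gradient values

print(a.context, a.grad.context)     # gpu(0) gpu(0)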
I think it is not necessary to copy the gradient automatically: the gradient usually isn't needed, so copying it automatically could be wasted work.
I disagree; I think it would make more sense to copy the gradient along with the tensor. It doesn't make sense for them to live on different contexts.
Also, a variable that had a gradient attached no longer has one after being moved, which seems like a clear bug to me.
You are right. Let me think about a lazy method to return the gradient.
Would it be good to define the function as def as_in_context(ctx, ignore_grad=True)?
Sometimes we do not need to copy the gradient, and I do not know how to copy the gradient lazily.
If we are worried about breaking APIs, I think we could indeed start with as_in_context(ctx, copy_grad=False) to avoid a breaking API change, but for MXNet 2.0 I would suggest moving to copy_grad=True as the default, which makes more sense to me.
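For what it's worth, the proposed flag could be prototyped outside the NDArray API today. A hedged sketch (as_in_context_with_grad is a hypothetical helper name, not an existing MXNet function):

import mxnet as mx

def as_in_context_with_grad(arr, ctx, copy_grad=False):
    # Move `arr` to `ctx`; optionally re-attach and copy its gradient as well.
    moved = arr.as_in_context(ctx)
    # as_in_context returns `arr` itself when the context already matches,
    # in which case there is nothing to re-attach or copy.
    if copy_grad and arr.grad is not None and moved is not arr:
        moved.attach_grad()              # fresh gradient buffer on the target context
        arr.grad.copyto(moved.grad)      # cross-device copy of the gradient values
    return moved

# Usage: behaves like as_in_context unless you opt in to copying the gradient.
a = mx.nd.ones((10, 10), ctx=mx.cpu())
a.attach_grad()
b = as_in_context_with_grad(a, mx.gpu(), copy_grad=True)
print(b.context, b.grad.context)         # gpu(0) gpu(0)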
I'm worried copy_grad=True will hurt performance. Assuming it takes 1 s to copy the data of a large tensor, it will take an extra 1 s to copy the gradient.
In what case would you not want to copy gradients that have been attached?
Most as_in_context calls happen on data without a gradient attached; I only discovered the bug in a niche use case that requires data with a gradient attached.
If the data has no gradient attached, the runtime would be the same, since no gradient would need moving.
I can see @szha's argument on that one, though I remain on the fence. However, that's a niche use case and I'm happy to err on the side of simplicity. To add a use case to the conversation: when you want to differentiate the loss with respect to the input (DeepDream-style visualization, for example) and visualize the gradient, simply copying the NDArray back to the CPU does not bring the gradient along.
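A hedged sketch of that use case (the network is replaced by a trivial objective so the snippet stays self-contained; in practice the loss would come from a trained model):

import mxnet as mx
from mxnet import autograd

ctx = mx.gpu()
x = mx.nd.random.uniform(shape=(1, 3, 224, 224), ctx=ctx)   # stand-in for an input image
x.attach_grad()

with autograd.record():
    loss = (x * x).sum()          # placeholder for net(x) plus the real objective
loss.backward()                   # populates x.grad on gpu(0)

x_cpu = x.as_in_context(mx.cpu())          # copies the data only
grad_cpu = x.grad.as_in_context(mx.cpu())  # the gradient has to be copied explicitly
print(x_cpu.context, grad_cpu.context)     # cpu(0) cpu(0)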