Incubator-mxnet: mx.random.seed does not give deterministic results on multi-cpus

Created on 5 Aug 2020 · 7 comments · Source: apache/incubator-mxnet

Description

I am trying to fix the random seed across multiple CPUs and followed the example here, which passes a ctx: https://mxnet.apache.org/versions/1.6/api/python/docs/api/mxnet/random/index.html. The example code works, but it appears to be order dependent; see the output below. If I call mx.random.seed(128, ctx=mx.cpu(0)) and then mx.random.seed(128, ctx=mx.cpu(1)) before sampling, it does not work. Also, is looping over all num_cpus and seeding each ctx the only way to do this?

To Reproduce

```
mx.random.seed(128, ctx=mx.cpu(0))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.cpu(0)).asnumpy())
# [[ 0.47400656  0.20251541]
#  [ 1.3648157  -1.4962182 ]]
mx.random.seed(128, ctx=mx.cpu(1))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.cpu(1)).asnumpy())
# [[ 0.47400656  0.20251541]
#  [ 1.3648157  -1.4962182 ]]
mx.random.seed(128, ctx=mx.cpu(0))
mx.random.seed(128, ctx=mx.cpu(1))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.cpu(0)).asnumpy())
# [[ 0.47400656  0.20251541]
#  [ 1.3648157  -1.4962182 ]]
print(mx.nd.random.normal(shape=(2,2), ctx=mx.cpu(1)).asnumpy())
# [[ 1.0954498  -0.20808885]
#  [ 1.590508   -0.41777727]]
```

The last two arrays do not match, and I am not sure why, since the seed is set. We also see this when using multiprocessing in our algorithms: the numbers differ even for a fixed seed. The above is a simple representative example.

Labels: Bug, random

Most helpful comment

It's not expected and should be fixed. I'd just like to point out the apparent root cause.

All 7 comments

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

mx.cpu(0) is not really different from mx.cpu(1). There is only one random resource for CPUs, whereas there is one per device-id for GPUs:

https://github.com/apache/incubator-mxnet/blob/6bbd53107aa16fc41e8d462cf5dc46fb70d592df/src/resource.cc#L394-L402
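A hypothetical sketch in plain Python (this is not MXNet's actual code, just a model of the resource layout described above): every CPU context resolves to a single shared generator, while each GPU device id gets its own.

```python
import random

# Model of the resource layout: one shared random resource for all CPU
# contexts, one per device id for GPUs. Names here are illustrative.
cpu_rng = random.Random()   # shared by cpu(0), cpu(1), ...
gpu_rngs = {}               # one generator per GPU device id

def get_rng(device_type, device_id):
    if device_type == "gpu":
        # Lazily create a distinct generator for each GPU device id.
        return gpu_rngs.setdefault(device_id, random.Random())
    # Every CPU context, regardless of device id, shares one generator.
    return cpu_rng

assert get_rng("cpu", 0) is get_rng("cpu", 1)        # same object
assert get_rng("gpu", 0) is not get_rng("gpu", 1)    # distinct objects
```

This is why seeding cpu(0) and cpu(1) separately cannot isolate them: both seeds land on the same underlying state.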

Why is the result different on different CPUs? The seed does not seem to be fixed in my example. Thank you!

You are running two random operations in sequence. The first operation uses the fixed seed but modifies the internal state of the random resource. My point is that even though you specify different contexts (mx.cpu(0), mx.cpu(1)), they rely on the same shared random state.
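A minimal sketch of this with Python's stdlib `random` module (standing in for MXNet's shared CPU random resource): re-seeding before each draw yields matching values, but seeding twice up front does not, because the first draw advances the one shared state that the second draw then consumes.

```python
import random

# One generator stands in for the single shared CPU random resource.
shared = random.Random()

# Interleaved seed/draw (like the first example above): each draw starts
# from a freshly seeded state, so the results match.
shared.seed(128)
a = shared.random()
shared.seed(128)
b = shared.random()
assert a == b

# Seed twice up front, then draw twice (like the failing example): the
# second seed just overwrites the first, and the first draw advances the
# shared state, so the second draw comes out different.
shared.seed(128)
shared.seed(128)
c = shared.random()
d = shared.random()
assert c == a   # first draw reproduces the seeded value
assert d != a   # second draw reflects the advanced state
```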

However, resetting the seed of cpu(1) should give deterministic results on cpu(1). The current behavior is not expected.

It's not expected and should be fixed. I'd just like to point out the apparent root cause.
