Incubator-mxnet: mx.random.seed does not give deterministic results on multi-cpus

Created on 5 Aug 2020 · 7 comments · Source: apache/incubator-mxnet

Description

I am trying to fix the random seed across multiple CPUs and followed the example here, which passes a ctx: https://mxnet.apache.org/versions/1.6/api/python/docs/api/mxnet/random/index.html. The example code works, but it appears to be order dependent; see the output below. If I call mx.random.seed(128, ctx=mx.cpu(0)) and then mx.random.seed(128, ctx=mx.cpu(1)) before sampling, it does not work. Also, is looping over all num_cpus and seeding each ctx the only way to do this?

To Reproduce

```
mx.random.seed(128, ctx=mx.cpu(0))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.cpu(0)).asnumpy())
# [[ 0.47400656  0.20251541]
#  [ 1.3648157  -1.4962182 ]]
mx.random.seed(128, ctx=mx.cpu(1))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.cpu(1)).asnumpy())
# [[ 0.47400656  0.20251541]
#  [ 1.3648157  -1.4962182 ]]
mx.random.seed(128, ctx=mx.cpu(0))
mx.random.seed(128, ctx=mx.cpu(1))
print(mx.nd.random.normal(shape=(2,2), ctx=mx.cpu(0)).asnumpy())
# [[ 0.47400656  0.20251541]
#  [ 1.3648157  -1.4962182 ]]
print(mx.nd.random.normal(shape=(2,2), ctx=mx.cpu(1)).asnumpy())
# [[ 1.0954498  -0.20808885]
#  [ 1.590508   -0.41777727]]
```

The last two arrays do not match, and I am not sure why, since the seed is set. We also see this when using multiprocessing in our algorithms: the numbers differ even for a fixed seed. The above is a simple representative example.

Labels: Bug, random

Most helpful comment

It's not expected and should be fixed. I'd just like to point out the apparent root cause.

All 7 comments

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

mx.cpu(0) is not really different from mx.cpu(1). There is only one random resource for CPUs, whereas there is one per device-id for GPUs:

https://github.com/apache/incubator-mxnet/blob/6bbd53107aa16fc41e8d462cf5dc46fb70d592df/src/resource.cc#L394-L402
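A hypothetical sketch in plain Python (this is not MXNet's actual code, just a model of the resource layout described above): every CPU context resolves to a single shared generator, while each GPU device id gets its own.

```python
import random

# Model of the resource layout: one shared random resource for all CPU
# contexts, one per device id for GPUs. Names here are illustrative.
cpu_rng = random.Random()   # shared by cpu(0), cpu(1), ...
gpu_rngs = {}               # one generator per GPU device id

def get_rng(device_type, device_id):
    if device_type == "gpu":
        # Lazily create a distinct generator for each GPU device id.
        return gpu_rngs.setdefault(device_id, random.Random())
    # Every CPU context, regardless of device id, shares one generator.
    return cpu_rng

assert get_rng("cpu", 0) is get_rng("cpu", 1)        # same object
assert get_rng("gpu", 0) is not get_rng("gpu", 1)    # distinct objects
```

This is why seeding cpu(0) and cpu(1) separately cannot isolate them: both seeds land on the same underlying state.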

Why is the result different on different CPUs? The seed does not seem to be fixed in my example. Thank you!

You are running two random operations in sequence. The first operation uses the fixed seed but modifies the internal state of the random resource. My point is that even though you specify different contexts (mx.cpu(0), mx.cpu(1)), they rely on the same shared random state.
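A minimal sketch of this with Python's stdlib `random` module (standing in for MXNet's shared CPU random resource): re-seeding before each draw yields matching values, but seeding twice up front does not, because the first draw advances the one shared state that the second draw then consumes.

```python
import random

# One generator stands in for the single shared CPU random resource.
shared = random.Random()

# Interleaved seed/draw (like the first example above): each draw starts
# from a freshly seeded state, so the results match.
shared.seed(128)
a = shared.random()
shared.seed(128)
b = shared.random()
assert a == b

# Seed twice up front, then draw twice (like the failing example): the
# second seed just overwrites the first, and the first draw advances the
# shared state, so the second draw comes out different.
shared.seed(128)
shared.seed(128)
c = shared.random()
d = shared.random()
assert c == a   # first draw reproduces the seeded value
assert d != a   # second draw reflects the advanced state
```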

However, resetting the seed of cpu(1) should give deterministic results on cpu(1). The current behavior is not expected.

It's not expected and should be fixed. I'd just like to point out the apparent root cause.
