MXNet version is mxnet-cu100mkl 1.5.0b20190605 (also tested on mxnet-cu100 1.4.1).
import time
import numpy as np
import mxnet as mx

# Uniform distribution over 300000 outcomes
p = [1 / 300000.] * 300000

# NumPy: returns the count of each outcome over len(probs) draws
probs = np.array(p)
t0 = time.time()
sample_times = np.random.multinomial(len(probs), probs)
t1 = time.time()
print("Numpy MultiNomial Time", t1 - t0)

# MXNet CPU: returns the sampled outcomes themselves
probs = mx.nd.array(p, ctx=mx.cpu())
t0 = time.time()
indices = mx.nd.random.multinomial(probs, len(probs))
mx.nd.waitall()
t1 = time.time()
print("MXNet CPU MultiNomial Time", t1 - t0)

# The GPU version hangs, so it is commented out:
# probs = mx.nd.array(p, ctx=mx.gpu(0))
# t0 = time.time()
# indices = mx.nd.random.multinomial(probs, len(probs))
# mx.nd.waitall()
# t1 = time.time()
# print("MXNet GPU MultiNomial Time", t1 - t0)
The result is:
Numpy MultiNomial Time 0.01881909
MXNet CPU MultiNomial Time 43.93881797
@zixuanweeei could you take a look at the CPU performance?
While np.random.multinomial gives the frequency of each outcome drawn from n experiments (sampling without replacement), mx.nd.random.multinomial gives the outcomes sampled from the multinomial distribution (sampling with replacement). So at first glance they produce their results in different ways. mx.nd.random.multinomial uses a very simple method, as described in this link. The complexity of that method can be O(K^2) when drawing K samples from K possible outcomes with replacement, so it becomes time-consuming for a value of K as large as the 300000 in your script. We may be able to optimize the sampling method once we analyze the complexity further and survey more advanced sampling algorithms.
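To illustrate the cost, here is a minimal sketch (hypothetical code, not MXNet's actual implementation) of that linear-search inverse-CDF approach: each draw scans the probabilities until the running sum exceeds a uniform random number, so one draw costs O(K) and K draws cost O(K^2).

import numpy as np

def multinomial_linear_search(probs, num_samples, rng=np.random):
    # One linear scan of the K probabilities per draw -> O(K) per sample
    samples = np.empty(num_samples, dtype=np.int64)
    for s in range(num_samples):
        u = rng.random_sample()      # uniform draw in [0, 1)
        cumsum = 0.0
        idx = len(probs) - 1         # fallback for floating-point round-off
        for i, prob in enumerate(probs):
            cumsum += prob
            if u < cumsum:
                idx = i
                break
        samples[s] = idx
    return samples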
Any insights on this?
You can get nearly the same result as mx.nd.random.multinomial, but much faster.
import random
import numpy as np

p = [1 / 300000.] * 300000
probs = np.array(p)
# Draw the per-outcome counts once, then expand them into outcomes
sample_times = np.random.multinomial(len(probs), probs)
sample_list = []
for i, t in enumerate(sample_times):
    if t > 0:
        sample_list.extend([i] * t)
# Shuffle so the outcomes come in random order, like mx.nd.random.multinomial
random.shuffle(sample_list)
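This works because drawing the counts from the multinomial and then randomly shuffling the expanded multiset of outcomes yields the same distribution over outcome sequences as recording the with-replacement draws one by one; only the ordering is reconstructed by the shuffle.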
PyTorch version:
import time
import torch

t0 = time.time()
p = [1 / 300000.] * 300000
probs = torch.as_tensor(p)
# replacement=True: draw len(probs) samples with replacement
indices = torch.multinomial(probs, len(probs), True)
t1 = time.time()
print("Pytorch output:\t\t", indices.tolist())  # printed after timing
print("Pytorch MultiNomial Time", t1 - t0)
@chinakook Thanks for your examples; they are informative. PyTorch uses a binary search to find the slot in which the probability falls, while MXNet uses a linear search. PyTorch implements the alias method as well, but I am not sure whether it pays off when drawing samples just once. We will optimize with binary search for the moment.
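For reference, here is a minimal sketch of the binary-search idea in NumPy (an illustration only, not the actual MXNet patch): precompute the CDF once, then locate each uniform draw with np.searchsorted, so K draws cost O(K log K) instead of O(K^2).

import numpy as np

p = [1 / 300000.] * 300000
cdf = np.cumsum(p)                       # one O(K) pass to build the CDF
u = np.random.random_sample(len(p))      # one uniform number per draw
idx = np.searchsorted(cdf, u)            # O(log K) binary search per draw
idx = np.minimum(idx, len(p) - 1)        # guard against round-off at the tail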
We used binary search to optimize mx.nd.random.multinomial and got an interesting result. On a Xeon Platinum 8180 platform, it takes ~0.0535 s for the example @chinakook provided, versus ~59 s before our optimization. By the way, the NumPy one takes ~0.0279 s.
Thanks for @chinakook's examples. The optimization requires further verification, and the PR is on the way :tada:.
@chinakook Thanks for the issue. You can close it if there are no problems. BTW, considering that
... the binomial distribution describes the probability of k successes in n draws with replacement.
From this, I think np.random.multinomial also draws samples with replacement. Sorry for the mistake in my earlier reply.
No problem.