MXNet version is mxnet-cu100mkl 1.5.0b20190605 (also tested on mxnet-cu100 1.4.1).
import time
import numpy as np
import mxnet as mx

# Uniform distribution over 300000 outcomes
p = [1 / 300000.] * 300000

# NumPy: returns the count of each outcome over len(probs) draws
probs = np.array(p)
t0 = time.time()
sample_times = np.random.multinomial(len(probs), probs)
t1 = time.time()
print("Numpy MultiNomial Time", t1 - t0)

# MXNet CPU: returns the sampled outcomes themselves
probs = mx.nd.array(p, ctx=mx.cpu())
t0 = time.time()
indices = mx.nd.random.multinomial(probs, len(probs))
mx.nd.waitall()
t1 = time.time()
print("MXNet CPU MultiNomial Time", t1 - t0)

# The GPU version hangs, so it is commented out:
# probs = mx.nd.array(p, ctx=mx.gpu(0))
# t0 = time.time()
# indices = mx.nd.random.multinomial(probs, len(probs))
# mx.nd.waitall()
# t1 = time.time()
# print("MXNet GPU MultiNomial Time", t1 - t0)
The result is:
Numpy MultiNomial Time 0.01881909
MXNet CPU MultiNomial Time 43.93881797
@zixuanweeei could you take a look at the CPU performance?
While np.random.multinomial gives the frequency of each outcome drawn from n experiments (sampling without replacement), mx.nd.random.multinomial gives the outcomes sampled from the multinomial distribution (sampling with replacement). So at first glance they produce their results in different ways. mx.nd.random.multinomial uses a very simple method, as described in this link. The complexity of that method can be O(K^2) when drawing K samples from K possible outcomes with replacement, so it becomes time-consuming for a value of K as large as the 300000 in your script. We may be able to optimize the sampling method once we analyze the complexity further and survey more advanced sampling algorithms.
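To illustrate the cost, here is a minimal sketch (hypothetical code, not MXNet's actual implementation) of that linear-search inverse-CDF approach: each draw scans the probabilities until the running sum exceeds a uniform random number, so one draw costs O(K) and K draws cost O(K^2).

import numpy as np

def multinomial_linear_search(probs, num_samples, rng=np.random):
    # One linear scan of the K probabilities per draw -> O(K) per sample
    samples = np.empty(num_samples, dtype=np.int64)
    for s in range(num_samples):
        u = rng.random_sample()      # uniform draw in [0, 1)
        cumsum = 0.0
        idx = len(probs) - 1         # fallback for floating-point round-off
        for i, prob in enumerate(probs):
            cumsum += prob
            if u < cumsum:
                idx = i
                break
        samples[s] = idx
    return samples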
Any insights on this?
You can get nearly the same result as mx.nd.random.multinomial, but much faster.
import random
import numpy as np

p = [1 / 300000.] * 300000
probs = np.array(p)
# Draw the per-outcome counts once, then expand them into outcomes
sample_times = np.random.multinomial(len(probs), probs)
sample_list = []
for i, t in enumerate(sample_times):
    if t > 0:
        sample_list.extend([i] * t)
# Shuffle so the outcomes come in random order, like mx.nd.random.multinomial
random.shuffle(sample_list)
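This works because drawing the counts from the multinomial and then randomly shuffling the expanded multiset of outcomes yields the same distribution over outcome sequences as recording the with-replacement draws one by one; only the ordering is reconstructed by the shuffle.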
PyTorch version:
import time
import torch

t0 = time.time()
p = [1 / 300000.] * 300000
probs = torch.as_tensor(p)
# replacement=True: draw len(probs) samples with replacement
indices = torch.multinomial(probs, len(probs), True)
t1 = time.time()
print("Pytorch output:\t\t", indices.tolist())  # printed after timing
print("Pytorch MultiNomial Time", t1 - t0)
@chinakook Thanks for your examples; they are informative. PyTorch uses a binary search to find the slot in which the probability falls, while MXNet uses a linear search. PyTorch implements the alias method as well, but I am not sure whether it pays off when drawing samples just once. We will optimize with binary search for the moment.
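For reference, here is a minimal sketch of the binary-search idea in NumPy (an illustration only, not the actual MXNet patch): precompute the CDF once, then locate each uniform draw with np.searchsorted, so K draws cost O(K log K) instead of O(K^2).

import numpy as np

p = [1 / 300000.] * 300000
cdf = np.cumsum(p)                       # one O(K) pass to build the CDF
u = np.random.random_sample(len(p))      # one uniform number per draw
idx = np.searchsorted(cdf, u)            # O(log K) binary search per draw
idx = np.minimum(idx, len(p) - 1)        # guard against round-off at the tail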
We used binary search to optimize mx.nd.random.multinomial and got an interesting result. On a Xeon Platinum 8180 platform, it takes ~0.0535 s for the example @chinakook provided, versus ~59 s before our optimization. By the way, the NumPy one takes ~0.0279 s.
Thanks for @chinakook's examples. The optimization requires further verification, and the PR is on the way :tada:.
@chinakook Thanks for the issue. You can close it if there are no problems. BTW, considering that
... the binomial distribution describes the probability of k successes in n draws with replacement.
From this, I think np.random.multinomial also draws samples with replacement. Sorry for the mistake in my earlier reply.
No problem.