CNTK speed: BrainScript vs Python

Created on 30 Dec 2016 · 9 Comments · Source: microsoft/CNTK

Hi,

I tested the training speed of two examples from the standard CNTK distribution:

BrainScript: cntk/Examples/Image/GettingStarted/01_OneHidden.cntk
Python: cntk/Examples/Image/Classification/MLP/Python/SimpleMNIST.py

Both examples use the same model. I trained on GPU in both cases. Here are the results:

BrainScript: ~95,000 samples per second
Python: ~61,000 samples per second

Is there something wrong?

vaasily

All 9 comments

When answering this question, can someone please explain to me the advantages and disadvantages of BrainScript vs. Python?
Do we need to keep track of both?

arijit17 on 31 Dec 2016

Hi @arijit17,

Python is currently a little slower than BrainScript because of the overhead of converting Python objects to CNTK's C++-defined types.

I wondered yesterday how large this overhead could be, so here is a graph of the operations involved in a prediction (evaluate()) and a minibatch training step (train_minibatch()):

You can see that each call to either method results in further calls:

wrapper()
sanitize_batch()
etc.

On the other hand, on the BrainScript side, all the operations made by CNTK are done on C++ objects directly, without the overhead of mapping Python types to C++.

Edit:

Python is a very flexible way to use CNTK within an IDE, with all of Python's features.
BrainScript is somewhat more restricted (data formats, preprocessing, no IDE support, etc.) but is currently the fastest way to use CNTK.

Hope it helps :)
Morgan

mfuntowicz on 31 Dec 2016

👍1

Hi Morgan: thanks for the analysis!

arijit17 on 2 Jan 2017

SimpleMNIST.py uses a small minibatch size, which makes the Python overhead in the training loop stand out. If you increase the minibatch size to 1024 (depending on your GPU memory), you'll notice a smaller gap.

KeDengMS on 10 Jan 2017
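
KeDengMS's point about minibatch size can be illustrated with a simple cost model: if each training call pays a fixed overhead on top of the per-sample compute time, larger minibatches spread that overhead over more samples. The sketch below is only a model; the function name and both timing constants are hypothetical and were not measured on CNTK.

```python
def samples_per_second(batch_size, per_call_overhead_s, per_sample_compute_s):
    """Throughput when every minibatch call pays a fixed overhead.

    All inputs are assumptions for illustration, not CNTK measurements.
    """
    time_per_call = per_call_overhead_s + batch_size * per_sample_compute_s
    return batch_size / time_per_call


# Hypothetical costs: 0.5 ms of fixed overhead per call, 10 us per sample.
OVERHEAD_S = 5e-4
PER_SAMPLE_S = 1e-5

for batch_size in (64, 256, 1024):
    rate = samples_per_second(batch_size, OVERHEAD_S, PER_SAMPLE_S)
    print(f"batch={batch_size:5d}  ~{rate:,.0f} samples/s")
```

In this model the throughput approaches 1 / per_sample_compute_s as the minibatch grows, so the fixed overhead matters less and less.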

Is it possible to create the MinibatchData/Value/NDArrayView objects only once and then copy the data into them from numpy arrays?

1ytic on 24 May 2017

Yes, you can use Value.create to create the value object on the GPU. However, the copy from the numpy array to the GPU is done synchronously, so if you do it per minibatch you will still have CPU/GPU stalls. CNTK readers copy the data to the GPU asynchronously while computation is going on, which eliminates the stall.

KeDengMS on 24 May 2017
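
The general idea of overlapping data preparation with computation, which KeDengMS describes above, can be sketched in plain Python with a background thread and a bounded queue. This is not how CNTK's readers are implemented and it performs no GPU transfer; `prefetch` and its `depth` parameter are hypothetical names used only to show the pattern.

```python
import queue
import threading


def prefetch(batches, depth=2):
    """Yield items from `batches` while a background thread prepares the next ones.

    Hypothetical helper: it only illustrates producer/consumer overlap.
    """
    buffer = queue.Queue(maxsize=depth)  # bounded, so the producer cannot run far ahead
    done = object()                      # sentinel marking the end of the stream

    def producer():
        try:
            for batch in batches:
                buffer.put(batch)        # blocks while the buffer is full
        finally:
            buffer.put(done)             # always signal completion, even after an error

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()              # blocks until the next batch is ready
        if item is done:
            return
        yield item


# The consumer sees the batches in their original order.
for batch in prefetch(range(3)):
    print("training on batch", batch)
```

While the consumer is busy with one batch, the producer can already be preparing the next, which is the stall that a synchronous per-minibatch copy cannot avoid.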

Indeed, caching the MinibatchData/Value/NDArrayView objects inside the custom UserMinibatchSource speeds up training.

When you said "asynchronously", did you mean the ReaderShim class with DataTransferer? It would be great to extend the SwigMinibatchSource class with such features to speed up UserMinibatchSource further.

1ytic on 2 Jun 2017

Yes, DataTransferer does the async copy from CPU to GPU. We are working on exposing this to users; stay tuned.

KeDengMS on 2 Jun 2017

👍2

Asynchronous CPU/GPU transfers are also missing from the C# bindings, unless you use CNTK.MinibatchSource.TextFormatMinibatchSource.

The method suggested in #2516 (calling Value.CreateBatch) leads to very low GPU utilisation, even when done on a separate thread.

Ideally there would be a C# version of MinibatchSourceFromData (as in the Python bindings) that uses asynchronous CPU/GPU transfers.

mjmckp on 6 Feb 2018

👍2
