Keras: Running Keras in a multi-node/multi-machine environment

Created on 26 Jul 2015 · 5 Comments · Source: keras-team/keras

Has anyone tried running Keras at a larger scale, such as on a cluster or in an HPC environment? I have access to such an environment and would like to understand whether Keras can be used effectively there. Perhaps it would require storing intermediate outputs on disk, for example with joblib, to break up the processing?

stale

tleeuwenburg

Most helpful comment

You can check out the Elephas project for Keras parallelization on Spark: https://github.com/maxpumperla/elephas

fchollet on 16 Aug 2015

All 5 comments

What sort of parallelism are you looking to do? Data parallelism would be very easy to set up. Model parallelism would require some changes.

fchollet on 26 Jul 2015
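
For readers new to the term, here is a minimal sketch of synchronous data parallelism: each worker computes gradients on its own shard of the data, and the gradients are averaged before a single shared update. This is not Keras code and not how any particular library implements it. It assumes a toy linear model in pure Python, with threads standing in for cluster nodes, and every name in it (`shard_gradient`, `data_parallel_step`, `train`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(w, b, shard):
    """Summed squared-error gradient for y = w*x + b over one shard."""
    grad_w = grad_b = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        grad_w += 2.0 * err * x
        grad_b += 2.0 * err
    return grad_w, grad_b, len(shard)


def data_parallel_step(pool, w, b, shards, lr):
    """One synchronous step: workers get the same weights, different data."""
    results = list(pool.map(lambda s: shard_gradient(w, b, s), shards))
    total = sum(n for _, _, n in results)
    # Combining the per-shard sums reproduces the full-batch mean gradient.
    grad_w = sum(g for g, _, _ in results) / total
    grad_b = sum(g for _, g, _ in results) / total
    return w - lr * grad_w, b - lr * grad_b


def train(data, n_workers=4, lr=0.1, steps=500):
    # Round-robin split; drop empty shards if there are more workers than rows.
    shards = [data[i::n_workers] for i in range(n_workers)]
    shards = [s for s in shards if s]
    w, b = 0.0, 0.0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(steps):
            w, b = data_parallel_step(pool, w, b, shards, lr)
    return w, b


# Toy data generated from y = 3x + 1 (an assumption for illustration only).
data = [(k / 10.0, 3.0 * (k / 10.0) + 1.0) for k in range(20)]
w, b = train(data)
```

In a real multi-node setup the averaging step is where the communication cost sits (a parameter server or an all-reduce), which is why this form of parallelism is the easier one to bolt on: the model itself does not change, only how the batch is split and how gradients are combined.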

I ran a derivative of char-rnn successfully on our cluster via a PBS queue system, but only on a single machine with 16 CPU cores (we have no GPUs here yet). I would be interested to find out whether it is possible to train a model on several nodes at once, but it might be that Theano does not support this.