Keras: NumPy (or JAX) backend for Keras.

Created on 3 Sep 2018 · 63 comments · Source: keras-team/keras (issue #11068)

Most of Keras is compatible with NumPy/SciPy.
It would be awesome (and probably competitively fast for inference on CPU) if Keras were available with a NumPy/SciPy backend.

This would significantly simplify inference on a variety of machines (in my case, a non-Raspberry Pi ARM computer). It would also be very fast.

Alternatively, a simplified Keras inference engine that does not require TensorFlow/CNTK would be great as well.

Here is a link to something similar that unfortunately lost maintenance: https://github.com/riga/tfdeploy

feature


danFromTelAviv

Most helpful comment

There is a "numpy backend" in the test suite. See this file:
https://github.com/keras-team/keras/blob/master/tests/keras/backend/reference_operations.py

It's being worked on right now (PRs welcome). It's mainly for testing purposes, but nothing prevents you from copy-pasting the code and loading it as an external backend. Loading external backends is not a documented feature, but it's possible; you just have to write:

{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "numpy_backend",
    "image_data_format": "channels_last"
}

in your ~/.keras/keras.json.

Then copy the code from reference_operations.py and put it in a file called numpy_backend.py. I cannot guarantee that everything will work as expected, but it should behave somewhat like TensorFlow eager execution, or like a hypothetical PyTorch backend.
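For anyone who wants to script this setup, here is a minimal Python sketch that writes the keras.json shown above and checks whether a module named numpy_backend can be found. The helper names (write_keras_config, backend_module_importable) are hypothetical. The import check assumes Keras loads an unknown backend by importing it as a module, so numpy_backend.py would need to be on sys.path (for example, in the working directory or on PYTHONPATH).

```python
import importlib.util
import json
import os

# The configuration from the comment above, as a Python dict.
NUMPY_BACKEND_CONFIG = {
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "numpy_backend",
    "image_data_format": "channels_last",
}


def write_keras_config(config, path=None):
    """Write `config` as JSON to `path` (default: ~/.keras/keras.json) and return the path."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return path


def backend_module_importable(name="numpy_backend"):
    """Return True if a top-level module called `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None
```

Calling write_keras_config(NUMPY_BACKEND_CONFIG) overwrites any existing ~/.keras/keras.json, so back up the old file if you want to switch back to TensorFlow later.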

You should use model subclassing and then invoke the model's call method directly on your input NumPy arrays.

There might be a few things to modify here and there, but it should work with a bit of tweaking. Don't forget to write how you made everything work somewhere.

I see the point of a NumPy backend for removing dependencies at inference time. It's possible that one day this NumPy backend will move out of the test suite and into the backends directory; you never know. For now, you should be able to use it as an external backend.

gabrieldemarmiesse on 3 Sep 2018


All 63 comments

All existing backends provide several "standard" deep learning operations such as convolution/deconvolution, pooling, activation functions, etc, as well as their gradients, which would have to be implemented from scratch for a numpy backend. This seems to me to be outside the scope of keras.

kgrm on 3 Sep 2018

Most of keras is compatible with numpy / scipy.

That's maybe a bit of a stretch. I think all of Keras' backends can do symbolic math for various transformations and optimizations of the function graph, which is crucial to execution speed. Neither NumPy nor SciPy can do any of that. So unless that part is covered by something, I don't really see the point. I'm not familiar with SymPy and its tensor module, but perhaps there is some potential there.

Can you explain the motivation behind this a bit more? Does your machine really support full NumPy and SciPy but none of Keras' backends? TensorFlow runs on Raspberry Pis, so maybe it runs on your ARM machine too? At least some version of Theano runs on ARM as well.

That said, I don't think writing a custom pure numpy (maybe not even scipy) backend is very hard, especially if you restrict yourself to simple models. It just won't be as performant as you think it would be.

jlherren on 3 Sep 2018


Let me know if you encounter any issues with this setup. I'm happy to help. I'd be interested in seeing a numpy backend for inference too :)

gabrieldemarmiesse on 3 Sep 2018

Is numpy faster than available backends on certain platforms? Any numbers available?

farizrahman4u on 3 Sep 2018

I don't expect it to be faster. But it'd be nice to have numbers indeed. What I see is mostly a benefit in terms of dependencies for the inference part.

gabrieldemarmiesse on 3 Sep 2018

Thanks for the feedback, everyone.
What I would love to have (and help create) is Keras's simplicity with fast inference (on ARM).

I spent about 2 weeks (9 hours a day) with a Linux expert trying to install TensorFlow on ARM (a Tinker Board). It is very nontrivial unless it's a Raspberry Pi, which Google recently started supporting officially. CNTK is explicitly not supported. Theano is very easy to install but is no longer maintained :/.
I have found that these libraries are also far from optimized for ARM. While regular algorithmic code runs about 1x-2x slower than on my i7, a convnet I built runs about 7x slower.

I obviously cannot beat the speed of the most optimized frameworks on their intended x86 + GPU targets, but on ARM computers (which is where a lot of models actually end up running: IoT devices, routers, some servers, etc.) I can likely get at least comparable speed with much less installation hassle.

Again, I am only interested in inference, sort of like TF Lite. Maybe this can be a Keras Lite prototype (if y'all are on board for that).

@kgrm Inference is also much simpler to implement...
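To illustrate why inference-only code is much simpler: a forward pass needs no gradients, placeholders, or graph, just array math. Below is a minimal NumPy sketch of a two-layer network, Dense(relu) followed by Dense(softmax). It assumes the weights arrive as [kernel1, bias1, kernel2, bias2] with each kernel shaped (input_dim, units), which is how Keras Dense layers store them; the function names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def dense(x, kernel, bias, activation=None):
    """Fully connected layer: activation(x @ kernel + bias), kernel shaped (input_dim, units)."""
    y = x @ kernel + bias
    return y if activation is None else activation(y)


def mlp_predict(x, weights):
    """Forward pass of Dense(relu) -> Dense(softmax).

    `weights` is assumed to be [kernel1, bias1, kernel2, bias2], e.g. the list
    returned by get_weights() on such a Keras model.
    """
    kernel1, bias1, kernel2, bias2 = weights
    hidden = dense(x, kernel1, bias1, relu)
    return dense(hidden, kernel2, bias2, softmax)
```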

@gabrieldemarmiesse that is super awesome, I'll try it out.

I also started working on a Cython + NumPy implementation that shows a lot of promise of being very fast in non-GPU environments.
I tested conv2d + separable conv2d and they seem to perform at similar or even faster speeds compared to TF for my use cases (very small images + kernels). This may be because there is almost zero overhead.

You can check out my (very early) work:
https://github.com/danFromTelAviv/cynfrence

I tried my best to follow tf standards.

@farizrahman4u I will run thorough tests and let you know. (It is a great honor to have you comment on my post, btw.)

danFromTelAviv on 7 Sep 2018


@danFromTelAviv Thanks for the detailed response. Note that I have an existing architecture for running Keras with imperative backends (like PyTorch), and I currently use it for running Keras models on Nd4j (www.nd4j.org). See the inference_only branch of deeplearning4j/keras. The Nd4j API is quite similar to NumPy's. If you can show me some solid numbers, we can definitely get this going.

farizrahman4u on 7 Sep 2018

Having support for imperative backends in Keras would be super nice. Maybe that would help keep keras and tf.keras in sync, since tf.keras already supports eager execution (just a guess)? And maybe it would reduce @fchollet's workload, since keeping keras and tf.keras in sync must be very time-consuming.

gabrieldemarmiesse on 7 Sep 2018

@gabrieldemarmiesse I just looked at your code. It is beautifully simple. My only concern is runtime for conv/separable conv and maybe RNNs. I think I'll ditch the version I was working on in favor of yours, but I'll merge in the convolutions I wrote in Cython (and maybe a couple more functions that can gain a significant speedup over good old NumPy). It looks far more maintainable, but I'll have to compare runtimes just to make sure...

@farizrahman4u That is awesome. I'm at ECCV right now. When I get back to work I'll start testing and update you as I go.

danFromTelAviv on 11 Sep 2018

Short update.
I ran tests on my i3 under the Windows Subsystem for Linux:
1000 runs (conv2d with (2,2) stride) = 2.062500 s (Cython)
1000 runs (conv2d with (2,2) stride) = 1.140625 s (TF)

*edit*
After some minor tweaks:
1000 runs (conv2d with (2,2) stride) = 1.468750 s (Cython)

So it's not as fast as I was perceiving, but not a bad starting point at all. I think they may be using FFT-based convolutions rather than direct ones. I'm not really sure how to apply strides/dilation in Fourier space, but it seems doable. Otherwise, I was also thinking about looking deeper into how NumPy/TF do their convolutions and just adding the striding/dilation options as needed. 2x is also within range of pure code optimization, so maybe I'll attack it that way, although I'm more experienced with high-level code.
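On the question of strides and dilation with FFT-based convolution, one straightforward (though wasteful) approach is sketched below, assuming single-channel 2D inputs and 'valid' padding: dilate the kernel by inserting zeros between its taps, compute the full 'valid' result with scipy.signal.fftconvolve, then keep every stride-th output. The kernel is flipped because the deep learning "convolution" is really a cross-correlation. A direct loop is included as a correctness reference. The function names are hypothetical, and this is not claimed to be what TensorFlow does internally.

```python
import numpy as np
from scipy.signal import fftconvolve


def dilate_kernel(k, dilation):
    """Insert (dilation - 1) zeros between neighbouring kernel taps along both axes."""
    if dilation == 1:
        return k
    kh, kw = k.shape
    out = np.zeros(((kh - 1) * dilation + 1, (kw - 1) * dilation + 1), dtype=k.dtype)
    out[::dilation, ::dilation] = k
    return out


def conv2d_fft(x, k, stride=1, dilation=1):
    """'valid' 2D cross-correlation with stride and dilation, computed via FFT."""
    kd = dilate_kernel(k, dilation)
    # fftconvolve computes a true convolution, so flip the kernel to get cross-correlation.
    full = fftconvolve(x, kd[::-1, ::-1], mode="valid")
    # Striding: compute every output position, then keep every `stride`-th one.
    return full[::stride, ::stride]


def conv2d_direct(x, k, stride=1, dilation=1):
    """Reference implementation with explicit loops over output positions."""
    kd = dilate_kernel(k, dilation)
    kh, kw = kd.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kd)
    return out
```

Striding this way computes every output and discards most of them (with stride 2 in both dimensions, roughly three quarters of the computed outputs are thrown away), so it mainly shows that strides and dilation do not need a special Fourier-space formulation.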

danFromTelAviv on 12 Sep 2018

I found a way to speed up the convolution in the NumPy backend while staying in pure Python:

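import numpy as np
from scipy import signal

# normalize_conv is defined in reference_operations.py (not shown). As used here, it
# passes channels_first inputs and a kernel indexed as w[in_channel, out_channel, ...].
@normalize_conv
def conv(x, w, padding, data_format):
    outputs = []
    for j in range(w.shape[1]):  # output channels
        per_input = []
        for k in range(w.shape[0]):  # input channels
            # w[None, k, j] gives the kernel a length-1 batch axis, so the N-D
            # convolution handles each sample in the batch independently.
            per_input.append(signal.convolve(x[:, k], w[None, k, j], mode=padding))
        # Sum the contributions of all input channels.
        outputs.append(np.sum(per_input, axis=0))
    return np.stack(outputs, axis=1)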

But it's still nowhere near as fast as TensorFlow, for example.

See this PR: #11156

gabrieldemarmiesse on 16 Sep 2018

Nice. Roughly what is the runtime compared to TF? Maybe we can ask SciPy to add dilated, strided N-D convolutions?

Update regarding the Cython implementation:
1) The runtimes above did not include parallelization, which should give a >1.5x speedup with 2 cores.
2) I found out that time.clock() cannot time parallelized Cython properly (I get a 2x slowdown).
I think even 30% slower than TF is good enough for now, especially since I should be able to recover that speed with parallelization when I get around to it.
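A likely explanation for the 2x slowdown: on Unix (including WSL), time.clock() returned processor time, which is summed over all threads of the process, so two busy cores for one second report roughly two seconds. (time.clock() was deprecated in Python 3.3 and removed in 3.8.) Below is a minimal sketch, with hypothetical helper names, that records both: time.perf_counter() for wall-clock time and time.process_time() for summed CPU time. The stand-in workload runs NumPy matrix products in a thread pool because they release the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def timed(fn, *args, repeat=1):
    """Call fn(*args) `repeat` times; return (last result, wall seconds, CPU seconds)."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    wall_start = time.perf_counter()  # wall-clock time
    cpu_start = time.process_time()   # CPU time summed over all threads of this process
    for _ in range(repeat):
        result = fn(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


def parallel_work(n_threads=2, size=300):
    """Hypothetical workload: one matrix product per thread (NumPy's matmul releases the GIL)."""
    a = np.random.default_rng(0).standard_normal((size, size))
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda m: m @ m, [a] * n_threads))
```

For example, `_, wall, cpu = timed(parallel_work, 2, 300, repeat=10)`. A CPU time noticeably larger than the wall time means the code ran in parallel; wall time is the number to compare against TF. NumPy's BLAS may itself use several threads for one matmul, so the exact ratio depends on the machine.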

I think I will start integrating with @gabrieldemarmiesse's code, adding tests, testing on ARM, etc.
What do you guys think?

danFromTelAviv on 20 Sep 2018

If NumPy/SciPy had a convolution operation for 1D/2D/3D equivalent to what deep learning frameworks have, it would be awesome.
Currently, not only is our convolution slow (I think something like 10-20x slower than TF), but our NumPy pooling operations in reference_operations.py are very slow too.
We cannot use Cython in the Keras codebase; that would complicate the build too much.
We cannot ship a NumPy backend in the Keras codebase unless we have fast pooling and convolution operations, and those operations should only use SciPy/NumPy. For pooling, I think NumPy tricks can help us, but I'm not so sure about convolution.
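As one example of a NumPy trick for pooling, here is a minimal sketch using numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later), which builds a strided view of every pooling window without copying and then reduces over the window axes. It assumes channels_last input shaped (batch, height, width, channels) and 'valid' padding only; 'same' padding would need explicit padding first. The name pool2d_valid and its signature are illustrative and not meant to match reference_operations.py or the Keras backend API.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pool2d_valid(x, pool_size, strides=None, mode="max"):
    """'valid' 2D pooling of a channels_last array shaped (batch, height, width, channels)."""
    ph, pw = pool_size
    sh, sw = strides if strides is not None else pool_size
    # Read-only view of every ph x pw window, shaped
    # (batch, h - ph + 1, w - pw + 1, channels, ph, pw); no data is copied here.
    windows = sliding_window_view(x, (ph, pw), axis=(1, 2))
    # Keep only the windows that start on a stride boundary.
    windows = windows[:, ::sh, ::sw]
    if mode == "max":
        return windows.max(axis=(-2, -1))
    if mode == "avg":
        return windows.mean(axis=(-2, -1))
    raise ValueError(f"unknown pooling mode: {mode!r}")
```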

gabrieldemarmiesse on 20 Sep 2018

I ran a VGG with the NumPy backend from reference_operations.py and it was orders of magnitude slower than TF, mainly because of pooling and convolutions. I don't have precise numbers, though.

gabrieldemarmiesse on 20 Sep 2018

Yeah... orders of magnitude is an issue.
I opened an issue for SciPy; hopefully they consider it a significant feature and pick it up for development. I tried to follow their code and had a hard time, so I don't think I'll be able to contribute there.

As for pooling: average pooling is just a strided convolution with np.ones (scaled by one over the window size). There is also a max filter that works similarly. So it's two birds with one stone if we can figure out the convolutions. There are also RNNs to watch out for.
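Following that idea, here is a minimal single-channel sketch with 'valid' padding: average pooling as a 'valid' convolution with np.ones((k, k)) / (k * k) followed by striding, and max pooling via scipy.ndimage.maximum_filter followed by picking one output per window. The function names are hypothetical. Because ndimage centres its filter window, the maximum over the window that starts at index s is stored at index s + k // 2, hence the offset.

```python
import numpy as np
from scipy import ndimage, signal


def avg_pool_via_conv(x, k, stride):
    """Average pooling of a 2D array: 'valid' convolution with a normalized box, then striding."""
    box = np.ones((k, k)) / (k * k)
    # The box is symmetric, so convolution and cross-correlation give the same result.
    return signal.convolve2d(x, box, mode="valid")[::stride, ::stride]


def max_pool_via_filter(x, k, stride):
    """Max pooling of a 2D array via scipy.ndimage.maximum_filter, then striding."""
    h, w = x.shape
    filtered = ndimage.maximum_filter(x, size=k)
    # maximum_filter centres its window, so the max over the window starting at
    # index s is stored at index s + k // 2 (this also holds for even k).
    off = k // 2
    valid = filtered[off:off + h - k + 1, off:off + w - k + 1]
    return valid[::stride, ::stride]
```

Like the strided FFT trick, both functions compute a value at every position and discard the ones the stride skips, so they are vectorized but not work-efficient.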

In any case, for truly competitive speed, none of the for loops can be in Python.
Distributing Cython code precompiled, or at least translated to C, seems not that bad, but it's obviously not as easy as pure Python.
Official Cython advice:
1) Turn the Cython code into C automatically with cythonize (so no Cython is needed on the user's computer).
2) Add it to setup.py as follows:

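# setuptools provides setup() and Extension; distutils was removed from the
# standard library in Python 3.12.
from setuptools import setup, Extension

setup(
    # Build from the generated C file, so users don't need Cython installed.
    ext_modules=[Extension("example", ["example.c"])]
)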

Maybe a simpler custom backend interface (imperative backends, as you suggested) would be the best way to go; this way it's not part of official Keras (if the SciPy/NumPy avenue doesn't work out).

danFromTelAviv on 21 Sep 2018

Hmm. Let's not do the Cython stuff. I know where it goes... you will end up writing your own TF. Let's have a Keras fork with a basic template for imperative backends. I will add a bunch of ops for NumPy, and you guys can help out with the rest?

farizrahman4u on 21 Sep 2018


Fair enough. @gabrieldemarmiesse, it's not going to be fast enough for practical use, but it will open a really big door for innovation, and maybe speed will eventually come too. You in?

Let's put a list of things that need to be done somewhere, so we know who's on what. Where is most convenient (maybe Trello or something)? Maybe we should even do TDD (seems like a classic fit here).
** Note that between Gabriel's code and mine, there's already a lot of functionality in pure NumPy/SciPy (mine is still not well tested).

danFromTelAviv on 22 Sep 2018

Let's have a Keras fork with a basic template for imperative backends.

@farizrahman4u From what I know, this keras-team/keras repo will have to support TF eager. Why not work directly here to support TF eager? We'll load the numpy backend as an external one when we're done. What do you think?

gabrieldemarmiesse on 22 Sep 2018

@gabrieldemarmiesse We don't have to make many changes to this repo to support eager. Keras simply calls TF functions; whether ops execute eagerly or not is up to TF. The current codebase should be almost eager-compatible. Maybe there are some API changes (additions) to be made to support the eager workflow more explicitly, and there are upcoming breaking API changes in TF2, but I think fchollet and Google will handle it :)

Here I am talking about truly imperative backends (which don't have the concept of placeholders). These include PyTorch, Nd4j, and NumPy. I already have partially working prototypes for PyTorch and Nd4j. Let's make it a single project with switchable backends. I can do this over the weekend (next weekend), but will need help with adding ops.

farizrahman4u on 22 Sep 2018

Sure. I'd be interested in helping out with adding ops. Especially with NumPy, it's not very difficult (PyTorch too; there's just their weird to_device() to handle).

One request, though. I would like to add the missing NumPy backend functions to reference_operations.py in this repo. This allows healthier testing of the backends in keras-team/keras. (I don't like the fact that we currently test all backends against each other: if one test fails, we don't know which backend caused the failure, which is contrary to the spirit of unit testing.) You can always port them to your fork, of course. I'd just ask for some help, since I need someone to read and review my PRs in this repo, so I'll count on you for that.

In short, I'll give priority to reference_operations.py in this repo for the moment.

gabrieldemarmiesse on 22 Sep 2018

Sure. You can keep adding ops to reference_operations.py, I will just copy from there for my numpy backend?

farizrahman4u on 22 Sep 2018


I am relatively free this week (we are on holiday over here). I am not sure, though, exactly what functionality is needed and how the architecture of this project is set up.

I think I will be most helpful if you provide me with a list of small functionalities that you want me to help out with and I can bring back a set of functions + tests.

I haven't had a chance to try out pytorch yet so I'm probably more useful with numpy as well.

danFromTelAviv on 22 Sep 2018

If you want to help, you can do PRs to add ops to reference_operations.py. I think it's the simplest, since you don't need much knowledge of Keras' internals to do that.


gabrieldemarmiesse on 22 Sep 2018

Just to clarify: I should look at tensorflow_backend.py as a reference and write the functions that are missing? I assume that TF-specific things like clear_session() are not needed?

danFromTelAviv on 22 Sep 2018

Indeed, but you should look at https://github.com/keras-team/keras/blob/master/tests/keras/backend/backend_test.py

Specifically, we want to remove the use of the variable BACKENDS and replace it with the variable WITH_NP. We can only do that for operations which are implemented in numpy.

I can guide you for a first PR if you want:

1) Look at this line: https://github.com/keras-team/keras/blob/e00be3d95dc7d88706a5b76d4b11856c8a7b5f6f/tests/keras/backend/backend_test.py#L208
   We use BACKENDS there. Replace it with WITH_NP (like for the 'dot' operation just a few lines above).
2) Run the test. It won't work, because the numpy backend doesn't have the 'batch_dot' op yet.
3) Add batch_dot to the numpy backend (reference_operations.py).
4) Run the same test again. Success!
5) Open a PR.

I'll review your PR for simplicity when you do it.
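For reference, here is a minimal NumPy sketch of batch_dot for the case where each input contracts over exactly one non-batch axis. It is an illustration, not the code that ended up in reference_operations.py: it moves the contracted axes into place, flattens the free axes, and performs a single batched np.matmul. It assumes Keras-style conventions: axis 0 is the batch axis, the output shape is x's shape without the contracted axis followed by y's shape without its batch and contracted axes, and a 1-D result is expanded to (batch, 1). Edge cases such as inputs of different ranks are not handled.

```python
import numpy as np


def batch_dot(x, y, axes):
    """Per-sample dot product: contract x[b] and y[b] over one axis each.

    `axes` is an int or a pair (x_axis, y_axis); axis 0 is the batch axis
    and cannot be contracted.
    """
    if isinstance(axes, int):
        axes = (axes, axes)
    x_axis, y_axis = axes
    if x_axis == 0 or y_axis == 0:
        raise ValueError("Cannot contract over the batch axis.")
    # Move x's contracted axis to the end and y's right after the batch axis.
    x_t = np.moveaxis(x, x_axis, -1)
    y_t = np.moveaxis(y, y_axis, 1)
    batch, k = x_t.shape[0], x_t.shape[-1]
    x_rest, y_rest = x_t.shape[1:-1], y_t.shape[2:]
    # Flatten the free axes so the contraction becomes one batched matrix product.
    out = np.matmul(x_t.reshape(batch, -1, k), y_t.reshape(batch, k, -1))
    out = out.reshape((batch,) + x_rest + y_rest)
    # Return (batch, 1) rather than (batch,) when no free axes remain.
    if out.ndim == 1:
        out = out[:, None]
    return out
```

For instance, with x of shape (32, 20, 1), y of shape (32, 30, 20) and axes=(1, 2), the result has shape (32, 1, 30).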

gabrieldemarmiesse on 22 Sep 2018

ok, excellent. I'll let you know if/when I run into an issue.

danFromTelAviv on 22 Sep 2018

If you want to know why we're doing this in the test suite: we compare backends against each other to check that they all produce the same results. When using BACKENDS, we compare the three backends together, which isn't great, because the error won't easily tell us which backend has the failing op. By using WITH_NP, we just compare the current backend to the NumPy backend to check that the results are the same. Then we know exactly which backend is failing if there is an error.
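The idea can be captured in a small helper that compares one backend's op against the NumPy reference and names the op in the error message, so a failure points at a single implementation. This is a hypothetical sketch, not the helper actually used in backend_test.py.

```python
import numpy as np


def assert_matches_reference(op_name, candidate, reference, *inputs, atol=1e-5):
    """Check one backend's op against the NumPy reference; name the op if they disagree."""
    expected = np.asarray(reference(*inputs))
    actual = np.asarray(candidate(*inputs))
    if actual.shape != expected.shape:
        raise AssertionError(
            f"{op_name}: backend shape {actual.shape} != reference shape {expected.shape}")
    if not np.allclose(actual, expected, atol=atol):
        max_err = np.max(np.abs(actual - expected))
        raise AssertionError(
            f"{op_name}: backend differs from the NumPy reference (max abs error {max_err:.3g})")
```

Because only two implementations are compared at a time, a failure can only come from the backend under test (or from the reference itself, which is kept deliberately simple).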

gabrieldemarmiesse on 22 Sep 2018

oh ok. that makes sense.