Incubator-mxnet: Implement a model serving framework

Created on 17 Apr 2016 · 9 comments · Source: apache/incubator-mxnet

Call for Contribution

All 9 comments

@futurely @piiswrong a straightforward way to deploy MXNet models into production environments would indeed be very welcome.
I have had very good experiences with an open-source API server called DeepDetect (http://www.deepdetect.com/, https://github.com/beniz/deepdetect), which I use heavily to deploy models in my commercial production environments. It currently supports Caffe and XGBoost, with partial support for TensorFlow on the way (my own experience so far is only with Caffe). Would this be a route to go down for MXNet?

@futurely @revilokeb @piiswrong Hey guys,
Where did you end up with this? Model serving and management is an area of focus for me, and I'd be keen to spend some dev hours on a compatible solution.

No one is doing it yet. An easy solution is to use AWS Lambda, but it doesn't support GPUs and doesn't do batching.
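For illustration, a Lambda-style handler around a bundled MXNet checkpoint might look roughly like the sketch below. The checkpoint prefix (`model`), the input shape, and the event format are all assumptions; only the `mx.model.load_checkpoint` / `mx.mod.Module` calls are the standard MXNet inference API.

```python
import json
import mxnet as mx
import numpy as np

# Load the checkpoint once per container so warm invocations skip it.
# 'model'/epoch 0 are placeholders for a checkpoint bundled with the function.
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
mod = mx.mod.Module(symbol=sym, label_names=None, context=mx.cpu())  # CPU only on Lambda
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params, allow_missing=True)

def handler(event, context):
    """Hypothetical entry point: expects a JSON-encoded image tensor."""
    data = np.asarray(json.loads(event['body']), dtype=np.float32)
    batch = mx.io.DataBatch([mx.nd.array(data.reshape((1, 3, 224, 224)))])
    mod.forward(batch, is_train=False)
    probs = mod.get_outputs()[0].asnumpy().squeeze()
    return {'statusCode': 200, 'body': json.dumps({'top1': int(probs.argmax())})}
```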

You are welcome to work on it. Please propose a design and we can discuss it.

@jordan-green you may be interested in opening an issue for MXNet prediction support with https://github.com/beniz/deepdetect, as it already supports Caffe, XGBoost and TensorFlow. It may not get implemented immediately, though I don't believe it would be too difficult. If you can help a bit with it, even better; it will happen faster.

Excited to see this!!

Hi all, my current gut feeling is that this piece of functionality may be best provided as a standalone project under a compatible and permissive license (most likely Apache), so that other frameworks can benefit as well.

It would seem that outside of TF Serving, there's not a lot out there. DeepDetect looks interesting @beniz; however, it appears to be under the GPL license. Can you please confirm?

Lambda / OpenWhisk

Lambda would almost certainly be a great option if it had GPU support, and Amazon will almost certainly provide this in the near future, whether via a different class of Lambda or via their new Elastic GPU offering (which may be slightly less suited here than the former). This is of course not an open-source solution, and as such may not be ideal. That had me thinking about other options for implementing a simple, serverless method for hosting inference models, and I think OpenWhisk may be a good fit.

GPU Compatibility

I can't find confirmation that it works on GPUs; however, its generic action invocation appears to run an arbitrary binary via Alpine Linux, which I've used with CUDA in the past with some success. I'll spin up an OpenWhisk VM on my GPU box and report back on whether GPUs are accessible; it's not immediately obvious to me why they shouldn't be.
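As a concrete way to test that, a minimal OpenWhisk-style Python action could simply probe for a usable CUDA device. The `main(params)` entry point returning a dict is the standard OpenWhisk Python action shape; the probe itself is just a tiny GPU allocation that surfaces any driver/runtime failure:

```python
import mxnet as mx

def main(params):
    """OpenWhisk-style action: report whether a CUDA device is usable."""
    try:
        # Reading the array back forces a real CUDA context to be created,
        # so any driver or runtime problem surfaces here rather than lazily.
        mx.nd.zeros((1,), ctx=mx.gpu(0)).asnumpy()
        return {'gpu': True}
    except mx.base.MXNetError as err:
        return {'gpu': False, 'error': str(err)}
```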

Simplification

From there, I think making use of the amalgamation script(s) within MXNet to provide a simple 'runnable' object may be a good approach to giving users a straightforward deployment process. This will obviously need performance testing.
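For reference, the amalgamation build ships a lightweight Python wrapper (amalgamation/python/mxnet_predict.py) exposing a Predictor class, so the 'runnable' object could be about as small as the sketch below. The file names are placeholders, and the exact Predictor argument names may differ between versions:

```python
import numpy as np
from mxnet_predict import Predictor  # wrapper shipped with the amalgamation

# A checkpoint exported by full MXNet; the amalgamated predictor only
# needs the raw symbol JSON and the serialized parameter bytes.
symbol_json = open('model-symbol.json').read()
param_bytes = open('model-0000.params', 'rb').read()

predictor = Predictor(symbol_json, param_bytes,
                      {'data': (1, 3, 224, 224)})  # input name -> shape

x = np.random.uniform(size=(1, 3, 224, 224)).astype(np.float32)
predictor.forward(data=x)        # keyword must match the input name
probs = predictor.get_output(0)  # returns a numpy array
print(probs.argmax())
```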

MXNet Integration

I think this could prove to be a powerful tool for many ML frameworks, with MXNet serving as the foundation in places. Perhaps this would best be its own project/repository, mirrored within and closely integrated with MXNet? Thoughts on this are much appreciated.

Please let me know your thoughts, and once I've validated some of the moving pieces, particularly GPU support on OpenWhisk, I'll knock together a design proposal for further discussion.

@kevinthesun

@yuruofeifei and I are working on MXNet model serving. It's still at an early stage. In the current phase, it creates an HTTP endpoint and allows developers to fully customize their pre-processing and post-processing functions for inference. More powerful features will be added in future stages.
https://github.com/yuruofeifei/mms
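For a rough picture of the idea, a minimal endpoint with user-supplied pre/post-processing hooks could look like the sketch below. This is illustrative only, built on Flask rather than the actual mms API, and the checkpoint prefix, input shape, and route are assumptions:

```python
import mxnet as mx
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

# Placeholder checkpoint names; in mms these would come from the model archive.
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
mod = mx.mod.Module(symbol=sym, label_names=None, context=mx.cpu())
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params, allow_missing=True)

def preprocess(payload):
    """User-customizable hook: turn the request body into an NDArray."""
    arr = np.asarray(payload['data'], dtype=np.float32)
    return mx.nd.array(arr.reshape((1, 3, 224, 224)))

def postprocess(output):
    """User-customizable hook: turn raw model output into a JSON result."""
    probs = output.asnumpy().squeeze()
    return {'top1': int(probs.argmax()), 'prob': float(probs.max())}

@app.route('/predict', methods=['POST'])
def predict():
    data = preprocess(request.get_json())
    mod.forward(mx.io.DataBatch([data]), is_train=False)
    return jsonify(postprocess(mod.get_outputs()[0]))

if __name__ == '__main__':
    app.run(port=8080)
```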
