Serving: Load-balancing for real-time production environments in TensorFlow

Created on 14 Feb 2018 · 4 Comments · Source: tensorflow/serving

We use deep learning to process very large images for manufacturing. Images come at a very high rate so we load-balance appropriately. Because of our performance requirements, we tweak everything to squeeze every last millisecond we can out of our deep learning inference.

We have been using a different machine learning framework but are now researching TensorFlow. We want a single machine with multiple GPU cards that serves inferences on incoming data. The inferences are made by a model with high GPU RAM requirements and are consumed by multiple machines on the same LAN.

The input data is high volume and must be processed in real time, so inference requests need to be load-balanced across the GPU cards. The machines that consume these inferences are already load-balanced, and more of them are added as the load increases. Ideally you would add GPU cards to the single machine until it reaches its limit, and only then add another GPU machine.

How do you load-balance TensorFlow inferences in such a setup?

performance

johnsrude

👍6

All 4 comments

Any updates on this? Incredibly practical problem statement.

sidshopin on 28 Jun 2018

👍5

Please go to Stack Overflow for help and support:

https://stackoverflow.com/questions/tagged/tensorflow-serving

If you open a GitHub issue, it must be a bug, a feature request, or a significant problem with documentation (for small docs fixes please send a PR instead).

Thanks!

gautamvasudevan on 24 Jul 2018

👎5

@gautamvasudevan Issue has been open for even longer at https://stackoverflow.com/questions/48104939/load-balancing-for-real-time-production-environments-in-tensorflow but still no answer from your team.

KevinLucidyne on 25 Jul 2018

An external load balancer program should be enough, no need to implement in tensorflow-serving.

wydwww on 27 Nov 2018
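
For readers wondering what an external balancer could look like here, the following is a minimal sketch, not something confirmed by the thread or by the TensorFlow Serving team. It assumes each GPU is served by its own model server process (for example, one process per card), and that clients choose the process with the fewest in-flight requests. The class name `LeastOutstandingBalancer`, the backend addresses, and the `send_request` call mentioned afterwards are all hypothetical and for illustration only.

```python
import threading
from contextlib import contextmanager


class LeastOutstandingBalancer:
    """Route each request to the backend with the fewest in-flight requests.

    Hypothetical helper: a "backend" is just a label such as "host:port"
    for one model server process (assumed here to own exactly one GPU).
    """

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._lock = threading.Lock()
        # Maps backend label -> number of requests currently in flight.
        self._in_flight = {backend: 0 for backend in backends}

    @contextmanager
    def acquire(self):
        """Reserve the least-loaded backend for the duration of one request."""
        with self._lock:
            # Ties are broken by insertion order of the backends.
            backend = min(self._in_flight, key=self._in_flight.get)
            self._in_flight[backend] += 1
        try:
            yield backend
        finally:
            with self._lock:
                self._in_flight[backend] -= 1

    def add_backend(self, backend):
        """Register a new GPU process or machine as load grows."""
        with self._lock:
            self._in_flight.setdefault(backend, 0)

    def snapshot(self):
        """Return a copy of the current in-flight counts."""
        with self._lock:
            return dict(self._in_flight)


# Hypothetical addresses: two server processes on one multi-GPU machine.
balancer = LeastOutstandingBalancer(["gpu-box-1:9000", "gpu-box-1:9001"])
```

A caller would wrap each inference in `with balancer.acquire() as backend:` and send the request to that address inside the block (for instance via a hypothetical `send_request(backend, image)`). Least-outstanding routing is chosen over plain round-robin because large images can have uneven processing times; `add_backend` covers the scaling path from the question, where cards are added to one machine first and a second machine later.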
