Serving: TensorFlow model trained on N GPUs requires the same number of GPUs to be served with tf-serving

Created on 6 Jul 2018 · 13 Comments · Source: tensorflow/serving

I have a TensorFlow model that was trained on 8 GPUs and saved with SavedModelBuilder, as tf-serving requires, which produces a model directory with this structure:

sample_model_2

After saving it, I serve it with tf-serving. The problem is that if tf-serving is installed on a CPU-only machine, or on a machine with fewer than 8 GPUs, it throws an error (tf-serving does not accept the model files). When I install tf-serving on an 8-GPU machine, the model is served properly.

Is there any parameter or flag that lets me save and serve the model on a machine with a different configuration, so that serving does not depend on the configuration of the training machine?

Error returned when the training and serving machine configurations differ:

2018-07-06 08:09:16.698221: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10743 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2018-07-06 08:09:16.698443: E external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:228] Illegal GPUOptions.experimental.num_dev_to_dev_copy_streams=0 set to 1 instead.
2018-07-06 08:09:17.844564: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:291] SavedModel load for tags { serve }; Status: fail. Took 1834084 microseconds.
2018-07-06 08:09:17.912314: E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: default version: 2} failed: Invalid argument: Cannot assign a device for operation 'clone_7/gradients/clone_7/softmax_cross_entropy_loss/div_1_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:7 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device specification refers to a valid device.
         [[Node: clone_7/gradients/clone_7/softmax_cross_entropy_loss/div_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _output_shapes=[[?], [?]], _device="/device:GPU:7"](clone_7/gradients/clone_7/softmax_cross_entropy_loss/div_1_grad/Shape, clone_7/gradients/clone_7/softmax_cross_entropy_loss/div_1_grad/Shape_1)]]

Any help is appreciated, thanks.

awaiting response bug

gr8Adakron

❤1

Most helpful comment

I believe this is correct. Depending on the way your model is structured, you have probably explicitly encoded 8 GPUs in the training part. I'm going to assume you are using an Estimator. What gets served is defined in the model_fn when mode=PREDICT. If you again define a model specialized explicitly for 8 GPUs in that call, serving will require 8 GPUs. If you define one for a single GPU, it will work with a single GPU, and, if soft placement is an option, it should also work when no GPU is present.

martinwicke on 10 Jul 2018

👍2
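
A minimal sketch of the point made in the comment above: the device requirements of a graph come from how it was built, so a graph built for prediction without device scopes has no GPU requirement. Everything here is an assumption for illustration: `build_graph` is a hypothetical stand-in for an Estimator `model_fn` (it is not the Estimator API), `forward` is a toy model, and the `clone_%d` scopes only mimic the op names seen in the error log. It uses `tf.compat.v1` so that it also runs on TensorFlow 2.x, and it only constructs graphs, so no GPU is needed to run it.

```python
import tensorflow as tf

tf1 = tf.compat.v1


def forward(features):
    """Hypothetical forward pass; variables are shared by every tower."""
    with tf1.variable_scope("model", reuse=tf1.AUTO_REUSE):
        w = tf1.get_variable("w", shape=[4, 2])
        b = tf1.get_variable("b", shape=[2],
                             initializer=tf1.zeros_initializer())
    return tf.matmul(features, w) + b


def build_graph(mode, num_towers=8):
    """Stand-in for a model_fn: pins towers to GPUs only when training."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf1.placeholder(tf.float32, [None, 4], name="x")
        if mode == "train":
            # One explicitly pinned copy of the model per GPU.
            for i in range(num_towers):
                with tf.device("/device:GPU:%d" % i), \
                        tf1.name_scope("clone_%d" % i):
                    forward(x)
        else:
            # Prediction graph: a single tower with no device scope.
            tf.identity(forward(x), name="predictions")
    return graph


def pinned_devices(graph):
    """Return the distinct non-empty device strings recorded in a graph."""
    return sorted({n.device for n in graph.as_graph_def().node if n.device})
```

Calling `pinned_devices(build_graph("train"))` lists the eight GPU devices that a serving machine would have to provide, while `pinned_devices(build_graph("predict"))` is empty, which leaves placement to whatever devices the serving machine has.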

All 13 comments

@ewilderj @kchodorow @montanaflynn @jart any clue about this?

gr8Adakron on 6 Jul 2018

martinwicke on 10 Jul 2018

👍2

Facing the same problem. Any help is appreciated. Thank You.

bansarishah on 11 Jul 2018

There are some parameters to clear the device placement when exporting the model. It is a little difficult to specify devices for serving the way they were specified for training unless we modify the graph manually.

tobegit3hub on 9 Aug 2018

Is this still an issue?

Harshini-Gadige on 23 Oct 2018

I believe this hasn't changed.

martinwicke on 23 Oct 2018

@tobegit3hub thank you. The issue got resolved for me after clearing device info while exporting the model.

bansarishah on 24 Oct 2018

@bansarishah Could you please give some hints on how you did it? Thanks in advance.

frallain on 6 Nov 2018

@frallain, while exporting the model, set the flag clear_devices=True. It will help to resolve this issue.

    builder = tf.saved_model.builder.SavedModelBuilder(export_path)
    ...
    builder.add_meta_graph_and_variables(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={'model': signature},
        clear_devices=True)

Thank you.

bansarishah on 7 Nov 2018

👍1
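
For readers who want to try the `clear_devices=True` suggestion end to end, here is a minimal, self-contained sketch. It makes several assumptions that are not from the thread: the toy graph (`x`, `w`, `y`), the helper names `export_without_devices` and `devices_in_saved_model`, and the use of `tf.compat.v1` so that it runs on TensorFlow 2.x as well. The ops are pinned to `/device:CPU:0` only so that the sketch runs on any machine; the model in this issue had ops pinned to GPU:0 through GPU:7.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1


def export_without_devices(export_dir):
    """Build a tiny device-pinned graph and export it with clear_devices=True."""
    graph = tf.Graph()
    with graph.as_default():
        # Explicit pin, standing in for the per-GPU pins of the real model.
        with tf.device("/device:CPU:0"):
            x = tf1.placeholder(tf.float32, [None, 4], name="x")
            w = tf1.get_variable("w", shape=[4, 2])
            y = tf.matmul(x, w, name="y")

        signature = tf1.saved_model.signature_def_utils.predict_signature_def(
            inputs={"x": x}, outputs={"y": y})

        with tf1.Session(graph=graph) as sess:
            sess.run(tf1.global_variables_initializer())
            # export_dir must not exist yet; the builder creates it.
            builder = tf1.saved_model.builder.SavedModelBuilder(export_dir)
            builder.add_meta_graph_and_variables(
                sess,
                [tf1.saved_model.tag_constants.SERVING],
                signature_def_map={"model": signature},
                clear_devices=True)  # drop device strings from the export
            builder.save()
    return graph


def devices_in_saved_model(export_dir):
    """Load the SavedModel and return the device strings left on its nodes."""
    graph = tf.Graph()
    with tf1.Session(graph=graph) as sess:
        tf1.saved_model.loader.load(
            sess, [tf1.saved_model.tag_constants.SERVING], export_dir)
    return sorted({n.device for n in graph.as_graph_def().node if n.device})
```

A typical check is to export into a fresh directory such as `os.path.join(tempfile.mkdtemp(), "1")` and confirm that `devices_in_saved_model` returns an empty list, while the in-memory training graph still carries its device pins.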

@gr8Adakron Did you get a chance to try the above solution as suggested by @bansarishah ?

Harshini-Gadige on 3 Dec 2018

Besides the clear_devices flag, is there a way to define the device placement in tf-serving, as discussed in https://github.com/tensorflow/serving/issues/311?

wydwww on 4 Dec 2018

Closing this, as it has been in "awaiting response" status for more than a week. Feel free to add comments (if any), and we will reopen the issue. Thanks!

Harshini-Gadige on 11 Feb 2019

@Harshini-Gadige Hi, any progress on device placement for tf-serving?

heibaidaolx123 on 11 Jul 2019
