Serving: TensorFlow model trained on N GPUs requires the same number of GPUs to be served with tf-serving

Created on 6 Jul 2018 · 13 Comments · Source: tensorflow/serving

I have a TensorFlow model that was trained on 8 GPUs. It was saved with SavedModelBuilder, as tf-serving requires, which produces a model directory with this structure:

sample_model_2

After saving it, I serve it with tf-serving. The problem is that if tf-serving is installed on a CPU-only machine, or on a machine with fewer than 8 GPUs, it throws an error (tf-serving does not accept the model files). When I install tf-serving on an 8-GPU machine, the model is served properly.

Is there a parameter or flag that allows the model to be saved and served on a machine with a different configuration, so that serving does not depend on the configuration of the training machine?

Error returned when the training and serving machine configurations differ:

2018-07-06 08:09:16.698221: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10743 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
2018-07-06 08:09:16.698443: E external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:228] Illegal GPUOptions.experimental.num_dev_to_dev_copy_streams=0 set to 1 instead.
2018-07-06 08:09:17.844564: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:291] SavedModel load for tags { serve }; Status: fail. Took 1834084 microseconds.
2018-07-06 08:09:17.912314: E tensorflow_serving/util/retrier.cc:37] Loading servable: {name: default version: 2} failed: Invalid argument: Cannot assign a device for operation 'clone_7/gradients/clone_7/softmax_cross_entropy_loss/div_1_grad/BroadcastGradientArgs': Operation was explicitly assigned to /device:GPU:7 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0 ]. Make sure the device specification refers to a valid device.
         [[Node: clone_7/gradients/clone_7/softmax_cross_entropy_loss/div_1_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _output_shapes=[[?], [?]], _device="/device:GPU:7"](clone_7/gradients/clone_7/softmax_cross_entropy_loss/div_1_grad/Shape, clone_7/gradients/clone_7/softmax_cross_entropy_loss/div_1_grad/Shape_1)]]

Any help is appreciated, thanks.

awaiting response bug

All 13 comments

@ewilderj @kchodorow @montanaflynn @jart any clue about this?

I believe this is correct. Depending on the way your model is structured, you have probably explicitly encoded 8 GPUs in the training part. I'm going to assume you are using an Estimator. What gets served is defined in the model_fn when mode=PREDICT. If you again define a model specialized explicitly for 8 GPUs in that call, serving will require 8 GPUs. If you define one for a single GPU, it will work with a single GPU and, if soft placement is an option, it should also work when no GPU is present.
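
As a rough illustration of the point above, here is a TensorFlow-free sketch in plain Python. The names `build_ops` and `required_devices` and the mode strings are hypothetical; they only mimic how a multi-GPU training graph pins ops to devices while a prediction graph need not. This is not the Estimator API.

```python
# Toy model of mode-dependent graph construction (not TensorFlow code).
# All names here are hypothetical and exist only for illustration.

def build_ops(mode, num_gpus):
    """Return a list of (op_name, device) pairs for the given mode."""
    if mode == "train":
        # One clone per GPU, each explicitly pinned to its own device.
        return [
            ("clone_%d/logits" % i, "/device:GPU:%d" % i)
            for i in range(num_gpus)
        ]
    # Prediction: a single copy of the model with no explicit device,
    # leaving placement to whatever machine loads it.
    return [("logits", "")]


def required_devices(ops):
    """Collect the devices that ops are explicitly pinned to."""
    return {device for _, device in ops if device}


train_ops = build_ops("train", num_gpus=8)
predict_ops = build_ops("predict", num_gpus=8)

print(sorted(required_devices(train_ops)))    # the eight pinned GPU devices
print(sorted(required_devices(predict_ops)))  # nothing pinned
```

If what gets exported looks like `train_ops`, the serving machine must provide every listed device. If it looks like `predict_ops`, any machine can load it.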

Facing the same problem. Any help is appreciated. Thank You.

There are parameters for clearing the device placement when exporting the model. It is difficult to specify devices at serving time the way you do in training, unless you modify the graph manually.

Is this still an issue?

I believe this hasn't changed.

@tobegit3hub thank you. The issue got resolved for me after clearing device info while exporting the model.

@bansarishah Could you please give some hints on how you did it? Thanks in advance.

@frallain, while exporting the model, set the flag clear_devices=True. It will help to resolve this issue.

builder = tf.saved_model.builder.SavedModelBuilder(export_path)
...
builder.add_meta_graph_and_variables(
    sess,
    [tf.saved_model.tag_constants.SERVING],
    signature_def_map={'model': signature},
    clear_devices=True)

Thank You.
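
To make the effect of that flag concrete, the following plain-Python sketch imitates the placement check seen in the error log. It is a toy model under stated assumptions, not TensorFlow's placer: `check_placement` and `clear_devices` are hypothetical helpers, ops are plain dictionaries, and device names are shortened.

```python
# Toy imitation of explicit device placement (not TensorFlow's implementation).
# Helper names and the dict-based op representation are assumptions.

def check_placement(ops, available):
    """Raise ValueError if any op is pinned to a device that is not available."""
    for op in ops:
        device = op["device"]
        if device and device not in available:
            raise ValueError(
                "Cannot assign a device for operation %r: explicitly assigned "
                "to %s but available devices are %s"
                % (op["name"], device, sorted(available))
            )


def clear_devices(ops):
    """Return copies of the ops with their device field emptied."""
    return [dict(op, device="") for op in ops]


# Graph exported from an 8-GPU run: one op is pinned to GPU:7.
exported = [
    {"name": "clone_0/logits", "device": "/device:GPU:0"},
    {"name": "clone_7/logits", "device": "/device:GPU:7"},
]
# Serving machine with a CPU and a single GPU.
available = {"/device:CPU:0", "/device:GPU:0"}

try:
    check_placement(exported, available)
    load_failed = False
except ValueError as err:
    load_failed = True
    print(err)

# After clearing, no op names a specific device, so the check passes.
check_placement(clear_devices(exported), available)
```

The design point is that clearing removes the constraint rather than remapping it: with no device named on any op, placement is decided by the machine that loads the model.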

@gr8Adakron Did you get a chance to try the above solution as suggested by @bansarishah ?

Besides the clear_devices flag, is there a way to define the device placement in tf-serving, as described in https://github.com/tensorflow/serving/issues/311?

Closing this as it has been in "awaiting response" status for more than a week. Feel free to add comments (if any) and we will reopen the issue. Thanks!

@Harshini-Gadige Hi, any progress on device placement for tf-serving?
