Models: Run object_detection prediction on cloudML

Created on 29 Jun 2017 · 61 Comments · Source: tensorflow/models

With the latest changes for exporting a SavedModel (d9d10fbbb938534af72f405983cabb85258ac5f3) I tried to run predictions on Google Cloud ML.
When I uploaded my model at https://console.cloud.google.com/mlengine/models I got the error "Model validation failed: SavedModel must contain exactly one metagraph with tag: serve", which I was able to fix in #1810.
After this fix I ran into another error: "Model validation failed: Outer dimension for SignatureDef outputs must be unknown, outer dimension of 'detection_scores:0' is 1"

Printing out the detection_signature (https://github.com/tensorflow/models/blob/master/object_detection/exporter.py#L281) it looks like this:

inputs {
  key: "inputs"
  value {
    name: "ToFloat:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: -1
      }
      dim {
        size: -1
      }
      dim {
        size: 3
      }
    }
  }
}
outputs {
  key: "detection_boxes"
  value {
    name: "detection_boxes:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 300
      }
      dim {
        size: 4
      }
    }
  }
}
outputs {
  key: "detection_classes"
  value {
    name: "detection_classes:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 300
      }
    }
  }
}
outputs {
  key: "detection_scores"
  value {
    name: "detection_scores:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 300
      }
    }
  }
}
outputs {
  key: "num_detections"
  value {
    name: "num_detections:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
    }
  }
}
method_name: "tensorflow/serving/predict"

I am just getting started with TensorFlow... it looks to me like the fixed outer dimension (size 1) in the output shapes should somehow be removed or made unknown?

awaiting model gardener

Most helpful comment

Sorry all for the long delay. You can find our work-in-progress version of the exporter here. Note that this change will require you to use the TensorFlow 1.2 runtime on Cloud ML. Please give it a spin; we will merge it into the main branch when we are confident it works well.

All 61 comments

I have the same issue. I tried to reshape the output tensors but it didn't work. Did you have any progress?

same problem for me

@derekjchow Do you have any idea how we can fix that? Thanks!!!

I changed my outputs definition:

      num_detections = tf.placeholder(dtype=tf.float32, shape=[None], name='num_detections')
      boxes = tf.placeholder(dtype=tf.float32, shape=[None, 300, 4], name='detection_boxes')
      scores = tf.placeholder(dtype=tf.float32, shape=[None, 300], name='detection_scores')
      classes = tf.placeholder(dtype=tf.float32, shape=[None, 300], name='detection_classes')     

      tensor_info_outputs = {
        'detection_boxes': tf.saved_model.utils.build_tensor_info(boxes),
        'detection_scores': tf.saved_model.utils.build_tensor_info(scores),
        'detection_classes': tf.saved_model.utils.build_tensor_info(classes),
        'num_detections': tf.saved_model.utils.build_tensor_info(num_detections)
      }

And the error changes to:

  "error": {
    "code": 3,
    "message": "Bad model detected with error:  \"Error loading the model: Could not load model. \""
  }

Just FYI, this seems to be a Google Cloud ML issue. Something changed on their side in the last few days, because models that could previously be uploaded can't be deployed anymore either.

@asimshankar any pointers here?

I'm from the Cloud ML Engine team. We apologize for the inconvenience and are investigating.

I am from the Google Cloud ML Engine team. The current object detection example only supports reading one image at a time when doing inference. However, the Cloud ML Engine prediction service requires the inference graph to accept an arbitrary number of input instances: the shape of the input tensor(s) must have None as its first dimension. We are actively working on adding that. The blog post and the code on GitHub will be updated in the next few days.
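
For illustration, a minimal sketch of what such a batch-friendly serving input looks like (the tensor name, dtype, and shape here are assumptions for illustration, not the final exporter code):

import tensorflow as tf

# Minimal sketch (assumed names): the outer/batch dimension is None so the
# serving graph can accept an arbitrary number of instances per request.
images = tf.placeholder(dtype=tf.uint8,
                        shape=[None, None, None, 3],  # batch, height, width, channels
                        name='image_tensor')
inputs_info = tf.saved_model.utils.build_tensor_info(images)  # reports shape (-1, -1, -1, 3)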

@jlewi @yixinshi the problem isn't limited to object detection. One of my models, a text-based model with a very simple input shape of (None, 100), results in the following error:

response: {u'error': {u'message': u"Model validation failed: Outer dimension for SignatureDef outputs must be unknown, outer dimension of 'Reshape:0' is 1", u'code': 3}

The thing to note is that this same model worked last week - we're unable to even deploy models to Cloud ML because of this.

Hi, I am from Google Cloud Support. The outer dimension of the output should also be None, so the prediction service can produce one output row for each input row. @viksit, could you share the signatures?

@LucasAntevere the second error seems to be related to another issue. Could you try starting TensorFlow Model Server locally using your model? You don't need to implement the client; it is just to test whether the TensorFlow Model Server can load the model.

@ilnar here you go,

signature_def['predict']:
The given SavedModel SignatureDef contains the following input(s):
inputs['inputs'] tensor_info:
    dtype: DT_STRING
    shape: (-1, 15, 100)
    name: input_2:0
The given SavedModel SignatureDef contains the following output(s):
outputs['classes'] tensor_info:
    dtype: DT_STRING
    shape: (1, 10)
    name: Reshape:0
outputs['scores'] tensor_info:
    dtype: DT_FLOAT
    shape: (-1, 10)
    name: dense_3/Softmax:0
Method name is: tensorflow/serving/predict

To clarify what's working and what's not:

a) Two weeks ago, I uploaded the model above (let's call it model A) to Cloud ML. It uploaded fine and predictions worked.
b) This week, I trained the same model type on a different dataset and tried to upload it. It failed to upload with the error I mentioned earlier.
c) I figured I was doing something wrong, so I tried to re-upload model A (the same saved_model file, just as a different version).
d) Step c failed. So the model that was uploaded earlier (model A, version 1) still serves, but the same model can't be uploaded again as a new version. That is, model A version 2 won't upload, even though it's the exact same file.
e) Other models of the same type but trained on different datasets can't be uploaded either.

PS, I realize this is a Cloud ML issue. Is there a better forum to continue this discussion @ilnar @yixinshi @jlewi ?

Thank you for the update, @viksit. The outer dimension of classes (Reshape:0) is indeed not equal to None (aka unknown or -1), that is why the model check fails. There was a change in the service that introduced stricter signature checks. If we request prediction for 5 instances, the inputs will have shape of (5, 15, 100). As we provided 5 instances we expect to get 5 predictions, but we will get scores of shape (5, 10) and classes of shape (1, 10). So we don't know how to extend the classes tensor to 5 predictions. This is the logic behind that change.
There are several ways to get support and if you decide to open a support case, please mention my name, so the case is routed to me.

@ilnar thanks for the information. I think I understand, although I'm still trying to wrap my head around it. What would you say is the change needed on our side to make the model compatible with the new validation checks?

Edit: The classes tensor is a custom tensor that is constructed like this : https://github.com/tensorflow/serving/issues/450

After seeing that error, I added one more line to the code, which did:

# class_list = ["a", "B", ...] # len 10
class_tensor = tf.constant(class_list)
indices = tf.constant(list(range(0,len(class_list))))
table = tf.contrib.lookup.index_to_string_table_from_tensor(class_tensor)
classes = table.lookup(tf.to_int64(indices))

reshaped_classes = tf.reshape(classes, [1, num_classes]) ##<-- new addition after issue #450

signature = predict_signature_def(inputs={"input": inputs},
                                  outputs={
                                  "scores": outputs,
                                  "classes": reshaped_classes})

From what I understand, the outer dimension here needs to be ?/-1/unknown so that the system can appropriately build the tensor, which makes sense.

But since I do this during the model export process, there needs to be some way to auto-broadcast this [10] tensor into [?, 10] (vs. the [1, 10] that I produce today). Is there an easy way to do this in TF?

Edit:

Alright, solved this by doing a tf.tile and then a tf.reshape based on the input shape.

@viksit I'm trying to understand the semantics of classes. Is classes supposed to always repeat all available classes? Or are you trying to look up the class name corresponding to the highest score?

@rhaertel80 classes is just a table. The output then appears as [[list of classes],[list of scores]] - and you can use indices to do mappings to the class name strings in any way you'd like.

Solution:

# class_list = ["a", "B", ...] # len 10
class_tensor = tf.constant(class_list)
indices = tf.constant(list(range(0,len(class_list))))
table = tf.contrib.lookup.index_to_string_table_from_tensor(class_tensor)
classes = table.lookup(tf.to_int64(indices))
reshaped_classes = tf.tile(tf.reshape(classes, (1, -1)), [inputs.get_shape()[0], 1])
signature = predict_signature_def(inputs={"input": inputs},
                                  outputs={
                                  "scores": outputs,
                                  "classes": reshaped_classes})

@viksit thanks for confirming. Repeating the list of classes for every instance is obviously wasteful; unfortunately, we only support having outputs/predictions for every instance at this time. We are looking into the best way to allow for per-request outputs, although with something like classes, I would expect that to be queryable on a per-model basis (something else we'll look into).

As long as you don't mind the repetition, your proposal should work.

Tip: in your code classes == class_tensor, so you can simplify by deleting everything related to the table and lookup.
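
A minimal sketch of that simplification (reusing class_list and inputs from the earlier snippet, and using tf.shape for a dynamic batch size):

# Sketch of the simplification suggested above: looking up indices 0..n-1
# just returns the original strings, so the lookup table can be dropped.
class_tensor = tf.constant(class_list)
reshaped_classes = tf.tile(tf.reshape(class_tensor, (1, -1)),
                           [tf.shape(inputs)[0], 1])  # one row per input instance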

@NilsLattek @viksit a quick update: we are still working on the new object_detection code to support batched inputs for Cloud ML prediction. The code will export a model that has correct shapes for both inputs and outputs (and thus passes the model validation). We expect to have that ready early next week.

@yixinshi thanks for the update, and for your team's work. It appears that I can't assign you to this issue, because you're not in the "TensorFlow" github organization. But let's treat it as if you own things, until you determine that you've completed your part.

@yixinshi thanks a lot for your work! Looking forward to testing the changes.


Are there any new results yet?

@joergkiesewetter @NilsLattek Apologies for the delay. The change took longer than we expected. I will update the issue soon.

Hi there, do you have any updates? I'm facing the same issue when trying to create an ML Engine version of a saved model produced by object_detection/export_inference_graph.py.

Sorry for the delay. All code is ready internally. We are doing final tests. The updated code should be available early next week.

Sorry all for the long delay. You can find our work-in-progress version of the exporter here. Note that this change will require you to use the TensorFlow 1.2 runtime on Cloud ML. Please give it a spin; we will merge it into the main branch when we are confident it works well.

No need to apologize, thank you very much for your hard work. Can we just use our pretrained models and export them, or do we need to train them again from scratch?

When I use my pretrained checkpoint, the ML engine says 'Model validation failed: Outer dimension for SignatureDef outputs must be unknown, outer dimension of 'detection_scores:0' is 1'.

I'm using a trained checkpoint produced by the pets example. So I downloaded the COCO-pretrained model and used transfer learning to produce the actual checkpoint.

Our pretrained COCO models should work with my linked version of the exporter.

You can use a previously trained model with the new exporter. The exporter from Derek's code (we will merge it into the official repo soon) added and renamed some flags, so make sure you use the right ones. Running it will generate a file called saved_model.pb. Run saved_model_cli to verify the proto has the right format and content:

$ saved_model_cli show --dir ${YOUR_LOCAL_EXPORT_DIR}/saved_model --all

It should show the signatures in the graph as well as the shapes of the input and output tensors. The first dimension should be -1. Deployment to the ML engine should work if the saved_model_cli checks pass; the validation error should be gone.
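
For reference, the relevant part of the saved_model_cli output should look roughly like this (tensor names here are illustrative, not guaranteed); the detail to check is the -1 in the first dimension of every input and output:

The given SavedModel SignatureDef contains the following input(s):
inputs['inputs'] tensor_info:
    dtype: DT_UINT8
    shape: (-1, -1, -1, 3)
    name: image_tensor:0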

However, it is a known problem that online prediction does not work with this particular model, because it consumes more memory than the machines currently backing the online prediction service provide. We are working on that.

The good news is that you can still run batch prediction without deploying the model, as you can specify a GCS URI to the model. Running prediction locally using gcloud also works if your local host has enough memory (3 GB or more).
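
As a sketch (job name and bucket paths are hypothetical), batch prediction against an undeployed model looks something like:

gcloud ml-engine jobs submit prediction my_detection_job \
    --model-dir gs://my-bucket/export/saved_model \
    --input-paths gs://my-bucket/inputs/* \
    --output-path gs://my-bucket/predictions \
    --data-format TF_RECORD \
    --region us-central1 \
    --runtime-version 1.2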

Forgot one thing: when deploying the object detection model, as Derek said, you must specify the runtime-version to be 1.2.
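
A sketch of the corresponding deployment commands (model and bucket names hypothetical):

gcloud ml-engine models create my_detector --regions us-central1
gcloud ml-engine versions create v1 \
    --model my_detector \
    --origin gs://my-bucket/export/saved_model \
    --runtime-version 1.2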

Great, when I use the gcloud SDK to upload the model and specify the runtime version, it works :)

However, it is

a known problem that online prediction does not work with this particular model because it consumes more memory than what the machines online prediction supports provided currently. We are working on that.

Do we have other models to use for transfer learning?

@joergkiesewetter which model are you interested in using? IIRC I've gotten the SSDMobilenet COCO model to work for online prediction.

Hi!
When I host the SSDMobileNet COCO model using TensorFlow Serving with source-compiled TensorFlow 1.2, everything works well on the server side, but when I try to run the gRPC client like this:

request = predict_pb2.PredictRequest()
request.model_spec.name = 'mobilenet_v1'
request.model_spec.signature_name = 'serving_default'
request.inputs['inputs'].CopyFrom(
    tf.contrib.util.make_tensor_proto(data, shape=[1]))
result = stub.Predict(request, 10.0)  # 10 secs timeout

I get this weird error,

Traceback (most recent call last):
  File "/home/mcw-nn/Documents/SSP/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 56, in <module>
    tf.app.run()
  File "/home/mcw-nn/Documents/SSP/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/mcw-nn/Documents/SSP/serving/bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving/example/inception_client.py", line 51, in main
    result = stub.Predict(request, 10.0)  # 10 secs timeout
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 324, in __call__
    self._request_serializer, self._response_deserializer)
  File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 210, in _blocking_unary_unary
    raise _abortion_error(rpc_error_call)
grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.NOT_FOUND, details="FeedInputs: unable to find feed output ToFloat:0")

I am unsure if this is a serving-specific issue or if I am making the request the wrong way. Please help!

@suryaprakaz make sure you are updating the tf_model repo to get the correct export scripts.

I pulled the master branch of tf_models and the error went away. Thanks!

@derekjchow My understanding of the whole process is a bit too shallow to know which model best fits my use cases, so I use the recommended models. In general, I'm looking for a model with high accuracy, and I hope that runtime is not that important on gcloud.

I'm trying to deploy one of the pre-trained models to ML Engine without doing any additional transfer learning.
I'm using the downloaded ssd_mobilenet_v1_coco_11_06_2017 checkpoint files and ssd_mobilenet_v1_pets.config from the samples.
I get the following error when running the new exporter (with one of the sample config files):

Caused by op u'save/Assign_10', defined at:
  File "../export_inference_graph.py", line 106, in <module>
    tf.app.run()
  File "/Library/Python/2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "../export_inference_graph.py", line 102, in main
    FLAGS.output_directory)
  File "/Users/slavab/local/tfmodels/object_detection/exporter.py", line 377, in export_inference_graph
    optimize_graph, output_collection_name)
  File "/Users/slavab/local/tfmodels/object_detection/exporter.py", line 337, in _export_inference_graph
    trained_checkpoint_prefix=trained_checkpoint_prefix)
  File "/Users/slavab/local/tfmodels/object_detection/exporter.py", line 292, in _write_graph_and_checkpoint
    tf.import_graph_def(inference_graph_def, name='')
  File "/Library/Python/2.7/site-packages/tensorflow/python/framework/importer.py", line 311, in import_graph_def
    op_def=op_def)
  File "/Library/Python/2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Library/Python/2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [228] rhs shape= [546]
     [[Node: save/Assign_10 = Assign[T=DT_FLOAT, _class=["loc:@BoxPredictor_2/ClassPredictor/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/cpu:0"](BoxPredictor_2/ClassPredictor/biases, save/RestoreV2_10)]] 

Any help would be greatly appreciated.

@balasan Did you manage to fix it?

@BelvedereHenrique no, ended up giving up trying to run these on ML Engine for now :\

@balasan, sorry to hear that. Have you tried the newest version (https://github.com/tensorflow/models/tree/master/object_detection)? Did you use TF 1.2? If it doesn't help, would you share the complete command line you used to export the model?

@yixinshi Using TF 1.2.1 and the latest version of the object_detection repo.
This is the command I'm running from the /object_detection/test directory:

python ../export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path ../samples/configs/ssd_mobilenet_v1_pets.config \
    --trained_checkpoint_prefix ./ssd_mobilenet_v1_coco_11_06_2017/model.ckpt \
    --output_directory output

A checkpoint typically consists of multiple files, e.g.
model.ckpt-3485.data-00000-of-00001
model.ckpt-3485.index
model.ckpt-3485.meta

You should specify model.ckpt-3485 (the common prefix, not an individual file) for the trained_checkpoint_prefix flag. Can you take a look at the files and give it another try?
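
For example, for the three files above the flag would be (path hypothetical):

--trained_checkpoint_prefix /path/to/model.ckpt-3485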

I'm using the checkpoint files downloaded directly from here (without any additional training): https://github.com/tensorflow/models/blob/master/object_detection/g3doc/detection_model_zoo.md

The files I have are:

model.ckpt.data-00000-of-00001
model.ckpt.index
model.ckpt.meta

By the way, I was able to generate a SavedModel using just the frozen_inference_graph.pb file (via some converter code), but that ended up giving me the OP's error when deploying to ML Engine (that's how I ended up here).

@balasan the config needs to change a bit: can you change num_classes from 37 to 90? The reason is that the pets and COCO configs aren't exactly the same, since they predict different numbers of classes. We will update the configs soon. Sorry for the confusion. Let me know if you can correctly export the model now and if you can deploy it to the Cloud ML Engine.
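
That is, in ssd_mobilenet_v1_pets.config the model section should read (sketch, other fields elided):

model {
  ssd {
    num_classes: 90  # COCO checkpoint expects 90 classes; the pets config had 37
    ...
  }
}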

@yixinshi That did the trick! Thanks!
Was able to export the SavedModel successfully; will try deploying a little later.

Hi everyone:

we got a working exported model locally that is failing to create a new model version in Google Cloud ML as follows:

Create Version failed. Model validation failed: Outer dimension for outputs must be unknown, outer dimension of 'Const_2:0' is 1. For more information on how to export a TensorFlow SavedModel, see https://www.tensorflow.org/api_docs/python/tf/saved_model.

Our current exported model works in TensorFlow Serving and with gcloud local prediction, producing this response:

outputs {
  key: "categories"
  value {
    dtype: DT_STRING
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 17
      }
    }
    string_val: "Business Essentials"
    string_val: "Business Skills"
    string_val: "Communication"
    string_val: "Customer Service"
    string_val: "Desktop Computing"
    string_val: "Finance"
    string_val: "Health & Wellness"
    string_val: "Human Resources"
    string_val: "Information Technology"
    string_val: "Leadership"
    string_val: "Management"
    string_val: "Marketing & Advertising"
    string_val: "Personal Development"
    string_val: "Project Management"
    string_val: "Sales"
    string_val: "Technical Skills"
    string_val: "Training & Development"
  }
}
outputs {
  key: "category"
  value {
    dtype: DT_STRING
    tensor_shape {
      dim {
        size: 1
      }
    }
    string_val: "Training & Development"
  }
}
outputs {
  key: "class"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 1
      }
    }
    int64_val: 16
  }
}
outputs {
  key: "prob"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 1
      }
      dim {
        size: 17
      }
    }
    float_val: 0.051308773458
    float_val: 2.39087748923e-05
    float_val: 4.77133402232e-11
    float_val: 0.00015225057723
    float_val: 0.201782479882
    float_val: 2.11781745287e-17
    float_val: 3.61836161034e-09
    float_val: 0.104659214616
    float_val: 6.55719213682e-06
    float_val: 1.16744895001e-12
    float_val: 1.68323947491e-06
    float_val: 0.00510392058641
    float_val: 3.46840134738e-12
    float_val: 1.02085353504e-08
    float_val: 0.000151587591972
    float_val: 3.04983092289e-25
    float_val: 0.636809647083
  }
}

The issue must be in categories, as all the other outputs were already there in the first working version of the output.

Any ideas?

My code looks like:

n_classes = len(CATEGORIES)

logits = tf.contrib.layers.fully_connected(words, n_classes, activation_fn=None)
categories_tensor = tf.constant(CATEGORIES)
indices = tf.constant(list(range(0, n_classes)))
table = tf.contrib.lookup.index_to_string_table_from_tensor(categories_tensor)
categories = table.lookup(tf.to_int64(indices))
reshaped_categories = tf.reshape(categories, [-1, len(CATEGORIES)])
# reshaped_classes = tf.tile(tf.reshape(categories, (1, -1)), [words.get_shape()[0], 1])
# print('logits={}'.format(logits)) # (?, 3)
predictions_dict = {
    'category': tf.gather(CATEGORIES, tf.argmax(logits, 1)),
    'class': tf.argmax(logits, 1),
    'prob': tf.nn.softmax(logits),
    'categories': reshaped_categories
}

I tried to follow @viksit's and @farizrahman4u's solutions; however, this version still triggers the error, and

reshaped_classes = tf.tile(tf.reshape(categories, (1, -1)), [words.get_shape()[0], 1])

raises an error during training:

TypeError: Failed to convert object of type <type 'list'> to Tensor. Contents: [Dimension(None), 1]. Consider casting elements to a supported type.

Responding to my own question:

I need to use existing tensors that already have the right shapes to build a [?, len(CATEGORIES)] tensor from them.

For that purpose we need a [?] tensor such as tf.argmax(logits, 1) for using tf.tile over the categories_tensor, and a [?, len(CATEGORIES)] tensor for using tf.reshape on the result of that. So:

CATEGORIES # => ['dog', 'elephant']
n_classes = len(CATEGORIES) # => 2
categories_tensor = tf.constant(CATEGORIES) # => shape [2]
prob_tensor = tf.nn.softmax(logits)
# => shape [?, 2], where ? is the number of inputs to predict
class_tensor = tf.argmax(logits, 1)
# => shape [?]

tiled_categories_tensor = tf.tile(categories_tensor, tf.shape(class_tensor)) # => shape [2*?]
# => ['dog', 'elephant', 'dog', 'elephant', ... (? times), 'dog', 'elephant']
categories = tf.reshape(tiled_categories_tensor, tf.shape(prob_tensor)) # => shape [?, 2]
# => [['dog', 'elephant'], ['dog', 'elephant'], ... (? times), ['dog', 'elephant']]

predictions_dict = {
    'category': tf.gather(CATEGORIES, tf.argmax(logits, 1)),
    'class': class_tensor,
    'prob': prob_tensor,
    'categories': categories
}

Hope it helps anyone facing this issue.

Hi everyone,
After running export_inference_graph.py, I got the saved model and performed local predictions using Google Cloud ML. The SSD model predictions were wrong, but the faster_rcnn_resnet predictions were good. I uploaded the saved model to my Google Cloud Storage and created a model.
While creating the version of the model I got the following error:
ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error: "Error loading the model: Could not load model: Loading servable: {name: default version: 1} failed: Not found: Op type not registered 'DecodeBmp'\n\n"
Thanks in advance for your help.

The error says the op is not registered in the TF version supported by the Cloud ML Engine today, which only supports TF 1.2, so you have to re-export your model with TF 1.2. (TensorFlow 1.4 was released yesterday; the Cloud ML Engine will support that version soon.)

That worked splendidly, thank you!

Closing the issue as it has been resolved.

@Himanshu141 how did you export in TF 1.2 using export_inference_graph.py? Downgrading TF to 1.2 doesn't work.

@smarzban I downloaded the zip file for TensorFlow 1.2 and installed it in my conda environment.

@Himanshu141, thanks for the quick reply. Did you compile TF 1.2 from source?
With pip install tensorflow==1.2 I get the following seg fault while exporting:

2017-11-21 13:47:28.098725: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-21 13:47:28.098757: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-21 13:47:28.098764: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-11-21 13:47:28.098782: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-21 13:47:28.098788: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-11-21 13:47:42.846643: I tensorflow/core/grappler/optimizers/constant_folding.cc:302] Initial graph size: 19559
Segmentation fault (core dumped)

@smarzban I have the same problem; did you fix this?

@yixinshi you suggested that ML Engine will support TF 1.4 soon. Can I ask how you know that? And how _soon_ exactly? Is it a matter of weeks or rather months?

@temiklis unfortunately not. I think this ticket needs to be re-opened (or a fresh one created).

Any updates on this? I'm facing the same issue.

Any updates ~!?

I also tried retraining the whole model using TF 1.2, and it would still throw a seg fault as above. export_inference_graph.py seems to be incompatible with TF 1.2.
@Himanshu141, it would be great if you could give me more details. Which commit did you use when exporting the model with TF 1.2?

@smarzban I have exactly the same issue. Tried to export the model with TF 1.2, but it fails.

Would be great to have a solution, or get Cloud ML support for TF 1.4

@smarzban @geun @brunnoattorre @jstypka TF 1.2 has been supported on Cloud ML for several months already. Support for TF 1.4 will be available in the next few DAYS. Stay tuned!

Still facing issues with TF 1.4 model serving. The uploads fail. We could go back to 1.2 for the moment.

