What is the top-level directory of the model you are using:
DeepLab v3+, using the official demo script in a Colab notebook
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
Yes, I tried to adjust the resizing procedure to use the original image size instead.
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Windows 10
TensorFlow installed from (source or binary):
Binary, installed with pip3
TensorFlow version (use command below):
1.7 CPU only
Bazel version (if compiling from source):
GPU model and memory:
Not used
Exact command to reproduce:
I tried to adjust the official demo script so that it does not resize the input image and instead predicts on the raw image.
From:
def run(self, image):
  width, height = image.size
  resize_ratio = 1.0 * self.INPUT_SIZE / max(width, height)
  target_size = (int(resize_ratio * width), int(resize_ratio * height))
  resized_image = image.convert('RGB').resize(target_size, Image.ANTIALIAS)
  batch_seg_map = self.sess.run(
      self.OUTPUT_TENSOR_NAME,
      feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})
  seg_map = batch_seg_map[0]
  return resized_image, seg_map
To:
def run(self, image):
  width, height = image.size
  #resize_ratio = 1.0 * self.INPUT_SIZE / max(width, height)
  resize_ratio = 1
  target_size = (int(resize_ratio * width), int(resize_ratio * height))
  resized_image = image.convert('RGB').resize(target_size, Image.ANTIALIAS)
  batch_seg_map = self.sess.run(
      self.OUTPUT_TENSOR_NAME,
      feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})
  seg_map = batch_seg_map[0]
  return resized_image, seg_map
When I run the code, TensorFlow always throws an error like InvalidArgumentError: padded_shape[0]=98 is not divisible by block_shape[0]=4.
I would like to know how to use the demo script to run inference on the raw image and get the prediction at the original size instead of a resized image, and also how to use the model in a real environment with arbitrary input image sizes and original-size output.
I am new to TensorFlow and DeepLab, sorry for the inconvenience.
Source code
class DeepLabModel(object):
  """Class to load deeplab model and run inference."""

  INPUT_TENSOR_NAME = 'ImageTensor:0'
  OUTPUT_TENSOR_NAME = 'SemanticPredictions:0'
  INPUT_SIZE = 513
  FROZEN_GRAPH_NAME = 'frozen_inference_graph'

  def __init__(self, tarball_path):
    """Creates and loads pretrained deeplab model."""
    self.graph = tf.Graph()

    graph_def = None
    # Extract frozen graph from tar archive.
    tar_file = tarfile.open(tarball_path)
    for tar_info in tar_file.getmembers():
      if self.FROZEN_GRAPH_NAME in os.path.basename(tar_info.name):
        file_handle = tar_file.extractfile(tar_info)
        graph_def = tf.GraphDef.FromString(file_handle.read())
        break

    tar_file.close()

    if graph_def is None:
      raise RuntimeError('Cannot find inference graph in tar archive.')

    with self.graph.as_default():
      tf.import_graph_def(graph_def, name='')

    self.sess = tf.Session(graph=self.graph)

  def run(self, image):
    width, height = image.size
    #resize_ratio = 1.0 * self.INPUT_SIZE / max(width, height)
    resize_ratio = 1
    target_size = (int(resize_ratio * width), int(resize_ratio * height))
    resized_image = image.convert('RGB').resize(target_size, Image.ANTIALIAS)
    batch_seg_map = self.sess.run(
        self.OUTPUT_TENSOR_NAME,
        feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})
    seg_map = batch_seg_map[0]
    return resized_image, seg_map
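For reference, I invoke the class the same way the demo notebook does. A minimal usage sketch (download_path and original_im are set up in earlier Colab cells):

# Minimal usage sketch following the demo notebook cells.
MODEL = DeepLabModel(download_path)           # load the frozen graph from the tarball
resized_im, seg_map = MODEL.run(original_im)  # original_im is a PIL.Image
vis_segmentation(resized_im, seg_map)         # overlay and plot the result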
Logs:
InvalidArgumentErrorTraceback (most recent call last)
<ipython-input-4-0792ed70ae9c> in <module>()
24
25 image_url = IMAGE_URL or _SAMPLE_URL % SAMPLE_IMAGE
---> 26 run_visualization(image_url)
<ipython-input-4-0792ed70ae9c> in run_visualization(url)
18
19 print('running deeplab on image %s...' % url)
---> 20 resized_im, seg_map = MODEL.run(orignal_im)
21
22 vis_segmentation(resized_im, seg_map)
<ipython-input-2-1b4ac6ee0d3d> in run(self, image)
48 batch_seg_map = self.sess.run(
49 self.OUTPUT_TENSOR_NAME,
---> 50 feed_dict={self.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})
51 seg_map = batch_seg_map[0]
52 return resized_image, seg_map
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
903 try:
904 result = self._run(None, fetches, feed_dict, options_ptr,
--> 905 run_metadata_ptr)
906 if run_metadata:
907 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
1138 if final_fetches or final_targets or (handle and feed_dict_tensor):
1139 results = self._do_run(handle, final_targets, final_fetches,
-> 1140 feed_dict_tensor, options, run_metadata)
1141 else:
1142 results = []
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1319 if handle is None:
1320 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1321 run_metadata)
1322 else:
1323 return self._do_call(_prun_fn, handle, feeds, fetches)
/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
1338 except KeyError:
1339 pass
-> 1340 raise type(e)(node_def, op, message)
1341
1342 def _extend_graph(self):
InvalidArgumentError: padded_shape[1]=85 is not divisible by block_shape[1]=2
[[Node: xception_65/middle_flow/block1/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND = SpaceToBatchND[T=DT_FLOAT, Tblock_shape=DT_INT32, Tpaddings=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](xception_65/middle_flow/block1/unit_1/xception_module/Relu, xception_65/exit_flow/block1/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND/block_shape, xception_65/exit_flow/block1/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND/paddings)]]
Caused by op u'xception_65/middle_flow/block1/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND', defined at:
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/usr/local/lib/python2.7/dist-packages/ipykernel_launcher.py", line 16, in <module>
app.launch_new_instance()
File "/usr/local/lib/python2.7/dist-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelapp.py", line 477, in start
ioloop.IOLoop.instance().start()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 888, in start
handler_func(fd_obj, events)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/usr/local/lib/python2.7/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
return self.dispatch_shell(stream, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
handler(stream, idents, msg)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
user_expressions, allow_stdin)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/usr/local/lib/python2.7/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
if self.run_code(code, result):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-f707f9cb549e>", line 26, in <module>
MODEL = DeepLabModel(download_path)
File "<ipython-input-2-1b4ac6ee0d3d>", line 30, in __init__
tf.import_graph_def(graph_def, name='')
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 577, in import_graph_def
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): padded_shape[1]=85 is not divisible by block_shape[1]=2
[[Node: xception_65/middle_flow/block1/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND = SpaceToBatchND[T=DT_FLOAT, Tblock_shape=DT_INT32, Tpaddings=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](xception_65/middle_flow/block1/unit_1/xception_module/Relu, xception_65/exit_flow/block1/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND/block_shape, xception_65/exit_flow/block1/unit_1/xception_module/separable_conv1_depthwise/depthwise/SpaceToBatchND/paddings)]]
Hi highland0971,
The frozen graph is exported in such a way that the input's dimensions must be 513x513. If you would like to change the input dimensions, you need to export the frozen graph yourself (i.e., change the flag value while exporting the model).
Hi @aquariusjay,
Thanks for your reply.
Do you mean I can export a new input dimension like 2052 x 2052 from the pretrained model provided by the official team? For example, extract the checkpoint from deeplabv3_pascal_trainval_2018_01_04.tar.gz and run export_model.py like:
python export_model.py --checkpoint_path deeplabv3_pascal_trainval --export_path newsizefolder --crop_size 2052 --crop_size 2052
If I do this, will it degrade the prediction accuracy compared to the original results? I ask because most of the VOC dataset images have smaller dimensions.
Thanks again.
Hi highland0971,
I think it will not work with the provided checkpoints. You should probably do one of the following:
- Train a model with large inputs and export it, or
- Resize all your inputs so that you can use the provided checkpoints. You could then resize back if you really want large outputs.
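A minimal sketch of the second option, assuming the demo's DeepLabModel class shown above (the label map is resized back with PIL's NEAREST filter so class ids are not interpolated; this is an illustration, not code from the demo itself):

import numpy as np
from PIL import Image

def run_original_size(model, image):
  """Sketch: run the demo model on a PIL image and return full-size labels."""
  width, height = image.size
  resize_ratio = 1.0 * model.INPUT_SIZE / max(width, height)
  target_size = (int(resize_ratio * width), int(resize_ratio * height))
  resized_image = image.convert('RGB').resize(target_size, Image.ANTIALIAS)
  batch_seg_map = model.sess.run(
      model.OUTPUT_TENSOR_NAME,
      feed_dict={model.INPUT_TENSOR_NAME: [np.asarray(resized_image)]})
  seg_map = batch_seg_map[0]
  # Upsample only the label map, with NEAREST so class ids stay integers;
  # the original image itself is returned untouched.
  seg_map_full = np.array(
      Image.fromarray(seg_map.astype(np.uint8)).resize((width, height),
                                                       Image.NEAREST))
  return image, seg_map_full

Usage would be along the lines of image, seg = run_original_size(MODEL, original_im), after which seg has the same height and width as the input photo.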
Thanks again, I will try it with some larger inputs.
However, when I tried this command with the officially trained model, I encountered the following error:
CMD
python export_model.py --checkpoint_path train/deeplabv3_pascal_trainval/model.ckpt --export_path train/newsize --crop_size 2052 --crop_size 2052
Output
root@iZm5e7a6fq119jbdfttqnrZ:~/workspace/models-master/research/deeplab# python export_model.py --checkpoint_path train/deeplabv3_pascal_trainval/model.ckpt --export_path train/newsize --crop_size 2052 --crop_size 2052
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
INFO:tensorflow:Prepare to export model to: train/newsize
INFO:tensorflow:Exported model performs single-scale inference.
2018-04-25 08:15:48.422344: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-25 08:15:48.990792: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-25 08:15:48.990995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:00:08.0
totalMemory: 15.90GiB freeMemory: 368.88MiB
2018-04-25 08:15:48.991025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-25 08:15:49.738808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-25 08:15:49.738867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-04-25 08:15:49.738884: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-04-25 08:15:49.739027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 89 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:00:08.0, compute capability: 6.0)
INFO:tensorflow:Restoring parameters from train/deeplabv3_pascal_trainval/model.ckpt
2018-04-25 08:15:50.075597: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key MobilenetV2/Conv/BatchNorm/beta not found in checkpoint
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1327, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1312, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1420, in _call_tf_sessionrun
status, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key MobilenetV2/Conv/BatchNorm/beta not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_301 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "export_model.py", line 165, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "export_model.py", line 159, in main
initializer_nodes=None)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/tools/freeze_graph.py", line 106, in freeze_graph_with_def_protos
saver.restore(sess, input_checkpoint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1775, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1140, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key MobilenetV2/Conv/BatchNorm/beta not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_301 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Caused by op 'save/RestoreV2', defined at:
File "export_model.py", line 165, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "export_model.py", line 147, in main
saver = tf.train.Saver(tf.model_variables())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1311, in __init__
self.build()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1320, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1357, in _build
build_save=build_save, build_restore=build_restore)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 809, in _build_internal
restore_sequentially, reshape)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 448, in _AddRestoreOps
restore_sequentially)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 860, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1458, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Key MobilenetV2/Conv/BatchNorm/beta not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/RestoreV2/_301 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_306_save/RestoreV2", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
Is there anything wrong with my command? Why does TF complain about NotFoundError (see above for traceback): Key MobilenetV2/Conv/BatchNorm/beta not found in checkpoint?
I found the problem: the --model_variant="xception_65" parameter was missing when exporting.
@highland0971 Were you able to make it work with dimensions 2052 x 2052? For some reason it doesn't work for me even with 513. I ran this:
python export_model.py --checkpoint_path deeplabv3_pascal_trainval/model.ckpt --export_path deeplabv3_pascal_trainval/2052 --crop_size 513 --crop_size 513 --model_variant="xception_65"
And I still encountered this error:
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,512,256] rhs shape= [1,1,1280,256]
[[Node: save/Assign_9 = Assign[T=DT_FLOAT, _class=["loc:@concat_projection/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/
Have you encountered something similar?
EDIT: This seems to have worked for me:
python export_model.py --checkpoint_path deeplabv3_pascal_trainval/model.ckpt --export_path deeplabv3_pascal_trainval/1024/frozen_inference_graph.pb --crop_size 1024 --crop_size 1024 --atrous_rates 12 --atrous_rates 24 --atrous_rates 36 --model_variant="xception_65"
I added --atrous_rates as recommended by the source file, and I also gave the path to the output file rather than just the folder.
@xhlulu
Hi, I used your method and successfully generated frozen_inference_graph.pb, but the test result seems to be wrong: the segmentation result is just an all-black image. Did you have the same problem?
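Is there a recommended way to tell whether the raw seg_map is really empty or only the visualization is off? I am checking it roughly like this (a quick sketch; MODEL and original_im as in the demo cells):

import numpy as np

resized_im, seg_map = MODEL.run(original_im)
# If only class id 0 (background) shows up here, the prediction itself is empty,
# not just the colormap/visualization step.
print(np.unique(seg_map), seg_map.shape)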
Hi highland0971,
I think it will not work with the provided checkpoints. You should probably do one of the following:
- Train a model with large inputs and export it, or
- Resize all your inputs so that you can use the provided checkpoints. You could then resize back if you really want large outputs.
@aquariusjay how would the second approach work? I tried rescaling the final image back to the original size, but that just pixelates it. I followed this flow: resize down to 513, segment/process, resize back up to the original size/ratio.
I tried exporting the model with 1080 or 2k crop sizes, but just like @rolandsehun I also get a black/blank image output (even though the process does take its full time to run).
I am also getting a blank image output when attempting to train on higher-resolution imagery. Could you suggest the correct values for:
Hi guys,
Has anybody resolved the black image problem? I am facing the same issue here.
@frankdeepl try adding the --decoder_output_stride=4 option. It worked for me.
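Putting together the flags mentioned in this thread (the paths here are placeholders), the export command would look something like:
python export_model.py --checkpoint_path /path/to/model.ckpt --export_path /path/to/frozen_inference_graph.pb --model_variant="xception_65" --crop_size 1024 --crop_size 1024 --atrous_rates 12 --atrous_rates 24 --atrous_rates 36 --decoder_output_stride=4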