Models: Documentation request - own dataset usage with deeplab

Created on 24 Apr 2018 · 17 Comments · Source: tensorflow/models

Thank you for putting this code together and making it available on GitHub!

I am attempting to use DeepLab on my own dataset.

I would like to use an arbitrary input resolution, e.g. 1280x720.

In utils/input_generator.py, get() takes a min_resize_value and a max_resize_value.
There is also a crop_size.

Setting max_resize_value=512 and crop_size=[512, 512], and feeding a 1280-pixel-wide image, yields:

Assign requires shapes of both tensors to match. lhs shape= [1,1,512,256] rhs shape= [1,1,1280,256]

What are the right settings to use for fine-tuning on our own dataset?

swirlingsand

👍4

Most helpful comment

@swirlingsand I used a different strategy for my own dataset: I wrote a script to convert it to the pascal_voc format, and it works well.

@AnameZT I don't know exactly what your problem is, but I had the same
Assign requires shapes of both tensors to match. lhs shape= [1,1,512,256] rhs shape= [1,1,1280,256]
problem during the export step, even though my training and evaluation had succeeded.
A common mistake is to forget to pass ALL of the arguments you used for training when you type the later python command lines.

For example, my problem was that I did not include values for the "atrous rates" flags. Please verify that you have included ALL the flags you used during training.
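
One way to catch this kind of mismatch before exporting is to compare the two command lines mechanically. The sketch below is only an illustration: `parse_flags` and `mismatched_flags` are hypothetical helpers, not part of DeepLab, the example commands are made up, and the list of architecture flags to compare is an assumption that should be adjusted to whatever was actually passed during training.

```python
import shlex

# Flags assumed (for this sketch) to affect the network architecture.
ARCH_FLAGS = ("model_variant", "atrous_rates", "output_stride",
              "decoder_output_stride")


def parse_flags(command):
    """Collect --name=value and --name value pairs; repeated flags keep every value."""
    flags = {}
    tokens = shlex.split(command)
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            name, sep, value = token[2:].partition("=")
            if not sep:
                # The value is the next token unless that token is another flag.
                if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                    value = tokens[i + 1]
                    i += 1
                else:
                    value = "true"
            flags.setdefault(name, []).append(value)
        i += 1
    return flags


def mismatched_flags(train_cmd, export_cmd, names=ARCH_FLAGS):
    """Return {flag: (train_values, export_values)} for flags that differ."""
    train, export = parse_flags(train_cmd), parse_flags(export_cmd)
    return {n: (train.get(n), export.get(n))
            for n in names if train.get(n) != export.get(n)}


# Made-up commands: the export command forgets the atrous rates.
train_cmd = ("python deeplab/train.py --model_variant=xception_65 "
             "--atrous_rates=6 --atrous_rates=12 --atrous_rates=18 "
             "--output_stride=16")
export_cmd = ("python deeplab/export_model.py --model_variant=xception_65 "
              "--output_stride=16")

print(mismatched_flags(train_cmd, export_cmd))
```

An empty result means the compared flags agree; anything else lists the training values next to the export values (None when the flag is missing).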

GTimothee on 30 May 2018

👍5 ❤2

All 17 comments

Hmm, could you provide the full stack trace and your command-line arguments as well?

YknZhu on 25 Apr 2018

"args": [
    "--logtostderr",
    "--model_variant",
    "xception_65",
    "--training_number_of_steps",
    "2000",
    "--fine_tune_batch_norm",
    "true",
    "--tf_initial_checkpoint",
    "gs://eminent-century-190103/transfer_learning/semantic_segmentation/deeplab/deeplabv3_cityscapes_train/model.ckpt",
    "--dataset_dir",
    "gs://eminent-century-190103/projects/ml/53/96",
    "--train_logdir",
    "gs://eminent-century-190103/projects/ml/53/96",
    "--dataset",
    "cityscapes",
    "--train_crop_size",
    "512",
    "--train_crop_size",
    "512",
    "--initialize_last_layer",
    "false"
  ],
  "region": "us-central1",
  "runtimeVersion": "1.6",
  "jobDir": "gs://eminent-century-190103/projects/ml/53/96",
  "pythonVersion": "3.5"

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,512,256] rhs shape= [1,1,1280,256]
  [[Node: save/Assign_15 = Assign[T=DT_FLOAT, _class=["loc:@concat_projection/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](concat_projection/weights/Momentum, save/RestoreV2:15)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.5/site-packages/deeplab/train.py", line 380, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python3.5/site-packages/deeplab/train.py", line 373, in main
    save_interval_secs=FLAGS.save_interval_secs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 746, in train
    master, start_standard_services=False, config=session_config) as sess:
  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 1000, in managed_session
    self.stop(close_summary_writer=close_summary_writer)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 828, in stop
    ignore_live_threads=ignore_live_threads)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 989, in managed_session
    start_standard_services=start_standard_services)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/supervisor.py", line 726, in prepare_or_wait_for_session
    init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/session_manager.py", line 283, in prepare_session
    init_fn(sess)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 690, in callback
    saver.restore(session, model_path)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1755, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,512,256] rhs shape= [1,1,1280,256]
  [[Node: save/Assign_15 = Assign[T=DT_FLOAT, _class=["loc:@concat_projection/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](concat_projection/weights/Momentum, save/RestoreV2:15)]]

Caused by op 'save/Assign_15', defined at:
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.local/lib/python3.5/site-packages/deeplab/train.py", line 380, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python3.5/site-packages/deeplab/train.py", line 370, in main
    ignore_missing_vars=True),
  File "/root/.local/lib/python3.5/site-packages/deeplab/utils/train_utils.py", line 113, in get_model_init_fn
    ignore_missing_vars=ignore_missing_vars)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 688, in assign_from_checkpoint_fn
    saver = tf_saver.Saver(var_list, reshape=reshape_variables)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1293, in __init__
    self.build()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1302, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1339, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 796, in _build_internal
    restore_sequentially, reshape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 471, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 161, in restore
    self.op.get_shape().is_fully_defined())
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/state_ops.py", line 280, in assign
    validate_shape=validate_shape)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_state_ops.py", line 58, in assign
    use_locking=use_locking, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,512,256] rhs shape= [1,1,1280,256]
  [[Node: save/Assign_15 = Assign[T=DT_FLOAT, _class=["loc:@concat_projection/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](concat_projection/weights/Momentum, save/RestoreV2:15)]]

swirlingsand on 25 Apr 2018

Aha, this error message indicates that the tensor depth is different. It complains that the 1x1 conv kernel "concat_projection/weights" should project 1280 input channels to 256 output channels, but you get 512 input channels instead. It is not related to the input image size (at least not directly).

Could you double check your network architecture?
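
The two channel counts in the error can be reproduced with simple arithmetic, assuming the usual DeepLab v3 ASPP layout in which an image-level pooling branch, a 1x1 convolution branch, and one 3x3 atrous branch per atrous rate each produce 256 channels before they are concatenated. The function below is a hypothetical helper written for illustration, not DeepLab code.

```python
def concat_projection_in_channels(atrous_rates, add_image_level_feature=True,
                                  branch_depth=256):
    """Input channels seen by the 1x1 projection after the ASPP concat.

    Assumes every branch outputs `branch_depth` channels.
    """
    branches = 1  # the 1x1 convolution branch
    if add_image_level_feature:
        branches += 1  # the image-level pooling branch
    branches += len(atrous_rates)  # one 3x3 atrous branch per rate
    return branches * branch_depth


print(concat_projection_in_channels([]))           # 512
print(concat_projection_in_channels([6, 12, 18]))  # 1280
```

Under that assumption, 512 is what a graph built without any atrous rates expects, while 1280 matches weights saved from a model that used three rates.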

YknZhu on 1 May 2018

I'm using the default "xception_65". I apologize if I'm missing something, is there a different network I should be using with this or a different concept you are referring to?

swirlingsand on 1 May 2018

Facing the same problem

liubingkarin on 9 May 2018

Hmm sorry for the late response. Could you do --train_crop_size=513 and give it a try?
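
For context, 513 fits the pattern output_stride * k + 1 (16 * 32 + 1), a convention often mentioned for DeepLab crop sizes. Treat that convention as an assumption here, since the thread does not state it. The helper below is hypothetical and just rounds a desired size up to the next value of that form.

```python
def next_crop_size(size, output_stride=16):
    """Smallest value of the form output_stride * k + 1 that is >= size."""
    k = (size - 1 + output_stride - 1) // output_stride  # ceil((size - 1) / stride)
    return k * output_stride + 1


print(next_crop_size(512))   # 513
print(next_crop_size(1280))  # 1281
print(next_crop_size(720))   # 721
```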

YknZhu on 29 May 2018

@swirlingsand Did you hit this problem during training? I hit the same problem during evaluation. I trained on my data with train_crop_size=513 and evaluated with eval_crop_size set to other values (following https://github.com/tensorflow/models/issues/3886). I'm not clear about how to solve it. Are there any suggestions?

AnameZT on 30 May 2018

Hi @AvaPunksmash, I am also trying to convert my dataset to the pascal format. Can you please share some ideas about how you did this? I found something strange in the remove_gt_color file. It seems that converting the SegmentationClass images to np.array changes the images from 3 channels to 1 channel. How could this happen?

Blackpassat on 1 Jun 2018

👍1

Yes, that is normal. Pascal VOC uses a so-called "color map" to create "palette images": the segmentation labels are 1-channel images, and each value in that channel is mapped to an RGB color through the color map (so, as I understand it, you have a limited number of colors). If you look at Pascal VOC label images in Windows Explorer they appear in color because the viewer "interprets" the image, but each one is still a 1-channel image.
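
To make the index-to-color idea concrete, here is a small pure-Python sketch. The mask and the `colorize` helper are made up for illustration; the palette uses the flat [r0, g0, b0, r1, g1, b1, ...] layout that Pillow's getpalette() returns, and the few colors filled in are the commonly published Pascal VOC values for those indices.

```python
# Tiny 2x3 label mask: one class index per pixel, i.e. a single channel.
label_mask = [[0, 1, 1],
              [15, 15, 255]]

# Flat palette with 256 RGB triplets; only a few entries are filled in here.
palette = [0] * (256 * 3)
palette[1 * 3:1 * 3 + 3] = [128, 0, 0]          # index 1
palette[15 * 3:15 * 3 + 3] = [192, 128, 128]    # index 15
palette[255 * 3:255 * 3 + 3] = [224, 224, 192]  # index 255 (border / void)


def colorize(mask, flat_palette):
    """Map every class index to its (r, g, b) triplet from the flat palette."""
    return [[tuple(flat_palette[3 * idx:3 * idx + 3]) for idx in row]
            for row in mask]


for row in colorize(label_mask, palette):
    print(row)
```

The mask itself never stores colors; they only appear once the palette is applied, which is why loading the label image as an array gives one channel.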

If you want to use the same color map, the simplest way is to open an image from Pascal VOC and extract its color map. If you don't know how to do that, look at Stack Overflow, where someone posted a code snippet for it:
https://stackoverflow.com/questions/42959364/tensorflow-how-to-create-a-pascal-voc-style-image
I also found this by the way : https://gist.github.com/wllhf/a4533e0adebe57e3ed06d4b50c8419ae

If you really want to build the same color map yourself (which I don't think is useful), I don't know its values, but you can probably find them on the Pascal VOC website (I didn't) or here:
https://github.com/tensorflow/models/blob/master/research/deeplab/deeplab_demo.ipynb
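
For reference, the Pascal VOC color map is usually reproduced with a small bit-shifting loop. The version below is a pure-Python sketch of that commonly used scheme (the function name is my own); it is worth checking its output against a palette extracted from a real VOC label image before relying on it.

```python
def pascal_voc_colormap(num_entries=256):
    """Build a list of (r, g, b) tuples using the usual VOC bit-shifting scheme."""
    colormap = []
    for label in range(num_entries):
        r = g = b = 0
        value = label
        for shift in range(8):
            # Each group of 3 low bits feeds one bit of r, g and b,
            # filled from the most significant bit downwards.
            r |= ((value >> 0) & 1) << (7 - shift)
            g |= ((value >> 1) & 1) << (7 - shift)
            b |= ((value >> 2) & 1) << (7 - shift)
            value >>= 3
        colormap.append((r, g, b))
    return colormap


cmap = pascal_voc_colormap()
print(cmap[0], cmap[1], cmap[15], cmap[255])
# (0, 0, 0) (128, 0, 0) (192, 128, 128) (224, 224, 192)
```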

Personally, I just used this code from the Stack Overflow post cited above:
from PIL import Image

# We can load the palette from some random image in the PASCAL VOC dataset
palette = Image.open('.../VOC2012/SegmentationClass/2007_000032.png').getpalette()

GTimothee on 1 Jun 2018

@AvaPunksmash I am also getting the same problem
Assign requires shapes of both tensors to match. lhs shape= [1,1,512,256] rhs shape= [1,1,1280,256]
during the export step. As you suggested, I tried passing to the export command all the parameters I had used for training, but I still get the same error.

Can you please share your training run command and export command?
Thanks in advance.

ashwinichhipa on 7 Jun 2018

👍2

Thank you very much for your detailed explanation, @AvaPunksmash! The code you shared is really helpful!

Blackpassat on 8 Jun 2018

I was having the same problem. Setting exactly the same atrous_rates as in training solved the problem.

sumsuddin on 27 Aug 2018

Closing this issue since a workaround is available. Feel free to reopen if the issue still persists. Thanks!

ymodak on 28 Dec 2018

Thank you so much @GTimothee !! Spent a few days not making progress because of this exact issue, and I also forgot to set the flags that I used for training (specifically the atrous ones as well). Really appreciate your answer :)

lolitsgab on 13 Oct 2019

you are a life saver! Thank you so much!

moFang222 on 23 Jan 2020

This is the command; customise it according to the parameters used in training.

python deeplab/export_model.py \
  --checkpoint_path=/code/models/research/deeplab/weights_input_level_17/model.ckpt-22000 \
  --export_path=/code/models/research/deeplab/frozen_weights_level_17/frozen_inference_graph.pb \
  --model_variant="xception_65" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --crop_size=2048 \
  --crop_size=2048 \
  --num_classes=3

Hey guys, please use this config to export your deeplabv3plus model. It worked for me.
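
To avoid retyping the architecture flags for each step, one option is to keep them in a single place and generate the command from that. This is only a sketch: `to_argv` is a hypothetical helper, the paths are placeholders, and the flag values are copied from the command above rather than recommended settings.

```python
import shlex

# Architecture flags shared by training, evaluation and export.
shared_flags = {
    "model_variant": "xception_65",
    "atrous_rates": [6, 12, 18],
    "output_stride": 16,
}

# Flags specific to the export step; the paths are placeholders.
export_only = {
    "checkpoint_path": "/path/to/model.ckpt-22000",
    "export_path": "/path/to/frozen_inference_graph.pb",
    "crop_size": [2048, 2048],
    "num_classes": 3,
}


def to_argv(flags):
    """Turn {name: value or list of values} into repeated --name=value arguments."""
    argv = []
    for name, value in flags.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        argv.extend("--{}={}".format(name, v) for v in values)
    return argv


export_cmd = (["python", "deeplab/export_model.py"]
              + to_argv(shared_flags) + to_argv(export_only))
print(" ".join(shlex.quote(arg) for arg in export_cmd))
```

The same `shared_flags` dictionary can then feed the training and evaluation commands, so a forgotten flag shows up in one place.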