I have successfully run deeplabv3+ with xception on my own dataset, but I get errors when I change model_variant to resnet_v1_50.
Following #4464, I made the changes below but still get errors:
ubuntu 16.04 LTS
tf-gpu 1.8
python 3.6
python train.py \
--learning_rate=1e-5 \
--logtostderr \
--train_split="train" \
--model_variant="resnet_v1_50" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size=513 \
--train_crop_size=513 \
--train_batch_size=4 \
--training_number_of_steps="100" \
--fine_tune_batch_norm=true \
--tf_initial_checkpoint="${INIT_FOLDER}/resnet_v1_50/model.ckpt" \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset_dir="${MY_DATASET}" \
--initialize_last_layer=False \
--last_layers_contain_logits_only=True
INFO:tensorflow:Training on train set
INFO:tensorflow:Initializing model from path: /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/depthwise_weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/depthwise_weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/depthwise_weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable concat_projection/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_pointwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/weights/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/gamma/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/beta/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/weights/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/gamma/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/beta/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/weights/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:From /home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py:736: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-07-11 17:05:43.321358: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-07-11 17:05:43.777630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:84:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-07-11 17:05:43.777695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-11 17:05:44.234604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-11 17:05:44.234656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-07-11 17:05:44.234669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-07-11 17:05:44.235439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10407 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:84:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Assign requires shapes of both tensors to match. lhs shape= [1,1,64,64] rhs shape= [1,1,128,64]
[[Node: save/Assign_6 = Assign[T=DT_FLOAT, _class=["loc:@resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights, save/RestoreV2:6)]]
Caused by op 'save/Assign_6', defined at:
File "/home/fei/work/models/research/deeplab/train.py", line 394, in <module>
tf.app.run()
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "/home/fei/work/models/research/deeplab/train.py", line 384, in main
ignore_missing_vars=True),
File "/home/fei/work/models/research/deeplab/utils/train_utils.py", line 127, in get_model_init_fn
ignore_missing_vars=ignore_missing_vars)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 689, in assign_from_checkpoint_fn
write_version=saver_pb2.SaverDef.V1)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
self.build()
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
build_save=build_save, build_restore=build_restore)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
restore_sequentially, reshape)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 494, in _AddRestoreOps
assign_ops.append(saveable.restore(saveable_tensors, shapes))
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 185, in restore
self.op.get_shape().is_fully_defined())
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 283, in assign
validate_shape=validate_shape)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 60, in assign
use_locking=use_locking, name=name)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,64,64] rhs shape= [1,1,128,64]
[[Node: save/Assign_6 = Assign[T=DT_FLOAT, _class=["loc:@resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights, save/RestoreV2:6)]]
Traceback (most recent call last):
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,64,64] rhs shape= [1,1,128,64]
[[Node: save/Assign_6 = Assign[T=DT_FLOAT, _class=["loc:@resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights, save/RestoreV2:6)]]
How can I fix it?
Hi XFeiF,
The latest merge should have resolved those issues, so you can just sync to head.
Which init checkpoint are you using? If you are using the one provided in model_zoo.md, you should set model_variant='resnet_v1_50_beta'.
Cheers,
Hi aquariusjay,
Yes, I am using the checkpoint provided in model_zoo.md. After I set model_variant='resnet_v1_50_beta', it works now.
Thanks!
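For anyone who hits the same "Assign requires shapes of both tensors to match" error, a quick way to confirm that the checkpoint and model_variant disagree is to compare variable shapes before training. The snippet below is only a minimal sketch: `compare_shapes` is a hypothetical helper (not part of deeplab), and the two dictionaries are filled in by hand. The block1 shapes come from the log above; the `[7, 7, 3, 64]` shape for conv1 is the standard ResNet root convolution and is assumed here for illustration. With TensorFlow installed, the checkpoint side could instead be built with `dict(tf.train.list_variables(ckpt_path))`, which returns (name, shape) pairs.

```python
def compare_shapes(graph_shapes, ckpt_shapes):
    """Compare the model's variable shapes against a checkpoint's.

    Both arguments map variable name -> shape (list or tuple of ints).
    Returns (missing, mismatched):
      missing    - sorted names the graph expects but the checkpoint lacks
      mismatched - {name: (graph_shape, ckpt_shape)} where shapes differ
    """
    missing = sorted(name for name in graph_shapes if name not in ckpt_shapes)
    mismatched = {
        name: (tuple(graph_shapes[name]), tuple(ckpt_shapes[name]))
        for name in graph_shapes
        if name in ckpt_shapes
        and tuple(graph_shapes[name]) != tuple(ckpt_shapes[name])
    }
    return missing, mismatched


# Hand-written example data. The block1 entry mirrors the error in the log;
# the conv1 shape is an assumption (standard ResNet 7x7 root convolution).
graph_shapes = {
    "resnet_v1_50/conv1/weights": [7, 7, 3, 64],
    "resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights": [1, 1, 64, 64],
}
ckpt_shapes = {
    "resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights": [1, 1, 128, 64],
}

missing, mismatched = compare_shapes(graph_shapes, ckpt_shapes)
for name in missing:
    print("missing in checkpoint:", name)
for name, (graph_shape, ckpt_shape) in mismatched.items():
    print("shape mismatch:", name, "graph", graph_shape, "checkpoint", ckpt_shape)
```

Variables that are merely missing are skipped because train.py passes ignore_missing_vars=True, but a variable that exists in both with different shapes makes the restore fail. A mismatch in the first block like the one above suggests the checkpoint was trained with a different network root than the selected model_variant, which is why switching to resnet_v1_50_beta resolved it here.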