Models: [Deeplabv3+] resnet_v1_50 train error

Created on 11 Jul 2018 · 2 Comments · Source: tensorflow/models

I have successfully run deeplabv3+ with xception on my own dataset, but I get errors when I change model_variant to resnet_v1_50.
Following #4464, I made the changes below but still get errors:

  • In common.py, lines 45-46: flags.DEFINE_string('model_variant', 'mobilenet_v2', 'DeepLab model variant.')
  • I removed the argument 'activation_fn' in this line.
  • I ran 'resnet_v1_beta_test.py' and it runs fine.

System Info

ubuntu 16.04 LTS
tf-gpu 1.8
python 3.6

python command

python train.py \
  --learning_rate=1e-5 \
  --logtostderr \
  --train_split="train" \
  --model_variant="resnet_v1_50" \
  --atrous_rates=6 \
  --atrous_rates=12 \
  --atrous_rates=18 \
  --output_stride=16 \
  --decoder_output_stride=4 \
  --train_crop_size=513 \
  --train_crop_size=513 \
  --train_batch_size=4 \
  --training_number_of_steps="100" \
  --fine_tune_batch_norm=true \
  --tf_initial_checkpoint="${INIT_FOLDER}/resnet_v1_50/model.ckpt" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${MY_DATASET}" \
  --initialize_last_layer=False \
  --last_layers_contain_logits_only=True

log

INFO:tensorflow:Training on train set
INFO:tensorflow:Initializing model from path: /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/depthwise_weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_depthwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp1_pointwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/depthwise_weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_depthwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp2_pointwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/depthwise_weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_depthwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp3_pointwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable concat_projection/weights missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/gamma missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/beta missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable concat_projection/BatchNorm/moving_mean missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable decoder/decoder_conv1_pointwise/BatchNorm/moving_variance missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/weights/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/gamma/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable resnet_v1_50/conv1/BatchNorm/beta/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/weights/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/gamma/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable image_pooling/BatchNorm/beta/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:Variable aspp0/weights/Momentum missing in checkpoint /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
WARNING:tensorflow:From /home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py:736: Supervisor.__init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
2018-07-11 17:05:43.321358: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-07-11 17:05:43.777630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:84:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2018-07-11 17:05:43.777695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-11 17:05:44.234604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-11 17:05:44.234656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2018-07-11 17:05:44.234669: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2018-07-11 17:05:44.235439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10407 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:84:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from /home/fei/work/models/research/deeplab/datasets/DATA2018/init_models/resnet_v1_50/model.ckpt
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Assign requires shapes of both tensors to match. lhs shape= [1,1,64,64] rhs shape= [1,1,128,64]
     [[Node: save/Assign_6 = Assign[T=DT_FLOAT, _class=["loc:@resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights, save/RestoreV2:6)]]

Caused by op 'save/Assign_6', defined at:
  File "/home/fei/work/models/research/deeplab/train.py", line 394, in <module>
    tf.app.run()
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/fei/work/models/research/deeplab/train.py", line 384, in main
    ignore_missing_vars=True),
  File "/home/fei/work/models/research/deeplab/utils/train_utils.py", line 127, in get_model_init_fn
    ignore_missing_vars=ignore_missing_vars)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/variables.py", line 689, in assign_from_checkpoint_fn
    write_version=saver_pb2.SaverDef.V1)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1338, in __init__
    self.build()
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1347, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1384, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 835, in _build_internal
    restore_sequentially, reshape)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 494, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 185, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 283, in assign
    validate_shape=validate_shape)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 60, in assign
    use_locking=use_locking, name=name)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1,1,64,64] rhs shape= [1,1,128,64]
     [[Node: save/Assign_6 = Assign[T=DT_FLOAT, _class=["loc:@resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights, save/RestoreV2:6)]]

Traceback (most recent call last):
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/fei/anaconda3/envs/tf1_8/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,64,64] rhs shape= [1,1,128,64]
     [[Node: save/Assign_6 = Assign[T=DT_FLOAT, _class=["loc:@resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights, save/RestoreV2:6)]]

How can I fix it?
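
One way to read the error above: in the failing Assign op, the lhs shape [1,1,64,64] belongs to the variable built in the graph and the rhs shape [1,1,128,64] to the tensor restored from the checkpoint, so the checkpoint appears to come from a different network definition than the one selected by --model_variant. The warning that resnet_v1_50/conv1/weights is missing from the checkpoint points the same way. The following is a minimal sketch of how such a comparison could be automated. The helper names and the hand-written shape dictionaries are hypothetical, and the shape given for conv1/weights is illustrative rather than taken from the log.

```python
def find_missing(graph_shapes, checkpoint_shapes):
    """Names the graph expects but the checkpoint does not contain."""
    return sorted(name for name in graph_shapes if name not in checkpoint_shapes)


def find_shape_mismatches(graph_shapes, checkpoint_shapes):
    """Variables present in both maps whose shapes differ."""
    mismatches = {}
    for name, graph_shape in graph_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        if ckpt_shape is not None and list(ckpt_shape) != list(graph_shape):
            mismatches[name] = (list(graph_shape), list(ckpt_shape))
    return mismatches


BLOCK1_CONV1 = "resnet_v1_50/block1/unit_1/bottleneck_v1/conv1/weights"

# Hand-written stand-ins for the real name -> shape maps (assumption:
# in practice these would be read from the graph and the checkpoint).
GRAPH_SHAPES = {
    "resnet_v1_50/conv1/weights": [7, 7, 3, 64],  # illustrative, not from the log
    BLOCK1_CONV1: [1, 1, 64, 64],                 # lhs shape from the log
}
CKPT_SHAPES = {
    BLOCK1_CONV1: [1, 1, 128, 64],                # rhs shape from the log
}

if __name__ == "__main__":
    print("missing:", find_missing(GRAPH_SHAPES, CKPT_SHAPES))
    for name, (g, c) in find_shape_mismatches(GRAPH_SHAPES, CKPT_SHAPES).items():
        print(name, "graph", g, "checkpoint", c)
```

With TensorFlow installed, the checkpoint side of the map could probably be filled from tf.train.list_variables(checkpoint_path), which returns (name, shape) pairs, and the graph side from the variables returned by tf.global_variables(). Both are left out of the sketch so that it runs without TensorFlow.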

All 2 comments

Hi XFeiF,

The latest merge should have resolved those issues, and you could just sync to the head.
What init checkpoint are you using? If you are using the one provided in model_zoo.md, you should set model_variant = 'resnet_v1_50_beta'.

Cheers,

Hi aquariusjay,

Yes, I am using the checkpoint provided in model_zoo.md. After setting model_variant='resnet_v1_50_beta', it works now.

Thanks!
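
For reference, the fix discussed in this thread is a one-flag change to the original command. The sketch below rebuilds that command in Python with the variant as a parameter. The function name build_train_args and the example paths are hypothetical; the flags themselves are copied from the command in the question, and the checkpoint path is left unchanged because only model_variant was reported as needing to change.

```python
import shlex


def build_train_args(model_variant, init_checkpoint, train_logdir, dataset_dir):
    """Return the train.py argument list from the question, with a chosen variant."""
    return [
        "python", "train.py",
        "--learning_rate=1e-5",
        "--logtostderr",
        "--train_split=train",
        f"--model_variant={model_variant}",
        "--atrous_rates=6",
        "--atrous_rates=12",
        "--atrous_rates=18",
        "--output_stride=16",
        "--decoder_output_stride=4",
        "--train_crop_size=513",  # passed twice in the original command
        "--train_crop_size=513",
        "--train_batch_size=4",
        "--training_number_of_steps=100",
        "--fine_tune_batch_norm=true",
        f"--tf_initial_checkpoint={init_checkpoint}",
        f"--train_logdir={train_logdir}",
        f"--dataset_dir={dataset_dir}",
        "--initialize_last_layer=False",
        "--last_layers_contain_logits_only=True",
    ]


if __name__ == "__main__":
    # Placeholder paths; substitute your own INIT_FOLDER, TRAIN_LOGDIR, MY_DATASET.
    args = build_train_args(
        model_variant="resnet_v1_50_beta",  # the value suggested in the reply
        init_checkpoint="/path/to/init_models/resnet_v1_50/model.ckpt",
        train_logdir="/path/to/train_logdir",
        dataset_dir="/path/to/dataset",
    )
    print(" ".join(shlex.quote(a) for a in args))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run(args) from the deeplab directory.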
