Models: mobilenet_v2_0.5_224 checkpoint's Logits depth is 1280 rather than 640 (50%), breaking transfer learning

Created on 29 May 2018 · 2 Comments · Source: tensorflow/models

System information

  • What is the top-level directory of the model you are using: slim
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): v1.8.0-1-g8753e2e
  • Bazel version (if compiling from source): 0.12.0
  • CUDA/cuDNN version: 9.0/7.0
  • GPU model and memory: GeForce GTX 1080 Ti/11264MB
  • Exact command to reproduce: python3 train_image_classifier.py --train_dir=/path/to/train/dir --checkpoint_path=/path/to/mobilenet_v2_0.5_224/mobilenet_v2_0.5_224.ckpt --dataset_dir=/path/to/dataset/parent/dir --dataset_name=arbitrary_name_of_dataset_with_four_classes --model_name=name_of_wrapper_network_fn_having_0.5_as_its_default_value_for_depth_multiplier --checkpoint_exclude_scopes=MobilenetV2/Logits --trainable_scopes=MobilenetV2/Logits --batch_size=128 --optimizer=rmsprop --log_every_n_steps=803 --save_interval_secs=60 --save_summaries_secs=60 --max_checkpoints_to_keep=1000 --num_preprocessing_threads=8 --num_readers=8 --num_clones=1 --gpu_device_num=0 --gpu_memory_fraction=0.85

Describe the problem

In short, the Logits layer of the pre-trained mobilenet_v2_0.5_224 checkpoint linked in slim/nets/mobilenet/README.md appears to have kept its original depth of 1280 during training, while the rest of the network was scaled down by 50%. When nets_factory is used to instantiate a network with a depth_multiplier of 0.5, the entire network is scaled down by 50%, including the Logits layer. Consequently, restoring the checkpoint fails with the following error:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [640] rhs shape= [1280]
[[Node: save_1/Assign_7 = Assign[T=DT_FLOAT, _class=["loc:@MobilenetV2/Conv_1/BatchNorm/moving_mean"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](MobilenetV2/Conv_1/BatchNorm/moving_mean, save_1/RestoreV2:7)]]
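To confirm this independently, here is a minimal sketch (assuming the checkpoint path from the reproduce command above) that lists the relevant variable shapes stored in the checkpoint; the Conv_1 and Logits variables should report 1280 channels, matching the rhs shape in the error, even though the rest of the network is at 0.5 width:

import tensorflow as tf

# Path taken from the reproduce command above; substitute the actual location
# of the downloaded mobilenet_v2_0.5_224 checkpoint.
ckpt = '/path/to/mobilenet_v2_0.5_224/mobilenet_v2_0.5_224.ckpt'

# tf.train.list_variables returns (name, shape) pairs for every variable
# stored in the checkpoint, without building a graph.
for name, shape in tf.train.list_variables(ckpt):
    if 'Conv_1' in name or 'Logits' in name:
        print(name, shape)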

This failure also occurs for mobilenet_v2_0.75_224, but not for mobilenet_v2_1.4_224 (using similar wrapper functions that I added to mobilenet_v2.py to make it convenient to use MobileNets of various sizes). I have not tested other sizes.

If mobilenet_v2 is intended to have a minimum Logits layer size of 1280, then the tf-slim implementation should enforce that (assuming I'm right that it does not). I would much prefer, though, that the smaller mobilenet_v2 models be re-trained, and that the current depth_multiplier behavior remain unchanged.

Source code / logs

Source code that I added to slim/nets/mobilenet/mobilenet_v2.py:

@slim.add_arg_scope
def mobilenet_050(input_tensor, num_classes=1001, depth_multiplier=0.5, scope='MobilenetV2', conv_defs=None, finegrain_classification_mode=False, min_depth=None, divisible_by=None, **kwargs):
  return mobilenet(input_tensor, num_classes=num_classes, depth_multiplier=depth_multiplier, scope=scope, conv_defs=conv_defs, finegrain_classification_mode=finegrain_classification_mode, min_depth=min_depth, divisible_by=divisible_by, **kwargs)

mobilenet_050.default_image_size = 224

Source code that I added to slim/nets/nets_factory.py:

networks_map = { ..., 'mobilenet_v2_050': mobilenet_v2.mobilenet_050, ... }
arg_scopes_map = { ..., 'mobilenet_v2_050': mobilenet_v2.training_scope, ... }
default_image_size_map = { ..., 'mobilenet_v2_050': mobilenet_v2.mobilenet_050.default_image_size, ... }

Source code that I added to slim/preprocessing/preprocessing_factory.py:

preprocessing_fn_map = { ..., 'mobilenet_v2_050': inception_preprocessing, ... }
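For context on how these entries are consumed: train_image_classifier.py resolves --model_name through nets_factory.get_network_fn, so a quick sketch like the following (assuming the additions above are in place and it is run from the slim directory) can verify that the new name builds a graph before kicking off training:

import tensorflow as tf

from nets import nets_factory

# Sketch only: resolve the newly registered name and build the graph.
# num_classes=4 matches the four-class dataset used in the command above.
network_fn = nets_factory.get_network_fn('mobilenet_v2_050', num_classes=4, is_training=True)

image_size = network_fn.default_image_size  # 224, taken from the wrapper above
images = tf.placeholder(tf.float32, [None, image_size, image_size, 3])
logits, end_points = network_fn(images)
print(logits.shape)  # expected: (?, 4)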

Traceback log

traceback_log.txt

All 2 comments

Have you tried finegrain_classification_mode=True in your call to mobilenet (nets_factory routes there)? This looks like intended behaviour; they just haven't released weights for models trained with finegrain_classification_mode=False.

@jackd It looks like you are 100% right. Thank you for pointing out this argument!
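For anyone who hits the same shape mismatch, below is a sketch of the wrapper from this issue with finegrain_classification_mode defaulting to True, which (per the comment above) keeps the final 1x1 convolution at 1280 regardless of depth_multiplier, so the released mobilenet_v2_0.5_224 weights restore cleanly. This is the issue's own wrapper with one default changed, not the official slim code:

@slim.add_arg_scope
def mobilenet_050(input_tensor,
                  num_classes=1001,
                  depth_multiplier=0.5,
                  scope='MobilenetV2',
                  conv_defs=None,
                  finegrain_classification_mode=True,  # keep the final conv at 1280
                  min_depth=None,
                  divisible_by=None,
                  **kwargs):
  # Same pass-through as the wrapper above; only the finegrain default differs.
  return mobilenet(input_tensor,
                   num_classes=num_classes,
                   depth_multiplier=depth_multiplier,
                   scope=scope,
                   conv_defs=conv_defs,
                   finegrain_classification_mode=finegrain_classification_mode,
                   min_depth=min_depth,
                   divisible_by=divisible_by,
                   **kwargs)

mobilenet_050.default_image_size = 224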
