I want to know what will happen if I change the depth multiplier from 1 to 0.5. Will inference be faster? And is the depth multiplier here the same thing as the width multiplier in the MobileNet paper?
Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce
The depth multiplier here is the width multiplier of the original paper. I don't know why the paper's authors call alpha the width multiplier; calling it the depth multiplier would be much more reasonable, as far as I'm concerned.
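For example, the tf.keras API exposes the paper's alpha under the name `alpha`, while the slim mobilenet_v1 code in this repo exposes the same knob as `depth_multiplier`. A minimal sketch, assuming tf.keras.applications is available:

```python
import tensorflow as tf

# Build a half-width MobileNet: every layer gets half the channels of the
# full model. weights=None avoids downloading pretrained weights, which are
# only published for a few specific alpha values.
model = tf.keras.applications.MobileNet(alpha=0.5, weights=None)
model.summary()

# Confusingly, Keras MobileNet also takes a separate `depth_multiplier`
# argument, which multiplies channels inside the depthwise convolution
# and is NOT the paper's alpha.
```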
You can refer to this post, http://machinethink.net/blog/mobilenet-v2/, which points out:
"The most important of these hyperparameters is the depth multiplier, confusingly also known as the width multiplier. This changes how many channels are in each layer. Using a depth multiplier of 0.5 will halve the number of channels used in each layer, which cuts down the number of computations by a factor of 4 and the number of learnable parameters by a factor of 3. It is therefore much faster than the full model but also less accurate."
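To see where those factors come from, here is a back-of-the-envelope sketch in plain Python. The layer sizes below are illustrative values I picked, not numbers from the model; the point is that the pointwise 1x1 convolution dominates the cost and scales with alpha squared, while the depthwise term only scales linearly:

```python
def separable_cost(k, m, n, f, alpha=1.0):
    """MAC count of one depthwise-separable block (as in the MobileNet paper):
    a k x k depthwise conv over alpha*m channels on an f x f feature map,
    followed by a 1x1 pointwise conv from alpha*m to alpha*n channels."""
    m, n = int(alpha * m), int(alpha * n)
    depthwise = k * k * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise

full = separable_cost(k=3, m=256, n=256, f=14)             # alpha = 1.0
half = separable_cost(k=3, m=256, n=256, f=14, alpha=0.5)
print(full / half)  # ~3.9: close to 4x, because the dominant pointwise
                    # term scales with alpha**2
```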
Closing this issue since an explanation has been provided above. Feel free to reopen if you want a follow-up discussion. Thanks!