Lately, many papers describing a new deep architecture report the number of parameters the net requires and the number of flops (floating-point operations) required to process a single input through the net (a forward-pass).
Is it possible to add a simple tool that computes these quantities given a "deploy.prototxt" file?
You can find a similar request here: http://stackoverflow.com/q/30403590/1714410
Yes, that would be very nice.
I was assuming it's already there in Caffe, given that it's a very basic and fundamental tool for quickly analysing a network. Can the Caffe developers please comment on this thread?
I think it's indeed necessary! Why haven't any of the developers answered this issue yet?
For C++, it takes just a single line in addition to initializing the counter:

size_t count = 0;
for (size_t i = 0; i < net.params().size(); ++i)
  count += net.params()[i]->count();
@guohengkai this line only counts the number of parameters. What about number of flops?
See this SO question.
To count the number of trainable parameters in a model in python:
import caffe

net = caffe.Net('/path/to/trainval.prototxt', caffe.TRAIN)
count = 0
for l in net.layers:
    if l.type == 'BatchNorm':
        continue  # batchnorm params are not exactly "trainable"
    for b in list(l.blobs):
        count += b.data.size
Regarding "BatchNorm": this layer has internal parameters, but they are not exactly "trained" using SGD; they are more "adapted" to the data. I'm not sure whether they should be counted as "trainable" parameters or not.
@shaibagon only convolution and fully connected layers are usually (as far as I have seen) considered. BatchNorm, as you have just pointed out, is not considered trainable.
@Coderx7 what about "PReLU"? what about "Scale" and "Bias"? and "LSTM" and "Recurrent"?
In a typical CNN architecture, only convolution and fully connected layers are taken into account, as far as I know. I have no idea about LSTM or other recurrent variants.
If you are only concerned with "InnerProduct" and "Convolution" ("Deconvolution"?) maybe you can estimate the number of FLOPS required based on the input blob shape and the kernel size?
@shaibagon: Actually, I'm not sure this is the correct way. After all, I guess that when calculating FLOPs the whole architecture needs to be taken into account; I cannot simply ignore BN or other intermediate operations and only use learnable parameters. Besides, how am I supposed to estimate it from learnable parameters alone?
What about number of FLOPS?
Apparently, there is an online tool that can estimate the number of FLOPs. See this SO answer.
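For the "InnerProduct"/"Convolution" estimate suggested earlier in this thread, here is a minimal sketch in Python. It is not part of Caffe's API: the helper names `conv_flops` and `fc_flops` are hypothetical, and it assumes the convention that one multiply-accumulate counts as 2 FLOPs (some tools count MACs instead, which halves the numbers):

```python
# Rough FLOPs estimate for the two layer types that dominate a typical CNN.
# Convention assumed here: one multiply-accumulate = 2 FLOPs (multiply + add).

def conv_flops(c_in, h_out, w_out, c_out, k_h, k_w, groups=1):
    """FLOPs for a Convolution layer, from the input channel count,
    output blob shape, and kernel size. Bias terms are ignored."""
    macs_per_output = (c_in // groups) * k_h * k_w
    return 2 * c_out * h_out * w_out * macs_per_output

def fc_flops(n_in, n_out):
    """FLOPs for an InnerProduct (fully connected) layer, bias ignored."""
    return 2 * n_in * n_out

# Example: an AlexNet-like first conv, 3-channel input, 96 filters of
# 11x11, producing a 96x55x55 output blob.
print(conv_flops(c_in=3, h_out=55, w_out=55, c_out=96, k_h=11, k_w=11))
# Example: the final classifier layer, 4096 -> 1000.
print(fc_flops(4096, 1000))
```

The shapes needed for `conv_flops` can be read off a loaded net from `net.blobs[name].data.shape` and the layer's kernel parameters, so this estimate can be driven directly from a "deploy.prototxt".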