Onnx: Should ONNX have Dropout / request for FeatureDropout

Created on 12 Oct 2017 · 3 Comments · Source: onnx/onnx

If ONNX is intended to be an inference-oriented framework, arguably it should not have Dropout at all, since Dropout is purely a training-time construct.

If we decide to keep Dropout, then I'd also like to request adding FeatureDropout to ONNX: https://github.com/pytorch/pytorch/blob/master/torch/nn/_functions/dropout.py#L58

operator

All 3 comments

Although Dropout is a no-op at inference time, Dropout nodes are still present in the model saved after training. Also, nothing prevents anyone from loading the model in CNTK, PyTorch, or Caffe2 and fine-tuning it. When you load an ONNX model in CNTK, it is simply converted to CNTK's internal representation, so in theory you can continue training, replace some nodes, etc.

Regarding FeatureDropout, it isn't clear to me from the code you shared what it does. Can you provide more detail?

My bad, I should have motivated this better (and also, perhaps FeatureDropout is not a very good name).

"FeatureDropout", known in PyTorch as "Dropout2d/Dropout3d" (http://pytorch.org/docs/master/nn.html#torch.nn.Dropout2d), slightly modifies the Dropout algorithm so that only whole channels are randomly masked out (the mask is indexed by the first two dimensions, as in NCHW ordering; the important thing is that it matches the output of Conv). Individual pixels aren't randomly masked, since adjacent pixels are so strongly correlated that masking them independently just decreases the effective learning rate.

It's a fairly important technique and shows up in some models that we are interested in exporting.
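To make the description above concrete, here is a minimal NumPy sketch of channel-wise dropout. It is an illustration rather than the PyTorch implementation: the function name `feature_dropout`, its parameters, and the choice to rescale kept channels by 1/(1-p) (inverted dropout) are assumptions made for this example.

```python
import numpy as np

def feature_dropout(x, p=0.5, training=True, rng=None):
    """Zero whole channels of an (N, C, ...) array with probability p.

    Hypothetical helper for illustration only, not an ONNX or PyTorch API.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return x  # identity at inference time
    rng = np.random.default_rng() if rng is None else rng
    # One Bernoulli draw per (sample, channel); trailing size-1 dims let the
    # mask broadcast over every spatial position of that channel.
    mask_shape = x.shape[:2] + (1,) * (x.ndim - 2)
    keep = rng.random(mask_shape) >= p
    # Rescale kept channels so the expected activation is unchanged.
    return x * keep / (1.0 - p)

x = np.ones((4, 8, 5, 5))                      # NCHW, like a Conv output
y = feature_dropout(x, p=0.5, rng=np.random.default_rng(0))
```

Each (sample, channel) slice of `y` is either all zeros or uniformly rescaled, which is what distinguishes this from elementwise Dropout.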

(and also, perhaps FeatureDropout is not a very good name)

Instead of coming up with names for additional dropout variants, consider adding a parameter to the existing Dropout that gives the list of axes to tie together, i.e. the axes along which a single mask value is shared: an empty list by default, [0] for batchwise dropout, [1] for dropping pixels/voxels in all channels, [2,3] for 2d spatial dropout, [2,3,4] for 3d spatial dropout, all assuming NC012... layout. This keeps the namespace simple and gives much greater flexibility in what can be represented.
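One way to read this proposal is sketched below in NumPy. The name `dropout_tied` and the parameter `tied_axes` are hypothetical and not part of any ONNX operator; the sketch assumes that "tying" an axis means the mask has size 1 along it, so the same keep/drop decision is broadcast across that axis.

```python
import numpy as np

def dropout_tied(x, p=0.5, tied_axes=(), training=True, rng=None):
    """Dropout whose mask is shared along the axes listed in tied_axes.

    Hypothetical sketch of the proposed parameter, not an existing API.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    tied = {ax % x.ndim for ax in tied_axes}   # allow negative axes
    # Size 1 along tied axes, full size elsewhere; broadcasting does the tying.
    mask_shape = tuple(1 if ax in tied else n for ax, n in enumerate(x.shape))
    keep = rng.random(mask_shape) >= p
    return x * keep / (1.0 - p)

x = np.ones((4, 3, 5, 5))                                # NC01 layout
rng = np.random.default_rng(0)
elementwise = dropout_tied(x, tied_axes=(), rng=rng)      # plain Dropout
spatial_2d = dropout_tied(x, tied_axes=(2, 3), rng=rng)   # Dropout2d-like
all_channels = dropout_tied(x, tied_axes=(1,), rng=rng)   # same pixels in every channel
batchwise = dropout_tied(x, tied_axes=(0,), rng=rng)      # same mask for every sample
```

Under this reading, the existing Dropout and the requested FeatureDropout are both special cases of a single operator, differing only in the axes list.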
