Incubator-mxnet: General support for Float16 and other DTypes

Created on 1 Jun 2016 · 30 comments · Source: apache/incubator-mxnet

So from what I can tell, the following operators currently don't support anything other than real_t (i.e. Float32). I am going to work on fixing the ones important for my research, and I would welcome any help. I feel that comprehensive support for other datatypes is important for MXNet.
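As a quick way to probe whether a given operator has fp16 support, you can feed it a small Float16 NDArray and see whether type inference or execution rejects it. This is only a sketch (the operator and shapes are picked arbitrarily), not part of the fix itself:

```python
import mxnet as mx
import numpy as np

# Probe fp16 support by running an operator on a small float16 NDArray.
# If the operator has no float16 kernel, MXNet raises an error at type
# inference or execution time.
x = mx.nd.ones((2, 3, 8, 8), dtype=np.float16)
try:
    y = mx.nd.Pooling(data=x, kernel=(2, 2), pool_type='max', stride=(2, 2))
    y.wait_to_read()  # operators run asynchronously; force execution here
    print('float16 supported, output dtype:', y.dtype)
except mx.base.MXNetError as err:
    print('float16 not supported:', err)
```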

Up for grabs

  • [ ] crop
  • [ ] slice_channel
  • [ ] softmax_activation
  • [ ] matrix_op
  • [ ] l2_normalization
  • [ ] make_loss
  • [ ] identity_attach_KL_sparse_reg
  • [ ] broadcast_reduce
  • [ ] embedding
  • [ ] smooth_l1_unary

    Depends on a resolution to https://github.com/dmlc/mshadow/issues/125

  • [ ] leaky_relu #2280

  • [ ] regression_output #3018
  • [ ] lrn
  • [x] batch_norm

Done

  • [x] roi_pooling #3011

  • [x] deconvolution #2322
  • [x] dropout
  • [x] pooling
  • [x] reshape #2380
  • [x] swapaxis #2380
  • [x] elementwise_sum #2380
  • [x] upsampling #2380
  • [x] concat #2380
  • [x] block_grad #2380
Labels: Call for Contribution, FP16, Feature request, Operator

Most helpful comment

One thing I need to mention: simply supporting the half_t type is not enough to make things faster with fp16. Usually explicit vectorization of the code is needed, so unless values are operated on together in a Packet structure with intrinsics, a speedup is unlikely.

All 30 comments

May I ask how half precision is supported on the CPU?
Can I look at pooling #2280 as an example of how to do this?

I am following the way convolution is implemented, and I think that on the CPU Float16 is handled by promoting to Float32: https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html
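For reference, the promote-then-round pattern described in the GCC __fp16 docs looks roughly like this; NumPy is used purely to illustrate the idea and is not MXNet's actual CPU kernel:

```python
import numpy as np

# Each fp16 operand is widened to float32, the arithmetic happens in
# float32, and the result is rounded back to float16 for storage.
a = np.array([1.5, 2.25], dtype=np.float16)
b = np.array([0.5, 0.125], dtype=np.float16)

a32, b32 = a.astype(np.float32), b.astype(np.float32)  # promotion
result = (a32 + b32).astype(np.float16)                 # round back to half
print(result, result.dtype)                             # [2. 2.375] float16
```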

I have a 1080 coming next week. Let's try to merge the common ones so we can release benchmark numbers ASAP.

If you start working on one, just post it here so that we don't duplicate effort.

@Godricly #2322 is probably a good template for how to do it.

I updated the list a bit.

One thing I need to mention: simply supporting the half_t type is not enough to make things faster with fp16. Usually explicit vectorization of the code is needed, so unless values are operated on together in a Packet structure with intrinsics, a speedup is unlikely.

It sounds like the underlying mshadow needs optimization for data alignment. Another thing I'm thinking about is whether we should add an option to perform the backward computation in higher precision (float) for the half_t type, since half_t cannot represent very small gradients; enabling this will be messy to code, though.
Working on embedding now.
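The usual workaround for the tiny-gradient problem is to keep a float32 "master" copy of each weight and apply updates there, casting back to fp16 only for the forward/backward pass. A minimal NumPy sketch of that idea (later MXNet versions expose roughly the same behaviour through the `multi_precision` option of the SGD optimizer):

```python
import numpy as np

# Master-weights update: weights are stored in float32, the network sees float16.
# A step of lr * 1e-4 survives in the float32 master copy and accumulates across
# iterations, even though a single step would round away in fp16
# (fp16 resolution near 1.0 is about 1e-3).
master_w = np.ones(4, dtype=np.float32)        # float32 master copy
w16 = master_w.astype(np.float16)              # weights used by the network

grad16 = np.full(4, 1e-4, dtype=np.float16)    # small gradient from the fp16 pass
lr = 0.1

master_w -= lr * grad16.astype(np.float32)     # update in float32
w16 = master_w.astype(np.float16)              # cast back for the next pass
print(master_w, w16)
```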

@vchuravy I have an updated DType pooling branch based on your work. Can you double-check it and submit a PR? My fork of MXNet is kind of messy now.

@Godricly Hi, do you have any examples of using fp16? Is it used for training or inference?

Not yet. Basically, you can insert some cast layers to transform the input (data and label) into fp16, so the network flows in fp16 (see the sketch below). Currently, MXNet has compatibility issues with fp16, so I cancelled my previous PR #2564.

If you are interested in fp16, you can follow my branch to enable fp16 parameter init and single-machine training; the multi-machine case depends on ps-lite, which is a little hard to get working.

You also need to make some modifications to the optimizer, which should use float to update the weights and convert them back to fp16 for the network.

For the LSTM case, the data type needs to be provided, which is painful. If you have a better solution, please let me know. 😆

BTW, the DType BN is only functional when using cuDNN.
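A minimal sketch of the cast-layer idea with the symbolic API; the layer names and sizes here are made up, and the relevant part is just mx.sym.Cast at the input and before the loss:

```python
import mxnet as mx

data = mx.sym.Variable('data')                        # input arrives as float32
data = mx.sym.Cast(data=data, dtype='float16')        # network body runs in fp16

fc1  = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
act1 = mx.sym.Activation(data=fc1, act_type='relu', name='relu1')
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=10, name='fc2')

# Cast back to float32 before the loss so the softmax/label math stays in fp32.
fc2  = mx.sym.Cast(data=fc2, dtype='float32')
net  = mx.sym.SoftmaxOutput(data=fc2, name='softmax')
```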

@Godricly Thanks very much.

Pooling and Dropout have been merged.

@vchuravy Can you update the todo list please? Or create a new issue to track the progress?
DType regression is submitted in #3018.

Updated it. What is your current status on BatchNorm? #2562 is my latest state, but you mentioned that you made some updates?

There is a branch under my MXNet fork. If you are only using cuDNN BN, it should be good enough to start with.

  • It is functional with cuDNN, but not with the native mshadow one.
  • The cuDNN BN for FP16 uses float for the mean and variance. I haven't figured out how to make infer_type compatible both with and without cuDNN; the macro I used will break the non-cuDNN version.

Considering these two issues, I didn't submit it.

What's the status of fp16 support in MXNet?

@lygstate Do you mean inference or training acceleration, on mobile or embedded devices, etc.?

I mean the progress of fp16 support; if it's not finished, what can I do to help?

@lygstate You can train an fp16 model from scratch by using the cast function in the symbol file.
How to set int8 or float16 to predict? · Issue #5822 · dmlc/mxnet
https://github.com/dmlc/mxnet/issues/5822

I want to do training in fp32 but predict in fp16. Is that possible?

@piiswrong @Godricly

@lygstate
There is an fp16 example for image classification; you can refer to that one.
You can predict with a trained fp32 model using fp16 with proper clipping, but I think the performance will drop.
However, I don't think you can deploy fp16 on mobile devices with MXNet; the current implementation relies on the cuDNN backend.
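For the train-in-fp32 / predict-in-fp16 route, the parameter side is just casting the saved NDArrays. A sketch under the assumption that the symbol itself has had Cast layers added as in the earlier snippet (the parameter names here are placeholders, not a real checkpoint):

```python
import mxnet as mx

# Stand-ins for fp32 parameters loaded from a trained checkpoint,
# e.g. via mx.model.load_checkpoint(prefix, epoch).
arg_params = {'fc1_weight': mx.nd.ones((128, 64)),
              'fc1_bias':   mx.nd.zeros((128,))}

# Cast every parameter to float16 before binding the fp16 symbol.
arg_params_fp16 = {k: v.astype('float16') for k, v in arg_params.items()}
print({k: v.dtype for k, v in arg_params_fp16.items()})
```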

@Godricly Yep, I want to use fp16 with cuDNN for performance reasons :) Thanks a lot.

Quick fix for monitoring weights in float16: https://github.com/apache/incubator-mxnet/issues/8506

It seems like crop, slice_channel, and softmax_activation are all deprecated operators; maybe we can skip FP16 support for those?

@eric-haibin-lin Since we have quite a few separate (specific) requests for FP16 support, should we merge them together and close out the redundant ones, or keep the issues the way they are?

@ChaiBapchya this list might actually be outdated now.

Do you recommend closing this issue in that case?

I do see that many ops are going to be deprecated. Closing this now. Please file a separate GitHub issue when an unsupported fp16 op is encountered.
