Incubator-mxnet: General support for Float16 and other DTypes

Created on 1 Jun 2016 · 30 comments · Source: apache/incubator-mxnet

So from what I can tell, the following operators currently don't support anything other than real_t (i.e. Float32). I am going to work on fixing the ones important for my research, and I would welcome any help. I feel that comprehensive support for other datatypes is important for MXNet.
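As a quick way to probe whether a given operator has fp16 support, you can feed it a small Float16 NDArray and see whether type inference or execution rejects it. This is only a sketch (the operator and shapes are picked arbitrarily), not part of the fix itself:

```python
import mxnet as mx
import numpy as np

# Probe fp16 support by running an operator on a small float16 NDArray.
# If the operator has no float16 kernel, MXNet raises an error at type
# inference or execution time.
x = mx.nd.ones((2, 3, 8, 8), dtype=np.float16)
try:
    y = mx.nd.Pooling(data=x, kernel=(2, 2), pool_type='max', stride=(2, 2))
    y.wait_to_read()  # operators run asynchronously; force execution here
    print('float16 supported, output dtype:', y.dtype)
except mx.base.MXNetError as err:
    print('float16 not supported:', err)
```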

Up for grabs

  • [ ] crop
  • [ ] slice_channel
  • [ ] softmax_activation
  • [ ] matrix_op
  • [ ] l2_normalization
  • [ ] make_loss
  • [ ] identity_attach_KL_sparse_reg
  • [ ] broadcast_reduce
  • [ ] embedding
  • [ ] smooth_l1_unary

    Depends on a resolution to https://github.com/dmlc/mshadow/issues/125

  • [ ] leaky_relu #2280

  • [ ] regression_output #3018
  • [ ] lrn
  • [x] batch_norm

Done

  • [x] roi_pooling #3011

  • [x] deconvolution #2322
  • [x] dropout
  • [x] pooling
  • [x] reshape #2380
  • [x] swapaxis #2380
  • [x] elementwise_sum #2380
  • [x] upsampling #2380
  • [x] concat #2380
  • [x] block_grad #2380
Labels: Call for Contribution, FP16, Feature request, Operator

Most helpful comment

One thing I need to mention: simply supporting the half_t type is not enough to make things faster with fp16. Usually explicit vectorization of the code is needed, so unless values are operated on together in a Packet structure with intrinsics, a speedup is unlikely.

All 30 comments

May I ask how half precision is supported on the CPU?
Can I look at pooling #2280 as an example of how to do this?

I am following the way convolution is implemented, and I think that on the CPU Float16 is handled by promoting to Float32: https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html
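For reference, the promote-then-round pattern described in the GCC __fp16 docs looks roughly like this; NumPy is used purely to illustrate the idea and is not MXNet's actual CPU kernel:

```python
import numpy as np

# Each fp16 operand is widened to float32, the arithmetic happens in
# float32, and the result is rounded back to float16 for storage.
a = np.array([1.5, 2.25], dtype=np.float16)
b = np.array([0.5, 0.125], dtype=np.float16)

a32, b32 = a.astype(np.float32), b.astype(np.float32)  # promotion
result = (a32 + b32).astype(np.float16)                 # round back to half
print(result, result.dtype)                             # [2. 2.375] float16
```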

I have a 1080 coming next week. Let's try to merge the common ones so we can release benchmark numbers ASAP.

If you start working on one, just post it here so that we don't duplicate effort.

@Godricly #2322 is probably a good template for how to do it.

I updated the list a bit.

One thing I need to mention: simply supporting the half_t type is not enough to make things faster with fp16. Usually explicit vectorization of the code is needed, so unless values are operated on together in a Packet structure with intrinsics, a speedup is unlikely.

It sounds like the underlying mshadow needs optimization for data alignment. Another thing I'm thinking about is whether we should add an option to perform the backward computation in higher precision (float) for the half_t type, since half_t cannot represent very small gradients; enabling this will be messy to code, though.
Working on embedding now.
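The usual workaround for the tiny-gradient problem is to keep a float32 "master" copy of each weight and apply updates there, casting back to fp16 only for the forward/backward pass. A minimal NumPy sketch of that idea (later MXNet versions expose roughly the same behaviour through the `multi_precision` option of the SGD optimizer):

```python
import numpy as np

# Master-weights update: weights are stored in float32, the network sees float16.
# A step of lr * 1e-4 survives in the float32 master copy and accumulates across
# iterations, even though a single step would round away in fp16
# (fp16 resolution near 1.0 is about 1e-3).
master_w = np.ones(4, dtype=np.float32)        # float32 master copy
w16 = master_w.astype(np.float16)              # weights used by the network

grad16 = np.full(4, 1e-4, dtype=np.float16)    # small gradient from the fp16 pass
lr = 0.1

master_w -= lr * grad16.astype(np.float32)     # update in float32
w16 = master_w.astype(np.float16)              # cast back for the next pass
print(master_w, w16)
```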

@vchuravy I have an updated DType pooling branch based on your work. Can you double-check it and submit a PR? My fork of MXNet is kind of messy now.

@Godricly Hi, do you have any examples of using fp16? Is it used for training or inference?

Not yet. Basically, you can insert some cast layers to transform the input (data and label) into fp16, so the network flows in fp16 (see the sketch below). Currently, MXNet has compatibility issues with fp16, so I cancelled my previous PR #2564.

If you are interested in fp16, you can follow my branch to enable fp16 parameter init and single-machine training; the multi-machine case depends on ps-lite, which is a little hard to get working.

You also need to make some modifications to the optimizer, which should use float to update the weights and convert them back to fp16 for the network.

For the LSTM case, the data type needs to be provided, which is painful. If you have a better solution, please let me know. 😆

BTW, the DType BN is only functional when using cuDNN.
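A minimal sketch of the cast-layer idea with the symbolic API; the layer names and sizes here are made up, and the relevant part is just mx.sym.Cast at the input and before the loss:

```python
import mxnet as mx

data = mx.sym.Variable('data')                        # input arrives as float32
data = mx.sym.Cast(data=data, dtype='float16')        # network body runs in fp16

fc1  = mx.sym.FullyConnected(data=data, num_hidden=128, name='fc1')
act1 = mx.sym.Activation(data=fc1, act_type='relu', name='relu1')
fc2  = mx.sym.FullyConnected(data=act1, num_hidden=10, name='fc2')

# Cast back to float32 before the loss so the softmax/label math stays in fp32.
fc2  = mx.sym.Cast(data=fc2, dtype='float32')
net  = mx.sym.SoftmaxOutput(data=fc2, name='softmax')
```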

@Godricly Thanks very much.

Pooling and Dropout have been merged.

@vchuravy Can you update the todo list please? Or create a new issue to track the progress?
DType regression is submitted in #3018.

Updated it. What is your current status on BatchNorm? #2562 is my latest state, but you mentioned that you made some updates?

There is a branch under my MXNet fork. If you are only using cuDNN BN, it should be good enough to start with.

  • It is functional with cuDNN, but not with the native mshadow one.
  • The cuDNN BN for FP16 uses float for the mean and variance. I haven't figured out how to make infer_type compatible both with and without cuDNN; the macro I used will break the non-cuDNN version.

Considering these two issues, I didn't submit it.

What's the status of fp16 support in MXNet?

@lygstate Do you mean inference or training acceleration, on mobile or embedded devices, etc.?

I mean the progress of fp16 support; if it's not finished, what can I do to help?

@lygstate You can train an fp16 model from scratch by using the cast function in the symbol file.
How to set int8 or float16 to predict? · Issue #5822 · dmlc/mxnet
https://github.com/dmlc/mxnet/issues/5822

I want to do training in fp32 but predict in fp16. Is that possible?

@piiswrong @Godricly

@lygstate
There is an fp16 example for image classification; you can refer to that one.
You can predict with a trained fp32 model using fp16 with proper clipping, but I think the performance will drop.
However, I don't think you can deploy fp16 on mobile devices with MXNet; the current implementation relies on the cuDNN backend.
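For the train-in-fp32 / predict-in-fp16 route, the parameter side is just casting the saved NDArrays. A sketch under the assumption that the symbol itself has had Cast layers added as in the earlier snippet (the parameter names here are placeholders, not a real checkpoint):

```python
import mxnet as mx

# Stand-ins for fp32 parameters loaded from a trained checkpoint,
# e.g. via mx.model.load_checkpoint(prefix, epoch).
arg_params = {'fc1_weight': mx.nd.ones((128, 64)),
              'fc1_bias':   mx.nd.zeros((128,))}

# Cast every parameter to float16 before binding the fp16 symbol.
arg_params_fp16 = {k: v.astype('float16') for k, v in arg_params.items()}
print({k: v.dtype for k, v in arg_params_fp16.items()})
```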

@Godricly Yep, I want to use fp16 with cuDNN for performance reasons :) Thanks a lot.

Quick fix for monitoring weights in float16: https://github.com/apache/incubator-mxnet/issues/8506

It seems like crop, slice_channel, and softmax_activation are all deprecated operators; maybe we can skip FP16 support for those?

@eric-haibin-lin Since we have quite a few separate (specific) requests for FP16 support, should we merge them together and close out the redundant ones, or keep the issues the way they are?

@ChaiBapchya this list might actually be outdated now.

Do you recommend closing this issue in that case?

I do see that many ops are going to be deprecated. Closing this now. Please file a separate GitHub issue when an unsupported fp16 op is encountered.
