Describe the feature and the current behavior/state.
Currently, some functions in tfa.image that support data_format args like equalize, cutout, and random_cutout, while most of them do not. Moreover, all functions in tf.image support only channel-last images.
Therefore, I wonder if we can drop the support for data_format argument and channel-first image in tfa.image for some reasons:
tf.image: Usually, we compose multiple functions in tf.image and tfa.image. So if one of the function does not support channel-first image, there must exist at least one tf.transpose operation to change the data format. In this case, the support of channel-first image seems vague and useless as we have to explicitly change it into channel-last format in order for input pipeline.tfa.image just selects different axis to unpack, compute and pack.to_4D_image and from_4D_image to make most operations compatible with tf.data.Dataset, while they operate only on channel last images. We can indeed extend those functions for channel first images, but I am also confused if it's worthwhile.Relevant information
Which API type would this fall under (layer, metric, optimizer, etc.)
image
Who will benefit with this feature?
developers who use tfa.image
Any other info.
I prefer to be uniform with tf.image and also with keras_image in the future if it will follow this same "standard" format.
Be consistent with tf.image: Usually, we compose multiple functions in tf.image and tfa.image. So if one of the function does not support channel-first image, there must exist at least one tf.transpose operation to change the data format. In this case, the support of channel-first image seems vague and useless as we have to explicitly change it into channel-last format in order for input pipeline.
I am not sure if we are uniform also about the casting operations.
Thanks for bringing this up! This much very much needs to be decided upon and standardized.
I also agree we should be consistant with what tf.image supports, but just want to confirm that NHWC is all that tf.image plans to support in the future. Seeing as keras has data_format args, I want to verify that support isn't on the roadmap for tf.image ops given the new Keras preprocessing pipelines.
@tanzhenyu Would you be able to confirm that our understanding is correct or point us to someone who could?
For conv ops:
As of today, NHWC is what CPU only supports. In Cuda, it will mostly convert to NCHW first and call an efficient cudnn, except if it's float16 and > Volta V100, but this is done by default.
For tf.image ops:
They're all based on NHWC format, in the kernel level. This is not something we can change at python level at all.
So yeah I think we should be consistent with tf.image
+1, it sounds great to keep tfa.image simple enough.
Thanks for the comments! I also recommend that tfa.image should be compatible with 3-D image because the common use case is to use .map(lambda x: tfa.image.foo(x)) to process a batch of images.
Shape casting: we do to_4D_image and from_4D_image to make most operations compatible with tf.data.Dataset, while they operate only on channel last images. We can indeed extend those functions for channel first images, but I am also confused if it's worthwhile.
@WindQAQ I don't understand in the current status if we need to do this or not cause in https://github.com/tensorflow/addons/pull/1840 we was going to set the shape bypassing the 4d utils.
Shape casting: we do to_4D_image and from_4D_image to make most operations compatible with tf.data.Dataset, while they operate only on channel last images. We can indeed extend those functions for channel first images, but I am also confused if it's worthwhile.
@WindQAQ I don't understand in the current status if we need to do this or not cause in #1840 we was going to set the shape bypassing the
4dutils.
@bhack I suppose we should at least support 3D image as most of functions in tf.image do that. So we can have another PR to fix it.
@WindQAQ What Is supporting the official Vision Model garden refactoring https://github.com/tensorflow/models/blob/master/official/vision/image_classification/augment.py#L37?
Do you think that users could mix that with tfa.image or tf.image?
@WindQAQ What Is supporting the official Vision Model garden refactoring https://github.com/tensorflow/models/blob/master/official/vision/image_classification/augment.py#L37?
Do you think that users could mix that withtfa.imageortf.image?
When migrating everything to addons (or others repo), it should stick to the rule of addons IMO. So the implementation should be either refactored or discarded.
Thanks for the comments! I also recommend that
tfa.imageshould be compatible with 3-D image because the common use case is to use.map(lambda x: tfa.image.foo(x))to process a batch of images.
In most cases, it might be better to use data.batch(xx).map(xxx) for performance reason :-)
You can check Gsoc progress at https://github.com/tanzhenyu/image_augmentation
Reopen this as the data_format will be dropped in Addons 0.12.
Most helpful comment
For conv ops:
As of today, NHWC is what CPU only supports. In Cuda, it will mostly convert to NCHW first and call an efficient cudnn, except if it's float16 and > Volta V100, but this is done by default.
For tf.image ops:
They're all based on NHWC format, in the kernel level. This is not something we can change at python level at all.
So yeah I think we should be consistent with tf.image