Addons: bug when using "num_parallel_calls" when mapping dataset to tfa function

Created on 31 Aug 2020 · 10 Comments · Source: tensorflow/addons

Tested with TensorFlow versions 2.2 and 2.3, and TensorFlow Addons 0.11.1 and 0.10.0,
on a Google Colab Pro GPU environment with Python 3.8.

I encountered this bug when using "num_parallel_calls=tf.data.experimental.AUTOTUNE" inside the .map call on my dataset: no exception is thrown and my code hangs, running forever.

Code to reproduce the issue:

parsed_dataset = parsed_dataset.map(translate_and_crop,num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(tf.data.experimental.AUTOTUNE)
parsed_dataset = parsed_dataset.unbatch()
iterator = tf.compat.v1.data.make_one_shot_iterator(parsed_dataset)
image,label = iterator.get_next()

Mapped Function

translate = lambda image,label: tf.py_function(func=translate_pipeline,inp=[image,label],Tout=[tf.float32,tf.int64])
def translate_pipeline(original_image,label):
  print(1)
  height = tf.shape(original_image)[0].numpy()    
  width = tf.shape(original_image)[1].numpy()
  y_fraction  = tf.convert_to_tensor(height * 0.2, dtype=tf.float32)
  x_fraction = tf.convert_to_tensor(width * 0.2,dtype=tf.float32)
  print(2)
  batched_image = tf.tile(tf.expand_dims(original_image,axis=0),[4,1,1,1]) # Create 4 copied versions from the original image and add to a batch
  translated_images = tfa.image.translate_ops.translate(images=batched_image,translations=[[x_fraction,-y_fraction],[-x_fraction,y_fraction],[-x_fraction,-y_fraction],[x_fraction,y_fraction]])
  augmented_images = tf.concat([tf.expand_dims(original_image,axis=0),translated_images],axis=0)
  print(3)
  label = tf.reshape(label,[1,1])
  labels = tf.tile(label,[5,1])
  print(4)
  return augmented_images, labels

Output

1
2

Output when removing "num_parallel_calls=tf.data.experimental.AUTOTUNE"

1
2
3
4
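
Because the hang raises no exception, it can help to fail fast while debugging. The following is a minimal sketch using only the Python standard library. `next_with_timeout` is a hypothetical helper, not part of TensorFlow or TFA, and it assumes the dataset is consumed through an ordinary Python iterator (for example one obtained with iter(parsed_dataset) in eager mode). It does not cancel the stalled call; it only stops the caller from waiting on it.

```python
import queue
import threading


def next_with_timeout(iterator, timeout_s):
    """Fetch next(iterator) in a daemon thread; raise TimeoutError if it stalls."""
    result = queue.Queue(maxsize=1)

    def worker():
        try:
            result.put(("ok", next(iterator)))
        except BaseException as exc:  # forward StopIteration and real errors alike
            result.put(("error", exc))

    # A daemon thread does not keep the interpreter alive if it never returns.
    threading.Thread(target=worker, daemon=True).start()
    try:
        status, value = result.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError(f"no element produced within {timeout_s} s") from None
    if status == "error":
        raise value
    return value
```

Calling next_with_timeout(iterator, 30.0) in place of next(iterator) would turn a silent stall into a TimeoutError after 30 seconds; the timeout value here is an arbitrary choice.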

FalsoMoralista

All 10 comments

Do you have a very minimal complete example that we could copy, paste, and run to reproduce this?

bhack on 31 Aug 2020

Do you have a very minimal complete example that we could copy, paste, and run to reproduce this?

Link to example TFRecord: https://drive.google.com/drive/folders/1dc6ehBGL_mwGTuSy71VhUYVp0eMdHADP?usp=sharing

Code to Reproduce the issue:

test_dataset = tf.data.TFRecordDataset(num_parallel_reads=tf.data.experimental.AUTOTUNE,filenames=DRIVE_DIR+'/tf_issue/test_0.tfrecord').map(parsing_fn,num_parallel_calls=tf.data.experimental.AUTOTUNE)
test_dataset = test_dataset.map(translate,num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(tf.data.experimental.AUTOTUNE) 
test_dataset = test_dataset.unbatch()
iterator = tf.compat.v1.data.make_one_shot_iterator(test_dataset)
for i in range(5):
  image,label = iterator.get_next()

Auxiliary functions

def parsing_fn(serialized):
    features = \
        {
            'image': tf.io.FixedLenFeature([], tf.string),
            'label': tf.io.FixedLenFeature([], tf.int64)            
        }
    parsed_example = tf.io.parse_single_example(serialized=serialized,
                                             features=features)
    image_raw = parsed_example['image']
    image = tf.io.decode_jpeg(image_raw)    
    image = tf.image.resize(image,size=[224,224])    
    label = parsed_example['label']    
    return image, label

translate = lambda image,label: tf.py_function(func=translate_pipeline,inp=[image,label],Tout=[tf.float32,tf.int64])
def translate_pipeline(original_image,label):
  print(1)
  height = tf.shape(original_image)[0].numpy()    
  width = tf.shape(original_image)[1].numpy()
  y_fraction  = tf.convert_to_tensor(height * 0.2, dtype=tf.float32)
  x_fraction = tf.convert_to_tensor(width * 0.2,dtype=tf.float32)
  print(2)
  batched_image = tf.tile(tf.expand_dims(original_image,axis=0),[4,1,1,1]) # Create 4 copied versions from the original image and add to a batch
  translated_images = tfa.image.translate_ops.translate(images=batched_image,translations=[[x_fraction,-y_fraction],[-x_fraction,y_fraction],[-x_fraction,-y_fraction],[x_fraction,y_fraction]])
  augmented_images = tf.concat([tf.expand_dims(original_image,axis=0),translated_images],axis=0)
  print(3)
  label = tf.reshape(label,[1,1])
  labels = tf.tile(label,[5,1])
  print(4)
  return augmented_images, labels
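
For reference, the translations list in translate_pipeline holds one [dx, dy] pair per tiled copy, so each input yields the original image plus four diagonally shifted copies (five images, matching the five tiled labels). A plain-Python sketch of that arithmetic follows; `corner_translations` is a hypothetical helper introduced only for illustration and does not call TensorFlow or TFA.

```python
def corner_translations(height, width, fraction=0.2):
    """Return four [dx, dy] offsets in the order used by translate_pipeline."""
    dx = width * fraction   # x_fraction in the report
    dy = height * fraction  # y_fraction in the report
    return [[dx, -dy], [-dx, dy], [-dx, -dy], [dx, dy]]


# parsing_fn resizes every image to 224x224, so each shift is about 44.8 pixels.
offsets = corner_translations(height=224, width=224)
```

The helper prints nothing; it only returns the list of offsets.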

Output

1
1
2
2

FalsoMoralista on 31 Aug 2020

/cc @WindQAQ It seems to me that this is not a problem with TFA but rather with TF: https://www.tensorflow.org/api_docs/python/tf/raw_ops/ImageProjectiveTransformV2
I tried substituting tf.python.keras.layers.preprocessing.image_preprocessing.transform directly in translate_ops.py, and the deadlock appears to be the same.

bhack on 31 Aug 2020

Have you tried not to wrap your pipeline with tf.py_function?

WindQAQ on 31 Aug 2020

No, but I need it in order to use the image's NumPy values.

FalsoMoralista on 31 Aug 2020

There is https://github.com/tensorflow/tensorflow/issues/32454, but it is not exactly the same.

However, with resize, for example, the example in this ticket doesn't hang.
That is why it seems to me that this is specific to this kernel with AUTOTUNE.

bhack on 31 Aug 2020

How about changing

height = tf.shape(original_image)[0].numpy()    
width = tf.shape(original_image)[1].numpy()
y_fraction  = tf.convert_to_tensor(height * 0.2, dtype=tf.float32)
x_fraction = tf.convert_to_tensor(width * 0.2,dtype=tf.float32)

into

height = tf.cast(tf.shape(original_image)[0], dtype=tf.float32)
width = tf.cast(tf.shape(original_image)[1], dtype=tf.float32)
y_fraction  = height * 0.2
x_fraction = width * 0.2

In this way, you can bypass the need to wrap it with tf.py_function.

WindQAQ on 31 Aug 2020

I will verify it; it may even fix some retracing warnings I had. Thank you, by the way. As I mentioned, I was able to work around the problem by removing 'tf.data.experimental.AUTOTUNE', which wasn't a big deal, but I thought creating this issue would be a good way to report this possible bug.

FalsoMoralista on 1 Sep 2020

Closing as this is due to underlying core functionality. Please feel free to comment if you feel otherwise and we can re-open. Thanks for bringing the issue up!