Tested with TensorFlow 2.2 and 2.3 and TensorFlow Addons 0.10.0 and 0.11.1,
on a Google Colab Pro GPU runtime with Python 3.8.
I ran into this bug when passing "num_parallel_calls=tf.data.experimental.AUTOTUNE" to the .map call on my dataset: no exception is thrown and the code runs forever.
parsed_dataset = parsed_dataset.map(translate_and_crop,num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(tf.data.experimental.AUTOTUNE)
parsed_dataset = parsed_dataset.unbatch()
iterator = tf.compat.v1.data.make_one_shot_iterator(parsed_dataset)
image,label = iterator.get_next()
translate = lambda image, label: tf.py_function(func=translate_pipeline, inp=[image, label], Tout=[tf.float32, tf.int64])
def translate_pipeline(original_image, label):
    print(1)
    height = tf.shape(original_image)[0].numpy()
    width = tf.shape(original_image)[1].numpy()
    y_fraction = tf.convert_to_tensor(height * 0.2, dtype=tf.float32)
    x_fraction = tf.convert_to_tensor(width * 0.2, dtype=tf.float32)
    print(2)
    batched_image = tf.tile(tf.expand_dims(original_image, axis=0), [4, 1, 1, 1])  # Create 4 copies of the original image in one batch
    translated_images = tfa.image.translate_ops.translate(images=batched_image, translations=[[x_fraction, -y_fraction], [-x_fraction, y_fraction], [-x_fraction, -y_fraction], [x_fraction, y_fraction]])
    augmented_images = tf.concat([tf.expand_dims(original_image, axis=0), translated_images], axis=0)
    print(3)
    label = tf.reshape(label, [1, 1])
    labels = tf.tile(label, [5, 1])
    print(4)
    return augmented_images, labels
Output with "num_parallel_calls=tf.data.experimental.AUTOTUNE" (execution hangs after this):
1
2
Output when removing "num_parallel_calls=tf.data.experimental.AUTOTUNE":
1
2
3
4
Do you have a very minimal complete example that we could copy, paste and run to reproduce this?
Link to example TFRecord: https://drive.google.com/drive/folders/1dc6ehBGL_mwGTuSy71VhUYVp0eMdHADP?usp=sharing
import tensorflow as tf
import tensorflow_addons as tfa

test_dataset = tf.data.TFRecordDataset(num_parallel_reads=tf.data.experimental.AUTOTUNE, filenames=DRIVE_DIR + '/tf_issue/test_0.tfrecord').map(parsing_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
test_dataset = test_dataset.map(translate, num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(tf.data.experimental.AUTOTUNE)
test_dataset = test_dataset.unbatch()
iterator = tf.compat.v1.data.make_one_shot_iterator(test_dataset)
for i in range(5):
    image, label = iterator.get_next()
def parsing_fn(serialized):
    features = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed_example = tf.io.parse_single_example(serialized=serialized,
                                                features=features)
    image_raw = parsed_example['image']
    image = tf.io.decode_jpeg(image_raw)
    image = tf.image.resize(image, size=[224, 224])
    label = parsed_example['label']
    return image, label
translate = lambda image, label: tf.py_function(func=translate_pipeline, inp=[image, label], Tout=[tf.float32, tf.int64])
def translate_pipeline(original_image, label):
    print(1)
    height = tf.shape(original_image)[0].numpy()
    width = tf.shape(original_image)[1].numpy()
    y_fraction = tf.convert_to_tensor(height * 0.2, dtype=tf.float32)
    x_fraction = tf.convert_to_tensor(width * 0.2, dtype=tf.float32)
    print(2)
    batched_image = tf.tile(tf.expand_dims(original_image, axis=0), [4, 1, 1, 1])  # Create 4 copies of the original image in one batch
    translated_images = tfa.image.translate_ops.translate(images=batched_image, translations=[[x_fraction, -y_fraction], [-x_fraction, y_fraction], [-x_fraction, -y_fraction], [x_fraction, y_fraction]])
    augmented_images = tf.concat([tf.expand_dims(original_image, axis=0), translated_images], axis=0)
    print(3)
    label = tf.reshape(label, [1, 1])
    labels = tf.tile(label, [5, 1])
    print(4)
    return augmented_images, labels
Output (the pipeline hangs after this point):
1
1
2
2
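To run the example above without the Google Drive file, a small TFRecord with the same 'image'/'label' schema can be generated locally. This is a minimal sketch, assuming random JPEG images are an acceptable stand-in for the real data; the helper write_synthetic_tfrecord and the temporary path are hypothetical and only replace DRIVE_DIR + '/tf_issue/test_0.tfrecord'. parsing_fn is repeated so the block runs on its own.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def write_synthetic_tfrecord(path, num_examples=3, size=64):
    """Hypothetical helper: writes random JPEGs with int64 labels under the keys parsing_fn expects."""
    rng = np.random.default_rng(0)
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(num_examples):
            pixels = rng.integers(0, 256, size=(size, size, 3)).astype(np.uint8)
            jpeg_bytes = tf.io.encode_jpeg(pixels).numpy()
            feature = {
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
                'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[i])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())


def parsing_fn(serialized):
    # Same parsing logic as in the issue.
    features = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed_example = tf.io.parse_single_example(serialized=serialized, features=features)
    image = tf.io.decode_jpeg(parsed_example['image'])
    image = tf.image.resize(image, size=[224, 224])
    return image, parsed_example['label']


# Hypothetical local path standing in for DRIVE_DIR + '/tf_issue/test_0.tfrecord'.
record_path = os.path.join(tempfile.mkdtemp(), 'test_0.tfrecord')
write_synthetic_tfrecord(record_path)

test_dataset = tf.data.TFRecordDataset(filenames=record_path).map(parsing_fn)
for image, label in test_dataset:
    print(image.shape, int(label))
```

The resulting test_dataset can then be fed to the same translate map (with and without num_parallel_calls) to compare behavior.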
/cc @WindQAQ This doesn't seem to be a problem with TFA but rather with TF: https://www.tensorflow.org/api_docs/python/tf/raw_ops/ImageProjectiveTransformV2
I tried substituting tf.python.keras.layers.preprocessing.image_preprocessing.transform directly in translate_ops.py, and the deadlock is the same.
Have you tried not wrapping your pipeline in tf.py_function?
No, but I need it in order to access the image's NumPy values.
There is https://github.com/tensorflow/tensorflow/issues/32454, but it is not exactly the same.
With the example in this ticket, other ops such as resize don't hang.
That is why it seems to me that this is specific to this kernel combined with AUTOTUNE.
How about changing
height = tf.shape(original_image)[0].numpy()
width = tf.shape(original_image)[1].numpy()
y_fraction = tf.convert_to_tensor(height * 0.2, dtype=tf.float32)
x_fraction = tf.convert_to_tensor(width * 0.2,dtype=tf.float32)
into
height = tf.cast(tf.shape(original_image)[0], dtype=tf.float32)
width = tf.cast(tf.shape(original_image)[1], dtype=tf.float32)
y_fraction = height * 0.2
x_fraction = width * 0.2
This way, you no longer need to wrap it in tf.py_function.
I will verify it; it may even fix some retracing warnings I had. Thank you, by the way! As I mentioned, though, I was able to work around the problem by removing 'tf.data.experimental.AUTOTUNE', which wasn't a big deal; I opened this issue mainly to report a possible bug.
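The workaround described above can be sketched as follows, assuming only the tf.py_function step needs to run sequentially. numpy_augment is a hypothetical stand-in for translate_pipeline: like the original, it needs eager .numpy() values, but it only adds a horizontally flipped copy instead of the four translations. The set_shape calls, which restore the static shapes that tf.py_function drops, are an addition and not part of the original code.

```python
import numpy as np
import tensorflow as tf


def numpy_augment(image, label):
    """Hypothetical stand-in for translate_pipeline: it reads eager NumPy values,
    so it has to run inside tf.py_function."""
    image_np = image.numpy()
    flipped = image_np[:, ::-1, :]  # horizontal flip computed in NumPy
    augmented = np.stack([image_np, flipped]).astype(np.float32)
    labels = np.full((2, 1), label.numpy(), dtype=np.int64)
    return augmented, labels


def augment(image, label):
    augmented, labels = tf.py_function(
        func=numpy_augment, inp=[image, label], Tout=[tf.float32, tf.int64])
    # tf.py_function drops static shapes; restore the ones we know.
    augmented.set_shape([2, None, None, 3])
    labels.set_shape([2, 1])
    return augmented, labels


# Stand-in data: three random 32x32 RGB images with int64 labels.
images_np = np.random.rand(3, 32, 32, 3).astype(np.float32)
labels_np = np.arange(3, dtype=np.int64)

dataset = (
    tf.data.Dataset.from_tensor_slices((images_np, labels_np))
    .map(augment)  # no num_parallel_calls: the tf.py_function step runs sequentially
    .unbatch()
    .prefetch(tf.data.experimental.AUTOTUNE)
)
for image, label in dataset:
    print(image.shape, label.numpy())
```

Cheaper graph-mode steps, such as parsing, could presumably keep num_parallel_calls=tf.data.experimental.AUTOTUNE; only the map that wraps tf.py_function is left sequential here.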
Closing, as this is due to underlying core functionality. Please feel free to comment if you feel otherwise and we can reopen. Thanks for bringing the issue up!
@FalsoMoralista I still suggest that you open an issue in the TensorFlow repo referencing this one.