Recently we've seen that #53 is causing flaky failures in the CI. See:
https://source.cloud.google.com/results/invocations/8f31faef-505a-440e-b75f-e6edf1071269/targets/tensorflow_addons%2Fubuntu%2Fgpu%2Fpy3%2Fpresubmit/log
Do you mind taking a look when time allows @WindQAQ ?
Seems that the behavior of tf.debugging.assert* that we investigated in #53 results in the error.
But I'm not so sure why the test fails even if it runs in eager mode only.
Okay, I find that the following script also fails in tf 2.0.0-dev20190403 on my machine. CC @tomerk for visibility.
import tensorflow as tf
from tensorflow.python.framework import test_util
@tf.function
def foo(x):
tf.debugging.assert_greater_equal(x.shape[0], 2, message="fail")
y = x[1]
return y
class TestAssert(tf.test.TestCase):
def test_assert(self):
with self.assertRaisesRegexp(tf.errors.InvalidArgumentError, "fail"):
self.evaluate(foo(tf.random.uniform(shape=(1, 2, 3))))
if __name__ == "__main__":
tf.test.main()
@WindQAQ So while this does look like a bug... is there any reason we need to be using tf.debugger for the checks in dense_image_warp? I believe the uses can all be checked using the python interpreter (Or in graph mode with tf.function and typical python syntax)?
The checks are all about shape inference. If tf.function can handle the task of None shape or this is even not a problem in TF 2.0, I think the checks can definitely use simple assert, if and raise instead.
Could we try a PR that uses just if and raise and then make sure there is a test case that will fail if it doesn't properly trap shape issue?
Sure, I'll create a PR in one or two days :-)
In the latest docker image (tf==2.0.0-dev20190405), I find that the error is caused by test_zero_flows. The workaround is to disable test in graph mode. Still working on replacing tf.debugging with pure if and raise statement for future extension.
@WindQAQ one more is failing now, sorry it was not failing earlier:
[ FAILED ] DenseImageWarpTest.test_size_exception
[ RUN ] DenseImageWarpTest.test_zero_flows
[ OK ] DenseImageWarpTest.test_zero_flows
======================================================================
FAIL: test_size_exception (__main__.DenseImageWarpTest)
test_size_exception (__main__.DenseImageWarpTest)
Make sure it throws an exception for images that are too small.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/absl/third_party/unittest3_backport/case.py", line 37, in testPartExecutor
yield
File "/usr/local/lib/python2.7/dist-packages/absl/third_party/unittest3_backport/case.py", line 162, in run
testMethod()
File "/root/.cache/bazel/_bazel_root/e051f2f195071208ea7f081f88730f4f/execroot/__main__/bazel-out/k8-opt/bin/tensorflow_addons/image/dense_image_warp_test.runfiles/__main__/tensorflow_addons/image/dense_image_warp_test.py", line 223, in test_size_exception
self._check_interpolation_correctness(shape, "float32", "float32")
AssertionError: "Grid width must be at least 2." does not match "indices[0,0] = -1 is not in [0, 2)
[[{{node dense_image_warp/StatefulPartitionedCall/interpolate_bilinear/gather-top_left/GatherV2}}]] [Op:__inference_dense_image_warp_9536]"
----------------------------------------------------------------------
Ran 8 tests in 52.030s
FAILED (failures=1)
================================================================================
@armando-fandango I cannot reproduce this error now. Is this on the CI build or your local machine? Thanks!
@seanpmorgan Hi Sean, I'd suppose it is not easy due to tf.function's state. Seems that the following code snippet will not result in multiple distinct graphs being traced, and thus, ValueError is also raised when shape=[3, 3, 3]. Sorry that I am not familiar with tf.function and not sure if the use case is correct.
import tensorflow as tf
@tf.function
def foo(x):
if tf.shape(x)[0] < 3:
raise ValueError("fail")
return x[2]
for shape in [[1, 1, 1], [2, 2, 2], [3, 3, 3]]:
try:
foo(tf.random.uniform(shape=shape))
except ValueError:
print(shape)
for shape in [[3, 3, 3], [2, 2, 2], [1, 1, 1]]:
try:
foo(tf.random.uniform(shape=shape))
except ValueError:
print(shape)
'''
output:
[1, 1, 1]
[2, 2, 2]
[3, 3, 3]
[3, 3, 3]
[2, 2, 2]
[1, 1, 1]
'''
This is on latest custom-ops container with master branch. I have two macbooks, and it fails only on one of them.
Do we need to add make clean in case something is getting cached ?
It's weird. Seems that the behavior varies from machine to machine. I think another workaround is not to use tf.function on this temporarily, but this might not meet the requirements for addons. Or we can just wait for tf.debugging.assert* working as expected...
Yeah so I believe it'll fail on every machine with enough runs. I'm fine with a removal of tf.function for the time being... but we should look into a removal of tf.debugging as a long term fix
Hmmm tests appear to be passing repeatedly. Possible there has been an upstream fix... such is life building against an alpha release.
Closing, but happy to re-open if we see issues again.
Welp... too quick on calling that (nightly fail). We should either remove tf.debugging or remove the tf.funcion decorator
Most helpful comment
Sure, I'll create a PR in one or two days :-)