Addons: Remove numpy implementation in test case?

Created on 10 Mar 2019 · 14Comments · Source: tensorflow/addons

I find, in most of test cases of Addons / tf.contrib, we are used to write a duplicated numpy function to verify the correction of our implementations. It's straightforward, but the drawbacks are:

both tf and numpy implications could be incorrect;
numpy might have a bug, or will introduce a bug in the future;
difficult to maintain test cases, and might also too complex to understand it.

Hence I propose to remove all numpy implementations, and replace them by explicit numerical value (and add necessary comments to describe how to figure it out, say intermediate result of calculation) in test cases.

# No
def np_add(a, b):
    return a + b

self.assertEqual(np_add(1, 2), tf_add(1, 2)) 

# Yes
# Add some comments if necessary:
# c = a + b = 1 + 2 = 3
self.assertEqual(3, tf_add(1, 2))

cc @seanpmorgan @karmel

discussion needed

Source

facaiy

Most helpful comment

The "correct" answer here may depend on context. In general, we tend towards just putting in numbers to avoid the tests getting cluttered. That said, that only works if the numbers are principled/understood in some way. That is, a classic bad habit is for a dev to run the code once, use the output as the test, and never validate that the produced output is actually the technically correct output. So, I would argue, as long as the derivation of the numbers is obvious enough that they can be re-derived, numbers are fine. If the way to derive the numbers in question is to run the same through numpy, may as well include that code for future devs, rather than having magic numbers in there. That said, all of this is opinion, and either way has advantages/disadvantages.

karmel on 11 Mar 2019

👍2

All 14 comments

By the way, do we need to post those discussion needed issues to mailing list? Perhaps we could gain more attention, and collect more feedback there.

facaiy on 10 Mar 2019

👍2

Some initial thoughts...

If possible I would prefer our unit tests not include another library
TF Keras testing does extensively use it, doesn't mean we have to but worth considering
I do think the likelihood of a bug in such fundamental pieces of numpy is probably low
Without having thoroughly checked all usages of np; I would be in favor of removing usage when there are TF equivalents.

I also think that emailing dicussion needed tags is a great idea. It could probably be made a bot if someone was willing.

seanpmorgan on 10 Mar 2019

I think the chances of hitting such a bug are quite low. (Of course as long as you stay with the „standard“ functions and don’t do something crazy)

Implementing something wrong twice might happen. But it’s still better then implementing it once, since you have to express the algo in two different apis.

Also the amount of Tests you can create without numpy might be quite limited.

Of course the dependencie to numpy it self is bad, but the usefulness of numpy is quite big.

Greets :)

Smokrow on 11 Mar 2019

If an input X is given, then its output Y has been determinate. We should explicitly claim what we want, and what the result should be.

Say, Y = f(X) = X + 1, then Y is always 6 when X is 5. It looks strange that you create another function Z = g(X) = X + 1, then assert f(X) == g(X) when X = 5. Worse still, we have to maintain two functions here: f(X) and g(X).

Comparing keras.MeanSquaredErrorTest with addons.LiftedStructLossTest, I think keras.MeanSquaredErrorTest is more easy to understand, less error-prone.

My point is: we don't need to write a numpy function to calculate the result. Use assertEqual(Y, f(X)), rather than assertEqual(g(X), f(X)).

@Smokrow Hi, Moritz:

Also the amount of Tests you can create without numpy might be quite limited.

Would you mind giving an example to help us understand why? Oh, don't get me wrong, I think we can use numpy library in the test case. But we shouldn't write an duplicated numpy implementation like https://github.com/tensorflow/addons/blob/f29c204ef40d2dd84bf92436de438463fcfb38c5/tensorflow_addons/losses/python/triplet_test.py#L56

facaiy on 11 Mar 2019

Let's make it clear: take triplet_loss as an example, we implement it with tf ops. And numpy implementation meanstriplet_loss_np function we created by ourselves in its test python file.

Here the term "numpy implementation" is used for those functions created by Addons (our loss, layer, optimizer ,etc ), rather than for native tensorflow / numpy ops (like tf.exp and np.exp).

facaiy on 11 Mar 2019

@facaiy
Hey Yan.
I think it is okay to use a function like that as long as you make sure that every subpart of the function is already tested.
For example if you have some transformation followed by a multiplication:
You could write tests for both parts (Multiplication and transformation) and make sure each part works correctly.

Afterwards I combine these parts into one separate numpy function to make it easier to test higher level functions for larger scale computations, edge cases or other weird stuff. Especially randomized initialization can become a problem here. I for example used a hybrid approach at normalizations where I initialised the weights with tf and used numpy to calculate the correct output afterwards.

Smokrow on 11 Mar 2019

But i get your point. It does not make sense to implement everything 2 times since both implementations could be wrong

Smokrow on 11 Mar 2019

karmel on 11 Mar 2019

👍2

@facaiy Does it seem reasonable to replace numpy calls when the expected values are understood (e.g. simple matrix math), but to continue to use numpy when it is a pre-requiste to getting an answer? (e.g. large matrix math or as an input to a model evaluation)

seanpmorgan on 12 Mar 2019

Thanks for the great input, @Smokrow @karmel :-)

@Smokrow Yes, random data/variable is a good example, because its result is non-deterministic, so we have to write python function to calculate the result dynamically. Make sense for me, thank you :-)

@karmel

a classic bad habit is for a dev to run the code once, use the output as the test, and never validate that the produced output is actually the technically correct output.

Similarly, another bad habit here is that, developer writes a tf implementation and a corresponding numpy implementation, and then just check whether its results are the same by feeding random data. I don't think it's fair to impute bad habit to method/tool.

And to avoid magic number, how about adding detailed comments to describe how to figure it out (say intermediate result of calculation) ? That's what I learned from https://github.com/tensorflow/tensorflow/pull/15443/files#diff-e37fe49087784a68c4a0d4417d941382R6730

facaiy on 14 Mar 2019

What I think is not a good habit is, to use a numpy custom implementation with random data to verify the correctness of tf implementation, for example, TripletSemiHardLossTest. Those tests are what I propose to revise.

And I agree with @karmel @seanpmorgan @Smokrow: it depends on context. And the criterion is easy to understand and maintain.

Thank you all for your great feedback.
That's all what I can think of after the discussion:

Always try to create a deterministic unit test if possible:
- random data => replace it by: valid values + boundary values + invalid values.
- random variable =>
  - could we use a constant initial value?
  - set random seed.
For deterministic test case:
- Try to write out the expected (numerical) value explicitly, rather than creating a duplicated numpy implementation to calculate the result.
  - If the value is too magic, try to write some detailed comments to explain how to get it.
  - If mathematical derivation is too complex / boring, and we find it (or key steps) can be replaced by numpy/scipy official (native) function easily. Use it for simplicity!

facaiy on 14 Mar 2019

I don't know if this is overkill, but the test-numpy function could be implemented by a different developer. In that case we would have 2 different opinions of the same paper. The test implementation would be written first and afterwards somebody else implements the paper in Tensorflow. (Maybe even without looking at the test case)

Of course the drawback would be a lot of extra work. (both on organization and coding)

Smokrow on 14 Mar 2019

would be a lot of extra work. (both on organization and coding)

We have to take into account maintainability. If the numpy implementation is so good, I'm afraid it would be better to contribute it to numpy community.

facaiy on 15 Mar 2019

We have to take into account maintainability. If the numpy implementation is so good, I'm afraid it would be better to contribute it to numpy community.

True 😄

Smokrow on 15 Mar 2019

😄1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Merging tfa.callbacks.tqdm_progress_bar with tqdm.keras

shun-lin · 4Comments

euclidean_distance_transform_op.cc build falied on GPU

facaiy · 3Comments

Keras model save using WeightedKappaLoss errors, not json serializable

ConnorBarnhill · 3Comments

Use tf.saved_model with tfa.activations.mish produce error: "Op type not registered 'Addons>Mish' in binary running"

pikaliov · 3Comments

WeightNormalization data init fails with Keras experimental_run_tf_function

seanpmorgan · 4Comments