Deeplearning4j: SameDiff/Libnd4j - multiple c++ op issues

Created on 8 Jan 2019  路  6Comments  路  Source: eclipse/deeplearning4j

I'm going to use this as a summary/burndown list of known problems, in no particular order.
More to come.


Issue 1: broadcast_dynamic_shape
https://github.com/deeplearning4j/deeplearning4j/issues/6322~~

Confirmed fixed: 08/04/2019


Issue 2: zeta
https://github.com/deeplearning4j/deeplearning4j/issues/6182~~


Issue 3: Strided slice empty arrays not shape with 0s
Returning invalid shape - [0] not empty.
https://gist.github.com/AlexDBlack/afb4e65cc2f218452fa550d86419386c~~

Update 09/04/2019: confirmed fixed


Issue 4: space_to_batch / batch_to_space
https://github.com/deeplearning4j/deeplearning4j/issues/6447~~

Note: I'm assuming our space_to_batch is equivalent to tensorflow's space_to_batch_nd (which is what tf's space_to_batch uses internally anyway)
space_to_batch test case - https://gist.github.com/AlexDBlack/df902f07033e909b1f8f18ae86bd18e6~~

Confirmed passing 08/04/2019


Issue 5: Choice Op
Choice seems to always return the last value.
https://gist.github.com/AlexDBlack/601b14c7ae73b08923b93423189cefda
~~See also, RngValidationTests.validateRngDistributions()

Update 08/04/2019: seems to be bad test (mixed float/double arrays)


Issue 6: Mirror Pad
https://gist.github.com/AlexDBlack/ffa70545d4289359fddac19a83dd151a
~~Test cases are identical to: https://www.tensorflow.org/api_docs/python/tf/pad
~~(note that tf.pad with symmetric/reflect modes maps to our mirror pad op)

Edit: rank 1 + reflect case issue - https://gist.github.com/AlexDBlack/fe43ea2c32d08f04eb3f204a17b53202~~


Issue 7: Reverse (rank 1 edge case)
From TF import. Most cases pass fine, with default of isLegacy (iArg[0] == 1) (most fail with iArg[0]=0, so I don't think it's just that, but I'm not exactly sure what the behaviour difference here is).
https://gist.github.com/AlexDBlack/d3cc4b247dc38b756793d8d31bb43c63~~


Issue 8: Resize bilinear op edge case
Blocker for importing 2 deeplab models.
https://gist.github.com/AlexDBlack/102ec696421f8ec2bd3f4bf67f2c7030~~


Issue 9: Dynamic linspace required
Still not implemented: https://github.com/deeplearning4j/deeplearning4j/issues/6723~~


Issue 10: LogMatrixDeterminant should also output sign

The TF docs for the logdet (LogMatrixDeterminant) op don't actually match the schema:
https://github.com/deeplearning4j/deeplearning4j/blob/a2176bed41440203ce7f660d84759064cd59b931/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/resources/ops.proto#L15632-L15658

Specifically, the LogMatrixDeterminant op should output both the sign and the absolute value; in the case of TF logdet op both exist in the graph (i.e., both are calculated) but only the value variable is returned to the user in the public API (sign is calculated but ignored from perspective of user when creating graph).
https://www.tensorflow.org/api_docs/python/tf/linalg/logdet


Issue 11: BinCount op should support min/max as dynamic array args
Mainly for TF import.
Currently, optional min/maxLength args can only be specified as integer args; for TF import they need to be settable via (optional) array args.
https://github.com/deeplearning4j/deeplearning4j/blob/master/libnd4j/include/ops/declarable/generic/parity_ops/bincount.cpp#L49-L53
https://www.tensorflow.org/api_docs/python/tf/math/bincount~~

Update 08/04/2019: Looks like this was added at some point.


Issue 12: Gather shape can be wrong (rank 1 not rank 2)
https://gist.github.com/AlexDBlack/0e4b18d2459de122697ec412faacb9a7~~


Issue 13: Nth element
https://gist.github.com/AlexDBlack/2885f71ce6dca1c578f63667a10f0617~~

Update 08/04/2019: confirmed fixed


Issue 14: strided_slice validation issue
https://www.tensorflow.org/api_docs/python/tf/strided_slice~~

If the ith bit of shrink_axis_mask is set, it implies that the ith specification shrinks the dimensionality by 1, taking on the value at index begin[i]. end[i] and strides[i] are ignored in this case.

In our implementation, end[i] and strides[i] validation is still occurring when shrink_axis[i] is 1
https://gist.github.com/AlexDBlack/1a94e55c4bc5ac1482a9d3b001722869~~

Update 09/04/2019: confirmed fixed


Issue 15: Where op
https://www.tensorflow.org/api_docs/python/tf/where
Another case where TF behaviour doesn't match docs.
For Where op, if only the "condition" arg is provided, the condition tensor can be non-boolean. If it is non-boolean, it returns the coordinates of the non-zero elements.
https://gist.github.com/AlexDBlack/ca38538082804ff3085a8baff4ae2e38~~

Update 08/04/2019: confirmed fixed


Issue 16: LRN differences
2 failing test cases, "lrn/dr3_b05_a05_b02" and "lrn/dr5_b1_a1_b05"
Reproducible with these TF import tests in ND4J, but this needs to be isolated further if possible...


Issue 17: Loss function failures
Abs_diff weighted sum: https://gist.github.com/AlexDBlack/bd7efe5a5a150ca1359834947a12df55
~~Seems correct for weighted mean (iarg(0)=2), but also fails for "weighted_sum_by_nonzero_weight" (iarg(0)=3)

Also huber, similarly returning 0.0: https://gist.github.com/AlexDBlack/23832ab6af03071e9f861d20747d7ec1~~

Also sigmoid cross entropy returning 0.0... https://gist.github.com/AlexDBlack/c4701bc132ef8cbc0b4f57889b2fe11d~~

https://github.com/deeplearning4j/deeplearning4j/pull/7031


Issue 18: extract_image_patches does not match TF
If I had to guess, I think this is just a difference in order in which we're stacking along the depth dimension.

Now passing, 27/02/2019: https://gist.github.com/AlexDBlack/0e7370c10f27c4c38f1f32260279995b~~

20/02/2019: Looks like we still have some more work to do here:

Confirmed passing - 08/04/2019


Issue 19: Unsorted Segment Prod Backprop

https://github.com/deeplearning4j/deeplearning4j/issues/6952~~


Issue 20: Local Response Normalization gradient check failure
Updated 27/02/2019: still failing after recent PR/changes, but looks closer - https://gist.github.com/AlexDBlack/51b1b9099c93d0c43fdffef929b5dd4c

nd4j/nd4j-backends/nd4j-tests/src/test/java/org/nd4j/autodiff/opvalidation/LayerOpValidation.java#L278-L280
https://gist.github.com/AlexDBlack/241d74002c68dc9dcffa339267272bf0

Edit: also fails to match DL4J LRN implementation (difference is quite big).


Issue 20: Tensor Mmul - Wrong Shape
https://gist.github.com/AlexDBlack/cc88d18e6f1562e54363b8893961cc36
See also: https://github.com/deeplearning4j/deeplearning4j/issues/7262~~

Aha! Link: https://skymindai.aha.io/features/ND4J-14

Bug

All 6 comments

zeta fix just merged

Hi, Alex

issue 6 (mirror_pad) - not confirmed on c++ side. Looks like there is typo in java test
INDArray x = Nd4j.createFromArray(new double[][]{{1,2,3},{4.5,6}});
should be 4,5 not 4.5
raver going to fix this test and add corresponding exception on java side

issue 7: (reverse): has been fixed. Also isLegacy parameter has been removed since it is not actual any more. Now op acts in a manner which assumes isLegacy is always 1.

@shyrma Thanks - you are right about the typo - sorry about that. I've added checks for ragged arrays to ND4J, so we shouldn't see that in the future.
I've provided a new mirror pad (reflect, rank 1) test case (taken from TF import) that actually isolates the issue I was previously seeing.

Reverse is confirmed fixed.

Hi, Alex

Issue 6: Mirror Pad: yes indeed there was bug when we have deal with 1 rank vector in case of reflect mode, now it is fixed, thanks.

Hi, Alex

Issue 12: shape problem in gather op has been fixed and updated version is already merged

Closing as mostly resolved and otherwise outdated

Was this page helpful?
0 / 5 - 0 ratings

Related issues

hanbaoan123 picture hanbaoan123  路  4Comments

AlexanderSerbul picture AlexanderSerbul  路  3Comments

tintinxue1 picture tintinxue1  路  5Comments

sshepel picture sshepel  路  4Comments

AlexDBlack picture AlexDBlack  路  5Comments