Deeplearning4j: SameDiff/Libnd4j - multiple c++ op issues

Created on 8 Jan 2019 · 6Comments · Source: eclipse/deeplearning4j

I'm going to use this as a summary/burndown list of known problems, in no particular order.
More to come.

~~Issue 1: broadcast_dynamic_shape~~
~~https://github.com/deeplearning4j/deeplearning4j/issues/6322~~~~

Confirmed fixed: 08/04/2019

~~Issue 2: zeta~~
~~https://github.com/deeplearning4j/deeplearning4j/issues/6182~~~~

Issue 3: Strided slice empty arrays not shape with 0s
Returning invalid shape - [0] not empty.
https://gist.github.com/AlexDBlack/afb4e65cc2f218452fa550d86419386c~~

Update 09/04/2019: confirmed fixed

Issue 4: space_to_batch / batch_to_space
https://github.com/deeplearning4j/deeplearning4j/issues/6447~~

Note: I'm assuming our space_to_batch is equivalent to tensorflow's space_to_batch_nd (which is what tf's space_to_batch uses internally anyway)
space_to_batch test case - https://gist.github.com/AlexDBlack/df902f07033e909b1f8f18ae86bd18e6~~

Confirmed passing 08/04/2019

~~Issue 5: Choice Op~~
~~Choice seems to always return the last value.~~
https://gist.github.com/AlexDBlack/601b14c7ae73b08923b93423189cefda
~~~~See also, RngValidationTests.validateRngDistributions()~~

Update 08/04/2019: seems to be bad test (mixed float/double arrays)

~~Issue 6: Mirror Pad~~
https://gist.github.com/AlexDBlack/ffa70545d4289359fddac19a83dd151a
~~Test cases are identical to: https://www.tensorflow.org/api_docs/python/tf/pad
~~~~(note that tf.pad with symmetric/reflect modes maps to our mirror pad op)~~

~~Edit: rank 1 + reflect case issue - https://gist.github.com/AlexDBlack/fe43ea2c32d08f04eb3f204a17b53202~~~~

~~Issue 7: Reverse (rank 1 edge case)~~
From TF import. Most cases pass fine, with default of isLegacy (iArg[0] == 1) (most fail with iArg[0]=0, so I don't think it's just that, but I'm not exactly sure what the behaviour difference here is).
~~https://gist.github.com/AlexDBlack/d3cc4b247dc38b756793d8d31bb43c63~~~~

~~Issue 8: Resize bilinear op edge case~~
~~Blocker for importing 2 deeplab models.~~
~~https://gist.github.com/AlexDBlack/102ec696421f8ec2bd3f4bf67f2c7030~~~~

~~Issue 9: Dynamic linspace required~~
~~Still not implemented: https://github.com/deeplearning4j/deeplearning4j/issues/6723~~~~

Issue 10: LogMatrixDeterminant should also output sign

The TF docs for the logdet (LogMatrixDeterminant) op don't actually match the schema:
https://github.com/deeplearning4j/deeplearning4j/blob/a2176bed41440203ce7f660d84759064cd59b931/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/resources/ops.proto#L15632-L15658

Specifically, the LogMatrixDeterminant op should output both the sign and the absolute value; in the case of TF logdet op both exist in the graph (i.e., both are calculated) but only the value variable is returned to the user in the public API (sign is calculated but ignored from perspective of user when creating graph).
https://www.tensorflow.org/api_docs/python/tf/linalg/logdet

~~Issue 11: BinCount op should support min/max as dynamic array args~~
Mainly for TF import.
Currently, optional min/maxLength args can only be specified as integer args; for TF import they need to be settable via (optional) array args.
https://github.com/deeplearning4j/deeplearning4j/blob/master/libnd4j/include/ops/declarable/generic/parity_ops/bincount.cpp#L49-L53
https://www.tensorflow.org/api_docs/python/tf/math/bincount~~

Update 08/04/2019: Looks like this was added at some point.

~~Issue 12: Gather shape can be wrong (rank 1 not rank 2)~~
~~https://gist.github.com/AlexDBlack/0e4b18d2459de122697ec412faacb9a7~~~~

~~Issue 13: Nth element~~
~~https://gist.github.com/AlexDBlack/2885f71ce6dca1c578f63667a10f0617~~~~

Update 08/04/2019: confirmed fixed

Issue 14: strided_slice validation issue
https://www.tensorflow.org/api_docs/python/tf/strided_slice~~

If the ith bit of shrink_axis_mask is set, it implies that the ith specification shrinks the dimensionality by 1, taking on the value at index begin[i]. end[i] and strides[i] are ignored in this case.

In our implementation, end[i] and strides[i] validation is still occurring when shrink_axis[i] is 1
https://gist.github.com/AlexDBlack/1a94e55c4bc5ac1482a9d3b001722869~~

Update 09/04/2019: confirmed fixed

~~Issue 15: Where op~~
https://www.tensorflow.org/api_docs/python/tf/where
Another case where TF behaviour doesn't match docs.
For Where op, if only the "condition" arg is provided, the condition tensor can be non-boolean. If it is non-boolean, it returns the coordinates of the non-zero elements.
https://gist.github.com/AlexDBlack/ca38538082804ff3085a8baff4ae2e38~~

Update 08/04/2019: confirmed fixed

~~Issue 16: LRN differences~~
~~2 failing test cases, "lrn/dr3_b05_a05_b02" and "lrn/dr5_b1_a1_b05"~~
~~Reproducible with these TF import tests in ND4J, but this needs to be isolated further if possible...~~

~~Issue 17: Loss function failures~~
Abs_diff weighted sum: https://gist.github.com/AlexDBlack/bd7efe5a5a150ca1359834947a12df55
~~~~Seems correct for weighted mean (iarg(0)=2), but also fails for "weighted_sum_by_nonzero_weight" (iarg(0)=3)~~

~~Also huber, similarly returning 0.0: https://gist.github.com/AlexDBlack/23832ab6af03071e9f861d20747d7ec1~~~~

~~Also sigmoid cross entropy returning 0.0... https://gist.github.com/AlexDBlack/c4701bc132ef8cbc0b4f57889b2fe11d~~~~

https://github.com/deeplearning4j/deeplearning4j/pull/7031

Issue 18: extract_image_patches does not match TF
If I had to guess, I think this is just a difference in order in which we're stacking along the depth dimension.
Now passing, 27/02/2019: ~~https://gist.github.com/AlexDBlack/0e7370c10f27c4c38f1f32260279995b~~~~

~~20/02/2019: Looks like we still have some more work to do here:~~

Confirmed passing - 08/04/2019

~~Issue 19: Unsorted Segment Prod Backprop~~

~~https://github.com/deeplearning4j/deeplearning4j/issues/6952~~~~

Issue 20: Local Response Normalization gradient check failure
Updated 27/02/2019: still failing after recent PR/changes, but looks closer - https://gist.github.com/AlexDBlack/51b1b9099c93d0c43fdffef929b5dd4c

nd4j/nd4j-backends/nd4j-tests/src/test/java/org/nd4j/autodiff/opvalidation/LayerOpValidation.java#L278-L280
https://gist.github.com/AlexDBlack/241d74002c68dc9dcffa339267272bf0

Edit: also fails to match DL4J LRN implementation (difference is quite big).

~~Issue 20: Tensor Mmul - Wrong Shape~~
https://gist.github.com/AlexDBlack/cc88d18e6f1562e54363b8893961cc36
See also: https://github.com/deeplearning4j/deeplearning4j/issues/7262~~

~~Aha! Link: https://skymindai.aha.io/features/ND4J-14~~

Bug
Source

~~AlexDBlack~~

All 6 comments

zeta fix just merged

raver119 on 8 Jan 2019

👍1

Hi, Alex

issue 6 (mirror_pad) - not confirmed on c++ side. Looks like there is typo in java test
INDArray x = Nd4j.createFromArray(new double[][]{{1,2,3},{4.5,6}});
should be 4,5 not 4.5
raver going to fix this test and add corresponding exception on java side

issue 7: (reverse): has been fixed. Also isLegacy parameter has been removed since it is not actual any more. Now op acts in a manner which assumes isLegacy is always 1.

shyrma on 9 Jan 2019

@shyrma Thanks - you are right about the typo - sorry about that. I've added checks for ragged arrays to ND4J, so we shouldn't see that in the future.
I've provided a new mirror pad (reflect, rank 1) test case (taken from TF import) that actually isolates the issue I was previously seeing.

Reverse is confirmed fixed.

AlexDBlack on 10 Jan 2019

Hi, Alex

Issue 6: Mirror Pad: yes indeed there was bug when we have deal with 1 rank vector in case of reflect mode, now it is fixed, thanks.

shyrma on 10 Jan 2019

👍1

Hi, Alex

Issue 12: shape problem in gather op has been fixed and updated version is already merged

shyrma on 17 Jan 2019

👍1

Closing as mostly resolved and otherwise outdated

AlexDBlack on 8 Nov 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Type inconsistency

hanbaoan123 · 4Comments

XOR simple nn model fits on 0.5.0 and does not fit on 0.6.0

AlexanderSerbul · 3Comments

DL4J KerasSequentialImport problem

tintinxue1 · 5Comments

Wrong path to include/ops/ops.h in libnd4j tests

sshepel · 4Comments

ND4J: CUDA + Spark dependency conflict

AlexDBlack · 5Comments