I'm going to use this as a summary/burndown list of known problems, in no particular order.
More to come.
Issue 1: broadcast_dynamic_shape
https://github.com/deeplearning4j/deeplearning4j/issues/6322~~
Confirmed fixed: 08/04/2019
Issue 2: zeta
https://github.com/deeplearning4j/deeplearning4j/issues/6182~~
Issue 3: Strided slice empty arrays not shape with 0s
Returning invalid shape - [0] not empty.
https://gist.github.com/AlexDBlack/afb4e65cc2f218452fa550d86419386c~~
Update 09/04/2019: confirmed fixed
Issue 4: space_to_batch / batch_to_space
https://github.com/deeplearning4j/deeplearning4j/issues/6447~~
Note: I'm assuming our space_to_batch is equivalent to tensorflow's space_to_batch_nd (which is what tf's space_to_batch uses internally anyway)
space_to_batch test case - https://gist.github.com/AlexDBlack/df902f07033e909b1f8f18ae86bd18e6~~
Confirmed passing 08/04/2019
Issue 5: Choice Op
Choice seems to always return the last value.
https://gist.github.com/AlexDBlack/601b14c7ae73b08923b93423189cefda
~~See also, RngValidationTests.validateRngDistributions()
Update 08/04/2019: seems to be bad test (mixed float/double arrays)
Issue 6: Mirror Pad
https://gist.github.com/AlexDBlack/ffa70545d4289359fddac19a83dd151a
~~Test cases are identical to: https://www.tensorflow.org/api_docs/python/tf/pad
~~(note that tf.pad with symmetric/reflect modes maps to our mirror pad op)
Edit: rank 1 + reflect case issue - https://gist.github.com/AlexDBlack/fe43ea2c32d08f04eb3f204a17b53202~~
Issue 7: Reverse (rank 1 edge case)
From TF import. Most cases pass fine, with default of isLegacy (iArg[0] == 1) (most fail with iArg[0]=0, so I don't think it's just that, but I'm not exactly sure what the behaviour difference here is).
https://gist.github.com/AlexDBlack/d3cc4b247dc38b756793d8d31bb43c63~~
Issue 8: Resize bilinear op edge case
Blocker for importing 2 deeplab models.
https://gist.github.com/AlexDBlack/102ec696421f8ec2bd3f4bf67f2c7030~~
Issue 9: Dynamic linspace required
Still not implemented: https://github.com/deeplearning4j/deeplearning4j/issues/6723~~
Issue 10: LogMatrixDeterminant should also output sign
The TF docs for the logdet (LogMatrixDeterminant) op don't actually match the schema:
https://github.com/deeplearning4j/deeplearning4j/blob/a2176bed41440203ce7f660d84759064cd59b931/nd4j/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/resources/ops.proto#L15632-L15658
Specifically, the LogMatrixDeterminant op should output both the sign and the absolute value; in the case of TF logdet op both exist in the graph (i.e., both are calculated) but only the value variable is returned to the user in the public API (sign is calculated but ignored from perspective of user when creating graph).
https://www.tensorflow.org/api_docs/python/tf/linalg/logdet
Issue 11: BinCount op should support min/max as dynamic array args
Mainly for TF import.
Currently, optional min/maxLength args can only be specified as integer args; for TF import they need to be settable via (optional) array args.
https://github.com/deeplearning4j/deeplearning4j/blob/master/libnd4j/include/ops/declarable/generic/parity_ops/bincount.cpp#L49-L53
https://www.tensorflow.org/api_docs/python/tf/math/bincount~~
Update 08/04/2019: Looks like this was added at some point.
Issue 12: Gather shape can be wrong (rank 1 not rank 2)
https://gist.github.com/AlexDBlack/0e4b18d2459de122697ec412faacb9a7~~
Issue 13: Nth element
https://gist.github.com/AlexDBlack/2885f71ce6dca1c578f63667a10f0617~~
Update 08/04/2019: confirmed fixed
Issue 14: strided_slice validation issue
https://www.tensorflow.org/api_docs/python/tf/strided_slice~~
If the ith bit of shrink_axis_mask is set, it implies that the ith specification shrinks the dimensionality by 1, taking on the value at index begin[i]. end[i] and strides[i] are ignored in this case.
In our implementation, end[i] and strides[i] validation is still occurring when shrink_axis[i] is 1
https://gist.github.com/AlexDBlack/1a94e55c4bc5ac1482a9d3b001722869~~
Update 09/04/2019: confirmed fixed
Issue 15: Where op
https://www.tensorflow.org/api_docs/python/tf/where
Another case where TF behaviour doesn't match docs.
For Where op, if only the "condition" arg is provided, the condition tensor can be non-boolean. If it is non-boolean, it returns the coordinates of the non-zero elements.
https://gist.github.com/AlexDBlack/ca38538082804ff3085a8baff4ae2e38~~
Update 08/04/2019: confirmed fixed
Issue 16: LRN differences
2 failing test cases, "lrn/dr3_b05_a05_b02" and "lrn/dr5_b1_a1_b05"
Reproducible with these TF import tests in ND4J, but this needs to be isolated further if possible...
Issue 17: Loss function failures
Abs_diff weighted sum: https://gist.github.com/AlexDBlack/bd7efe5a5a150ca1359834947a12df55
~~Seems correct for weighted mean (iarg(0)=2), but also fails for "weighted_sum_by_nonzero_weight" (iarg(0)=3)
Also huber, similarly returning 0.0: https://gist.github.com/AlexDBlack/23832ab6af03071e9f861d20747d7ec1~~
Also sigmoid cross entropy returning 0.0... https://gist.github.com/AlexDBlack/c4701bc132ef8cbc0b4f57889b2fe11d~~
https://github.com/deeplearning4j/deeplearning4j/pull/7031
Issue 18: extract_image_patches does not match TF
If I had to guess, I think this is just a difference in order in which we're stacking along the depth dimension.
Now passing, 27/02/2019: https://gist.github.com/AlexDBlack/0e7370c10f27c4c38f1f32260279995b~~
20/02/2019: Looks like we still have some more work to do here:
Confirmed passing - 08/04/2019
Issue 19: Unsorted Segment Prod Backprop
https://github.com/deeplearning4j/deeplearning4j/issues/6952~~
Issue 20: Local Response Normalization gradient check failure
Updated 27/02/2019: still failing after recent PR/changes, but looks closer - https://gist.github.com/AlexDBlack/51b1b9099c93d0c43fdffef929b5dd4c
nd4j/nd4j-backends/nd4j-tests/src/test/java/org/nd4j/autodiff/opvalidation/LayerOpValidation.java#L278-L280
https://gist.github.com/AlexDBlack/241d74002c68dc9dcffa339267272bf0
Edit: also fails to match DL4J LRN implementation (difference is quite big).
Issue 20: Tensor Mmul - Wrong Shape
https://gist.github.com/AlexDBlack/cc88d18e6f1562e54363b8893961cc36
See also: https://github.com/deeplearning4j/deeplearning4j/issues/7262~~
Aha! Link: https://skymindai.aha.io/features/ND4J-14
zeta fix just merged
Hi, Alex
issue 6 (mirror_pad) - not confirmed on c++ side. Looks like there is typo in java test
INDArray x = Nd4j.createFromArray(new double[][]{{1,2,3},{4.5,6}});
should be 4,5 not 4.5
raver going to fix this test and add corresponding exception on java side
issue 7: (reverse): has been fixed. Also isLegacy parameter has been removed since it is not actual any more. Now op acts in a manner which assumes isLegacy is always 1.
@shyrma Thanks - you are right about the typo - sorry about that. I've added checks for ragged arrays to ND4J, so we shouldn't see that in the future.
I've provided a new mirror pad (reflect, rank 1) test case (taken from TF import) that actually isolates the issue I was previously seeing.
Reverse is confirmed fixed.
Hi, Alex
Issue 6: Mirror Pad: yes indeed there was bug when we have deal with 1 rank vector in case of reflect mode, now it is fixed, thanks.
Hi, Alex
Issue 12: shape problem in gather op has been fixed and updated version is already merged
Closing as mostly resolved and otherwise outdated