While working with the LSTMLayer test cases, the following problem was detected:
setting the flag that returns the last cell state to false (retC = false) raises an exception at the C++ level.
Original gist (test cases 1, 2 and 3, where the anomaly appears) here
Reproducing the bug by calling DynamicCustomOp directly (to rule out a bug in the Java configs) shows the same behaviour:
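import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.api.ops.DynamicCustomOp;
import org.nd4j.linalg.factory.Nd4j;

public class LstmLayerRetCRepro {
    public static void main(String[] args) {
        // Sizes match the failing test's log: x [10,5,3], Wx [3,28], Wr [7,28], output [10,5,7]
        int sL = 10, bS = 5, nIn = 3, numUnits = 7;
        INDArray x = Nd4j.rand(DataType.FLOAT, sL, bS, nIn);
        INDArray Wx = Nd4j.rand(DataType.FLOAT, nIn, 4 * numUnits);
        INDArray Wr = Nd4j.rand(DataType.FLOAT, numUnits, 4 * numUnits);
        INDArray icLast = Nd4j.zeros(DataType.FLOAT, bS, numUnits);    // initial cell state
        INDArray iyLast = Nd4j.zeros(DataType.FLOAT, bS, numUnits);    // initial output
        INDArray output = Nd4j.rand(DataType.FLOAT, sL, bS, numUnits); // full-sequence output
        DynamicCustomOp op = DynamicCustomOp.builder("lstmLayer")
                .addInputs(x, Wx, Wr, iyLast, icLast) // no bias array is passed
                // hasBiases, hasSeqLen, hasInitH, hasInitC, hasPH, retFullSeq, retLastH, retLastC
                .addBooleanArguments(false, false, true, true, false, true, false, false)
                // dataFormat, directionMode, gateAct, cellAct, outAct
                .addIntegerArguments(0, 0, 2, 0, 0)
                .addFloatingPointArguments(0.0)       // cell clip value
                .addOutputs(output)
                .build();
        Nd4j.exec(op);
        System.out.println(output);
    }
}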
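To make the eight boolean arguments easier to read, here is a small standalone sketch that names them and derives how many input and output arrays the op should receive. The class LstmLayerFlags is hypothetical (not part of ND4J), and the flag order hasBiases, hasSeqLen, hasInitH, hasInitC, hasPH, retFullSeq, retLastH, retLastC is an assumption based on the libnd4j lstmLayer declaration. With the flags used in the reproduction it yields 5 inputs and 1 output, which agrees with the "5 inputs, 1 outputs" reported in the MKLDNN error log later in this thread.

```java
/**
 * Hypothetical helper (not part of ND4J) naming the eight boolean args of the
 * libnd4j "lstmLayer" op. The flag order below is an assumption.
 */
final class LstmLayerFlags {
    final boolean hasBiases, hasSeqLen, hasInitH, hasInitC, hasPH;
    final boolean retFullSeq, retLastH, retLastC;

    LstmLayerFlags(boolean... b) {
        if (b.length != 8) {
            throw new IllegalArgumentException("lstmLayer expects 8 boolean args, got " + b.length);
        }
        hasBiases = b[0]; hasSeqLen = b[1]; hasInitH = b[2]; hasInitC = b[3];
        hasPH = b[4]; retFullSeq = b[5]; retLastH = b[6]; retLastC = b[7];
    }

    /** x, Wx and Wr are always required; each "has*" flag adds one optional input. */
    int expectedInputs() {
        int n = 3;
        for (boolean f : new boolean[]{hasBiases, hasSeqLen, hasInitH, hasInitC, hasPH}) {
            if (f) n++;
        }
        return n;
    }

    /** Each "ret*" flag requests one output array. */
    int expectedOutputs() {
        int n = 0;
        for (boolean f : new boolean[]{retFullSeq, retLastH, retLastC}) {
            if (f) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Flags from the reproduction: no bias, initial h and c given, full sequence only.
        LstmLayerFlags f = new LstmLayerFlags(false, false, true, true, false, true, false, false);
        System.out.println(f.expectedInputs() + " inputs, " + f.expectedOutputs() + " output(s)");
    }
}
```

Running main prints `5 inputs, 1 output(s)`. Under the assumed order, setting the last flag (retLastC) to true would request a second output array holding the last cell state.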
Screencasts showing the anomalous behaviour
Also reproduced on kraken (CPU only).
Hi
Should be fixed now.
Confirmed passing on CPU without MKLDNN.
Running with MKLDNN gives this:
o.n.BaseND4JTest - LayerOpValidation.LSTMLayerTestCase2[0: backend(org.nd4j.linalg.cpu.nativecpu.CpuBackend)={1}]
o.n.l.c.n.o.NativeOpExecutioner - Failed to execute op lstmLayer. Attempted to execute with 5 inputs, 1 outputs, 1 targs,8 bargs and 5 iargs. Inputs: [(FLOAT,[10,5,3],c), (FLOAT,[3,28],c), (FLOAT,[7,28],c), (FLOAT,[5,7],c), (FLOAT,[5,7],c)]. Outputs: [(FLOAT,[10,5,7],c)]. tArgs: [0.0]. iArgs: [0, 0, 2, 0, 0]. bArgs: [false, false, true, true, false, true, false, false]. Input var names: [in, weights, rWeights, cLast, yLast]. Output var names: [lstmLayer] - Please see above message (printed out from c++) for a possible cause of error.
o.n.BaseND4JTest - LayerOpValidation.LSTMLayerTestCase2[0: backend(org.nd4j.linalg.cpu.nativecpu.CpuBackend)={1}]: 485 ms, threadCount: (9->11), jvmTotal=92274688, jvmMax=8589934592, totalBytes=952, maxBytes=8589934592, currPhys=136601600, maxPhys=17179869184
java.lang.RuntimeException: could not create a primitive descriptor iterator
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:2058)
at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6575)
Reproducible with these tests:
(i.e., the tests pass if I add Nd4j.getEnvironment().allowHelpers(false); at the start of the test method, which disables the MKLDNN helpers)
https://github.com/KonduitAI/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-tests/src/test/java/org/nd4j/autodiff/opvalidation/LayerOpValidation.java#L1533-L1635
It looks like there is one more nuance in the MKLDNN LSTM: a bias is mandatory for this op.
When I don't pass biases in the argument list, it throws a "dnnl_unimplemented" exception here:
https://github.com/oneapi-src/oneDNN/blob/0f79bfc704e7add3673f0cfa3d4aacae516da393/include/dnnl.hpp#L3498
To work around this, I added code that always passes biases; when the user doesn't provide them, they are zeros.
The corresponding changes are already in master.
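The workaround above amounts to substituting a zero bias whenever none is supplied. Below is a minimal Java sketch of that idea, assuming a bias of length 4 * numUnits (matching the 4 * numUnits columns of Wx and Wr in the reproduction). LstmBiasDefaults and biasOrZeros are hypothetical names; the actual change presumably lives in the libnd4j MKLDNN helper, not in code like this. Because the bias is simply added to each gate pre-activation, a zero bias gives the same result as no bias while still satisfying a primitive that requires one.

```java
import java.util.Arrays;

/** Hypothetical sketch: always hand the backend a bias, defaulting to zeros. */
final class LstmBiasDefaults {

    static float[] biasOrZeros(float[] userBias, int numUnits) {
        int expected = 4 * numUnits; // one value per unit for each of the 4 gates
        if (userBias == null) {
            return new float[expected]; // Java zero-initialises float arrays
        }
        if (userBias.length != expected) {
            throw new IllegalArgumentException(
                    "bias length " + userBias.length + " != 4 * numUnits = " + expected);
        }
        return userBias; // a user-provided bias is passed through unchanged
    }

    public static void main(String[] args) {
        float[] b = biasOrZeros(null, 7); // numUnits = 7, as in the failing test
        System.out.println(b.length + " " + Arrays.toString(Arrays.copyOf(b, 4)));
    }
}
```

For numUnits = 7, main prints `28 [0.0, 0.0, 0.0, 0.0]`.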