Deeplearning4j: op lstmLayer anomaly detected

Created on 6 Apr 2020 · 4 Comments · Source: eclipse/deeplearning4j

While working with the LSTMLayer test cases, the following problem was detected:
setting the "return last cell output" flag to false (retC = false) raises an exception at the C++ level.
The original gist (test cases 1, 2 and 3, where the anomaly appears) is here.
Reproducing the bug with DynamicCustomOp directly (to rule out a bug in the Java configs) shows the same behaviour:

INDArray x = Nd4j.rand(DataType.FLOAT, sL, bS, nIn);
INDArray Wx = Nd4j.rand(DataType.FLOAT, nIn, 4 * numUnits);
INDArray Wr = Nd4j.rand(DataType.FLOAT, numUnits, 4 * numUnits);
INDArray icLast = Nd4j.zeros(DataType.FLOAT, bS, numUnits);
INDArray iyLast = Nd4j.zeros(DataType.FLOAT, bS, numUnits);

INDArray output = Nd4j.rand(DataType.FLOAT, sL, bS, numUnits);
INDArray retC = Nd4j.rand(DataType.FLOAT, bS, numUnits);

DynamicCustomOp op = DynamicCustomOp.builder("lstmLayer")
        .addInputs(x, Wx, Wr, iyLast, icLast)
        .addBooleanArguments(false, false, true, true, false, true, false, false)
        .addIntegerArguments(0, 0, 2, 0, 0)
        .addFloatingPointArguments(0.0)
        .addOutputs(output)
        .build();

Nd4j.exec(op);
System.out.println(output);
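
The snippet above leaves sL, bS, nIn and numUnits undefined. As a rough illustration of how the array shapes relate to those four values, here is a minimal plain-Java sketch. The class and method names are hypothetical and it does not call ND4J. The example values (sL = 10, bS = 5, nIn = 3, numUnits = 7) are read off the shapes in the error log quoted later in this thread, and are an assumption as far as the original gist is concerned.

```java
import java.util.Arrays;

public class LstmShapeSketch {
    // Input weights Wx: [nIn, 4 * numUnits] (four gates stored side by side).
    public static long[] wxShape(int nIn, int numUnits) {
        return new long[]{nIn, 4L * numUnits};
    }

    // Recurrent weights Wr: [numUnits, 4 * numUnits].
    public static long[] wrShape(int numUnits) {
        return new long[]{numUnits, 4L * numUnits};
    }

    // Initial output / initial cell state: [bS, numUnits].
    public static long[] stateShape(int bS, int numUnits) {
        return new long[]{bS, numUnits};
    }

    // Full output sequence: [sL, bS, numUnits], as used for `output` in the snippet.
    public static long[] fullSeqShape(int sL, int bS, int numUnits) {
        return new long[]{sL, bS, numUnits};
    }

    public static void main(String[] args) {
        // Assumed values, taken from the shapes printed in the thread's error log.
        int sL = 10, bS = 5, nIn = 3, numUnits = 7;
        System.out.println("Wx     " + Arrays.toString(wxShape(nIn, numUnits)));
        System.out.println("Wr     " + Arrays.toString(wrShape(numUnits)));
        System.out.println("state  " + Arrays.toString(stateShape(bS, numUnits)));
        System.out.println("output " + Arrays.toString(fullSeqShape(sL, bS, numUnits)));
    }
}
```

With these values the sketch yields [3, 28], [7, 28], [5, 7] and [10, 5, 7], which match the input and output shapes reported in the log.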

Screencasts showing the anomalous behaviour:

Using LSTMLayerOutputs (toggling retC between true and false changes the behaviour for this particular configuration)
Using DynamicCustomOp directly

Version Information

Deeplearning4j: latest master
Xubuntu 18.04
CPU

Also reproduced on kraken (CPU only)

Bug LIBND4J

Source

atuzhykov

Most helpful comment

Looks like there is one more nuance in the MKLDNN lstm: a bias is mandatory for this op.
When I don't pass biases in the argument list, it throws the exception "dnnl_unimplemented" here:
https://github.com/oneapi-src/oneDNN/blob/0f79bfc704e7add3673f0cfa3d4aacae516da393/include/dnnl.hpp#L3498
To work around this, I added code that always passes biases; they are zeros when the user doesn't provide them.

shyrma on 6 May 2020
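
The workaround described in this comment (always hand the helper a bias, using zeros when the caller gave none) can be sketched roughly as follows. This is plain Java for illustration only, not the actual libnd4j C++ change. The class and method names are hypothetical, and the bias length of 4 * numUnits is an assumption based on the four-gate weight layout used in the issue's snippet.

```java
public class BiasFallbackSketch {
    /**
     * Returns the caller's bias if one was supplied, otherwise a zero vector.
     * Assumed layout: one bias per gate unit, i.e. length 4 * numUnits.
     */
    public static float[] biasOrZeros(float[] userBias, int numUnits) {
        int expected = 4 * numUnits;
        if (userBias == null) {
            // Java zero-initialises new arrays, so this is the all-zero bias.
            return new float[expected];
        }
        if (userBias.length != expected) {
            throw new IllegalArgumentException(
                    "bias length " + userBias.length + ", expected " + expected);
        }
        return userBias;
    }

    public static void main(String[] args) {
        float[] b = biasOrZeros(null, 7);
        System.out.println("bias length: " + b.length);
    }
}
```

A zero bias leaves the gate pre-activations unchanged, so the fallback should not alter the numerical result compared with running without biases.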

All 4 comments

Hi
should be fixed now

shyrma on 13 Apr 2020

Confirmed passing on CPU without MKLDNN.

Running with MKLDNN gives this:

o.n.BaseND4JTest - LayerOpValidation.LSTMLayerTestCase2[0: backend(org.nd4j.linalg.cpu.nativecpu.CpuBackend)={1}]
o.n.l.c.n.o.NativeOpExecutioner - Failed to execute op lstmLayer. Attempted to execute with 5 inputs, 1 outputs, 1 targs,8 bargs and 5 iargs. Inputs: [(FLOAT,[10,5,3],c), (FLOAT,[3,28],c), (FLOAT,[7,28],c), (FLOAT,[5,7],c), (FLOAT,[5,7],c)]. Outputs: [(FLOAT,[10,5,7],c)]. tArgs: [0.0]. iArgs: [0, 0, 2, 0, 0]. bArgs: [false, false, true, true, false, true, false, false]. Input var names: [in, weights, rWeights, cLast, yLast]. Output var names: [lstmLayer] - Please see above message (printed out from c++) for a possible cause of error.
o.n.BaseND4JTest - LayerOpValidation.LSTMLayerTestCase2[0: backend(org.nd4j.linalg.cpu.nativecpu.CpuBackend)={1}]: 485 ms, threadCount: (9->11), jvmTotal=92274688, jvmMax=8589934592, totalBytes=952, maxBytes=8589934592, currPhys=136601600, maxPhys=17179869184

java.lang.RuntimeException: could not create a primitive descriptor iterator

    at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:2058)
    at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6575)
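
To make the bArgs list in the log easier to read, the sketch below maps each position to a name. The position-to-meaning mapping is an assumption based on the comments in the libnd4j lstmLayer op source as I understand them (biases, sequence lengths, initial output, initial cell state, peepholes, full sequence, last output, last cell state) and should be checked against the version in use. The short names and the class are hypothetical, and the code is plain Java that does not call ND4J.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LstmLayerFlagsSketch {
    // Assumed order of the eight boolean arguments; the names are illustrative labels.
    static final String[] NAMES = {
        "hasBiases", "hasSeqLen", "hasInitH", "hasInitC",
        "hasPeephole", "retFullSeq", "retLastH", "retLastC"
    };

    // Pairs each boolean argument with its assumed name, preserving order.
    public static Map<String, Boolean> decode(boolean... bArgs) {
        if (bArgs.length != NAMES.length) {
            throw new IllegalArgumentException("expected " + NAMES.length + " flags");
        }
        Map<String, Boolean> out = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            out.put(NAMES[i], bArgs[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // bArgs copied from the log: [false, false, true, true, false, true, false, false]
        System.out.println(decode(false, false, true, true, false, true, false, false));
    }
}
```

Under that assumed mapping, the failing call provides no biases, supplies both initial states, and requests only the full output sequence.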

Reproducible with these tests:
(i.e., tests pass if I add Nd4j.getEnvironment().allowHelpers(false); to the start of the test method)
https://github.com/KonduitAI/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-tests/src/test/java/org/nd4j/autodiff/opvalidation/LayerOpValidation.java#L1533-L1635