Deeplearning4j: ND4J: cudaEventSynchronize(...) failed

Created on 2 Nov 2018 · 11 Comments · Source: eclipse/deeplearning4j

Hello,

Lately I've been struggling with the following error:

Exception in thread "UniGC thread 4" Exception in thread "UniGC thread 2" Exception in thread "UniGC thread 3" Exception in thread "UniGC thread 5" Exception in thread "UniGC thread 1" java.lang.RuntimeException: cudaEventSynchronize(...) failed
    at org.nd4j.nativeblas.Nd4jCuda$NativeOps.eventSynchronize(Native Method)
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:69)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:132)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:229)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:716)
java.lang.RuntimeException: cudaEventSynchronize(...) failed
    at org.nd4j.nativeblas.Nd4jCuda$NativeOps.eventSynchronize(Native Method)
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:69)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:132)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:229)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:716)
java.lang.RuntimeException: cudaEventSynchronize(...) failed
    at org.nd4j.nativeblas.Nd4jCuda$NativeOps.eventSynchronize(Native Method)
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:69)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:132)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:229)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:716)
java.lang.RuntimeException: cudaEventSynchronize(...) failed
    at org.nd4j.nativeblas.Nd4jCuda$NativeOps.eventSynchronize(Native Method)
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:69)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:132)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:229)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:716)
java.lang.RuntimeException: cudaEventSynchronize(...) failed
    at org.nd4j.nativeblas.Nd4jCuda$NativeOps.eventSynchronize(Native Method)
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:69)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:132)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:229)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:716)
Exception in thread "UniGC thread 0" java.lang.RuntimeException: cudaEventSynchronize(...) failed
    at org.nd4j.nativeblas.Nd4jCuda$NativeOps.eventSynchronize(Native Method)
    at org.nd4j.jita.allocator.pointers.cuda.cudaEvent_t.synchronize(cudaEvent_t.java:69)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillFinished(SynchronousFlowController.java:132)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillFinished(GridFlowController.java:63)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.waitTillReleased(SynchronousFlowController.java:229)
    at org.nd4j.jita.flow.impl.GridFlowController.waitTillReleased(GridFlowController.java:78)
    at org.nd4j.jita.allocator.impl.AtomicAllocator$UnifiedGarbageCollectorThread.run(AtomicAllocator.java:716)

I haven't found a pattern that explains why my neural network stops working during training; it has been running as part of an optimization problem. The dataset is not big at all (a 400 KB file). In fact, I've been using the same neural network to train on larger files without many problems. The same error sometimes appeared with the larger files, but only sporadically; with the small dataset it happens so often that I can't finish the optimization. The neural network is also quite simple:

Branch 1: Input 1-> Sequential Embeddings + GlobalPooling -> Out1
Branch 2: Input 2
Common branch: Merger(Out1 + Input2) + Dense Layer + Output Layer

As you can see, I'm using a ComputationGraph.
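For reference, the two-branch architecture described above might look roughly like this as a DL4J ComputationGraph. This is only a sketch under assumptions, not the reporter's actual configuration: all sizes are hypothetical, "Sequential Embeddings" is assumed to mean `EmbeddingSequenceLayer`, average pooling and the softmax/MCXENT output layer are guesses, and it assumes the 1.0.0-beta2-era builder API with `deeplearning4j-core` and an ND4J backend on the classpath.

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.graph.MergeVertex;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.EmbeddingSequenceLayer;
import org.deeplearning4j.nn.conf.layers.GlobalPoolingLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.PoolingType;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class TwoBranchGraph {

    // Hypothetical sizes: the issue does not state them.
    static final int VOCAB_SIZE = 1000;  // distinct token indices fed to input1
    static final int EMBED_DIM = 32;     // embedding width, i.e. size of Out1 after pooling
    static final int INPUT2_SIZE = 10;   // width of the feature vector input2
    static final int HIDDEN = 64;
    static final int N_CLASSES = 2;

    static ComputationGraph build() {
        ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
            .graphBuilder()
            .addInputs("input1", "input2")
            // Branch 1: token indices -> per-step embeddings -> pooled over time (Out1)
            .addLayer("emb", new EmbeddingSequenceLayer.Builder()
                .nIn(VOCAB_SIZE).nOut(EMBED_DIM).build(), "input1")
            .addLayer("pool", new GlobalPoolingLayer.Builder(PoolingType.AVG).build(), "emb")
            // Common branch: concatenate Out1 and input2 along the feature dimension
            .addVertex("merge", new MergeVertex(), "pool", "input2")
            .addLayer("dense", new DenseLayer.Builder()
                .nIn(EMBED_DIM + INPUT2_SIZE).nOut(HIDDEN)
                .activation(Activation.RELU).build(), "merge")
            .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(HIDDEN).nOut(N_CLASSES)
                .activation(Activation.SOFTMAX).build(), "dense")
            .setOutputs("out")
            .build();

        ComputationGraph net = new ComputationGraph(conf);
        net.init();
        return net;
    }

    public static void main(String[] args) {
        System.out.println(build().summary());
    }
}
```

Because `MergeVertex` concatenates its inputs along the feature dimension, the dense layer's `nIn` has to equal `EMBED_DIM + INPUT2_SIZE`.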

At first I thought I was running out of memory, but I'm not sure that is the case, since the dataset is smaller (although the richness of the input might be greater).

Is there any way to determine if the causes are due to lack of memory?

Some information that you might need:

  • Deeplearning4j-core 1.0.0-beta2
  • Deeplearning4j-nlp 1.0.0-beta2
  • Nd4j-cuda-9.2 1.0.0-beta2
  • Cuda compilation tools release 9.2 (V9.2.148)
  • Quadro M4000 (dedicated video memory 8192 MB, total available graphics memory 24536 MB)

Aha! Link: https://skymindai.aha.io/features/ND4J-149

Bug ND4J


All 11 comments

This exception means a CUDA kernel crashed.
Please post a gist of the console output, along with the neural network you have there, and your pom.xml as well.

I have the same issue with DL4J beta3, Windows 10, an RTX 2080 Ti and CUDA 10.0.

@sascha08-15 can you send me some code that reproduces your problem?

I could recreate a similar (possibly the same) issue under Linux (Ubuntu).
Invoking concat many times seems to be related (see https://github.com/deeplearning4j/deeplearning4j/issues/6479, which includes a unit test that recreates the bug). I'll try to isolate the problem mentioned here next.

Hm.

Are you 100% sure your issue caused by repeated concat calls?


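```java
import org.nd4j.linalg.api.buffer.DataBuffer;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class Bug6663Repro {

    // Repeatedly vstack a single value onto a growing column; on the affected
    // ND4J CUDA versions this recreated the "cudaEventSynchronize(...) failed" crash.
    static void testBug6663() {
        Nd4j.setDataType(DataBuffer.Type.DOUBLE);

        INDArray arr = Nd4j.rand(14000, 1); // 14000 x 1 column of random values
        INDArray row = Nd4j.ones(1);        // single value appended on each iteration
        INDArray newArr = arr;

        // 20,000 concat (vstack) calls, each returning a new, larger array
        for (int i = 0; i < 20_000; i++) {
            newArr = Nd4j.vstack(newArr, row);
        }
    }

    public static void main(String[] args) {
        testBug6663();
    }
}
```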

This recreates the situation.
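Separately from the concat bug itself, the reproduction above allocates a brand-new array on every `vstack` call, each holding all previous rows plus one, so the amount of data copied grows quadratically with the number of appends. The plain-Java sketch below illustrates only that allocation pattern, with a `double[]` standing in for the INDArray column and no ND4J involved; the class and method names are hypothetical. It does not model the CUDA crash, which the thread attributes to the old concat implementation.

```java
import java.util.Arrays;

public class RepeatedConcatCost {

    // Holds the final array plus how many elements were written while building it.
    static final class Result {
        final double[] data;
        final long elementsWritten;

        Result(double[] data, long elementsWritten) {
            this.data = data;
            this.elementsWritten = elementsWritten;
        }
    }

    // Mimics the reproduction loop: every append allocates a new array one
    // element longer and copies all existing data into it.
    static Result appendRepeatedly(double[] start, double value, int times) {
        double[] current = start;
        long written = 0;
        for (int i = 0; i < times; i++) {
            double[] next = Arrays.copyOf(current, current.length + 1); // copy old data
            next[current.length] = value;                               // write the new element
            written += next.length;
            current = next;
        }
        return new Result(current, written);
    }

    // Allocates the final size once, copies the starting data once and fills
    // the appended tail; no intermediate arrays are created.
    static Result appendPreallocated(double[] start, double value, int times) {
        double[] result = new double[start.length + times];
        System.arraycopy(start, 0, result, 0, start.length);
        Arrays.fill(result, start.length, result.length, value);
        return new Result(result, (long) start.length + times);
    }

    public static void main(String[] args) {
        double[] start = new double[14_000]; // stands in for Nd4j.rand(14000, 1)
        Result slow = appendRepeatedly(start, 1.0, 20_000);
        Result fast = appendPreallocated(start, 1.0, 20_000);
        System.out.println("repeated appends: " + slow.elementsWritten + " elements written");
        System.out.println("preallocated:     " + fast.elementsWritten + " elements written");
        System.out.println("same contents:    " + Arrays.equals(slow.data, fast.data));
    }
}
```

With the numbers from the test above (14,000 starting rows, 20,000 appends), the repeated version writes 480,010,000 elements across 20,000 temporary arrays, while the preallocated version writes 34,000. In ND4J terms, the equivalent idea would be building the result once rather than calling `vstack` once per row.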

Thanks. We're testing new concat impl right now.

New concat implementation was merged, issue should be resolved now.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
