Deeplearning4j: ND4J: Memory leak when calling INDArray.getRow()

Created on 7 Feb 2019 · 13 Comments · Source: eclipse/deeplearning4j

Issue Description

Hi,
I'm working on the current snapshot build of DL4J/ND4J. Calling INDArray.getRow() creates LongBuffer, AtomicBoolean and ArrayList instances that the garbage collector can't discard. This eventually leads to an OutOfMemoryError (Java heap space). I encounter the problem when running on the CpuBackend.

This error does not occur in beta3. I have not tested whether it is present on CUDA.

You can find minimal example code for reproduction here: https://gist.github.com/SchmaR/5291bf21d812ac020307251eb4cdcf97

While investigating the problem, I noticed that it might be linked to a reference in DirectShapeInfoProvider.longCache, which seems to hold the last remaining reference to the LongBuffer objects. Attached are screenshots showing the reference chain as well as the stack trace of the initialization of the LongBuffer objects.

Backend Configuration Log
16:29:34.633 [main] INFO  org.nd4j.linalg.factory.Nd4jBackend - Loaded [CpuBackend] backend
16:29:35.346 [main] INFO  org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for NativeOps: 4
16:29:35.608 [main] INFO  org.nd4j.nativeblas.Nd4jBlas - Number of threads used for BLAS: 4
16:29:35.616 [main] INFO  o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CPU]; OS: [Linux]
16:29:35.617 [main] INFO  o.n.l.a.o.e.DefaultOpExecutioner - Cores: [8]; Memory: [1.8GB];
16:29:35.617 [main] INFO  o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [MKL]

[Screenshots: basendbufferstacktrace, longbufferrefferingobjects, longbufferprofiler]

Version Information

Please indicate relevant versions, including, if relevant:

  • Deeplearning4j version 1.0.0 SNAPSHOT build: nd4j-api-1.0.0-20190207.151315-14704.jar
  • platform information: Ubuntu 18.10, 64 Bit, Intel I7
  • ~CUDA version, if used~
  • ~NVIDIA driver version, if in use~

Contributing

If I can help figure out what's going wrong, let me know.

Aha! Link: https://skymindai.aha.io/features/ND4J-57

Bug High Priority ND4J Release

All 13 comments

Not a bug.

Shape buffers are cached and reused, and never released. Since they are very small, you'd have to do something VERY unusual to make that a problem.

So please tell us what you're doing there, so that we can help you.

Or do you mean that the cache is broken somehow?

No, I've checked with the debugger: after 100k iterations there are only 4 shapes in the cache, and they are reused perfectly.

[Screenshot: nomachine - dev box 2_7_2019 19_38_57]

Hmm, if I run the code in the gist on my machine, I end up with a large number of LongBuffers that consume all of my heap memory. I don't see this behavior on beta3, only in the current snapshot.

Is there any information I could provide which could help to figure out what's going wrong?

Where exactly do you see those LongBuffers?

In either VisualVM or the memory profiling view in IntelliJ. According to VisualVM, the buffers consume most of my memory. If I trigger a garbage collection in VisualVM on the snapshot build, none of the LongBuffers are cleaned up. If I do the same on beta3, the LongBuffers are discarded and I don't run out of heap memory while iterating over indArray.getRow(i % 3).

Obviously, I'm using snapshots too, and I can't reproduce the behavior you're describing. Maybe you're running some different code?

This is what I'm seeing (Snapshots, master, windows 10, native):
https://gist.github.com/AlexDBlack/83041fa0511954f97cd85e9a6544f6cb

---

Here's the same thing slightly modified to add GC. The memory snapshot was collected after "DONE GC" was printed in the code.

https://gist.github.com/AlexDBlack/db9b37bd041628f6c32b4acb64c7a1b2

Hmmm.... wrappedBuffer abused somewhere?

Hi,
I've further investigated this issue.

Every call of Shape.shapeOf() creates a new LongBuffer (NDArrayIndex.java:329, NDArrayIndex.java:300, BaseNDArray.java:3070). Let's call it the "Shape LongBuffer". The underlying DataBuffer of the INDArray that makes the call holds a reference to this new "Shape LongBuffer". That underlying DataBuffer is the result of INDArray.shapeInfoDataBuffer() on the original INDArray on which we call getRow() (ShapeOffsetResolution.java:365). In turn, the "Shape LongBuffer" references the underlying buffer as its wrappedBuffer (BaseDataBuffer.java:176). I assume that these cyclic references prevent the garbage collector from removing the local instances that are no longer needed. This holds even if INDArray.close() is called and the INDArray variable is set to null. The reference cycle does not seem to be cleaned up at any point.
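
A side note on the mechanism: JVM collectors trace reachability from GC roots, so a reference cycle on its own does not normally keep objects alive. What does keep them alive is a long-lived object that still points at them, which would fit the earlier observation about DirectShapeInfoProvider.longCache. The following is a minimal sketch of that pattern in plain Java, not ND4J code. SketchBuffer, its references list and CACHED_SHAPE are hypothetical stand-ins, under the assumption that a cached parent buffer records every view created on top of it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a data buffer; this is NOT ND4J's BaseDataBuffer.
final class SketchBuffer {
    final long[] data;
    final SketchBuffer wrappedBuffer;                         // view -> parent
    final List<SketchBuffer> references = new ArrayList<>();  // parent -> views

    // Root buffer that owns its data.
    SketchBuffer(long[] data) {
        this.data = data;
        this.wrappedBuffer = null;
    }

    // View constructor: shares the parent's data and registers itself with the parent.
    SketchBuffer(SketchBuffer underlying) {
        this.data = underlying.data;
        this.wrappedBuffer = underlying;
        underlying.references.add(this);
    }
}

final class RetentionSketch {
    // Assumption: stands in for a long-lived cache entry held by a static field.
    static final SketchBuffer CACHED_SHAPE = new SketchBuffer(new long[] {2, 3, 4});

    // Creates n short-lived views and reports how many the parent still holds.
    static int createViews(int n) {
        for (int i = 0; i < n; i++) {
            new SketchBuffer(CACHED_SHAPE); // the local reference is dropped at once
        }
        return CACHED_SHAPE.references.size();
    }
}
```

In this sketch every view stays reachable through CACHED_SHAPE.references even though the caller never keeps it, so the list grows by one entry per call. Whether the snapshot build behaves this way is exactly what the stack traces below try to establish.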

Before the "Shape LongBuffer" is created, there is no check whether a suitable buffer already exists. Many equal objects build up (see the screenshot "DataBuffer references seem to be all equal"), each with a different System.identityHashCode (see the screenshot "DataBuffer references System.identityHashCode").
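
One common way to avoid piling up equal but distinct objects is to look a buffer up by its contents before creating it. The sketch below illustrates only that general idea. ShapeBufferCacheSketch and bufferFor are hypothetical names, and this is not a description of how ND4J's shape cache is implemented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical content-keyed cache; not thread-safe, for illustration only.
final class ShapeBufferCacheSketch {
    private final Map<List<Long>, long[]> cache = new HashMap<>();

    // Returns the one cached array for this shape, creating it on first use.
    long[] bufferFor(long... shape) {
        // A List<Long> key compares by content, unlike a raw long[] key.
        List<Long> key = Arrays.stream(shape).boxed().collect(Collectors.toList());
        return cache.computeIfAbsent(key, k -> shape.clone());
    }

    int size() {
        return cache.size();
    }
}
```

With such a lookup, two requests for the same shape return the same instance, so System.identityHashCode would match for both and the number of live objects is bounded by the number of distinct shapes.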

Here is a visualization of the cyclic references: dataBufferReferences.png

I tested this with the following code, which is based on @AlexDBlack's code (Gist MemoryLeakTest.java).

I then narrowed the matter down to an isolated call of Shape.shapeOf() (Gist MemoryLeakShapeTest.java).
After garbage collection I end up with the following memory consumption:

DONE GC
Max memory:7090
total memory:1365
free memory:684
used memory:680
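
Figures of this kind can be read from java.lang.Runtime. The sketch below is an assumption about how such a report might be produced, not the code from the gist. HeapReport and snapshotMb are hypothetical names, and the values are assumed to be whole megabytes.

```java
// Hypothetical helper; not part of ND4J or the gists linked in this thread.
final class HeapReport {
    private static final long MB = 1024L * 1024L;

    // Returns {max, total, free, used} heap sizes, each truncated to whole megabytes.
    static long[] snapshotMb() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();      // upper bound the heap may grow to
        long total = rt.totalMemory();  // heap currently reserved by the JVM
        long free = rt.freeMemory();    // unused part of the reserved heap
        long used = total - free;
        return new long[] {max / MB, total / MB, free / MB, used / MB};
    }

    public static void main(String[] args) {
        System.gc(); // only a request; the JVM is free to ignore it
        long[] s = snapshotMb();
        System.out.println("DONE GC");
        System.out.println("Max memory:" + s[0]);
        System.out.println("total memory:" + s[1]);
        System.out.println("free memory:" + s[2]);
        System.out.println("used memory:" + s[3]);
    }
}
```

If each value is truncated to whole megabytes separately, as in this sketch, the reported used figure can be one less than total minus free. That may be why 1365 - 684 does not come out as exactly 680 above.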

Do you see a workaround or a way to fix this issue properly?

Stack Traces for MemoryLeakTest.java

A call of INDArray.getRow() results in the following stack traces, which can be categorized into Shape Information Views and Data Views. I track every call to BaseDataBuffer(DataBuffer underlyingBuffer, long length, long offset) because it's the only point I'm aware of that adds new elements to either INDArray.data.references or INDArray.shapeInformation.references.
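
The traces below were collected with a debugger. The same kind of information could also be gathered in code by recording the call path inside a constructor. This is a sketch with hypothetical classes (TrackedBuffer, TrackingDemo), not ND4J instrumentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical instrumented class; TrackedBuffer is not an ND4J type.
final class TrackedBuffer {
    // Distinct construction call paths mapped to how often each was seen.
    static final Map<String, Integer> CALL_PATHS = new LinkedHashMap<>();

    TrackedBuffer() {
        StringBuilder path = new StringBuilder();
        // Capture the current stack, starting at this constructor (<init>).
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            path.append(frame.getMethodName()).append(':')
                .append(frame.getLineNumber()).append(", ")
                .append(frame.getClassName()).append('\n');
        }
        CALL_PATHS.merge(path.toString(), 1, Integer::sum);
    }
}

final class TrackingDemo {
    static void viaShape() { new TrackedBuffer(); }
    static void viaData()  { new TrackedBuffer(); }

    // Creates four buffers over two call paths; returns the number of distinct paths.
    static int distinctPaths() {
        TrackedBuffer.CALL_PATHS.clear();
        for (int i = 0; i < 3; i++) {
            viaShape();
        }
        viaData();
        return TrackedBuffer.CALL_PATHS.size();
    }
}
```

Grouping by call path in this way separates construction sites much like the shape and data categories used below, and the counts show which path creates the most instances.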

Shape Information Views

First, these two stacks are generated:

<init>:162, BaseDataBuffer (org.nd4j.linalg.api.buffer)
<init>:105, LongBuffer (org.nd4j.linalg.api.buffer)
create:72, DefaultDataBufferFactory (org.nd4j.linalg.api.buffer.factory)
createBuffer:1122, Nd4j (org.nd4j.linalg.factory)
shapeOf:2834, Shape (org.nd4j.linalg.api.shape)
resolve:329, NDArrayIndex (org.nd4j.linalg.indexing)
get:4993, BaseNDArray (org.nd4j.linalg.api.ndarray)
getRow:5076, BaseNDArray (org.nd4j.linalg.api.ndarray)
main:19, MemoryLeakTest (org.nd4j)

<init>:162, BaseDataBuffer (org.nd4j.linalg.api.buffer)
<init>:105, LongBuffer (org.nd4j.linalg.api.buffer)
create:72, DefaultDataBufferFactory (org.nd4j.linalg.api.buffer.factory)
createBuffer:1122, Nd4j (org.nd4j.linalg.factory)
shapeOf:2834, Shape (org.nd4j.linalg.api.shape)
isRowVectorShape:1594, Shape (org.nd4j.linalg.api.shape)
resolve:336, NDArrayIndex (org.nd4j.linalg.indexing)
get:4993, BaseNDArray (org.nd4j.linalg.api.ndarray)
getRow:5076, BaseNDArray (org.nd4j.linalg.api.ndarray)
main:19, MemoryLeakTest (org.nd4j)

Second:

<init>:162, BaseDataBuffer (org.nd4j.linalg.api.buffer)
<init>:105, LongBuffer (org.nd4j.linalg.api.buffer)
create:72, DefaultDataBufferFactory (org.nd4j.linalg.api.buffer.factory)
createBuffer:1122, Nd4j (org.nd4j.linalg.factory)
shapeOf:2834, Shape (org.nd4j.linalg.api.shape)
resolve:329, NDArrayIndex (org.nd4j.linalg.indexing)
exec:365, ShapeOffsetResolution (org.nd4j.linalg.indexing)
get:4995, BaseNDArray (org.nd4j.linalg.api.ndarray)
getRow:5076, BaseNDArray (org.nd4j.linalg.api.ndarray)
main:19, MemoryLeakTest (org.nd4j)

Third:

<init>:162, BaseDataBuffer (org.nd4j.linalg.api.buffer)
<init>:105, LongBuffer (org.nd4j.linalg.api.buffer)
create:72, DefaultDataBufferFactory (org.nd4j.linalg.api.buffer.factory)
createBuffer:1122, Nd4j (org.nd4j.linalg.factory)
shapeOf:2834, Shape (org.nd4j.linalg.api.shape)
isRowVectorShape:1594, Shape (org.nd4j.linalg.api.shape)
resolve:336, NDArrayIndex (org.nd4j.linalg.indexing)
exec:365, ShapeOffsetResolution (org.nd4j.linalg.indexing)
get:4995, BaseNDArray (org.nd4j.linalg.api.ndarray)
getRow:5076, BaseNDArray (org.nd4j.linalg.api.ndarray)
main:19, MemoryLeakTest (org.nd4j)

Fourth:

<init>:162, BaseDataBuffer (org.nd4j.linalg.api.buffer)
<init>:105, LongBuffer (org.nd4j.linalg.api.buffer)
create:72, DefaultDataBufferFactory (org.nd4j.linalg.api.buffer.factory)
createBuffer:1122, Nd4j (org.nd4j.linalg.factory)
shapeOf:2834, Shape (org.nd4j.linalg.api.shape)
shapeOf:3070, BaseNDArray (org.nd4j.linalg.api.ndarray)
subArray:2579, BaseNDArray (org.nd4j.linalg.api.ndarray)
get:5035, BaseNDArray (org.nd4j.linalg.api.ndarray)
getRow:5076, BaseNDArray (org.nd4j.linalg.api.ndarray)
main:19, MemoryLeakTest (org.nd4j)

Data Views

<init>:162, BaseDataBuffer (org.nd4j.linalg.api.buffer)
<init>:79, FloatBuffer (org.nd4j.linalg.api.buffer)
create:67, DefaultDataBufferFactory (org.nd4j.linalg.api.buffer.factory)
createBuffer:1122, Nd4j (org.nd4j.linalg.factory)
<init>:192, BaseNDArray (org.nd4j.linalg.api.ndarray)
<init>:80, NDArray (org.nd4j.linalg.cpu.nativecpu)
create:390, CpuNDArrayFactory (org.nd4j.linalg.cpu.nativecpu)
create:4375, Nd4j (org.nd4j.linalg.factory)
create:2172, BaseNDArray (org.nd4j.linalg.api.ndarray)
subArray:2589, BaseNDArray (org.nd4j.linalg.api.ndarray)
get:5035, BaseNDArray (org.nd4j.linalg.api.ndarray)
getRow:5076, BaseNDArray (org.nd4j.linalg.api.ndarray)
main:19, MemoryLeakTest (org.nd4j)

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.