Tfjs: Memory leak during training

Created on 18 Apr 2018 · 8 comments · Source: tensorflow/tfjs

TensorFlow.js version

"@tensorflow/tfjs@^0.9.1":
version "0.9.1"
dependencies:
"@tensorflow/tfjs-core" "0.7.1"
"@tensorflow/tfjs-layers" "0.4.1"

Browser version

68.0.3397.0 (Official Build) canary (64-bit)

Describe the problem or feature request

When training a simple model with fit(), I'm experiencing a huge memory leak; see the image below. It grows until it eventually consumes my GPU's memory, and then it starts consuming my system's shared memory.

I've looked at the examples, and fit() is never surrounded by tidy(). I suppose that's because it already uses tidy() internally?
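For context, tidy() works by tracking the tensors allocated inside a scope and disposing everything except the return value. A minimal pure-JS sketch of that pattern (the names here are illustrative, not the actual tfjs internals):

```javascript
// Illustrative scope-based cleanup, mimicking what tf.tidy() does.
const tracked = [];
function track(t) { tracked.push(t); return t; }

function tidy(fn) {
  const start = tracked.length;
  const result = fn();
  // Dispose everything created inside the scope except the return value.
  for (const t of tracked.splice(start)) {
    if (t !== result) t.disposed = true;
  }
  return result;
}

let temp;
const kept = tidy(() => {
  temp = track({ name: 'temp' });   // intermediate, gets disposed
  return track({ name: 'kept' });   // return value, survives the scope
});
console.log(kept.disposed, temp.disposed); // → undefined true
```

This also hints at why wrapping an async call like fit() in tidy() is tricky: any tensors allocated after the scope's synchronous body returns would escape the cleanup.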

Code to reproduce the bug / link to feature request

This code is enough to reproduce the issue:

const inputs = tf.layers.input({ shape: [256], dtype: 'float32' });
const dense1 = tf.layers.dense({ units: 128, activation: 'relu' }).apply(inputs);
const dense2 = tf.layers.dense({ units: 64, activation: 'relu' }).apply(dense1);
const outputs = tf.layers.dense({ units: 6, activation: 'softmax' }).apply(dense2);
const model = tf.model({ inputs, outputs });

model.compile({
    loss: 'categoricalCrossentropy',
    optimizer: 'adam',
});

const numSamples = 25632;
const xData = new Float32Array(numSamples * 256);
const yData = new Int32Array(numSamples);

const x = tf.tensor2d(xData, [numSamples, 256], 'float32');
// oneHot expects int32 indices, matching the Int32Array above
const y = tf.oneHot(tf.tensor1d(yData, 'int32'), 6);

for (let i = 0; i < 100; i++) {
    await model.fit(x, y, {
        batchSize: 64,
        epochs: 5,
        shuffle: true,
        validationSplit: .2,
    });
}
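A sketch of how the leak could be quantified. The helper below is hypothetical; in a real tfjs app, `getNumTensors` would be `() => tf.memory().numTensors` and `runOnce` would call `model.fit(...)`:

```javascript
// Hypothetical helper: measure average tensor growth per training loop.
async function measureLeak(getNumTensors, runOnce, loops) {
  const before = getNumTensors();
  for (let i = 0; i < loops; i++) {
    await runOnce();
  }
  const after = getNumTensors();
  return { before, after, perLoop: (after - before) / loops };
}

// Stubbed usage simulating a leak of 360 tensors per loop:
let count = 100;
measureLeak(() => count, async () => { count += 360; }, 5)
  .then((r) => console.log(r.perLoop)); // → 360
```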

Labels: tensorflowjs, layers bug

All 8 comments

@caisq would you be able to take a look at this?

What's the status on this?

Several memory leaks in layers have been patched up (tfjs-layers/#186). It's not clear whether this fixes the root cause, though. Can you retry the test using layers 0.6.1 or later? Thanks!

I'm using 0.11.1. Not sure which version of layers that implies. Still seeing a leak (judging by the number of tensors reported by tf.memory()).

The problem could be elsewhere, of course, but the only potential culprit as far as I can see is setting values in a tensorBuffer.

tfjs 0.11.1 uses tfjs-layers 0.6.1. I figured this out by looking at the package.json file:
https://github.com/caisq/tfjs-1/blob/dc599fdf73f9eb9e4940590d137546570c9012b4/package.json

Sounds like we need to dig deeper to find the root cause. In the meantime, if this is blocking for you, can you wrap your call to model.fit() in a tidy() block?

@bileschi The memory optimization that went into the latest release is orthogonal to this issue. I have it on my TODO list to look at this issue soon.

Hey there folks.
I am currently working on a tf.js (version 0.11.1) project and came across the memory leak issue. I looped 10 times over a model.fit() call with a given number of epochs. As you can see in the screenshots below, the number of tensors grew by approximately [#data/batchSize] per epoch. I don't know if this information is of any value, but I felt like adding it here!

[screenshot: 32_1_64_1]

[screenshot: 32_3_128_3]

For example, with batchSize 32 and 1 epoch, each iteration increases the number of tensors by 360, which is approximately 11518/32.
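The arithmetic matches the number of batches per epoch, which supports the idea that one tensor leaks per batch:

```javascript
// The observed growth of ~360 tensors per epoch equals the number of
// batches per epoch: ceil(numSamples / batchSize).
const numSamples = 11518;
const batchSize = 32;
const batchesPerEpoch = Math.ceil(numSamples / batchSize);
console.log(batchesPerEpoch); // → 360
```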

A fix is on the way. See the PR referenced above.
