Here is a fully self-contained reproduction of the issue, which was also reported in #2386. When the program is run, the GPU memory usage increases linearly until it is exhausted. When the line containing GC.Collect(); is uncommented, the memory leak goes away.
    using System;
    using System.Collections.Generic;

    namespace CNTK.GPUMemoryLeak
    {
        class Program
        {
            static void Main(string[] args)
            {
                var device = DeviceDescriptor.GPUDevice(0);
                var dim = 1 << 10;

                /* Set up a dummy model to train */
                var inputShape = NDShape.CreateNDShape(new int[] { dim });
                var inputVar = Variable.InputVariable(inputShape, DataType.Double, "input");
                var labelShape = NDShape.CreateNDShape(new int[] { 1 });
                var labelVar = Variable.InputVariable(labelShape, DataType.Double, "label");
                var weightsShape = NDShape.CreateNDShape(new int[] { dim, 1 });
                var weights = new Parameter(weightsShape, DataType.Double, CNTKLib.GlorotUniformInitializer(), device, "weights");
                var model = CNTKLib.Sigmoid(CNTKLib.TransposeTimes(weights, inputVar).Output, "model");
                var cost = CNTKLib.BinaryCrossEntropy(model.Output, labelVar, "cost");
                var error = CNTKLib.ClassificationError(model.Output, labelVar, "error");

                /* Create random training data */
                var rng = new Random();
                var inputData = new double[1 << 22];
                for (var i = 0; i < inputData.Length; i++)
                    inputData[i] = rng.NextDouble();
                var labelData = new double[inputData.Length / dim];
                for (var i = 0; i < labelData.Length; i++)
                    labelData[i] = (rng.Next(2) == 0) ? 0.0 : 1.0;

                /* Setup training objects */
                var learner = Learner.SGDLearner(cost.Parameters(), new TrainingParameterScheduleDouble(0.01));
                var trainer = Trainer.CreateTrainer(model.Output, cost, error, new Learner[] { learner });
                var arguments = new Dictionary<Variable, Value>();

                /* Run training */
                var count = 0;
                while (count < 100)
                {
                    using (var inputBatch = Value.CreateBatch(inputShape, inputData, 0, inputData.Length, device))
                    {
                        using (var labelBatch = Value.CreateBatch(labelShape, labelData, 0, labelData.Length, device))
                        {
                            arguments[inputVar] = inputBatch;
                            arguments[labelVar] = labelBatch;
                            trainer.TrainMinibatch(arguments, false, device);
                            Console.WriteLine("{0}", ++count);
                        }
                    }
                    arguments.Clear();

                    // !!! Uncomment the line below, and the GPU memory leak goes away !!!!
                    //GC.Collect();
                }
                Console.WriteLine("done");
            }
        }
    }
Each call to Value.CreateBatch allocates a buffer on the GPU, and unless the garbage collector runs, those buffers accumulate until GPU memory is exhausted. Please use MinibatchSource if calling GC.Collect() is not desired.
But shouldn't the Dispose method on the Value object free the corresponding bit of GPU memory?
Seems not, but you may try Value.Erase(). Also, note that GPU computation runs asynchronously with respect to the CPU, whereas GPU memory allocation and freeing are synchronous. Freeing a Value object too eagerly might lead to GPU bad-address errors.
Thanks, indeed calling Value.Erase() works. I think most developers would expect that calling IDisposable::Dispose would release the unmanaged resources. Would you consider updating the Dispose implementation to call Erase?
Good suggestion, will look into that on SWIG side.
@mjmckp Thank you so much for this helpful repro. It was fixed with this commit:
https://github.com/Microsoft/CNTK/commit/3e83c56b8fc4d0e2878a710b09c8d61f3a9e76d3
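For readers who want to see the behavior requested in this thread in isolation, here is a minimal sketch of a Dispose method that forwards to Erase. It does not use CNTK at all: FakeGpuBuffer, BytesInUse, and Demo.RunBatches are hypothetical names invented for illustration, and a static counter stands in for GPU memory. It is not the code from the linked commit, only an illustration of the pattern under those assumptions.

```csharp
using System;

// Hypothetical stand-in for a native-backed object such as CNTK's Value.
// A static counter plays the role of GPU memory currently in use.
// Not thread-safe; this is an illustration only.
sealed class FakeGpuBuffer : IDisposable
{
    public static long BytesInUse { get; private set; }

    private long _bytes;

    public FakeGpuBuffer(long bytes)
    {
        _bytes = bytes;
        BytesInUse += bytes; // simulated allocation
    }

    // Releases the simulated storage immediately; safe to call more than once.
    public void Erase()
    {
        BytesInUse -= _bytes;
        _bytes = 0;
    }

    // Dispose forwards to Erase, so a `using` block frees the storage
    // deterministically instead of waiting for the garbage collector.
    public void Dispose()
    {
        Erase();
    }
}

static class Demo
{
    // Allocates one buffer per iteration and returns the peak simulated usage.
    public static long RunBatches(int batches, long bytesPerBatch)
    {
        long peak = 0;
        for (var i = 0; i < batches; i++)
        {
            using (var batch = new FakeGpuBuffer(bytesPerBatch))
            {
                peak = Math.Max(peak, FakeGpuBuffer.BytesInUse);
            }
        }
        return peak;
    }

    static void Main()
    {
        var peak = RunBatches(100, 1L << 25);
        Console.WriteLine("peak bytes in use: {0}", peak);
        Console.WriteLine("bytes in use after loop: {0}", FakeGpuBuffer.BytesInUse);
    }
}
```

Because each buffer is released when its using block exits, the peak never exceeds a single batch and the counter returns to zero after the loop, with no call to GC.Collect(). If Dispose did nothing, the counter would instead grow by one batch per iteration, which mirrors the linear growth described in the original report.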