Machinelearning: Potential memory leak when training?

Created on 21 May 2019 · 3 comments · Source: dotnet/machinelearning

I modified a project in the samples repo to repro:

https://github.com/daholste/machinelearning-samples/commit/577946224747959759ca1b4da9fd86f77eea81a9

If I set a breakpoint at https://github.com/daholste/machinelearning-samples/blob/577946224747959759ca1b4da9fd86f77eea81a9/samples/csharp/getting-started/BinaryClassification_SentimentAnalysis/SentimentAnalysis/SentimentAnalysisConsoleApp/Program.cs#L52 , then for each model that is trained, the memory usage of the process consistently increases by about 80 MB (especially after the first few iterations of the loop).
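For reference, here is a minimal sketch of the kind of loop that reproduces this. The data file name and the exact pipeline are my assumptions; the sample uses a similar text-featurization plus binary-classification pipeline.

```csharp
using System;
using System.Diagnostics;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 1);
IDataView trainingData = mlContext.Data.LoadFromTextFile<SentimentData>(
    "wikiDetoxAnnotated40kRows.tsv", hasHeader: true);   // assumed file name

for (int i = 0; i < 20; i++)
{
    // Build and fit the pipeline inside the loop, as the repro does.
    var trainingPipeline = mlContext.Transforms.Text
        .FeaturizeText("Features", nameof(SentimentData.SentimentText))
        .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());
    ITransformer model = trainingPipeline.Fit(trainingData);

    // After the first few iterations, this grows by roughly 80 MB per loop.
    long workingSetMb = Process.GetCurrentProcess().WorkingSet64 / (1024 * 1024);
    Console.WriteLine($"Iteration {i}: working set = {workingSetMb} MB");
}

public class SentimentData
{
    [LoadColumn(0)] public bool Label;
    [LoadColumn(1)] public string SentimentText;
}
```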

When the model is compressed & saved to disk, the size of the model file is only about 4.5 MB.

When loading the saved model back into memory, the memory usage of the process appears to jump by around 50 MB. (When loading the model back from disk several times in the same process, the average in-memory size of the model appears to be around 40 MB. I'm not sure why; perhaps string pooling?)
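A rough way to compare the on-disk and in-memory sizes (a sketch, assuming the `mlContext`, `model`, and `trainingData` variables from the loop above):

```csharp
using System.IO;

// The compressed model file on disk is only a few MB.
mlContext.Model.Save(model, trainingData.Schema, "model.zip");
Console.WriteLine($"On disk: {new FileInfo("model.zip").Length / (1024.0 * 1024)} MB");

// Approximate the managed-heap cost of loading it back.
long before = GC.GetTotalMemory(forceFullCollection: true);
ITransformer reloaded = mlContext.Model.Load("model.zip", out _);
long after = GC.GetTotalMemory(forceFullCollection: true);
Console.WriteLine($"In memory (approx.): {(after - before) / (1024 * 1024)} MB");
GC.KeepAlive(reloaded);   // keep the model alive through the second measurement
```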

Is this a memory leak?
80 MB of memory taken up by the trained model - 50 MB for the same model serialized and deserialized = 30 MB of leakage?
Or, does serializing / deserializing the model potentially restructure the data structures to use memory more efficiently?

P0


All 3 comments

This bug was originally seen in AutoML, where we create many models and much of the memory usage appeared to come from memory leaks: very small linear models were holding references to a large amount of memory. Currently, AutoML mitigates this by always writing the model to disk (and reloading it on use).

If we wrote the models to disk, deleted the references to the in-memory models, and then reloaded the models from disk, a large quantity of memory would be freed (the same behavior as in @daholste's example above).
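The mitigation looks roughly like this (a sketch with assumed variable names, not the actual AutoML code):

```csharp
// Round-trip the model through disk so the memory rooted on the trainer
// can be collected; only the compact deserialized model survives.
mlContext.Model.Save(model, trainingData.Schema, "model.zip");
model = null;                  // drop every reference to the trained instance
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
ITransformer compactModel = mlContext.Model.Load("model.zip", out _);
```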

At the time, we took a heap dump and it looked like intermediate data was being held, but we'd have to investigate again, as I don't recall the specifics.

There doesn't seem to be a memory leak. More memory may be used than needed, and there may be ways we can optimize the memory consumed, but there doesn't seem to be a leak.

I have been working with the code in your branch for a few days now. Your initial estimate is correct: memory usage jumps with every iteration in which `trainingPipeline.Fit(trainingData)` is called. About 380 MB gets allocated the first time it is called, and thereafter usage increases by about 80 MB with each iteration. But every allocated object seems to have a valid root that is still in scope; most objects, in fact, are rooted on the trainer object. The heap memory the trainer object holds on to does increase with each training iteration.

It may be possible to optimize this by holding a sparser representation and shedding the other runtime objects within the classifier at the end of the `Fit()` call. But when the trainer object does go out of scope and the garbage collector runs, this memory does get freed. (I have noticed that the GC doesn't immediately reclaim the memory allocated for the trainer after it has gone out of scope; it does so at a later time unless forced to reclaim right away.)

To verify the above, you can refactor steps 2 and 3 in your code into a separate function that returns only the trained model, letting the trainer object go out of scope. After that, if you force a garbage collection, you will see the heap size drop dramatically, irrespective of the number of iterations of the training loop.
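In code, the suggested refactor looks something like this (a sketch; the pipeline and variable names are assumed):

```csharp
// Only the fitted ITransformer escapes this method; the pipeline and
// trainer objects that root the extra memory become unreachable on return.
static ITransformer TrainModel(MLContext mlContext, IDataView trainingData)
{
    var trainingPipeline = mlContext.Transforms.Text
        .FeaturizeText("Features", "SentimentText")
        .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());
    return trainingPipeline.Fit(trainingData);
}

// In the training loop:
ITransformer model = TrainModel(mlContext, trainingData);
GC.Collect();                  // the GC reclaims this memory lazily unless forced
GC.WaitForPendingFinalizers();
GC.Collect();
```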

It is possible I am missing something, so please try out the above and let me know; I will close the issue after that if this explanation is satisfactory.

If this issue is affecting AutoML scenarios, I would recommend a similar approach in the short term: refactor the code so that we no longer hold on to the trainer object after the necessary model has been created. (Saving and reloading the model could be a further optimization.)

I am closing the issue. Please reopen if you still see a memory leak.

