NLog version: 4.5.10
Platform: .Net 4.5
Current NLog config
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<targets async="true">
<target name="logfile" xsi:type="File" fileName="log.txt">
<layout xsi:type="CsvLayout">
<withHeader>true</withHeader>
<column name="Date" layout="${date:universalTime=True:format=o}" />
<column name="Server" layout="${machinename}" />
<column name="Logger" layout="${logger}" />
<column name="Source" layout="${callsite}" />
<column name="Level" layout="${level}" />
<column name="Message" layout="${message}" />
<column name="Exception" layout="${exception:format=ToString}" />
</layout>
</target>
<target name="logfile_crash" xsi:type="File" fileName="log_crash.txt">
<layout xsi:type="CsvLayout">
<withHeader>true</withHeader>
<column name="Date" layout="${date:universalTime=True:format=o}" />
<column name="Server" layout="${machinename}" />
<!--<column name="ThreadId" layout="${threadid}" />-->
<column name="Application" layout="${mdlc:item=Application}" />
<!--<column name="Version" layout="${gdc:item=Version}" />-->
<column name="Logger" layout="${logger}" />
<column name="Source" layout="${callsite}" />
<column name="Level" layout="${level}" />
<column name="Message" layout="${message}" />
<column name="Exception" layout="${exception:format=ToString}" />
</layout>
</target>
</targets>
<rules>
<logger name="PerfTest" minlevel="Info" writeTo="logfile" />
<logger name="PerfTestCrash" minlevel="Info" writeTo="logfile_crash" />
</rules>
</nlog>
Hi,
We encounter a memory leak in our application in production, and the exploitation of memory dump led us to suspect NLog.
After many tests, we realized that using an asynchronous target with a ${threadid}, GDC or MLDC in the layout causes the leak.
The current thread hang, the memory increases and the application finally crash.
We can reproduce quickly by logging large messages quickly.
You can find a test project here with the demonstration : https://github.com/staufour/LoggerPerfTest
Thanks for your help.
We can reproduce quickly by logging large messages quickly.
Could you please define large messages?
We have to look into this. Do you know the "async" is a shorthand for the AsyncWrapper? You will have more control with the AsyncWrapper. See docs: https://github.com/NLog/NLog/wiki/AsyncWrapper-target#async-attribute-will-discard-by-default. Maybe that could help as a quickfix
Large = ~2Mo, 4Mo in the sample.
Yes we know about AsyncWrapper, but same result.
If we drastically reduce the size of the queue we see a decrease in memory usage.
But when we remove problematic items from the layout we do not have problem.
And why the current hang with threadid or MDLC in the layout ? Without it is instantaneous with a thread in background that writes the logs and huge memory consumption but no crash.
@staufour Instead of using <targets async="true"> then try this:
<targets>
<default-wrapper xsi:type="AsyncWrapper" timeToSleepBetweenBatches="0" overflowAction="Block" />
</targets>
Instead of opening / closing file for each write operation, then add keepFileOpen="true":
<target name="logfile" xsi:type="File" fileName="log.txt" keepFileOpen="true">
@staufour Could you also skip the use of the Common-Logging-Interface? Just to make the case a little simpler.
Have seen people getting bitten by Logger-wrappers before, because the wrapper-logic has huge side-effects.
@staufour Could you also skip the use of the Common-Logging-Interface? Just to make the case a little simpler.
Good one. I don't like supporting Common.Logging
@staufour And why the current hang with threadid or MDLC in the layout ? Without it is instantaneous with a thread in background
You are seeing the result of the NLog internal optimizations. When it doesn't have to capture the state of the current thread logging, then it can do both deferred async-formatting AND async-writing.
But when the target-layout specifies that thread-context-state has to be captured (Ex. current threadid). Then the thread doing the logging has to pay the price and will perform sync formatting. The async-writing is still deferred.
@staufour Could you also try and configure the CsvLayout to have the option:
<withHeader>false</withHeader>
Can see that my optimizations in #2653 are disabled for the CsvHeaderLayout (Will fix this)
@staufour Created #2934 that optimizes the CsvLayout when used by concurrent threads.
If everything goes well, then there should be pre-release nuget-package you can try in the artifacts-output (NLog.4.5.10.8452-PR2934.nupkg):
https://ci.appveyor.com/project/nlog/nlog/build/4.5.0-8452/artifacts
If everything goes well, then there should be pre-release nuget-package you can try in the artifacts-output:
I removed some builds from the queue, expected eta 1-2 hours
Some tests later...
I added the default wrapper :
<default-wrapper xsi:type="AsyncWrapper" timeToSleepBetweenBatches="0" overflowAction="Block" />
And it resolve the memory leak for target like
<target name="csv_crash" xsi:type="File" fileName="csv_crash.txt" keepFileOpen="true">
<layout xsi:type="CsvLayout">
<withHeader>true</withHeader>
<column name="Date" layout="${date:universalTime=True:format=o}" />
<column name="Server" layout="${machinename}" />
<!--<column name="ThreadId" layout="${threadid}" />-->
<column name="Application" layout="${mdlc:item=Application}" />
<!--<column name="Version" layout="${gdc:item=Version}" />-->
<column name="Logger" layout="${logger}" />
<column name="Source" layout="${callsite}" />
<column name="Level" layout="${level}" />
<column name="Message" layout="${message}" />
<column name="Exception" layout="${exception:format=ToString}" />
</layout>
</target>
But the current thread always remains blocked.
I test with different targets :
With file target with plain text layout like
<target name="plain_text_crash" xsi:type="File" fileName="plain_text_crash.txt" keepFileOpen="true"
layout="${date:universalTime=True:format=o} ${machinename} ${mdlc:item=Application} ${logger} ${callsite} ${level} ${message} ${exception:format=ToString}">
</target>
The application crash with OutOfMemory even with the default-wrapper.
With a database target we don't have the problem, neither memory leak nor thread blocking.
With a network target we encounter the same kind of problem as with file targets.
I removed the usage of Common.Logging : does not change anything.
The same for CsvLayout with withHeader true/false.
NLog tries to optimize for the usecase, where the user writes normal-length log-message, and configures the output-target to include a lot of context-info using many different layoutrenderers.
There are two ways to perform capture of context-info for async writing. One way is to capture each context-info in separate string-outputs. Another way is to capture the entire output-string in one single string-output. NLog prefers the later method to avoid many small string-allocation and object-references, and this gives good performance for most users.
But for your case, then the "giant" log-message will cause the single string-output to be huge. Leaving lots of work for the garbage collector, because it is hard for it to keep up with all the allocations.
@staufour But the current thread always remains blocked.
When using JsonLayout or CsvLayout, then there are special characters that needs encoding. There is an overhead of scanning (and maybe replacing) these characters in a giant string.
But I guess the conclusion is that writing 4 MByte Text messages, that are mixed together with async-capture-state does not perform very well in NLog by default.
I guess a workaround could be to abuse LogEventInfo.Properties, and create a special-logger-method, that captures the needed thread-context and puts them into the LogEventInfo.Properties (Use the ${event-properties} instead of ${threadid}). See also #2496
Another work-around could be to specify a queueLimit on the default-wrapper AsyncWrapper (Blocking the producer from taking all memory):
<default-wrapper xsi:type="AsyncWrapper" timeToSleepBetweenBatches="0" overflowAction="Block" queueLimit="100" batchSize="5" />
@staufour Made some more optimizations of the CsvLayout, that should reduce the memory allocations:
https://ci.appveyor.com/project/nlog/nlog/build/4.5.0-8455/artifacts
Please try NLog.4.5.10.8455-PR2934.nupkg
When using JsonLayout or CsvLayout, then there are special characters that needs encoding. There is an overhead of scanning (and maybe replacing) these characters in a giant string.
When I say blocked, it's blocked until the end of writing logs.
One thing I don't understand is why the addition of <default-wrapper xsi:type="AsyncWrapper" timeToSleepBetweenBatches="0" overflowAction="Block" /> fixed the memory leak while I only log 1000 elements which is less than the default size of the queue.
timeToSleepBetweenBatches=0 makes it much more eager to empty its queue.
When running as 32bit process then it crashed independent of config until I added queueLimit.
But I guess the conclusion is that writing 4 MByte Text messages, that are mixed together with async-capture-state does not perform very well in NLog by default.
Our application only very rarely produces logs larger than 16kb and logs of 2 to 4 MB even less.
But we search search for a memory leak who crash our application, so we stress a little the log system to quickly see if it's related.
We analyse dump file with a Microsoft guy and find that a large volume of memory is retained by NLog over long periods of time and linked to many System.Threading.TimerQueueueTimer.
By testing the mitigation of the problem with the default async wrapper and the update of the Nlog version in our application I found a new issue.
We have 2 targets : database & network
I try to log 10 000 logs of ~100ko.
No problem with only database target.
No problem with only network target.
Memory escalation with both (~3Go) and OutOfMemory crash with 150Ko logs.
update of the Nlog version
So you have tested with NLog.4.5.10.8455-PR2934.nupkg ?
Memory escalation with both (~3Go) and OutOfMemory crash with 150Ko logs.
Can only suggest that you use the following AsyncWrapper-configuration for targets receiving very large messages (This will reduce performance in handling small messages, but will prevent out-out-memory):
<default-wrapper xsi:type="AsyncWrapper" timeToSleepBetweenBatches="0" overflowAction="Block" queueLimit="150" batchSize="10" fullBatchSizeWriteLimit="10" />
queueLimit and batchSize can be increased if normal max-message-size is less than 4 Mbyte (gives better performance). And it is a very good idea to use keepFileOpen="true"
So you have tested with NLog.4.5.10.8455-PR2934.nupkg ?
Not yet. We upgrade to the last stable version for production. We were to the 4.2.3...
I will test your configuration to see impacts between memory consumption and performance.
And test with a test set more similar to our production environment (between 200k and 300k log / day and mostly < 60Ko)
I think we have to use keepFileOpen=true with an openFileCacheTimeout.
Thanks for your support :)
@staufour Should have mentioned that timeToSleepBetweenBatches="0" only works with NLog ver. 4.3.10 and newer (Using zero on older versions gives issues #1069)
But a lot of optimizations has been added to NLog from ver. 4.3.10 (and newer). Just thought that you were using 4.5.10 since it was in the original post.
@staufour Should have mentioned that
timeToSleepBetweenBatches="0"only works with NLog ver. 4.3.10 and newer (Using zero on older versions gives issues #1069)But a lot of optimizations has been added to NLog from ver. 4.3.10 (and newer). Just thought that you were using 4.5.10 since it was in the original post.
Yes all test here are done with 4.5.10.
The upgrade to 4.5.10 (and the settings change) for production is the first quick & risk-free step to see if it makes the IIS pool crashes disappear.
We want to eliminate NLog from potential guilty parties and continue our investigations on other leads.
@staufour Made some performance tests to compare the PR2934-build with NLog 4.5.10 (Master): https://github.com/NLog/NLog/pull/2934#issuecomment-424873344
@staufour Guess this issue can be resolved. Since NLog will never be able to keep up with the same prepared 4 MByte message that is sent over and over again and has to be checked for CSV-escaping.
It also seems like the tweaks you have made to the NLog-config have resolved your production issue.
NLog 4.6 will include the optimizations made to CsvLayout that will reduce memory allocations. Together with options to tweak performance for CSV-escaping of individual columns.
I also think this could be closed. Thanks @snakefoot
@staufour NLog 4.6.1 has now been released that includes several performance improvements to CsvLayout: https://www.nuget.org/packages/NLog
Most helpful comment
@staufour Made some more optimizations of the CsvLayout, that should reduce the memory allocations:
https://ci.appveyor.com/project/nlog/nlog/build/4.5.0-8455/artifacts
Please try
NLog.4.5.10.8455-PR2934.nupkg