On Linux, when running in an unrestricted environment, the GC uses sysconf(SYSCONF_PAGES) * sysconf(_SC_PAGE_SIZE) to evaluate the total memory consumption of the system (https://github.com/dotnet/coreclr/blob/master/src/pal/src/misc/sysinfo.cpp#L368). SYSCONF_PAGES is mapped to _SC_AVPHYS_PAGES. Unfortunately, that value counts the memory used by the page cache as occupied (even though the OS automatically frees it as needed), and therefore overestimates the system load.
$ free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         26G         30G        1.5M        6.1G         35G
Swap:            0B          0B          0B
$ getconf _AVPHYS_PAGES
7968847
$ getconf PAGESIZE
4096
We can see here that _AVPHYS_PAGES * PAGESIZE is 32 GB, even though only 26 GB of resident memory is actually used. We've seen instances where the GC incorrectly concludes that more than 90% of the memory is used, and starts doing blocking collections even though they shouldn't be needed.
/cc: @janvorli @Maoni0
Wow. Hope the fix for this can make 3.0.
@kevingosse the _SC_AVPHYS_PAGES represents the number of available pages, not the number of used pages. The free command above reported 30GB of free memory, and _AVPHYS_PAGES * PAGESIZE from your numbers is 30GB, not 32. So the value we are using seems correct.
Since the used and free values in your case are close to each other, I guess that misled you. Here is an example from my machine:
$ free -h
              total        used        free      shared  buff/cache   available
Mem:            23G        3,4G         15G        5,0M        4,6G         19G
Swap:           23G        1,1G         22G
$ getconf _AVPHYS_PAGES
4074733
$ getconf PAGESIZE
4096
_AVPHYS_PAGES * PAGESIZE = 16690106368 B = 16298932 KB = 15917 MB = 15 GB
My point is that the "available" column should be used instead of the "free" column (which is what _AVPHYS_PAGES reports). A large part of the memory reported in "buff/cache" is automatically freed by the OS when needed, and therefore shouldn't be counted by the GC when evaluating the system load.
Ah, got it. It seems we would need to parse the /proc/meminfo file to get the Buffers and Cached sizes then. I wonder if Slab should be included too. Based on a Red Hat article (https://access.redhat.com/solutions/406773), free reports buff/cache as the sum of Buffers, Cached and Slab from /proc/meminfo.
In /proc/meminfo, MemAvailable has been added precisely so that users wouldn't have to ask themselves this kind of question: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34e431b0ae398fc54ea69ff85ec700722c9da773
I don't think it's available on RHEL 6 though, so if that's still supported for .NET Core 3.0 you may need to add a fallback path where you manually compute the value.
Edit: Never mind, it has been backported: https://access.redhat.com/solutions/776393
Looks like an important one; could you consider porting the fix to 3.0.x/3.1?
Yes, we do want to try to get this change approved for 3.1.
Getting into 3.0.x servicing will be very hard.
Could you explain further why this miscalculation would be hard to address in a 3.0.x release?
3.0 is a very short-lived release. It will be quickly superseded by 3.1 LTS in November. 3.1 LTS is where it is important to get bugs like this one fixed and it is what we are focused on.