OpenJ9: Footprint regressions when using -XX:+TransparentHugePage

Created on 17 Jun 2019 · 15 comments · Source: eclipse/openj9

Transparent huge page support was added to OMR via https://github.com/eclipse/omr/pull/3647. Subsequent performance testing indicates that using this support (which may be enabled by default on Linux until the introduction of #6013) can negatively impact footprint measurements. In particular, a footprint regression of 8% was found on pLinux with the OS setting of /sys/kernel/mm/transparent_hugepage set to madvise, compared to a build without madvise(..., MADV_HUGEPAGE) support.
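For context, here is a minimal sketch of what requesting THP via madvise typically looks like on Linux (an illustrative example, not the actual OMR port-library code; the function name is made up):

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

/* Illustrative only: reserve anonymous memory and ask the kernel to back
 * it with transparent huge pages. This is the madvise(MADV_HUGEPAGE) call
 * that the "madvise" OS setting acts upon. */
static void *reserveWithTHP(size_t size)
{
    void *region = mmap(NULL, size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (MAP_FAILED == region) {
        return NULL;
    }
#ifdef MADV_HUGEPAGE
    /* Only has an effect when /sys/kernel/mm/transparent_hugepage/enabled
     * is "madvise" (or "always"); a failure here is non-fatal. */
    if (0 != madvise(region, size, MADV_HUGEPAGE)) {
        perror("madvise(MADV_HUGEPAGE)");
    }
#endif
    return region;
}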

Labels: vm, externals, perf

All 15 comments

#6157 disables the change to use TransparentHugePages when the OS setting is madvise (https://github.com/eclipse/openj9/pull/6013) until the footprint regression is investigated. FYI @vijaysun-omr

@fengxue-IS Can you take a look at this? We need to figure out why it's regressing footprint - see #6157 for details.

Moving this to the 0.16 milestone as it was found late.

This is an important issue to resolve, I feel, since @mpirvu found that running on Ubuntu (where the default OS setting is "madvise") led to a 10% loss in throughput that was regained either when the OS setting was changed to "always" or when this VM feature (disabled due to the footprint regression) was in use. If we can find a way of reducing the footprint regression in this issue, we can re-enable the "madvise" feature by default. FYI @andrewcraik, since this has a pretty big impact on out-of-the-box throughput on Ubuntu (and other machines where the setting is "madvise").

I suspect the footprint regression noticed on the pLinux platform is due to how THP is set up on the machine. A quick check showed that max_ptes_none is set to 255 (the number of empty/unused pages allowed when building a transparent huge page); combined with the 64KB default page size, this will likely cause the kernel to create 16MB THPs on the machine (I cannot confirm this, as the actual THP size is not exposed outside the kernel).

As the benchmark we are running only consumes ~120MB of memory, overcommitting 1-2 huge pages will make a significant difference.

I will continue to run some benchmarks to confirm this.
@mpirvu can you provide the test setup in more detail? (xLinux by default runs on 4KB pages (2MB THP) vs. pLinux with 64KB pages (16MB THP?); we could consider enabling THP by default only on xLinux.)
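For reference, a small diagnostic sketch (an assumed standalone tool, not JVM code) that prints the two kernel settings discussed above:

#include <stdio.h>

/* Print a sysfs/procfs setting, e.g. the THP policy and khugepaged's
 * max_ptes_none value referenced in the comment above. */
static void printSetting(const char *path)
{
    char buf[256];
    FILE *f = fopen(path, "r");
    if (NULL == f) {
        printf("%s: <not available>\n", path);
        return;
    }
    if (NULL != fgets(buf, sizeof(buf), f)) {
        printf("%s: %s", path, buf);
    }
    fclose(f);
}

int main(void)
{
    printSetting("/sys/kernel/mm/transparent_hugepage/enabled");
    printSetting("/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none");
    return 0;
}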

FYI @zl-wang and @gita-omr, see the last comment on the potentially different tradeoff/behaviour on Power, and express an opinion if you have a preference.

@mpirvu can you provide the test setup in more detail?

Both Daytrader7 and AcmeAir throughput runs were improved substantially when the JVM could take advantage of the transparent huge pages.

Is this only on xLinux, or does it apply to pLinux as well?

I have only measured on xLinux (Ubuntu 16.04).

Significant performance gains were seen on Skylake when we added the madvise support. Before saying we should enable this only on x, a throughput study on the other platforms (including p) would be warranted - if it is worth 10% throughput for 5% footprint, that is likely an acceptable tradeoff given how hard it is to gain 10% on some of these benchmarks.

In general I would prefer that we keep configurations and support consistent on all Linux architectures if at all possible to reduce surprises and behavior differences in the future.

Interested in seeing an evaluation on pLinux, although I expect the performance benefit to be smaller (compared to xLinux), since the default page size on pLinux is already 64KB.

Huge pages on Power will be either 16MB (pre-POWER9, or POWER9 with a hashed page table) or 2MB (POWER9 with a radix page table).

You can tell whether it is a hashed or radix page table by looking at the bottom of /proc/cpuinfo:
timebase : 512000000
platform : pSeries
model : IBM pSeries (emulated by qemu)
machine : CHRP IBM pSeries (emulated by qemu)
MMU : Hash <=== Hashed page table (16MB hugepage)

or

timebase : 512000000
platform : PowerNV
model : 9006-22C
machine : PowerNV 9006-22C
firmware : OPAL
MMU : Radix <=== Radix page table (2MB hugepage)

or by checking /proc/meminfo, e.g.:
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
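
For convenience, a similar sketch (again an illustrative standalone tool) that pulls out just the MMU line from /proc/cpuinfo and the Hugepagesize line from /proc/meminfo:

#include <stdio.h>
#include <string.h>

/* Print every line of a file that starts with the given prefix. */
static void grepFile(const char *path, const char *prefix)
{
    char line[256];
    FILE *f = fopen(path, "r");
    if (NULL == f) {
        return;
    }
    while (NULL != fgets(line, sizeof(line), f)) {
        if (0 == strncmp(line, prefix, strlen(prefix))) {
            printf("%s", line);
        }
    }
    fclose(f);
}

int main(void)
{
    grepFile("/proc/cpuinfo", "MMU");          /* "Hash" => 16MB, "Radix" => 2MB THP */
    grepFile("/proc/meminfo", "Hugepagesize"); /* e.g. "Hugepagesize: 16384 kB" */
    return 0;
}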

Keeping this open to track any decision on whether to re-enable this on Power / zLinux.

We should enable this on both p/z Linux, absorbing the footprint regression, which appeared to be less than 10% on DT. Percentage-wise, it would be less with a bigger heap. Keeping all Linux platforms consistent is another benefit of doing this.

FYI @DanHeidinga, see the last comment on this feature for Power/Z.
