Nextflow: Inconsistency of memory usage values within the report and timeline

Created on 28 Dec 2018  路  44Comments  路  Source: nextflow-io/nextflow

Bug report

Expected behavior and actual behavior

expected behavior

Whatever we look at the Tasks table in _Raw values mode_ or _Human readable mode_ we expect to read the exact same values for the memory usage if the decimal system is used. We also expect these values to be the same in the Timeline.

actual behavior

  • in the Memory usage plot from the report, the memory value reported is 1051574272 (raw value extracted from Plotly) and displayed as 1.052G in the boxplot

  • in the Tasks table from the report in the Raw values mode, the memory value reported in the field vmem is 1051574272 which is consistent the previous value in the plot

  • in the Tasks table from the report in the Human readable mode, the memory value reported in the field vmem is 1.1GB seems to be ~ 1051574272 x (1.024 x 1.024). This value should be 0.97935GB = 1051574272/(1024 x 1024 x 1024) if the binary unit is used. Same behavior holds for peak_vmem and allocated memory fields but not for rss and peak_rss.

  • in the Timeline, the memory displayed is 1002.9 MB which is ~ 1051574272/(1024 x 1024) = 1002.859375. This is correct if the binary unit is used.

It should be mentioned whether you use decimal or binary unit to display the memory usage.

Steps to reproduce the problem

Install the program stress:
sudo apt-get install stress or sudo yum install stress

Create the following nextflow.nf script:

#!/usr/bin/env nextflow

process TwoCpus4mn {

    cpus 1
    memory '1 GB'

    """
    /usr/bin/time -v stress -c 2 -t 15 -m 1 --vm-bytes 1000000000
    """
}

Launch:

nextflow nextflow.nf -with-report report.html -with-timeline timeline.html

Environment

  • Nextflow version: [18.10.1]
  • Java version: [OpenJDK 64-Bit Server VM (build 10.0.2+13-Ubuntu-1ubuntu0.18.04.4, mixed mode)]
  • Operating system: [Linux 4.15.0-43-generic x86_64]
kinbug

All 44 comments

reply to myself:

Actually, the memory value reported in the field vmem with the value 1.1GB is due to the rounding performed in line 154 of the file modules/nextflow/src/main/resources/nextflow/trace/assets/ReportTemplate.js: return bytes.toFixed(1)+' '+units[u];

Changing this line to return bytes.toFixed(3)+' '+units[u]; gives 1.052 GB which is consistent with what is displayed in the boxplot. I suggest to apply this modification in ReportTemplate.js.

It seems that the report uses decimal unit for the memory while the timeline uses the binary unit. Same unit should be used for both.

Related to this I've also noted that the fields read_bytes and write_bytes are not consistent with the activity of stress, it should be used rchar and wchar instead. http://man7.org/linux/man-pages/man5/proc.5.html

This is interesting, I started seeing issues in the reported memory usage on our new cluster as well (RHEL 7, Singularity 2.5.2), was not sure if its related. Example:

screen shot 2019-01-09 at 1 39 14 pm

Obviously, nothing was using 1.5TB of RAM, but I have been seeing this in the reports. The trace from these tasks:

task_id hash    process %mem    rss vmem    peak_rss    peak_vmem   rchar   wchar   syscr   syscw
1129    89/416c95   custom_sample_report    0.0 82096128    1664303104  82231296    1576106508288   9044907 44077   1315    190
1135    bd/ea7a7c   custom_sample_report    0.0 82849792    1674637312  82984960    1576106508288   9567085 20826   1194    158
1137    1c/b5b8bc   custom_sample_report    0.0 84549632    1674637312  84684800    1576106508288   9566701 20809   1198    158
1120    9f/202012   custom_sample_report    0.0 82272256    1674637312  82407424    1576106508288   9567187 20977   1194    160
1124    7e/00070c   custom_sample_report    0.0 81670144    1675296768  81805312    1576106512384   9567045 20743   1191    158
1127    33/c31a21   custom_sample_report    0.0 81707008    1674637312  81842176    1576106508288   9566915 20805   1196    158
1123    dc/d4da94   custom_sample_report    0.0 81707008    1674637312  81842176    1576106508288   9567153 20807   1194    158
1131    fb/eaaaf4   custom_sample_report    0.0 81702912    1666248704  81838080    1576106508288   9566945 20656   1194    156
1148    8c/5b4a56   custom_sample_report    0.0 81666048    1666101248  81801216    1576106512384   9566807 20591   1195    156
1136    b3/c74a1e   custom_sample_report    0.0 81707008    1674637312  81842176    1576106508288   9566947 20809   1194    158
1138    80/677109   custom_sample_report    0.0 81707008    1674637312  81842176    1576106508288   9566767 20805   1195    158
1125    8c/795dc5   custom_sample_report    0.0 81707008    1674637312  81842176    1576106508288   9567329 20810   1194    158
1126    48/c6dae0   custom_sample_report    0.0 82714624    1674625024  82849792    1576106508288   9566919 20703   1194    158
1128    f0/309dc5   custom_sample_report    0.0 80650240    1664303104  80785408    1576106508288   9044899 44081   1312    190
1139    a9/3a2d55   custom_sample_report    0.0 81702912    1666248704  81838080    1576106508288   9566733 20658   1195    156
1130    d5/f3bbb7   custom_sample_report    0.0 81702912    1666248704  81838080    1576106508288   9567065 20657   1195    156
1132    43/19700d   custom_sample_report    0.0 81707008    1674637312  81842176    1576106508288   9566913 20813   1195    158
1133    0f/e99f8b   custom_sample_report    0.0 81702912    1666248704  81838080    1576106508288   9567085 20661   1195    156
1134    4b/800aee   custom_sample_report    0.0 81707008    1674637312  81842176    1576106508288   9566921 20809   1198    158
1142    be/6004f2   custom_sample_report    0.0 73023488    1576062324736   73158656    1576106508288   8943090 10795   1085    124
1143    f5/254075   custom_sample_report    0.0 72916992    1576068993024   73052160    1576106508288   8943104 10805   1085    124
1144    2e/a49717   custom_sample_report    0.0 74350592    1576019144704   74485760    1576106508288   8943098 10795   1087    124
1141    e5/c12f4d   custom_sample_report    0.0 72916992    1576062324736   73052160    1576106508288   8943090 10795   1085    124
1145    20/78905c   custom_sample_report    0.0 72916992    1576068997120   73052160    1576106508288   8943090 10795   1085    124
1140    99/234a34   custom_sample_report    0.0 72916992    1576073678848   73052160    1576106508288   8943090 10795   1085    124
1149    59/feb997   custom_sample_report    0.0 72912896    1576065224704   73048064    1576106512384   8942978 10715   1085    124
1146    e7/bc0bca   custom_sample_report    0.0 72916992    1576064217088   73052160    1576106508288   8943090 10795   1085    124
1147    59/70ae9f   custom_sample_report    0.0 72916992    1576073678848   73052160    1576106508288   8943090 10795   1085    124

edit: this might be more relevant for #980 actually

@stevekm Work in progress, stay in touch.

@phupe I've slightly modified the RSS figure, adding as a third tab in the Memory chart instead of having a separate chart for RSS memory. I think this a bit more consistent, also because I don't think it has much sense to report the RSS as percentage of memory request. My understanding is that RSS is much smaller than allocated (virtual) memory.

Included in version 19.01.0-edge.

@pditommaso:

This disccusion is actually related to Issue #980.

I see your point but let me clarified why it is important to have the percentage of memory requested for the peak RSS. Thus I really think that 2 different charts are needed for both vmem and peak_rss.

The total amount of memory used be a processs is the Virtual memory (vmem). The vmem contains all memory areas whether they are in the physical memory (RAM), in the Swap space, on the disk or shared with other processes.

The resident set size (RSS) is the amount of space of physical memory (RAM) held by a process.

The relationship is: vmem >= RSS + Swap.

The following page gives an interesting example (https://docs.hpc.qmul.ac.uk/using/memory/) to illustrate the difference of vmem and RSS.

Let's assume you have a newtflow process with the memory directive set to memory '1 GB'.

Let's assume we submit a job using PBS/Torque. According to the PbsExecutor.groovy file in nextflow source code, the job will be submitted using the following command: qsub -l mem=1gb.

The mem option of PBS/Torque corresponds to the maximum amount of physical memory used by the job, in other words, it is the maximum RSS value allowed by the process. PBS/Torque is generally set to kill the job if the process uses more memory than the job requested in the mem option.

Therefore, it is really important to figure out what is the peak RSS of your process for 2 main objectives:

  • you don't want your job to be killed for exceeding the requested RSS, thus you need a security threshold
  • you want to optimize the memory ressources booked on your node to allow the submission of other jobs since this is the value of the mem option that PBS/Torque will consider to submit or queued a job.

A boxplot with the percentage of peak RSS / requested RSS is definitively important to calibrate what amount of memory in the nextflow directive to meet these 2 objectives.

Maybe this is the terminology % Allocated which is misleading since it refers to memory allocation. I suggest maybe to simply use % of requested or % of booked in the legend.

I see your point. The main issue is that, it's not defined what kind of memory the directory memory should specify: virtual - or - physical. PBS is physical, SGE is virtual, K8s? Batch? frankly I don't. I guess it depends on how docker limit the memory. Looking the docker docs it seems suggesting that's the physical memory.

If we decide that NF memory represents the physical memory, then we can use it to represent the % Allocated chart instead of the virtual mem.

Does make sense ?

To be honest I've never really understood what "% Allocated" means, not sure what allocation it's referring to. In all cases in my experience the physical memory in RAM is what you would expect to report and is the useful value, as @phupe describes, since it more directly impacts the success of your tasks and your workflow configuration. Whatever solutions you all come to, I hope it includes making this a more prominent part of the report

I see your point. The main issue is that, it's not defined what kind of memory the directory memory should specify: virtual - or - physical.

Indeed.

If we decide that NF memory represents the physical memory, then we can use it to represent the % Allocated chart instead of the virtual mem.

Does make sense ?

In my opinion, it really makes sense that memory represents the physical memory, or at least this should be the default. This means that the right memory option for all executors must identified. This would be the scenario1 (By the way, virtual_free seems to represent the amount of unused virtual memory according to http://www.univa.com/resources/files/univa_user_guide_univa__grid_engine_854.pdf, but I am not familiar with Grid Engine, other options are h_rss/s_rss for resident set size, or h_vmem/s_vmem for virtual memory size). In this scenario the only one chart for memory resource usage would be displayed with the peak_rss.

We could imagine a scenario2 where a memory.type directive is offered by nextflow. The value could be either rss or vmem, the option of all executors would selected according to this directive. Only one chart for the memory resource usage in the report would be displayed using either the vmem or rss depending on the type of memory set by the user. This scenario may take some time to be developed and I have absolutely no idea how many users would be interested to use it.

As a straightforward scenario0, I would display the 2 charts for both vmem and peak_rss as proposed in my pull request #985. By the way, I would suggest that, even for vmem, you use the peak value used by the process. Only peak values seem interested to me. I think that only 2 lines should be modified as follows in the file ReportSummary.groovy:

mappers.mem = { TraceRecord record -> record.get('peak_vmem') as Double }

final vmem = record.get('peak_vmem') as Double

I think the best solution would be scenario 1.

sounds good!
Thanks

I have tested the following release: 845e6d1 [origin/master] [release 19.01.0-edge]

That's great you have added the RSS in the timeline html report.

However, I have still noticed some issues:

  • The memory reported in the timeline is in binary unit (power of 1024) while in the report it is in decimal unit (power of 1000)
  • The memory report in binary unit is 10 times bigger as it should be. For example, the raw value in the html report says that the peak_rss was 201965568 for task OneCpu200MBRAM. It corresponds to 201965568/1024/1024 = 192.6094 MiB, while in the timeline, the value is 1926 MB. Same thing hold for vmem for the same task. For the task OneCpu2GBRAM, the vmem value in the timeline is correct in binary unit while for peak_rss it is written 19GB. The is the same problem in the trace.csv file. See attached files below.

To reproduce the issue, use the following NF script:

#!/usr/bin/env nextflow

process OneCpu2GBRAM {

    cpus 1

    """
    stress --vm 1 --vm-bytes 2000000000 -t 120
    """
}

process OneCpu200MBRAM {

    cpus 1

    """
    stress --vm 1 --vm-bytes 200000000 -t 120
    """
}

trace.csv.txt
report.html.txt
timeline.html.txt

Umm, I'm not able to replicate this. Using your script I'm getting these numbers

task_id  hash       native_id  name            status     exit  submit                   duration  realtime  %cpu    rss       peak_rss  peak_vmem  vmem      rchar  wchar  read_bytes  write_bytes
2        b1/f21e08  13597      OneCpu2GBRAM    COMPLETED  0     2019-01-14 11:22:28.266  2m 3s     2m        102.2%  1.9 GB    1.9 GB    1.9 GB     1.9 GB    39 KB  329 B  284 KB      0
1        a3/ecbc0c  13596      OneCpu200MBRAM  COMPLETED  0     2019-01-14 11:22:28.256  2m 3s     2m        101.8%  194.3 MB  194.5 MB  220.8 MB   220.8 MB  39 KB  332 B  96 KB       0

Once executed, could you change in the work dir of one task and run NXF_DEBUG=1 bash .command.run and attach here the output (including the file .command.trace)?

find attached the requested information.


>> cat .command.trace
nextflow.trace/v2
realtime=120062
%cpu=1000
rchar=53284
wchar=324
syscr=120
syscw=14
read_bytes=36864
write_bytes=0
%mem=1
vmem=323108
rss=194096
peak_vmem=323204
peak_rss=197224

output-debug.txt

This was the 200 MB process, right? The peak_rss=197224 looks fine.

fyi, I reproduce the same problem on my side.

This was the 200 MB process, right? The peak_rss=197224 looks fine.

sorry for not mentioning it, but Yes, it was the 200MB process.

therefore where's the problem? in the timeline/html report files? In my case are fine (except the power of 10 vs power of 2 problem).

In the timeline html, I have : OneCpu200MBRAM 2m 1s / 3156 MB (1926 MB)

but I should have instead: OneCpu200MBRAM 2m 1s / 315.6 MB (192.6 MB)`

The values are 10 times bigger as they should be.

Also for the run reporting peak_rss=197224 in the trace file ?

I have the same problem with version 18.10.1, I obtained OneCpu200MBRAM 2m 14s / 3155 MB instead of 315.5 MB.

I removed the $HOME/.nextflow dir and restarted with 19.01.0-edge but still the same problem.

Could this be related to java? Here is what I have:

>> java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
Was this page helpful?
0 / 5 - 0 ratings