K6: Optimize memory consumption of Trend Metrics

Created on 29 Jun 2019  Â·  7Comments  Â·  Source: loadimpact/k6

Running a test that's 12 hours long crashes the program around 6 hours in, the memory consumption is tremendous, it can eat 64gb RAM. While searching for a memory leak, I've noticed, that the engine stores all samples for Trend metrics in memory until the end of the test. I think this is the main thing that leads to such a high memory consumption. I propose refactoring of the TrendSink, so that we wouldn't need to keep all the samples.

The Avg, Med and percentiles can be calculated by using Moving percentile algorithm, it only needs the last sample, new sample and current value. The only problem is that we have to know before-hand what percentiles we want to calculate, but looking at the current code there are only two percentiles in the output, they are 90th and 95th percentile.

bug duplicate performance

All 7 comments

Yeah, we know about the Trend issues with long-running tests... My proposal to completely solve this was to refactor the TrendSink to use HdrHistograms (https://github.com/loadimpact/k6/issues/763), and/or for k6 to do smarter threshold and summary value calculations (https://github.com/loadimpact/k6/issues/764) to reduce their impact. Unfortunately, those tasks have gotten delayed a few times, because of other bugs or important features :disappointed:

As a potential workaround, there's currently a way to completely avoid using any TrendSinks in k6, but it only works if you're not interested in the end-of-test summary stats and you don't have any thresholds. This is done by running k6 with --no-thresholds --no-summary and is usually only useful when you use an external output like InfluxDB or Load Impact Insights.

Your proposal is great, it would fix the memory leak and won't cost you much data. The thing is, right now we have all the data in memory and the only things that we need from that data are averages and percentiles. Maybe we can use the Rolling Average algorithm to fix it now? And if we want to show more data in the summary, we will implement the Hdr Histogram for that.

The averages are easy to calculate without storing all the values, that has never been the issue. Percentiles on the other hand are much trickier... I remember researching some memory-efficient percentile algorithms when I originally found the issue (I was searching for a memory leak :sweat_smile: ), but I then concluded that the HdrHistogram would be both easier to implement and more flexible. Can you share what specific moving percentile algorithm you recommend?

I am talking about this algorithm: https://mjambon.com/2016-07-23-moving-percentile/

If xi<m_i−1 then
    m_i=m_{i−1}−δ/p
else if xi>mi−1 then
    m_i=m_{i−1}+δ/(1−p)
else xi=m_i−1 and keep the previous value:
    m_i=m_{i−1}

Here m_i is the current value of the percentile, p is the percentile itself and δ is a user chosen constant that expresses how fast the calculated percentile will converge to the real value, this constant matters only if you want to see realtime results, because in the end it converges to the real value anyway.

It seems much easier to implement than an Hdr Histogram, though not as flexible.

Ah, I remembered something else that precluded the use of such a moving percentile algorithm...To use it, we need to 1) know the specific percentiles we need to calculate from the start and 2) refactor the code quite a bit so that the various Trend sinks "know" from the start that they need to calculate these specific percentiles.

The first part, knowing the needed percentiles at the start of the test, is probably the harder one... There are currently two things that may require percentiles - the end-of-test summary stats and the thresholds.

The summary stats at the end of the test, as you've said, it's relatively easy - we know what we need. It's p(90) and p(95) by default, though that can be changed by the summaryTrendStats option, e.g. k6 run --summary-trend-stats="min,med,avg,max,p(99),p(99.9),p(99.99)" script.js. But in either case, we can easily know them at the start of the test.

The thresholds are the problem here... For example, if we have this k6 script:

import http from "k6/http";

export let options = {
  thresholds: {
    http_req_duration: ["p(95)<500"],
  }
};

export default function () {
  http.get("https://httpbin.org/");
}

k6 won't actually parse the expression p(95)<500 properly, in a way that we'll know that we need p(95). Instead, in the local k6 execution, we actually evaluate the thresholds by executing them in a restricted JS runtime that has the sink as a local variable and p() is just a simple helper function there: https://github.com/loadimpact/k6/blob/bdd417e8a4206c6bbbef36d7ce7bea48f34a0804/stats/thresholds.go#L33-L35

I assume that this was done as a quick fix to get thresholds working, since there was already a JS runtime in k6 anyway... But it prevents us from properly reasoning about what percentile values we'll need at the start of the test... :man_facepalming: There's a separate issue to fix this (https://github.com/loadimpact/k6/issues/961), but we can't use the simple moving percentiles algorithms before we do... So, the HdrHistogram seemed (and it still does) like the simpler solution we can just plug into the current k6 architecture (because with it we won't need to know the percentiles we require, among other benefits) and it will allow us to fix the thresholds architecture separately, at a later time.

Ah, that's what I was missing. I didn't take into account that you can specify any percentile in the thresholds.
Looks like HdrHistogram is the best solution then. Sorry for wasting your time and thank you for explanation, now the code makes more sense to me.

Was this page helpful?
0 / 5 - 0 ratings