K6: Track and improve k6 memory usage and performance

Created on 20 Sep 2019  路  18Comments  路  Source: loadimpact/k6

We currently don't have a good idea how changes in k6 affects its performance. In particular, we knew that core-js used a lot of memory (https://github.com/loadimpact/k6/issues/1036#issue-450791743), but we never systematically tracked that, so we lack any baseline to compare against if we ever want to update our currently bundled version. I did some very unscientific measurements with /usr/bin/time on my laptop, and I've shared the formatted results below. I used a lot of current and previous k6 versions to run a script with 200 VUs executing a million iterations of a default function that just makes a simple http.get() operation against a localhost Go server. Here are the results:

| k6 version | Used Memory | RPS [1] | Times | Inputs and pagefaults |
|------------|-------------|-----|-------|-----------------------|
| 0.16.0 | 271700k | 21739 | 0:47.80elapsed - 91.32user 21.64system (236%CPU) | 0major+70139minor, 0inputs+0outputs
| 0.17.0 | 274496k | 18185 | 0:57.17elapsed - 90.21user 23.45system (198%CPU) | 0major+72619minor, 0inputs+0outputs
| 0.17.1 | 272884k | 17857 | 0:58.07elapsed - 88.25user 24.53system (194%CPU) | 0major+72161minor, 0inputs+0outputs
| 0.17.2 | 914512k [2] | 17241 | 1:04.02elapsed - 104.05user 26.49system (203%CPU) | 0major+314970minor, 0inputs+0outputs
| 0.18.0 | 905072k | 15765 | 1:08.58elapsed - 109.52user 31.23system (205%CPU) | 0major+305646minor, 0inputs+0outputs
| 0.18.2 | 904848k | 15745 | 1:08.75elapsed - 110.31user 31.64system (206%CPU) | 0major+307379minor, 0inputs+0outputs
| 0.19.0 | 987608k | 14812 | 1:12.95elapsed - 118.03user 35.35system (210%CPU) | 0major+339240minor, 0inputs+0outputs
| 0.20.0 | 1011588k | 14419 | 1:15.08elapsed - 122.31user 33.90system (208%CPU) | 103major+339912minor, 40232inputs+0outputs
| 0.21.0 | 1006776k | 14170 | 1:16.20elapsed - 123.62user 33.52system (206%CPU) | 121major+335868minor, 42240inputs+0outputs
| 0.21.1 | 988204k | 14026 | 1:17.11elapsed - 124.65user 33.28system (204%CPU) | 115major+334937minor, 41656inputs+0outputs
| 0.22.0 | 1915272k [3] | 14910 | 1:17.75elapsed - 128.98user 33.49system (208%CPU) | 0major+616641minor, 0inputs+0outputs
| 0.22.1 | 1928812k | 14949 | 1:18.53elapsed - 129.59user 33.54system (207%CPU) | 114major+622426minor, 29792inputs+0outputs
| 0.23.0 | 1940520k | 14427 | 1:20.25elapsed - 131.79user 33.97system (206%CPU) | 115major+627730minor, 29968inputs+0outputs
| 0.23.1 | 1950936k | 15220 | 1:17.29elapsed - 128.74user 33.22system (209%CPU) | 131major+615768minor, 34048inputs+0outputs
| 0.24.0 | 1904352k | 14202 | 1:21.48elapsed - 133.21user 33.30system (204%CPU) | 139major+570873minor, 35696inputs+0outputs
| 0.25.0 | 1096308k [4] | 14050 | 1:16.88elapsed - 131.29user 32.66system (213%CPU) | 145major+341927minor, 37144inputs+0outputs
| 0.25.1 | 1129756k | 14882 | 1:12.81elapsed - 127.30user 31.59system (218%CPU) | 10major+354609minor, 1832inputs+0outputs
| latest master | 1115896k | 14760 | 1:13.01elapsed - 122.58user 31.42system (210%CPU) | 0major+343543minor, 0inputs+0outputs
| new executors | 1168584k | 14809 | 1:12.07elapsed - 132.81user 34.58system (232%CPU) | 0major+349069minor, 0inputs+0outputs
| new executors, no core-js [5] | 346344k | 16683 | 1:01.75elapsed - 101.36user 29.45system (211%CPU) | 0major+63926minor, 0inputs+0outputs
| new executors, no core-js, no babel [5] | 337876k | 17737 | 0:58.24elapsed - 101.83user 29.48system (225%CPU) | 0major+62637minor, 0inputs+0outputs
| new executors, core-js updated from 2.5.1 to 3.2.1 | 1295056k | 15331 | 1:10.01elapsed - 129.55user 35.52system (235%CPU) | 0major+385981minor, 0inputs+0outputs

[1] Not actually very representative, because these tests were run against a localhost server on my laptop, but still useful as a measure to track the overhead that the rest of k6 introduces. As a comparison, I created this simple script that just hammers a URL, and I got out ~42000 requests per second on my machine out of it.

I'm currently willing to chalk up most of the difference between this simple script and k6 to the overhead of JS execution, since when I also replaced the JS VUs with a lib.MiniRunnerVU so that each iteration didn't have any JS execution and just made a single request to the localhost server, I also got out ~40000 RPS out of k6.

And to repeat, none of that is representative or matters much in the real world, since when I tested a server on the local network, both k6 and the simple Go script achieved the same result of ~7000 RPS, likely because then the test was IO and not CPU bound. Probably the only use for tracking that metric is to monitor if it suddenly drops, since that would likely mean we introduced something that worsened the performance significantly.

[2] This memory increase was caused by the addition of core-js in every VU.

[3] Again, this was caused by a core-js and babel update

[4] The huge drop was caused by https://github.com/loadimpact/k6/pull/1038 :tada: but again, core-js was responsible

[5] just commented out these lines, but useful to know, because we plan to allow the possibility of users choosing not to use core-js: https://github.com/loadimpact/k6/issues/1049#issuecomment-533025605

So yeah, in summary, we should have some sort of a CI task that monitors the memory consumption and performance of k6, so that we can spot sudden changes in either. Whether that's on every PR or once daily or something else needs more discussion and investigation. And we probably should avoid CircleCI and do this either in Azure Pipelines or GitHub Actions, whichever turns out to be better (https://github.com/loadimpact/k6/issues/959). For the performance part, if we track it through the iterations per second metric, we'd need https://github.com/loadimpact/k6/issues/355, which is currently in progress.

ci high prio performance

Most helpful comment

I decided to throw my test webpack packager for k6 for a test.
I did test both the simple script above (custom_main.js), the same script, but using es6 import/export syntax(main.js) and the webpack transform of it(dist/app.bundle.js) and here are the results:

$ /usr/bin/time k6 run --vus 3500 --duration 60s custom_main.js
http_reqs..................: 47070  784.498669/s
147.14user 21.68system 2:09.00elapsed 130%CPU (0avgtext+0avgdata 9650724maxresident)k
0inputs+0outputs (0major+3015214minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base custom_main.js
http_reqs..................: 47431  790.515908/s
50.68user 23.59system 1:01.78elapsed 120%CPU (0avgtext+0avgdata 1333832maxresident)k
168inputs+0outputs (4major+390294minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s main.js
http_reqs..................: 48057  800.949499/s
151.07user 20.54system 2:12.46elapsed 129%CPU (0avgtext+0avgdata 9279780maxresident)k
1416inputs+0outputs (2major+2909761minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s build/app.bundle.js
http_reqs..................: 45230  753.832637/s
142.08user 20.62system 2:11.20elapsed 124%CPU (0avgtext+0avgdata 9676844maxresident)k
63600inputs+0outputs (517major+3031793minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base build/app.bundle.js
http_reqs..................: 45808  763.461697/s
51.19user 17.97system 1:01.87elapsed 111%CPU (0avgtext+0avgdata 1432368maxresident)k
0inputs+0outputs (0major+417146minor)pagefaults 0swaps

Table:

| test script | time elapsed | memory | rps |
| -- | -- | -- | -- |
| custom_main.js + corejs & babel | 2:09.00 | 9650724 | 784 |
| custom_main.js + base | 1:01.78 | 1333832 | 790 |
| main.js + corejs & babel | 2:12.46 | 9279780 | 800 |
| main.js + webpack + corejs & babel | 2:11.20 | 9676844 | 753 |
| main.js + webpack base | 1:01.87 | 1432368 | 763 |

All the number fluctuate but they generally stay in the same ballpark and the RPS is apparently not affected in this particular case.

For me this confirms that the webpack setup I propose is now viable way to make k6 scripts much more performant when using --compatibility-mode=base

All 18 comments

Do you have any idea why the RPS numbers are dropping from 21k to 15-16k over time?

Btw, I just tested (with k6 0.25.1) if starting new repetitions involved any kind of large overhead. I compared a script doing a single http request per iteration with one doing a hundred:

export function default() {
  for (var i=0; i<100; i++) {
    http.get(URL);
  }
};

At least for k6 0.25.1, the difference doesn't seem to be big. Using the above script that loops through a hundred requests seems to result in 5-10% higher RPS numbers.

No, I currently don't have a good idea why the iterations per second (and thus, the RPS) count dropped from 21k to 15-16k. The biggest drop was between 0.16.0 and 0.17.0, and I tried to spot something suspicious in the diff, but there are just too many changes to tell that way, so we'd have to do it commit by commit... :disappointed:

I initially thought it might be some scheduling change, since it turns out that k6 v0.16.0 was the last k6 version that still used iterations per VU. In v0.17.0 the iterations were switched to be shared across all VUs. But I tried the new per-vu-iterations executor from #1007, which I know doesn't have any unnecessary synchronization overhead, and I still got only 16.5-17.9k iterations per second (when I disabled core-js), so at best it's a slight improvement.

Regarding multiple requests per iteration, I also haven't tested that, I planned to explore it today. But I think http.batch() should have a better performance than a for loop with http.get().

Just to confirm @ragnarlonn's results, tweaking the script in various ways (for loops inside of the default function, http.batch(), reducing the number of VUs, bigger metricSamplesBufferSize, enabling discardResponseBodies) produces 5-15% improvements at best.

@MStoykov, I ran the same benchmark with an updated core-js library. I added it at the bottom of the table, as you can see, there's a ~600KB increase in memory usage per VU :disappointed:

For posterity/comparison, here's an issue about a previous performance investigation: https://github.com/loadimpact/k6/issues/540

Just to add more statistics: I did a test with 1K VUs for 30 minutes and it needed 30 Gb to run. Otherwise, I have " dial tcp XX.XXX.XXX.XX:80: socket: too many open files "

@hassanbakerrts, can you give some more details about the script you ran? What was the size of the JS code you were executing (either in KB or lines of code)? Did you import some browserified library? Or open a big data file? I'm asking because 1k VUs taking 30GB of RAM means each VUs was ~30MB - very different from what we've observed with simple scripts. :confused: Also, which k6 version and OS did you use, and did you run the k6 docker container or a native k6 executable?

Regarding too many open files, you can increase that limit at the OS level: https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux

@na-- @hassanbakerrts Could it be results storage that take up the extra space? At least my tests have been 30-second ones, but if you run it for 30 minutes I guess it will collect a lot more results data.

@ragnarlonn, it could be, though usually that becomes a problem for longer running tests. Still, it mostly depends on the number of HTTP requests, so with 1k VUs and little sleep, it might be the cause... Here are the related issues: https://github.com/loadimpact/k6/issues/763 and https://github.com/loadimpact/k6/issues/1068

@na-- @ragnarlonn I imported lodash during setup but not when VUs were running. My VUs code was to ask for four GET requests. There is a sleep(random(1,4)) where random returns seconds for each VU. We had to run it on ECU (AWS) cluster, I think the OS there is linux but not sure which one (those things were handled by DevOps team). I used docker image to run k6.

Ah, given that each k6 VU is its own separate runtime, just importing lodash would add significant memory overhead to all VUs, even if you only use in the setup() function. Since you imported in the global scope (also called init context in k6), each VU would load and evaluate that JS code, even if it doesn't subsequently use it. And unfortunately, you currently can't import lodash only in the setup function :disappointed:

@na-- That sounds like a perhaps interesting feature to add: being able to import & use things in the setup() context only, which means it will not affect memory usage during the test. Would it be hard to implement? (although perhaps it would be necessary to stagger the execution of the setup function for all the VUs, as otherwise peak memory usage would not change. It would just not be constantly high anymore like it is if lodash is present all the time in each VU context)

@ragnarlonn, I though about this some yesterday, but there a big problem with having different imports/dependencies for setup() or anything else - cloud/distributed execution.

As you probably recall, one of the first things k6 cloud script.js does when you run it on your machine, is to execute the init context of your script once. This is done in order to get the exported script options, but also to build a .tar archive of your script and all of its dependencies (similar to k6 archive script.js). Crucially, it doesn't and can't execute setup() at that point - it's executed on a single machine after we start the test in our cloud. So, yes, it would be very hard to implement different imports for setup(), since in order to do it, we'd have to figure out some completely different way of defining setup than the current user-friendly export function setup() { /* ... */ }, and that way would have to allow explicit definition of dependencies without actually executing them.

Unless we can think of a clever and easy solution, I think we shouldn't spend a lot of time on this compared to the other memory-saving improvements we want to add to k6 - HDR histograms, reduction of core.js, goja improvements, shared read-only data files, streaming reading of big files, etc. The reason for that is that this seems like it won't deliver very big benefits, but would require substantial changes.

After all, I'd argue that most people who import hefty JS libraries don't use them only in their setup. And also, I forgot to mention yesterday, but I tried to run a script that imports lodash (this version) and running it with k6 run --paused --vus 1000 script.js takes only around 4GB of RAM. For comparison, running the exact same script without importing lodash takes only ~2.4GB... So the majority of the memory usage in the @hassanbakerrts's case wasn't caused by that library...

@na-- Oh yeah, I didn't think about the distributed execution...And I can't think of a simple and user-friendly way to do this either. The only thing that comes to mind is adding some special import statement to init() that allows you to specify which imports you want available in other contexts (i.e. export function init() {聽import setup::lodash.js }), which may not be very user-friendly/intuitive. Sounds like a wise decision to focus on the other improvements you mention.

The problem with having custom import/exports, or custom syntax of any kind, is that would be running afoul of ECMAScript parsers (specifically in our case, Babel). We could do it with an exported method that returns a simple JS string array with extra files that need to be included in any archive bundle, or have a property in the exported options, but that again doesn't solve all of the issues of how to then import one of these files only in the VU that's executing setup(). To be clear, none of the issues are unsolvable if we spend enough time and accept some sub-optimal UX (or rather, developer experience) at times, I just don't think this should be high on the optimization priority list right now.

Now that https://github.com/loadimpact/k6/pull/1206 was merged in master, and somewhat prompted by https://community.k6.io/t/k6-cpu-use-during-a-test/303, I decided to check how performance differed for a simple script with the two compatibility modes. Here's the ultra-simple script I used:

var http = require("k6/http");
var sleep = require("k6").sleep;

exports.default = function () {
    http.get("https://httpbin.test.loadimpact.com/get");
    sleep(Math.random() * 2 + 3);
}

Running it with /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=extended produced:

100.54user 10.45system 1:53.56elapsed 97%CPU (0avgtext+0avgdata 9113532maxresident)k
0inputs+0outputs (0major+2869751minor)pagefaults 0swaps

and /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base produced:

21.59user 6.59system 1:01.27elapsed 46%CPU (0avgtext+0avgdata 1147084maxresident)k
0inputs+0outputs (0major+335435minor)pagefaults 0swaps

So, roughly an 8-fold memory consumption improvement (from ~9.11GB to ~1.14GB), much less CPU usage and a lot faster initialization.

The initialization time was actually the most noticeable difference, if you have RAM to spare. Running the same commands again, but with --paused, and immediately aborting k6 as soon as the status switched from init to paused (i.e. as soon as all VUs were initialized) produced very stark results. For /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=extended --paused:

72.90user 5.37system 0:54.90elapsed 142%CPU (0avgtext+0avgdata 7855780maxresident)k
0inputs+0outputs (0major+2502677minor)pagefaults 0swaps

And for /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base --paused:

1.80user 0.55system 0:02.28elapsed 103%CPU (0avgtext+0avgdata 483576maxresident)k
0inputs+0outputs (0major+147906minor)pagefaults 0swaps

Almost immediate initialization of 3500 distinct JS runtimes is quite impressive :smile: But yeah, for the context of this particular issue, we should have CI tasks that monitor both the extended (i.e. Babel + CoreJS) and base performance, since they're completely different beasts.

I decided to throw my test webpack packager for k6 for a test.
I did test both the simple script above (custom_main.js), the same script, but using es6 import/export syntax(main.js) and the webpack transform of it(dist/app.bundle.js) and here are the results:

$ /usr/bin/time k6 run --vus 3500 --duration 60s custom_main.js
http_reqs..................: 47070  784.498669/s
147.14user 21.68system 2:09.00elapsed 130%CPU (0avgtext+0avgdata 9650724maxresident)k
0inputs+0outputs (0major+3015214minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base custom_main.js
http_reqs..................: 47431  790.515908/s
50.68user 23.59system 1:01.78elapsed 120%CPU (0avgtext+0avgdata 1333832maxresident)k
168inputs+0outputs (4major+390294minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s main.js
http_reqs..................: 48057  800.949499/s
151.07user 20.54system 2:12.46elapsed 129%CPU (0avgtext+0avgdata 9279780maxresident)k
1416inputs+0outputs (2major+2909761minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s build/app.bundle.js
http_reqs..................: 45230  753.832637/s
142.08user 20.62system 2:11.20elapsed 124%CPU (0avgtext+0avgdata 9676844maxresident)k
63600inputs+0outputs (517major+3031793minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base build/app.bundle.js
http_reqs..................: 45808  763.461697/s
51.19user 17.97system 1:01.87elapsed 111%CPU (0avgtext+0avgdata 1432368maxresident)k
0inputs+0outputs (0major+417146minor)pagefaults 0swaps

Table:

| test script | time elapsed | memory | rps |
| -- | -- | -- | -- |
| custom_main.js + corejs & babel | 2:09.00 | 9650724 | 784 |
| custom_main.js + base | 1:01.78 | 1333832 | 790 |
| main.js + corejs & babel | 2:12.46 | 9279780 | 800 |
| main.js + webpack + corejs & babel | 2:11.20 | 9676844 | 753 |
| main.js + webpack base | 1:01.87 | 1432368 | 763 |

All the number fluctuate but they generally stay in the same ballpark and the RPS is apparently not affected in this particular case.

For me this confirms that the webpack setup I propose is now viable way to make k6 scripts much more performant when using --compatibility-mode=base

I tested the latest goja with some es6 support with and without corejs.
There is ... an increase in the binary which is definitely because of goja which also increased by more than 10k lines

29836   k6-noes6
31640   k6-es6
31552   k6-es6-nocorejs

The tested script was pretty simple:

import http from "k6/http";
import { sleep } from "k6";

export default function() {
    http.get("https://httpbin.test.loadimpact.com/get");
    sleep(Math.random() * 2 + 3);
}

With --vus 3500 --duration 60s
results:

| k6 | run time | kb memory per VU | CPU used | rps |
| -- | -- | -- | -- | -- |
| k6-noes6 | 1:36.21 | 2871 | 270% | 759 |
|k6-es6| 1:26.18|2673|232%|743|
|k6-es6-nocorejs| 1:07.37| 531|95%|734|

The numbers move somewhat but are around those magnitudes. As the script always takes 1m05.2s (the sleep is apparently pretty unrandom ;) ), it looks like the no corejs version takes only 2 seconds to initialize while the others take 20 or 30 seconds respectively, which for 3500 VUs isn't bad, but isn't as instant.
The memory difference of 5x is probably the most significant benefit.
The CPU difference is probably (I haven't had time to extensively test this) because of the faster initialization

Was this page helpful?
0 / 5 - 0 ratings