K6: possible bug with connection re-use in Kubernetes

Created on 2 Dec 2020 · 3 Comments · Source: loadimpact/k6

Using the loadimpact/k6:0.28.0 Docker image in Kubernetes, it seems like the options to disable TCP connection re-use are having no effect. Before I start packet tracing the container or tearing up my cluster, does anyone else have this issue?

All my k6 requests run from Kubernetes go to the same load balancer, as if they were using the same connection or source port number. But when I run the container locally on my desktop machine's Docker daemon, the requests are hashed evenly across multiple servers, which is the behavior I would expect with connection re-use disabled.
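To illustrate why the source port matters here, below is a minimal sketch of a load balancer that picks a backend by hashing the connection's addresses and ports. This is a toy model under stated assumptions, not the algorithm of any particular load balancer: the function names (`fnv1a`, `pickBackend`, `backendsHit`), the hash choice, and the example IP addresses are all hypothetical.

```javascript
// Toy model only: real load balancers use their own hashing schemes.
// 32-bit FNV-1a hash of a string, returned as an unsigned integer.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Map a connection (source/destination address and port) to a backend index.
function pickBackend(srcIp, srcPort, dstIp, dstPort, backendCount) {
  const key = `${srcIp}:${srcPort}->${dstIp}:${dstPort}`;
  return fnv1a(key) % backendCount;
}

// Collect the set of backend indexes reached by a list of source ports.
function backendsHit(srcIp, ports, dstIp, dstPort, backendCount) {
  return new Set(
    ports.map((p) => pickBackend(srcIp, p, dstIp, dstPort, backendCount))
  );
}

// A reused connection keeps one source port, so every request lands on one backend.
const reused = backendsHit('192.0.2.10', [40000, 40000, 40000], '198.51.100.5', 443, 4);

// New connections usually get different ephemeral source ports and can spread out.
const ports = Array.from({ length: 1000 }, (_, i) => 40000 + i);
const fresh = backendsHit('192.0.2.10', ports, '198.51.100.5', 443, 4);

console.log('backends hit with one reused port:', reused.size);
console.log('backends hit with 1000 distinct ports:', fresh.size);
```

In this model, traffic pinned to a single backend is consistent with either a reused connection or something in the network path rewriting or ignoring the source port, which is why both possibilities are worth ruling out.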

My basic load test:

import http from 'k6/http';
import { Rate } from 'k6/metrics';

export let successRate = new Rate('http_2xx_responses');

export let options = {
  discardResponseBodies: true,
  noConnectionReuse: true,
  noVUConnectionReuse: true,
  stages: [
    { duration: '3m',  target: 30 },
    { duration: '5m',  target: 100 },
  ],
  hosts: {
    [__ENV.HOST]: __ENV.ADDR,
  },
};

export default function() {
  let resp = http.get('https://' + __ENV.HOST + '/');
  if (resp.status === 200) {
    successRate.add(1);
  } else {
    successRate.add(0);
  }
}

I am running it locally in Docker with:

docker run -it --rm --env HOST="example.com" --env ADDR="0.0.0.0" -v $(pwd):/load loadimpact/k6:0.28.0 run --no-usage-report --log-output stderr /load/k6_loadtest.js

and the same environment variables and options in Kubernetes.

The output is:

  execution: local
     script: newtest.js
     output: -
  scenarios: (100.00%) 1 scenario, 100 max VUs, 8m30s max duration (incl. graceful stop):
           * default: Up to 100 looping VUs for 8m0s over 2 stages (gracefulRampDown: 30s, gracefulStop: 30s)
    data_received..............: 190 MB  394 kB/s
    data_sent..................: 25 MB   53 kB/s
    http_2xx_responses.........: 100.00% ✓ 38448 ✗ 0    
    http_req_blocked...........: avg=7.45ms   min=2.56ms  med=6.88ms   max=113.45ms p(90)=10.57ms  p(95)=12.18ms 
    http_req_connecting........: avg=203.09µs min=87.56µs med=187.64µs max=98.37ms  p(90)=224.7µs  p(95)=241.58µs
    http_req_duration..........: avg=566.63ms min=17.45ms med=578.08ms max=1.45s    p(90)=764.01ms p(95)=799.53ms
    http_req_receiving.........: avg=68.02µs  min=16.86µs med=56.94µs  max=85.21ms  p(90)=104.45µs p(95)=126.99µs
    http_req_sending...........: avg=192.94µs min=68.37µs med=184.56µs max=83.71ms  p(90)=231.35µs p(95)=256.52µs
    http_req_tls_handshaking...: avg=7.05ms   min=2.31ms  med=6.51ms   max=113.08ms p(90)=10.17ms  p(95)=11.79ms 
    http_req_waiting...........: avg=566.37ms min=17.17ms med=577.83ms max=1.45s    p(90)=763.7ms  p(95)=799.26ms
    http_reqs..................: 38448   80.01982/s
    iteration_duration.........: avg=574.13ms min=22.62ms med=585.51ms max=1.47s    p(90)=771.19ms p(95)=806.94ms
    iterations.................: 38448   80.01982/s
    vus........................: 99      min=1   max=99 
    vus_max....................: 100     min=100 max=100
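One way to read the summary above: k6 reports `http_req_connecting` and `http_req_tls_handshaking` as 0 for requests that reuse an existing connection, and both metrics here have a non-zero minimum (87.56µs and 2.31ms). That suggests each request opened its own connection, though a packet capture would be the firmer confirmation. The same check could be made per request from the `timings` object on a k6 response. The sketch below is a plain function so it can be read on its own; `openedNewConnection` is a hypothetical helper name, not part of k6.

```javascript
// Hypothetical helper (not a k6 API): given a k6 response's `timings` object,
// report whether the request appears to have opened a fresh connection.
// `connecting` and `tls_handshaking` are in milliseconds and are 0 when
// an already-open connection was reused.
function openedNewConnection(timings) {
  return timings.connecting > 0 || timings.tls_handshaking > 0;
}

// Inside a k6 script this could feed a custom Rate metric, for example
// (the metric name 'new_connections' is an assumption for illustration):
//   let newConnRate = new Rate('new_connections');
//   newConnRate.add(openedNewConnection(resp.timings));

// Stand-alone illustration with hand-written timing objects:
const freshRequest = { connecting: 0.2, tls_handshaking: 7.0 };
const reusedRequest = { connecting: 0, tls_handshaking: 0 };
console.log(openedNewConnection(freshRequest));
console.log(openedNewConnection(reusedRequest));
```

A rate close to 100% for such a metric would point away from k6 reusing connections and toward the network path between the pod and the load balancer.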

bug

sdhoward

Most helpful comment

I was able to identify a routing rule in my cluster causing my issue. In the process I ran tcpdump and I did see new connections being created for each request, so I am satisfied #732 is resolved as well.

Thanks for your troubleshooting help!

sdhoward on 3 Dec 2020

All 3 comments

Can you try running the same script with loadimpact/k6:0.29.0 and even the loadimpact/k6:master docker image (built from the yet unreleased master git branch) and see if the problem persists?

As you can see from the v0.29.0 release notes, we made a lot of changes in how DNS resolution in k6 works, and if you hadn't specified options.hosts in your test, I'd have been sure that the problem was caused by https://github.com/loadimpact/k6/issues/726. With hosts I doubt that is the cause, but we should rule out as many things as possible. I'm also asking you to try the as-yet-unreleased master docker image (which will become v0.30.0 early next year), because we recently realized our HTTP/2 dependency was quite old and updated it (https://github.com/loadimpact/k6/pull/1734).

If using the newer k6 versions doesn't resolve the issue, we probably should reopen and investigate https://github.com/loadimpact/k6/issues/732 more thoroughly.