Using the loadimpact/k6:0.28.0 Docker image in Kubernetes, it seems like the options to disable TCP connection re-use are having no effect. Before I start packet tracing the container or tearing up my cluster, does anyone else have this issue?
All my k6 requests run from Kubernetes are going to the same load balancer, as if they were using the same connection or source port number. When I run the same container locally on my desktop machine's Docker daemon, the requests are hashed evenly across multiple servers, which is the behavior I would expect with connection re-use disabled.
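To illustrate why the source port matters here, below is a minimal sketch of a load balancer that picks a backend by hashing the connection tuple. This is an assumption about how the balancing might work, not a description of any specific product; `fnv1a`, `pickBackend`, `spread`, the addresses, and the backend count of 4 are all hypothetical.

```javascript
// Hypothetical model of a tuple-hashing load balancer, for illustration only.

// 32-bit FNV-1a hash of a string.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Map a connection tuple to one of `backendCount` backends.
function pickBackend(conn, backendCount) {
  const key = `${conn.srcIp}:${conn.srcPort}->${conn.dstIp}:${conn.dstPort}`;
  return fnv1a(key) % backendCount;
}

// Count how many connections land on each backend.
function spread(conns, backendCount) {
  const counts = new Array(backendCount).fill(0);
  for (const c of conns) {
    counts[pickBackend(c, backendCount)] += 1;
  }
  return counts;
}

// Made-up addresses from documentation/private ranges.
const base = { srcIp: '10.0.0.5', dstIp: '192.0.2.10', dstPort: 443 };

// Reused connection: every request shares one source port.
const reused = Array.from({ length: 1000 }, () => ({ ...base, srcPort: 40001 }));

// New connection per request: the source port changes each time.
const fresh = Array.from({ length: 1000 }, (_, i) => ({ ...base, srcPort: 40000 + i }));

console.log('reused:', spread(reused, 4));
console.log('fresh: ', spread(fresh, 4));
```

Under this model, requests on a single reused connection always land on one backend, while fresh connections with varying source ports spread across several.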
My basic load test:
import http from 'k6/http';
import { Rate } from 'k6/metrics';

// Custom metric: share of responses that came back with HTTP 200.
export let successRate = new Rate('http_2xx_responses');

export let options = {
  discardResponseBodies: true,
  noConnectionReuse: true,
  noVUConnectionReuse: true,
  stages: [
    { duration: '3m', target: 30 },
    { duration: '5m', target: 100 },
  ],
  // Resolve HOST to ADDR directly, bypassing DNS.
  hosts: {
    [__ENV.HOST]: __ENV.ADDR,
  },
};

export default function() {
  let resp = http.get('https://' + __ENV.HOST + '/');
  if (resp.status === 200) {
    successRate.add(1);
  } else {
    successRate.add(0);
  }
}
I run it locally in Docker with:
docker run -it --rm --env HOST="example.com" --env ADDR="0.0.0.0" -v $(pwd):/load loadimpact/k6:0.28.0 run --no-usage-report --log-output stderr /load/k6_loadtest.js
I use the same environment variables and options in Kubernetes.
The output is:
execution: local
script: newtest.js
output: -
scenarios: (100.00%) 1 scenario, 100 max VUs, 8m30s max duration (incl. graceful stop):
* default: Up to 100 looping VUs for 8m0s over 2 stages (gracefulRampDown: 30s, gracefulStop: 30s)
data_received..............: 190 MB 394 kB/s
data_sent..................: 25 MB 53 kB/s
http_2xx_responses.........: 100.00% ✓ 38448 ✗ 0
http_req_blocked...........: avg=7.45ms min=2.56ms med=6.88ms max=113.45ms p(90)=10.57ms p(95)=12.18ms
http_req_connecting........: avg=203.09µs min=87.56µs med=187.64µs max=98.37ms p(90)=224.7µs p(95)=241.58µs
http_req_duration..........: avg=566.63ms min=17.45ms med=578.08ms max=1.45s p(90)=764.01ms p(95)=799.53ms
http_req_receiving.........: avg=68.02µs min=16.86µs med=56.94µs max=85.21ms p(90)=104.45µs p(95)=126.99µs
http_req_sending...........: avg=192.94µs min=68.37µs med=184.56µs max=83.71ms p(90)=231.35µs p(95)=256.52µs
http_req_tls_handshaking...: avg=7.05ms min=2.31ms med=6.51ms max=113.08ms p(90)=10.17ms p(95)=11.79ms
http_req_waiting...........: avg=566.37ms min=17.17ms med=577.83ms max=1.45s p(90)=763.7ms p(95)=799.26ms
http_reqs..................: 38448 80.01982/s
iteration_duration.........: avg=574.13ms min=22.62ms med=585.51ms max=1.47s p(90)=771.19ms p(95)=806.94ms
iterations.................: 38448 80.01982/s
vus........................: 99 min=1 max=99
vus_max....................: 100 min=100 max=100
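One hint is already visible in the summary above: the minimum values of http_req_connecting and http_req_tls_handshaking are both above zero, and k6 reports 0 for these timings when a request goes over an already-open connection. The sketch below shows how that check could be expressed per request. The field names `connecting` and `tls_handshaking` match k6's `res.timings`; the helper names and the sample values are hypothetical.

```javascript
// Hypothetical helpers; only the `timings` field names come from k6.

// A request opened a new connection if it spent any time on the
// TCP connect or on the TLS handshake.
function openedNewConnection(timings) {
  return timings.connecting > 0 || timings.tls_handshaking > 0;
}

// Fraction of requests that appear to have reused a connection.
function reuseRatio(timingsList) {
  if (timingsList.length === 0) {
    return 0;
  }
  const reused = timingsList.filter((t) => !openedNewConnection(t)).length;
  return reused / timingsList.length;
}

// Made-up sample values shaped like k6's `res.timings` (milliseconds).
const samples = [
  { connecting: 0.2, tls_handshaking: 6.5 },
  { connecting: 0.19, tls_handshaking: 7.1 },
  { connecting: 0, tls_handshaking: 0 },
  { connecting: 0.22, tls_handshaking: 10.2 },
];

console.log('reuse ratio:', reuseRatio(samples));
```

Inside a k6 script, the same check could be applied to `resp.timings` in the default function and fed into a custom `Rate` metric. It only shows what k6 observed on its side; a packet capture is still the way to confirm what happens on the wire.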
Can you try running the same script with loadimpact/k6:0.29.0 and even the loadimpact/k6:master docker image (built from the yet unreleased master git branch) and see if the problem persists?
As you can see from the v0.29.0 release notes, we made a lot of changes to how DNS resolution works in k6. If you hadn't specified options.hosts in your test, I'd have been sure that the problem was caused by https://github.com/loadimpact/k6/issues/726. With hosts set, I doubt that is the cause, but we should rule out as many things as possible. I'm also asking you to try the as-yet-unreleased master Docker image (which will become v0.30.0 early next year) because we recently realized our HTTP/2 dependency was quite old and updated it (https://github.com/loadimpact/k6/pull/1734).
If using the newer k6 versions doesn't resolve the issue, we probably should reopen and investigate https://github.com/loadimpact/k6/issues/732 more thoroughly.
0.29.0 and master have the same behavior as I described.
Even if there is no DNS lookup, the connection should be torn down and rebuilt for each request, as documented, when the no-connection-reuse option is set.
In #732, what is the definition of idle? I have no idling/sleeping in my VU.
My target server is capable of HTTP/2, so that is probably what k6 is negotiating.
I was able to identify a routing rule in my cluster causing my issue. In the process I ran tcpdump and I did see new connections being created for each request, so I am satisfied #732 is resolved as well.
Thanks for your troubleshooting help!