Quite a lot of errors now result in an error_code metric tag value of 1000, i.e. the most generic catch-all error code we have. In a recent sample of production errors, here are the aggregated counts of errors with 1000 for error_code:
| Error string | Count |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |------- |
| connection error: PROTOCOL_ERROR | 4 |
| EOF | 112 |
| ~http: ContentLength=??? with Body length~ ??? | 2 |
| ~http: Request.ContentLength=??? with nil~ Body | 586 |
| http2: server sent GOAWAY and closed the connection; LastStreamID=???, ErrCode=NO_ERROR, debug="" | 116 |
| ~http2: Transport: cannot retry err [http2: Transport received Server's graceful shutdown GOAWAY] after Request.Body was written; define Request.GetBody to avoid this error~ | 3 |
| net/http: HTTP/1.x transport connection broken: malformed HTTP response "???" | 1 |
| net/http: HTTP/1.x transport connection broken: malformed HTTP status code "to" | 1 |
| net/http: request canceled | 475 |
| net/http: request canceled while waiting for connection | 169 |
| stream error: stream ID ???; CANCEL | 3 |
| stream error: stream ID ???; PROTOCOL_ERROR | 43 |
| stream error: stream ID ???; PROTOCOL_ERROR; invalid header field value "???" | 29 |
| tls: first record does not look like a TLS handshake | 1 |
| unexpected EOF | 70 |
| x509: cannot validate certificate for ??? because it doesn't contain any IP SANs | 2 |
| x509: certificate has expired or is not yet valid | 3 |
| x509: certificate is valid for ???, not ??? | 13 |
| x509: certificate signed by unknown authority | 61 |
The http: Request.ContentLength=??? with nil Body error is better described in https://github.com/loadimpact/k6/issues/1094, while the GetBody-mentioning error is already fixed in master by https://github.com/loadimpact/k6/pull/1093.
Also, it might be a good idea to rename the current errCode type and the constants, as discussed in https://github.com/loadimpact/k6/pull/1102#discussion_r310889060
Slightly connected to this, but we should see if we properly detect the too many open files error. It deserves its own dedicated error code, and a message pointing users to instructions how to increase their OS limits.
Something else strange is that the connection reset by peer error message includes the local and remote IPs and port numbers, something that should be overwritten by this code, but for some reason isn't:
https://github.com/loadimpact/k6/blob/a2077dd22a86662ef3cd699bc57b0048136e51ba/lib/netext/httpext/error_codes.go#L137-L138
We don't handle error messages and error codes for request timeouts properly. Consider the following script:
import http from "k6/http";
export default function () {
let resp = http.get("https://httpbin.org/delay/5", { timeout: 3000 });
console.log(resp.error);
console.log(resp.error_code);
}
with k6 v0.26.1, we'd get something like this:
WARN[0003] Request Failed error="Get https://httpbin.org/delay/5: context deadline exceeded"
INFO[0003] context deadline exceeded
INFO[0003] 1000
and with k6 v0.26.0:
WARN[0003] Request Failed error="Get https://httpbin.org/delay/5: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
INFO[0003] net/http: request canceled
INFO[0003] 1000
The difference in error message is probably caused by https://github.com/loadimpact/k6/pull/1261, but in both cases we don't handle the error code properly.
Most helpful comment
Slightly connected to this, but we should see if we properly detect the
too many open fileserror. It deserves its own dedicated error code, and a message pointing users to instructions how to increase their OS limits.