Hi:
I found that the value of latencies.proxy proviced by plugin log-serializer is much bigger than ngx.var.upstream_response_time,because KONG_WAITING_TIME is count cost of read request body in(when turn lua_need_request_body off).
latencies = { kong = (ctx.KONG_ACCESS_TIME or 0) + (ctx.KONG_RECEIVE_TIME or 0) + (ctx.KONG_REWRITE_TIME or 0) + (ctx.KONG_BALANCER_TIME or 0), proxy = ctx.KONG_WAITING_TIME or -1, request = var.request_time * 1000, .... }
I have reviewed the commit log of https://github.com/Kong/kong/blob/master/kong/plugins/log-serializers/basic.lua, found this was change from ngx.var.upstream_response_time to KONG_WAITING_TIME. I don't know the reason. The exactly upstream should not including the cost of read request body, i think.
pls help.
The KONG_WAITING_TIME is the time between end of access phase and the begin of header_filter phase (Code here). The access phase may have finished before the request body's fully transmitted when the request body is big. After the request body's transmitted, the balancer phase started, then the header_filter phase.
We can easily re-produce this by uploading a file to kong:
// set Kong log level to debug, enable Prometheus plugin to gain more logs
// setup routes, upstreams, ...
dd if=/dev/zero of=assets/10m.txt count=10240 bs=1024
curl -X POST http://localhost/post -F "file=@assets/10m.txt" -H "Content-Type: multipart/form-data" -i
Corresponding response, the Upstream-Latency is like 5 seconds.
HTTP/1.1 200 OK
Content-Type: text/plain; charset=utf-8
Content-Length: 8
Connection: keep-alive
Date: Wed, 10 Jul 2019 06:41:14 GMT
X-Kong-Upstream-Status: 200
X-Kong-Upstream-Latency: 5174
X-Kong-Proxy-Latency: 2
Check Kong log:
2019/07/10 06:41:08 [debug] 22125#0: *6420 [lua] base_plugin.lua:22: rewrite(): executing plugin "prometheus": rewrite
2019/07/10 06:41:08 [debug] 22125#0: *6420 [lua] base_plugin.lua:26: executing plugin "prometheus": access <--- timer started
2019/07/10 06:41:08 [warn] 22125#0: *6420 a client request body is buffered to a temporary file /opt/apps/kong/client_body_temp/0000000003 ...
<-- 6 seconds for transmission
2019/07/10 06:41:14 [debug] 22125#0: *6420 [lua] init.lua:608: balancer(): setting address (try 1): 10.6.41.164:31009 <--- maybe it's better to start timer here
2019/07/10 06:41:14 [debug] 22125#0: *6420 [lua] base_plugin.lua:30: header_filter(): executing plugin "prometheus": header_filter <--- timer stopped
Maybe Kong should not start upstream latency timer at the end the access phase(the begining the balancer phase may be a choice)?
@zeeshen Nice catch, your analysis seems correct to me. If I recall correctly, the KONG_WAITING_TIME variable was added before the introduction of the balancer_by_lua phase in Kong, which explains why it doesn't account for it. When we introduced the balancer phase, we did not update the KONG_WAITING_TIME variable to remain accurate.
@xiaochai The difference between KONG_WAITING_TIME and the $upstream_response_time NGINX variable is that the former intends to track the time Kong spends waiting to receive the response headers from upstream; while the latter ($upstream_response_time) tracks the time spent _receiving_ the upstream response(s) (from NGINX connecting to the upstream, until the last byte of the response payload has been received). That said, it seems like $upstream_header_time is a viable candidate, as it tracks the time spend by NGINX receiving the headers from upstream.
A PR fixing this would be most welcome! Of course, readers of $upstream_header_time should keep in mind that it may return a comma-separated list of values, and be mindful of its values' resolution.
This should now be fixed on next branch, so I am closing this. Please reopen if you find any bugs.
Most helpful comment
The
KONG_WAITING_TIMEis the time between end ofaccessphase and the begin ofheader_filterphase (Code here). The access phase may have finished before the request body's fully transmitted when the request body is big. After the request body's transmitted, thebalancerphase started, then theheader_filterphase.We can easily re-produce this by uploading a file to kong:
Corresponding response, the
Upstream-Latencyis like 5 seconds.Check Kong log:
2019/07/10 06:41:08 [debug] 22125#0: *6420 [lua] base_plugin.lua:22: rewrite(): executing plugin "prometheus": rewrite 2019/07/10 06:41:08 [debug] 22125#0: *6420 [lua] base_plugin.lua:26: executing plugin "prometheus": access <--- timer started 2019/07/10 06:41:08 [warn] 22125#0: *6420 a client request body is buffered to a temporary file /opt/apps/kong/client_body_temp/0000000003 ... <-- 6 seconds for transmission 2019/07/10 06:41:14 [debug] 22125#0: *6420 [lua] init.lua:608: balancer(): setting address (try 1): 10.6.41.164:31009 <--- maybe it's better to start timer here 2019/07/10 06:41:14 [debug] 22125#0: *6420 [lua] base_plugin.lua:30: header_filter(): executing plugin "prometheus": header_filter <--- timer stoppedMaybe Kong should not start
upstream latencytimer at the end theaccessphase(the begining thebalancerphase may be a choice)?