Kong: 104: Connection reset by peer

Created on 13 Nov 2019 · 7 Comments · Source: Kong/kong

Summary

Hi, I have encountered a problem.
My server route is: client -> k8s ingress -> kong -> upstream server.
A 502 response often occurs when I request the server with an interval of approximately more than 1 minute.
I captured the TCP packets with tcpdump in the kong pod and the upstream server pod, together with the TCP connection data from ss -antp, as follows:
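A sketch of the commands behind these captures, assuming the pod interface is eth0 (the addresses in the filter are taken from the output below):

# packet capture in the kong pod
/tmp # tcpdump -i eth0 host 10.107.24.14 and port 8000
# connection state in the kong pod
/tmp # ss -antp | grep 10.107.24.14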

  1. More than one minute after the previous request, I first saved the TCP connection data in the two pods:
    kong pod:
/tmp # ss -ant | grep 10.107.24.14
ESTAB    0         0             10.244.0.134:44430        10.107.24.14:8000    
ESTAB    0         0             10.244.0.134:35304        10.107.24.14:8000    
ESTAB    0         0             10.244.0.134:49246        10.107.24.14:8000    
ESTAB    0         0             10.244.0.134:43078        10.107.24.14:8000 

upstream pod:

/tmp # ss -ant | grep EST | grep 10.244.0.134
ESTAB      0      0      10.244.4.112:8000               10.244.0.134:35304              
ESTAB      0      0      10.244.4.112:8000               10.244.0.134:43078              
ESTAB      0      0      10.244.4.112:8000               10.244.0.134:49246              
ESTAB      0      0      10.244.4.112:8000               10.244.0.134:44430

At this moment, the set of ESTAB connections is the same in both pods.

  2. Then I sent one request to the API, and a 502 response occurred.
    kong debug log:
2019/11/13 20:17:48 [debug] 36#0: *263002 [lua] balancer.lua:442: queryDns(): [ringbalancer 2] unchanged dns record entry for resume-parser-service.resume-parser-space: 10.107.24.14:nil
2019/11/13 20:17:48 [debug] 36#0: *263002 [lua] init.lua:608: balancer(): setting address (try 1): 10.107.24.14:8000
2019/11/13 20:17:48 [error] 36#0: *263002 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.244.0.132, server: kong, request: "POST /resume_parser/parse_android HTTP/1.1", upstream: "http://10.107.24.14:8000/parse_android", host: ""
10.244.0.132 - [13/Nov/2019:20:17:48 +0800] LOG.KONG | POST /resume_parser/parse_android HTTP/1.1 502 69 7645 python-requests/2.20.0 113.106.106.3 10.107.24.14:8000 - 0.000 0.005 -
  3. The tcpdump data captured for this request in the kong pod is:
/tmp # cat tcp1.log 
17:45:29.697151 IP kong-69446c8c84-p9dvs.44430 > resume-parser-service.resume-parser-space.svc.cluster.local.8000: Flags [.], seq 1055832147:1055839137, ack 1250270335, win 1401, options [nop,nop,TS val 220831485 ecr 2311028121], length 6990
17:45:29.697179 IP resume-parser-service.resume-parser-space.svc.cluster.local.8000 > kong-69446c8c84-p9dvs.44430: Flags [R], seq 1250270335, win 0, length 0
17:45:29.697186 IP kong-69446c8c84-p9dvs.44430 > resume-parser-service.resume-parser-space.svc.cluster.local.8000: Flags [P.], seq 6990:8827, ack 1, win 1401, options [nop,nop,TS val 220831485 ecr 2311028121], length 1837
17:45:29.697196 IP resume-parser-service.resume-parser-space.svc.cluster.local.8000 > kong-69446c8c84-p9dvs.44430: Flags [R], seq 1250270335, win 0, length 0

As the capture shows, the packets sent on port 44430 are answered with RST in the kong pod.

At the same time, no TCP packets were captured in the upstream pod.

  4. The ss -ant data after the request in the two pods is:
    kong pod:
/tmp # ss -ant | grep 10.107.24.14
ESTAB    0         0             10.244.0.134:39982        10.107.24.14:8000    
ESTAB    0         0             10.244.0.134:35304        10.107.24.14:8000    
ESTAB    0         0             10.244.0.134:49246        10.107.24.14:8000    
ESTAB    0         0             10.244.0.134:43078        10.107.24.14:8000 

upstream pod:

/tmp # ss -ant | grep EST | grep 10.244.0.134
ESTAB      0      0      10.244.4.112:8000               10.244.0.134:35304              
ESTAB      0      0      10.244.4.112:8000               10.244.0.134:43078              
ESTAB      0      0      10.244.4.112:8000               10.244.0.134:49246              
ESTAB      0      0      10.244.4.112:8000               10.244.0.134:44430
ESTAB      0      0      10.244.4.112:8000               10.244.0.134:39982

As seen above, a new TCP connection 10.244.0.134:39982 has been established, and the old connection 10.244.0.134:44430 has disappeared from the kong pod but still exists on the upstream server.

It is confusing: it seems the keep-alive TCP connection on port 44430 was broken on the kong pod side without the upstream ever being notified.
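One way to confirm that no TCP keepalive probes are armed on these pooled sockets is to look at the socket timers (a sketch, run in the kong pod; -o prints timer information):

# a socket with SO_KEEPALIVE enabled would show a keepalive timer in the last column
/tmp # ss -onat | grep 10.107.24.14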

This problem occurs with some services, but does not appear with others.

Thank you.

Steps To Reproduce

  1. Configure a Kong upstream that points at a k8s service (a sketch follows this list).
  2. Request the API with an interval of more than 1 minute.
  3. Sometimes a 502 response occurs.
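For reference, step 1 against the Admin API could look roughly like this (the Admin API address, entity names, and route path are assumptions, not taken from this report):

# hypothetical names; Kong 1.x Admin API
curl -s -X POST http://localhost:8001/upstreams -d name=resume-parser-upstream
curl -s -X POST http://localhost:8001/upstreams/resume-parser-upstream/targets \
  -d target=resume-parser-service.resume-parser-space.svc.cluster.local:8000
curl -s -X POST http://localhost:8001/services \
  -d name=resume-parser -d protocol=http -d host=resume-parser-upstream -d port=8000
curl -s -X POST http://localhost:8001/services/resume-parser/routes -d 'paths[]=/resume_parser'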

Additional Details & Logs

  • Kong version: 1.0.3
  • Kong debug-level logs:
2019/11/13 20:17:48 [debug] 36#0: *263002 [lua] balancer.lua:442: queryDns(): [ringbalancer 2] unchanged dns record entry for resume-parser-service.resume-parser-space: 10.107.24.14:nil
2019/11/13 20:17:48 [debug] 36#0: *263002 [lua] init.lua:608: balancer(): setting address (try 1): 10.107.24.14:8000
2019/11/13 20:17:48 [error] 36#0: *263002 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.244.0.132, server: kong, request: "POST /resume_parser/parse_android HTTP/1.1", upstream: "http://10.107.24.14:8000/parse_android", host: ""
10.244.0.132 - [13/Nov/2019:20:17:48 +0800] LOG.KONG | POST /resume_parser/parse_android HTTP/1.1 502 69 7645 python-requests/2.20.0 113.106.106.3 10.107.24.14:8000 - 0.000 0.005 -

or

2019/11/13 19:35:17 [debug] 34#0: *243996 [lua] balancer.lua:442: queryDns(): [ringbalancer 2] unchanged dns record entry for resume-parser-service.resume-parser-space: 10.107.24.14:nil
2019/11/13 19:35:17 [debug] 34#0: *243996 [lua] init.lua:608: balancer(): setting address (try 1): 10.107.24.14:8000
2019/11/13 19:35:17 [error] 34#0: *243996 writev() failed (104: Connection reset by peer) while sending request to upstream, client: 10.244.0.132, server: kong, request: "POST /resume_parser/parse_android HTTP/1.1", upstream: "http://10.107.24.14:8000/parse_android", host: ""
10.244.0.132 - [13/Nov/2019:19:35:17 +0800] LOG.KONG | POST /resume_parser/parse_android HTTP/1.1 502 69 87475 Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 113.106.106.3 10.107.24.14:8000 - 0.001 0.007 -
  • Kong configuration:
    some important config:
client_max_body_size 0;
proxy_ssl_server_name on;
underscores_in_headers on;

lua_package_path '/usr/local/kong/lualibs/?.lua;/usr/local/kong/plugins/?.lua;/usr/local/kong/lualibs/?/init.lua;./?.lua;./?/init.lua;;;;';
lua_package_cpath ';;';
lua_socket_pool_size 30;
lua_max_running_timers 4096;
lua_max_pending_timers 16384;
lua_shared_dict kong                5m;
lua_shared_dict kong_db_cache       128m;
lua_shared_dict kong_db_cache_miss 12m;
lua_shared_dict kong_locks          8m;
lua_shared_dict kong_process_events 5m;
lua_shared_dict kong_cluster_events 5m;
lua_shared_dict kong_healthchecks   5m;
lua_shared_dict kong_rate_limiting_counters 12m;
lua_socket_log_errors off;
lua_ssl_verify_depth 1;

# injected nginx_http_* directives
log_format kong_log '$remote_addr - [$time_local] LOG.KONG | $request $status $body_bytes_sent $request_length $http_user_agent $http_x_forwarded_for $upstream_addr $request_auth_time $upstream_response_time $request$
lua_shared_dict glocache 2m;
lua_shared_dict prometheus_metrics 5m;

upstream kong_upstream {
    server 0.0.0.1;
    balancer_by_lua_block {
        Kong.balancer()
    }
    keepalive 60;
}
...
    location / {
        default_type                     '';

        set $ctx_ref                     '';
        set $upstream_host               '';
        set $upstream_upgrade            '';
        set $upstream_connection         '';
        set $upstream_scheme             '';
        set $upstream_uri                '';
        set $upstream_x_forwarded_for    '';
        set $upstream_x_forwarded_proto  '';
        set $upstream_x_forwarded_host   '';
        set $upstream_x_forwarded_port   '';

        rewrite_by_lua_block {
            Kong.rewrite()
        }

        access_by_lua_block {
            Kong.access()
        }

        proxy_http_version 1.1;
        proxy_set_header   Host              $upstream_host;
        proxy_set_header   Upgrade           $upstream_upgrade;
        proxy_set_header   Connection        $upstream_connection;
        proxy_set_header   X-Forwarded-For   $upstream_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $upstream_x_forwarded_proto;
        proxy_set_header   X-Forwarded-Host  $upstream_x_forwarded_host;
        proxy_set_header   X-Forwarded-Port  $upstream_x_forwarded_port;
        proxy_set_header   X-Real-IP         $remote_addr;
        proxy_pass_header  Server;
        proxy_pass_header  Date;
        proxy_ssl_name     $upstream_host;
        proxy_pass         $upstream_scheme://kong_upstream$upstream_uri;
...
  • Operating system: Ubuntu 16.04, Linux 4.4.0-62-generic

All 7 comments

I have resolved this problem; it is not a fault of Kong.

Hey, @so2bin. Out of curiosity, how did you resolve it? Thanks.

My server is running on k8s and kube-proxy runs in IPVS mode. There is an HTTP keepalive connection between the kong pod and the upstream, but TCP keepalive is not enabled on that connection, so no heartbeat TCP packets flow between them. Meanwhile, the connection entry maintained by IPVS has a default 15-minute expiration, so the connection expires if it is unused for 15 minutes, and the 502s happen after that.
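For anyone hitting the same thing, the IPVS side of this can be checked and tuned on the nodes roughly as follows (a sketch; ipvsadm must be installed, and 3600 is only an example value):

# show the current IPVS idle timeouts (tcp tcpfin udp); the tcp default of 900s is the 15 minutes above
ipvsadm -L --timeout
# example: raise the idle tcp timeout so pooled keepalive connections are not silently expired
ipvsadm --set 3600 120 300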

Hey, @so2bin. Out of curiosity, how did you resolve it? Thanks.

See the previous reply above.

Thanks a lot.

My server is running on k8s and kube-proxy runs in IPVS mode. There is an HTTP keepalive connection between the kong pod and the upstream, but TCP keepalive is not enabled on that connection, so no heartbeat TCP packets flow between them. Meanwhile, the connection entry maintained by IPVS has a default 15-minute expiration, so the connection expires if it is unused for 15 minutes, and the 502s happen after that.

Hi, I have the same problem. What is the specific solution?

I cannot find any way to set an upstream keepalive_timeout in Kong. It seems the connection in Kong stays ESTABLISHED forever: with no packets on the connection, its state does not change even after 7200s.
The only way I can see is to set keepalive_timeout on the upstream side, as sketched below.
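If the upstream server is itself nginx, that workaround might look like the following (300s is only an example, chosen to stay well below the 15-minute IPVS expiry):

# hypothetical upstream-side nginx setting: close idle keepalive connections
# before IPVS drops its entry, so the proxy gets a clean FIN instead of a silent black hole
keepalive_timeout 300s;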

Is there any other better solution? Thanks a lot.

I have resolved this problem; it is not a fault of Kong.

How did you solve the problem? I encountered the same.
