Envoy: Address field in listener not working (upstream connect error or disconnect/reset before headers)

Created on 22 Apr 2017 · 18 comments · Source: envoyproxy/envoy

I'm not sure if this is related to #326; I'm referencing that issue since it has the same error message, but on the face of it they seem to be different.

I'm trying to get the address field in a listener to work, without success. As a test harness, I've written a simple shell script (Linux only) that does the following -

  1. Spins up an Envoy container named envoyct1 with a default config and installs the curl and python packages in it.
  2. Uses nsenter to plumb two hardcoded IPs onto eth0 of that Envoy container; these two IPs simulate two VIPs (see the sketch below this list).
  3. Copies over a new Envoy config that has two listeners configured with the two IPs, on port 80.
  4. Spins up a backend Python server on port 9001 for the service/1 backend.
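
For reference, step 2 boils down to something like the following (a rough sketch of what the attached script does, assuming the container name envoyct1 from step 1; the exact addresses and prefix lengths in the real script may differ) -

# plumb the two "VIP" addresses onto eth0 inside the envoyct1 network namespace
CT_PID=$(docker inspect -f '{{.State.Pid}}' envoyct1)
nsenter -t "$CT_PID" -n ip addr add 192.45.67.89/32 dev eth0
nsenter -t "$CT_PID" -n ip addr add 192.45.67.90/32 dev eth0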

When I docker exec into envoyct1 and run curl <VIP1>/service/1, I expect to get a 404, but instead I see this error -

bash-4.3# curl 192.45.67.90/service/1
upstream connect error or disconnect/reset before headersbash-4.3#

If I spin up a Python server on a different port and curl that IP:port directly, it works -

bash-4.3# python -m SimpleHTTPServer 9002
Serving HTTP on 0.0.0.0 port 9002 ...
192.45.67.90 - - [22/Apr/2017 01:44:27] "GET / HTTP/1.1" 200 -

bash-4.3# curl 192.45.67.90:9002
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html>
<title>Directory listing for /</title>
<body>
<h2>Directory listing for /</h2>
<hr>
<ul>
<li><a href=".dockerenv">.dockerenv</a>
<li><a href="bin/">bin/</a>
<li><a href="dev/">dev/</a>
<li><a href="etc/">etc/</a>
<li><a href="home/">home/</a>
<li><a href="lib/">lib/</a>
<li><a href="lib64/">lib64/</a>
<li><a href="media/">media/</a>
<li><a href="mnt/">mnt/</a>
<li><a href="proc/">proc/</a>
<li><a href="root/">root/</a>
<li><a href="run/">run/</a>
<li><a href="sbin/">sbin/</a>
<li><a href="srv/">srv/</a>
<li><a href="sys/">sys/</a>
<li><a href="tmp/">tmp/</a>
<li><a href="usr/">usr/</a>
<li><a href="var/">var/</a>
</ul>
<hr>
</body>
</html>
bash-4.3#

So this doesn't look like a network configuration issue (the curl is being issued from inside the envoy container).

Is this an Envoy config issue, or something else?

When I tried to debug this with gdb and a debug Envoy build, it looked like the worker thread handling the connection, somewhere in the connection_manager_impl.cc chain, sees a socket close event and so emits this error. I'm not sure why it should see a socket close event.

Am I doing something wrong with the config? Can someone please take a look?

BTW, it doesn't matter if I have one or two listeners in my config file. It's the same result. Also, it doesn't matter whether I plumb the VIPs or not - using a simple 127.0.0.10 loopback IP yields the same result.

I'm attaching the harness as a zip file. Unzip it and simply run ./setup_ifaces.sh, and it'll spin up an Envoy Alpine container and do the rest of the plumbing. If you run ./setup_ifaces.sh ubuntu, it will pull the lyft/envoy Ubuntu image instead and do the same there.

So this happens on both Ubuntu and Alpine, with both loopback and eth0 addresses. Any pointers/help would be much appreciated.

Thanks!

setup_envoy_multiple_listener.zip

question

Most helpful comment

I resolved my issue by removing http2_protocol_options: {}

All 18 comments

"upstream connect error or disconnect/reset before headers" means that Envoy cannot connect to the upstream that is being routed to. Your listener config is probably fine. I would use a combination of the /stats and /clusters admin endpoint output to debug further, and verify that you can connect to your backend services from within the Envoy container.

@mattklein123 Thanks for taking a look! The text below is a bit long owing to the outputs I've pasted - thanks in advance for reading through them!

When I look at the /clusters output, I see service1 and service2 there, with a series of entries for 127.0.0.1:9001 (the Python backend service). For that host, the cx_connect_fail stat is 0 - if this were a connectivity issue from the Envoy side, that shouldn't be 0, correct?

bash-4.3# curl 127.0.0.10:8001/clusters
service1::default_priority::max_connections::1024
service1::default_priority::max_pending_requests::1024
service1::default_priority::max_requests::1024
service1::default_priority::max_retries::3
service1::high_priority::max_connections::1024
service1::high_priority::max_pending_requests::1024
service1::high_priority::max_requests::1024
service1::high_priority::max_retries::3
service1::127.0.0.1:9001::cx_active::0
service1::127.0.0.1:9001::cx_connect_fail::0
service1::127.0.0.1:9001::cx_total::0
service1::127.0.0.1:9001::rq_active::0
service1::127.0.0.1:9001::rq_timeout::0
service1::127.0.0.1:9001::rq_total::0
service1::127.0.0.1:9001::health_flags::healthy
service1::127.0.0.1:9001::weight::1
service1::127.0.0.1:9001::zone::
service1::127.0.0.1:9001::canary::false
service1::127.0.0.1:9001::success_rate::-1
service2::default_priority::max_connections::1024
service2::default_priority::max_pending_requests::1024
service2::default_priority::max_requests::1024
service2::default_priority::max_retries::3
service2::high_priority::max_connections::1024
service2::high_priority::max_pending_requests::1024
service2::high_priority::max_requests::1024
service2::high_priority::max_retries::3
bash-4.3#

In the /stats output, I see some counters that seem relevant here; I'm pasting only those below (the complete output follows in a separate excerpt) -

cluster.service1.max_host_weight: 1
cluster.service1.membership_change: 1
cluster.service1.membership_healthy: 1
cluster.service1.membership_total: 1
cluster.service1.update_attempt: 49
cluster.service1.update_failure: 0
cluster.service1.update_success: 49

The membership_healthy value shows 1, which I infer means that Envoy is able to see the backend service1 host in the cluster - is that the case?

What are the update attempts referring to? They also seem to have succeeded 100% of the time (49 out of 49).

complete output -

bash-4.3# curl 127.0.0.10:8001/stats  | grep service1
cluster.service1.lb_healthy_panic: 0
cluster.service1.lb_local_cluster_not_ok: 0
cluster.service1.lb_recalculate_zone_structures: 0
cluster.service1.lb_zone_cluster_too_small: 0
cluster.service1.lb_zone_no_capacity_left: 0
cluster.service1.lb_zone_number_differs: 0
cluster.service1.lb_zone_routing_all_directly: 0
cluster.service1.lb_zone_routing_cross_zone: 0
cluster.service1.lb_zone_routing_sampled: 0
cluster.service1.max_host_weight: 1
cluster.service1.membership_change: 1
cluster.service1.membership_healthy: 1
cluster.service1.membership_total: 1
cluster.service1.update_attempt: 49
cluster.service1.update_failure: 0
cluster.service1.update_success: 49
cluster.service1.upstream_cx_active: 0
cluster.service1.upstream_cx_close_header: 0
cluster.service1.upstream_cx_connect_fail: 0
cluster.service1.upstream_cx_connect_timeout: 0
cluster.service1.upstream_cx_destroy: 0
cluster.service1.upstream_cx_destroy_local: 0
cluster.service1.upstream_cx_destroy_local_with_active_rq: 0
cluster.service1.upstream_cx_destroy_remote: 0
cluster.service1.upstream_cx_destroy_remote_with_active_rq: 0
cluster.service1.upstream_cx_destroy_with_active_rq: 0
cluster.service1.upstream_cx_http1_total: 0
cluster.service1.upstream_cx_http2_total: 0
cluster.service1.upstream_cx_max_requests: 0
cluster.service1.upstream_cx_none_healthy: 0
cluster.service1.upstream_cx_overflow: 0
cluster.service1.upstream_cx_protocol_error: 0
cluster.service1.upstream_cx_rx_bytes_buffered: 0
cluster.service1.upstream_cx_rx_bytes_total: 0
cluster.service1.upstream_cx_total: 0
cluster.service1.upstream_cx_tx_bytes_buffered: 0
cluster.service1.upstream_cx_tx_bytes_total: 0
cluster.service1.upstream_rq_active: 0
cluster.service1.upstream_rq_cancelled: 0
cluster.service1.upstream_rq_maintenance_mode: 0
cluster.service1.upstream_rq_pending_active: 0
cluster.service1.upstream_rq_pending_failure_eject: 0
cluster.service1.upstream_rq_pending_overflow: 0
cluster.service1.upstream_rq_pending_total: 0
cluster.service1.upstream_rq_per_try_timeout: 0
cluster.service1.upstream_rq_retry: 0
cluster.service1.upstream_rq_retry_overflow: 0
cluster.service1.upstream_rq_retry_success: 0
cluster.service1.upstream_rq_rx_reset: 0
cluster.service1.upstream_rq_timeout: 0
cluster.service1.upstream_rq_total: 0
cluster.service1.upstream_rq_tx_reset: 0
bash-4.3#

The thing is, I'm able to curl the backend service (it runs in the same container as Envoy) from within that container, on both VIPs and on localhost, without any issues -

Here's the netstat output to begin with -

bash-4.3# hostname
a24847dd0490
bash-4.3# netstat -apn
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:8001            0.0.0.0:*               LISTEN      39/envoy
tcp        0      0 0.0.0.0:9001            0.0.0.0:*               LISTEN      47/python
tcp        0      0 192.45.67.90:80         0.0.0.0:*               LISTEN      39/envoy
tcp        0      0 192.45.67.89:80         0.0.0.0:*               LISTEN      39/envoy
tcp        3      0 127.0.0.1:10000         0.0.0.0:*               LISTEN      1/envoy
tcp       85      0 127.0.0.1:10000         127.0.0.1:43004         CLOSE_WAIT  -
tcp       88      0 127.0.0.1:10000         127.0.0.1:43006         CLOSE_WAIT  -
tcp       80      0 127.0.0.1:10000         127.0.0.1:43002         CLOSE_WAIT  -
udp        0      0 172.17.0.3:55605        10.254.58.55:53         ESTABLISHED 39/envoy
udp        0      0 172.17.0.3:59536        10.241.16.126:53        ESTABLISHED 39/envoy
udp        0      0 172.17.0.3:47713        10.254.58.54:53         ESTABLISHED 39/envoy
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name     Path
unix  2      [ ]         DGRAM                    657299   39/envoy             @envoy_domain_socket_1
unix  2      [ ]         DGRAM                    666235   1/envoy              @envoy_domain_socket_0
bash-4.3#

Here's ps -

bash-4.3# ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /usr/local/bin/envoy -c /usr/local/conf/envoy/google_com_proxy.json
   39 root       0:00 /usr/local/bin/envoy -c /usr/local/conf/envoy/envoy-multiple-listener-config.json --restart-epoch 1
   47 root       0:00 python -m SimpleHTTPServer 9001
   61 root       0:00 bash
   83 root       0:00 ps aux
bash-4.3#

Now I'm hitting the backend Python SimpleHTTPServer directly on its port 9001 via one of the VIPs (192.45.67.89) -

bash-4.3# curl 192.45.67.89:9001
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html>
<title>Directory listing for /</title>
<body>
<h2>Directory listing for /</h2>
<hr>
<ul>
<li><a href=".dockerenv">.dockerenv</a>
<li><a href="bin/">bin/</a>
<li><a href="dev/">dev/</a>
<li><a href="etc/">etc/</a>
<li><a href="home/">home/</a>
<li><a href="lib/">lib/</a>
<li><a href="lib64/">lib64/</a>
<li><a href="media/">media/</a>
<li><a href="mnt/">mnt/</a>
<li><a href="proc/">proc/</a>
<li><a href="root/">root/</a>
<li><a href="run/">run/</a>
<li><a href="sbin/">sbin/</a>
<li><a href="srv/">srv/</a>
<li><a href="sys/">sys/</a>
<li><a href="tmp/">tmp/</a>
<li><a href="usr/">usr/</a>
<li><a href="var/">var/</a>
</ul>
<hr>
</body>
</html>
bash-4.3#

Next, I'm hitting the backend server directly on port 9001 via the other VIP (192.45.67.90), again from within the Envoy container -

bash-4.3# curl 192.45.67.90:9001
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html>
<title>Directory listing for /</title>
<body>
<h2>Directory listing for /</h2>
<hr>
<ul>
<li><a href=".dockerenv">.dockerenv</a>
<li><a href="bin/">bin/</a>
<li><a href="dev/">dev/</a>
<li><a href="etc/">etc/</a>
<li><a href="home/">home/</a>
<li><a href="lib/">lib/</a>
<li><a href="lib64/">lib64/</a>
<li><a href="media/">media/</a>
<li><a href="mnt/">mnt/</a>
<li><a href="proc/">proc/</a>
<li><a href="root/">root/</a>
<li><a href="run/">run/</a>
<li><a href="sbin/">sbin/</a>
<li><a href="srv/">srv/</a>
<li><a href="sys/">sys/</a>
<li><a href="tmp/">tmp/</a>
<li><a href="usr/">usr/</a>
<li><a href="var/">var/</a>
</ul>
<hr>
</body>
</html>
bash-4.3#

But when I try to go via the VIP on port 80 -

bash-4.3# curl -vvv 192.45.67.90:80/service/1
*   Trying 192.45.67.90...
* TCP_NODELAY set
* Connected to 192.45.67.90 (192.45.67.90) port 80 (#0)
> GET /service/1 HTTP/1.1
> Host: 192.45.67.90
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 57
< content-type: text/plain
< date: Sun, 23 Apr 2017 21:34:13 GMT
< server: envoy
<
* Curl_http_done: called premature == 0
* Connection #0 to host 192.45.67.90 left intact
upstream connect error or disconnect/reset before headersbash-4.3#

Why is Envoy returning a 503 when it should be able to reach the backend service?

Finally, the admin access log doesn't show any new entry when I curl the VIP/service/1 path - I'm guessing that is expected. Are there any other logs I can enable to view Envoy connection activity?

bash-4.3# curl 192.45.67.90/service/1
upstream connect error or disconnect/reset before headersbash-4.3#
bash-4.3# cat /var/log/envoy/admin_access.log
[2017-04-22T00:08:14.583Z] "GET / HTTP/1.1" 404 - 0 530 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:29:19.803Z] "GET /clusters HTTP/1.1" 200 - 0 1195 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.1:8001" "-"
[2017-04-23T21:29:35.917Z] "GET /admin HTTP/1.1" 404 - 0 530 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.1:8001" "-"
[2017-04-23T21:29:49.917Z] "GET /server_info HTTP/1.1" 200 - 0 36 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.1:8001" "-"
[2017-04-23T21:44:24.729Z] "GET / HTTP/1.1" 404 - 0 530 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:44:34.330Z] "GET /clusters HTTP/1.1" 200 - 0 1195 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:47:00.993Z] "GET /admin HTTP/1.1" 404 - 0 530 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:47:03.843Z] "GET /stats HTTP/1.1" 200 - 0 9511 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:47:12.991Z] "GET /stats HTTP/1.1" 200 - 0 9510 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
[2017-04-23T21:47:57.548Z] "GET /stats HTTP/1.1" 200 - 0 9510 0 - "172.17.0.3" "curl/7.52.1" "-" "127.0.0.10:8001" "-"
bash-4.3#
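
(Side note on the logging question above: connection-level activity can also be surfaced by raising Envoy's log level at startup with the -l flag; a minimal sketch, reusing the binary and config paths from the ps output above -)

# run Envoy with debug (or trace) logging to see per-connection events
/usr/local/bin/envoy -c /usr/local/conf/envoy/envoy-multiple-listener-config.json -l debug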

I skimmed through this quickly and I don't see any calls to service1 at all in the stats above, so the requests are probably going to service2. I can't tell without seeing the full config, the full stats dump, and the full clusters output. I won't be able to help you further in this issue. If someone else doesn't help, I would try Gitter for more interactive help. This is a configuration or Docker setup issue.

Np, thanks @mattklein123! I'll post this on Gitter.

@mattklein123 This issue was caused by my plumbing subinterfaces without configuring any routing on them. Going via the docker network create and connect commands resolved the connectivity issues, and we were able to bring up multiple listeners. Thanks for your help on this!
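
For anyone hitting the same thing: a minimal sketch of the Docker-networking approach described above (the network, container, and image names here are illustrative, not from the original harness) -

docker network create envoy-net
docker network connect envoy-net envoyct1
# or start the backend directly on that network:
docker run -d --name backend1 --network envoy-net my-backend-image
# then, from inside the Envoy container, the backend is reachable by name:
docker exec -it envoyct1 curl backend1:9001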

@vijayendrabvs I am running into the same problem. My Go service is accessible from within the service container on port 9096, but not through the Envoy front-proxy container - with exactly the same response as you reported.

Can you provide any details on the resolution, please?

I'm running into the same issue today.

I can access the service from its container using curl, but I'm not able to access it through the Envoy container via http://localhost:10000/symphony.
My envoy.yaml:

static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: auto
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/symphony"
                route:
                  cluster: symphony
              - match:
                  prefix: "/service/2"
                route:
                  cluster: service2
          http_filters:
          - name: envoy.router
            config: {}
  clusters:
  - name: symphony
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: round_robin
    http2_protocol_options: {}
    hosts:
    - socket_address:
        address: 10.129.16.178
        port_value: 8080
  - name: service2
    connect_timeout: 0.25s
    type: strict_dns
    lb_policy: round_robin
    http2_protocol_options: {}
    hosts:
    - socket_address:
        address: service2
        port_value: 80
admin:
  access_log_path: "/dev/null"
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 1

Same issue here - is there some way to solve this?
I can wget -qO- localhost:80/ping my service from within the container, but I get the error when curling through the ingress: upstream connect error or disconnect/reset before headers.

I resolved my issue by removing http2_protocol_options: {}
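
For context: an empty http2_protocol_options: {} tells Envoy to speak HTTP/2 to that upstream cluster, so if the backend only understands HTTP/1.1 the upstream connection gets reset, which surfaces as exactly this error. Here is a sketch of the symphony cluster from the config above with the option dropped (keep the option only for upstreams that really serve HTTP/2, e.g. gRPC backends) -

  - name: symphony
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: round_robin
    # http2_protocol_options: {}   # enable only if the upstream actually speaks HTTP/2
    hosts:
    - socket_address:
        address: 10.129.16.178
        port_value: 8080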

Where did you change that option?

Share your Envoy config file. I will take a look.

Envoy is used in my Istio container, but I don't know where to find that config file.

@danesavot,
Where can you find the Envoy config file in the istio-proxy container? And how do you change it?
Do we need to modify it and build the istio-proxy container ourselves?

@AmerbankDavd Did you resolve this?

In my Istio 0.5.1, there is no http2_protocol_options: {} at all.

kubectl exec -ti istio-pilot-676d495bf8-9c2px -c istio-proxy -n istio-system -- cat /etc/istio/proxy/envoy_pilot.json
{
  "listeners": [
    {
      "address": "tcp://0.0.0.0:15003",
      "name": "tcp_0.0.0.0_15003",
      "filters": [
        {
          "type": "read",
          "name": "tcp_proxy",
          "config": {
            "stat_prefix": "tcp",
            "route_config": {
              "routes": [
                {
                  "cluster": "in.8080"
                }
              ]
            }
          }
        }
      ],
      "bind_to_port": true
    }
  ],
  "admin": {
    "access_log_path": "/dev/stdout",
    "address": "tcp://127.0.0.1:15000"
  },
  "cluster_manager": {
    "clusters": [
      {
        "name": "in.8080",
        "connect_timeout_ms": 1000,
        "type": "static",
        "lb_type": "round_robin",
        "hosts": [
          {
            "url": "tcp://127.0.0.1:8080"
          }
        ]
      }
    ]
  }
}

I have added all the recommended changes to get the hello world example running in this repo: https://github.com/oinke/gprc-hello

The terminal still shows:
server_1 | E0302 08:41:34.225022613 7 http_server_filter.cc:271] GET request without QUERY
and when I browse localhost:8080 I see:
upstream connect error or disconnect/reset before headers. reset reason: remote reset

Running on macOS Mojave 10.14.2 with Docker version 18.09.2, build 6247962

@oinke It seems like your issue is not related to this one. I have posted a PR (https://github.com/oinke/gprc-hello/pull/1) to your repo.

@danesavot I also resolved this by commenting out the empty HTTP/2 options. Huge thanks!

# http2_protocol_options: {}

Outside of that, for everyone else: if you're running containers on the host, check out Docker networking: https://docs.docker.com/network/network-tutorial-standalone/

I created a custom Docker bridge network, ran the other containers with --network, then jumped into the Envoy container and made sure I could curl them by name.

The empty HTTP/2 options came from the Envoy tutorial.
