I'm facing an issue with the Envoy Connect proxy. When I add an upstream to a service, Consul does not configure the Envoy proxy correctly; instead Envoy reports the error gRPC config stream closed: 2, cannot create upstream cluster without discovery chain. A similar message, Error handling ADS stream: cannot create upstream cluster without discovery chain, is logged by the agent.
The issue is not present in the built-in proxy. Both proxies were started using the consul connect proxy|envoy commands, although the issue persists if Envoy is started manually with the bootstrapped config:
consul connect envoy -sidecar-for [service]
Client config
datacenter = "dc"
data_dir = "/opt/consul/"
connect {
  enabled = true
}
retry_join = ["test-server.lxd"]
ports {
  grpc = 8502
}
Server config
datacenter = "dc"
data_dir = "/opt/consul/"
connect {
  enabled = true
}
retry_join = ["test-server.lxd"]
server = true
bootstrap_expect = 1
Service definitions
service {
  id   = "web-1"
  name = "web"
  port = 8080
  connect {
    sidecar_service {
      proxy {
        upstreams = [{
          destination_name = "redis"
          local_bind_port  = 8128
        }]
      }
    }
  }
}
service {
  id   = "redis-1"
  name = "redis"
  port = 15123
  connect {
    sidecar_service {}
  }
}
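As a debugging aid (not part of the original report), the agent's discovery-chain read endpoint from the 1.6 series can show whether the agent is able to compile a chain for the upstream at all; the ADS error above means exactly this compilation failed. This is a hedged sketch assuming a default local agent on 127.0.0.1:8500 and the redis upstream defined above:

```shell
# Hypothetical debugging step: ask the local agent for the compiled
# discovery chain of the "redis" upstream.
upstream="redis"
chain_url="http://127.0.0.1:8500/v1/discovery-chain/${upstream}"
echo "querying ${chain_url}"
# Uncomment on a node with a running Consul agent:
# curl -s "${chain_url}"
```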
I'm running Consul in LXD containers. The host is an Ubuntu Server 18.04 LTS machine and the containers are Debian 10; the architecture is amd64. I could reproduce the issue on Consul 1.6.0 and 1.6.1, as well as Envoy 1.9.1, 1.11.0, and 1.11.1.
Envoy Logs
[2019-09-22 15:09:41.201][5337][info][main] [source/server/server.cc:516] starting main dispatch loop
[2019-09-22 15:09:41.204][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain
[2019-09-22 15:09:41.204][5337][info][upstream] [source/common/upstream/cluster_manager_impl.cc:148] cm init: all clusters initialized
[2019-09-22 15:09:41.204][5337][info][main] [source/server/server.cc:500] all clusters initialized. initializing init manager
[2019-09-22 15:09:41.353][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain
[2019-09-22 15:09:41.353][5337][info][config] [source/server/listener_manager_impl.cc:761] all dependencies initialized. starting workers
[2019-09-22 15:09:41.590][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain
[2019-09-22 15:09:43.845][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain
[2019-09-22 15:09:51.225][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain
Consul Agent logs
2019/09/22 15:09:41 [DEBUG] http: Request GET /v1/agent/services (1.105611ms) from=127.0.0.1:44378
2019/09/22 15:09:41 [DEBUG] http: Request GET /v1/agent/self (1.811474ms) from=127.0.0.1:44378
2019/09/22 15:09:41 [DEBUG] http: Request GET /v1/agent/service/web-1-sidecar-proxy (266.444µs) from=127.0.0.1:44378
2019/09/22 15:09:41 [WARN] agent: Check "service:redis-1-sidecar-proxy:1" socket connection failed: dial tcp 127.0.0.1:21000: connect: connection refused
2019/09/22 15:09:41 [DEBUG] Error handling ADS stream: cannot create upstream cluster without discovery chain
2019/09/22 15:09:41 [DEBUG] Error handling ADS stream: cannot create upstream cluster without discovery chain
2019/09/22 15:09:41 [DEBUG] Error handling ADS stream: cannot create upstream cluster without discovery chain
2019/09/22 15:09:43 [WARN] agent: Check "service:web-1-sidecar-proxy:1" socket connection failed: dial tcp 127.0.0.1:21001: connect: connection refused
2019/09/22 15:09:43 [DEBUG] Error handling ADS stream: cannot create upstream cluster without discovery chain
Can confirm. I upgraded Consul to 1.6.1 with Envoy 1.11.1 and am facing the same issue.
I can confirm this issue too with Consul 1.6.0 / 1.6.1 and Envoy 1.11.1.
In addition, I noticed that the error only occurs when services are defined at agent boot. If they are defined afterwards and the configuration is reloaded with consul reload, the proxy starts normally. My systemd configuration is probably the best way to explain this (dirty) workaround:
/etc/systemd/system/consul.service
[Unit]
Description="HashiCorp Consul - A service mesh solution"
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
[Service]
Type=notify
User=consul
Group=consul
# Hack to circumvent issue 6521
ExecStartPre=/bin/bash /opt/consul/consul_disable_services.sh
ExecStartPost=/bin/bash /opt/consul/consul_enable_services.sh
ExecStartPost=/usr/local/bin/consul reload
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/ -config-dir=/etc/consul.d/services/
ExecReload=/usr/local/bin/consul reload
KillMode=process
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
/opt/consul/consul_disable_services.sh
#!/bin/bash
for f in $(/usr/bin/find /etc/consul.d/services/ -name '*.json'); do
  /bin/mv "$f" "$f.disabled"
done
/opt/consul/consul_enable_services.sh
#!/bin/bash
for f in $(/usr/bin/find /etc/consul.d/services/ -name '*.json.disabled'); do
  /bin/mv "$f" "${f/.disabled/}"
done
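The two scripts above differ only in the direction of the rename, so they can be folded into one parameterized helper. This is just a sketch of the same workaround; toggle_services and its arguments are names I made up, not part of the original unit files:

```shell
#!/bin/bash
# Hypothetical helper combining both scripts: rename every file in a
# directory from one suffix to another, null-delimited so paths with
# spaces survive word splitting.
toggle_services() {
  local dir="$1" from="$2" to="$3"
  find "$dir" -name "*${from}" -print0 |
    while IFS= read -r -d '' f; do
      mv -- "$f" "${f%"$from"}${to}"
    done
}

# Disable definitions before the agent boots, re-enable them afterwards:
# toggle_services /etc/consul.d/services/ .json .json.disabled
# toggle_services /etc/consul.d/services/ .json.disabled .json
```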
consul-envoy Dockerfile
FROM consul:latest
FROM envoyproxy/envoy:v1.11.1
COPY --from=0 /bin/consul /bin/consul
ENTRYPOINT ["consul", "connect", "envoy"]
./servers/consul.hcl
retry_join = ["consul_server"]
connect = {
  enabled = true
}
ports = {
  grpc = 8502
}
server = true
ui = false
./client/consul.hcl
retry_join = ["consul_server"]
bind_addr = "{{ GetAllInterfaces | include \"network\" \"172.18.0.0/24\" | attr \"address\" }}"
client_addr = "{{ GetAllInterfaces | include \"network\" \"172.18.1.0/24\" | attr \"address\" }}"
connect {
  enabled = true
}
ports {
  grpc = 8502
}
server = false
ui = false
services {
  name = "echo-client"
  port = 8080
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "echo"
          local_bind_port  = 9191
        }
      }
    }
  }
}
services {
  name = "echo"
  port = 9090
  connect {
    sidecar_service {}
  }
}
docker-compose.yml
version: '3.7'
services:
  consul_server:
    image: consul:1.6.1
    command: "agent -bootstrap-expect=1 -config-file /etc/consul/consul.hcl"
    volumes:
      - ./servers/:/etc/consul/:ro
    networks:
      - consul
  consul_client:
    image: consul:1.6.1
    command: "agent -config-file /etc/consul/consul.hcl"
    depends_on:
      - consul_server
    volumes:
      - ./client/:/etc/consul/:ro
    networks:
      - consul
      - app
  sidecar_echo_client:
    image: consul-envoy
    command: "-sidecar-for echo-client"
    depends_on:
      - consul_client
    networks:
      - app
    environment:
      CONSUL_GRPC_ADDR: "consul_client:8502"
      CONSUL_HTTP_ADDR: "consul_client:8500"
  sidecar_echo:
    image: consul-envoy
    command: "-sidecar-for echo"
    depends_on:
      - consul_client
    networks:
      - app
    environment:
      CONSUL_GRPC_ADDR: "consul_client:8502"
      CONSUL_HTTP_ADDR: "consul_client:8500"
networks:
  consul:
    ipam:
      driver: default
      config:
        - subnet: 172.18.0.2/24
  app:
    ipam:
      driver: default
      config:
        - subnet: 172.18.1.2/24
This will be fixed in an upcoming 1.6.x point release.
@rboyer I'm seeing this issue too. Could you suggest a workaround, or an older version without the bug?
It works when I build from master, but it now seems to forward requests to unhealthy nodes as well. I'll see if I can provide further info on this.
Hi @nitsh,
The master branch is the work-in-progress branch for the next major release (1.7.x) and isn't necessarily ready for general use. The release/1.6.x branch is the branch for the next minor release in the 1.6 series and should be perfectly fine to use (as long as you are fine running a prerelease build). This bug was fixed in master and backported for the next upcoming 1.6.x release as well.
It sounds like you are comfortable building from source. If so, try building from release/1.6.x instead of master.
FWIW, I was having the same problem, and it was resolved by downgrading from Consul v1.6.1 to v1.5.3.
@rboyer, are you saying that you would expect this to be fixed in v1.6.2 when that is released? Thanks.
EDIT: It turns out I had a race condition; I think in my case things were trying to register with Consul before it had started up. Just adding this in case anyone else hits the same issue. With things properly sequenced, I'm no longer seeing this issue and can use 1.6.1.
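For anyone hitting the same race: the fix is simply to block until the local agent is ready before registering services. A minimal sketch, assuming a default agent on 127.0.0.1:8500; the wait_for helper is my own invention, not a Consul command:

```shell
#!/bin/bash
# Poll a command until it succeeds or a timeout (in seconds) expires.
wait_for() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    (( SECONDS >= deadline )) && return 1
    sleep 1
  done
}

# Example: wait up to 30s for the agent to report a raft leader before
# registering anything. Uncomment on a node with a running Consul agent:
# wait_for 30 curl -sf http://127.0.0.1:8500/v1/status/leader
```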
Hey there,
This issue has been automatically locked because it is closed and there hasn't been any activity for at least _30_ days.
If you are still experiencing problems, or still have questions, feel free to open a new one :+1:.