Consul: Connect Envoy sidecar proxy "cannot create upstream cluster without discovery chain"

Created on 22 Sep 2019  ·  7 comments  ·  Source: hashicorp/consul

Overview of the Issue

I'm facing an issue with the Envoy Connect proxy. When I add an upstream to a service, Consul fails to configure the Envoy proxy; Envoy instead logs a "gRPC config stream closed: 2, cannot create upstream cluster without discovery chain" error. The agent logs a similar message: "Error handling ADS stream: cannot create upstream cluster without discovery chain".

The issue is not present with the built-in proxy. Both proxies were started with the consul connect proxy|envoy commands, and the issue persists when Envoy is started manually with the bootstrapped config.

Reproduction Steps

  1. Create a cluster with at least 1 server and 1 client
  2. Place 2 service definitions on the client(s), with one service having the other as an upstream
  3. Run Envoy with consul connect envoy -sidecar-for [service]

Consul info for both Client and Server


Client config

datacenter = "dc"
data_dir = "/opt/consul/"
connect {
  enabled = true
}
retry_join = ["test-server.lxd"]
ports {
  grpc = 8502
}


Server config

datacenter = "dc"
data_dir = "/opt/consul/"
connect {
  enabled = true
}
retry_join = [ "test-server.lxd" ]
server = true
bootstrap_expect = 1


Service definitions

service {
  id = "web-1"
  name = "web"
  port = 8080
  connect {
    sidecar_service {
      proxy {
        upstreams = [{
            destination_name = "redis"
            local_bind_port = 8128
        }]
      }
    }
  }
}
service {
  id = "redis-1"
  name = "redis"
  port = 15123
  connect {
    sidecar_service {}
  }
}

Operating system and Environment details

I'm running consul on LXD Containers. The Host is an Ubuntu Server 18.04 LTS and the Containers are Debian 10. Arch is amd64. I could reproduce the issue on Consul 1.6.0 and 1.6.1 as well as envoy 1.9.1, 1.11.0 and 1.11.1.

Log Fragments


Envoy Logs

[2019-09-22 15:09:41.201][5337][info][main] [source/server/server.cc:516] starting main dispatch loop
[2019-09-22 15:09:41.204][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain
[2019-09-22 15:09:41.204][5337][info][upstream] [source/common/upstream/cluster_manager_impl.cc:148] cm init: all clusters initialized
[2019-09-22 15:09:41.204][5337][info][main] [source/server/server.cc:500] all clusters initialized. initializing init manager
[2019-09-22 15:09:41.353][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain
[2019-09-22 15:09:41.353][5337][info][config] [source/server/listener_manager_impl.cc:761] all dependencies initialized. starting workers
[2019-09-22 15:09:41.590][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain
[2019-09-22 15:09:43.845][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain
[2019-09-22 15:09:51.225][5337][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 2, cannot create upstream cluster without discovery chain


Consul Agent logs

2019/09/22 15:09:41 [DEBUG] http: Request GET /v1/agent/services (1.105611ms) from=127.0.0.1:44378
2019/09/22 15:09:41 [DEBUG] http: Request GET /v1/agent/self (1.811474ms) from=127.0.0.1:44378
2019/09/22 15:09:41 [DEBUG] http: Request GET /v1/agent/service/web-1-sidecar-proxy (266.444µs) from=127.0.0.1:44378
2019/09/22 15:09:41 [WARN] agent: Check "service:redis-1-sidecar-proxy:1" socket connection failed: dial tcp 127.0.0.1:21000: connect: connection refused
2019/09/22 15:09:41 [DEBUG] Error handling ADS stream: cannot create upstream cluster without discovery chain
2019/09/22 15:09:41 [DEBUG] Error handling ADS stream: cannot create upstream cluster without discovery chain
2019/09/22 15:09:41 [DEBUG] Error handling ADS stream: cannot create upstream cluster without discovery chain
2019/09/22 15:09:43 [WARN] agent: Check "service:web-1-sidecar-proxy:1" socket connection failed: dial tcp 127.0.0.1:21001: connect: connection refused
2019/09/22 15:09:43 [DEBUG] Error handling ADS stream: cannot create upstream cluster without discovery chain


All 7 comments

Can confirm. I upgraded Consul to 1.6.1 with Envoy 1.11.1 and am facing the same issue.

I can confirm this issue too with Consul 1.6.0 / 1.6.1 and Envoy 1.11.1.

In addition, I noticed that the error only occurs when services are defined at agent boot. If they are defined afterwards and the configuration is reloaded with consul reload, the proxy starts normally. My systemd configuration is probably the best way to explain this (dirty) workaround:


/etc/systemd/system/consul.service

[Unit]
Description="HashiCorp Consul - A service mesh solution"
Documentation=https://www.consul.io/

Requires=network-online.target
After=network-online.target

[Service]
Type=notify
User=consul
Group=consul

# Hack to circumvent issue 6521
ExecStartPre=/bin/bash /opt/consul/consul_disable_services.sh
ExecStartPost=/bin/bash /opt/consul/consul_enable_services.sh
ExecStartPost=/usr/local/bin/consul reload

ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/ -config-dir=/etc/consul.d/services/
ExecReload=/usr/local/bin/consul reload

KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target


/opt/consul/consul_disable_services.sh

#!/bin/bash

# Rename service definitions so the agent skips them at boot (issue 6521).
shopt -s nullglob   # avoid a literal '*.json' when the directory is empty
for f in /etc/consul.d/services/*.json; do
  mv "$f" "$f.disabled"
done


/opt/consul/consul_enable_services.sh

#!/bin/bash

# Restore the original names so a consul reload picks the services up.
shopt -s nullglob   # avoid a literal glob when nothing is disabled
for f in /etc/consul.d/services/*.json.disabled; do
  mv "$f" "${f%.disabled}"
done
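
The disable/enable rename cycle can be exercised in isolation; here is a minimal sketch using a throwaway directory (the file names and the temporary path are illustrative, not from a real deployment):

```shell
#!/bin/sh
# Simulate the workaround's rename cycle in a temporary directory.
dir=$(mktemp -d)
touch "$dir/web.json" "$dir/redis.json"

# Disable: hide the service definitions from the agent's -config-dir scan.
for f in "$dir"/*.json; do
  mv "$f" "$f.disabled"
done

# Enable: restore the original names before running `consul reload`.
for f in "$dir"/*.json.disabled; do
  mv "$f" "${f%.disabled}"
done

ls "$dir"    # web.json and redis.json are back under their original names
rm -r "$dir"
```

The quoting and the `${f%.disabled}` suffix removal keep the loop safe for file names containing spaces, which the backtick-and-find version above would split.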

To reproduce the issue with Docker:


consul-envoy Dockerfile

FROM consul:latest
FROM envoyproxy/envoy:v1.11.1
COPY --from=0 /bin/consul /bin/consul
ENTRYPOINT ["consul", "connect", "envoy"]


./servers/consul.hcl

retry_join = ["consul_server"]

connect = {
  enabled = true
}

ports = {
  grpc = 8502
}

server = true
ui = false


./client/consul.hcl

retry_join = ["consul_server"]

bind_addr = "{{ GetAllInterfaces | include \"network\" \"172.18.0.0/24\" | attr \"address\" }}"
client_addr = "{{ GetAllInterfaces | include \"network\" \"172.18.1.0/24\" | attr \"address\" }}"

connect {
  enabled = true
}

ports  {
  grpc = 8502
}

server = false
ui = false

services {
  name = "echo-client"
  port = 8080
  connect {
    sidecar_service {
      proxy {
        upstreams {
          destination_name = "echo"
          local_bind_port = 9191
        }
      }
    }
  }
}
services {
  name = "echo"
  port = 9090
  connect {
    sidecar_service {}
  }
}


docker-compose.yml

version: '3.7'

services:
  consul_server:
    image: consul:1.6.1
    command: "agent -bootstrap-expect=1 -config-file /etc/consul/consul.hcl"
    volumes:
      - ./servers/:/etc/consul/:ro
    networks:
      - consul
  consul_client:
    image: consul:1.6.1
    command: "agent -config-file /etc/consul/consul.hcl"
    depends_on:
      - consul_server
    volumes:
      - ./client/:/etc/consul/:ro
    networks:
      - consul
      - app
  sidecar_echo_client:
    image: consul-envoy
    command: "-sidecar-for echo-client"
    depends_on:
      - consul_client
    networks:
      - app
    environment:
      CONSUL_GRPC_ADDR: "consul_client:8502"
      CONSUL_HTTP_ADDR: "consul_client:8500"
  sidecar_echo:
    image: consul-envoy
    command: "-sidecar-for echo"
    depends_on:
      - consul_client
    networks:
      - app
    environment:
      CONSUL_GRPC_ADDR: "consul_client:8502"
      CONSUL_HTTP_ADDR: "consul_client:8500"

networks:
  consul:
    ipam:
      driver: default
      config:
        - subnet: 172.18.0.0/24
  app:
    ipam:
      driver: default
      config:
        - subnet: 172.18.1.0/24

This will be fixed in an upcoming 1.6.x point release.

@rboyer I'm seeing this issue too. Could you suggest a workaround, or an older version without the bug?

It works when I build from master, but now it seems to forward requests to unhealthy nodes as well. I'll see if I can provide further info on this.

Hi @nitsh,

The master branch is the work-in-progress branch for the next major release (1.7.x) and isn't necessarily ready for general use. The release/1.6.x branch is the branch for the next minor release in the 1.6 series and should be safe to use (as long as you are comfortable running a prerelease build). This bug was fixed in master and backported for the next upcoming 1.6.x release as well.

It sounds like you are comfortable building from source. If so, try building from release/1.6.x instead of master.

FWIW, I was having the same problem and it was resolved by using Consul v1.5.3 instead of v1.6.1.

@rboyer, are you saying that you would expect this to be fixed in v1.6.2 when that is released? Thanks.

EDIT: It turns out I had a race condition: in my case, services were trying to register with Consul before it had started up. Just adding this in case anyone else hits the same issue. With things properly sequenced, I no longer see this issue and can use 1.6.1.
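
For anyone hitting the same race, one way to sequence things is to gate service registration on the agent actually being up. A minimal sketch of a retry gate (the probe command, endpoint, and retry limits below are assumptions, not something from this thread):

```shell
#!/bin/sh
# wait_for CMD...: retry a probe command until it succeeds, up to 30 attempts
# one second apart; returns 1 if the probe never succeeds.
wait_for() {
  attempts=0
  until "$@"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 30 ]; then
      return 1
    fi
    sleep 1
  done
}

# Hypothetical probe: block until the local agent reports a raft leader.
# wait_for curl -sf http://127.0.0.1:8500/v1/status/leader
```

Running something like this before the unit or script that registers services would avoid registering against an agent that hasn't finished starting.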

Hey there,

This issue has been automatically locked because it is closed and there hasn't been any activity for at least _30_ days.

If you are still experiencing problems, or still have questions, feel free to open a new issue :+1:.
