Envoy: get request errors: "no healthy upstream"

Created on 17 Nov 2019 · 9 Comments · Source: envoyproxy/envoy

Title: get request errors: "no healthy upstream"

Description:

Envoy receives its configuration dynamically from an xDS control plane.
Timeline:
2019-11-15 19:00:43: updated the cluster timeout and changed the config version.
2019-11-15 21:17:03: started getting a lot of request errors: grpc-status: 14, grpc-message: no healthy upstream
Of the 21 Envoy nodes deployed, two showed this error.
After an Envoy restart or reload, the affected node returns to normal.
Envoy version: 1.11.2

Config:
xds:

cache.NewSnapshotCache(false, xdHasher{}, xdLogger{})

envoy yaml:

admin:
  access_log_path: /data/log/envoybridge/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
dynamic_resources:
  ads_config:
    api_type: GRPC
    grpc_services:
    - envoy_grpc:
        cluster_name: xds_cluster
  cds_config:
    ads: {}
  lds_config:
    ads: {}

node:
  cluster: grpc-cluster
  id: grpc-node
static_resources:
  clusters:
  - name: xds_cluster
    connect_timeout: 1s
    type: strict_dns
    lb_policy: round_robin
    http2_protocol_options: {}
    load_assignment:
      cluster_name: xds_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: x.x.x.x
                port_value: 9000
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: x.x.x.x
                port_value: 9000
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: x.x.x.x
                port_value: 9000
  - name: log_cluster
    type: EDS
    connect_timeout: 0.1s
    lb_policy: ROUND_ROBIN
    http2_protocol_options: {}
    eds_cluster_config:
      service_name: log_cluster
      eds_config:
        api_config_source:
          api_type: GRPC
          grpc_services:
          - envoy_grpc:
              cluster_name: xds_cluster

Logs:

[2019-11-15 19:00:43.206][23415][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2019-11-15 19:03:03.399][23415][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2019-11-15 19:08:36.138][23415][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 13,
[2019-11-15 19:08:36.620][23415][info][upstream] [source/server/lds_api.cc:60] lds: add/update listener 'grpc-listener'
[2019-11-15 19:08:36.621][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:495] add/update cluster app.xxx starting warming
[2019-11-15 19:08:36.625][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:495] add/update cluster app.xxx starting warming
[2019-11-15 19:08:36.626][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:495] add/update cluster app.xxx starting warming
[2019-11-15 19:08:36.626][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:495] add/update cluster app.xxx starting warming
[2019-11-15 19:08:36.627][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:495] add/update cluster app.xxx starting warming
[2019-11-15 19:08:36.627][23415][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 19:08:36.627][23415][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 19:08:36.627][23415][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 19:08:36.627][23415][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 19:08:36.627][23415][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 19:08:52.511][23415][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2019-11-15 21:01:22.203][23415][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 13,
[2019-11-15 21:17:03.937][23415][warning][config] [bazel-out/k8-opt/bin/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:87] gRPC config stream closed: 13,
[2019-11-15 21:17:03.937][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:507] warming cluster app.xxx complete
[2019-11-15 21:17:03.938][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:507] warming cluster app.xxx complete
[2019-11-15 21:17:03.938][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:507] warming cluster app.xxx complete
[2019-11-15 21:17:03.938][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:507] warming cluster app.xxx complete
[2019-11-15 21:17:03.939][23415][info][upstream] [source/common/upstream/cluster_manager_impl.cc:507] warming cluster app.xxx complete
[2019-11-15 21:41:03.451][3324][info][main] [source/server/server.cc:238] initializing epoch 7 (hot restart version=11.104)
[2019-11-15 21:41:03.451][3324][info][main] [source/server/server.cc:240] statically linked extensions:
[2019-11-15 21:41:03.451][3324][info][main] [source/server/server.cc:242] access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2019-11-15 21:41:03.451][3324][info][main] [source/server/server.cc:245] filters.http: envoy.buffer,envoy.cors,envoy.csrf,envoy.ext_authz,envoy.fault,envoy.filters.http.dynamic_forward_proxy,envoy.filters.http.grpc_http1_reverse_bridge,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.original_src,envoy.filters.http.rbac,envoy.filters.http.tap,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash
[2019-11-15 21:41:03.451][3324][info][main] [source/server/server.cc:248] filters.listener: envoy.listener.original_dst,envoy.listener.original_src,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2019-11-15 21:41:03.451][3324][info][main] [source/server/server.cc:251] filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.dubbo_proxy,envoy.filters.network.mysql_proxy,envoy.filters.network.rbac,envoy.filters.network.sni_cluster,envoy.filters.network.thrift_proxy,envoy.filters.network.zookeeper_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy
[2019-11-15 21:41:03.452][3324][info][main] [source/server/server.cc:253] stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
[2019-11-15 21:41:03.452][3324][info][main] [source/server/server.cc:255] tracers: envoy.dynamic.ot,envoy.lightstep,envoy.tracers.datadog,envoy.tracers.opencensus,envoy.zipkin
[2019-11-15 21:41:03.452][3324][info][main] [source/server/server.cc:258] transport_sockets.downstream: envoy.transport_sockets.alts,envoy.transport_sockets.tap,raw_buffer,tls
[2019-11-15 21:41:03.452][3324][info][main] [source/server/server.cc:261] transport_sockets.upstream: envoy.transport_sockets.alts,envoy.transport_sockets.tap,raw_buffer,tls
[2019-11-15 21:41:03.452][3324][info][main] [source/server/server.cc:267] buffer implementation: old (libevent)
[2019-11-15 21:41:03.458][23415][warning][main] [source/server/server.cc:574] shutting down admin due to child startup
[2019-11-15 21:41:03.458][23415][warning][main] [source/server/server.cc:580] terminating parent process
[2019-11-15 21:41:03.459][3324][info][main] [source/server/server.cc:322] admin address: 0.0.0.0:9901
[2019-11-15 21:41:03.460][3324][info][main] [source/server/server.cc:432] runtime: layers:

  - name: base
    static_layer:
      {}
  - name: admin
    admin_layer:
      {}
[2019-11-15 21:41:03.460][3324][warning][runtime] [source/common/runtime/runtime_impl.cc:497] Skipping unsupported runtime layer: name: "base"
static_layer {
}

[2019-11-15 21:41:03.460][3324][info][config] [source/server/configuration_impl.cc:61] loading 0 static secret(s)
[2019-11-15 21:41:03.460][3324][info][config] [source/server/configuration_impl.cc:67] loading 2 cluster(s)
[2019-11-15 21:41:03.462][3324][info][upstream] [source/common/upstream/cluster_manager_impl.cc:124] cm init: initializing secondary clusters
[2019-11-15 21:41:03.463][3324][info][config] [source/server/configuration_impl.cc:71] loading 0 listener(s)
[2019-11-15 21:41:03.463][3324][info][config] [source/server/configuration_impl.cc:96] loading tracing configuration
[2019-11-15 21:41:03.463][3324][info][config] [source/server/configuration_impl.cc:116] loading stats sink configuration
[2019-11-15 21:41:03.463][3324][info][main] [source/server/server.cc:516] starting main dispatch loop
[2019-11-15 21:41:03.468][3324][info][upstream] [source/common/upstream/cluster_manager_impl.cc:144] cm init: initializing cds
[2019-11-15 21:41:03.469][3324][info][upstream] [source/common/upstream/cluster_manager_impl.cc:489] add/update cluster app.xxx during init
[2019-11-15 21:41:03.470][3324][info][upstream] [source/common/upstream/cluster_manager_impl.cc:489] add/update cluster app.xxx during init
[2019-11-15 21:41:03.471][3324][info][upstream] [source/common/upstream/cluster_manager_impl.cc:489] add/update cluster app.xxx during init
[2019-11-15 21:41:03.471][3324][info][upstream] [source/common/upstream/cluster_manager_impl.cc:489] add/update cluster app.xxx during init
[2019-11-15 21:41:03.472][3324][info][upstream] [source/common/upstream/cluster_manager_impl.cc:489] add/update cluster app.xxx during init
[2019-11-15 21:41:03.472][3324][info][upstream] [source/common/upstream/cluster_manager_impl.cc:124] cm init: initializing secondary clusters
[2019-11-15 21:41:03.475][3324][info][upstream] [source/common/upstream/cluster_manager_impl.cc:148] cm init: all clusters initialized
[2019-11-15 21:41:03.475][3324][info][main] [source/server/server.cc:500] all clusters initialized. initializing init manager
[2019-11-15 21:41:03.479][3324][info][upstream] [source/server/lds_api.cc:60] lds: add/update listener 'grpc-listener'
[2019-11-15 21:41:03.480][3324][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 21:41:03.480][3324][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 21:41:03.480][3324][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 21:41:03.481][3324][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 21:41:03.481][3324][warning][misc] [source/common/protobuf/utility.cc:199] Using deprecated option 'envoy.api.v2.route.CorsPolicy.allow_origin_regex' from file route.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-11-15 21:41:03.481][3324][info][config] [source/server/listener_manager_impl.cc:761] all dependencies initialized. starting workers

Normal node: (screenshot not preserved)
Abnormal node: (screenshot not preserved)

Labels: question, stale

All 9 comments

This seems to be a race between DNS resolution/cluster warming and readiness. Do you mind attaching full snippets of your logs with debug level logging? I assume that the errors you are talking about are requests to the web.interface_cluster that you are pointing out in the screenshots?

The error is not reliably reproducible; it occurred unexpectedly. My service had been running in production for over a month. I updated the dynamic configuration, and the error occurred two hours later. Restarting the node returned it to normal.

My ADS mode is set to false. Could that have an effect?

Does the panic threshold need to be configured to take effect?

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

@tonyboxes Hello Tony, have you solved your issue? I'm currently facing the same problem.

Hi @tonyboxes, I'm facing the same issue. Is there any solution?

Can you show your gRPC code? I want to learn how to use a gRPC server to serve xDS configuration.
