Istio: Sporadic 503 errors

Created on 3 Oct 2018  ·  130 Comments  ·  Source: istio/istio

Describe the bug
I have a web application in a Pod running with an Istio sidecar, and I get random 503 errors from the sidecar itself (Envoy) that never even reach the web application. This happens under very low load. In the logs below you will see a request to an endpoint named /logout that returns a 503 without reaching the web application whatsoever.

Expected behavior
Everything is forwarded to the web application.

Steps to reproduce the bug
A simple test invoking several endpoints of the web application, with a very small load.

Version
Istio --> 1.0.1
Kubernetes --> 1.11.3

Installation
istio.yaml (the generated file for installation without authentication enabled)

Environment
Bare metal servers, on premise, OL7.5, Linux 4.17.5-1.el7.elrepo.x86_64 #1 SMP Sun Jul 8 10:40:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

Logs

preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][http] external/envoy/source/common/http/conn_manager_impl.cc:190] [C32] new stream
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=host value=pp-helpers.test.oami.eu
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=cookie value=CLASSIFSESSIONID=MjM2N2MyOWUtODFhOS00ZGVjLTkwNjktOTQ5NzhlZTc2ZDk1
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=content-type value=application/x-www-form-urlencoded
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=content-length value=0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=user-agent value=Apache-HttpClient/4.5.5 (Java/1.8.0_151)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-forwarded-for value=10.133.0.44, 174.16.212.0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-forwarded-proto value=http
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-envoy-external-address value=174.16.212.0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-request-id value=82aeb468-9f91-9b30-9482-2aa3591e1979
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-envoy-decorator-operation value=classification-helper-ui.preprod-cb.svc.cluster.local:8080/api/classification*
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-b3-traceid value=e4ece8f07a5a7481
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-b3-spanid value=e4ece8f07a5a7481
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-b3-sampled value=1
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-istio-attributes value=Ck8KCnNvdXJjZS51aWQSQRI/a3ViZXJuZXRlczovL2lzdGlvLWluZ3Jlc3NnYXRld2F5LTVkOTk5Yzg3NTctZHY5cnMuaXN0aW8tc3lzdGVtCk4KE2Rlc3RpbmF0aW9uLnNlcnZpY2USNxI1Y2xhc3NpZmljYXRpb24taGVscGVyLXVpLnByZXByb2QtY2Iuc3ZjLmNsdXN0ZXIubG9jYWwKUwoYZGVzdGluYXRpb24uc2VydmljZS5ob3N0EjcSNWNsYXNzaWZpY2F0aW9uLWhlbHBlci11aS5wcmVwcm9kLWNiLnN2Yy5jbHVzdGVyLmxvY2FsClEKF2Rlc3RpbmF0aW9uLnNlcnZpY2UudWlkEjYSNGlzdGlvOi8vcHJlcHJvZC1jYi9zZXJ2aWNlcy9jbGFzc2lmaWNhdGlvbi1oZWxwZXItdWkKLQodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USDBIKcHJlcHJvZC1jYgo2ChhkZXN0aW5hdGlvbi5zZXJ2aWNlLm5hbWUSGhIYY2xhc3NpZmljYXRpb24taGVscGVyLXVp
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:405] [C32] headers complete
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:309] [C32] completed header: key=x-envoy-original-path value=/api/classification/logout
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:426] [C32] message complete
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][filter] src/envoy/http/mixer/filter.cc:60] Called Mixer::Filter : Filter
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][filter] src/envoy/http/mixer/filter.cc:204] Called Mixer::Filter : setDecoderFilterCallbacks
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][http] external/envoy/source/common/http/conn_manager_impl.cc:889] [C32][S16141632285256672176] request end stream
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][http] external/envoy/source/common/http/conn_manager_impl.cc:490] [C32][S16141632285256672176] request headers complete (end_stream=true):
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':authority', 'pp-helpers.test.oami.eu'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':path', '//logout'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':method', 'POST'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'cookie', 'CLASSIFSESSIONID=MjM2N2MyOWUtODFhOS00ZGVjLTkwNjktOTQ5NzhlZTc2ZDk1'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'content-type', 'application/x-www-form-urlencoded'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'content-length', '0'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'user-agent', 'Apache-HttpClient/4.5.5 (Java/1.8.0_151)'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-forwarded-for', '10.133.0.44, 174.16.212.0'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-forwarded-proto', 'http'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-envoy-external-address', '174.16.212.0'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-request-id', '82aeb468-9f91-9b30-9482-2aa3591e1979'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-envoy-decorator-operation', 'classification-helper-ui.preprod-cb.svc.cluster.local:8080/api/classification*'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-traceid', 'e4ece8f07a5a7481'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-spanid', 'e4ece8f07a5a7481'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-sampled', '1'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-istio-attributes', 'Ck8KCnNvdXJjZS51aWQSQRI/a3ViZXJuZXRlczovL2lzdGlvLWluZ3Jlc3NnYXRld2F5LTVkOTk5Yzg3NTctZHY5cnMuaXN0aW8tc3lzdGVtCk4KE2Rlc3RpbmF0aW9uLnNlcnZpY2USNxI1Y2xhc3NpZmljYXRpb24taGVscGVyLXVpLnByZXByb2QtY2Iuc3ZjLmNsdXN0ZXIubG9jYWwKUwoYZGVzdGluYXRpb24uc2VydmljZS5ob3N0EjcSNWNsYXNzaWZpY2F0aW9uLWhlbHBlci11aS5wcmVwcm9kLWNiLnN2Yy5jbHVzdGVyLmxvY2FsClEKF2Rlc3RpbmF0aW9uLnNlcnZpY2UudWlkEjYSNGlzdGlvOi8vcHJlcHJvZC1jYi9zZXJ2aWNlcy9jbGFzc2lmaWNhdGlvbi1oZWxwZXItdWkKLQodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USDBIKcHJlcHJvZC1jYgo2ChhkZXN0aW5hdGlvbi5zZXJ2aWNlLm5hbWUSGhIYY2xhc3NpZmljYXRpb24taGVscGVyLXVp'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-envoy-original-path', '/api/classification/logout'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][filter] src/envoy/http/mixer/filter.cc:122] Called Mixer::Filter : decodeHeaders
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][filter] ./src/envoy/utils/header_update.h:46] Mixer forward attributes set: ClEKCnNvdXJjZS51aWQSQxJBa3ViZXJuZXRlczovL2NsYXNzaWZpY2F0aW9uLWhlbHBlci11aS03NmY5Y2RjZDRkLXJ6Nm45LnByZXByb2QtY2I=
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][router] external/envoy/source/common/router/router.cc:252] [C0][S14382728320977988265] cluster 'outbound|9091||istio-policy.istio-system.svc.cluster.local' match for URL '/istio.mixer.v1.Mixer/Check'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][router] external/envoy/source/common/router/router.cc:303] [C0][S14382728320977988265] router decoding headers:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':method', 'POST'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':path', '/istio.mixer.v1.Mixer/Check'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':authority', 'mixer'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':scheme', 'http'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'te', 'trailers'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'grpc-timeout', '5000m'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'content-type', 'application/grpc'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-traceid', 'e4ece8f07a5a7481'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-spanid', 'df547f7d4992bf4e'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-parentspanid', 'e4ece8f07a5a7481'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-sampled', '1'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-istio-attributes', 'ClEKCnNvdXJjZS51aWQSQxJBa3ViZXJuZXRlczovL2NsYXNzaWZpY2F0aW9uLWhlbHBlci11aS03NmY5Y2RjZDRkLXJ6Nm45LnByZXByb2QtY2I='
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-envoy-internal', 'true'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-forwarded-for', '174.16.213.27'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-envoy-expected-rq-timeout-ms', '5000'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][pool] external/envoy/source/common/http/http2/conn_pool.cc:97] [C236] creating stream
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][debug][router] external/envoy/source/common/router/router.cc:971] [C0][S14382728320977988265] pool ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http2] external/envoy/source/common/http/http2/codec_impl.cc:492] [C236] send data: bytes=85
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:326] [C236] writing 85 bytes, end_stream false
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http2] external/envoy/source/common/http/http2/codec_impl.cc:446] [C236] sent frame type=1
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][router] external/envoy/source/common/router/router.cc:871] [C0][S14382728320977988265] proxying 985 bytes
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:326] [C236] writing 994 bytes, end_stream false
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.958][95][trace][http2] external/envoy/source/common/http/http2/codec_impl.cc:446] [C236] sent frame type=0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][debug][grpc] src/envoy/utils/grpc_transport.cc:46] Sending Check request: attributes {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "inbound"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "x-envoy-original-path"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "/api/classification/logout"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "Apache-HttpClient/4.5.5 (Java/1.8.0_151)"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "e4ece8f07a5a7481"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "//logout"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "x-envoy-external-address"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "174.16.212.0"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "application/x-www-form-urlencoded"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "CLASSIFSESSIONID=MjM2N2MyOWUtODFhOS00ZGVjLTkwNjktOTQ5NzhlZTc2ZDk1"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "pp-helpers.test.oami.eu"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "82aeb468-9f91-9b30-9482-2aa3591e1979"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "10.133.0.44, 174.16.212.0"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "classification-helper-ui.preprod-cb.svc.cluster.local"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "preprod-cb"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "classification-helper-ui"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "kubernetes://classification-helper-ui-76f9cdcd4d-rz6n9.preprod-cb"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "origin.ip"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "kubernetes://istio-ingressgateway-5d999c8757-dv9rs.istio-system"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   words: "istio://preprod-cb/services/classification-helper-ui"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 3
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -19
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 17
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -6
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 18
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -11
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 19
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: 91
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 22
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: 92
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 25
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -4
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 131
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: 92
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 152
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -14
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 154
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -17
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 155
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -15
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 190
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -20
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 191
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -16
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 192
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -15
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 193
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -14
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 197
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -17
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   strings {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 201
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: -1
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   int64s {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 151
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: 8080
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   bools {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 177
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: false
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   timestamps {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 24
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       seconds: 1538556462
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       nanos: 958551761
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   bytes {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: -18
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: "\256\020=8"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   bytes {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 150
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value: "\000\000\000\000\000\000\000\000\000\000\377\377\256\020\325\033"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   string_maps {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     key: 15
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     value {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: -7
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -8
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: -2
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -3
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 31
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -11
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 32
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: 91
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 33
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -6
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 55
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: 134
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 58
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -9
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 59
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -10
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 86
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -4
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 98
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -13
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 100
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: 92
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 102
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -12
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 121
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -5
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 122
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: 135
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       entries {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         key: 123
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:         value: -5
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: global_word_count: 203
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: deduplication_id: "231e89c0-6dcb-4ada-a813-375cedb8a64427"
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][debug][filter] src/envoy/http/mixer/filter.cc:146] Called Mixer::Filter : decodeHeaders Stop
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:719] [C32][S16141632285256672176] decode headers called: filter=0x115c9040 status=1
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:362] [C32] parsed 1253 bytes
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:232] [C32] readDisable: enabled=true disable=true
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:389] [C236] socket event: 2
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:457] [C236] write ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][trace][connection] external/envoy/source/common/network/raw_buffer_socket.cc:62] [C236] write returns: 1079
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:389] [C32] socket event: 2
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.959][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:457] [C32] write ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:389] [C236] socket event: 3
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:457] [C236] write ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:427] [C236] read ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][connection] external/envoy/source/common/network/raw_buffer_socket.cc:21] [C236] read returns: 103
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][connection] external/envoy/source/common/network/raw_buffer_socket.cc:21] [C236] read returns: -1
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][connection] external/envoy/source/common/network/raw_buffer_socket.cc:29] [C236] read error: 11
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][http2] external/envoy/source/common/http/http2/codec_impl.cc:277] [C236] dispatching 103 bytes
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][http2] external/envoy/source/common/http/http2/codec_impl.cc:335] [C236] recv frame type=1
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][router] external/envoy/source/common/router/router.cc:583] [C0][S14382728320977988265] upstream headers complete: end_stream=false
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][http] external/envoy/source/common/http/async_client_impl.cc:93] async http request response headers (end_stream=false):
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':status', '200'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'content-type', 'application/grpc'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-envoy-upstream-service-time', '1'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'date', 'Wed, 03 Oct 2018 08:47:42 GMT'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'server', 'envoy'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-envoy-decorator-operation', 'Check'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][http2] external/envoy/source/common/http/http2/codec_impl.cc:335] [C236] recv frame type=0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][http] external/envoy/source/common/http/async_client_impl.cc:101] async http request response data (length=45 end_stream=false)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][http2] external/envoy/source/common/http/http2/codec_impl.cc:335] [C236] recv frame type=1
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][client] external/envoy/source/common/http/codec_client.cc:94] [C236] response complete
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][main] external/envoy/source/common/event/dispatcher_impl.cc:126] item added to deferred deletion list (size=1)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][pool] external/envoy/source/common/http/http2/conn_pool.cc:189] [C236] destroying stream: 0 remaining
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][http] external/envoy/source/common/http/async_client_impl.cc:108] async http request response trailers:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'grpc-status', '0'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'grpc-message', ''
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][grpc] src/envoy/utils/grpc_transport.cc:67] Check response: precondition {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   status {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   valid_duration {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     seconds: 46
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     nanos: 386564935
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   valid_use_count: 10000
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   referenced_attributes {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     attribute_matches {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       name: 155
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       condition: EXACT
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     attribute_matches {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       name: 201
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       condition: EXACT
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     attribute_matches {
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       name: 152
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:       condition: EXACT
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:     }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:   }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: }
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][filter] src/envoy/http/mixer/filter.cc:211] Called Mixer::Filter : check complete OK
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:1296] [C32][S16141632285256672176] continuing filter chain: filter=0x115c9040
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:719] [C32][S16141632285256672176] decode headers called: filter=0x12d6f310 status=0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:719] [C32][S16141632285256672176] decode headers called: filter=0x12d6fcc0 status=0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][router] external/envoy/source/common/router/router.cc:252] [C32][S16141632285256672176] cluster 'inbound|8080||classification-helper-ui.preprod-cb.svc.cluster.local' match for URL '//logout'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][router] external/envoy/source/common/router/router.cc:303] [C32][S16141632285256672176] router decoding headers:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':authority', 'pp-helpers.test.oami.eu'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':path', '//logout'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':method', 'POST'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':scheme', 'http'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'cookie', 'CLASSIFSESSIONID=MjM2N2MyOWUtODFhOS00ZGVjLTkwNjktOTQ5NzhlZTc2ZDk1'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'content-type', 'application/x-www-form-urlencoded'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'content-length', '0'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'user-agent', 'Apache-HttpClient/4.5.5 (Java/1.8.0_151)'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-forwarded-for', '10.133.0.44, 174.16.212.0'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-forwarded-proto', 'http'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-envoy-external-address', '174.16.212.0'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-request-id', '82aeb468-9f91-9b30-9482-2aa3591e1979'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-traceid', 'e4ece8f07a5a7481'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-spanid', 'e4ece8f07a5a7481'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-b3-sampled', '1'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'x-envoy-original-path', '/api/classification/logout'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][pool] external/envoy/source/common/http/http1/conn_pool.cc:89] [C502] using existing connection
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:232] [C502] readDisable: enabled=false disable=false
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][router] external/envoy/source/common/router/router.cc:971] [C32][S16141632285256672176] pool ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:326] [C502] writing 546 bytes, end_stream false
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:719] [C32][S16141632285256672176] decode headers called: filter=0x111d3c20 status=1
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][main] external/envoy/source/common/event/dispatcher_impl.cc:126] item added to deferred deletion list (size=2)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][main] external/envoy/source/common/event/dispatcher_impl.cc:126] item added to deferred deletion list (size=3)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][http2] external/envoy/source/common/http/http2/codec_impl.cc:501] [C236] stream closed: 0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][main] external/envoy/source/common/event/dispatcher_impl.cc:126] item added to deferred deletion list (size=4)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][http2] external/envoy/source/common/http/http2/codec_impl.cc:292] [C236] dispatched 103 bytes
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][main] external/envoy/source/common/event/dispatcher_impl.cc:52] clearing deferred deletion list (size=4)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:389] [C502] socket event: 2
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:457] [C502] write ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/raw_buffer_socket.cc:62] [C502] write returns: 546
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:389] [C502] socket event: 3
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:457] [C502] write ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:427] [C502] read ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/raw_buffer_socket.cc:21] [C502] read returns: 0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][connection] external/envoy/source/common/network/connection_impl.cc:451] [C502] remote close
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][connection] external/envoy/source/common/network/connection_impl.cc:133] [C502] closing socket: 0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:341] [C502] parsing 0 bytes
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][http] external/envoy/source/common/http/http1/codec_impl.cc:362] [C502] parsed 0 bytes
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][client] external/envoy/source/common/http/codec_client.cc:81] [C502] disconnect. resetting 1 pending requests
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][client] external/envoy/source/common/http/codec_client.cc:104] [C502] request reset
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][main] external/envoy/source/common/event/dispatcher_impl.cc:126] item added to deferred deletion list (size=1)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][router] external/envoy/source/common/router/router.cc:457] [C32][S16141632285256672176] upstream reset
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][filter] src/envoy/http/mixer/filter.cc:191] Called Mixer::Filter : encodeHeaders 2
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:997] [C32][S16141632285256672176] encode headers called: filter=0x11d85140 status=0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:997] [C32][S16141632285256672176] encode headers called: filter=0x11305dc0 status=0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][http] external/envoy/source/common/http/conn_manager_impl.cc:1083] [C32][S16141632285256672176] encoding headers via codec (end_stream=false):
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: ':status', '503'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'content-length', '57'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'content-type', 'text/plain'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'date', 'Wed, 03 Oct 2018 08:47:42 GMT'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: 'server', 'envoy'
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]:
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:326] [C32] writing 134 bytes, end_stream false
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:1157] [C32][S16141632285256672176] encode data called: filter=0x11d85140 status=0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:1157] [C32][S16141632285256672176] encode data called: filter=0x11305dc0 status=0
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][http] external/envoy/source/common/http/conn_manager_impl.cc:1170] [C32][S16141632285256672176] encoding data via codec (size=57 end_stream=true)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:326] [C32] writing 57 bytes, end_stream false
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][filter] src/envoy/http/mixer/filter.cc:257] Called Mixer::Filter : onDestroy state: 2
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][main] external/envoy/source/common/event/dispatcher_impl.cc:126] item added to deferred deletion list (size=2)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:232] [C32] readDisable: enabled=false disable=false
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][pool] external/envoy/source/common/http/http1/conn_pool.cc:122] [C502] client disconnected
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][main] external/envoy/source/common/event/dispatcher_impl.cc:126] item added to deferred deletion list (size=3)
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][main] external/envoy/source/common/event/dispatcher_impl.cc:52] clearing deferred deletion list (size=3)
Labels: area/networking

Most helpful comment

FYI: there is a chance some of these 503s might be related to this bug, where Envoy doesn't acknowledge a FIN packet immediately after getting a response from upstream and instead returns the connection to the connection pool, which will trigger a 503 on the next write to the stream: https://github.com/envoyproxy/envoy/issues/6190.

We run Envoy in a cluster that currently gets about 1k requests/minute on average and used to see 503s left and right across different services after about 15 minutes or so of a fresh deployment. We swept them under the rug by doing retries but, as others mentioned, that can't be a long-term solution due to its potential side effects on non-idempotent requests. We recently started playing around with the upstream idle timeout values introduced by https://github.com/envoyproxy/envoy/pull/2714 and haven't had any 503s within the past week (with the same amount of traffic). It would be interesting to see if envoyproxy/envoy#6190 would fix any/most of these mysterious connection resets, but I think it would make sense for Istio to at the very least support this idle timeout option for upstream connections (option 4).
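For reference, here is a minimal sketch of what such an upstream idle timeout looks like at the raw Envoy level (the cluster name, address, and timeout value below are made up for illustration; the field is the cluster-level common_http_protocol_options idle_timeout added around envoyproxy/envoy#2714). The idea is to stop Envoy's pool from holding a connection open longer than the upstream server is willing to, so the pool never hands out a connection the server has already decided to close:

# Envoy cluster config sketch (hypothetical names and values)
clusters:
- name: some_upstream
  connect_timeout: 1s
  type: STRICT_DNS
  hosts:
  - socket_address: { address: upstream.example.local, port_value: 8080 }
  common_http_protocol_options:
    idle_timeout: 30s   # close pooled upstream connections idle longer than the server's own keep-alive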

All 130 comments

Still facing sporadic 503s, but maybe it is related to the HPA autoscaling the control plane and impacting applications. Does this make sense?

Most probably fixed with #9805 for #9480.

It won't be fixed, by the way. Sporadic 503s are a common issue in Istio at the moment due to delays in pushing mesh config out to all Envoy sidecars.

The fixes you have linked are specific to an issue in 1.0.3, but the OP is on 1.0.1.

One option suggested in the istio-users group is to disable policy checks:

  # disablePolicyChecks disables mixer policy checks.
  # Will set the value with same name in istio config map - pilot needs to be restarted to take effect.
  disablePolicyChecks: true
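For context, a sketch of where this flag sits if installing via the Helm chart (assuming the 1.0.x values.yaml layout; as the comment above says, Pilot still needs a restart for the mesh config change to take effect):

# values.yaml sketch (assumed layout)
global:
  disablePolicyChecks: true   # skips the Mixer policy Check call on the request path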

Reopening and relating to #7665

I keep suffering from this issue...

Using Istio 1.0.4 here.

I'm struggling with random 503s that I get in a dead-simple application, even when just fetching the JS to display the page. After setting the proxy logging level to debug, I can spot this:

[2019-02-19 09:52:50.754][35][debug][http] external/envoy/source/common/http/conn_manager_impl.cc:190] [C5963] new stream
[2019-02-19 09:52:50.754][35][debug][filter] src/envoy/http/mixer/filter.cc:60] Called Mixer::Filter : Filter
[2019-02-19 09:52:50.754][35][debug][filter] src/envoy/http/mixer/filter.cc:204] Called Mixer::Filter : setDecoderFilterCallbacks
[2019-02-19 09:52:50.754][35][debug][http] external/envoy/source/common/http/conn_manager_impl.cc:889] [C5963][S17073327596826725699] request end stream
[2019-02-19 09:52:50.754][35][debug][http] external/envoy/source/common/http/conn_manager_impl.cc:490] [C5963][S17073327596826725699] request headers complete (end_stream=true):
‘:authority’, ‘pp-helpers.test.oami.eu’
‘:path’, ‘/client.min.js?version=1.7.0’
‘:method’, ‘GET’
‘user-agent’, ‘Apache-HttpClient/4.5.6 (Java/1.8.0_151)’
‘x-forwarded-for’, ‘10.133.0.44, 10.136.106.52’
‘x-forwarded-proto’, ‘http’
‘x-envoy-external-address’, ‘10.136.106.52’
‘x-request-id’, ‘817c0d38-f489-9023-9f46-706f504357d5’
‘x-envoy-decorator-operation’, ‘helpers-frontend.preprod-cb.svc.cluster.local:80/*’
‘x-b3-traceid’, ‘b421de1126053c6d’
‘x-b3-spanid’, ‘b421de1126053c6d’
‘x-b3-sampled’, ‘1’
‘x-istio-attributes’, ‘Ck8KCnNvdXJjZS51aWQSQRI/a3ViZXJuZXRlczovL2lzdGlvLWluZ3Jlc3NnYXRld2F5LTY5OTZkNTY2ZDQtYmNtamYuaXN0aW8tc3lzdGVtCkYKE2Rlc3RpbmF0aW9uLnNlcnZpY2USLxItaGVscGVycy1mcm9udGVuZC5wcmVwcm9kLWNiLnN2Yy5jbHVzdGVyLmxvY2FsCksKGGRlc3RpbmF0aW9uLnNlcnZpY2UuaG9zdBIvEi1oZWxwZXJzLWZyb250ZW5kLnByZXByb2QtY2Iuc3ZjLmNsdXN0ZXIubG9jYWwKSQoXZGVzdGluYXRpb24uc2VydmljZS51aWQSLhIsaXN0aW86Ly9wcmVwcm9kLWNiL3NlcnZpY2VzL2hlbHBlcnMtZnJvbnRlbmQKLQodZGVzdGluYXRpb24uc2VydmljZS5uYW1lc3BhY2USDBIKcHJlcHJvZC1jYgouChhkZXN0aW5hdGlvbi5zZXJ2aWNlLm5hbWUSEhIQaGVscGVycy1mcm9udGVuZA==’
‘content-length’, ‘0’
[2019-02-19 09:52:50.754][35][debug][filter] src/envoy/http/mixer/filter.cc:122] Called Mixer::Filter : decodeHeaders
[2019-02-19 09:52:50.754][35][debug][filter] src/envoy/http/mixer/filter.cc:211] Called Mixer::Filter : check complete OK
[2019-02-19 09:52:50.754][35][debug][router] external/envoy/source/common/router/router.cc:252] [C5963][S17073327596826725699] cluster ‘inbound|80||helpers-frontend.preprod-cb.svc.cluster.local’ match for URL ‘/client.min.js?version=1.7.0’
[2019-02-19 09:52:50.754][35][debug][router] external/envoy/source/common/router/router.cc:303] [C5963][S17073327596826725699] router decoding headers:
‘:authority’, ‘pp-helpers.test.oami.eu’
‘:path’, ‘/client.min.js?version=1.7.0’
‘:method’, ‘GET’
‘:scheme’, ‘http’
‘user-agent’, ‘Apache-HttpClient/4.5.6 (Java/1.8.0_151)’
‘x-forwarded-for’, ‘10.133.0.44, 10.136.106.52’
‘x-forwarded-proto’, ‘http’
‘x-envoy-external-address’, ‘10.136.106.52’
‘x-request-id’, ‘817c0d38-f489-9023-9f46-706f504357d5’
‘x-b3-traceid’, ‘b421de1126053c6d’
‘x-b3-spanid’, ‘b421de1126053c6d’
‘x-b3-sampled’, ‘1’
‘content-length’, ‘0’
[2019-02-19 09:52:50.754][35][debug][pool] external/envoy/source/common/http/http1/conn_pool.cc:89] [C6024] using existing connection
[2019-02-19 09:52:50.754][35][debug][router] external/envoy/source/common/router/router.cc:971] [C5963][S17073327596826725699] pool ready
[2019-02-19 09:52:50.755][35][debug][connection] external/envoy/source/common/network/connection_impl.cc:451] [C6024] remote close
[2019-02-19 09:52:50.755][35][debug][connection] external/envoy/source/common/network/connection_impl.cc:133] [C6024] closing socket: 0
[2019-02-19 09:52:50.755][35][debug][client] external/envoy/source/common/http/codec_client.cc:81] [C6024] disconnect. resetting 1 pending requests
[2019-02-19 09:52:50.755][35][debug][client] external/envoy/source/common/http/codec_client.cc:104] [C6024] request reset
[2019-02-19 09:52:50.755][35][debug][router] external/envoy/source/common/router/router.cc:457] [C5963][S17073327596826725699] upstream reset
[2019-02-19 09:52:50.755][35][debug][filter] src/envoy/http/mixer/filter.cc:191] Called Mixer::Filter : encodeHeaders 2
[2019-02-19 09:52:50.755][35][debug][http] external/envoy/source/common/http/conn_manager_impl.cc:1083] [C5963][S17073327596826725699] encoding headers via codec (end_stream=false):
‘:status’, ‘503’
‘content-length’, ‘57’
‘content-type’, ‘text/plain’
‘date’, ‘Tue, 19 Feb 2019 09:52:50 GMT’
‘server’, ‘envoy’
[2019-02-19 09:52:50.755][35][debug][filter] src/envoy/http/mixer/filter.cc:257] Called Mixer::Filter : onDestroy state: 2
[2019-02-19 09:52:50.755][35][debug][pool] external/envoy/source/common/http/http1/conn_pool.cc:122] [C6024] client disconnected
[2019-02-19 09:52:50.755][35][debug][filter] src/envoy/http/mixer/filter.cc:273] Called Mixer::Filter : log

In the target application, I don’t see any 503 whatsoever. Everything is always 200:

“GET /client.min.js?version=1.7.0 HTTP/1.1” 200 8774206

Can anyone help out here with understanding what's going on? If I remove Istio and route traffic using the Nginx Ingress Controller instead, everything works perfectly.

Thanks!

I see the same issue with Istio service mesh sidecars, the Istio IngressGateway, and the Ambassador API Gateway. I believe the root cause is a race condition between the TCP keepalive setting and the stream idle timeout. Istio does connection pooling of HTTP connections under the covers. If a connection is idle for 5 minutes, the TCP keepalive should fire to keep the connection alive. But the stream idle timeout is also 5 minutes, so a race condition exists.

I've commented more about it here:
https://github.com/istio/istio/issues/9906

We were able to correct the issue by setting the spec->http->retries->attempts field on the VirtualService manifest. In Istio v1.1 they've introduced the retryOn configuration setting for Envoy, so you can retry only on 5xx errors. They've also introduced the ability to set the stream idle timeout. I haven't tested this hypothesis against the snapshot builds of 1.1; we'll resume testing when 1.1 goes GA.
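A minimal sketch of that kind of retry policy on a VirtualService (the service and namespace names are placeholders; retryOn requires Istio 1.1+, while attempts alone already works on 1.0):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service                 # placeholder
spec:
  hosts:
  - my-service.my-namespace.svc.cluster.local
  http:
  - route:
    - destination:
        host: my-service.my-namespace.svc.cluster.local
    retries:
      attempts: 3                  # the spec->http->retries->attempts field mentioned above
      perTryTimeout: 2s
      retryOn: 5xx                 # Istio 1.1+: only retry on 5xx responses

Note that retries mask the symptom rather than fix it and, as discussed above, are risky for non-idempotent requests, so treat this as a mitigation.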

After some research, I think I spotted the issue for my use case: it's related to the HAProxy I use as the load balancer in front of the Istio Ingress Gateway; when I skip the HAProxy and use a NodePort directly, it works perfectly.

I presume there is some misalignment in low-level TCP configuration between the HAProxy and the Envoy running inside the Ingress Gateway (or something similar), but I still need to figure out what's going on.

Any help on this regard would be highly appreciated.

This is something the OpenShift folks might have some thoughts on as OpenShift by default uses an HAProxy ingress. @knrc @jwendell ?

@christian-posta I'm not aware of this impacting us as of yet but I'll keep an eye on it, the race mentioned above does seem to be a likely suspect. @emedina if you have a reproducer then I can take a look to see how it behaves on our system.

@knrc @christian-posta Going through testing to try to figure out the right configuration, but as an initial teaser, changing `option httpclose` to `option http-server-close` in HAProxy seems to lower the number of 503s, although they still appear.

Also did some basic testing on Google Cloud with httpbin and fortio, running a minimal load test (10 qps using 10 threads for 10m) against the /html and /json endpoints, and I get 503 errors:

Code 200 : 5998 (100.0 %)
Code 503 : 2 (0.0 %)

I know that Google Cloud may not be using the latest Istio, but I'm wondering how Google can offer this functionality when such a basic test fails so easily.

And I presume this issue is still there... because I keep getting it even with latest 1.1-rc.

More feedback...

Tried 1.1.0-rc.0 in a clean LAB environment running Kubernetes v1.13.2 and it's still throwing sporadic 503 errors with a simple load test using fortio against two endpoints of httpbin:

kubectl exec -ti fortio-deploy-5588f84b45-5xx5v -- fortio load -qps 10 -t 600s -c 10 -httpbufferkb 8569 -H "Host: httpbin.org" http://10.138.132.8:32500/json

kubectl exec -ti fortio-deploy-5588f84b45-5xx5v -- fortio load -qps 10 -t 600s -c 10 -httpbufferkb 8569 -H "Host: httpbin.org" http://10.138.132.8:32500/html

As you can see, I'm using the NodePort to call the Ingress Gateway, so there is no external impact from HAProxy (just in case). Additionally, this is definitely a very low-demand load test running for 10m. However, there are several 503 errors:

From the endpoint /html:

08:58:38 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
08:58:42 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
08:58:45 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
09:06:01 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
From the endpoint /json:

09:05:47 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
09:05:56 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
09:06:00 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)

The logs from the Ingress Gateway are:

# kail -l app=istio-ingressgateway --since 10m | grep ' 503 '
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T08:58:38.611Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 2 - "174.16.249.64" "fortio.org/fortio-1.3.1" "ad930b2f-b08c-99bf-93b3-e93d4d3aa475" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:33416 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T08:58:42.641Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 19 - "174.16.249.64" "fortio.org/fortio-1.3.1" "447f4df1-99fe-9efa-9302-f2084120baaa" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:33421 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T08:58:45.874Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 6 - "174.16.249.64" "fortio.org/fortio-1.3.1" "13c3e002-6f69-9575-a642-3e130cc1aeea" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:33415 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T09:05:47.855Z] "GET /json HTTP/1.1" 503 UC "-" 0 57 15 - "174.16.249.64" "fortio.org/fortio-1.3.1" "96cb8b74-7a29-9829-8bde-30d58b70d353" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:33554 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T09:05:55.955Z] "GET /json HTTP/1.1" 503 UC "-" 0 57 26 - "174.16.249.64" "fortio.org/fortio-1.3.1" "dce86649-0f9e-946f-b15a-4e48300f45b8" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:33561 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T09:05:59.924Z] "GET /json HTTP/1.1" 503 UC "-" 0 57 146 - "174.16.249.64" "fortio.org/fortio-1.3.1" "fb921a6f-2f04-993b-9e45-25349c3962f1" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:33550 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T09:06:01.426Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 0 - "174.16.249.64" "fortio.org/fortio-1.3.1" "e821b8b9-4273-9814-bb3a-e13d6749a35d" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:33419 -

No apparent issues in httpbin itself:

# kubectl logs -f httpbin-54f5bb4957-sgt4v
[2019-02-20 16:32:08 +0000] [1] [INFO] Starting gunicorn 19.9.0
[2019-02-20 16:32:08 +0000] [1] [INFO] Listening at: http://0.0.0.0:80 (1)
[2019-02-20 16:32:08 +0000] [1] [INFO] Using worker: gevent
[2019-02-20 16:32:08 +0000] [10] [INFO] Booting worker with pid: 10

And eventually the fortio load test concludes reporting the issues:

For the endpoint `/json`:

Code 200 : 5997 (100.0 %)
Code 503 : 3 (0.1 %)

For the endpoint `/html`:

Code 200 : 5996 (99.9 %)
Code 503 : 4 (0.1 %)

This scenario can be reproduced consistently and always throws 503 errors.

Also, I increased the number of replicas to 3 for the httpbin service and 2 for the Ingress Gateway in order to rule out capacity issues, but I still get 503 errors on each load test.

And another 30m run showing, again, 503 errors:

kubectl exec -ti fortio-deploy-5588f84b45-5xx5v -- fortio load -qps 10 -t 1800s -c 10 -httpbufferkb 8569 -H "Host: httpbin.org" http://10.138.132.8:32500/json

Code 200 : 17995 (100.0 %)
Code 503 : 5 (0.0 %)
kubectl exec -ti fortio-deploy-5588f84b45-5xx5v -- fortio load -qps 10 -t 1800s -c 10 -httpbufferkb 8569 -H "Host: httpbin.org" http://10.138.132.8:32500/html

Code 200 : 17993 (100.0 %)
Code 503 : 7 (0.0 %)
# kail -l app=istio-ingressgateway | grep ' 503 '
istio-system/istio-ingressgateway-5979b99885-ljzdl[istio-proxy]: [2019-02-24T09:44:10.421Z] "GET /json HTTP/1.1" 503 UC "-" 0 57 0 - "174.16.249.64" "fortio.org/fortio-1.3.1" "b47f435f-063d-9e38-aff3-ac1202ea58d1" "httpbin.org" "174.16.71.77:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.111.24:80 174.16.249.64:39987 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T09:44:46.453Z] "GET /json HTTP/1.1" 503 UC "-" 0 57 2 - "174.16.249.64" "fortio.org/fortio-1.3.1" "537415ba-d9e4-9eae-85e8-6b24a632f924" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:39990 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T09:44:48.216Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 0 - "174.16.249.64" "fortio.org/fortio-1.3.1" "a5b4e97c-87dc-9232-8c77-53ea0be5d685" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:40045 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T09:44:49.215Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 2 - "174.16.249.64" "fortio.org/fortio-1.3.1" "3181b1bc-9776-94db-96d0-44074f0b8b3e" "httpbin.org" "174.16.111.12:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:40046 -
istio-system/istio-ingressgateway-5979b99885-ljzdl[istio-proxy]: [2019-02-24T09:45:21.238Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 1 - "174.16.249.64" "fortio.org/fortio-1.3.1" "c5996f8a-3b5e-95ae-9825-f4530254ed95" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.111.24:80 174.16.249.64:41439 -
istio-system/istio-ingressgateway-5979b99885-ljzdl[istio-proxy]: [2019-02-24T09:45:26.469Z] "GET /json HTTP/1.1" 503 UC "-" 0 57 0 - "174.16.249.64" "fortio.org/fortio-1.3.1" "5c2fa8aa-ab1a-90ed-9262-f61e75dfe1d6" "httpbin.org" "174.16.111.12:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.111.24:80 174.16.249.64:41364 -
istio-system/istio-ingressgateway-5979b99885-ljzdl[istio-proxy]: [2019-02-24T09:45:50.250Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 6 - "174.16.249.64" "fortio.org/fortio-1.3.1" "723d6e0c-c8bd-98ab-b947-9d39bc0af8f9" "httpbin.org" "174.16.71.77:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.111.24:80 174.16.249.64:41458 -
istio-system/istio-ingressgateway-5979b99885-ljzdl[istio-proxy]: [2019-02-24T09:46:44.507Z] "GET /json HTTP/1.1" 503 UC "-" 0 57 0 - "174.16.249.64" "fortio.org/fortio-1.3.1" "2fdb1ce0-6eff-9fd6-bec2-7a258fb3b26a" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.111.24:80 174.16.249.64:41487 -
istio-system/istio-ingressgateway-5979b99885-ljzdl[istio-proxy]: [2019-02-24T10:05:55.152Z] "GET /json HTTP/1.1" 503 UC "-" 0 57 3 - "174.16.249.64" "fortio.org/fortio-1.3.1" "417990a5-de4c-943a-874c-eefc6ee38927" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.111.24:80 174.16.249.64:57217 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T10:05:52.933Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 115 - "174.16.249.64" "fortio.org/fortio-1.3.1" "32045583-0442-94fd-9bf0-2148c9dc0f18" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:57243 -
istio-system/istio-ingressgateway-5979b99885-ljzdl[istio-proxy]: [2019-02-24T10:06:38.945Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 1 - "174.16.249.64" "fortio.org/fortio-1.3.1" "dbde8e93-5d86-9d5a-b041-c80ecc27119e" "httpbin.org" "174.16.249.69:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.111.24:80 174.16.249.64:57274 -
istio-system/istio-ingressgateway-5979b99885-rxd66[istio-proxy]: [2019-02-24T10:09:17.031Z] "GET /html HTTP/1.1" 503 UC "-" 0 57 2 - "174.16.249.64" "fortio.org/fortio-1.3.1" "f1cc53c4-5918-93c8-8334-cea08c51c423" "httpbin.org" "174.16.71.77:80" outbound|8000||httpbin.default.svc.cluster.local - 174.16.71.101:80 174.16.249.64:57198 -

Now a valid workaround is to use the retry mechanism configurable through HTTPRetry in the VirtualService and, once 1.1 is finally released, retryOn with 5xx.

Testing this out with 1.1.0-rc.0, just a single `attempts: 1` results in no 503s whatsoever:

#  kubectl exec -ti fortio-deploy-5588f84b45-5xx5v -- fortio load -qps 10 -t 1800s -c 10 -httpbufferkb 8569 -H "Host: httpbin.org" http://10.138.132.8:32500/json

Code 200 : 18000 (100.0 %)

# kubectl exec -ti fortio-deploy-5588f84b45-5xx5v -- fortio load -qps 10 -t 1800s -c 10 -httpbufferkb 8569 -H "Host: httpbin.org" http://10.138.132.8:32500/html

Code 200 : 18000 (100.0 %)

Any reason you're on the first RC of 1.1? There were a lot of issues on the early releases, including 1.0 regressions. Not sure if any are specific to this issue or not though.

@jaygorrell, please take a look at the issue from the beginning, not just my last comment. You will realize that this has been happening since 1.0.1.

No, I haven't tried each and every 1.1 snapshot, but 1.0.1, 1.0.4, 1.0.5, 1.0.6 and the latest release 1.1.0-rc.0 all show the same issue.

My apologies - I didn't realize an RC was out and read that as snapshot.0. Sorry about that.

@emedina This is exactly the behavior we see. We have retries turned on as a workaround, but clearly it's not a long-term solution.

@christian-posta In my testing, I experience the problem with Istio Ingressgateway, Istio service mesh sidecars, and Ambassador API Gateway (also built atop Envoy). I do not see this error at all when using an Nginx Ingress Gateway.

Curious if the mixer policy deployment is enabled in your env? Note: mixer will be part of your request flow even if you are not using any mixer policy. We do have a PR in flight to disable mixer policy by default.

@dankirkpatrick I am not sure this is the same as the TCP keepalive issue. See my comment here: https://github.com/istio/istio/issues/9906#issuecomment-467992199
Long story short, for TCP keepalive situations before 1.1, retry seems to make sense.

However, the issue that @emedina demonstrates seems different. The connection is kept occupied with a constant stream of 10 rps. The only scenario where the OP is hitting the same TCP keepalive issue is if the fortio client ended up doing persistent HTTP to one of the replicas, but that seems unlikely.

FYI I'm trying to reproduce this locally to investigate

At first glance, this looks like a good use case for the new retry stuff: https://github.com/istio/istio/pull/10566

@knrc has a PR out for this and should be added to the assignees once he's inducted into the github org

For anyone who has this issue for 1.0, please look at @Stono's tips at https://github.com/istio/istio/issues/12183

@duderino I'm not sure this should be assigned to me ... I'm OOO next week.

We shouldn't need retries when no pod is going down. Also, retries wouldn't work for POST.

Retries do work for POST - that's what 100 Continue is for. They don't work after the body is sent, but in this case (TCP closed) they work very well; all browsers do this.

My test

 kubectl -n fortio-control exec -it cli-fortio-7675dfcf59-ztdmn -- fortio load -qps 10 -t 600s -c 10 -httpbufferkb 8569 http://httpbin.httpbin.svc.cluster.local:8000/json
22:02:23 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
22:04:18 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)

Note the 2 timestamps (02:23, 04:18)

The attached tcpdump

# remember to start injector with  global.proxy.enableCoreDump=true
tcpdump -n -s0 -i lo -w /tmp/tcpdump.out port 80

There are 2 RSTs around those times, and if you follow the stream you see that both TCP connections
end with Envoy sending the request.

All other connections are terminated with normal FIN/FIN.

tcpdump.pcap.zip

I think this is 'fixed' for 1.1, but we didn't have consensus, so I'll try to capture the objections and alternatives for the record...

The problem: Envoy as a client borrows, from its connection pool, a connection that has already been closed by the peer. Since this is a routine occurrence, we don't want to lose the request and return a 503 to the downstream client.

Approach 1: Envoy should notice the reset/close when sending the request to the upstream and automatically borrow a new connection from the pool or initiate a new connection. AFAIK Envoy does not support this today so we can't get this into Istio 1.1. @PiotrSikora @htuch @jplevyak @alyssawilk

Approach 2: @knrc's https://github.com/istio/istio/pull/12188 which was built upon work done by @nmittler: Configure Envoy to retry these requests. This covers the connection pool race condition, but is imperfect because it may retry additional cases we don't want to retry. This is the current solution going into 1.1.

Approach 3: @rshriram's abandoned https://github.com/istio/istio/pull/12227, which set max requests per connection to 1 (a DestinationRule sketch follows at the end of this comment). This certainly avoids the race condition, but it also completely disables connection pooling. Several of us (@costinm @knrc @mandarjog @duderino) were not comfortable doing this without substantial performance testing, and doing that would delay 1.1 too much.

Approach 4: We could instead set an idle timeout for connections using the mechanism introduced in https://github.com/envoyproxy/envoy/pull/2714. The idea is that we can close idle connections and remove them from the connection pool before the peer is likely to close them. The peer can still close them for other reasons (e.g., overload), but in theory this would reduce the number of 503s without potentially retrying requests we don't want retried. The concern I have with this one is that it's not clear to me that the idle timeout only applies to sockets with no active requests on them. If there were active requests with infrequent activity (e.g., long polls) we might lose those too.

So Approach 2 IMHO is the least risky option for 1.1, but it's arguable.
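For illustration of Approach 3, a minimal sketch (hypothetical service name) of how the existing ConnectionPoolSettings could be used to get the same effect:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service                  # hypothetical name for illustration
spec:
  host: my-service.default.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        maxRequestsPerConnection: 1 # one request per upstream connection, i.e. no connection reuse

As noted above, this trades the race condition for the cost of establishing a new upstream connection (and TLS handshake, if mTLS is on) for every request.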

@knrc reports he still sees the rare 503 (about 7 in 1 hour at 10 rps which means a probability of about 0.0002 per request) and is continuing to investigate. Demoting to P1 to cover the remaining work

Belatedly agreed: 1) still has inherent races. 2) does have some risk, especially for non-idempotent requests, but is the best option for avoiding 503s. 3) has been historically super expensive for the proxies I've run, so I agree with the folks who had perf concerns. I'd say 4 has a higher risk of not getting it right at first, but may be safer if you need to avoid spurious retries.

Update for today's call

I have traces from both the ingressgateway and httpbin sides; the error happens when the sidecar returns RST/ACK in response to the request (a GET in this case) without ever trying to talk to the application. The response from the ingressgateway to the caller is then a 503.

I'm still investigating the reason behind the RST/ACK from the sidecar.

BTW I can now reproduce these after approx 10-15 mins

I don't know if it helps debugging, but with everything we do in https://github.com/istio/istio/issues/12183 we only ever see the 503s being reported from the destination, not the source, which says to me the retries and outlier detection are protecting the source from the problem.


Also having sporadic 503 errors. In my case it's coming from a simple health check. The 503 does not come from the application. It comes from the istio-proxy. Found this when my pods were failing health checks, yet the application logs were 100% devoid of 503s. The istio-proxy logs showed about 2.9% of responses were 503 UC, the rest are 200.

All 503 responses had the UC flag.

From the envoy docs:

UC: Upstream connection termination in addition to 503 response code.

Running Istio 1.0.6
GKE 1.10.x

PS -- This is the only deployment that is throwing 503s like this. However... I may have a malformed IP address in a ServiceEntry.

The malformed IP address happens to be the FQDN of a service that's on the same cluster but outside the mesh. The service does exist. The FQDN is correct. Resolution is set to DNS. Not sure why this is trying to get read as an IP address...


Logs


[2019-03-07 20:10:30.719][17][warning][config] bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_mux_subscription_lib/common/config/grpc_mux_subscription_impl.h:70] gRPC config for type.googleapis.com/envoy.api.v2.ClusterLoadAssignment rejected: malformed IP address: api-pricing.pricing.svc.cluster.local. Consider setting resolver_name or setting cluster type to 'STRICT_DNS' or 'LOGICAL_DNS'
...
This line shows up in the logs 2,141 times

Upgrading to Istio 1.1.0 has given us a more resilient installation, in that the 503 responses are being handled such that they are less disruptive to our applications.

Istio 1.1.0
Kubernetes : 1.13.4_1515

We are nevertheless seeing a continuous stream of 503 responses across all of our pods, making them all apparently unhealthy.

It would be highly preferable to see these eliminated so that we can see a correct health status for our pods.

I'm also intrigued to see that we have many thousands of messages from the istio-proxy as shown below. If there is anything further I can try/provide to help track down this issue I can do so.

[2019-03-29 09:26:31.637][18][warning][misc] [external/envoy/source/common/protobuf/utility.cc:129] Using deprecated option 'envoy.api.v2.route.Route.per_filter_config'. This configuration will be removed from Envoy soon. Please see https://github.com/envoyproxy/envoy/blob/master/DEPRECATED.md for details.
[2019-03-29 09:26:31.637][18][warning][misc] [external/envoy/source/common/protobuf/utility.cc:129] Using deprecated option 'envoy.api.v2.route.Route.per_filter_config'. This configuration will be removed from Envoy soon. Please see https://github.com/envoyproxy/envoy/blob/master/DEPRECATED.md for details.
[2019-03-29 09:26:31.637][18][warning][misc] [external/envoy/source/common/protobuf/utility.cc:129] Using deprecated option 'envoy.api.v2.route.Route.per_filter_config'. This configuration will be removed from Envoy soon. Please see https://github.com/envoyproxy/envoy/blob/master/DEPRECATED.md for details.

@TonyJDavies re the log spam: https://github.com/istio/istio/issues/12642

@TonyJDavies did your 503s happen during the upgrade or after it? Also, was mixer check enabled during your upgrade?

FYI. There is a chance some of these 503s might be related to this bug where Envoy doesn't acknowledge a FIN packet immediately after getting a response from upstream and returns the connection to the connection pool instead, which will trigger a 503 on the next write to the stream: https://github.com/envoyproxy/envoy/issues/6190.

We run Envoy in a cluster that currently gets on average about 1k requests/minute and used to see 503s left and right across different services after about 15min or so of a fresh deployment. We swept them under the rug by doing retries, but as others mentioned, this can't be a long term solution due to its potential side effects on non-idempotent requests. We recently started playing around with idle upstream timeout values as introduced by https://github.com/envoyproxy/envoy/pull/2714 and haven't had any 503s within the past week (with the same amount of traffic). Would be interesting to see if envoyproxy/envoy/issues/6190 would fix any/most of these mysterious connection resets but I think it'd make sense for Istio to at the very least support this idle timeout option for upstream connections (option 4).

@linsun - The 503s are generated continuously by the proxy, as shown by the SysDig screenshot below, which was restricted to show only the istio-proxy containers. Early peaks coincide with load being pushed through the system.
[SysDig screenshot]

The mixer check is enabled as follows:

#
# mixer configuration
#
# @see charts/mixer/values.yaml, it takes precedence
mixer:
  enabled: true
  policy:
    # if policy is enabled the global.disablePolicyChecks has affect.
    enabled: true

  telemetry:
    enabled: true

@arianmotamedi Would you mind sharing some configuration snippets about how you have set up the idle upstream timeouts in your Istio installation?

Thanks!

@arianmotamedi - It would be great if you could provide a link to info that would help configure those idle upstream timeouts as it's not easy to find out where/how that's done. Many thanks.

Idle upstream timeout (managed by the connection pool, not to be confused with lower-level TCP keepalives) is not supported in Istio yet - which is kind of preventing us from switching to Istio. It's defined per cluster in Envoy: https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/core/protocol.proto#envoy-api-msg-core-httpprotocoloptions:

idle_timeout
(Duration) The idle timeout for upstream connection pool connections. The idle timeout is defined as the period in which there are no active requests. If not set, there is no idle timeout. When the idle timeout is reached the connection will be closed. Note that request based timeouts mean that HTTP/2 PINGs will not keep the connection alive.

Which would look like this:

{
  "connect_timeout":"0.25s",
  "hosts":[
    {
      "socket_address":{
        "address":"foo.bar",
        "port_value": 8080
      }
    }
  ],
  "name":"my-service",
  "type":"STRICT_DNS",
  "common_http_protocol_options": {
    "idle_timeout": "15s"
  }
}

@GregHanson could you look into adding this as a configuration for our users? This is a very recent feature from envoy, not sure if Istio proxy has that fix yet.

@TonyJDavies I would definitely recommend disabling mixer, or at least disabling mixer check, in your env. Even if you are not using mixer policy, the proxy still asks mixer whether a call is allowed, which can introduce additional 503s for you.
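For anyone who wants to try that, a sketch of the relevant Helm values, using the field names from the mixer configuration quoted earlier in this thread (adapt to your own install method):

# Sketch only: disables the mixer policy component and the per-request policy check
mixer:
  policy:
    enabled: false
global:
  disablePolicyChecks: true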

@arianmotamedi @TonyJDavies @linsun is this a setting you would want exposed for configuration? Or can we just set a reasonable default value?

@mandarjog @jnativio any input on a recommended value from a performance standpoint? Do we want it default disabled as it is currently?

I think setting a default is better than what we have right now, which is disabled. I am thinking the default would be under 1 min (either 15s or 30s).

As far as making it configurable, it may make sense down the road as some services may take more time to respond, but we can cross that bridge when we need it.

I definitely think it needs to be exposed as a configuration, and the default should be disabled (like it is in Envoy) for backwards compatibility. There isn't really a global "default" value to use here; it depends on the underlying system's requirements, as there is a trade-off between performance and stability in this case. Lowering this value can significantly affect performance, especially if mTLS is enabled, which might not be acceptable for latency-critical systems that can get around these 503s by other means (route-level retries or even application-level retries). Ultimately, for production environments, this idle timeout needs to be set to a value lower than the upstream keep-alive timeout, which needs to be measured for each upstream service.

Similarly, in an environment with sufficient traffic, it is possible that a timeout might not occur for long periods of time (since each request resets the timer), to the point that upstream might return FIN immediately after a response, which could trigger https://github.com/envoyproxy/envoy/issues/6190.

@GregHanson I think @arianmotamedi has a good point. Once this configuration is exposed we can try different values to evaluate the impact and make recommendations to customers.

@jnativio @arianmotamedi sounds like this isn't something we want configured for the entire mesh, so it should be added to the Istio DestinationRule, which will allow configuration on a per-service basis: https://github.com/istio/api/blob/master/networking/v1alpha3/destination_rule.proto#L147
Adding it to ConnectionPoolSettings seems the ideal location.

@GregHanson exactly what I was thinking. I'm actually taking a stab at it right now; trying to understand Istio's codebase better :) should be able to open up a PR with the changes soon...

If you don't need mixer policy checks, then disable it. If you rely on that functionality you could try enabling the experimental mixer retries which were introduced with 1.1.0. Instructions for enabling it can be found here: https://github.com/istio/istio/issues/9992#issuecomment-466229569

@TonyJDavies I recommend you check out this too: https://github.com/istio/istio/issues/12183#issuecomment-477044517 -- how many services do you have? If you have lots of sidecars, implementing the Sidecar config would be recommended, as it has proved to be helpful for @Stono

@linsun - We have 122 services running across 183 pods, each of which have sidecars that are auto-injected. Having tried disabling mixer entirely, we see no significant reduction (if any) to the rate of 503s emanating from the istio-proxy containers.

@TonyJDavies thanks for trying that. Are your services communicating with every other service, or do most of them only talk to a few? The Sidecar config is the other thing I'd recommend implementing if most services only talk to a few others. Christian explained it really well in this video: https://twitter.com/christianposta/status/1114251191838363648
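For context, a minimal sketch of that Sidecar resource (namespace and hosts are illustrative); it limits each sidecar's egress configuration to its own namespace plus istio-system, which keeps the config pushed to Envoy much smaller:

apiVersion: networking.istio.io/v1alpha3
kind: Sidecar
metadata:
  name: default
  namespace: my-namespace      # hypothetical namespace for illustration
spec:
  egress:
  - hosts:
    - "./*"                    # services in the same namespace
    - "istio-system/*"         # the control plane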

We are working on getting the idle_timeout merged for you in 1.1; it is already merged in master, I believe. cc @GregHanson if I'm wrong.

@TonyJDavies @emedina Istio 1.1.3 has been released, and it includes idle_timeout as a destination rule setting as discussed. Please go ahead and give it a shot, and let us know if it helps reduce the number of 503s in your cluster. I suggest starting with a 15s timeout, testing it out for a few days, and gradually increasing it if it remains stable.
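A minimal sketch of that setting with the suggested 15s value, against a hypothetical service (the same shape as the DestinationRule shown later in this thread):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service             # hypothetical name for illustration
spec:
  host: my-service.default.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 15s       # close pooled upstream connections that have been idle for 15s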

@arianmotamedi Does this replace the workaround specified by @Stono in #12183?

1.1.3 isn't released yet according to GitHub.

@Stono it's tagged in github, and https://istio.io/ defaults to Download 1.1.3 🤔

@emedina looks like the only thing done as part of https://github.com/istio/istio/issues/12183 is preStop hooks. That won't help with cases where the upstream closes a connection but Envoy keeps it alive, so the next request immediately causes a 503 (https://github.com/envoyproxy/envoy/issues/6190, which it looks like was merged yesterday, so it should be fixed in the next version of Envoy). This idle timeout is a workaround for that.

@arianmotamedi So we actually need to combine them :-)

I will try to provide feedback ASAP on 1.1.3 then.

Click it, there are no artefacts.
This happens quite a lot: a pre-release ends up on GitHub, causing confusion.

Ah, gotcha. Ok well I guess we'll keep waiting then 😁

Ok looks like it's really released this time.

503s are gone for me.

@zimmertr just to confirm, that's by setting an idle timeout in 1.1.3?

Sporadic 503 responses to curl simply disappeared after I upgraded from 1.1.2 to 1.1.3 without any further action.

@zimmertr Sorry if this info is elsewhere...I'm having a hard time keeping track of all the places.

Did you make any other changes when upgrading from 1.0.x to 1.1.2?

Thanks much.

@emedina @jaygorrell did either of you get a chance to try out idleTimeout? Did it help with the overall rate of 503s?

I did, thanks! Setting it to 3s (nodejs server is 5s) eliminated 100% of my errors _during runtime_. I did continue to get some 503s but was able to correlate them with deployments. The Istio default retry _should_ catch those but I tracked that problem down to an Envoy issue: https://github.com/envoyproxy/envoy/issues/6726

@arianmotamedi Sorry, but haven't been able yet to test this out due to other priorities. I will send feedback back whenever I find a chance, hopefully next week.

Hello,
I'm seeing some of these now as well. I have idleTimeout set to 30 seconds. My services were healthy at all times, with no errors or failing health checks. It looks like https://github.com/envoyproxy/envoy/issues/6190.


I actually think it's this: https://github.com/istio/istio/issues/9906#issuecomment-465739430

As a result I'm updating my DestinationRules to be:

spec:
  host: app 
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 4m
      tcp:
        tcpKeepalive:
          time: 2m
          interval: 30s

It looks like this problem is actually much bigger than I thought, albeit it seems to be caught by the retries.

The 503s are only reported from the destination, not the source, which is one of the reasons I missed them in the past.


@arianmotamedi I could test it out this week and the results are somewhat better in 1.1.5, although not perfect. Let me explain.

I basically ran a fortio load test against httpbin (as in my previous comments) using the different versions of Istio where I have found issues in the past, plus the new 1.1.5. This has been done in a controlled LAB environment. Here are the results:

1.0.0 --> Sporadic 503 errors:

# kubectl exec -ti fortio-deploy-5588f84b45-5xx5v -- fortio load -qps 10 -t 600s -c 10 -httpbufferkb 8569 -H "Host: httpbin.org" http://10.138.132.8:31380/json
Fortio 1.3.1 running at 10 queries per second, 8->8 procs, for 10m0s: http://10.138.132.8:31380/json
15:41:15 I httprunner.go:82> Starting http test for http://10.138.132.8:31380/json with 10 threads at 10.0 qps
Starting at 10 qps with 10 thread(s) [gomax 8] for 10m0s : 600 calls each (total 6000)
15:42:06 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
#  kubectl exec -ti fortio-deploy-5588f84b45-5xx5v -- fortio load -qps 10 -t 600s -c 10 -httpbufferkb 8569 -H "Host: httpbin.org" http://10.138.132.8:31380/html
Fortio 1.3.1 running at 10 queries per second, 8->8 procs, for 10m0s: http://10.138.132.8:31380/html
15:41:57 I httprunner.go:82> Starting http test for http://10.138.132.8:31380/html with 10 threads at 10.0 qps
Starting at 10 qps with 10 thread(s) [gomax 8] for 10m0s : 600 calls each (total 6000)
15:46:00 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
15:46:02 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)



06:45:09 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
06:45:10 E http_client.go:542> Unable to connect to 10.138.132.8:31380 : dial tcp 10.138.132.8:31380: connect: connection refused
06:45:26 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)



Code 200 : 6000 (100.0 %)



sort_desc(sum(istio_requests_total) by (response_code) > 0)

{response_code="200"} | 986519
{response_code="503"} | 18



istio_requests_total{connection_security_policy="none",destination_app="httpbin",destination_principal="unknown",destination_service="httpbin.default.svc.cluster.local",destination_service_name="httpbin",destination_service_namespace="default",destination_version="v1",destination_workload="httpbin",destination_workload_namespace="default",instance="174.16.111.61:42422",job="istio-mesh",permissive_response_code="none",permissive_response_policyid="none",reporter="destination",request_protocol="http",response_code="503",response_flags="UC",source_app="istio-ingressgateway",source_principal="unknown",source_version="unknown",source_workload="istio-ingressgateway",source_workload_namespace="istio-system"}    16

To be honest, I cannot arrive at a clear conclusion: 1.1.5 seems to work better than the previous releases, in the sense that the same load test with fortio against httpbin does not result in a single 503, but then why are there metrics complaining about sporadic 503s?

Next, I will move the tests to my non-prod environment, with more realistic workloads, to see if I get the same behaviour, and I will keep this thread posted.

@emedina I'm willing to bet it is https://github.com/istio/istio/issues/13848

Change your prom query to group by reporter, see if the reporter is destination, and let me know.

If you look in Jaeger you'll see traces like mine on that issue.

It'd be great if you could confirm.
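For reference, a grouped variant of the query used earlier in this thread might look like this (same metric, just adding reporter to the grouping):

sort_desc(sum(istio_requests_total) by (response_code, reporter) > 0)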

@Stono As you expected :-)

[screenshot]

Do you see the UC response flag in your access logs, traces or debug istio-proxy logs?

@emedina please try setting your idleTimeout value to 15s and give it another shot. We're testing two different things here, and before we diverge too much I want to confirm whether this is related to https://github.com/envoyproxy/envoy/issues/6190 or something entirely different.

@Stono I do see the UC:

response_code="503",response_flags="UC"

@arianmotamedi I still see the 503 even with the configuration you suggested:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin.default.svc.cluster.local
  exportTo:
    - "*"
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 15s
      tcp:
        tcpKeepalive:
          time: 2m
          interval: 30s

Awesome; same issue then @arianmotamedi!

@emedina would be great if you could capture and post Envoy logs for a 503 scenario.

@arianmotamedi We only see info about the 503 error in the access log of the istio-proxy from httpbin:

[2019-05-13T10:00:42.203Z] "GET /json HTTP/1.1" 200 - "-" 0 429 5 4 "10.138.132.9" "fortio.org/fortio-1.3.1" "8a8312d2-4af0-4acf-a660-a545171f8077" "httpbin.org" "127.0.0.1:80" inbound|8000|http|httpbin.default.svc.cluster.local - 174.16.249.130:80 10.138.132.9:0 -
[2019-05-13T10:00:42.208Z] "GET /json HTTP/1.1" 503 UC "-" 0 95 1 - "10.138.132.9" "fortio.org/fortio-1.3.1" "5ff7ca9d-6374-4f9c-93ab-ab7f094ce43f" "httpbin.org" "127.0.0.1:80" inbound|8000|http|httpbin.default.svc.cluster.local - 174.16.249.130:80 10.138.132.9:0 -
[2019-05-13T10:00:42.202Z] "GET /json HTTP/1.1" 200 - "-" 0 429 7 5 "10.138.132.9" "fortio.org/fortio-1.3.1" "a30400d9-3339-4e31-a210-baead0cac600" "httpbin.org" "127.0.0.1:80" inbound|8000|http|httpbin.default.svc.cluster.local - 174.16.249.130:80 10.138.132.9:0 -

If we enable the trace level we get some additional logs:

[2019-05-13 10:00:52.027][19][trace][connection] [external/envoy/source/common/network/raw_buffer_socket.cc:37] [C1] read error: Resource temporarily unavailable
[2019-05-13 10:00:53.860][28][trace][connection] [external/envoy/source/common/network/raw_buffer_socket.cc:37] [C5] read error: Resource temporarily unavailable
[2019-05-13 10:00:53.860][19][trace][connection] [external/envoy/source/common/network/raw_buffer_socket.cc:37] [C4] read error: Resource temporarily unavailable

`Resource temporarily unavailable` - what did the traffic look like at the time these were generated? Can you run ulimit -n in the httpbin container as well as its sidecar?

The other thing that concerns me is idle_timeout isn't being applied to the inbound cluster, which is what all of these 503s are coming from. Not saying that's the reason for these, but it's definitely something that can cause issues if the call chain doesn't respect this value all the way through.

@arianmotamedi Our previous tests did not include any DestinationRule with the idle_timeout. Anyway, I have added them now and the results are the same. The value returned for ulimit -n is the same for both httpbin and its istio-proxy sidecar: 65535.

The errors in the istio-proxy sidecar logs show up even before the fortio load test is started:

[2019-05-14 07:54:20.785][27][trace][connection] [external/envoy/source/common/network/raw_buffer_socket.cc:37] [C2] read error: Resource temporarily unavailable

More info, in case it is useful, about what the istio-proxy sidecar logs when it starts up:

2019-05-14T06:47:31.822813Z     info    Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster httpbin.default --service-node sidecar~174.16.249.162~httpbin-54f5bb4957-w7476.default~default.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warning --concurrency 2]

[2019-05-14 06:47:31.851][20][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.

And also the recurring logs once it has already started up:

[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:376] [C1] writing 209 bytes, end_stream false
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/conn_manager_impl.cc:1443] [C1][S6763301124091004476] encoding data via codec (size=2955 end_stream=true)
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:376] [C1] writing 2967 bytes, end_stream false
[2019-05-14 07:54:59.193][18][trace][main] [external/envoy/source/common/event/dispatcher_impl.cc:133] item added to deferred deletion list (size=1)
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/conn_manager_impl.cc:811] [C1][S6763301124091004476] decode headers called: filter=0x5222780 status=1
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/http1/codec_impl.cc:384] [C1] parsed 110 bytes
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:440] [C1] socket event: 2
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:508] [C1] write ready
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/raw_buffer_socket.cc:66] [C1] write returns: 3176
[2019-05-14 07:54:59.193][18][trace][main] [external/envoy/source/common/event/dispatcher_impl.cc:53] clearing deferred deletion list (size=1)
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:440] [C1] socket event: 3
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:508] [C1] write ready
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:478] [C1] read ready
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/raw_buffer_socket.cc:23] [C1] read returns: 105
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/raw_buffer_socket.cc:37] [C1] read error: Resource temporarily unavailable
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/http1/codec_impl.cc:363] [C1] parsing 105 bytes
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/http1/codec_impl.cc:476] [C1] message begin
[2019-05-14 07:54:59.193][18][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:243] [C1] new stream
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/http1/codec_impl.cc:331] [C1] completed header: key=Host value=127.0.0.1:15000
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/http1/codec_impl.cc:331] [C1] completed header: key=User-Agent value=Go-http-client/1.1
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/http1/codec_impl.cc:442] [C1] headers complete
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/http1/codec_impl.cc:331] [C1] completed header: key=Accept-Encoding value=gzip
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/http1/codec_impl.cc:463] [C1] message complete
[2019-05-14 07:54:59.193][18][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:580] [C1][S1215648406998790148] request headers complete (end_stream=true):
':authority', '127.0.0.1:15000'
':path', '/listeners'
':method', 'GET'
'user-agent', 'Go-http-client/1.1'
'accept-encoding', 'gzip'
[2019-05-14 07:54:59.193][18][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1040] [C1][S1215648406998790148] request end stream
[2019-05-14 07:54:59.193][18][debug][admin] [external/envoy/source/server/http/admin.cc:1068] [C1][S1215648406998790148] request complete: path: /listeners
[2019-05-14 07:54:59.193][18][debug][http] [external/envoy/source/common/http/conn_manager_impl.cc:1305] [C1][S1215648406998790148] encoding headers via codec (end_stream=false):
':status', '200'
'content-type', 'application/json'
'cache-control', 'no-cache, max-age=0'
'x-content-type-options', 'nosniff'
'date', 'Tue, 14 May 2019 07:54:59 GMT'
'server', 'envoy'
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:376] [C1] writing 200 bytes, end_stream false
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/conn_manager_impl.cc:1443] [C1][S1215648406998790148] encoding data via codec (size=1469 end_stream=true)
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:376] [C1] writing 1481 bytes, end_stream false
[2019-05-14 07:54:59.193][18][trace][main] [external/envoy/source/common/event/dispatcher_impl.cc:133] item added to deferred deletion list (size=1)
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/conn_manager_impl.cc:811] [C1][S1215648406998790148] decode headers called: filter=0x421dc20 status=1
[2019-05-14 07:54:59.193][18][trace][http] [external/envoy/source/common/http/http1/codec_impl.cc:384] [C1] parsed 105 bytes
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:440] [C1] socket event: 2
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/connection_impl.cc:508] [C1] write ready
[2019-05-14 07:54:59.193][18][trace][connection] [external/envoy/source/common/network/raw_buffer_socket.cc:66] [C1] write returns: 1681
[2019-05-14 07:54:59.193][18][trace][main] [external/envoy/source/common/event/dispatcher_impl.cc:53] clearing deferred deletion list (size=1)

Not sure what you mean by "what did the traffic look like at the time these were generated?", but as I was trying to explain before, 503s show up from minute 0, caused solely by internal requests made by the sidecar, for instance to the /listeners endpoint. Here you can check a Prom graph with the query:

sort_desc(sum(istio_requests_total) by (response_code) > 0)

after triggering the load test in fortio:

[Prometheus graph]

The log from @emedina shows

  • the 503 connection [C502] reuses an existing upstream connection
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.960][95][debug][pool] external/envoy/source/common/http/http1/conn_pool.cc:89] [C502] using existing connection
  • The upstream connection is writable
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:326] [C502] writing 546 bytes, end_stream false
  • The upstream connection is somehow not readable
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:389] [C502] socket event: 3
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:457] [C502] write ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/connection_impl.cc:427] [C502] read ready
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][trace][connection] external/envoy/source/common/network/raw_buffer_socket.cc:21] [C502] read returns: 0
  • Envoy then returns a 503 to the downstream
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][connection] external/envoy/source/common/network/connection_impl.cc:451] [C502] remote close
preprod-cb/classification-helper-ui-76f9cdcd4d-rz6n9[istio-proxy]: [2018-10-03 08:47:42.961][95][debug][connection] external/envoy/source/common/network/connection_impl.cc:133] [C502] closing socket: 0

Theoretically it's not fully prevented: the upstream connection being closed and Envoy picking that connection from the pool happen concurrently. However, this Envoy PR should reduce the chance.

To explain the Envoy PR at a high level: Envoy will actively determine whether an upstream connection is reusable. It is extremely useful when the load is small.

Just in case it adds value, we noticed this in our testing:

2019-05-16T10:35:22.536281Z     info    Envoy proxy is ready
[2019-05-16 10:39:37.050][159][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Listener.use_original_dst' from file lds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
2019-05-16T10:39:39.091322Z     warn    Epoch 0 terminated with an error: signal: killed
2019-05-16T10:39:39.091398Z     warn    Aborted all epochs
2019-05-16T10:39:39.091446Z     info    Epoch 0: set retry delay to 51.2s, budget to 1
2019-05-16T10:39:39.091502Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:40.535566Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:42.535192Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:44.535141Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:46.535474Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:48.535957Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:50.535082Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:52.534954Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:54.535242Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:56.534980Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:39:58.535081Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:00.535469Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:02.535113Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:04.535173Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:06.535167Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:08.535217Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:10.535106Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:12.535069Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:14.535300Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:16.535363Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:18.535101Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:20.535266Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:22.535072Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:24.535268Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:26.535153Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:28.535373Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:40:30.291638Z     info    Reconciling retry (budget 1)
2019-05-16T10:40:30.291701Z     info    Epoch 0 starting
2019-05-16T10:40:30.292853Z     info    Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster helpers-frontend.preprod-cb --service-node sidecar~174.16.212.227~helpers-frontend-6f6bbf67d8-ml9dc.preprod-cb~preprod-cb.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warning --concurrency 2]
[2019-05-16 10:40:30.424][170][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-05-16 10:40:30.424][170][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-05-16 10:40:30.424][170][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-05-16 10:40:30.428][170][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, no healthy upstream
[2019-05-16 10:40:30.428][170][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:49] Unable to establish new stream
2019-05-16T10:40:30.536001Z     info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2019-05-16 10:40:31.748][170][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Listener.use_original_dst' from file lds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
2019-05-16T10:40:32.536486Z     info    Envoy proxy is ready
2019-05-16T10:44:24.827214Z     warn    Epoch 0 terminated with an error: signal: killed
2019-05-16T10:44:24.827268Z     warn    Aborted all epochs
2019-05-16T10:44:24.827306Z     info    Epoch 0: set retry delay to 1m42.4s, budget to 0
2019-05-16T10:44:24.827318Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:26.535103Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:28.535113Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:30.535230Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:32.534935Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:34.535446Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:36.535279Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:38.534907Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:40.535413Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:42.535061Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:44.535253Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:46.535479Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:48.535191Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:50.535295Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:52.534969Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:54.535118Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:56.535226Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:44:58.534991Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:00.535169Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:02.534914Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:04.535310Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:06.535356Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:08.535132Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:10.535097Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:12.535250Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:14.535366Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:16.535112Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:18.535147Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:20.535009Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:22.535121Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:24.535054Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:26.535283Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:28.534970Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:30.535078Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:32.534998Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:34.535158Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:36.536141Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:38.535153Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:40.535151Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:42.535169Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:44.535362Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:46.535528Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:48.535347Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:50.535247Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:52.535012Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:54.535382Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:56.535237Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:45:58.535155Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:46:00.535197Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:46:02.535105Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:46:04.535386Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:46:06.535204Z     info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-05-16T10:46:07.227412Z     info    Reconciling retry (budget 0)
2019-05-16T10:46:07.227480Z     info    Epoch 0 starting
2019-05-16T10:46:07.228459Z     info    Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster helpers-frontend.preprod-cb --service-node sidecar~174.16.212.227~helpers-frontend-6f6bbf67d8-ml9dc.preprod-cb~preprod-cb.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warning --concurrency 2]
[2019-05-16 10:46:07.320][182][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-05-16 10:46:07.320][182][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-05-16 10:46:07.320][182][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Cluster.hosts' from file cds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
[2019-05-16 10:46:07.325][182][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 14, no healthy upstream
[2019-05-16 10:46:07.325][182][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:49] Unable to establish new stream
2019-05-16T10:46:08.596368Z     info    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 0 rejected
[2019-05-16 10:46:08.745][182][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Listener.use_original_dst' from file lds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
2019-05-16T10:46:10.537085Z     info    Envoy proxy is ready
[2019-05-16 10:46:10.991][182][warning][misc] [external/envoy/source/common/protobuf/utility.cc:174] Using deprecated option 'envoy.api.v2.Listener.use_original_dst' from file lds.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/intro/deprecated for details.
2019-05-16T10:46:12.744250Z     warn    Epoch 0 terminated with an error: signal: killed
2019-05-16T10:46:12.744282Z     warn    Aborted all epochs
2019-05-16T10:46:12.744309Z     error   Permanent error: budget exhausted trying to fulfill the desired configuration
2019-05-16T10:46:12.744315Z     error   cannot start the e with the desired configuration
2019-05-16T10:46:12.744331Z     error

@emedina that looks like a separate issue tbh.

The upstream connection is somehow not readable

That's the issue, but why is it happening so often? For some reason this is happening between the source and destination Envoy far too frequently; we've had 308 so far today:

Screenshot 2019-05-16 at 13 50 12

As for the PR you mentioned, we tried an istio-proxy build with that backported yesterday; it had no effect.

@Stono In my case, I was running a 10-minute test and it was working fine. But when I increased it to 30 minutes, around the 10th minute the istio-proxy in my Pod died and Prometheus showed lots of response_flags="UF,URX".

Also, if I correlate the istio-proxy sidecar logs in my Pod with the Pilot logs, I can see the following at the same time:

istio-proxy sidecar
info    Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: read tcp 127.0.0.1:58856->127.0.0.1:15000: read: connection reset by peer
istio pilot
2019-05-16T14:06:59.566667Z     info    ads     ADS:CDS: REQ 174.16.212.227:58724 sidecar~174.16.212.227~helpers-frontend-6f6bbf67d8-ml9dc.preprod-cb~preprod-cb.svc.cluster.local-64 582.059µs raw: node:<id:"sidecar~174.16.212.227~helpers-frontend-6f6bbf67d8-ml9dc.preprod-cb~preprod-cb.svc.cluster.local" cluster:"helpers-frontend.preprod-cb" metadata:<fields:<key:"CONFIG_NAMESPACE" value:<string_value:"preprod-cb" > > fields:<key:"INTERCEPTION_MODE" value:<string_value:"REDIRECT" > > fields:<key:"ISTIO_META_INSTANCE_IPS" value:<string_value:"174.16.212.227,174.16.212.227" > > fields:<key:"ISTIO_PROXY_SHA" value:<string_value:"istio-proxy:73fa9b1f29f91029cc2485a685994a0d1dbcde21" > > fields:<key:"ISTIO_PROXY_VERSION" value:<string_value:"1.1.3" > > fields:<key:"ISTIO_VERSION" value:<string_value:"1.1.6" > > fields:<key:"POD_NAME" value:<string_value:"helpers-frontend-6f6bbf67d8-ml9dc" > > fields:<key:"app" value:<string_value:"helpers-frontend" > > fields:<key:"group" value:<string_value:"helpers" > > fields:<key:"istio" value:<string_value:"sidecar" > > fields:<key:"pod-template-hash" value:<string_value:"6f6bbf67d8" > > fields:<key:"version" value:<string_value:"v1" > > > locality:<> build_version:"73fa9b1f29f91029cc2485a685994a0d1dbcde21/1.11.0-dev/Clean/RELEASE/BoringSSL" > type_url:"type.googleapis.com/envoy.api.v2.Cluster"
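For anyone who wants to reproduce this correlation, the two sets of logs can be pulled with something like the following (the pod and namespace names are taken from the logs above; the Pilot label and container name assume a default install):

# sidecar logs for the affected pod
kubectl -n preprod-cb logs helpers-frontend-6f6bbf67d8-ml9dc -c istio-proxy
# pilot logs (the discovery container in the istio-pilot pods)
kubectl -n istio-system logs -l istio=pilot -c discovery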

I also found this post, which might be related:

https://ervikrant06.github.io/istio/kubernetes/Kubernetes-istio-logs/

More findings from our testing throughout the week...

We figured out that the issue with istio-proxy restarting was due to memory, so we increased the requests/limits, and during a 1h test that previously always threw several 503 errors we have not seen any! The memory limit initially defaulted to 128Mi and we increased it to 512Mi.
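For reference (not part of the original report), a minimal sketch of how the sidecar defaults can be raised mesh-wide, assuming the standard istio Helm chart and the 512Mi value mentioned above:

# raise the default sidecar memory request/limit for injected proxies
helm template install/kubernetes/helm/istio --name istio --namespace istio-system \
  --set global.proxy.resources.requests.memory=512Mi \
  --set global.proxy.resources.limits.memory=512Mi \
  > ./istio.yml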

BTW, we also upgraded to 1.1.6.

Next step is to use --addNamespace in the Pilot configuration to include only the namespaces we care about (not the whole cluster), in order to avoid the 512Mi memory consumption in each and every sidecar.

You can do that with a Sidecar resource too, mate, rather than a Pilot flag:

- apiVersion: networking.istio.io/v1alpha3
  kind: Sidecar
  metadata:
    creationTimestamp: "2019-03-27T18:17:55Z"
    generation: 1
    labels:
      app: sauron-web
      group: delivery-platform
      heritage: Tiller
      release: sauron-web
    name: sauron-web-showcase-policy
    namespace: sauron-web
    resourceVersion: "171174347"
    selfLink: /apis/networking.istio.io/v1alpha3/namespaces/sauron-web/sidecars/sauron-web-showcase-policy
    uid: a43a900a-50bc-11e9-905c-42010aa402ab
  spec:
    egress:
    - hosts:
      - abtest-allocator/*
      - arval-pch/*
      - finance-partner-config-service/*
      - istio-system/zipkin.istio-system.svc.cluster.local
      - istio-system/jaeger-agent.istio-system.svc.cluster.local
      - istio-system/istio-telemetry.istio-system.svc.cluster.local
      - istio-system/istio-policy.istio-system.svc.cluster.local
      - kube-system/kube-dns.kube-system.svc.cluster.local
      - core-system/logstash.core-system.svc.cluster.local

The format is namespace/service

My sidecars float around 40MB of memory.

@emedina an Envoy upstream crash is one of the possible causes of a downstream Envoy 503. Envoy memory usage is mainly affected by:

  • RPS: Envoy needs to buffer the content it is transporting
  • mesh size: the Sidecar resource is meant to give each proxy only a partial view of the mesh and hence shrink the Envoy config
  • mesh churn: Envoy provides a drain manager at the cost of extra memory usage during the drain. There is no official document describing which subset of mesh mutations puts load on Envoy, as the implementation details change all the time. (A quick way to watch sidecar memory is sketched below.)
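A hedged way to keep an eye on a sidecar's memory, assuming curl is available in the istio-proxy image and the admin port is 15000 as in the logs above (pod and namespace taken from this report):

kubectl -n preprod-cb exec helpers-frontend-6f6bbf67d8-ml9dc -c istio-proxy -- \
  sh -c 'curl -s http://127.0.0.1:15000/stats | grep server.memory'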

@Stono I used your configuration and it perfectly suits our needs, so thanks for the hint.

On another note, I can draw some tentative conclusions on my side:

  • From 1.1.6 onwards everything seems to be working for us, and no more 503s are seen at the app level; only minor UCs in Prometheus (example from a 1h load test that just finished):

Query:

istio_requests_total{response_code="503",destination_service_name!="clearance-helper-ui",destination_service_namespace="preprod-cb"}

Result:

istio_requests_total{connection_security_policy="none",destination_app="classification-helper-ui",destination_principal="unknown",destination_service="classification-helper-ui.preprod-cb.svc.cluster.local",destination_service_name="classification-helper-ui",destination_service_namespace="preprod-cb",destination_version="v1",destination_workload="classification-helper-ui",destination_workload_namespace="preprod-cb",instance="174.16.191.251:42422",job="istio-mesh",permissive_response_code="none",permissive_response_policyid="none",reporter="destination",request_protocol="http",response_code="503",response_flags="UC",source_app="istio-ingressgateway",source_principal="unknown",source_version="unknown",source_workload="istio-ingressgateway",source_workload_namespace="istio-system"}    1
istio_requests_total{connection_security_policy="none",destination_app="helpers-frontend",destination_principal="unknown",destination_service="helpers-frontend.preprod-cb.svc.cluster.local",destination_service_name="helpers-frontend",destination_service_namespace="preprod-cb",destination_version="v1",destination_workload="helpers-frontend",destination_workload_namespace="preprod-cb",instance="174.16.191.251:42422",job="istio-mesh",permissive_response_code="none",permissive_response_policyid="none",reporter="destination",request_protocol="http",response_code="503",response_flags="UC",source_app="istio-ingressgateway",source_principal="unknown",source_version="unknown",source_workload="istio-ingressgateway",source_workload_namespace="istio-system"}    1
  • Not using a DestinationRule for the idle timeout seems to reduce the number of 503s for us (still only in Prometheus). For example, as shown above, our last 1h load test produced only a couple of 503s without the DestinationRule, whereas the same 1h load test with the DestinationRule produced around 7. (A sketch of the kind of rule we mean follows below.)
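For readers following along, a DestinationRule of the kind referred to above would look roughly like this. The resource name and the 30s value are placeholders, the host comes from the metrics above, and the idleTimeout field is assumed to be available in your Istio version:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: classification-helper-ui-idle-timeout   # placeholder name
  namespace: preprod-cb
spec:
  host: classification-helper-ui.preprod-cb.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        idleTimeout: 30s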

Therefore, with the latest 1.1.7 everything is now working for us as expected, so we will move it to PROD next week. I will keep you posted.

istio-proxy is crashing with a segmentation fault for a specific deployment, which causes sporadic 503s. Can that be related to this issue, or should I open a new one?

@Multiply Unrelated, please open a new issue. Thanks!

Same issue on my side; I commented on the other thread:
https://github.com/istio/istio/issues/13848#issuecomment-496929796

If you need any more info, let me know.
Thanks.

1.1.8 contains important upgrades for people who are experiencing this issue, because it's believed to be triggered on the Envoy side: https://github.com/envoyproxy/envoy/pull/6578

After upgrading to 1.1.8, I've not seen 503s so far.

Still getting 503s after upgrading.
Do we need to restart the service pods?

Hello, I spent a _long_ time writing a "wtf are my 503" guide recently. 1.1.8 addresses only a very specific cause of 503's, but 503's are the manifestation of several problems.

https://karlstoney.com/2019/05/31/istio-503s-ucs-and-tcp-fun-times/

It would be great if you could follow it through, try to understand the specific root cause of yours, and then open an issue for your particular use case, as this one is getting really bloated now!

So I need to modify official Docker images? That's fun; I'll go with NodePort instead.
Thanks @Stono - useful information.

What do you mean by modifying Docker images?

Well, if I understand correctly, I would need to add or modify the client timeout on the destination application. Can I do that without modifying the Docker images/code?
i.e.: https://hub.docker.com/r/verdaccio/verdaccio

Take a step back... my blog is not telling you the cause of your 503s; it simply shows you how to debug the cause of the 503s you may be seeing, of which there are many. A 503 simply indicates an upstream error.

You need to do your own debugging to understand where the 503's are coming from and then look at mitigations.
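One hedged starting point, in addition to the blog: the destination sidecar's own Envoy stats often narrow things down. The pod and namespace below are placeholders; the admin port 15000 is the one used elsewhere in this thread:

kubectl -n <namespace> exec <pod> -c istio-proxy -- sh -c \
  'curl -s http://127.0.0.1:15000/stats | grep -E "upstream_cx_connect_fail|upstream_cx_destroy_remote_with_active_rq|upstream_rq_pending_overflow"'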

OK, my bad; I thought it was a generic solution. Honestly, I don't have time to investigate 503s on 50+ deployed apps. I just know they come from Istio/Envoy, and I don't even use the sidecar, just the ingress. From the logs I've pasted on the other threads it really looks like your solution might work (increase the timeout on the destination apps). I'll test on a Java app and see.
Thanks!

No worries. 503s are retried anyway, so they're not exactly hurting you; they're being reported and Istio is letting you see them.

Also, the default implementation of the Java Apache HTTP client will retry 503s too; without Istio you would have been getting 503s as well if these are socket timeouts, your Java libraries would just have been handling them transparently.

@emedina I'm wondering if we should close this issue now; with 1.1.8, as well as the debugging steps I've provided, people should be able to raise more specific issues related to the 503s they see. This issue has become a bit never-ending/open-ended.

What do you think?

@stono You're right, closing the issue as I cannot reproduce it in my use case.

I am running Istio 1.3.0 and still seeing intermittent 503 errors. Mostly during deploy time.

@njfix6 do you use Mixer policy or telemetry?

This is my Helm config:

helm template install/kubernetes/helm/istio --name istio --namespace istio-system \
--set sidecarInjectorWebhook.enabled=true \
--set global.proxy.autoInject=enabled \
--set prometheus.enabled=false \
`# Disable mixer prometheus adapter to remove istio default metrics.` \
--set mixer.adapters.prometheus.enabled=true \
`# Disable mixer policy check, since in our template we set no policy.` \
--set global.disablePolicyChecks=true \
`# Set gateway pods to 1 to sidestep eventual consistency / readiness problems.` \
--set gateways.istio-ingressgateway.autoscaleMin="10" \
--set gateways.istio-ingressgateway.autoscaleMax="250" \
--set gateways.istio-ingressgateway.cpu.targetAverageUtilization="60" \
--set gateways.istio-ingressgateway.resources.requests.cpu=4000m \
--set gateways.istio-ingressgateway.resources.requests.memory=512Mi \
--set gateways.istio-ingressgateway.resources.limits.cpu=6000m \
--set gateways.istio-ingressgateway.resources.limits.memory=512Mi \
--set gateways.istio-ingressgateway.type=NodePort \
`# More pilot replicas for better scale` \
--set pilot.autoscaleMin=3 \
--set pilot.autoscaleMax=20 \
--set pilot.cpu.targetAverageUtilization=60 \
--set pilot.resources.requests.cpu=4800m \
--set pilot.resources.requests.memory=2G \
--set pilot.traceSampling=1.0 \
--set global.proxy.tracer=datadog \
`# More telemetry replicas for better scale` \
--set mixer.telemetry.autoscaleMin=3 \
--set mixer.telemetry.autoscaleMax=20 \
--set mixer.telemetry.cpu.targetAverageUtilization=60 \
--set mixer.telemetry.resources.requests.cpu=3800m \
--set mixer.telemetry.resources.requests.memory=4G \
--set mixer.telemetry.resources.limits.cpu=5800m \
--set mixer.telemetry.resources.limits.memory=5G \
--set mixer.policy.autoscaleMin=3 \
--set mixer.policy.autoscaleMax=20 \
--set mixer.policy.cpu.targetAverageUtilization=60 \
--set global.proxy.resources.requests.cpu=500m \
--set global.proxy.resources.requests.memory=256Mi \
> ./istio.yml

@duderino To be honest, I'm not entirely sure how to tell the difference. I'm somewhat new to Istio; I believe I am using both? I work at Lime and we plan to roll this out to all our services, but we want to get this figured out beforehand.
Just for some context, we do not do any Istio retries on this service yet, but I might enable them to try to fix the 503s.

We are definitely still seeing 503's though.

503s during deployment time almost always happen when your application is not shutting down gracefully.

Your pod will receive a SIGTERM, and both istio-proxy and your app containers get a SIGTERM too. If your application doesn't handle SIGTERM well, it shuts down abruptly while there are still requests coming from the local istio-proxy. Have a look at your istio_requests_total{reporter="destination"}, and if you're seeing the 503s there, it's highly likely that it's your app.

Also, please get the response_flags from that metric.
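A concrete query along those lines, assuming the istio_requests_total metric and labels already used earlier in this thread:

sum(increase(istio_requests_total{reporter="destination", response_code="503"}[1h])) by (destination_workload, response_flags)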

@Stono thanks for the info! I'll look into that now.

@Stono the response flags are a mix of URX, UF, and UC, with URX showing up the least often.

Hey, so they're all upstream errors; upstream in the context of the destination reporter's Envoy means the local application. So I'm willing to bet this is exactly what I said: your application is terminating in an ungraceful manner, and Istio is simply allowing you to see that.

https://karlstoney.com/2019/05/31/istio-503s-ucs-and-tcp-fun-times might help you more

Ok thanks, I'll take a look.

@Stono We are running Rails/Puma with nginx on top of our application, with keepalive_timeout 65;. I feel nginx and Puma should be shutting down gracefully, but the keepalive_timeout may be the main reason for the upstream issue. Have others had issues with nginx's keepalive_timeout? I guess the default timeout for Puma is 20 seconds too, so that may also be causing issues.

Does keepalive affect the graceful shutdown? What if a request is in flight?
You may need to set the termination grace period in Kubernetes as well.

What Istio should do, but which is not well defined yet, is that istio-proxy should maintain the traffic interception during the graceful shutdown period.

Summary: graceful pod termination may need cooperation from both the application and istio-proxy.
The Istio behavior discussion is here: https://github.com/istio/istio/issues/17913
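Until that is settled, a minimal sketch of the application-side piece. The container name and sleep length are placeholders; the idea is to keep serving in-flight requests for a short window after the pod is marked for deletion, and to give the pod a long enough grace period overall:

spec:
  terminationGracePeriodSeconds: 60
  containers:
  - name: my-app                       # placeholder container name
    lifecycle:
      preStop:
        exec:
          # delay SIGTERM so requests still arriving via the local istio-proxy can drain
          command: ["sh", "-c", "sleep 15"]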

Update on my 503 journey:

First, I changed the nginx and Ruby keep-alive timeouts to 360 seconds by adding keepalive_timeout 360; to the nginx config, and by adding persistent_timeout 360 to the Puma config file. I don't think this helped much, though, because I still saw 503s.

Second, I realized we were not shutting down nginx gracefully, because when nginx receives a SIGTERM it exits quickly. Adding the following preStop hook to the nginx container spec will make nginx shut down gracefully:

lifecycle:
  preStop:
    exec:
      # SIGTERM triggers a quick exit; gracefully terminate instead
      command: ["nginx","-s","quit"]

Third, I added retries for 503s by adding the following retry config to my VirtualServices:

retries:
  attempts: 3
  perTryTimeout: 250ms
  retryOn: gateway-error,connect-failure,refused-stream

Lastly, we are using Datadog to monitor Istio, and there is a common bug noted here: https://github.com/DataDog/integrations-core/issues/3121. You need to add send_monotonic_counter: true to your Istio Datadog config. My conf.yaml file now looks like this:
EDIT: this does not seem to be working for Datadog; the metrics still seem to be sent as gauges.

init_config:

instances:
  - istio_mesh_endpoint: http://istio-telemetry.istio-system:42422/metrics
    mixer_endpoint: http://istio-telemetry.istio-system:15014/metrics
    galley_endpoint: http://istio-galley.istio-system:15014/metrics
    pilot_endpoint: http://istio-pilot.istio-system:15014/metrics
    citadel_endpoint: http://istio-citadel.istio-system:15014/metrics
    send_histograms_buckets: true
    send_monotonic_counter: true

After all this, I am still seeing intermittent 503's.

My next step is to update from 1.3.0 to 1.3.3, as noted in https://github.com/istio/istio/issues/17913.
I am also looking to increase the graceful shutdown period of my application from 30 seconds (the Kubernetes default) to 45 or 60.

Will keep everyone updated :)

Wait for 1.3.4, due out in a couple of days, which contains this fix: https://github.com/istio/istio/issues/18089. It resolved all 503 UCs for me.

I have updated to 1.3.4 and am still seeing 503 errors. Going to try increasing the graceful shutdown time to 60: terminationGracePeriodSeconds: 60.

@njfix6 I would _really_ encourage you to do some debugging into where the 503 errors are coming from; otherwise you're shooting blind with config changes.

The blog I linked talks through how you can use tcpdump to identify more information about the source of the 503s, to determine which settings you would need to configure, and where.
