2020-08-05T23:12:06.994966Z critical envoy assert panic: out of memory
2020-08-05T23:12:06.995014Z critical envoy backtrace Caught Aborted, suspect faulting address 0x53900000010
2020-08-05T23:12:06.995024Z critical envoy backtrace Backtrace (use tools/stack_decode.py to get line numbers):
2020-08-05T23:12:06.995026Z critical envoy backtrace Envoy version: 601cb2e4053746c53bad5811bd9b5f380228278a/1.16.0-dev/Clean/RELEASE/BoringSSL
2020-08-05T23:12:07.002342Z critical envoy backtrace #0: __restore_rt [0x7f351b8a68a0]
2020-08-05T23:12:07.022274Z critical envoy backtrace #1: [0x5575cc212089]
2020-08-05T23:12:07.029666Z critical envoy backtrace #2: (anonymous namespace)::handle_oom() [0x5575ce2aaaa5]
2020-08-05T23:12:07.038984Z critical envoy backtrace #3: tcmalloc::allocate_full_cpp_throw_oom() [0x5575ce315c90]
2020-08-05T23:12:07.046672Z critical envoy backtrace #4: fmt::v6::basic_memory_buffer<>::grow() [0x5575cc19895d]
2020-08-05T23:12:07.053475Z critical envoy backtrace #5: fmt::v6::visit_format_arg<>() [0x5575cc189c92]
2020-08-05T23:12:07.060924Z critical envoy backtrace #6: fmt::v6::internal::parse_format_string<>() [0x5575cc1892b5]
2020-08-05T23:12:07.067446Z critical envoy backtrace #7: fmt::v6::internal::vformat<>() [0x5575cc1a4032]
2020-08-05T23:12:07.073824Z critical envoy backtrace #8: Envoy::Server::ListenSocketFactoryImpl::createListenSocketAndApplyOptions() [0x5575cd7b9d6e]
2020-08-05T23:12:07.081319Z critical envoy backtrace #9: Envoy::Server::ListenSocketFactoryImpl::getListenSocket() [0x5575cd7ba208]
2020-08-05T23:12:07.088832Z critical envoy backtrace #10: Envoy::Server::ActiveRawUdpListener::ActiveRawUdpListener() [0x5575cd821236]
2020-08-05T23:12:07.095647Z critical envoy backtrace #11: Envoy::Server::ActiveRawUdpListenerFactory::createActiveUdpListener() [0x5575cd7b3ec8]
2020-08-05T23:12:07.102233Z critical envoy backtrace #12: Envoy::Server::ConnectionHandlerImpl::addListener() [0x5575cd81d3c7]
2020-08-05T23:12:07.109722Z critical envoy backtrace #13: std::__1::__function::__func<>::operator()() [0x5575cd81c72e]
2020-08-05T23:12:07.117505Z critical envoy backtrace #14: Envoy::Event::DispatcherImpl::runPostCallbacks() [0x5575cd8271f6]
2020-08-05T23:12:07.125794Z critical envoy backtrace #15: event_process_active_single_queue [0x5575cdc93cf7]
2020-08-05T23:12:07.136046Z critical envoy backtrace #16: event_base_loop [0x5575cdc9286e]
2020-08-05T23:12:07.155180Z critical envoy backtrace #17: Envoy::Server::WorkerImpl::threadRoutine() [0x5575cd81be74]
2020-08-05T23:12:07.165376Z critical envoy backtrace #18: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x5575cdd44463]
2020-08-05T23:12:07.165516Z critical envoy backtrace #19: start_thread [0x7f351b89b6db]
We have had this happen repeatedly. Thanks to @howardjohn, there is some correlation:
2020-08-06T00:00:03.197497Z debug envoy filter Loading DNS table from external file: Success
2020-08-06T00:00:03.197608Z debug envoy config new fc_contexts has 0 filter chains, including 0 newly built
2020-08-06T00:00:03.197616Z debug envoy config add warming listener: name=dns, hash=2887861284968934043, address=0.0.0.0:15013
2020-08-06T00:00:03.197622Z debug envoy misc Initialize listener dns local-init-manager.
2020-08-06T00:00:03.197625Z debug envoy init init manager Listener-local-init-manager dns 2887861284968934043 contains no targets
2020-08-06T00:00:03.197628Z debug envoy init init manager Listener-local-init-manager dns 2887861284968934043 initialized, notifying Listener-local-init-watcher dns
2020-08-06T00:00:03.197661Z debug envoy config warm complete. updating active listener: name=dns, hash=2887861284968934043, address=0.0.0.0:15013
2020-08-06T00:00:03.197666Z debug envoy config draining listener: name=dns, hash=16912794089181669957, address=0.0.0.0:15013
2020-08-06T00:00:03.197677Z info envoy upstream lds: add/update listener 'dns'
2020-08-06T00:00:03.197688Z debug envoy config Resuming discovery requests for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2020-08-06T00:00:03.197692Z debug envoy config Resuming discovery requests for type.googleapis.com/envoy.api.v2.RouteConfiguration
2020-08-06T00:00:03.197697Z debug envoy config gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener accepted with 62 resources with version 2020-08-06T00:00:03Z/1671
2020-08-06T00:00:03.197730Z critical envoy assert panic: out of memory
The panic correlates with an xDS update of the DNS (UDP) listener.
https://github.com/istio/istio/issues/26171
2020-08-05T21:38:25.757174Z critical envoy backtrace Caught Segmentation fault, suspect faulting address 0x0
2020-08-05T21:38:25.761073Z critical envoy backtrace Backtrace (use tools/stack_decode.py to get line numbers):
2020-08-05T21:38:25.761089Z critical envoy backtrace Envoy version: 601cb2e4053746c53bad5811bd9b5f380228278a/1.16.0-dev/Clean/RELEASE/BoringSSL
2020-08-05T21:38:25.761274Z critical envoy backtrace #0: __restore_rt [0x7f77968928a0]
2020-08-05T21:38:25.790495Z critical envoy backtrace #1: Envoy::Network::UdpListenerImpl::send() [0x557c794edb15]
2020-08-05T21:38:25.798024Z critical envoy backtrace #2: Envoy::Extensions::UdpFilters::DnsFilter::DnsFilter::sendDnsResponse() [0x557c79218341]
2020-08-05T21:38:25.805647Z critical envoy backtrace #3: std::__1::__function::__func<>::operator()() [0x557c7921c318]
2020-08-05T21:38:25.813076Z critical envoy backtrace #4: Envoy::Extensions::UdpFilters::DnsFilter::DnsFilterResolver::invokeCallback() [0x557c7921d6ca]
2020-08-05T21:38:25.822320Z critical envoy backtrace #5: std::__1::__function::__func<>::operator()() [0x557c7921f11f]
2020-08-05T21:38:25.852765Z critical envoy backtrace #6: Envoy::Network::DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback() [0x557c794e8efd]
2020-08-05T21:38:25.875241Z critical envoy backtrace #7: end_hquery [0x557c79586f12]
2020-08-05T21:38:25.883623Z critical envoy backtrace #8: qcallback [0x557c7958eae3]
2020-08-05T21:38:25.892917Z critical envoy backtrace #9: end_query [0x557c7958de55]
2020-08-05T21:38:25.900867Z critical envoy backtrace #10: processfds [0x557c7958d21c]
2020-08-05T21:38:25.908194Z critical envoy backtrace #11: std::__1::__function::__func<>::operator()() [0x557c794ea3cd]
2020-08-05T21:38:25.915726Z critical envoy backtrace #12: event_process_active_single_queue [0x557c79948cf7]
2020-08-05T21:38:25.922806Z critical envoy backtrace #13: event_base_loop [0x557c7994786e]
2020-08-05T21:38:25.930302Z critical envoy backtrace #14: Envoy::Server::WorkerImpl::threadRoutine() [0x557c794d0e74]
2020-08-05T21:38:25.937601Z critical envoy backtrace #15: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x557c799f9463]
2020-08-05T21:38:25.937656Z critical envoy backtrace #16: start_thread [0x7f77968876db]
Also a crash ^ on a request. I don't have more info on that one; I was doing a lot of stuff at the time.
[2020-08-06 00:19:28.315][18][critical][backtrace] Caught Segmentation fault, suspect faulting address 0x347ade30
[2020-08-06 00:19:28.315][18][critical][backtrace] Backtrace (use tools/stack_decode.py to get line numbers):
[2020-08-06 00:19:28.315][18][critical][backtrace] Envoy version: 601cb2e4053746c53bad5811bd9b5f380228278a/1.16.0-dev/Clean/RELEASE/BoringSSL
[2020-08-06 00:19:28.315][18][critical][backtrace] #0: __restore_rt [0x7f6e8aaea8a0]
[2020-08-06 00:19:28.325][18][critical][backtrace] #1: std::__1::__function::__func<>::operator()() [0x5629338c9318]
[2020-08-06 00:19:28.337][18][critical][backtrace] #2: Envoy::Extensions::UdpFilters::DnsFilter::DnsFilterResolver::invokeCallback() [0x5629338ca6ca]
[2020-08-06 00:19:28.349][18][critical][backtrace] #3: std::__1::__function::__func<>::operator()() [0x5629338cc11f]
[2020-08-06 00:19:28.360][18][critical][backtrace] #4: Envoy::Network::DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback() [0x562933b95efd]
[2020-08-06 00:19:28.369][18][critical][backtrace] #5: end_hquery [0x562933c33f12]
[2020-08-06 00:19:28.382][18][critical][backtrace] #6: next_lookup [0x562933c33df6]
[2020-08-06 00:19:28.396][18][critical][backtrace] #7: qcallback [0x562933c3bae3]
[2020-08-06 00:19:28.409][18][critical][backtrace] #8: end_query [0x562933c3ae55]
[2020-08-06 00:19:28.422][18][critical][backtrace] #9: process_answer [0x562933c3b6af]
[2020-08-06 00:19:28.433][18][critical][backtrace] #10: processfds [0x562933c3a0a0]
[2020-08-06 00:19:28.443][18][critical][backtrace] #11: std::__1::__function::__func<>::operator()() [0x562933b983a5]
[2020-08-06 00:19:28.452][18][critical][backtrace] #12: Envoy::Event::FileEventImpl::assignEvents()::$_1::__invoke() [0x562933b8a606]
[2020-08-06 00:19:28.461][18][critical][backtrace] #13: event_process_active_single_queue [0x562933ff5feb]
[2020-08-06 00:19:28.470][18][critical][backtrace] #14: event_base_loop [0x562933ff486e]
[2020-08-06 00:19:28.479][18][critical][backtrace] #15: Envoy::Server::WorkerImpl::threadRoutine() [0x562933b7de74]
[2020-08-06 00:19:28.489][18][critical][backtrace] #16: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x5629340a6463]
[2020-08-06 00:19:28.490][18][critical][backtrace] #17: start_thread [0x7f6e8aadf6db]
This is reproducible in plain Envoy, with just a single DNS listener changing address rapidly. I will post a full reproducer soon.
Thanks, @howardjohn! Yes, a reproducer will be very helpful!
It is possible that it is related to the race condition discussed here (https://github.com/envoyproxy/envoy/pull/11914#discussion_r465885114).
$ docker run -p 15013:15013/udp -p 9901:9901 -v $HOME/kube/local:/config -w / --entrypoint envoy envoyproxy/envoy-alpine-debug-dev -c /config/envoy-dns.yaml -l warning --log-format-prefix-with-location 0 --log-format "%v"
there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
Caught Segmentation fault, suspect faulting address 0x130
Backtrace (use tools/stack_decode.py to get line numbers):
Envoy version: 904cc6b36614e0a5c443d11465b408d3a1c1a342/1.16.0-dev/Clean/RELEASE/BoringSSL
#0: [0x7f747650f3d0]
#1: std::__1::__function::__func<>::operator()() [0x55ac720a6648]
#2: Envoy::Extensions::UdpFilters::DnsFilter::DnsFilterResolver::invokeCallback() [0x55ac720a792a]
#3: std::__1::__function::__func<>::operator()() [0x55ac720a9193]
#4: Envoy::Network::DnsResolverImpl::PendingResolution::onAresGetAddrInfoCallback() [0x55ac7235acb1]
#5: end_hquery [0x55ac7220c465]
#6: next_lookup [0x55ac7220c346]
#7: qcallback [0x55ac7220fa03]
#8: end_query [0x55ac7220eda5]
#9: process_answer [0x55ac7220f5db]
#10: processfds [0x55ac7220dfe0]
#11: std::__1::__function::__func<>::operator()() [0x55ac7235d185]
#12: Envoy::Event::FileEventImpl::assignEvents()::$_1::__invoke() [0x55ac7234f0f6]
#13: event_process_active_single_queue [0x55ac727e23d8]
#14: event_base_loop [0x55ac727e0cee]
#15: Envoy::Server::WorkerImpl::threadRoutine() [0x55ac72342b84]
#16: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x55ac728781c3]
#17: [0x7f7476503f00]
envoy-dns.yaml:
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
node:
  id: id
  cluster: envoy
dynamic_resources:
  lds_config:
    path: /config/lds-dns.yaml
lds-dns.yaml:
resources: [{
  "@type": "type.googleapis.com/envoy.config.listener.v3.Listener",
  "name": "dns",
  "address": {
    "socket_address": {
      "protocol": "UDP",
      "address": "0.0.0.0",
      "port_value": 15013
    }
  },
  "listener_filters": [
    {
      "name": "envoy.filters.udp.dns_filter",
      "typed_config": {
        "@type": "type.googleapis.com/envoy.extensions.filters.udp.dns_filter.v3alpha.DnsFilterConfig",
        "stat_prefix": "dns",
        "server_config": {
          "inline_dns_table": {
            "virtual_domains": [
              {
                "name": "fake.local",
                "endpoint": {
                  "address_list": {
                    "address": [
                      "1.2.3.4"
                    ]
                  }
                }
              },
              {
                "name": "other.fake.local",
                "endpoint": {
                  "address_list": {
                    "address": [
                      "2.3.4.5"
                    ]
                  }
                }
              }
            ],
            "known_suffixes": [
              {
                "safe_regex": {
                  "google_re2": {},
                  "regex": ".*"
                }
              }
            ]
          }
        },
        "client_config": {
          "resolver_timeout": "10s",
          "max_pending_lookups": "256"
        }
      }
    }
  ],
  "traffic_direction": "OUTBOUND",
  "reuse_port": true
}]
Switch lds-dns.yaml back and forth repeatedly between what is posted above and a version with other.fake.local removed. At the same time, run:
for i in {0..1000}; do
  nslookup -port=15013 fake.local. localhost | grep Address
done
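For reference, a minimal sketch of how the flipping could be automated (not the exact script I used; lds-dns-a.yaml and lds-dns-b.yaml are hypothetical names for the config posted above and the copy with other.fake.local removed, and CONFIG_DIR is the directory mounted as /config in the docker run command above):
#!/usr/bin/env bash
# Sketch: alternate the LDS file between two variants to force repeated UDP listener updates.
# Envoy's filesystem LDS watch triggers on file moves, so write a temp file and mv it into place.
CONFIG_DIR="$HOME/kube/local"
for i in {0..1000}; do
  for variant in lds-dns-a.yaml lds-dns-b.yaml; do
    cp "$variant" "$CONFIG_DIR/lds-dns.yaml.tmp"
    mv "$CONFIG_DIR/lds-dns.yaml.tmp" "$CONFIG_DIR/lds-dns.yaml"
    sleep 0.5
  done
done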
While I have the reproducer above, it is artificial. The OOM and the other panic are seen in a real environment.
I can trivially reproduce the OOM, but only in an Istio environment; I haven't gotten it in a pure Envoy environment.
Thanks for the reproducer @howardjohn, I can reproduce the issue. It seems like the PR https://github.com/envoyproxy/envoy/pull/11914 linked by @rshriram fixes it. I baked that into this image: dio123/envoy:11914-dns-fix and did not see the crash.
@dio thanks! With that image I could no longer reproduce this issue: https://github.com/envoyproxy/envoy/issues/12508#issuecomment-669618203
But I do still see the main OOM issue:
2020-08-06T04:28:28.400765Z critical envoy assert panic: out of memory
2020-08-06T04:28:28.400824Z critical envoy backtrace Caught Aborted, suspect faulting address 0x53900000011
2020-08-06T04:28:28.400830Z critical envoy backtrace Backtrace (use tools/stack_decode.py to get line numbers):
2020-08-06T04:28:28.400835Z critical envoy backtrace Envoy version: 601cb2e4053746c53bad5811bd9b5f380228278a/1.16.0-dev/Clean/RELEASE/BoringSSL
2020-08-06T04:28:28.401187Z critical envoy backtrace #0: __restore_rt [0x7f77867048a0]
2020-08-06T04:28:28.422105Z critical envoy backtrace #1: [0x5556d0548089]
2020-08-06T04:28:28.430418Z critical envoy backtrace #2: (anonymous namespace)::handle_oom() [0x5556d25e0aa5]
2020-08-06T04:28:28.437321Z critical envoy backtrace #3: tcmalloc::allocate_full_cpp_throw_oom() [0x5556d264bc90]
2020-08-06T04:28:28.443532Z critical envoy backtrace #4: fmt::v6::basic_memory_buffer<>::grow() [0x5556d04ce95d]
2020-08-06T04:28:28.449246Z critical envoy backtrace #5: fmt::v6::visit_format_arg<>() [0x5556d04bfc92]
2020-08-06T04:28:28.455528Z critical envoy backtrace #6: fmt::v6::internal::parse_format_string<>() [0x5556d04bf2b5]
2020-08-06T04:28:28.460893Z critical envoy backtrace #7: fmt::v6::internal::vformat<>() [0x5556d04da032]
2020-08-06T04:28:28.466432Z critical envoy backtrace #8: Envoy::Server::ListenSocketFactoryImpl::createListenSocketAndApplyOptions() [0x5556d1aefd6e]
2020-08-06T04:28:28.471869Z critical envoy backtrace #9: Envoy::Server::ListenSocketFactoryImpl::getListenSocket() [0x5556d1af0208]
2020-08-06T04:28:28.477304Z critical envoy backtrace #10: Envoy::Server::ActiveRawUdpListener::ActiveRawUdpListener() [0x5556d1b57236]
2020-08-06T04:28:28.482779Z critical envoy backtrace #11: Envoy::Server::ActiveRawUdpListenerFactory::createActiveUdpListener() [0x5556d1ae9ec8]
2020-08-06T04:28:28.488218Z critical envoy backtrace #12: Envoy::Server::ConnectionHandlerImpl::addListener() [0x5556d1b533c7]
2020-08-06T04:28:28.493671Z critical envoy backtrace #13: std::__1::__function::__func<>::operator()() [0x5556d1b5272e]
2020-08-06T04:28:28.499123Z critical envoy backtrace #14: Envoy::Event::DispatcherImpl::runPostCallbacks() [0x5556d1b5d1f6]
2020-08-06T04:28:28.504590Z critical envoy backtrace #15: event_process_active_single_queue [0x5556d1fc9cf7]
2020-08-06T04:28:28.510030Z critical envoy backtrace #16: event_base_loop [0x5556d1fc886e]
2020-08-06T04:28:28.515453Z critical envoy backtrace #17: Envoy::Server::WorkerImpl::threadRoutine() [0x5556d1b51e74]
2020-08-06T04:28:28.520881Z critical envoy backtrace #18: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x5556d207a463]
2020-08-06T04:28:28.520910Z critical envoy backtrace #19: start_thread [0x7f77866f96db]
I also disabled our iptables capturing of DNS and just directly forwarded the DNS requests; same issue. I believe this proves it's not some infinite iptables loop type of issue.
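For what it's worth, an example of what I mean by forwarding directly (assuming the DNS listener is bound to 15013 as in the plain-Envoy config above; adjust the port to whatever the listener actually uses):
# Query the proxy's DNS listener directly, bypassing any iptables DNS redirection.
dig @127.0.0.1 -p 15013 fake.local.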
@howardjohn I'm interested in your Istio setup for reproducing this. Could you share it in that Istio ticket? Thanks!
@dio if you check out istio/istio (I ran at bc1ea95017bd7ef99994998de2c2aeeccfda1736), then run:
kind create cluster # Happens on any cluster, kind is just simple
go test -v -p 1 ./tests/integration/pilot/ -run TestLocality -count 100 --istio.test.kube.loadbalancer=false --istio.test.nocleanup
This will deploy a bunch of pods to the echo-******** namespace.
It may take a few tries, but eventually you should see all the pods crash; kubectl logs c-v1-7bb495cfbf-p22dz -c istio-proxy -p will show the crash.
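If it helps, a rough sketch for spotting which pods crashed and pulling the previous container's logs (NAMESPACE is a placeholder for the echo namespace the test creates; the pod name above was just one example):
# List pods whose istio-proxy container has restarted and dump the previous container's logs.
NAMESPACE="..."   # placeholder: set to the echo namespace created by the test
for pod in $(kubectl get pods -n "$NAMESPACE" -o jsonpath='{.items[*].metadata.name}'); do
  restarts=$(kubectl get pod "$pod" -n "$NAMESPACE" \
    -o jsonpath='{.status.containerStatuses[?(@.name=="istio-proxy")].restartCount}')
  if [ "${restarts:-0}" -gt 0 ]; then
    echo "=== $pod (restarts: $restarts) ==="
    kubectl logs "$pod" -n "$NAMESPACE" -c istio-proxy -p | tail -n 40
  fi
done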
One interesting thing I found: when I just ran it, it took a few minutes to trigger, but when it did, 4 (out of 5) pods all hit this at the same time.
I think this was fixed by https://github.com/envoyproxy/envoy/pull/11914
@mattklein123 I have tried with #11914; it is still broken. See https://github.com/envoyproxy/envoy/issues/12508#issuecomment-669677645. It fixed one crash but not the main OOM issue we see.
cc @abaptiste