Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.):
no
What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.):
nginx, opentracing, zipkin, segfault
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
NGINX Ingress controller version: 0.10.2
Kubernetes version (use kubectl version):
> k version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-15T22:14:38Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.12", GitCommit:"3bda299a6414b4866f179921610d6738206a18fe", GitTreeState:"clean", BuildDate:"2017-12-29T08:39:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Environment:
> cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
Kernel (e.g. uname -a):
> uname -a
Linux ... 3.10.0-693.11.1.el7.x86_64 #1 SMP Fri Oct 27 05:39:05 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Ingress controller pod resources:
requests:
  cpu: 500m
  memory: 1Gi
limits:
  cpu: 1000m
  memory: 2Gi
nginx-ingress ConfigMap values:
enable-opentracing: "true"
zipkin-collector-host: zipkin.tools
zipkin-collector-port: '80'
zipkin-service-name: ingress-controller
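For reference, here is a minimal sketch of how these values would sit in the controller's ConfigMap; the ConfigMap name and namespace are assumptions for illustration, not taken from this cluster:

apiVersion: v1
kind: ConfigMap
metadata:
  # Name and namespace are illustrative assumptions.
  name: nginx-configuration
  namespace: kube-system
data:
  enable-opentracing: "true"
  zipkin-collector-host: zipkin.tools
  zipkin-collector-port: "80"
  zipkin-service-name: ingress-controller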
What happened:
NGINX worker processes keep exiting due to a segfault (signal 11).
What you expected to happen:
No segfaults and no worker restarts.
How to reproduce it (as minimally and precisely as possible):
It happens roughly every two seconds:
{"log":"2018/03/20 18:34:54 [alert] 23#23: worker process 11075 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:54.142872722Z"}
{"log":"2018/03/20 18:34:54 [alert] 23#23: worker process 11109 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:54.517182591Z"}
{"log":"2018/03/20 18:34:57 [alert] 23#23: worker process 11143 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:57.923391335Z"}
{"log":"2018/03/20 18:34:58 [alert] 23#23: worker process 11178 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:58.194986234Z"}
{"log":"2018/03/20 18:34:58 [alert] 23#23: worker process 11212 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:58.557907914Z"}
{"log":"2018/03/20 18:34:58 [alert] 23#23: worker process 11246 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:58.816366263Z"}
{"log":"2018/03/20 18:34:59 [alert] 23#23: worker process 11280 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:59.19434845Z"}
{"log":"2018/03/20 18:35:02 [alert] 23#23: worker process 11314 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:02.943432758Z"}
{"log":"2018/03/20 18:35:03 [alert] 23#23: worker process 11349 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:03.206377447Z"}
{"log":"2018/03/20 18:35:03 [alert] 23#23: worker process 11383 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:03.56327926Z"}
{"log":"2018/03/20 18:35:03 [alert] 23#23: worker process 11417 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:03.81832701Z"}
{"log":"2018/03/20 18:35:08 [alert] 23#23: worker process 11451 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:08.92230984Z"}
{"log":"2018/03/20 18:35:08 [alert] 23#23: worker process 11486 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:08.979320448Z"}
{"log":"2018/03/20 18:35:13 [alert] 23#23: worker process 11520 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:13.231301619Z"}
{"log":"2018/03/20 18:35:13 [alert] 23#23: worker process 11555 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:13.791207004Z"}
{"log":"2018/03/20 18:35:14 [alert] 23#23: worker process 11589 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:14.156415018Z"}
{"log":"2018/03/20 18:35:14 [alert] 23#23: worker process 11623 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:14.5254932Z"}
{"log":"2018/03/20 18:35:18 [alert] 23#23: worker process 11657 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:18.272065288Z"}
{"log":"2018/03/20 18:35:18 [alert] 23#23: worker process 11692 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:18.53148589Z"}
{"log":"2018/03/20 18:35:19 [alert] 23#23: worker process 11726 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:19.517368566Z"}
Anything else we need to know:
> tail -f nginx-ingress-controller-3228907275-p158b_kube-system_nginx-ingress-controller-45af43a95ce0ff5989dfe5783d5457214952edc83dc5de1e6745b436c7c5e23b.log | grep 'opentracing' -C3
2018/03/20 17:51:09 [debug] 23719#23719: *680808 http script set $service_name
2018/03/20 17:51:09 [debug] 23719#23719: *680808 post rewrite phase: 4
2018/03/20 17:51:09 [debug] 23719#23719: *680808 generic phase: 5
2018/03/20 17:51:09 [debug] 23719#23719: *680808 extraced opentracing span context from request 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 starting opentracing request span for 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 starting opentracing location span for \"/\"(00007FE651FDDD00) in request 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 injecting opentracing span context from request 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 adding opentracing header \"x-b3-traceid:09e429b7d83c3090\" in request 00007FE65CCFD050
2018/03/20 17:51:09 [alert] 23#23: worker process 23719 exited on signal 11 (core dumped)
2018/03/20 17:51:09 [debug] 23753#23753: epoll add event: fd:13 op:1 ev:00002001
2018/03/20 17:51:09 [debug] 23753#23753: epoll add event: fd:14 op:1 ev:00002001
After analyzing the core dump, it is clear that the issue is in the opentracing header-injection (inject_span_context) code:
#0 0x0000561139f929ec in ngx_list_push ()
#1 0x00007f7a1adbb6ab in ngx_opentracing::inject_span_context(opentracing::v1::Tracer const&, ngx_http_request_s*, opentracing::v1::SpanContext const&) () from /etc/nginx/modules/ngx_http_opentracing_module.so
#2 0x00007f7a1adbcf81 in ngx_opentracing::OpenTracingRequestInstrumentor::OpenTracingRequestInstrumentor(ngx_http_request_s*, ngx_http_core_loc_conf_s*, ngx_opentracing::opentracing_loc_conf_t*) () from /etc/nginx/modules/ngx_http_opentracing_module.so
#3 0x00007f7a1adbdce6 in ngx_opentracing::on_enter_block(ngx_http_request_s*) () from /etc/nginx/modules/ngx_http_opentracing_module.so
#4 0x000056113a097a33 in ngx_http_core_generic_phase ()
#5 0x000056113a093645 in ngx_http_core_run_phases ()
#6 0x000056113a08f03e in ngx_http_process_request ()
#7 0x000056113a09068c in ngx_http_process_request_line ()
#8 0x000056113a0a6be1 in ngx_epoll_process_events.lto_priv ()
#9 0x000056113a0c45df in ngx_process_events_and_timers ()
#10 0x000056113a0a8405 in ngx_worker_process_cycle ()
#11 0x000056113a0be07f in ngx_spawn_process ()
#12 0x000056113a0a9715 in ngx_master_process_cycle ()
#13 0x0000561139f8a654 in main ()
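For context, a minimal hedged sketch (not the module's actual code; the function name is illustrative) of the kind of headers_in modification that frames #0 and #1 correspond to, i.e. pushing a new header onto r->headers_in.headers with ngx_list_push:

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

/* Illustrative only: append a header to the already-parsed request headers.
 * ngx_list_push() allocates a new list part from the request pool when the
 * current part is full, which is roughly where frame #0 sits. */
static ngx_int_t
add_request_header(ngx_http_request_t *r, ngx_str_t *key, ngx_str_t *value)
{
    ngx_table_elt_t  *h;

    h = ngx_list_push(&r->headers_in.headers);
    if (h == NULL) {
        return NGX_ERROR;
    }

    h->hash = ngx_hash_key(key->data, key->len);
    h->key = *key;
    h->value = *value;
    h->lowcase_key = key->data;   /* assumes the key is already lowercase */

    return NGX_OK;
}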
@rnburn can you help with this?
Yep. Will take a look.
@trnl thank you for the detailed report.
On another cluster I can't reproduce it, and the only difference I noticed is the kernel version:
> uname -a
Linux ... 3.10.0-693.2.2.el7.x86_64 #1 SMP Sat Sep 9 03:55:24 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
@rnburn , maybe I can provide some additional information?
@trnl - The information is good. I can tell it's crashing here. I haven't had a chance to investigate further yet, but I will try to fix it this week.
I looked into this, and I couldn't find any nginx documentation that says you can't modify headers_in the way this line of the code does, but I inquired on nginx-devel to see if there's anything I'm missing.
Something like a smaller isolated example that reproduces the problem or a core dump could help to narrow the problem down.
@rnburn that was fast: http://mailman.nginx.org/pipermail/nginx-devel/2018-March/010988.html
@rnburn any update on this?
Hey @aledbf - I plan to put in a fix based on this answer I got on the nginx mailing list:
http://mailman.nginx.org/pipermail/nginx-devel/2018-March/011008.html
@aledbf - I put in https://github.com/opentracing-contrib/nginx-opentracing/pull/37 with a fix for this.
The fix required some reorganization of the module. I removed the vendor-specific modules in favor of loading tracers dynamically with opentracing_load_tracer from a JSON representation of their configuration.
I can put in a PR to update ingress-nginx to do this.
I think this approach should work better for you, since you can use an embedded YAML configuration for the tracer without having to maintain code that handles each tracer's specific options.
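To illustrate, the nginx side of the reorganized module might look something like the following; the library path, config path, and module filename are assumptions for illustration, not what ingress-nginx actually generates:

load_module modules/ngx_http_opentracing_module.so;

http {
    # Load the Zipkin tracer plugin and its JSON configuration (paths illustrative).
    opentracing_load_tracer /usr/local/lib/libzipkin_opentracing.so /etc/nginx/zipkin.json;
    opentracing on;
}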
@rnburn thank you for the update; the new JSON file makes sense.
> I can put in a PR to update ingress-nginx to do this.
Sure. First, let me update the nginx image (I am updating the modules): https://github.com/kubernetes/ingress-nginx/pull/2537
@rnburn what's the format of the new JSON configuration for Zipkin?
It's got a schema file for the configuration: https://github.com/rnburn/zipkin-cpp-opentracing/blob/master/zipkin_opentracing/tracer_configuration.schema.json
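Going by that schema and the ConfigMap values from this report, the Zipkin tracer configuration would look something like this (the field names are my reading of the linked schema, so treat it as a sketch rather than the exact file ingress-nginx will generate):

{
  "service_name": "ingress-controller",
  "collector_host": "zipkin.tools",
  "collector_port": 80
}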
@trnl can you test quay.io/aledbf/nginx-ingress-controller:0.382?
It contains the fix for the bug you reported.
Thanks in advance