Ingress-nginx: "Secure-execution mode" breaking dynatrace monitoring

Created on 19 Apr 2020 · 9 Comments · Source: kubernetes/ingress-nginx

NGINX Ingress controller version:

NGINX Ingress controller
Release: 0.30.0
Build: git-7e65b90c4
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.17.8

Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"c52f59bbba5fbf21fbb18e9a06f96e563fe4c20a", GitTreeState:"clean", BuildDate:"2020-01-31T20:00:26Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

We use Dynatrace OneAgent as a monitoring tool in our systems (it works in part by injecting itself into other processes). We have been using ingress-nginx 0.25.1 successfully since the beginning, but this integration broke when we tested the latest version, 0.30.0. We assume it is related to the Alpine Linux migration. The error message is: "Injection to muslC processes run in secure-execution mode is not successful".

I created this as a question rather than a bug because I assume there are good reasons for "secure-execution" mode to be enabled, although it would be interesting to know:

  1. If this is something that was done intentionally, or just happened as a consequence of the alpine migration
  2. How this can be disabled, and what the consequences of that might be from a security perspective.

I'm sure we won't be the only team running into this issue, so if it's not something that can or should be fixed in the source, it would be helpful to many to have a workaround documented right here on the official GitHub.

Extra note: both the nginx version and the Alpine Linux version are officially supported by Dynatrace. If we create our own image with the same Alpine and nginx versions, monitoring works as expected.

Thanks :)


All 9 comments

Injection to muslC processes run in secure-execution mode is not successful

Please post the exact log you get from the ingress controller pod.
Also, this seems to be an issue with the Dynatrace module itself. Did you open an issue in https://github.com/Dynatrace/dynatrace-oneagent-operator ?

Hey, thanks for your response. Sorry if I was unclear: the error message is not from the ingress controller (there are no error messages in the ingress controller), it's from the Dynatrace console.

The problem is with the way this agent works: it needs to inject itself into the application, and if the application is running in "secure-execution" mode, that doesn't work. We've discussed this directly with their support team, who say this is the problem, so if they're correct there's no fix that can be done on their end.

"secure-execution" mode, that doesn't work.

Not sure what that means. There were no changes to the build process in the migration from Debian to Alpine. The only change is the GCC version, which is now 9.2.0.

Hi all! I'm Jussi from Dynatrace, and have been working on this problem on our side. I just accidentally ran into this GitHub issue, and thought that I could shed some light on it. First of all, here's my understanding of why this started happening:

The secure-execution mode is enabled in three cases:

  • the real and effective user and/or group IDs of a process differ
  • a non-root-owned process executes a binary with increased capabilities
  • a Linux security module decides to enable it
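For anyone who wants to confirm this on their own pods: on Linux the kernel reports secure-execution mode to the dynamic loader through the AT_SECURE entry (type 23) of the process's auxiliary vector, which is readable at /proc/<pid>/auxv. The following is a minimal Python sketch, not something taken from Dynatrace or ingress-nginx. The helper names are made up for illustration, and it assumes the inspected process uses the same word size as the Python interpreter.

```python
import struct

AT_NULL = 0      # terminator entry of the auxiliary vector
AT_SECURE = 23   # nonzero when the kernel requested secure-execution mode


def parse_auxv(raw: bytes) -> dict:
    """Decode raw auxv bytes into a {type: value} dict.

    Assumption: entries are pairs of native unsigned longs, i.e. the
    target process has the same word size as this interpreter.
    """
    entry = struct.Struct("LL")
    usable = len(raw) - len(raw) % entry.size  # ignore a trailing partial entry
    result = {}
    for key, value in entry.iter_unpack(raw[:usable]):
        if key == AT_NULL:
            break
        result[key] = value
    return result


def is_secure_execution(raw: bytes) -> bool:
    """True if the auxiliary vector carries a nonzero AT_SECURE entry."""
    return parse_auxv(raw).get(AT_SECURE, 0) != 0


def read_auxv(pid="self") -> bytes:
    """Read the raw auxiliary vector of a process (Linux only)."""
    with open(f"/proc/{pid}/auxv", "rb") as fh:
        return fh.read()
```

Calling `is_secure_execution(read_auxv(pid))` with the PID of the nginx process should then show whether that process was started in secure-execution mode. Reading another process's auxv usually requires the same user or root.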

The cap_net_bind_service capability was added to the Nginx binary in a commit about 16 months ago, around version 0.23. The Nginx process has been running in secure-execution mode ever since that commit (see point two above), but this was not a problem for Dynatrace because the glibc loader is less strict about preloading libraries than the one in musl libc. After the move to Alpine (musl) as the base image, however, the OneAgent can no longer be preloaded into the process, which effectively prevents our instrumentation.
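To check whether a given nginx binary carries this file capability, one option is to read its security.capability extended attribute and test the CAP_NET_BIND_SERVICE bit (bit 10). The sketch below is illustrative only: the function names are hypothetical, the layout follows the kernel's vfs_cap_data structure as I understand it (a little-endian magic word followed by permitted/inheritable 32-bit pairs), and the path to the binary is left for the reader to supply.

```python
import os
import struct

CAP_NET_BIND_SERVICE = 10            # bit index from linux/capability.h
VFS_CAP_REVISION_MASK = 0xFF000000   # upper byte of the magic word
VFS_CAP_FLAGS_EFFECTIVE = 0x000001   # "effective" flag in the magic word


def decode_file_caps(xattr: bytes) -> dict:
    """Decode the value of a security.capability xattr."""
    if len(xattr) < 12:
        raise ValueError("security.capability value is too short")
    magic, permitted, _inheritable = struct.unpack_from("<III", xattr, 0)
    if len(xattr) >= 20:
        # Revisions 2 and 3 carry a second pair holding the upper 32 bits.
        (permitted_hi,) = struct.unpack_from("<I", xattr, 12)
        permitted |= permitted_hi << 32
    return {
        "revision": magic & VFS_CAP_REVISION_MASK,
        "effective": bool(magic & VFS_CAP_FLAGS_EFFECTIVE),
        "permitted": permitted,
    }


def has_net_bind_service(xattr: bytes) -> bool:
    """True if CAP_NET_BIND_SERVICE is in the file's permitted set."""
    return bool((decode_file_caps(xattr)["permitted"] >> CAP_NET_BIND_SERVICE) & 1)


def read_file_caps(path: str) -> bytes:
    """Read the raw xattr (Linux only); raises OSError if none is set."""
    return os.getxattr(path, "security.capability")
```

Running `has_net_bind_service(read_file_caps(path))` inside the controller image, with `path` pointing at the nginx binary, should show whether the capability described above is present on that build.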

We are currently working on alternative instrumentation methods for the ingress controller, but unfortunately I cannot offer any estimates or official statements on when they will be available. What I can say is that we've made some good progress and we hope to be able to provide at least a workaround in the near future.

@jvnn thank you for the update

Hi, @jvnn, we face the same issue. Do you have any news regarding the workaround?

Thx

Hi @gregleb, the newest agent versions now include an experimental workaround that (after a few manual config changes) lets you monitor the ingress controller again. "Experimental" means in this case that we cannot guarantee that the workaround can be used in all deployments, but once the agent starts up and begins sending data again, there's nothing to worry about. The changes only concern instrumenting the server, and don't affect runtime behaviour.

For details and instructions, please contact our support, though. My k8s-fu is very limited, so I don't want to share any config snippets here. Dynatrace support has been informed about the workaround and they can guide you further.

Hi @jvnn, thanks for your update. I'll ask Dynatrace support.

Closing. Per the last comments, you should contact Dynatrace support.
