We use the gin tracer and produce over 20k health checks an hour (microservices). Is there any way to filter out the health checks so we don't go above our max?
Thanks!
Would the apm_config.ignore_resources setting in the trace agent do the trick?
@gbbr is it possible to set this using ENV vars for the docker agent? https://github.com/DataDog/docker-dd-agent
Not at the moment, but I think we could add it.
In the meantime, the only workaround I can think of is for you to make your own Dockerfile and copy over a custom config:
FROM datadog/docker-dd-agent:latest
COPY ./datadog.yaml /etc/datadog-agent/datadog.yaml
Then build it and run it.
Actually, apologies, on a second look it seems like this already exists as DD_IGNORE_RESOURCES. It takes a comma-separated list of values.
@gbbr thanks! I assume it doesn't take wildcards, does it?
It takes a comma-separated list of regular expressions, so you can be quite versatile with it.
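As a rough illustration of what "a comma-separated list of regular expressions" could mean in practice, here is a minimal Go sketch. It is not the agent's code: the compileFilters helper and the sample value are hypothetical, and the sketch assumes a plain split on commas with no trimming.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compileFilters splits a comma-separated value and compiles every non-empty
// entry as a Go regular expression. Hypothetical helper for illustration only.
func compileFilters(value string) ([]*regexp.Regexp, []error) {
	var filters []*regexp.Regexp
	var errs []error
	for _, entry := range strings.Split(value, ",") {
		if entry == "" {
			continue
		}
		re, err := regexp.Compile(entry)
		if err != nil {
			// Keep going so that one bad entry does not hide the others.
			errs = append(errs, fmt.Errorf("entry %q: %w", entry, err))
			continue
		}
		filters = append(filters, re)
	}
	return filters, errs
}

func main() {
	// Hypothetical value: two valid expressions and one that cannot compile.
	filters, errs := compileFilters(`GET /health,^/ping$,(unclosed`)
	fmt.Println("compiled:", len(filters))
	for _, err := range errs {
		fmt.Println("rejected:", err)
	}
}
```

One consequence of splitting naively like this is that an expression containing a comma of its own, such as a{1,3}, would be cut in two, so it may be safer to avoid commas inside individual expressions.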
Thanks!
And if my resource is github.com/github/service/web.(*Server).MakeServerRoutes.func1
Would that be interpreted literally? Also are there docs for this? I can just read those instead of bothering you :)
Make sure you don’t get any “invalid resource filter: ” errors in the log. It should only happen if the regular expression you give can’t be compiled. If it works, let’s close the issue.
There are no docs, you can just use one of the online regexp matchers to test that it matches. Also, it’s fine to ask for help here.
Thanks!
I will update you on how this goes. Just waiting for some traces to not come in :).
Is this what you had in mind? DD_IGNORE_RESOURCES=(health) I'm not seeing this filter out anything.
Logs look like this, but they always have (so either it's broken or this is normal):
Dec 28 11:48:40 public-betting-dd-agent-LogGroup-1M30F9DTGLAAN service/agent/97d30c0792d3: 2018-12-28 16:48:38,632 INFO exited: collector (terminated by SIGKILL; not expected)
Dec 28 11:48:45 public-betting-dd-agent-LogGroup-1M30F9DTGLAAN service/agent/97d30c0792d3: 2018-12-28 16:48:41,152 INFO spawned: 'collector' with pid 152
Dec 28 11:48:50 public-betting-dd-agent-LogGroup-1M30F9DTGLAAN service/agent/97d30c0792d3: 2018-12-28 16:48:45,287 INFO success: collector entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
Dec 28 11:49:50 public-betting-dd-agent-LogGroup-1M30F9DTGLAAN service/agent/97d30c0792d3: 2018-12-28 16:49:46,085 INFO exited: jmxfetch (exit status 0; expected)
Dec 28 11:50:15 public-betting-dd-agent-LogGroup-1M30F9DTGLAAN service/agent/97d30c0792d3: 2018-12-28 16:50:10,472 INFO exited: trace-agent (terminated by SIGKILL; not expected)
Dec 28 11:50:15 public-betting-dd-agent-LogGroup-1M30F9DTGLAAN service/agent/97d30c0792d3: 2018-12-28 16:50:11,496 INFO spawned: 'trace-agent' with pid 237
Dec 28 11:50:20 public-betting-dd-agent-LogGroup-1M30F9DTGLAAN service/agent/97d30c0792d3: 2018-12-28 16:50:17,702 INFO success: trace-agent entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
You haven't mentioned the resources you want to filter out, so it's hard for me to provide a regular expression. Either way, you should be able to use this online tool to try out your regular expression against a list of test cases. Choose the "golang" flavor on the left. That should do it.
Let me know if you're still having trouble and possibly give me an example of resources you want to match against.
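The same kind of check can also be run locally with a few lines of Go. This is only a sketch: the matching helper and the resource names are made up for illustration, and it assumes the filter matches anywhere in the resource name (unanchored), which is how regexp.MatchString behaves.

```go
package main

import (
	"fmt"
	"regexp"
)

// matching returns the resources in which pattern matches anywhere in the
// name. Hypothetical helper for trying out a filter offline.
func matching(pattern string, resources []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, r := range resources {
		if re.MatchString(r) {
			out = append(out, r)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical resource names; replace them with the ones shown in the UI.
	resources := []string{
		"GET /health",
		"GET /healthz",
		"POST /users",
		"web.(*Server).HealthCheck",
	}
	for _, pattern := range []string{"(health)", "(?i)health"} {
		matched, err := matching(pattern, resources)
		if err != nil {
			fmt.Println("invalid pattern:", err)
			continue
		}
		fmt.Printf("%-12s matches %q\n", pattern, matched)
	}
}
```

Go regular expressions are case-sensitive by default, so (health) does not match a name such as HealthCheck; prefixing the expression with (?i) makes it case-insensitive.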
So it seems that the actual env. var name is DD_IGNORE_RESOURCE. I'm so sorry about this. I'm actually working on normalizing these here: https://github.com/DataDog/datadog-trace-agent/pull/552. Please let me know if that fixes things for you.
They are actually documented here in our official docs.
Thanks! I will try it out and see what happens.
@gbbr in the PR it says DD_APM_IGNORE_RESOURCES should I use that?
That will only land in 6.9.0, in an effort to standardize the env. vars. Current releases only have DD_IGNORE_RESOURCE. Thanks for bearing with me.
Hmm. So it still doesn't seem like health checks are getting filtered out.
I have this in the env
DD_IGNORE_RESOURCE=(health)
DD_IGNORE_RESOURCES=(health)
Like I said in my previous comment, it's hard for me to help you out without knowing what resources you are trying to filter out. My recommendation is to reach out to support since they have a lot of experience with this and will get you sorted in no time.
Hi @gbbr,
I have a cluster agent with DD_APM_IGNORE_RESOURCES set to '["GET /ping"]'.
The issue I'm experiencing is that, after setting the env. var to the value above, not only is the ping request filtered out (which is what I want), but so are all the other resources (for example, POST api/v2/submissions/validate).
Here is the cluster agent env part:
- env:
  - name: DD_LOGS_ENABLED
    value: "true"
  - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
    value: "true"
  - name: DD_APM_ENABLED
    value: "true"
  - name: DD_API_KEY
    value: 123456789abcde
  - name: DD_COLLECT_KUBERNETES_EVENTS
    value: "true"
  - name: DD_LEADER_ELECTION
    value: "true"
  - name: KUBERNETES
    value: "true"
  - name: DD_PROCESS_AGENT_ENABLED
    value: "true"
  - name: DD_DOGSTATSD_NON_LOCAL_TRAFFIC
    value: "true"
  - name: DD_APM_ANALYZED_SPANS
    value: my-service|rack.request=1
  - name: DD_APM_IGNORE_RESOURCES
    value: '["GET /ping"]'
  - name: DD_KUBERNETES_KUBELET_HOST
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.hostIP
  image: datadog/agent:latest
Should I also get in touch with the support team, or is this the right place?
It's fine to do it here as well, but it might take a bit longer to get a reply than support. I'll try my best to investigate this soon.
@nic-lan generally it should all be well if it's a valid regular expression. If there is a problem parsing it you should see the following output in the logs at the error level: invalid resource filter: ...
Thanks. I checked the logs, but there is no "invalid resource filter" message in there.
I sent the flare to the support team.
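One possible explanation that fits both symptoms (no error in the logs, yet every resource filtered) can be checked in Go. It rests on an assumption that has not been confirmed in this thread: that the value '["GET /ping"]' ends up being compiled as a single regular expression rather than parsed as a JSON list. In that case the square brackets form a character class, which is a valid expression and matches any name containing one of the listed characters. The matchesResource helper below is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesResource reports whether pattern, compiled as a Go regular
// expression, matches anywhere in resource. Hypothetical helper.
func matchesResource(pattern, resource string) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(resource), nil
}

func main() {
	resources := []string{"GET /ping", "POST api/v2/submissions/validate"}

	// ASSUMPTION: the raw value is treated as one expression. The brackets
	// then define a character class containing ", G, E, T, space, /, p, i,
	// n and g, so a single such character anywhere is enough to match.
	for _, pattern := range []string{`["GET /ping"]`, `GET /ping`} {
		for _, r := range resources {
			ok, err := matchesResource(pattern, r)
			fmt.Printf("pattern %-16s resource %-36q match=%v err=%v\n", pattern, r, ok, err)
		}
	}
}
```

If this is what is happening, the plain expression GET /ping, without brackets or quotes, would match only the ping resource in this sketch.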
@nic-lan thanks. Feel free to mention me and they will reach out. Then I can help you via that channel as well.
@jontonsoup have you managed to resolve your issue? Can we close this?
@jontonsoup Just circling back. Did you still need any help with this issue?
I was in the process of configuring filtering for our health checks and noticed the same issue. All traces are being filtered out. This is what I'm using:
ignore_resources: ["GET.*/ping.php$"]
Since we allow setting this via env. vars now (DD_APM_IGNORE_RESOURCES and DD_IGNORE_RESOURCES), I am going to close this issue.
As for the other issues which are unrelated (cc @laurenty) please open a separate new issue (and feel free to ping me) or reach out to support. FWIW we've been using this functionality quite often and we know it's working so it'll just be a matter of figuring out what you're doing wrong.