Our goal is to get Datadog to automatically monitor the usual JMX metrics for the Java containers in our cluster. This is pretty much the standard JMX integration, moved to k8s. It shouldn't be all that difficult: just monitor a given port for JMX on any container that exposes it. But we can't get it working.
Reading https://docs.datadoghq.com/agent/autodiscovery/#template-source-kubernetes-pod-annotations, we understand that there are two ways to configure things:
(1) Using auto-conf. This would require all our containers to share one special image name. Since we run about 10 different applications, we can't use the same image for all of them, so this doesn't work.
(2) Using Kubernetes pod annotations. On the surface, this might work. The problem is that the JMX configuration is quite long. The Apache example is already a bit much to stuff into a JSON annotation value, and even a basic JMX configuration is much longer (see below). Even if it works, it would require us to duplicate the JMX configuration into an annotation on every k8s deployment we have, which is horrible.
Using annotations is reasonable, but it would be much better to use them to tag a container as needing a check, not to include the entire configuration.
Is it possible to use an annotation like this:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    ad.datadoghq.com/jmx.check_names: '["jmx"]'
    ad.datadoghq.com/jmx.init_configs: '[{}]'
in combination with an auto-conf file like the one below (basically the standard JMX configuration file), to check JMX on containers without having to stash a 100-line JSON document into an annotation value?
conf:
  - include:
      type: ThreadPool
      attribute:
        maxThreads:
          alias: tomcat.threads.max
          metric_type: gauge
        currentThreadCount:
          alias: tomcat.threads.count
          metric_type: gauge
        currentThreadsBusy:
          alias: tomcat.threads.busy
          metric_type: gauge
  - include:
      domain: java.lang
      type: MemoryPool
      attribute:
        Usage.used:
          alias: jvm.memory_pool.used
          metric_type: gauge
        Usage.max:
          alias: jvm.memory_pool.max
          metric_type: gauge
        Usage.init:
          alias: jvm.memory_pool.init
          metric_type: gauge
        Usage.committed:
          alias: jvm.memory_pool.committed
          metric_type: gauge
  - include:
      domain: java.lang
      type: GarbageCollector
      attribute:
        CollectionCount:
          alias: jvm.gc.count
          metric_type: gauge
        CollectionTime:
          alias: jvm.gc.time
          metric_type: gauge
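If the configuration does end up in annotations, one way to avoid hand-writing the long JSON value is to generate it. The following is only a sketch under assumptions, not something taken from the Datadog docs: the helper name `jmx_annotations`, the container name `myapp`, and port 7199 are made up for illustration, and it assumes the JMX check accepts the `conf` list inside `init_config`, which should be verified against the agent version in use. It does not remove the need for annotations on every deployment; it only keeps the configuration in one place.

```python
import json


def jmx_annotations(container_name, port, conf):
    """Build Autodiscovery pod annotations for a JMX check.

    Hypothetical helper: the annotation keys follow the
    ad.datadoghq.com/<container_name>.<field> pattern, and the JMX
    `conf` list is embedded in init_configs (an assumption to verify).
    """
    prefix = f"ad.datadoghq.com/{container_name}"
    return {
        f"{prefix}.check_names": json.dumps(["jmx"]),
        f"{prefix}.init_configs": json.dumps([{"conf": conf}]),
        f"{prefix}.instances": json.dumps(
            [{"host": "%%host%%", "port": str(port)}]
        ),
    }


# One entry of the JMX configuration above, expressed as Python data.
conf = [
    {
        "include": {
            "domain": "java.lang",
            "type": "GarbageCollector",
            "attribute": {
                "CollectionCount": {"alias": "jvm.gc.count", "metric_type": "gauge"},
                "CollectionTime": {"alias": "jvm.gc.time", "metric_type": "gauge"},
            },
        }
    }
]

annotations = jmx_annotations("myapp", 7199, conf)
for key, value in annotations.items():
    # Single quotes keep the JSON intact when pasted into a YAML manifest.
    print(f"{key}: '{value}'")
```

The printed lines can be pasted under `metadata.annotations` of a pod template, or the same function could feed a templating step so the configuration lives in one file.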
Did you succeed in monitoring JMX in k8s? I'm also trying to do this.
Hi, @guizmaii
No, unfortunately I didn't. I ended up tired and frustrated; it just doesn't work well. I should also note that Datadog's lack of a response on this issue was super frustrating.
The path we ended up taking was somewhat unusual. We instrumented all of our applications using prometheus, and then used datadog's prometheus integration to pull the metrics in. It was still a struggle to get datadog to read the k8s api, but once we had that going, this pretty much just worked.
This provided several other benefits too.
After running the datadog agent in-cluster, and then using the java prometheus client to expose metrics, you add annotations to your manifest like this:
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    metadata:
      annotations:
        ad.datadoghq.com/ptplace-bff.check_names: '["prometheus"]'
        ad.datadoghq.com/ptplace-bff.init_configs: '[{}]'
        ad.datadoghq.com/ptplace-bff.instances: '[ { "prometheus_url": "http://%%host%%:8080/metrics/", "namespace": "ptplace-bff", "metrics": [ "*" ] } ]'
I succeeded in instrumenting my JVMs via JMX in k8s.
Here's the config I have for my containers:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dashboard
  labels:
    app: dashboard
spec:
  serviceName: dashboard
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: dashboard
  template:
    metadata:
      annotations:
        # Annotations should have this format: `ad.datadoghq.com/<container_name>.check_names`
        ad.datadoghq.com/dashboard.check_names: '["jmx"]'
        ad.datadoghq.com/dashboard.init_configs: '[{}]'
        ad.datadoghq.com/dashboard.instances: '[{"jmx_url": "service:jmx:rmi://%%host%%:7199/"}]'
      labels:
        app: dashboard
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: dashboard # this is the `container_name` you should use in the annotations.
          ..............
          ports:
            - containerPort: 7199
              name: jmx-port # port names may only contain lowercase alphanumerics and '-'
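Since the annotation key has to carry the exact container name, a typo there means the check silently never gets scheduled. The sketch below is a hypothetical lint step, not part of Datadog or Kubernetes tooling: `unmatched_ad_annotations` is a made-up name, and it assumes the pod template has already been loaded into a Python dict.

```python
import re

# Matches keys such as ad.datadoghq.com/<container_name>.check_names
AD_KEY = re.compile(
    r"^ad\.datadoghq\.com/(?P<name>[^.]+)\.(check_names|init_configs|instances)$"
)


def unmatched_ad_annotations(pod_template):
    """Return Autodiscovery annotation keys whose container name does not
    match any container declared in the pod template (hypothetical helper)."""
    containers = {c["name"] for c in pod_template["spec"]["containers"]}
    annotations = pod_template.get("metadata", {}).get("annotations", {})
    missing = []
    for key in annotations:
        match = AD_KEY.match(key)
        if match and match.group("name") not in containers:
            missing.append(key)
    return missing


# Example pod template with one deliberately misspelled annotation key.
template = {
    "metadata": {
        "annotations": {
            "ad.datadoghq.com/dashboard.check_names": '["jmx"]',
            "ad.datadoghq.com/dashboard.init_configs": "[{}]",
            "ad.datadoghq.com/dashbord.instances": "[{}]",  # typo on purpose
        }
    },
    "spec": {"containers": [{"name": "dashboard"}]},
}

result = unmatched_ad_annotations(template)
print(result)
```

Here only the misspelled `dashbord` key is reported; an empty list means every Autodiscovery annotation points at a declared container.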
and here's the values.yml file I use to deploy the Datadog chart in k8s:
# Copied from here: https://github.com/kubernetes/charts/blob/master/stable/datadog/values.yaml
# Default values for datadog.
image:
  # This chart is compatible with different images, please choose one
  repository: datadog/agent # Agent6
  # repository: datadog/dogstatsd # Standalone DogStatsD6
  # repository: datadog/docker-dd-agent # Agent5
  tag: 6.2.1-jmx # Use 6.2.1-jmx to enable jmx fetch collection
  pullPolicy: IfNotPresent
# NB! Normally you need to keep Datadog DaemonSet enabled!
# The exceptional case could be a situation when you need to run
# single DataDog pod per every namespace, but you do not need to
# re-create a DaemonSet for every non-default namespace install.
# Note, that StatsD and DogStatsD work over UDP, so you may not
# get guaranteed delivery of the metrics in Datadog-per-namespace setup!
daemonset:
  enabled: true
  ## Bind ports on the hostNetwork. Useful for CNI networking where hostPort might
  ## not be supported. The ports will need to be available on all hosts. Can be
  ## used for custom metrics instead of a service endpoint.
  ## WARNING: Make sure that hosts using this are properly firewalled otherwise
  ## metrics and traces will be accepted from any host able to connect to this host.
  # useHostNetwork: true
  ## Sets the hostPort to the same value of the container port. Can be used as
  ## for sending custom metrics. The ports will need to be available on all
  ## hosts.
  ## WARNING: Make sure that hosts using this are properly firewalled otherwise
  ## metrics and traces will be accepted from any host able to connect to this host.
  useHostPort: true
  ## Annotations to add to the DaemonSet's Pods
  # podAnnotations:
  #   scheduler.alpha.kubernetes.io/tolerations: '[{"key": "example", "value": "foo"}]'
  ## Allow the DaemonSet to schedule on tainted nodes (requires Kubernetes >= 1.6)
  # tolerations: []
  ## Allow the DaemonSet to schedule on selected nodes
  # Ref: https://kubernetes.io/docs/user-guide/node-selection/
  # nodeSelector: {}
  ## Allow the DaemonSet to schedule using affinity rules
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  # affinity: {}
  ## Allow the DaemonSet to perform a rolling update on helm update
  ## ref: https://kubernetes.io/docs/tasks/manage-daemon/update-daemon-set/
  # updateStrategy: RollingUpdate
# Apart from DaemonSet, deploy Datadog agent pods and related service for
# applications that want to send custom metrics. Provides DogStatsD service.
#
# HINT: If you want to use datadog.collectEvents, keep deployment.replicas set to 1.
deployment:
  enabled: false
  replicas: 1
  # Affinity for pod assignment
  # Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  affinity: {}
  # Tolerations for pod assignment
  # Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations: []
## deploy the kube-state-metrics deployment
## ref: https://github.com/kubernetes/charts/tree/master/stable/kube-state-metrics
##
kubeStateMetrics:
  enabled: true
datadog:
  ## You'll need to set this to your Datadog API key before the agent will run.
  ## ref: https://app.datadoghq.com/account/settings#agent/kubernetes
  ##
  apiKey: 'TOCHANGE'
  ## dd-agent container name
  ##
  name: dd-agent
  ## Set logging verbosity.
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ## Note: For Agent6 (image `datadog/agent`) the valid log levels are
  ## trace, debug, info, warn, error, critical, and off
  ##
  logLevel: WARNING
  ## Un-comment this to make each node accept non-local statsd traffic.
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  # nonLocalTraffic: true
  ## Set host tags.
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  # tags:
  ## Enables event collection from the kubernetes API
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  collectEvents: true
  ## Un-comment this to enable APM and tracing, on ports 7777 and 8126
  ## ref: https://github.com/DataDog/docker-dd-agent#tracing-from-the-host
  ##
  apmEnabled: true
  ## The dd-agent supports many environment variables
  ## ref: https://github.com/DataDog/docker-dd-agent#environment-variables
  ##
  env:
    - name: DD_PROCESS_AGENT_ENABLED # https://docs.datadoghq.com/guides/process/
      value: "true"
    - name: DD_LOGS_ENABLED # https://app.datadoghq.com/logs/onboarding/container
      value: "true"
    - name: DD_LEADER_ELECTION # https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#kubernetes-integration
      value: "true"
    - name: DD_COLLECT_KUBERNETES_EVENTS # https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#event-collection
      value: "true"
    - name: SD_JMX_ENABLE # https://docs.datadoghq.com/agent/faq/docker-jmx/
      value: "true"
  ## The dd-agent supports detailed process and container monitoring and
  ## requires control over the volume and volumeMounts for the daemonset
  ## or deployment.
  ## ref: https://docs.datadoghq.com/guides/process/
  ##
  volumes:
    - hostPath:
        path: /etc/passwd
      name: passwd
  volumeMounts:
    - name: passwd
      mountPath: /etc/passwd
      readOnly: true
  ## Enable leader election mechanism for event collection
  ##
  leaderElection: true
  ## Set the lease time for leader election
  ##
  # leaderLeaseDuration: 600
  ## Provide additional service definitions
  ## Each key will become a file in /conf.d/auto_conf
  ## ref: https://github.com/DataDog/docker-dd-agent#configuration-files
  ##
  # autoconf:
  #   kubernetes_state.yaml: |-
  #     docker_images:
  #       - kube-state-metrics
  #     init_config:
  #     instances:
  #       - kube_state_url: http://%%host%%:%%port%%/metrics
  ## Provide additional service definitions
  ## Each key will become a file in /conf.d
  ## ref: https://github.com/DataDog/docker-dd-agent#configuration-files
  ##
  confd:
    # redisdb.yaml: |-
    #   init_config:
    #   instances:
    #     - host: "name"
    #       port: "6379"
    # https://app.datadoghq.com/logs/onboarding/container
    # https://github.com/DataDog/datadog-agent/tree/master/Dockerfiles/agent#configuration-file-example
    logs.yaml: |-
      init_config:
      instances:
        [{}]
      logs:
        - type: docker
          service: myapp
          source: myapp-logs
  ## Provide additional service checks
  ## Each key will become a file in /checks.d
  ## ref: https://github.com/DataDog/docker-dd-agent#configuration-files
  ##
  # checksd:
  #   service.py: |-
  ## datadog-agent resource requests and limits
  ## Make sure to keep requests and limits equal to keep the pods in the Guaranteed QoS class
  ## Ref: http://kubernetes.io/docs/user-guide/compute-resources/
  ##
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi
rbac:
  ## If true, create & use RBAC resources
  create: false
  ## Ignored if rbac.create is true
  serviceAccountName: default
tolerations: []
kube-state-metrics:
  rbac:
    create: false
  ## Ignored if rbac.create is true
  serviceAccountName: default
All credit goes to someone called C8n on the Datadog Slack.
If you have more questions, that Slack is useful!
@guizmaii thanks for posting a working solution! My ship has already sailed, but I'm going to close this issue because you've solved it.
On AWS ECS I was able to get this equivalent thing to work:
"dockerLabels": {
  "com.datadoghq.ad.check_names": "[\"jmx\"]",
  "com.datadoghq.ad.instances": "[ {\"host\": \"localhost\", \"port\":\"9000\"}]",
  "com.datadoghq.ad.init_configs": "[{}]"
}
but not this:
"dockerLabels": {
  "com.datadoghq.ad.check_names": "[\"jmx\"]",
  "com.datadoghq.ad.instances": "[ {\"host\": \"%%host%%\", \"port\":\"9000\"}]",
  "com.datadoghq.ad.init_configs": "[{}]"
}
nor variants with jmx_url.
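The escaped quotes in these label values are easy to get wrong by hand, so it can help to generate them. This is a minimal sketch, assuming the labels are assembled in Python before being written into the task definition; the helper name `jmx_docker_labels` is made up, and the host and port are simply the values from the working example above.

```python
import json


def jmx_docker_labels(host, port):
    """Build the Datadog Autodiscovery dockerLabels for a JMX check.

    Hypothetical helper: each label value is itself a JSON string,
    so json.dumps is applied once here and once more when the whole
    task definition fragment is serialized.
    """
    return {
        "com.datadoghq.ad.check_names": json.dumps(["jmx"]),
        "com.datadoghq.ad.instances": json.dumps(
            [{"host": host, "port": str(port)}]
        ),
        "com.datadoghq.ad.init_configs": json.dumps([{}]),
    }


labels = jmx_docker_labels("localhost", 9000)
task_fragment = {"dockerLabels": labels}

# Serializing the fragment produces the backslash-escaped quotes
# that the ECS task definition JSON expects.
rendered = json.dumps(task_fragment, indent=2)
print(rendered)
```

The double serialization is the point: the inner `json.dumps` produces the label value, and the outer one adds the escaping seen in the task definition.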
I'm really curious whether the configuration parameter SD_JMX_ENABLE is actually needed. Searching the code in the repo shows no traces of it other than some legacy tests. I almost believe that JMX is enabled by default via SD if you run the agent image with the -jmx suffix.
@JorgenG you're right, that's not needed anymore (it was for the legacy agent 5).