Output of the info page (if this is a bug)
root@ddog-cluster-agent-fdddbc496-f8zzx:/# datadog-cluster-agent status
Getting the status from the agent.
==============================
Datadog Cluster Agent (v1.0.0)
==============================
Status date: 2018-11-05 13:54:22.644670 UTC
Pid: 1
Check Runners: 4
Log Level: TRACE
Paths
=====
Config File: /etc/datadog-agent/datadog-cluster.yaml
conf.d: /etc/datadog-agent/conf.d
Clocks
======
System UTC time: 2018-11-05 13:54:22.644670 UTC
Hostnames
=========
ec2-hostname: ip-192-168-223-190.us-west-2.compute.internal
hostname: i-0c1580d88cbec55c0
instance-id: i-0c1580d88cbec55c0
socket-fqdn: ddog-cluster-agent-fdddbc496-f8zzx
socket-hostname: ddog-cluster-agent-fdddbc496-f8zzx
hostname provider: aws
unused hostname providers:
configuration/environment: hostname is empty
gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname
Leader Election
===============
Leader Election Status: Running
Leader Name is: ddog-cluster-agent-fdddbc496-f8zzx
Last Acquisition of the lease: Mon, 05 Nov 2018 12:11:35 UTC
Renewed leadership: Mon, 05 Nov 2018 13:54:18 UTC
Number of leader transitions: 3 transitions
Custom Metrics Server
=====================
ConfigMap name: default/datadog-custom-metrics
External Metrics
----------------
Total: 2
Valid: 2
=========
Collector
=========
Running Checks
==============
kubernetes_apiserver
--------------------
Instance ID: kubernetes_apiserver [OK]
Total Runs: 418
Metric Samples: 0, Total: 0
Events: 0, Total: 0
Service Checks: 3, Total: 1,233
Average Execution Time : 25ms
=========
Forwarder
=========
CheckRunsV1: 417
Dropped: 0
DroppedOnInput: 0
Events: 0
HostMetadata: 0
IntakeV1: 1
Metadata: 0
Requeued: 0
Retried: 0
RetryQueueSize: 0
Series: 0
ServiceChecks: 0
SketchSeries: 0
Success: 835
TimeseriesV1: 417
API Keys status
===============
API key ending with xxxxx on endpoint https://app.datadoghq.com: API Key valid
Describe what happened:
With the following HPA definition, the values the HPA receives differ from the values shown on the Datadog dashboard:
hpa.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: statsd-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: statsd-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metricName: demoInGo.request.duration.new5
      metricSelector:
        matchLabels:
          appname: statsd-demo
      targetValue: 500
  - type: External
    external:
      metricName: demoInGo.request.duration.new3
      metricSelector:
        matchLabels:
          appname: statsd-demo
      targetValue: 300
The metric values are generated by a small demo application that emits constant values. The values shown on the dashboard are:
demoInGo.request.duration.new5: 99.2
demoInGo.request.duration.new3: 56.3
And the values the HPA receives:
$ kubectl get hpa
NAME          REFERENCE                TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
statsd-demo   Deployment/statsd-demo   155/500, 155/300   1         10        1          49s
$ kubectl describe hpa statsd-demo
Name: statsd-demo
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 05 Nov 2018 17:41:36 +0530
Reference: Deployment/statsd-demo
Metrics: ( current / target )
"demoInGo.request.duration.new5" (target value): 155 / 500
"demoInGo.request.duration.new3" (target value): 155 / 300
Min replicas: 1
Max replicas: 10
Deployment pods: 1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    the last scale time was sufficiently old as to warrant a new scale
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric demoInGo.request.duration.new5(&LabelSelector{MatchLabels:map[string]string{appname: statsd-demo,},MatchExpressions:[],})
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events: <none>
Describe what you expected:
The values of both metrics should match, or be close to, the values shown on the Datadog dashboard.
Steps to reproduce the issue:
kubectl get
Additional environment details (Operating System, Cloud provider, etc.):
cluster-agent TRACE logs.
multiple-metrics-hpa.log
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.3", GitCommit:"a4529464e4629c21224b3d52edfe0ea91b072862", GitTreeState:"clean", BuildDate:"2018-09-09T18:02:47Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.3-eks", GitCommit:"58c199a59046dbf0a13a387d3491a39213be53df", GitTreeState:"clean", BuildDate:"2018-09-21T21:00:04Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Running platform version of EKS: eks.2
values.yaml used for helm install:
daemonset:
  useHostNetwork: true
  useHostPort: true
datadog:
  leaderElection: true
  env:
  - name: DD_USE_DOGSTATSD
    value: "true"
  - name: DD_DOGSTATSD_PORT
    value: "8125"
  - name: DD_DOGSTATSD_NON_LOCAL_TRAFFIC
    value: "true"
  apiKey: "********************************"
  appKey: "****************************************"
clusterAgent:
  enabled: true
  token: "*****************************************"
  metricsProvider:
    enabled: true
Hey @bhavin192,
Thank you for opening this!
It is not clear to me if the issue is in the Cluster Agent or in the calculation on the HorizontalPodAutoscaler controller side.
I was able to reproduce it, however.
As we can see from your log:
2018-11-05 12:12:36 UTC | TRACE | (provider.go:143 in GetExternalMetric) | External metrics returned: []external_metrics.ExternalMetricValue{external_metrics.ExternalMetricValue{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, MetricName:"demoInGo.request.duration.new3", MetricLabels:map[string]string{"appname":"statsd-demo"}, Timestamp:v1.Time{Time:time.Time{wall:0xbef02acd112708fc, ext:160590852923, loc:(*time.Location)(0x28f4c00)}}, WindowSeconds:(*int64)(nil),
Value:resource.Quantity{i:resource.int64Amount{value:56, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"", Format:"DecimalSI"}}, external_metrics.ExternalMetricValue{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, MetricName:"demoInGo.request.duration.new5", MetricLabels:map[string]string{"appname":"statsd-demo"}, Timestamp:v1.Time{Time:time.Time{wall:0xbef02acd11270f87, ext:160590854574, loc:(*time.Location)(0x28f4c00)}}, WindowSeconds:(*int64)(nil),
Value:resource.Quantity{i:resource.int64Amount{value:99, scale:0}, d:resource.infDecAmount{Dec:(*inf.Dec)(nil)}, s:"", Format:"DecimalSI"}}}
This indicates that the values are computed correctly.
From my investigation, the issue is that when Kubernetes requests the value of one metric, it receives both, and the autoscaler then sums the values: 56 + 99 = 155, which is exactly what the HPA reports for both targets.
That is what I'm investigating.
The ConfigMap appears to hold the expected values.
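To make the arithmetic concrete, here is a minimal Go sketch of the behavior described above. It assumes a hypothetical in-memory store and a hypothetical `buggyLookup` that matches external metrics on their label scope only and ignores the requested name; `hpaUsage` mirrors how the HPA controller adds up every value returned for an external metric query. None of these names come from the Cluster Agent or Kubernetes source.

```go
package main

import "fmt"

// externalMetric is a hypothetical, simplified stand-in for a stored
// external metric: its name, its label scope and its current value.
type externalMetric struct {
	Name   string
	Labels map[string]string
	Value  int64
}

// sameLabels reports whether two label sets are identical.
func sameLabels(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// buggyLookup (hypothetical) matches on labels only, so every metric that
// shares the selector is returned, whatever name was requested.
func buggyLookup(store []externalMetric, name string, selector map[string]string) []int64 {
	var out []int64
	for _, m := range store {
		if sameLabels(m.Labels, selector) { // BUG: m.Name is never compared to name
			out = append(out, m.Value)
		}
	}
	return out
}

// hpaUsage sums every returned value, as the HPA controller does for
// external metrics.
func hpaUsage(values []int64) int64 {
	var usage int64
	for _, v := range values {
		usage += v
	}
	return usage
}

func main() {
	scope := map[string]string{"appname": "statsd-demo"}
	store := []externalMetric{
		{"demoInGo.request.duration.new3", scope, 56},
		{"demoInGo.request.duration.new5", scope, 99},
	}
	for _, name := range []string{"demoInGo.request.duration.new5", "demoInGo.request.duration.new3"} {
		fmt.Printf("%s: %d\n", name, hpaUsage(buggyLookup(store, name, scope)))
	}
}
```

Both queries print 155 (56 + 99), the same figure `kubectl get hpa` shows for both targets.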
In the meantime, could you confirm by sharing:
kubectl describe cm datadog-custom-metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/demoInGo.request.duration.new3" | jq
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/default/demoInGo.request.duration.new5" | jq
I'll keep you posted on our findings.
@bhavin192 I found the issue. It's indeed a bug on our end.
This happens when the metrics have the same scope (identical label selectors).
Could you try changing the label selector of one of the metrics?
- type: External
  external:
    metricName: demoInGo.request.duration.new5
    metricSelector:
      matchLabels:
        appname: statsd-demo
    targetValue: 500
- type: External
  external:
    metricName: demoInGo.request.duration.new3
    metricSelector:
      matchLabels:
        otherKey: value
    targetValue: 300
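Changing the selector works around the problem because the two metrics no longer share a scope. On the agent side, one way to keep same-scope metrics apart, sketched here under the assumption that metrics are looked up by a string key, is to build that key from the metric name as well as the labels. `metricKey` is hypothetical; this is not the patch that shipped in cluster-agent 1.1.0.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// metricKey (hypothetical) builds a lookup key from the metric name AND its
// label scope, so two metrics with the same selector stay distinct.
func metricKey(name string, labels map[string]string) string {
	pairs := make([]string, 0, len(labels))
	for k, v := range labels {
		pairs = append(pairs, k+":"+v)
	}
	sort.Strings(pairs) // map iteration order is random; sort for a stable key
	return name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	scope := map[string]string{"appname": "statsd-demo"}
	store := map[string]int64{
		metricKey("demoInGo.request.duration.new3", scope): 56,
		metricKey("demoInGo.request.duration.new5", scope): 99,
	}
	// Each query now resolves to exactly one value.
	fmt.Println(store[metricKey("demoInGo.request.duration.new5", scope)]) // 99
	fmt.Println(store[metricKey("demoInGo.request.duration.new3", scope)]) // 56
}
```

Sorting the label pairs keeps the key deterministic even though Go randomizes map iteration order.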
I'm working on a fix now and will schedule a bug-fix release. Thank you very much for bringing this to our attention.
@CharlyF hey, setting a different scope makes it show the correct values.
Thanks for confirming - I'll work on the bugfix release ASAP.
This fix is in cluster-agent:1.1.0, which was released earlier this month.
I am going to close this issue as it is fixed, but feel free to reach out to us if you have questions or feedback!
Best,
.C