Datadog-agent: Missing values in datadog-agent on ECS Fargate platform_version 1.4.0 due to missing task metadata values

Created on 24 Apr 2020 · 9 Comments · Source: DataDog/datadog-agent

Output of the info page (if this is a bug)
N/A

Describe what happened:
While monitoring my task with the Datadog Agent, everything works fine with platform_version 1.3.0, but after upgrading to 1.4.0, some metrics derived from the task metadata endpoint disappear. Digging into the Datadog Agent code, it seems that the task metadata endpoint no longer returns these values:

online_cpus

You can see a sample comparing task metadata from 1.4.0 (left) with task metadata from 1.3.0 (right) here

So ecs.fargate.cpu.percent is no longer computed with platform_version 1.4.0 of Fargate.

Describe what you expected:
Get the ecs.fargate.cpu.percent value as expected. I opened a ticket on the ECS Roadmap project here

Steps to reproduce the issue:
Launch a task with platform_version 1.4.0

Additional environment details (Operating System, Cloud provider, etc):
AWS Fargate with platform_version 1.4.0 and 1.3.0
Datadog agent version 7.18.1

Most helpful comment

@DataDog/container-integrations team here. FYI, we are still investigating this issue but can confirm the findings that exhibit the 1.4.0 vs 1.3.0 difference:

Same container image reports these stats:

Fargate 1.4.0:

'cpu_stats': {'cpu_usage': {'total_usage': 399244944, 'percpu_usage': [367286436, 31958508, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'usage_in_kernelmode': 30000000, 'usage_in_usermode': 360000000}, 'throttling_data': {'periods': 0, 'throttled_periods': 0, 'throttled_time': 0}}

Fargate 1.3.0:

 'cpu_stats': {'cpu_usage': {'total_usage': 431866773, 'percpu_usage': [222238847, 209627926, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'usage_in_kernelmode': 30000000, 'usage_in_usermode': 340000000}, 'system_cpu_usage': 522320000000, 'online_cpus': 2, 'throttling_data': {'periods': 0, 'throttled_periods': 0, 'throttled_time': 0}}

We will update once we know more and if there's a workaround we can offer.

To clarify re: PR #5411 - it fixes a different issue, where the JSON format returned by the /v2/stats/{container_id} API changes across these versions.
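For anyone who wants to check their own tasks, here is a minimal sketch that diffs the key sets of the two cpu_stats payloads quoted above. The helper names (flatten_keys, missing_keys) are made up for illustration and are not part of the agent; the payloads are copied from this comment.

```python
def flatten_keys(d, prefix=""):
    """Return the set of dotted key paths found in a nested dict."""
    keys = set()
    for k, v in d.items():
        path = f"{prefix}.{k}" if prefix else k
        keys.add(path)
        if isinstance(v, dict):
            keys |= flatten_keys(v, path)
    return keys


def missing_keys(reference, candidate):
    """Key paths present in `reference` but absent from `candidate`."""
    return sorted(flatten_keys(reference) - flatten_keys(candidate))


# cpu_stats as reported on Fargate 1.3.0 (copied from the comment above)
stats_1_3_0 = {
    'cpu_usage': {'total_usage': 431866773,
                  'percpu_usage': [222238847, 209627926, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                  'usage_in_kernelmode': 30000000,
                  'usage_in_usermode': 340000000},
    'system_cpu_usage': 522320000000,
    'online_cpus': 2,
    'throttling_data': {'periods': 0, 'throttled_periods': 0, 'throttled_time': 0},
}

# cpu_stats as reported on Fargate 1.4.0 (copied from the comment above)
stats_1_4_0 = {
    'cpu_usage': {'total_usage': 399244944,
                  'percpu_usage': [367286436, 31958508, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                  'usage_in_kernelmode': 30000000,
                  'usage_in_usermode': 360000000},
    'throttling_data': {'periods': 0, 'throttled_periods': 0, 'throttled_time': 0},
}

# Fields that 1.3.0 returns but 1.4.0 does not.
print(missing_keys(stats_1_3_0, stats_1_4_0))
```

With the payloads above, the diff reports online_cpus and system_cpu_usage as the fields missing on 1.4.0.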

All 9 comments

I'm seeing the same thing. All I changed for my ECS Fargate service was the platform version, to 1.4.0, and I no longer see ecs.fargate.cpu.percent.

FYI I just upgraded to version 7.19.0 https://github.com/DataDog/datadog-agent/releases/tag/7.19.0 and I'm still seeing this issue.

+1. I just upgraded to Fargate version 1.4.0 (From 1.3.0) and I can confirm that the ecs.fargate.cpu.percent is missing. All other values seem to be working as expected. I'm using the datadog/agent:latest image.

I'm experiencing the same issue

Any updates on this issue? This is starting to become critical for us as we need to move to ECS Fargate 1.4.0.

This is caused by a regression in Fargate 1.4.0, as noted in the description: cpu_stats and precpu_stats are incomplete.
The ticket opened in the description (aws/containers-roadmap#855) was closed as fixed, but the fix was only partial. cpu_stats was fixed, while precpu_stats is still incomplete. cpu_stats reports accumulated values, so a usage rate needs the previous sample from precpu_stats, and datadog-agent doesn't work until precpu_stats is fixed too.

The ticket for precpu_stats of Fargate 1.4.0 is here: aws/containers-roadmap#1062
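To illustrate why both samples matter, here is a rough sketch of the CPU percentage formula given in the Docker Engine API documentation for container stats. This is an assumption about the general approach, not a copy of the datadog-agent code; the function name cpu_percent and the sample numbers are hypothetical.

```python
from typing import Optional


def cpu_percent(cpu_stats: dict, precpu_stats: dict) -> Optional[float]:
    """CPU usage in percent from two cumulative samples, or None if fields are missing.

    Follows the formula in the Docker Engine API docs:
    (cpu_delta / system_delta) * online_cpus * 100
    """
    try:
        cpu_delta = (cpu_stats["cpu_usage"]["total_usage"]
                     - precpu_stats["cpu_usage"]["total_usage"])
        system_delta = (cpu_stats["system_cpu_usage"]
                        - precpu_stats["system_cpu_usage"])
        online_cpus = cpu_stats["online_cpus"]
    except KeyError:
        # This is the situation described above: a sample lacks
        # system_cpu_usage or online_cpus, so no rate can be derived.
        return None
    if system_delta <= 0 or cpu_delta < 0:
        return None
    return cpu_delta / system_delta * online_cpus * 100.0


# Hypothetical samples, not taken from a real task.
previous = {"cpu_usage": {"total_usage": 100}, "system_cpu_usage": 1000, "online_cpus": 2}
current = {"cpu_usage": {"total_usage": 150}, "system_cpu_usage": 1200, "online_cpus": 2}

print(cpu_percent(current, previous))

# A previous sample without system_cpu_usage (incomplete precpu_stats) yields None.
incomplete_previous = {"cpu_usage": {"total_usage": 100}}
print(cpu_percent(current, incomplete_previous))
```

Even with a complete cpu_stats, the calculation has nothing to subtract from when precpu_stats is incomplete, which matches the partial fix described above.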

Received an update from Datadog support 2 weeks ago on this - we upgraded back to Fargate PV 1.4.0 with Datadog agent version 7.23.1 and ecs.fargate.cpu.percent has been present since then.
