Nomad: [0.7.0-rc3] prometheus telemetry keeps emitting data for stopped allocations

Created on 31 Oct 2017 · 5 Comments · Source: hashicorp/nomad

This is a Datadog chart using prometheus to show nomad_client_allocs_memory_rss grouped by alloc_id. All those flat lines are in fact allocations that are stopped and were replaced by a new one (during a new submission of a job).

I've verified those allocation IDs still exist in /v1/metrics?format=prometheus when the allocation is stopped/dead.
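For anyone wanting to repeat that check, here is a minimal sketch in Python, not something from the Nomad project. It parses the Prometheus text output for nomad_client_allocs_memory_rss, collects the alloc_id labels, and subtracts the allocations the agent reports as running. The agent address, the helper names, and the use of the ID and ClientStatus fields from /v1/allocations are assumptions for illustration.

```python
import json
import re
import urllib.request

# Matches one labelled sample line of the Prometheus text format, e.g.
#   nomad_client_allocs_memory_rss{alloc_id="abc",task="web"} 1234
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+\S+')
ALLOC_ID_RE = re.compile(r'alloc_id="([^"]*)"')


def alloc_ids_in_metrics(text, metric="nomad_client_allocs_memory_rss"):
    """Return the set of alloc_id label values seen for `metric`."""
    ids = set()
    for line in text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        sample = SAMPLE_RE.match(line)
        if sample and sample.group("name") == metric:
            found = ALLOC_ID_RE.search(sample.group("labels"))
            if found:
                ids.add(found.group(1))
    return ids


def stale_alloc_ids(metrics_text, running_ids):
    """Alloc IDs that still appear in the metrics but are not running."""
    return alloc_ids_in_metrics(metrics_text) - set(running_ids)


def fetch(url):
    """Fetch a URL and return the body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def check_agent(addr="http://127.0.0.1:4646"):
    """Hypothetical helper: `addr` is an assumed local agent address."""
    metrics = fetch(addr + "/v1/metrics?format=prometheus")
    allocs = json.loads(fetch(addr + "/v1/allocations"))
    # Assumes each entry carries "ID" and "ClientStatus" fields.
    running = {a["ID"] for a in allocs if a.get("ClientStatus") == "running"}
    return stale_alloc_ids(metrics, running)
```

Calling check_agent() against a client agent should return an empty set if only running allocations are being reported; any IDs it returns are allocations that still show up in the metrics after they stopped.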

Telemetry config:

disable_hostname: true,
publish_allocation_metrics: true
publish_node_metrics: true

[Image: Datadog chart of nomad_client_allocs_memory_rss grouped by alloc_id]

theme/metrics  type/bug

All 5 comments

Thanks for reporting this issue. A task shouldn't emit events when it is dead. We have added this to our near-term roadmap.

Personally I find it a pretty big bug. It will make alerting etc. on the telemetry seriously hard when stopped/dead allocations keep emitting usage as if they are running, making 0.7 telemetry useless for cases like that :( I hope the fix can make it into 0.7-final considering this.

This bug is still present in 0.7.0 stable.
We used nomad-exporter (https://github.com/Nomon/nomad-exporter) before and were super happy to see that prometheus metrics now come out of the box... sadly this bug makes them unusable.

Is this targeted for the next 0.7.* release?

@jippi Yep. PR is up
