Nomad: [feature] easier method to understand node driver health

Created on 9 Aug 2019 · 1 Comment · Source: hashicorp/nomad

As far as I know, the only way to programmatically check the status of a driver on a Nomad client is to process the response from the /v1/node/:node_id API endpoint. If a driver fails but the cluster has capacity to place the workload on another node, the driver failure can go unnoticed.
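For context, here is a minimal sketch of what that processing can look like today. It assumes the node response carries a `Drivers` map whose entries have `Detected`, `Healthy`, and `HealthDescription` fields; the function names and the choice to ignore undetected drivers are illustrative assumptions, not part of Nomad.

```python
import json
import urllib.request


def unhealthy_drivers(node: dict) -> dict:
    """Return {driver_name: health_description} for detected but unhealthy drivers.

    Assumption: drivers that were never detected on the node are skipped,
    since they are simply not installed rather than failing.
    """
    drivers = node.get("Drivers") or {}
    return {
        name: info.get("HealthDescription", "")
        for name, info in drivers.items()
        if info.get("Detected") and not info.get("Healthy")
    }


def fetch_node(base_url: str, node_id: str) -> dict:
    """GET /v1/node/:node_id from a Nomad agent and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/node/{node_id}") as resp:
        return json.load(resp)


# Example usage (hypothetical agent address and node ID):
#   node = fetch_node("http://127.0.0.1:4646", "<node-id>")
#   for name, reason in unhealthy_drivers(node).items():
#       print(f"driver {name} is unhealthy: {reason}")
```

Every node has to be polled this way and the result compared by hand, which is the overhead this issue is about.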

It would be helpful to have an easier way to monitor the health of a Nomad client node's drivers, which could in turn be integrated into an alerting system. One idea is to register the detected drivers in Consul as health checks under the Nomad client catalog entry. Each health check could be updated as the driver's health changes, allowing for easier operation and better observability of cluster issues.
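To make the Consul idea concrete, here is a rough sketch of the payloads such an integration could send, assuming TTL checks registered through the Consul agent API (`PUT /v1/agent/check/register`) and refreshed through `PUT /v1/agent/check/update/:check_id`. The check ID scheme, check names, and the `service_id` value are hypothetical; Nomad does not do this today.

```python
from typing import Optional


def driver_check_id(node_id: str, driver: str) -> str:
    """Hypothetical naming scheme: one Consul check per node and driver."""
    return f"nomad-driver-{node_id}-{driver}"


def check_registration(node_id: str, driver: str, ttl: str = "30s",
                       service_id: Optional[str] = None) -> dict:
    """Build the JSON body for PUT /v1/agent/check/register (TTL check)."""
    body = {
        "ID": driver_check_id(node_id, driver),
        "Name": f"Nomad driver: {driver}",
        "TTL": ttl,
    }
    if service_id:
        # Attach the check to an existing service, e.g. the Nomad client's entry.
        body["ServiceID"] = service_id
    return body


def check_update(driver_info: dict) -> dict:
    """Build the JSON body for PUT /v1/agent/check/update/:check_id."""
    status = "passing" if driver_info.get("Healthy") else "critical"
    return {"Status": status, "Output": driver_info.get("HealthDescription", "")}
```

With a TTL check, a client that stops reporting altogether would also turn the check critical once the TTL expires, which covers the case where the agent itself is unhealthy.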

cc @stevenscg

stage/needs-discussion theme/client theme/metrics

All comments

Good call. It would potentially be interesting to emit metrics based on driver/plugin health, too, for folks who run their alerting through metrics.
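As a rough illustration of the metrics idea, per-driver health could be emitted as a 0/1 gauge. The sketch below formats StatsD gauge lines (`name:value|g`); the metric name `nomad.client.driver.<driver>.healthy.<node>` is purely hypothetical and not a metric Nomad currently emits.

```python
def driver_health_gauges(node_name: str, drivers: dict,
                         prefix: str = "nomad.client.driver") -> list:
    """Format one StatsD gauge line per driver: 1 if healthy, 0 otherwise.

    The metric naming here is an assumption made for illustration only.
    """
    lines = []
    for name in sorted(drivers):
        value = 1 if drivers[name].get("Healthy") else 0
        lines.append(f"{prefix}.{name}.healthy.{node_name}:{value}|g")
    return lines
```

An alert could then fire whenever any of these gauges stays at 0 for longer than some threshold.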
