As far as I can tell, the only way to programmatically check the status of a driver on a Nomad client is to query the /v1/node/:node_id API endpoint and inspect the response. In situations where a driver fails but the cluster still has capacity to place the workload on another node, the driver failure could easily go unnoticed.
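For reference, here's a minimal sketch of that polling approach in Go. It assumes a Nomad 0.8+ agent on localhost and that the node payload exposes a `Drivers` map with `Detected`/`Healthy`/`HealthDescription` fields; verify the field names against your Nomad version.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
)

// driverInfo mirrors the driver health fields I believe the node
// payload exposes in Nomad 0.8+ (check your version's API docs).
type driverInfo struct {
	Detected          bool
	Healthy           bool
	HealthDescription string
}

type nodeResponse struct {
	Name    string
	Drivers map[string]driverInfo
}

func main() {
	nodeID := os.Args[1] // pass the client node's ID
	resp, err := http.Get("http://127.0.0.1:4646/v1/node/" + nodeID)
	if err != nil {
		log.Fatalf("querying node endpoint: %v", err)
	}
	defer resp.Body.Close()

	var node nodeResponse
	if err := json.NewDecoder(resp.Body).Decode(&node); err != nil {
		log.Fatalf("decoding node payload: %v", err)
	}

	// Flag any detected driver that reports itself unhealthy.
	for name, d := range node.Drivers {
		if d.Detected && !d.Healthy {
			fmt.Printf("node %s: driver %q unhealthy: %s\n",
				node.Name, name, d.HealthDescription)
		}
	}
}
```

This works, but it means every operator has to build and schedule their own poller per node, which is the gap this issue is about.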
It would be helpful if there were an easier way to monitor the health of a Nomad client node's drivers, which could in turn be integrated into an alerting system. One idea: register the detected drivers in Consul as health checks under the Nomad client's catalog entry. Each health check would be updated as the driver's health changes, allowing for easier operation and better observability of cluster issues.
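To make the proposal concrete, here's a rough sketch of what that Consul integration could look like using the official Consul Go client: one TTL check per detected driver, attached to the Nomad client's existing service registration. The service ID `nomad-client` and the check ID scheme are hypothetical, not anything Nomad does today.

```go
package main

import (
	"log"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	agent := client.Agent()

	// One TTL check per detected driver, attached to the Nomad
	// client's service registration (IDs here are hypothetical).
	checkID := "nomad-client-driver-docker"
	reg := &consul.AgentCheckRegistration{
		ID:        checkID,
		Name:      "Nomad driver: docker",
		ServiceID: "nomad-client", // assumes the client service is registered
		AgentServiceCheck: consul.AgentServiceCheck{
			TTL: "30s",
		},
	}
	if err := agent.CheckRegister(reg); err != nil {
		log.Fatalf("registering check: %v", err)
	}

	// Whenever driver fingerprinting observes a health change,
	// the client would push the new status to Consul.
	if err := agent.UpdateTTL(checkID, "docker driver healthy", consul.HealthPassing); err != nil {
		log.Fatalf("updating TTL: %v", err)
	}
}
```

With something like this in place, any tooling that already alerts on Consul check state would pick up driver failures for free.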
cc @stevenscg
Good call. It would potentially be interesting to emit metrics based on driver/plugin health for folks who run alerting through them, too.
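As a rough illustration of that metrics idea (not an existing Nomad metric), a 0/1 gauge per driver via go-metrics, which Nomad already uses internally, would let alerting pipelines watch for transitions to 0:

```go
package main

import (
	"log"
	"time"

	metrics "github.com/armon/go-metrics"
)

// emitDriverHealth publishes a 0/1 gauge per driver. The metric name
// "nomad.client.driver.healthy" is hypothetical.
func emitDriverHealth(driver string, healthy bool) {
	val := float32(0)
	if healthy {
		val = 1
	}
	metrics.SetGaugeWithLabels(
		[]string{"client", "driver", "healthy"},
		val,
		[]metrics.Label{{Name: "driver", Value: driver}},
	)
}

func main() {
	// In-memory sink just for the example; a real agent would wire
	// up its configured statsd/Prometheus sink instead.
	sink := metrics.NewInmemSink(10*time.Second, time.Minute)
	if _, err := metrics.NewGlobal(metrics.DefaultConfig("nomad"), sink); err != nil {
		log.Fatal(err)
	}

	emitDriverHealth("docker", false) // alert would fire on the 0 gauge
}
```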