What did you do?
hack/cluster-monitoring/deploy
What did you expect to see?
No error logs in prometheus logs and no http error from grafana accessing http://prometheus-k8s.monitoring.svc:9090
What did you see instead? Under which circumstances?
The Prometheus datasource in Grafana returns an HTTP error. On the Prometheus targets page I see 404s for the alertmanager-main and prometheus-k8s endpoints.
The logs for prometheus-k8s contain:
time="2017-05-04T17:37:07Z" level=error msg="Error sending alerts: bad response status 404 Not Found" alertmanager="http://10.112.1.7:9093/api/v1/alerts" count=1 source="notifier.go:370"
And for grafana:
t=2017-05-04T17:24:18+0000 lvl=info msg="Request Completed" logger=context userId=0 orgId=1 uname= method=GET path=/api/v1/series status=404 remote_addr="127.0.0.1, 10.128.0.2" time_ms=6ns size=19
t=2017-05-04T17:24:19+0000 lvl=info msg="Request Completed" logger=context userId=0 orgId=1 uname= method=GET path=/api/v1/query_range status=404 remote_addr="127.0.0.1, 10.128.0.2" time_ms=88ns size=19
Environment
GKE with kubernetes 1.6.2
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:33:11Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
There were errors until I was able to create the appropriate Roles; now the logs for the operator are clean.
I suspect this is again related to RBAC, but now that I have successfully created the Roles that I opened the other issue https://github.com/coreos/prometheus-operator/issues/335 about, I don't know where to start figuring out what is wrong or missing.
Well, this is a pain... The issue is the externalUrl setting in the Prometheus and Alertmanager configs. After changing the datasource URL to http://prometheus-k8s.monitoring.svc:9090/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web/ so that it matches the path of the externalUrl, https://127.0.0.1:8001/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web/, Grafana is able to read from the datasource.
I assume this is also why scraping the alertmanager-main and prometheus-k8s endpoints returns 404.
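For anyone wanting to reproduce the fix, the datasource definition Grafana ends up with is roughly the following (a JSON sketch of what I configured; the field names follow Grafana's datasource model, and the datasource name is an assumption from the default kube-prometheus setup):

```json
{
  "name": "prometheus",
  "type": "prometheus",
  "access": "proxy",
  "url": "http://prometheus-k8s.monitoring.svc:9090/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web/"
}
```

The key point is that the URL's path must match the path component of the externalUrl, since Prometheus only serves its API under that prefix once routePrefix is set.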
routePrefix! Sorry, I originally missed that routePrefix had also been added in the example in the docs https://coreos.com/operators/prometheus/docs/latest/user-guides/exposing-prometheus-and-alertmanager.html. Neither its purpose nor the fact that it was added is mentioned in the text.
Adding it seems to have fixed the issue.
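For reference, the relevant part of my Prometheus manifest now looks roughly like this (a sketch; the field names follow the Prometheus third-party resource from that docs page, and the namespace/service names are from my setup):

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: prometheus-k8s
  namespace: monitoring
spec:
  # Tell Prometheus the URL it is reachable at through the apiserver proxy...
  externalUrl: https://127.0.0.1:8001/api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web/
  # ...and make it actually serve its UI and API under that sub-path,
  # not just advertise it in generated links.
  routePrefix: /api/v1/proxy/namespaces/monitoring/services/prometheus-k8s:web/
```

Without routePrefix, Prometheus keeps serving at /, so anything that follows the externalUrl path (the Grafana datasource, alert links) gets a 404.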
In case anyone else tries doing the same thing and finds this: Alertmanager does not have a routePrefix option, so instead I updated the endpoints in prometheus-k8s-service-monitor-alertmanager.yaml to include
path: /api/v1/proxy/namespaces/monitoring/services/alertmanager-main:web/metrics
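i.e. the ServiceMonitor ends up looking something like this (a sketch; only the path under endpoints is the relevant change, the selector and labels are guesses based on the stock kube-prometheus manifests):

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: ServiceMonitor
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  selector:
    matchLabels:
      alertmanager: main
  endpoints:
  - port: web
    # Scrape metrics at the proxied path, since Alertmanager
    # cannot be told to serve under a route prefix.
    path: /api/v1/proxy/namespaces/monitoring/services/alertmanager-main:web/metrics
```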