Hi! I try to deploy prometheus cluster with 3 standalone prometheus + victoriametrics backed and promxy as frontend (https://github.com/jacksontj/promxy), it's great work with prometheus, but if I define backend victoriametrics (:8428) in promxy i get result for query "count(count(node_cpu_seconds_total) by (cpu))" and empty result for "node_cpu_guest_seconds_total" with answer "{"status":"success","data":{"resultType":"matrix","result":[]}}".
At the same time direct requests to the VictoriaMetrics give data. Manual requests and those that promxy generates are different judging by tcpdump.
And I strongly lack extended debag logs.
@rumanzo , thanks for the bug report.
In the mean time it would be great if you could provide tcpdump logs for requests generated by Promxy to VictoriaMetrics for the node_cpu_guest_seconds_total query and the corresponding response from VictoriaMetrics.
I tried to collect all possible useful info and attach archive
victoriametrics_debug.zip
Thanks for the additional info. BTW, which VictoriaMetrics version do you use? The similar issue with Promxy has been fixed in v1.13.1. The version may be determined by running ./victoria-metrics-prod --version. cc'ing @ThomasADavis , the original reporter of the issue #20.
VictoriaMetrics version victoria-metrics-20190531-150038-tags-v1.18.9-0-gf2cf5d8e
Promxy version v0.0.38
And I saw issue #20 and it is about another case
From the pcap:
POST /api/v1/query HTTP/1.1
Host: akostin-prometheus-1.openstacklocal:8428
User-Agent: Go-http-client/1.1
Content-Length: 169
Content-Type: application/x-www-form-urlencoded
query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09%3A01%3A14%2B03%3A00HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Thu, 06 Jun 2019 09:24:05 GMT
Content-Length: 63
{"status":"success","data":{"resultType":"matrix","result":[]}}
which means that this host responded with no data to the following query:
{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]&time=2019-06-06T09:01:14+03:00
Although the timestamp is a bit odd (usually its epoch looking) the query looks fine. So for whatever reason the datastore (I'm assuming VictoriaMetrics from this) is not returning data to that query. @rumanzo can you run that same query directly against VictoriaMetrics and see if you get the same result? (I assume you will, but once there is a reproducible query promxy is out of the debug loop-- which should speed things up a bit ;) ).
I made two requests - to prometheus and to VictoriaMetrics, and VictoriaMetrics returned values, while prometheus is not oO
$ curl -vvv -s 'http://akostin-prometheus-3.openstacklocal:9090/api/v1/query_range?' \
--data-urlencode 'time=2019-06-06T09:01:14+03:00' \
--data-urlencode 'query=node_cpu_guest_seconds_total{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]'
* Trying 172.27.42.170:9090...
* TCP_NODELAY set
* Connected to akostin-prometheus-3.openstacklocal (172.27.42.170) port 9090 (#0)
> POST /api/v1/query_range? HTTP/1.1
> Host: akostin-prometheus-3.openstacklocal:9090
> User-Agent: curl/7.65.0
> Accept: */*
> Content-Length: 197
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 197 out of 197 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Type: application/json
< Date: Thu, 06 Jun 2019 16:28:56 GMT
< Content-Length: 117
<
* Connection #0 to host akostin-prometheus-3.openstacklocal left intact
{"status":"error","errorType":"bad_data","error":"invalid parameter 'start': cannot parse \"\" to a valid timestamp"}%
$ curl -vvv -s 'http://akostin-prometheus-3.openstacklocal:8428/api/v1/query_range?' \
--data-urlencode 'time=2019-06-06T09:01:14+03:00' \
--data-urlencode 'query=node_cpu_guest_seconds_total{instance="akostin-prometheus-3.openstacklocal:9100",__name__="node_cpu_guest_seconds_total"}[1234s]'
* Trying 172.27.42.170:8428...
* TCP_NODELAY set
* Connected to akostin-prometheus-3.openstacklocal (172.27.42.170) port 8428 (#0)
> POST /api/v1/query_range? HTTP/1.1
> Host: akostin-prometheus-3.openstacklocal:8428
> User-Agent: curl/7.65.0
> Accept: */*
> Content-Length: 197
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 197 out of 197 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Access-Control-Allow-Origin: *
< Content-Type: application/json
< Date: Thu, 06 Jun 2019 16:29:10 GMT
< Content-Length: 914
<
* Connection #0 to host akostin-prometheus-3.openstacklocal left intact
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"0","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"nice"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"0","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"user"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"1","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"nice"},"values":[[1559838250,"0"],[1559838550,"0"]]},{"metric":{"__name__":"node_cpu_guest_seconds_total","cpu":"1","instance":"akostin-prometheus-3.openstacklocal:9100","job":"akostin-prometheus-servers","mode":"user"},"values":[[1559838250,"0"],[1559838550,"0"]]}]}}
I have not seen anything wrong lately - but we use promxy v0.0.38, with VM 1.18.8, and Prometheus 2.10.0, with Grafana v6.2.0
From the prometheus API docs:
Input timestamps may be provided either in RFC3339 format or as a Unix timestamp in seconds, with optional decimal places for sub-second precision. Output timestamps are always represented as Unix timestamps in seconds.
So it seems that VictoriaMetrics support more timestamp types than prom does :) Whats interesting is that when I send the timestamp as you have on my prom stack with promxy I do get data back. So interesting, but not the issue we are seeing here.
In your query here you are showing data back when you hit http://akostin-prometheus-3.openstacklocal but the TCPdump shows no data from http://akostin-prometheus-1.openstacklocal. Do your 3 hosts all have different data? Right now you have all 3 hosts configured in the same servergroup -- which means promxy assumes they all have the same data (more detail at https://github.com/jacksontj/promxy#what-is-a-servergroup). If these 3 nodes are actually separate you can either have separate servergroups for each or use the relabel_configs to generate different labels per host.
I get VM works, but Prometheus/Promxy does not, they both complain about the date/timestamp
I have identical configs (prometheus, promxy, victoriametrics) on 3 identical servers. Promxy.yml config attached earlier.
The important thing is that when you change the port in promxy configuration to 9090 (prometheus) , everything starts working.
Ah, then definitely sounds like an issue with VictoriaMetrics or the routing of metrics from prom to VictoriaMetrics. Promxy in this configuration will hit all 3 VM servers and take the first one that responds (assuming no "holes" in the data), so if (for example) the prometheus servers were only configured to write to 1 VM server promxy could return no data if one of the other 2 responded first. If the writes are being properly routed then it does sound like some issue VM side.
So it seems that VictoriaMetrics support _more_ timestamp types than prom does :)
VictoriaMetrics just silently replaces invalid time with the current time. This needs to be fixed, since it hides the real error - invalid time.
@rumanzo , note that Promxy converts the initial request to /api/v1/query_range to range request to /api/v1/query. Could you issue the following request to all the VM nodes and make sure they return identical non-empty results:
http://akostin-prometheus-1.openstacklocal:8428/api/v1/query?query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09:01:14Z
Adjust the time arg in the url if necessary.
Make the same with Prometheus nodes and compare results.
That's a bit scary, I fixed a similar bug on trickster. Imo if the time is
wrong it should just error :)
On Thu, Jun 6, 2019, 12:10 PM Aliaksandr Valialkin notifications@github.com
wrote:
So it seems that VictoriaMetrics support more timestamp types than prom
does :)VictoriaMetrics just silently replaces invalid time with the current time
https://github.com/VictoriaMetrics/VictoriaMetrics/blob/5cbd4cfca949a3c85354e2408e7ce1a7f9a683bc/app/vmselect/prometheus/prometheus.go#L479.
This needs to be fixed, since it hides the real error - invalid time.—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/52?email_source=notifications&email_token=AAMLPHOSH2TOM7F3KSD4W33PZFOJZA5CNFSM4HTU3LM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXD3S4Y#issuecomment-499628403,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAMLPHPLRNCFFITCQZGGY33PZFOJZANCNFSM4HTU3LMQ
.
That's a bit scary, I fixed a similar bug on trickster. Imo if the time is
wrong it should just error :)
Agreed and fixed it in the https://github.com/VictoriaMetrics/VictoriaMetrics/commit/667115a5c789cc2e62cbde4382490acb8dfa9fee :) Proper error handling for invalid time parsing will be available in the next release of VictoriaMetrics.
First I build VictoriaMeters from master and update all nodes to newer version (-20190606-203122-heads-master-0-gf4e63cd)
http://akostin-prometheus-1.openstacklocal:8428/api/v1/query?query=%7Binstance%3D%22akostin-prometheus-3.openstacklocal%3A9100%22%2C__name__%3D%22node_cpu_guest_seconds_total%22%7D%5B1234s%5D&time=2019-06-06T09:01:14Z
I checked on my nodes with this query - now all replies empty {"status":"success","data":{"resultType":"matrix","result":[]}}
And I try another query with another metrics. I get query from grafana query inspector, curl all endpoints, prometheus and victoriametrics gave me normal result, promxy gave me nothing. I found query to victoriametrics from promxy with tcpdump, and try curl with that modified query mannualy:
curl -v 'http://akostin-prometheus-3.openstacklocal:8428/api/v1/query?query=%7Binstance%3D~%22akostin-prometheus-1.openstacklocal%3A9100%22%2Cjob%3D~%22akostin-prometheus-servers%22%2C__name__%3D%22node_memory_Cached_bytes%22%7D%5B621s%5D&time=2019-06-06T23%3A56%3A20%2B03%3A00'
And result with prometheus (:9090) is normal, victoriametrics gave me empty result.
And notice that not all queries broken (standard node_exporter)

It looks like I found and fixed the root cause of the bug in v1.18.11. VictoriaMetrics improperly handled {__name__ op "string"} label filters. The bug was introduced in v1.18.9 while working on this feature request.
@rumanzo , could you confirm the issue is fixed in v1.18.11?
It looks like I found and fixed the root cause of the bug in v1.18.11. VictoriaMetrics improperly handled
{__name__ op "string"}label filters. The bug was introduced in v1.18.9 while working on this feature request.
@rumanzo , could you confirm the issue is fixed in v1.18.11?
Hi. Yes, it works right! Thank you!
Great! Closing the issue then.