Describe the bug
All of my Grafana dashboards break when upgrading to 1.34.5 or higher, because queries no longer return proper results when the selected time range is shorter than 12 hours.
I also tried removing the cache/ directory, without success.
To Reproduce
Expected behavior
I'd expect VictoriaMetrics to always deliver correct responses regardless of the selected time range (it did this correctly in 1.34.4).
Screenshots
Not working because selection is < 12 hours

Working because selection > 12 hours

Version
Correctly working Docker Image: victoriametrics/victoria-metrics:v1.34.4
Not working Docker Images: victoriametrics/victoria-metrics:v1.34.5 , victoriametrics/victoria-metrics:v1.34.6 , victoriametrics/victoria-metrics:v1.34.7
Used command-line flags
--storageDataPath=/data --httpListenAddr=:8428
@Bobonium, thanks for the detailed bug report! It may be related to the following optimization added in v1.34.5:
optimize per-day inverted index search for tag filters matching big number of time series
Unfortunately I cannot reproduce this issue locally, so more help is needed from your side in order to identify the root cause of the issue. Could you provide the following additional details when running v1.34.7?
- felix_active_local_endpoints without any additional label filters?
- felix_active_local_endpoints{some_label="foobar"}?

Additionally, it would be great if you could provide an archive of the -storageDataPath directory containing the minimal amount of data on which this issue reproduces. You can publish this archive on S3 or any other file-sharing service and send the access credentials to my email [email protected].
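To compare a short and a long time range side by side, something like the following sketch could be used. It assumes the Prometheus-compatible /api/v1/query_range endpoint and the :8428 listen address from the flags above; the helper names query_range_url and fetch_json are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request


def query_range_url(base_url, query, hours, step="60s", now=None):
    """Build a /api/v1/query_range URL covering the last `hours` hours (hypothetical helper)."""
    end = int(now if now is not None else time.time())
    start = end - int(hours * 3600)
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def fetch_json(url, timeout=30.0):
    """GET the URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, calling fetch_json(query_range_url("http://localhost:8428", "felix_active_local_endpoints", 6)) and the same call with 24 hours would show whether the short window returns fewer or emptier series than the long one.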
Well the good news is, I just solved the issue and it's working on 1.34.5, 1.34.6 and 1.34.7.
The bad news is I don't know why. I just upgraded again to get you the data you asked for, but it's working flawlessly today.
I tested it yesterday multiple times without success, so I've no idea what changed since then.
I also don't have a backup of the state of the data from yesterday so there's no way for me to reproduce it currently either. I should've made a backup right away, sorry for that.
As a side note, I also deleted the cache directory and the pod several times yesterday, so I'd expect that the problem is not within the cache but the data storage itself.
In case this ever happens again I'll prepare a reproducible backup for you immediately.
Feel free to either close the issue or keep it open for others to see, I'd simply open another one if it's closed and I'm able to reproduce it in the future.
Lastly, I'd just like to thank you for your quick and detailed response and for all the work you do on this project. I started using it roughly a year ago, and I'm really happy with it so far. My Kubernetes cluster generates ~25 GB of VictoriaMetrics data and everything's running smoothly. And I've never lost a single second of data by simply deploying a single VictoriaMetrics instance with persistence, together with a single Prometheus instance (scheduled on different nodes) without persistence.
@Bobonium, thanks for the warm response :)
I guess the issue was related to the response cache, which somehow cached incomplete results (probably due to writing historical data to VictoriaMetrics; see these docs for details on how to deal properly with historical data). You said that you deleted the cache dir, but VictoriaMetrics probably re-created this dir on graceful shutdown. Note that the cache dir must be removed while VictoriaMetrics is stopped. If you remove it while VictoriaMetrics is running, it will be re-created on shutdown.
Note also that VictoriaMetrics now supports resetting the response cache via the /internal/resetRollupResultCache handler, so there is no longer any need to delete the <-storageDataPath>/cache dir in order to clear cached data.
I'm going to close this issue. Feel free to reopen it or create a new one if you encounter the same issue again.
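A minimal sketch of calling that handler from a script, assuming a plain GET request and the :8428 listen address from the flags above; the function names build_reset_url and reset_rollup_result_cache are hypothetical.

```python
import urllib.request


def build_reset_url(base_url: str) -> str:
    """Return the URL of the rollup result cache reset handler."""
    return base_url.rstrip("/") + "/internal/resetRollupResultCache"


def reset_rollup_result_cache(base_url: str = "http://localhost:8428", timeout: float = 10.0) -> int:
    """Ask a running VictoriaMetrics instance to drop its cached query responses.

    Returns the HTTP status code; urlopen raises HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_reset_url(base_url), timeout=timeout) as resp:
        return resp.status
```

Unlike deleting the cache directory, this works while VictoriaMetrics is running, so there is no risk of the directory being re-created on shutdown.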