Describe the bug
Expected behavior
query_range will be more faster
Screenshots

Version
The line returned when passing --version command line flag to binary. For example:
./vmstorage-prod -version
vmstorage-20200623-210915-heads-cluster-0-g46c5c077
Used command-line flags
./vminsert-prod -influxSkipSingleField -insert.maxQueueDuration 10s -storageNode=10.15.38.29:8400 -storageNode=10.15.38.30:8400 -storageNode=10.15.59.47:8400 -storageNode=10.15.59.48:8400 -storageNode=10.15.59.49:8400 -storageNode=10.15.34.61:8400 -storageNode=10.15.38.19:8400 -storageNode=10.15.38.20:8400 -storageNode=10.15.53.33:8400 -storageNode=10.15.59.20:8400 -storageNode=10.15.36.41:8400
./vmselect-prod -search.maxQueryLen 32768 -storageNode=10.15.78.25:8401 -storageNode=10.15.77.18:8401 -storageNode=10.15.77.19:8401 -storageNode=10.15.77.20:8401 -storageNode=10.15.68.27:8401 -storageNode=10.15.78.44:8401 -storageNode=10.15.81.42:8401 -storageNode=10.15.81.46:8401 -storageNode=10.15.87.23:8401 -storageNode=10.15.49.19:8401 -storageNode=10.15.58.61:8401 -storageNode=10.15.38.29:8401 -storageNode=10.15.38.30:8401 -storageNode=10.15.59.47:8401 -storageNode=10.15.59.48:8401 -storageNode=10.15.59.49:8401 -storageNode=10.15.34.61:8401 -storageNode=10.15.38.19:8401 -storageNode=10.15.38.20:8401 -storageNode=10.15.53.33:8401 -storageNode=10.15.59.20:8401 -storageNode=10.15.36.41:8401
./vmstorage-prod -storageDataPath /data1/vmdata -retentionPeriod 6 -search.maxUniqueTimeseries 5000000
Additional context
vmstorage instances. total ingest rate ~ 2.6M points/sgrep '^2020-07.*error' app.loggrep -E '2020-07-01.*(error|warn)' vminsert.loggrep -E '2020-07-01.*(warn|error)' vmselect.log | grep -v 'slow query according' | grep -Ev 'VictoriaMetrics/app/vmselect/main.go:31[78]'update
i upgrade vmstorage/vminsert/vmselect to 0f754bea490b425be531704522fe4ceec7125948
but the problem still exist
i think it's relate to https://github.com/VictoriaMetrics/VictoriaMetrics/issues/430
update
by reduce the query_range request(almost to 0)
8 of 11 vmstorage nodes recovery, but other 3 nodes still have high cpu usage.
i choose 2 node, run pprof and put the result here.
pprof:
high_cpu.gz
recovery.gz
@n4mine , thanks for the detailed info!
This may be related to the issue a586b8b6d4f0ead2b60eb38711f59d0a120d82d7 , which has been fixed in v1.37.3 . The issue may be triggered by heavy queries. Try updating to the latest release and waiting for a day.
The profile shows that the most of time is spent on populating sets of timeseries ids that match the given label filters from incoming queries. It looks like these label filters match big number of time series (millions). Could you share the contents of /select/<accountID>/prometheus/api/v1/status/tsdb page for <accountID> with the biggest amounts of data? Feel free redacting sensitive labels before sharing the page contents.
Could you also explain why the number of -storageNode args passed to vmselect is 22, while the number of -storageNode args passed to vminsert is 11?
Note that the order of -storageNode args passed to vminsert instances must be the same in order to achieve the optimal data distribition and optimal performance for both data ingestion and data querying.
@valyala
This may be related to the issue a586b8b , which has been fixed in v1.37.3 . The issue may be triggered by heavy queries. Try updating to the latest release and waiting for a day.
i upgrade vmstorage/vminsert/vmselect to 0f754be 40 hours ago
but the problem still exist
the contents of /select/\
with the biggest amounts of data?
i use single accountID =0
{
"status": "success",
"data": {
"seriesCountByMetricName": [
{
"name": "by_tags.rpcdisf.latency",
"value": 10412944
},
{
"name": "by_tags.rpc_dirpc_call.latency",
"value": 7764388
},
{
"name": "cpu.irq.core",
"value": 7125778
},
{
"name": "cpu.iowait.core",
"value": 7124495
},
{
"name": "cpu.idle.core",
"value": 7121998
},
{
"name": "cpu.steal.core",
"value": 7120264
},
{
"name": "cpu.softirq.core",
"value": 7120208
},
{
"name": "cpu.sys.core",
"value": 7119071
},
{
"name": "cpu.user.core",
"value": 7118806
},
{
"name": "rpc.latency",
"value": 6644584
}
],
"labelValueCountByLabelName": [
{
"name": "local",
"value": 50332
},
{
"name": "key",
"value": 43511
},
{
"name": "callee",
"value": 32500
},
{
"name": "__name__",
"value": 31361
},
{
"name": "�",
"value": 31065
},
{
"name": "host",
"value": 16881
},
{
"name": "container",
"value": 13405
},
{
"name": "pod",
"value": 12647
},
{
"name": "topic",
"value": 8032
},
{
"name": "url",
"value": 7921
}
],
"seriesCountByLabelValuePair": [
{
"name": "__ns__=__sensitive__ns1",
"value": 41615557
},
{
"name": "protocolType=ALL",
"value": 41607369
},
{
"name": "host=appNetMonitor",
"value": 41607368
},
{
"name": "os=ALL",
"value": 41607368
},
{
"name": "tripCountry=BR",
"value": 38124039
},
{
"name": "sid=gulfstream",
"value": 30609327
},
{
"name": "__ns__=__sensitive__ns2",
"value": 27117483
},
{
"name": "__ns__=__sensitive__ns3",
"value": 19210597
},
{
"name": "caller-cluster=us01-v",
"value": 17705344
},
{
"name": "appType=ALL",
"value": 16604810
}
]
}
}
Could you also explain why the number of -storageNode args passed to vmselect is 22, while the number of -storageNode args passed to vminsert is 11?
cause i switch storage nodes for trying to solve https://github.com/VictoriaMetrics/VictoriaMetrics/issues/581
i have to provide all new and old vmstorage nodes to vmselect, so old data can be provide for obtain.
i think it's suitable for this situation? (switch vmstorage nodes temporary)
step1:
vmselect --> node1 ~ node11
vminsert --> node1 ~ node11
step2: (current state)
vmselect --> node1 ~ node11, node12 ~ node22
vminsert --> node12 ~ node22
step3: (after a few days)
vmselect --> node12 ~ node22
vminsert --> node12 ~ node22
spent on populating sets of timeseries ids that match the given label filters from incoming queries. It looks like these label filters match big number of time series (millions)
are the incoming queries include /insert request?
spent on populating sets of timeseries ids that match the given label filters from incoming queries. It looks like these label filters match big number of time series (millions)
are the incoming queries include /insert request?
answer my self, seem it's positive, now the /select traffic almost to 0.
but the cpu usage is still high.
and the pprof result prove my guess

"seriesCountByLabelValuePair": [
{
"name": "__ns__=__sensitive__ns1",
"value": 41615557
},
{
"name": "protocolType=ALL",
"value": 41607369
},
{
"name": "host=appNetMonitor",
"value": 41607368
},
{
"name": "os=ALL",
"value": 41607368
},
{
"name": "tripCountry=BR",
"value": 38124039
},
this part is same series but was inactive for a long time(30 days+).
i thinks it's shouldn't exist here.
so if /insert path use same label name/value --> TSID mapping as /tsdb/status result above, it will be performance issue?

seems it's slow when determine MetricID in DeletedMetricIDs
a few questions
__ns__=__sensitive__ns1, but it's not workthis part is same series but was inactive for a long time(30 days+). i thinks it's shouldn't exist here.
The /api/v1/status/tsdb returns time series seen during the current day. If doesn't filter out time series deleted on the current day. The /api/v1/status/tsdb accepts optional arg date=YYYY-MM-DD in order to see time series stats for the particular day - see https://victoriametrics.github.io/#troubleshooting for details.
- any solution to reduce the high label -> tsid cache effective immediately?
How much RAM is available per each vmstorage node? It looks like the MetricName -> TSID cache size isn't enough for storing the data for active time series. Could you try increasing RAM size for vmstorage nodes for a few hours and see whether this helps reducing CPU time?
- i delete the highest labelvaluepair series: __ns__=__sensitive__ns1, but it's not work
This will increase the size of dmis set, which contains ids for deleted time series. This may increase CPU usage at dmis.Has call mentioned in the comment above. Note that is itsn't recommended to use delete API on a frequent basis, since it has non-zero overhead - all the deleted time series ids trap into dmis set and stay there for at least -retentionPeriod months.
- do you suggest to split to multi tenant?
This should improve query performance on a per-tenant basis. But this will remove the ability to use data from multiple tenants in a single query. So let's try optimizing the current case.
How much RAM is available per each vmstorage node?
the nodes which low cpu usage, has 16 - 26GB memory available.
and nodes(3 nodes) which high cpu usage, has 38, 48 and 69GB memory available.
and total memory is 128GB.
Could you try increasing RAM size for vmstorage nodes for a few hours and see whether this helps reducing CPU time?
emmm, we use the physical machine, increasing RAM is not easy to operate
all the deleted time series ids trap into dmis set and stay there for at least -retentionPeriod months.
is there a way to clear that dmis sets?
This should improve query performance on a per-tenant basis. But this will remove the ability to use data from multiple tenants in a single query.
i think change tenant will lost all data. it's not a good choose.
So let's try optimizing the current case.
Great :)
the nodes which low cpu usage, has 16 - 26GB memory available.
and nodes(3 nodes) which high cpu usage, has 38, 48 and 69GB memory available.
It is recommend to have nodes with identical hardware configuration per particular service type in the cluster. Otherwise hardware resources may be left underutilized or overutilized.
and total memory is 128GB.
This should be enough for up to 100M active time series in ideal case when time series are evenly distributed across vmstorage nodes. The distribution may be uneven when certain vmstorage nodes are overloaded - in this case vminsert re-routes time series to less loaded vmstorage nodes. This reduces the total capacity of the cluster for active time series. If the metainfo for active time series doesn't fit available RAM, then VictoriaMetrics could start using more CPU and disk IO resources as outlined in troubleshooting docs. Could you check for the number of active time series in the cluster on the official grafana dashboard?
is there a way to clear that dmis sets?
unfortunately VictoriaMetrics doesn't provide mechanisms to delete dmis (ids for deleted time series) :( The dmis contents is stored in the inverted index (indexdb folder under -storageDataPath), so the easiest way to delete it is to delete inverted index. But this is equivalent to deleting all the stored data, since it is impossible to search for the required data without inverted index.
It is recommend to have nodes with identical hardware configuration per particular service type in the cluster. Otherwise hardware resources may be left underutilized or overutilized.
sorry, may be my describe is not clear
every single vmstorage node, has 128G memory
i have 11 vmstorage nodes
Could you check for the number of active time series in the cluster on the official grafana dashboard?
it's ~209M(total)
and these high cpu usage nodes, has the lowest active series
other infomation(btw, all vmstorage node start same time, so the stats of *_total should be consistent)
the vmstorage node which has name *001, *003, *007 is abnormal, and everything(cpu usage going up) begin at 07-01 12:00(+0800)










@valyala
according the suggestion of delete api scenarios
The delete API is intended mainly for the following cases:
- One-off deleting of accidentally written invalid (or undesired) time series.
- One-off deleting of user data due to GDPR.
it's reasonable to check dmis while add() in insert path?
for accidentally written invalid (or undesired) time series or user data due to GDPR
after delete operate finish, these data will not be queryable any more.
all the deleted time series ids trap into dmis set and stay there for at least -retentionPeriod months.
these deleted data exist until retentionPeriod reached
in this case, /delete has called.
so in insert path, to check those series ids which in dmis will be affect performance obviously when /delete impact huge number series(8M vm_deleted_metrics_total{type="indexdb"} in this case)
so, if there is a way to avoid check dmis in insert path? it will be improve performance.
i try to delete && !dmis.Has(r.TSID.MetricID) in #604 , but failed when test in TestStorageDeleteMetrics.
i build it locally and deploy to my environment, it seem worked, the cpu usage was gone down, but i don't known is a good solution or not.
While the idea of removing the check for deleted time series in fast path during data ingestion is OK, it is impossible to just remove the check, since this will lead to inability to insert data for the deleted time series in the future. For example, after the time series foo{bar="baz"} is deleted, Storage.add will continue adding incoming data points to the deleted time series, so they will remain invisible until the corresponding entry in MetricName->TSID cache for the deleted time series is evicted. This is unexpected behavior for users, since they won't see newly added data after deleting time series.
In theory there is another approach - to synchronously delete MetricName->TSID entries for the deleted time series on every delete API call. I doubt it is possible.
Note that the Storage.add function contains two calls to dmis.Has:
MetricName->TSID cache has no the corresponding entry - see https://github.com/VictoriaMetrics/VictoriaMetrics/blob/ff6a0955eb99e1fbb16b60e66153d19dbb8dd9f6/lib/storage/storage.go#L1133 .The attached CPU profile shows that the CPU time is spent on the slow path. This means that the size of MetricName->TSID cache isn't enough for holding information for active time time series. See slow inserts percentage graph on the official dashboard for VictoriaMetrics - if it shows values bigger than a few percents, then it is likely there is no enough memory for holding information for active time series in RAM and this will likely result in increased usage for CPU and disk IO. If your dashboard has no this graph, then try upgrading the dashboard from Grafana site.
There are the following solutions for this case:
vmstorage nodes, so the summary RAM size for them would be enough for holding information for active time series.vmstorage node. As I understood, this case doesn't fit you, since vmstorage runs on bare metal machines.vmstorage nodes. The most effective approach is to investigate /api/v1/status/tsdb output and to remove the source of time series with high cardinality labels.See slow inserts percentage graph on the official dashboard for VictoriaMetrics - if it shows values bigger than a few percents, then it is likely there is no enough memory for holding information for active time series in RAM and this will likely result in increased usage for CPU and disk IO.
the slow inserts percentage seems ok

but the follow image confuse me, these abnormal nodes have enough available memory, but the vm_cache_size_bytes{type="storage/tsid"} on these nodes is strange(bottom 3 lines), is there a tools to debug why the TSID cache is flapping?

@valyala
a new clues,
from 2020-07-01 04:00(utc), something changed, and then tsid cache size start to flapping, and the cpu usage of few vmstorage nodes rising up.
At the time, the vmstorage's log like
2020-07-01T04:00:00.002Z info VictoriaMetrics/lib/mergeset/table.go:167 opening table "/data1/vmdata/indexdb/161B62429BB1E5B2"...
2020-07-01T04:00:00.084Z info VictoriaMetrics/lib/mergeset/table.go:201 table "/data1/vmdata/indexdb/161B62429BB1E5B2" has been opened in 0.08
2 seconds; partsCount: 0; blocksCount: 0, itemsCount: 0; sizeBytes: 0
2020-07-01T04:00:00.090Z info VictoriaMetrics/lib/mergeset/table.go:217 waiting for raw items flusher to stop on "/data1/vmdata/indexdb/161B62
429BB1E5B0"...
2020-07-01T04:00:00.090Z info VictoriaMetrics/lib/mergeset/table.go:220 raw items flusher stopped in 0.000 seconds on "/data1/vmdata/indexdb/1
61B62429BB1E5B0"
2020-07-01T04:00:00.090Z info VictoriaMetrics/lib/mergeset/table.go:222 waiting for converters to stop on "/data1/vmdata/indexdb/161B62429BB1E
5B0"...
2020-07-01T04:00:00.090Z info VictoriaMetrics/lib/mergeset/table.go:225 converters stopped in 0.000 seconds on "/data1/vmdata/indexdb/161B6242
9BB1E5B0"
2020-07-01T04:00:00.090Z info VictoriaMetrics/lib/mergeset/table.go:227 waiting for part mergers to stop on "/data1/vmdata/indexdb/161B62429BB
1E5B0"...
2020-07-01T04:00:00.091Z info VictoriaMetrics/lib/mergeset/table.go:230 part mergers stopped in 0.000 seconds on "/data1/vmdata/indexdb/161B62
429BB1E5B0"
2020-07-01T04:00:00.091Z info VictoriaMetrics/lib/mergeset/table.go:232 flushing inmemory parts to files on "/data1/vmdata/indexdb/161B62429BB
1E5B0"...
2020-07-01T04:00:00.091Z info VictoriaMetrics/lib/mergeset/table.go:256 0 inmemory parts have been flushed to files in 0.000 seconds on "/data
1/vmdata/indexdb/161B62429BB1E5B0"
2020-07-01T04:00:00.466Z info VictoriaMetrics/lib/storage/index_db.go:370 dropping indexDB "/data1/vmdata/indexdb/161B62429BB1E5B0"
2020-07-01T04:00:00.574Z info VictoriaMetrics/lib/storage/index_db.go:372 indexDB "/data1/vmdata/indexdb/161B62429BB1E5B0" has been dropped
2020-07-01T04:00:00.581Z error VictoriaMetrics/lib/storage/index_db.go:2607 missing metricName by metricID 1592974931433885494; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time
2020-07-01T04:00:00.582Z error VictoriaMetrics/lib/storage/index_db.go:2607 missing metricName by metricID 1592974931435360975; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time
2020-07-01T04:00:00.582Z error VictoriaMetrics/lib/storage/index_db.go:2607 missing metricName by metricID 1592974931435362390; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time
2020-07-01T04:00:00.582Z error VictoriaMetrics/lib/storage/index_db.go:2607 missing metricName by metricID 1592974931435381426; this could be the case after unclean shutdown; deleting the metricID, so it could be re-created next time
VictoriaMetrics keeps indexdb instance (inverted index for matching labels to time series) that covers -retentionPeriod. It looks like the log captured indexdb rotation, which occurs every -retentionPeriod - the /data1/vmdata/indexdb/161B62429BB1E5B0 has been substituted by /data1/vmdata/indexdb/161B62429BB1E5B2. It is likely the missing metricName by metricID error is related to a bug with handling of non-empty dmis aka a set of deleted metric ids for each indexdb instance. How many missing metricName by metricID... log lines the log contains after the indexdb rotation?
How many missing metricName by metricID... log lines the log contains after the indexdb rotation?
min logs lines is 43, and max is 306 in different nodes
and the time distribution is from 07-01 04:00(utc) to 07-01 05:57(utc)
@n4mine , thanks for the idea about removing dmis.Has check inside Storage.add! It has been appeared it is quite easy to implement it by just resetting MetricName->TSID cache after the deletion of time series. This guarantees that the cache won't contain entries for the deleted time series. See the commit fe58462bef9f6c211a036fa1e4f9cf3ced4b9ad4 .
@valyala
thanks a lot.
BTW, i have split cluster horizontally from 1 to 4 yesterday by namespace in our business scenario.
i will continue to observe this case until next retentionPeriod reach.
thanks again :)
FYI, the commit that removes dmis.Has calls from data ingestion path has been included in v1.38.0 release.