I'm trying to see whether I can use the VictoriaMetrics deduplication (per the deduplication description) when my jobs inject duplicate data. Perhaps I misinterpreted the intended use, but I tried injecting the same data point twice in Influx line protocol format via curl:
(venv) my_vm@ubuntu:~$ curl -G 'http://localhost:8428/api/v1/export' -d 'match={__name__=~"hi.*"}'
(venv) my_vm@ubuntu:~$ curl -d 'hi,tag1=value1 value=15 1581919391144000000' -X POST 'http://localhost:8428/write'
(venv) my_vm@ubuntu:~$ curl -G 'http://localhost:8428/api/v1/export' -d 'match={__name__=~"hi.*"}'
{"metric":{"__name__":"hi_value","tag1":"value1"},"values":[15],"timestamps":[1581919391144]}
(venv) my_vm@ubuntu:~$ curl -d 'hi,tag1=value1 value=15 1581919391144000000' -X POST 'http://localhost:8428/write'
(venv) my_vm@ubuntu:~$ curl -G 'http://localhost:8428/api/v1/export' -d 'match={__name__=~"hi.*"}'
{"metric":{"__name__":"hi_value","tag1":"value1"},"values":[15,15],"timestamps":[1581919391144,1581919391144]}
(venv) my_vm@ubuntu:~$
It looks like it didn't work. However, when I try to start my VictoriaMetrics Docker container with the -dedup.minScrapeInterval argument, I get the following error saying that the argument is not supported:
(venv) my_vm@ubuntu:~$ docker container run -d -p 8428:8428 -v victoria_metrics:/victoria-metrics-data valyala/victoria-metrics:latest -retentionPeriod=24 -dedup.minScrapeInterval=1s
7b41cf84c9250f4f657910e5941dfa91afe40f35f38aa4fab43b926f7cdc7207
(venv) my_vm@ubuntu:~$ docker container ls --all
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7b41cf84c925 valyala/victoria-metrics:latest "/victoria-metrics-p…" 5 seconds ago Exited (2) 4 seconds ago vigorous_brahmagupta
(venv) my_vm@ubuntu:~$ docker container
attach commit cp create diff exec export inspect kill logs ls pause port prune rename restart rm run start stats stop top unpause update wait
(venv) my_vm@ubuntu:~$ docker container logs vigorous_brahmagupta
flag provided but not defined: -dedup.minScrapeInterval
victoria-metrics-20191202-131048-tags-v1.30.2-0-g62a915f2
Usage of /victoria-metrics-prod:
-bigMergeConcurrency int
The maximum number of CPU cores to use for big merges. Default value is used if set to 0
-deleteAuthKey string
authKey for metrics' deletion via /api/v1/admin/tsdb/delete_series
-enableTCP6
Whether to enable IPv6 for listening and dialing. By default only IPv4 TCP is used
-graphiteListenAddr string
TCP and UDP address to listen for Graphite plaintext data. Usually :2003 must be set. Doesn't work if empty
-http.disableResponseCompression
Disable compression of HTTP responses for saving CPU resources. By default compression is enabled to save network bandwidth
-httpAuth.password string
Password for HTTP Basic Auth. The authentication is disabled if -httpAuth.username is empty
-httpAuth.username string
Username for HTTP Basic Auth. The authentication is disabled if empty. See also -httpAuth.password
-httpListenAddr string
TCP address to listen for http connections (default ":8428")
-influxMeasurementFieldSeparator {measurement}{separator}{field_name}
Separator for {measurement}{separator}{field_name} metric name when inserted via Influx line protocol (default "_")
-influxSkipSingleField {measurement}
Uses {measurement} instead of `{measurement}{separator}{field_name}` for metic name if Influx line contains only a single field
-loggerLevel string
Minimum level of errors to log. Possible values: INFO, ERROR, FATAL, PANIC (default "INFO")
-maxConcurrentInserts int
The maximum number of concurrent inserts (default 4)
-maxInsertRequestSize int
The maximum size of a single insert request in bytes (default 33554432)
-maxLabelsPerTimeseries int
The maximum number of labels accepted per time series. Superflouos labels are dropped (default 30)
-memory.allowedPercent float
Allowed percent of system memory VictoriaMetrics caches may occupy (default 60)
-metricsAuthKey string
Auth key for /metrics. It overrides httpAuth settings
-opentsdbHTTPListenAddr string
TCP address to listen for OpentTSDB HTTP put requests. Usually :4242 must be set. Doesn't work if empty
-opentsdbListenAddr string
TCP and UDP address to listen for OpentTSDB put messages. Usually :4242 must be set. Doesn't work if empty
-pprofAuthKey string
Auth key for /debug/pprof. It overrides httpAuth settings
-precisionBits int
The number of precision bits to store per each value. Lower precision bits improves data compression at the cost of precision loss (default 64)
-retentionPeriod int
Retention period in months (default 1)
-search.disableCache
Whether to disable response caching. This may be useful during data backfilling
-search.latencyOffset duration
The time when data points become visible in query results after the colection. Too small value can result in incomplete last points for query results (default 30s)
-search.logSlowQueryDuration duration
Log queries with execution time exceeding this value. Zero disables slow query logging (default 5s)
-search.maxConcurrentRequests int
The maximum number of concurrent search requests. It shouldn't exceed 2*vCPUs for better performance. See also -search.maxQueueDuration (default 2)
-search.maxLookback -search.lookback-delta
Synonim to -search.lookback-delta from Prometheus. The value is dynamically detected from interval between time series datapoints if not set. It can be overridden on per-query basis via `max_lookback` arg
-search.maxPointsPerTimeseries int
The maximum points per a single timeseries returned from the search (default 30000)
-search.maxQueryDuration duration
The maximum time for search query execution (default 30s)
-search.maxQueryLen int
The maximum search query length in bytes (default 16384)
-search.maxQueueDuration duration
The maximum time the request waits for execution when -search.maxConcurrentRequests limit is reached (default 10s)
-search.maxTagKeys int
The maximum number of tag keys returned per search (default 100000)
-search.maxTagValues int
The maximum number of tag values returned per search (default 100000)
-search.maxUniqueTimeseries int
The maximum number of unique time series each search can scan (default 300000)
-smallMergeConcurrency int
The maximum number of CPU cores to use for small merges. Default value is used if set to 0
-snapshotAuthKey string
authKey, which must be passed in query string to /snapshot* pages
-storageDataPath string
Path to storage data (default "victoria-metrics-data")
-tls -tlsCertFile
Whether to enable TLS (aka HTTPS) for incoming requests. -tlsCertFile and `-tlsKeyFile` must be set if `-tls` is set
-tlsCertFile -tls
Path to file with TLS certificate. Used only if -tls is set. Prefer ECDSA certs instead of RSA certs, since RSA certs are slow
-tlsKeyFile -tls
Path to file with TLS key. Used only if -tls is set
-version
Show VictoriaMetrics version
Did I misinterpret the "deduplication" capability, or am I using it incorrectly when I instantiate the Docker container? Does VictoriaMetrics deduplicate the data from Prometheus export/remote_write?
Hi @ozn0417, it looks like you are using the wrong image, valyala/victoria-metrics:latest (this path is obsolete). You can find the right one here: https://hub.docker.com/r/victoriametrics/victoria-metrics
De-duplication was added in v1.31.0, but it looks like you are running v1.30.2: see the victoria-metrics-20191202-131048-tags-v1.30.2-0-g62a915f2 line in the log output.
The de-duplication is applied to all the ingested data via all the supported ingestion protocols.
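For anyone hitting the same "flag provided but not defined" error, here is a rough sketch of checking a version banner against the v1.31.0 threshold mentioned above. The function names are hypothetical, the banner format is assumed to match the one in the log output in this thread, and `sort -V` is assumed to be available (e.g. GNU coreutils):

```shell
#!/bin/sh
# Sketch only: decide from a version banner whether -dedup.minScrapeInterval
# should be available. The threshold comes from the reply above; the helper
# names are made up for illustration.
MIN_DEDUP_VERSION='1.31.0'

# Extract "X.Y.Z" from a banner such as
# victoria-metrics-20191202-131048-tags-v1.30.2-0-g62a915f2
banner_version() {
    printf '%s\n' "$1" | sed -n 's/.*-tags-v\([0-9][0-9.]*\).*/\1/p'
}

# Succeed when version $1 >= version $2 (relies on sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed when the banner reports a version with de-duplication support.
supports_dedup() {
    version_ge "$(banner_version "$1")" "$MIN_DEDUP_VERSION"
}
```

Calling `supports_dedup` with the banner line from the container logs in this thread returns a non-zero status, which matches the error seen with the old image.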
As @tenmozes already said, http://hub.docker.com/r/valyala/victoria-metrics was deprecated a long time ago, so it is recommended to switch to https://hub.docker.com/r/victoriametrics/victoria-metrics and use the latest image from there.
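As a minimal sketch, the original command with only the image path swapped would look roughly like the following. The port, volume name, and flags are copied from the question; the `run_vm` wrapper and the `DRY_RUN` switch are hypothetical conveniences so the command can be reviewed before it is executed:

```shell
#!/bin/sh
# Sketch only: start VictoriaMetrics from the current Docker Hub repository
# with de-duplication enabled. Flags and volume name are taken from the
# question above; run_vm and DRY_RUN are illustrative, not part of any tool.
IMAGE='victoriametrics/victoria-metrics:latest'

run_vm() {
    # With DRY_RUN set to a non-empty value the command is printed, not run.
    ${DRY_RUN:+echo} docker container run -d -p 8428:8428 \
        -v victoria_metrics:/victoria-metrics-data \
        "$IMAGE" -retentionPeriod=24 -dedup.minScrapeInterval=1s
}
```

Running `run_vm` starts the container, while `DRY_RUN=1 run_vm` only prints the command. Repeating the two identical curl writes from the question and exporting again is a simple way to check whether de-duplication is active.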
This worked. Thanks to both of you.
I didn't realize the valyala/vict... was deprecated. I'll use the suggested image.