__System info:__
version 1.2.0, branch master, commit b7bb7e8359642b6e071735b50ae41f5eb343fd42
Docker image: influxdb:1.2.0-alpine with id 3a859cac1ae5
Docker is running inside Docker for Mac 1.13.1 (15353), Build 94675c5a76, on a host running macOS 10.12.3
__Steps to reproduce:__
docker run --name influxdb -d -p 8086:8086 -v $PWD:/var/lib/influxdb influxdb:1.2.0-alpine
Write 1,000,000 points and then query the count using the influxdb-java client.
__Expected behavior:__
Query returns a count of 1,000,000.
__Actual behavior:__
The query sometimes returns 1,000,000, but it often returns a slightly lower count (10-100 fewer points on average).
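For context, the write/query flow looks roughly like the sketch below with the influxdb-java client. This is a minimal, single-threaded illustration, not the actual reproduction app (linked further down, which writes concurrently via Flink); the batch size, field values, and connection credentials are assumptions, while the database, measurement, and field names come from the logs in this report.

import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.BatchPoints;
import org.influxdb.dto.Point;
import org.influxdb.dto.Query;
import org.influxdb.dto.QueryResult;

public class WriteThenCountSketch {
    public static void main(String[] args) {
        // Connect to the container published on localhost:8086 (credentials assumed).
        InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "root", "root");
        influxDB.createDatabase("test");

        // Write 1,000,000 points with explicit, monotonically increasing
        // millisecond timestamps (0..999,999), batched 10,000 at a time.
        final int batchSize = 10_000;
        for (long start = 0; start < 1_000_000L; start += batchSize) {
            BatchPoints batch = BatchPoints.database("test").retentionPolicy("autogen").build();
            for (long t = start; t < start + batchSize; t++) {
                batch.point(Point.measurement("jordan|app1|0|1")
                        .time(t, TimeUnit.MILLISECONDS)
                        .addField("value", (double) t)
                        .build());
            }
            influxDB.write(batch);
        }

        // Ask how many points were persisted; the expected count is 1,000,000.
        QueryResult result = influxDB.query(new Query(
                "select count(\"value\") from \"jordan|app1|0|1\" where time >= 0", "test"));
        System.out.println(result);
    }
}

When the bug hits, the printed QueryResult reports a count slightly below 1,000,000, as in the client logs below.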
__Additional info:__
I've observed the first query return a count of 1,000,000 and a subsequent query, run after compaction messages appeared in the logs, return a lower count.
Here are the logs from a bad run where the first query returned the correct count and a query after compaction returned a lower count.
Client logs:
Generated and wrote 1000000 points.
Querying for how many were persisted:
QueryResult [results=[Result [series=[Series [name=jordan|app1|0|1, tags=null, columns=[time, count], values=[[0.0, 1000000.0]]]], error=null]], error=null]
Querying again after waiting a bit:
QueryResult [results=[Result [series=[Series [name=jordan|app1|0|1, tags=null, columns=[time, count], values=[[0.0, 999947.0]]]], error=null]], error=null]
InfluxDB logs:
https://gist.github.com/jganoff/db12b62ece9f7dcb9b2e90ff5a73b4bc#file-influxdb-log
Here's the requested debug information: https://gist.github.com/jganoff/db12b62ece9f7dcb9b2e90ff5a73b4bc
It looks like the missing points are from somewhere near the middle this time, though they can be scattered around, so don't over-index on their locality here:
> select count("value") from "jordan|app1|0|1" where time >= 0 group by time(1m) fill(none)
name: jordan|app1|0|1
time count
---- -----
0 60000
60000000000 60000
120000000000 60000
180000000000 60000
240000000000 60000
300000000000 60000
360000000000 59970
420000000000 59977
480000000000 60000
540000000000 60000
600000000000 60000
660000000000 60000
720000000000 60000
780000000000 60000
840000000000 60000
900000000000 60000
960000000000 40000
At higher resolution:
> select count("value") from "jordan|app1|0|1" where time >= 0 group by time(5s) fill(none)
time count
---- -----
...
385000000000 5000
390000000000 5000
395000000000 4994
400000000000 4994
405000000000 4994
410000000000 4994
415000000000 4994
420000000000 4997
425000000000 4996
430000000000 4996
435000000000 4996
440000000000 4996
445000000000 4996
450000000000 5000
455000000000 5000
460000000000 5000
...
Would you be able to attach the script you use to repro this? Also, can you run influx_inspect dumptsm XXXX.tsm on each of the tsm files in the /var/lib/influxdb/data/test/autogen/2 dir and attach the output?
Here's the tsm dump:
/var/lib/influxdb/data/test/autogen/2 # influx_inspect dumptsm 000000003-000000002.tsm
Summary:
File: 000000003-000000002.tsm
Time Range: 1970-01-01T00:00:00Z - 1970-01-01T00:16:39.999Z
Duration: 16m39.999s Series: 5 File Size: 825355
Statistics
Blocks:
Total: 5005 Size: 684894 Min: 26 Max: 251 Avg: 136
Index:
Total: 5005 Size: 140448
Points:
Total: 4999735
Encoding:
Timestamp: none: 0 (0%) s8b: 85 (1%) rle: 4920 (98%)
Float: none: 0 (0%) gor: 4004 (80%)
Int: none: 0 (0%) s8b: 0 (0%) rle: 1001 (20%)
Compression:
Per block: 0.14 bytes/point
Total: 0.17 bytes/point
Is there only 1 tsm file? The stats indicate there should be two.
Only 1 for test.autogen:
/var/lib/influxdb/data # find .
.
./_internal
./_internal/monitor
./_internal/monitor/1
./_internal/monitor/1/000000001-000000001.tsm
./test
./test/autogen
./test/autogen/2
./test/autogen/2/000000003-000000002.tsm
The points might still be in the wal. Try running influxd with INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s set in its environment. If they are in the wal, that should cause them to get snapshotted to a new TSM file after 1s.
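For the Docker setup from the original report, one way to do that (assuming the variable just needs to be present in the container's environment) is to add it to the run command:
docker run --name influxdb -d -p 8086:8086 -e INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s -v $PWD:/var/lib/influxdb influxdb:1.2.0-alpine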
Here's the test application I've been able to reproduce the issue with: https://github.com/jganoff/influxdb-concurrent-write-test
I wish it were a simpler example without Flink, but I couldn't reproduce it this morning using my repro steps above.
I'll test with a shorter write cold duration next.
I've updated the README in the test repo to describe how the test works.
I ran it a few times with INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s and still see missing data points.
INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s should cause any points still in the WAL to get written to tsm files. Something is off, though: the one tsm file you ran influx_inspect on only had ~4.99M points, and the stats indicate there should be two files.
After running with the above env var, can you check whether there is more than one tsm file on disk, and whether /var/lib/influxdb/wal/test/autogen/wal is empty or contains only 0-length files?
If there is nothing in the wal, please run influx_inspect again on all the tsm files for that shard.
Thanks for the repro script. I'll see if I can repro as well.
Here's the state of the tsm and wal files after a few runs ending in a failed one with INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s:
/var/lib/influxdb # find wal/
wal/
wal/_internal
wal/_internal/monitor
wal/_internal/monitor/1
wal/_internal/monitor/1/_00201.wal
wal/test
wal/test/autogen
wal/test/autogen/11
wal/test/autogen/11/_00006.wal
/var/lib/influxdb # ls -l wal/test/autogen/11/_00006.wal
-rw-r--r-- 1 root root 0 Mar 2 19:33 wal/test/autogen/11/_00006.wal
/var/lib/influxdb # find data
data
data/_internal
data/_internal/monitor
data/_internal/monitor/1
data/_internal/monitor/1/000000017-000000004.tsm
data/_internal/monitor/1/000000193-000000005.tsm
data/_internal/monitor/1/000000197-000000003.tsm
data/_internal/monitor/1/000000201-000000003.tsm
data/_internal/monitor/1/000000202-000000001.tsm
data/test
data/test/autogen
data/test/autogen/11
data/test/autogen/11/000000002-000000002.tsm
data/test/autogen/11/000000003-000000001.tsm
/var/lib/influxdb # ls -l data/test/autogen/11/
total 804
-rw-r--r-- 1 root root 654164 Mar 2 19:33 000000002-000000002.tsm
-rw-r--r-- 1 root root 165996 Mar 2 19:33 000000003-000000001.tsm
And here's the tsm dump for those 2 files:
/var/lib/influxdb # influx_inspect dumptsm data/test/autogen/11/000000002-000000002.tsm
Summary:
File: data/test/autogen/11/000000002-000000002.tsm
Time Range: 1970-01-01T00:00:00Z - 1970-01-01T00:13:17.765Z
Duration: 13m17.765s Series: 5 File Size: 654164
Statistics
Blocks:
Total: 3990 Size: 542118 Min: 26 Max: 243 Avg: 135
Index:
Total: 3990 Size: 112033
Points:
Total: 3988815
Encoding:
Timestamp: none: 0 (0%) s8b: 15 (0%) rle: 3975 (99%)
Float: none: 0 (0%) gor: 3192 (80%)
Int: none: 0 (0%) s8b: 0 (0%) rle: 798 (20%)
Compression:
Per block: 0.14 bytes/point
Total: 0.16 bytes/point
/var/lib/influxdb # influx_inspect dumptsm data/test/autogen/11/000000003-000000001.tsm
Summary:
File: data/test/autogen/11/000000003-000000001.tsm
Time Range: 1970-01-01T00:13:17.766Z - 1970-01-01T00:16:39.999Z
Duration: 3m22.233s Series: 5 File Size: 165996
Statistics
Blocks:
Total: 1015 Size: 137250 Min: 26 Max: 158 Avg: 135
Index:
Total: 1015 Size: 28733
Points:
Total: 1011170
Encoding:
Timestamp: none: 0 (0%) s8b: 0 (0%) rle: 1015 (100%)
Float: none: 0 (0%) gor: 812 (80%)
Int: none: 0 (0%) s8b: 0 (0%) rle: 203 (20%)
Compression:
Per block: 0.14 bytes/point
Total: 0.16 bytes/point
@jganoff I'm not able to reproduce the issue with the repo you provided. One thing I noticed in the client code is that it looks like you are using System.currentTimeMillis() via the InfluxDB Java client. Can you verify that the points you are writing do not have the same timestamp? It looks like only one series is getting written, so duplicate timestamps will cause fewer points to be stored.
For example,
curl "http://localhost:8086/write?db=test" --data-binary "cpu value=1 0
cpu value=2 0"
This would send 2 points, but only 1 would be stored because the second overwrites the first.
I'm generating monotonically increasing timestamps starting at 0 and ending at 999,999. ConcurrentWriteTestApp.java @ line 80 creates points with a known timestamp value.
You may have to run the test a bunch of times. I can usually reproduce it in under 10 runs.
I'm able to repro it now.
@jganoff I have a fix in #8094 for this issue. If you are able to pull down that PR and test it, that would be great. I can no longer repro the issue using your test script. Your test repo was really helpful in tracking this down.
I'm running 1,000 iterations of my test against branch jw-8084 (commit af46b29257cfb055d4f23548e89390f4167201ac) now. I'll let you know how it goes.
All 1,000 iterations completed without a failure. Looks like you nailed it. Thanks @jwilder!