__System info:__
version 1.2.0, branch master, commit b7bb7e8359642b6e071735b50ae41f5eb343fd42
Docker image: influxdb:1.2.0-alpine with id 3a859cac1ae5
Docker is running inside Docker for Mac 1.13.1 (15353), Build 94675c5a76, on a host running macOS 10.12.3
__Steps to reproduce:__
docker run --name influxdb -d -p 8086:8086 -v $PWD:/var/lib/influxdb influxdb:1.2.0-alpine
Write 1,000,000 points and then query the count using the influxdb-java client.
__Expected behavior:__
Query returns a count of 1,000,000.
__Actual behavior:__
The query sometimes returns 1,000,000, but it often returns a slightly lower count (10-100 fewer points on average).
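For context, the write/query flow looks roughly like the sketch below with the influxdb-java client. This is a minimal, single-threaded illustration, not the actual reproduction app (linked further down, which writes concurrently via Flink); the batch size, field values, and connection credentials are assumptions, while the database, measurement, and field names come from the logs in this report.

import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.BatchPoints;
import org.influxdb.dto.Point;
import org.influxdb.dto.Query;
import org.influxdb.dto.QueryResult;

public class WriteThenCountSketch {
    public static void main(String[] args) {
        // Connect to the container published on localhost:8086 (credentials assumed).
        InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "root", "root");
        influxDB.createDatabase("test");

        // Write 1,000,000 points with explicit, monotonically increasing
        // millisecond timestamps (0..999,999), batched 10,000 at a time.
        final int batchSize = 10_000;
        for (long start = 0; start < 1_000_000L; start += batchSize) {
            BatchPoints batch = BatchPoints.database("test").retentionPolicy("autogen").build();
            for (long t = start; t < start + batchSize; t++) {
                batch.point(Point.measurement("jordan|app1|0|1")
                        .time(t, TimeUnit.MILLISECONDS)
                        .addField("value", (double) t)
                        .build());
            }
            influxDB.write(batch);
        }

        // Ask how many points were persisted; the expected count is 1,000,000.
        QueryResult result = influxDB.query(new Query(
                "select count(\"value\") from \"jordan|app1|0|1\" where time >= 0", "test"));
        System.out.println(result);
    }
}

When the bug hits, the printed QueryResult reports a count slightly below 1,000,000, as in the client logs below.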
__Additional info:__
I've observed the first query return a count of 1,000,000 and a subsequent query, run after compaction messages appeared in the logs, return a lower count.
Here are the logs from a bad run where the first query returned the correct count and a query after compaction returned a lower count.
Client logs:
Generated and wrote 1000000 points.
Querying for how many were persisted:
QueryResult [results=[Result [series=[Series [name=jordan|app1|0|1, tags=null, columns=[time, count], values=[[0.0, 1000000.0]]]], error=null]], error=null]
Querying again after waiting a bit:
QueryResult [results=[Result [series=[Series [name=jordan|app1|0|1, tags=null, columns=[time, count], values=[[0.0, 999947.0]]]], error=null]], error=null]
InfluxDB logs:
https://gist.github.com/jganoff/db12b62ece9f7dcb9b2e90ff5a73b4bc#file-influxdb-log
Here's the requested debug information: https://gist.github.com/jganoff/db12b62ece9f7dcb9b2e90ff5a73b4bc
It looks like the missing points are from somewhere near the middle this time, though they can be scattered around, so don't over-index on their locality here:
> select count("value") from "jordan|app1|0|1" where time >= 0 group by time(1m) fill(none)
name: jordan|app1|0|1
time count
---- -----
0 60000
60000000000 60000
120000000000 60000
180000000000 60000
240000000000 60000
300000000000 60000
360000000000 59970
420000000000 59977
480000000000 60000
540000000000 60000
600000000000 60000
660000000000 60000
720000000000 60000
780000000000 60000
840000000000 60000
900000000000 60000
960000000000 40000
At higher resolution:
> select count("value") from "jordan|app1|0|1" where time >= 0 group by time(5s) fill(none)
time count
---- -----
...
385000000000 5000
390000000000 5000
395000000000 4994
400000000000 4994
405000000000 4994
410000000000 4994
415000000000 4994
420000000000 4997
425000000000 4996
430000000000 4996
435000000000 4996
440000000000 4996
445000000000 4996
450000000000 5000
455000000000 5000
460000000000 5000
...
Would you be able to attach the script you use to repro this? Also, can you run influx_inspect dumptsm XXXX.tsm on each of the tsm files in the /var/lib/influxdb/data/test/autogen/2 dir and attach the output?
Here's the tsm dump:
/var/lib/influxdb/data/test/autogen/2 # influx_inspect dumptsm 000000003-000000002.tsm
Summary:
File: 000000003-000000002.tsm
Time Range: 1970-01-01T00:00:00Z - 1970-01-01T00:16:39.999Z
Duration: 16m39.999s Series: 5 File Size: 825355
Statistics
Blocks:
Total: 5005 Size: 684894 Min: 26 Max: 251 Avg: 136
Index:
Total: 5005 Size: 140448
Points:
Total: 4999735
Encoding:
Timestamp: none: 0 (0%) s8b: 85 (1%) rle: 4920 (98%)
Float: none: 0 (0%) gor: 4004 (80%)
Int: none: 0 (0%) s8b: 0 (0%) rle: 1001 (20%)
Compression:
Per block: 0.14 bytes/point
Total: 0.17 bytes/point
Is there only 1 tsm file? The stats indicate there should be two.
Only 1 for test.autogen:
/var/lib/influxdb/data # find .
.
./_internal
./_internal/monitor
./_internal/monitor/1
./_internal/monitor/1/000000001-000000001.tsm
./test
./test/autogen
./test/autogen/2
./test/autogen/2/000000003-000000002.tsm
The points might still be in the wal. Try running influxd with INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s set in its environment. If they are in the wal, that should cause them to get snapshotted to a new TSM file after 1s.
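For the Docker setup from the original report, one way to do that (assuming the variable just needs to be present in the container's environment) is to add it to the run command:
docker run --name influxdb -d -p 8086:8086 -e INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s -v $PWD:/var/lib/influxdb influxdb:1.2.0-alpine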
Here's the test application I've been able to reproduce the issue with: https://github.com/jganoff/influxdb-concurrent-write-test
I wish it were a simpler example without Flink, but I couldn't reproduce it this morning using my repro steps above.
I'll test with a shorter write cold duration next.
I've updated the README in the test repo to describe how the test works.
I ran it a few times with INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s and still see missing data points.
INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s should cause any points still in the WAL to get written to tsm files. Something is off, though: the one tsm file you ran influx_inspect on only had ~4.99M points, and the stats indicate there should be two files.
After running with the above env var, can you check whether there is more than one tsm file on disk, and whether /var/lib/influxdb/wal/test/autogen/wal is empty or contains only 0-length files?
If there is nothing in the wal, please run influx_inspect again on all the tsm files for that shard.
Thanks for the repro script. I'll see if I can repro as well.
Here's the state of the tsm and wal files after a few runs ending in a failed one with INFLUXDB_DATA_CACHE_SNAPSHOT_WRITE_COLD_DURATION=1s:
/var/lib/influxdb # find wal/
wal/
wal/_internal
wal/_internal/monitor
wal/_internal/monitor/1
wal/_internal/monitor/1/_00201.wal
wal/test
wal/test/autogen
wal/test/autogen/11
wal/test/autogen/11/_00006.wal
/var/lib/influxdb # ls -l wal/test/autogen/11/_00006.wal
-rw-r--r-- 1 root root 0 Mar 2 19:33 wal/test/autogen/11/_00006.wal
/var/lib/influxdb # find data
data
data/_internal
data/_internal/monitor
data/_internal/monitor/1
data/_internal/monitor/1/000000017-000000004.tsm
data/_internal/monitor/1/000000193-000000005.tsm
data/_internal/monitor/1/000000197-000000003.tsm
data/_internal/monitor/1/000000201-000000003.tsm
data/_internal/monitor/1/000000202-000000001.tsm
data/test
data/test/autogen
data/test/autogen/11
data/test/autogen/11/000000002-000000002.tsm
data/test/autogen/11/000000003-000000001.tsm
/var/lib/influxdb # ls -l data/test/autogen/11/
total 804
-rw-r--r-- 1 root root 654164 Mar 2 19:33 000000002-000000002.tsm
-rw-r--r-- 1 root root 165996 Mar 2 19:33 000000003-000000001.tsm
And here's the tsm dump for those 2 files:
/var/lib/influxdb # influx_inspect dumptsm data/test/autogen/11/000000002-000000002.tsm
Summary:
File: data/test/autogen/11/000000002-000000002.tsm
Time Range: 1970-01-01T00:00:00Z - 1970-01-01T00:13:17.765Z
Duration: 13m17.765s Series: 5 File Size: 654164
Statistics
Blocks:
Total: 3990 Size: 542118 Min: 26 Max: 243 Avg: 135
Index:
Total: 3990 Size: 112033
Points:
Total: 3988815
Encoding:
Timestamp: none: 0 (0%) s8b: 15 (0%) rle: 3975 (99%)
Float: none: 0 (0%) gor: 3192 (80%)
Int: none: 0 (0%) s8b: 0 (0%) rle: 798 (20%)
Compression:
Per block: 0.14 bytes/point
Total: 0.16 bytes/point
/var/lib/influxdb # influx_inspect dumptsm data/test/autogen/11/000000003-000000001.tsm
Summary:
File: data/test/autogen/11/000000003-000000001.tsm
Time Range: 1970-01-01T00:13:17.766Z - 1970-01-01T00:16:39.999Z
Duration: 3m22.233s Series: 5 File Size: 165996
Statistics
Blocks:
Total: 1015 Size: 137250 Min: 26 Max: 158 Avg: 135
Index:
Total: 1015 Size: 28733
Points:
Total: 1011170
Encoding:
Timestamp: none: 0 (0%) s8b: 0 (0%) rle: 1015 (100%)
Float: none: 0 (0%) gor: 812 (80%)
Int: none: 0 (0%) s8b: 0 (0%) rle: 203 (20%)
Compression:
Per block: 0.14 bytes/point
Total: 0.16 bytes/point
@jganoff I'm not able to reproduce the issue with the repo you provided. One thing I noticed in the client code is that it looks like you are using System.currentTimeMillis() via the InfluxDB Java client. Can you verify that the points you are writing do not have the same timestamp? It looks like only one series is getting written, so duplicate timestamps will cause fewer points to be stored.
For example,
curl "http://localhost:8086/write?db=test" --data-binary "cpu value=1 0
cpu value=2 0"
This would send 2 points, but only 1 would be stored because the second overwrites the first.
I'm generating monotonically increasing timestamps starting at 0 and ending at 999,999. ConcurrentWriteTestApp.java @ line 80 creates points with a known timestamp value.
You may have to run the test a bunch of times. I can usually reproduce it in under 10 runs.
I'm able to repro it now.
@jganoff I have a fix in #8094 for this issue. If you are able to pull down that PR and test it, that would be great. I can no longer repro the issue using your test script. Your test repo was really helpful in tracking this down.
I'm running 1,000 iterations of my test against branch jw-8084 (commit af46b29257cfb055d4f23548e89390f4167201ac) now. I'll let you know how it goes.
All 1,000 iterations completed without a failure. Looks like you nailed it. Thanks @jwilder!