InfluxDB: [feature request] prevent queries from hitting OOM conditions

Created on 9 Jan 2016 · 24 comments · Source: influxdata/influxdb

Measurement with 4M data points. Run a SELECT * with a WHERE clause that is true for all records, and the server dies with an OOM instead of rejecting the query gracefully.

This makes InfluxDB fragile and unfit for production.

Full stack trace attached.

curl:

curl -G "http://192.168.99.100:32769/query?pretty=true" --data-urlencode "db=test" --data-urlencode "q=SELECT * from ticks where time > '2015-01-01'" -v

influxdb-oom.txt

Labels: area/queries, kind/feature-request

All 24 comments

@raulk can you provide a little bit more information so that we can diagnose this better? What version of InfluxDB are you running? Can you tell us more about the structure of the data in the ticks measurement? Thanks!

I am experiencing the same issue with large datasets:

https://github.com/influxdata/influxdb/issues/5321

Currently running 0.9.6.

I'm experiencing similar issues on 0.9.6.1 Hosted InfluxDB (10 GB / 1 GB memory / 1 core). I'm just playing around with InfluxDB right now and only have a handful of measurements totaling a few hundred thousand points spanning the last 18 months or so.

Even doing a query over 2 months of data for a single measurement, with no complex groupings or WHERE clauses, InfluxDB appears to run out of memory before completing the query (and restarts itself on Hosted InfluxDB).

Here is a screenshot with various measurements from _internal. The annotations are set when runtime.HeapIdle drops below 1 MB.

Here is a shared snapshot without the annotations.

Here is an example of the measurement with which I can repeatedly reproduce the issue:
message_measurement

Hi,

With a bigger database (500 million points or so), such an extreme query leaves the server unresponsive indefinitely, and the odd thing is that it then stops responding to other simple queries as well. This is easily reproducible. I have seen this behavior in the latest nightly builds as well.

Thanks,
Sarat

It is still very possible to kill the system with a punishing query. Until https://github.com/influxdata/influxdb/issues/655 is implemented there is no way to prevent it. Closing this as a duplicate of that issue.

@beckettsean I understand that #655 is a _prerequisite_ for being able to kill a running query. But this ticket is about _intelligently detecting_ when a query is consuming (or would consume) too many resources and killing it or rejecting it straight away. I don't think they are duplicates.

@raulk I'll buy that argument.
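To make the ask concrete, here is a minimal sketch of that kind of pre-execution guard, assuming a hypothetical planner that can estimate a query's point count from shard metadata; none of these names are InfluxDB API:

package main

import "fmt"

// maxSelectPoints is a hypothetical per-query point budget; a real
// guard would make this configurable rather than hard-coded.
const maxSelectPoints = 10_000_000

// QueryPlan is a stand-in for what a planner knows about a query
// before execution; the fields are illustrative only.
type QueryPlan struct {
	SeriesCount     int
	PointsPerSeries int // rough estimate from shard/index metadata
}

// guard rejects a query up front when its estimated cost exceeds the
// budget, instead of letting execution run until the process OOMs.
func guard(p QueryPlan) error {
	if est := p.SeriesCount * p.PointsPerSeries; est > maxSelectPoints {
		return fmt.Errorf("query rejected: estimated %d points exceeds budget of %d", est, maxSelectPoints)
	}
	return nil
}

func main() {
	plan := QueryPlan{SeriesCount: 5924, PointsPerSeries: 500_000}
	if err := guard(plan); err != nil {
		fmt.Println(err) // fail gracefully; the server stays up
		return
	}
	fmt.Println("query admitted")
}

For what it's worth, later 1.x releases did grow configurable guard rails in this spirit, such as the coordinator's max-select-point and query-timeout settings.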

Same here. A query like select * from my_measurement limit 1 will exhaust the server too (0.9.6.1, 64 GB RAM).

@xvtom could you try that same query on v0.10.0-beta1?

Performance has certainly improved on the 0.10.1 nightly: we're not OOMing, but the query is taking forever to complete (as of writing, it still has not completed 30+ minutes later). For example, in this case we have 465M points and ran:

select * from mem limit 5

Still waiting for results after 30 minutes, but the server can still process other queries and Grafana dashboards. I'm going to leave this query running for a while longer; maybe it will finish soon?

[root@carf-metrics-influx01 ~]# /usr/local/influxdb/bin/influx -host carf-metrics-influx01
Visit https://enterprise.influxdata.com to register for updates, InfluxDB server management, and monitoring.
Connected to http://carf-metrics-influx01:8086 version 0.10.1
InfluxDB shell 0.10.1
> show diagnostics
name: build
-----------
Branch  Build Time                      Commit                                          Version
HEAD    2016-02-18T20:44:27.807242      df902a4b077bb270984303b8e4f8a320e3954b40        0.10.1


name: network
-------------
hostname
carf-metrics-influx01


name: runtime
-------------
GOARCH  GOMAXPROCS      GOOS    version
amd64   72              linux   go1.4.3


name: system
------------
PID     currentTime                     started                         uptime
76475   2016-03-01T19:50:26.792768703Z  2016-02-29T21:32:45.50999402Z   22h17m41.282774845s

> show stats
name: httpd
tags: bind=carf-metrics-influx01:8086
pingReq pointsWrittenOK queryReq        queryRespBytes  req     writeReq        writeReqBytes
------- --------------- --------        --------------  ---     --------        -------------
5       345819          12860           2056370564      358684  345819          27960175


name: shard
tags: engine=tsm1, id=49, path=/data/influxdb-data/metrics/_internal/monitor/49
fieldsCreate    seriesCreate    writePointsOk   writeReq
------------    ------------    -------------   --------
0               8               6759            845


name: shard
tags: engine=tsm1, id=50, path=/data/influxdb-data/metrics/_internal/monitor/50
fieldsCreate    seriesCreate    writePointsOk   writeReq
------------    ------------    -------------   --------
45              9               64277           7142


name: shard
tags: engine=tsm1, id=48, path=/data/influxdb-data/metrics/ryan_test/default/48
fieldsCreate    seriesCreate    writePointsOk   writeReq
------------    ------------    -------------   --------
0               99              345819          345819


name: shard
tags: engine=tsm1, id=47, path=/data/influxdb-data/metrics/tg_udp/default/47
fieldsCreate    seriesCreate    writePointsOk   writeReq
------------    ------------    -------------   --------
5               274187          509316225       79625


name: subscriber
----------------
pointsWritten
509691936


name: udp
tags: bind=carf-metrics-influx01:8089
batchesTx       bytesRx         pointsRx        pointsTx
---------       -------         --------        --------
79624           143555191639    509322415       509316225


name: write
-----------
pointReq        pointReqLocal   req     subWriteDrop    subWriteOk      writeOk
509738430       509738430       433431  1773            431658          433430


name: runtime
-------------
Alloc           Frees           HeapAlloc       HeapIdle        HeapInUse       HeapObjects     HeapReleased    HeapSys         Lookups Mallocs         NumGC   NumGoroutine    PauseTotalNs    Sys             TotalAlloc
16290165848     120288076475    16290165848     19548143616     17527291904     95654301        6322061312      37075435520     719293  120383730776    6113    152             2488085868702   39877768256     20534181977800

Update: 2.5 hours to complete. Although that took an insane amount of time (across a relatively small set of data), at least we didn't OOM.

I'm going to re-test in a few days after we have >2B points within the system.

@sebito91 It's great news that the query didn't OOM, since it did pull 465M points from disk in order to return 5, as silly as that is. The new query engine in 0.11 uses LIMIT to restrict the results sampled, not just returned, so it won't pull all 465M points in order to sort them and return only 5. I suspect we will see at least a 1000x improvement in the performance of this specific query.
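To illustrate why that matters, here is a rough sketch (not InfluxDB's actual iterator code) of the two behaviors: materializing everything and slicing afterwards versus stopping the scan once LIMIT points have been produced.

package main

import "fmt"

// Point is a stand-in for a decoded storage point.
type Point struct {
	Time  int64
	Value float64
}

// scanAll models the pre-0.11 behavior described above: materialize
// every matching point, then apply LIMIT at the end. With 465M points,
// this is where the memory and the 2.5 hours went.
func scanAll(src <-chan Point, limit int) []Point {
	var all []Point
	for p := range src {
		all = append(all, p) // the entire result set lives in RAM
	}
	if len(all) > limit {
		all = all[:limit]
	}
	return all
}

// scanLimited models a limit-aware engine: stop pulling from the
// source as soon as LIMIT points have been produced.
func scanLimited(src <-chan Point, limit int) []Point {
	out := make([]Point, 0, limit)
	for p := range src {
		out = append(out, p)
		if len(out) == limit {
			break // the remaining points are never touched
		}
	}
	return out
}

func main() {
	src := make(chan Point)
	go func() {
		for i := 0; i < 1_000_000; i++ {
			src <- Point{Time: int64(i), Value: float64(i)}
		}
		close(src)
	}()
	// Reads only 5 points; the producer is simply abandoned here,
	// where a real engine would close the cursor.
	fmt.Println(len(scanLimited(src, 5)))
}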

@beckettsean, that is literally revolutionary! Let us know when this hits a nightly and we can check it out.

Very big :+1:

@sebito91 the current 0.11 nightlies have the updated query engine. We are doing compatibility and regression testing now; we'd love to hear how it works for you.

Closing the loop here, 0.11.0 nightly seems to handle these queries much, MUCH better than before. We're currently running this on >2.5B points across 5924 series.

[root@carf-metrics-influx01 ~]# time /usr/local/influxdb/bin/influx -host carf-metrics-influx01 -port 8086 -database 'tg_udp' -execute 'select * from mem limit 5'
name: mem
---------
time                    available       available_percent       bu      buffered        cached          cls     dc      env             free         host             sr      total           trd     used            used_percent
1453816689000000000     24199622656     96.48625480482886       linux   370388992       542138368       server  njs     production      23500247040  njs-alextestl1   none    25080901632     false   1580654592      6.302223959856724
1453816699000000000     24199778304     96.48687538857934       linux   370388992       542142464       server  njs     production      23500402688  njs-alextestl1   none    25080901632     false   1580498944      6.301603376106252
1453816706000000000     24199655424     96.48638545403949       linux   370397184       542142464       server  njs     production      23500267520  njs-alextestl1   none    25080901632     false   1580634112      6.302142304100083
1453816716000000000     24199233536     96.48470334545269       linux   370401280       542146560       server  njs     production      23499841536  njs-alextestl1   none    25080901632     false   1581060096      6.303840743838216
1453816724000000000     24199172096     96.48445837818275       linux   370401280       542146560       server  njs     production      23499780096  njs-alextestl1   none    25080901632     false   1581121536      6.304085711108139


real    3m10.812s
user    0m0.004s
sys     0m0.004s

Here is the diagnostic output:

[root@carf-metrics-influx01 ~]# time /usr/local/influxdb/bin/influx -host carf-metrics-influx01 -port 8086 -database 'tg_udp' -execute 'show diagnostics'
name: build
-----------
Branch  Build Time      Commit                                          Version
master                  4e8004ec83d7f6748cf3b2fc716027a7200ede56        0.11.0~n1457337636

name: network
-------------
hostname
carf-metrics-influx01

name: runtime
-------------
GOARCH  GOMAXPROCS      GOOS    version
amd64   72              linux   go1.4.3

name: system
------------
PID     currentTime                     started                         uptime
17835   2016-03-14T19:36:28.392217035Z  2016-03-10T12:06:58.322490302Z  103h29m30.069726853s

@sebito91 that still looks way off to me. It looks like it's actually churning through all of the data, rather than doing a real LIMIT query. If this were working properly I'd expect it to return in under a second.

@benbjohnson: thoughts?

@sebito91 How long does it take if you select a specific field? Like:

select free from mem limit 5

@pauldix I'll do some profiling against the LIMIT queries again. I agree that it should be much faster for only ~6k series with a LIMIT 5.

:clap:

@pauldix I looked into this further and I think it's a jitter issue. If there are overlapping TSM blocks, then the engine has to deduplicate at query time, which gets really expensive. I see similar multi-minute results with 0.5B points across 6,000 series. However, without jitter the query time goes down to 1.5s.

@benbjohnson so to restate in a way we might explain to customers: the actual determination of which 5 points are the earliest takes a long time when there are thousands of series, since the nanosecond timestamps of each series' first point in the shard group have to be compared.

@beckettsean it has to do with how data is saved. If points come in out of timestamp order, then the TSM engine will create multiple blocks, and it then has to deduplicate those points at query time. There's a fast path if there is only one block. The engine will eventually clean up those duplicate blocks and merge them.

We need to add more visibility to the inspect tool so we can get more stats on tsm1 data files.
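A rough sketch of the cost being described, assuming nothing about the actual TSM implementation: with a single sorted block the engine can stream points straight through, but overlapping blocks force a merge-and-deduplicate pass on every read.

package main

import (
	"fmt"
	"sort"
)

// block is a stand-in for a decoded TSM block: points sorted by time.
// Timestamps only, for brevity.
type block []int64

// readFastPath models the single-block case: points are already
// sorted and unique, so they can be streamed as-is.
func readFastPath(b block) []int64 { return b }

// readWithDedup models the overlapping-block case: every query has to
// merge all blocks and discard duplicate timestamps, which is what
// makes jittered (out-of-order) writes expensive to read back.
func readWithDedup(blocks []block) []int64 {
	var merged []int64
	for _, b := range blocks {
		merged = append(merged, b...)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i] < merged[j] })
	out := merged[:0] // in-place dedup over the sorted slice
	for i, t := range merged {
		if i == 0 || t != out[len(out)-1] {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// Out-of-order writes produced two overlapping blocks.
	a := block{100, 200, 300}
	b := block{150, 200, 250}
	fmt.Println(readWithDedup([]block{a, b})) // [100 150 200 250 300]
}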

@benbjohnson thanks for the clarification, that makes more sense than my mistaken understanding. I assume the deduplication is done during compactions, so queries farther from now() are less likely to suffer this issue?

@beckettsean Yes, older data will be deduplicated but recent data will not be. I'm not exactly sure what the compaction schedule looks like with regard to deduplication. It might be worth adding a small buffer in front of tsm to account for jitter.
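Purely as an illustration of that last suggestion, a small reorder buffer in front of the storage engine might look something like this (all names hypothetical; a real buffer would likely flush on a time window rather than a fixed count):

package main

import (
	"fmt"
	"sort"
)

// reorderBuffer absorbs small amounts of timestamp jitter: points are
// held until the buffer fills, then flushed in time order, so the
// storage layer sees sorted, non-overlapping writes.
type reorderBuffer struct {
	cap   int
	pts   []int64 // timestamps only, for brevity
	flush func(sorted []int64)
}

func (b *reorderBuffer) Write(ts int64) {
	b.pts = append(b.pts, ts)
	if len(b.pts) >= b.cap {
		b.Flush()
	}
}

func (b *reorderBuffer) Flush() {
	sort.Slice(b.pts, func(i, j int) bool { return b.pts[i] < b.pts[j] })
	b.flush(b.pts)
	b.pts = b.pts[:0]
}

func main() {
	buf := &reorderBuffer{cap: 4, flush: func(s []int64) { fmt.Println(s) }}
	for _, ts := range []int64{100, 120, 110, 105, 200} { // jittered input
		buf.Write(ts)
	}
	buf.Flush() // prints [100 105 110 120], then [200]
}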
