InfluxDB: High memory usage

Created on 25 Jan 2016 · 38 comments · Source: influxdata/influxdb

Hi,
I installed InfluxDB on an Ubuntu server with 4 GB of memory and used Python's requests module to write 10M points into the db.
The Python script inserts 20k points per second successfully, but memory usage keeps climbing until InfluxDB uses 97% of RAM.
After that I can't run even a simple query like "select * from srcipm limit 1".
Even after the write process is finished, InfluxDB doesn't release the memory.

Details:

  • Server: 64-bit Ubuntu Server 15.10, 1-core CPU, 4 GB RAM
  • InfluxDB version 0.9.6

Write query in Python:

import requests

requests.post("http://192.168.1.104:8086/write?db=mydb", data="srcipm,device_id=4,input_snmp=2,output_snmp=3,direction=1,ip=192.168.1.1 bytes=25478")
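For reference, the /write endpoint also accepts many newline-separated points in a single request, which is far cheaper than one HTTP request per point. A minimal batched sketch reusing the URL above (the batch size and the repeated point are placeholders):

import requests

# Placeholder batch: real code would build 5000 distinct points.
point = "srcipm,device_id=4,input_snmp=2,output_snmp=3,direction=1,ip=192.168.1.1 bytes=25478"
batch = "\n".join([point] * 5000)
requests.post("http://192.168.1.104:8086/write?db=mydb", data=batch)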

Where is the problem? Should I configure something, or is my query wrong?

All 38 comments

@tahmasebi What do you mean by "I can't query a simple request like select * from srcipm limit 1"? Does the query not return? Does the instance OOM?

@mjdesa The query execution takes a long time. One time it didn't return anything for 30 minutes, so I cancelled the query with Ctrl+C. Even after restarting the InfluxDB service manually (because of the memory usage), the query returns results very slowly and memory fills up again.

How many cores does the machine have?

1 core. When I run a query, CPU usage goes to 100% for 2 or 3 seconds and then drops back under 5%. The problem I can see in htop is InfluxDB's memory usage at 97-98%.

How many unique series are you writing to the database?

How do I get the number of series? I'm new to InfluxDB.
I don't have access to the machine now; I'll check it tomorrow at the company. I'm sorry.

No need to be sorry. :)

The query show series will give you all of the series that are stored in a database.

@mjdesa I used show series to find the number of series in my db, but it displays a list of all the series, so I used Python to count them like this:

import requests
r = requests.get("http://192.168.1.104:8086/query",
                 params={"db": "mydb", "q": "SHOW SERIES FROM srcipm"}).json()
print(len(r['results'][0]['series'][0]['values']))

and I get 153K series. Is that too many series for 10M points?

@tahmasebi No, you should be well within your limits. We've had problems with 0.9.6 consuming too much memory in the past. Can you run the test against 0.10.0beta2 and see if Influx behaves more stably?

https://influxdb.s3.amazonaws.com/influxdb_0.10.0-0.beta2_amd64.deb
https://influxdb.s3.amazonaws.com/influxdb-0.10.0-0.beta2.x86_64.rpm

Sure, I'll test this version of InfluxDB and let you know tomorrow. (It's the time difference :) )
Thank you.

Hi @mjdesa ,
I removed InfluxDB from Ubuntu with apt-get remove influxdb and then installed the package you linked above.

The database from the previous version still exists, but when I execute a query on the old db, it returns an error: ERR: read message type: read tcp 127.0.0.1:8088: i/o timeout.
So I created a new db and wrote 10M points to it. The write process was much slower than before (6k/s), and this time memory usage went up to 74%.

I took a screenshot of htop and iotop for more details.

[screenshot: htop and iotop output]

Try reducing cache-snapshot-memory-size and cache-snapshot-write-cold-duration in the configuration.
I set mine to cache-snapshot-memory-size = 2621440 and cache-snapshot-write-cold-duration = "1m" during massive writes.
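For context, both settings belong under the [data] section of influxdb.conf (assuming the TSM engine, as in 0.10 and later); a sketch of the relevant fragment with the values quoted above:

[data]
  cache-snapshot-memory-size = 2621440
  cache-snapshot-write-cold-duration = "1m"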

FYI, I resolved ERR: read message type: read tcp 127.0.0.1:8088: i/o timeout by increasing shard-mapper-timeout under the [cluster] section.

@tahmasebi Are you still experiencing this problem?

I am brand new to InfluxDB and ran into this problem.

I started by inserting 6928 price points for one stock. An example data point:

INSERT price,ric=VOD.L,open=12.3,close=12.3,volume=1200 dummy=1 12345644687

Then I ran a query like this:

SELECT ric, close, volume FROM price WHERE ric = 'VOD.L' LIMIT 1;

This killed my laptop by consuming all 16 GB of RAM.

I'm building a similar stock system to @adilbaig's, and hitting exactly the same problem with InfluxDB 0.13.0.

When I tried to insert about 4k trading points all at once using batch insert via the HTTP API, InfluxDB took all of my 8 GB of memory plus all the swap space, which made the whole system painfully slow.

Is there a way to limit the total memory usage?

My OS is Ubuntu 16.04 LTS

@adilbaig In the example point you listed

price,ric=VOD.L,open=12.3,close=12.3,volume=1200 dummy=1 12345644687

ric, open, close, and volume are all tags, meaning that both key and value will be treated as strings. Additionally, they will be indexed, and will presumably have absurdly high cardinality. This is most likely why you're seeing so much memory used. Try changing open, close, and volume to fields instead.
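For illustration, a hedged sketch of the same point rewritten so that only ric stays a tag and the numbers become fields (the endpoint and database name are placeholders; values are copied from the example above):

import requests

# ric stays a tag (low cardinality); open, close and volume become fields,
# so their values are no longer indexed and series cardinality stays bounded.
line = "price,ric=VOD.L open=12.3,close=12.3,volume=1200,dummy=1 12345644687"
requests.post("http://localhost:8086/write?db=stocks", data=line)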

@dy1901 What does your schema look like?

What is the current status of this question? If someone knows, please tell me, thank you.

@DavidSoong128 what specifically would you like to know?

@mjdesa Thank you for your reply.
I installed InfluxDB v0.13.0 on a Linux server with 16 GB of memory and used Java requests to write some points into the db, but free memory keeps shrinking; InfluxDB consumed up to 11 GB, and then the only thing I could do was restart it. I think it is the same problem as @dy1901's.

@DavidSoong128
I have a few more questions:

  • Can you give some examples of the data that you're writing in line protocol?
  • About how long was InfluxDB running before you hit this problem?
  • Did you have any queries running when you hit this problem? If so, what were they?
  1. First question:

     A point looks like this: tps is the measurement; appName and serviceName are tags; tpsAmount and currency are fields.
     Example in line protocol: tps,appName=order,serviceName=addOrder tpsAmount=2000,currency=300 1466557284000

  2. Second question:

     Free memory shrinks as soon as I insert data into the db; the more data, the more memory consumed.

  3. Third question:

     There were no queries running; I was just inserting some data as a test.
    
@Test(enabled = true)
public void maxWritePointsPerformance() {
    String dbName = "d";
    this.influxDB.createDatabase(dbName);
    // Buffer writes: flush after 100k points or every 60 seconds
    this.influxDB.enableBatch(100000, 60, TimeUnit.SECONDS);

    Stopwatch watch = Stopwatch.createStarted();
    for (int i = 0; i < 2000000; i++) {
        Point point = Point.measurement("s").addField("v", 1.0).build();
        this.influxDB.write(dbName, "default", point);
    }
    System.out.println("2M points: " + watch);
    this.influxDB.deleteDatabase(dbName);
}

This test code is copied from https://github.com/influxdata/influxdb-java and causes the same problem.

I'm having a similar problem: memory usage grows until it consumes the entire machine and never really drops back. There are three main streams of incoming data, written with the InfluxDB Go client to different databases:

  • 50 points/sec in batches of 1000
  • Telegraf data at 10s intervals for system metrics of a few hosts
  • 84 points/sec in batches of 10000

Occasionally, a Grafana client will issue 5-20 queries to draw charts, but it isn't a constant request rate. We've recently started consuming considerable swap space. The node has 128 GB of RAM (usually 80-98% used) and has been up for 30 days.

I'm having exactly the same issues as everyone here. During inserts my memory usage jumps over 8 GB and then Influx throws an error about a memory allocation failure (my VM is limited to ~8 GB RAM). I've tried the settings proposed by @lpc921 in his post here:

  cache-snapshot-memory-size = 262144
  cache-snapshot-write-cold-duration = "0h1m0s"
  compact-full-write-cold-duration = "0h1m0s"

It didn't change a thing.
I am using your official Docker image: influxdb:0.13-alpine

Guys, seriously. This issue has been open for half a year. For me this is a critical issue which rules out using InfluxDB in production environments.

EDIT:
I've prepared docker containers which demonstrate this bug in influxdb 0.13. You can find them here: https://github.com/carbolymer/influxdb-large-memory-proof
EDIT2:
The same happens for the 1.0.0-beta2-alpine image.
EDIT3:
Many thanks to @jwilder for the advice! The latest commit on master at https://github.com/carbolymer/influxdb-large-memory-proof contains a working solution.

@carbolymer It looks like you have sparse data (stock prices) which ends up creating hundreds of small shards. In your docker sample, I'd recommend increasing the shard group duration on your default retention policy after creating the databases.

For example, running the following _before_ writing data:

ALTER RETENTION POLICY "default" ON no_memory SHARD DURATION 520w

will change the shard group duration to 10 years, which should reduce the number of shards from ~1500 to 4.

I would also suggest setting cache-snapshot-write-cold-duration = "10s". You should not need to change compact-full-write-cold-duration or cache-snapshot-memory-size from the default values though.
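A minimal sketch of applying that advice over the HTTP API before any data is written (assumes a local InfluxDB 1.0+ instance and the no_memory database from the docker sample; older versions may need GET instead of POST for these statements):

import requests

# Create the database, then widen the shard group duration on its
# default retention policy before writing sparse data.
url = "http://localhost:8086/query"
requests.post(url, params={"q": "CREATE DATABASE no_memory"})
requests.post(url, params={
    "q": 'ALTER RETENTION POLICY "default" ON no_memory SHARD DURATION 520w'})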

@carbolymer My server has 48 GB of RAM and 24 high-end CPUs, but it's still not enough for InfluxDB (with a 30 GB RAM limit) holding just several tens of thousands of daily series (several GB of data). Some queries end in a timeout; others end because Influx hits the memory limit and crashes. I'm beginning to deeply regret that choice... I have no idea what to do now.

@adampl What kind of data are you writing, and what is writing it? I suspect you have an issue with your schema, but grabbing some profiles when memory is high would help to diagnose:

curl -o heap.txt "localhost:8086/debug/pprof/heap?debug=2"
curl -o goroutine.txt "localhost:8086/debug/pprof/goroutine?debug=2"
curl -o block.txt "localhost:8086/debug/pprof/block?debug=2"

Also, can you attach the output of the following:

influx -execute "show shards" > shards.txt
influx -execute "show stats" > stats.txt
influx -execute "show diagnostics" > diagnostics.txt

@jwilder In my case the data is very sparse - just one point a day in each series - so I've dropped the entire database in order to try the shard duration trick, and the data is now being loaded again. If the timeouts and crashes don't disappear, I'll send you the diagnostics.

@jwilder Increasing the shard duration indeed helped (set to 1000w) - now I don't get OOMs, as it takes "only" 5 GB and doesn't grow.

Still, queries covering all of the measurement's data take a long time to complete (15 seconds) - much longer than simply reading all of the measurement's rows from a text file, filtering them by tags, and aggregating in Python in a single process (2 seconds).
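For reference, a rough sketch of the kind of single-process Python baseline being compared against here (the file name, tag name, and field name are hypothetical):

import csv

# Hypothetical flat-file baseline: filter rows by one tag value and sum a field.
total = 0.0
with open("measurement_dump.csv") as f:
    for row in csv.DictReader(f):
        if row["some_tag"] == "some_value":   # filter by tag
            total += float(row["value"])      # aggregate a field
print(total)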

@adampl What version are you running?

Version 0.13 on CentOS 7

@adampl I'd suggest upgrading to the 1.0beta3 release or the latest nightly. There have been many query optimizations since 0.13.

OK, I'll give it a try. Meanwhile, please look into #6994, which is a very serious functional bug IMHO.

@jwilder Many thanks! It helped.

@carbolymer Have you solved the problem? Please share your solution, thank you.

@DavidSoong128, yes. This worked for me: https://github.com/influxdata/influxdb/issues/5440#issuecomment-233230997

You can find a working configuration in the latest commit on master at https://github.com/carbolymer/influxdb-large-memory-proof

@carbolymer OK, thank you for your reply. I will run some tests. If I have any questions, I'll ask again.
