InfluxDB: High memory usage

Created on 25 Jan 2016 · 38 comments · Source: influxdata/influxdb

Hi,
I installed InfluxDB on an Ubuntu server with 4 GB of memory and used Python's requests module to write 10M points into the db.
The Python script inserts 20k points per second successfully, but memory usage keeps climbing until InfluxDB uses 97% of RAM.
After that I can't run even a simple query like "select * from srcipm limit 1".
Even after the write process is finished, InfluxDB doesn't release the memory.

Details:

  • Server: 64-bit Ubuntu Server 15.10, 1-core CPU, 4 GB RAM
  • InfluxDB version 0.9.6

Write query in Python:

import requests

requests.post("http://192.168.1.104:8086/write?db=mydb", data="srcipm,device_id=4,input_snmp=2,output_snmp=3,direction=1,ip=192.168.1.1 bytes=25478")
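For reference, the /write endpoint also accepts many newline-separated points in a single request, which is far cheaper than one HTTP request per point. A minimal batched sketch reusing the URL above (the batch size and the repeated point are placeholders):

import requests

# Placeholder batch: real code would build 5000 distinct points.
point = "srcipm,device_id=4,input_snmp=2,output_snmp=3,direction=1,ip=192.168.1.1 bytes=25478"
batch = "\n".join([point] * 5000)
requests.post("http://192.168.1.104:8086/write?db=mydb", data=batch)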

Where is the problem? Should I configure something, or is my query wrong?

All 38 comments

@tahmasebi What do you mean by "I can't query a simple request like select * from srcipm limit 1"? Does the query not return? Does the instance OOM?

@mjdesa The query execution takes a long time. One time it didn't return anything for 30 minutes, so I cancelled the query with Ctrl+C. Even after restarting the InfluxDB service manually (because of the memory usage), the query returns results very slowly and memory fills up again.

How many cores does the machine have?

1 core. When I run a query, CPU usage goes to 100% for 2 or 3 seconds and then drops back under 5%. The problem I can see in htop is InfluxDB's memory usage at 97-98%.

How many unique series are you writing to the database?

How do I get the number of series? I'm new to InfluxDB.
I don't have access to the machine now; I'll check it tomorrow at the company. I'm sorry.

No need to be sorry. :)

The query show series will give you all of the series that are stored in a database.

@mjdesa I used show series to find the number of series in my db, but it displays a list of all the series, so I used Python to count them like this:

import requests
r = requests.get("http://192.168.1.104:8086/query",
                 params={"db": "mydb", "q": "SHOW SERIES FROM srcipm"}).json()
print(len(r['results'][0]['series'][0]['values']))

and I get 153K series. Is that too many series for 10M points?

@tahmasebi No, you should be well within your limits. We've had problems with 0.9.6 consuming too much memory in the past. Can you run the test against 0.10.0beta2 and see if Influx behaves more stably?

https://influxdb.s3.amazonaws.com/influxdb_0.10.0-0.beta2_amd64.deb
https://influxdb.s3.amazonaws.com/influxdb-0.10.0-0.beta2.x86_64.rpm

Sure, I'll test this version of InfluxDB and let you know tomorrow. (It's the time difference :) )
Thank you.

Hi @mjdesa ,
I removed InfluxDB from Ubuntu with apt-get remove influxdb and then installed the package you linked above.

The database from the previous version still exists, but when I execute a query on the old db, it returns an error: ERR: read message type: read tcp 127.0.0.1:8088: i/o timeout.
So I created a new db and wrote 10M points to it. The write process was much slower than before (6k/s), and this time memory usage went up to 74%.

I took a screenshot of htop and iotop for more details.

[screenshot: htop and iotop output]

Try reducing cache-snapshot-memory-size and cache-snapshot-write-cold-duration in the configuration.
I set mine to cache-snapshot-memory-size = 2621440 and cache-snapshot-write-cold-duration = "1m" during massive writes.
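For context, both settings belong under the [data] section of influxdb.conf (assuming the TSM engine, as in 0.10 and later); a sketch of the relevant fragment with the values quoted above:

[data]
  cache-snapshot-memory-size = 2621440
  cache-snapshot-write-cold-duration = "1m"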

FYI, I resolved ERR: read message type: read tcp 127.0.0.1:8088: i/o timeout by increasing shard-mapper-timeout under the [cluster] section.

@tahmasebi Are you still experiencing this problem?

I am brand new to InfluxDB and ran into this problem.

I started by inserting 6928 price points for one stock. An example data point:

INSERT price,ric=VOD.L,open=12.3,close=12.3,volume=1200 dummy=1 12345644687

Then I ran a query like this:

SELECT ric, close, volume FROM price WHERE ric = 'VOD.L' LIMIT 1;

This killed my laptop by consuming all 16 GB of RAM.

I'm building a similar stock system to @adilbaig's, and hitting exactly the same problem with InfluxDB 0.13.0.

When I tried to insert about 4k trading points all at once using batch insert via the HTTP API, InfluxDB took all of my 8 GB of memory plus all the swap space, which made the whole system painfully slow.

Is there a way to limit the total memory usage?

My OS is Ubuntu 16.04 LTS

@adilbaig In the example point you listed

price,ric=VOD.L,open=12.3,close=12.3,volume=1200 dummy=1 12345644687

ric, open, close, and volume are all tags, meaning that both key and value will be treated as strings. Additionally, they will be indexed, and will presumably have absurdly high cardinality. This is most likely why you're seeing so much memory used. Try changing open, close, and volume to fields instead.
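For illustration, a hedged sketch of the same point rewritten so that only ric stays a tag and the numbers become fields (the endpoint and database name are placeholders; values are copied from the example above):

import requests

# ric stays a tag (low cardinality); open, close and volume become fields,
# so their values are no longer indexed and series cardinality stays bounded.
line = "price,ric=VOD.L open=12.3,close=12.3,volume=1200,dummy=1 12345644687"
requests.post("http://localhost:8086/write?db=stocks", data=line)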

@dy1901 What does your schema look like?

What is the current status of this question? If someone knows, please tell me, thank you.

@DavidSoong128 what specifically would you like to know?

@mjdesa Thank you for your reply.
I installed InfluxDB v0.13.0 on a Linux server with 16 GB of memory and used Java requests to write some points into the db, but free memory keeps shrinking; InfluxDB consumed up to 11 GB, and then the only thing I could do was restart it. I think it is the same problem as @dy1901's.

@DavidSoong128
I have a few more questions:

  • Can you give some examples of the data that you're writing in line protocol?
  • About how long was InfluxDB running before you hit this problem?
  • Did you have any queries running when you hit this problem? If so, what were they?
  1. First question:

     A point looks like this: tps is the measurement; appName and serviceName are tags; tpsAmount and currency are fields.
     Example in line protocol: tps,appName=order,serviceName=addOrder tpsAmount=2000,currency=300 1466557284000

  2. Second question:

     Free memory shrinks as soon as I insert data into the db; the more data, the more memory consumed.

  3. Third question:

     There were no queries running; I was just inserting some data as a test.
    
@Test(enabled = true)
public void maxWritePointsPerformance() {
    String dbName = "d";
    this.influxDB.createDatabase(dbName);
    // Buffer writes: flush after 100k points or every 60 seconds
    this.influxDB.enableBatch(100000, 60, TimeUnit.SECONDS);

    Stopwatch watch = Stopwatch.createStarted();
    for (int i = 0; i < 2000000; i++) {
        Point point = Point.measurement("s").addField("v", 1.0).build();
        this.influxDB.write(dbName, "default", point);
    }
    System.out.println("2M points: " + watch);
    this.influxDB.deleteDatabase(dbName);
}

This test code is copied from https://github.com/influxdata/influxdb-java and causes the same problem.

I'm having a similar problem: memory usage grows until it consumes the entire machine and never really drops back. There are three main streams of incoming data, written with the InfluxDB Go client to different databases:

  • 50 points/sec in batches of 1000
  • Telegraf data at 10s intervals for system metrics of a few hosts
  • 84 points/sec in batches of 10000

Occasionally, a Grafana client will issue 5-20 queries to draw charts, but it isn't a constant request rate. We've recently started consuming considerable swap space. The node has 128 GB of RAM (usually 80-98% used) and has been up for 30 days.

I'm having exactly the same issues as everyone here. During inserts my memory usage jumps over 8 GB and then Influx throws an error about a memory allocation failure (my VM is limited to ~8 GB RAM). I've tried the settings proposed by @lpc921 in his post here:

  cache-snapshot-memory-size = 262144
  cache-snapshot-write-cold-duration = "0h1m0s"
  compact-full-write-cold-duration = "0h1m0s"

It didn't change a thing.
I am using your official Docker image: influxdb:0.13-alpine

Guys, seriously. This issue has been open for half a year. For me this is a critical issue which rules out using InfluxDB in production environments.

EDIT:
I've prepared docker containers which demonstrate this bug in influxdb 0.13. You can find them here: https://github.com/carbolymer/influxdb-large-memory-proof
EDIT2:
The same happens for the 1.0.0-beta2-alpine image.
EDIT3:
Many thanks to @jwilder for the advice! The latest commit on master at https://github.com/carbolymer/influxdb-large-memory-proof contains a working solution.

@carbolymer It looks like you have sparse data (stock prices) which ends up creating hundreds of small shards. In your docker sample, I'd recommend increasing the shard group duration on your default retention policy after creating the databases.

For example, running the following _before_ writing data:

ALTER RETENTION POLICY "default" ON no_memory SHARD DURATION 520w

will change the shard group duration to 10 years, which should reduce the number of shards from ~1500 to 4.

I would also suggest setting cache-snapshot-write-cold-duration = "10s". You should not need to change compact-full-write-cold-duration or cache-snapshot-memory-size from the default values though.
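A minimal sketch of applying that advice over the HTTP API before any data is written (assumes a local InfluxDB 1.0+ instance and the no_memory database from the docker sample; older versions may need GET instead of POST for these statements):

import requests

# Create the database, then widen the shard group duration on its
# default retention policy before writing sparse data.
url = "http://localhost:8086/query"
requests.post(url, params={"q": "CREATE DATABASE no_memory"})
requests.post(url, params={
    "q": 'ALTER RETENTION POLICY "default" ON no_memory SHARD DURATION 520w'})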

@carbolymer My server has 48 GB of RAM and 24 high-end CPUs, but it's still not enough for InfluxDB (with a 30 GB RAM limit) holding just several tens of thousands of daily series (several GB of data). Some queries end in a timeout; others end because Influx hits the memory limit and crashes. I'm beginning to deeply regret that choice... I have no idea what to do now.

@adampl What kind of data are you writing, and what is writing it? I suspect you have an issue with your schema, but grabbing some profiles when memory is high would help to diagnose:

curl -o heap.txt "localhost:8086/debug/pprof/heap?debug=2"
curl -o goroutine.txt "localhost:8086/debug/pprof/goroutine?debug=2"
curl -o block.txt "localhost:8086/debug/pprof/block?debug=2"

Also, can you attach the output of the following:

influx -execute "show shards" > shards.txt
influx -execute "show stats" > stats.txt
influx -execute "show diagnostics" > diagnostics.txt

@jwilder In my case the data is very sparse - just one point a day in each series - so I've dropped the entire database in order to try the shard duration trick, and the data is now being loaded again. If the timeouts and crashes don't disappear, I'll send you the diagnostics.

@jwilder Increasing the shard duration indeed helped (set to 1000w) - now I don't get OOMs, as it takes "only" 5 GB and doesn't grow.

Still, queries covering all of the measurement's data take a long time to complete (15 seconds) - much longer than simply reading all of the measurement's rows from a text file, filtering them by tags, and aggregating in Python in a single process (2 seconds).
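For reference, a rough sketch of the kind of single-process Python baseline being compared against here (the file name, tag name, and field name are hypothetical):

import csv

# Hypothetical flat-file baseline: filter rows by one tag value and sum a field.
total = 0.0
with open("measurement_dump.csv") as f:
    for row in csv.DictReader(f):
        if row["some_tag"] == "some_value":   # filter by tag
            total += float(row["value"])      # aggregate a field
print(total)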

@adampl What version are you running?

Version 0.13 on CentOS 7

@adampl I'd suggest upgrading to the 1.0beta3 release or the latest nightly. There have been many query optimizations since 0.13.

OK, I'll give it a try. Meanwhile, please look into #6994, which is a very serious functional bug IMHO.

@jwilder Many thanks! It helped.

@carbolymer Have you solved the problem? Please share your solution, thank you.

@DavidSoong128, yes. This worked for me: https://github.com/influxdata/influxdb/issues/5440#issuecomment-233230997

You can find a working configuration in the latest commit on master at https://github.com/carbolymer/influxdb-large-memory-proof

@carbolymer OK, thank you for your reply. I will run some tests. If I have any questions, I'll ask again.
