Influxdb: [0.13] Higher memory usage and system load after upgrade from 0.12.2

Created on 12 May 2016 · 15 comments · Source: influxdata/influxdb

I updated from InfluxDB 0.12.2 to 0.13 and memory usage and load have increased substantially...

[graphs: system load and memory usage after the upgrade]

Nothing else seems out of the ordinary, no errors in logs etc. What information can I get you guys to see what is going on with 0.13?

All 15 comments

Can you grab a heap dump and profile?

curl -o heap.txt http://localhost:8086/debug/pprof/heap?debug=1
curl -o goroutine.txt http://localhost:8086/debug/pprof/goroutine?debug=1

Here you go:

goroutine.txt
heap.txt

Also I read back through some logs a little more thoroughly and noticed a bunch of lines like the following when the system starts up:

[store] 2016/05/12 21:25:14 Failed to open shard: 9064: [shard 9064] error opening memory map for file /var/lib/influxdb/data/mydb/rp-30d/9064/000000134-000000007.tsm: init: read tombstones: EOF

@cnelissen What is the size of /var/lib/influxdb/data/mydb/rp-30d/9064/000000134-000000007.tsm.tombstone?

# ls -la /var/lib/influxdb/data/mydb/rp-30d/9064/000000134-000000007*
-rw-------.  1 influxdb influxdb         0 May 10 21:37 000000134-000000007.tombstone
-rw-r--r--.  1 influxdb influxdb 323772580 May  9 21:20 000000134-000000007.tsm

@cnelissen You can delete those 0 length tombstone files. There was a bug fixed to prevent writing those.
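The cleanup suggested above can be sketched with `find`, matching only zero-length `.tombstone` files so valid tombstones are untouched. This is a minimal demo that runs in a temporary directory; in practice you would point `DATA_DIR` at the real data directory (e.g. `/var/lib/influxdb/data`) with influxd stopped first:

```shell
# Demo in a temp dir; in practice set DATA_DIR to the InfluxDB data path
# and stop influxd before deleting anything.
DATA_DIR="$(mktemp -d)"
touch "$DATA_DIR/000000134-000000007.tombstone"         # zero-length: safe to delete
printf 'x' > "$DATA_DIR/000000135-000000001.tombstone"  # non-empty: keep

# Dry run: list zero-length tombstone files first.
find "$DATA_DIR" -name '*.tombstone' -size 0 -print

# Then delete only the zero-length ones.
find "$DATA_DIR" -name '*.tombstone' -size 0 -delete
```

`-size 0` matches only empty files, so non-empty tombstones (which still carry deletion records) are preserved.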

Are you writing to the same series frequently or overwriting points by chance?

Writes come in approximately every 60 seconds, roughly 50,000 points per interval, plus ~15,000 points from Kapacitor for aggregations, so not a very high write load. Points are not being overwritten.

I downgraded back to 0.12.2 and restarted, no other changes, and memory and CPU are back down to normal levels.

[graphs: system load and memory usage back to normal after the downgrade]

+1
We're having the same issues, especially with simple SELECT queries.

For example, this simple query:

> select * from senders where time > now() - 1d limit 10
name: senders
-------------
time                    customer        from_domain_uniq        from_uniq       hostname        mail_blocked    rcpt_domain_uniq        rcpt_uniq       total   user
2016-05-15T10:20:00Z    Debian-exim     1                       1               angara1         0               1                       1               14      Debian-exim
2016-05-15T10:20:00Z    Debian-exim     1                       1               angara6         0               1                       1               1       Debian-exim
2016-05-15T10:20:00Z    Debian-exim     1                       1               argo            0               1                       6               17      Debian-exim
2016-05-15T10:20:00Z    Debian-exim     1                       1               blake           0               2                       3               9       Debian-exim
2016-05-15T10:20:00Z    Debian-exim     1                       1               klipper         0               1                       6               83      Debian-exim
2016-05-15T10:20:00Z    dmitriqu        1                       1               doom2           0               1                       1               10      dmitriqu
2016-05-15T10:20:00Z    syavorl1        1                       1               bane            0               1                       1               10      syavorl1
2016-05-15T10:20:00Z    sxmedi93        1                       1               rembo           0               1                       1               2       sxmedi93
2016-05-15T10:20:00Z    dmitril0        1                       1               doom1           0               1                       1               20      dmitril0
2016-05-15T10:20:00Z    avidambi        1                       1               robin           0               1                       1               1       avidambi__avida_ru__3i

> 

increases the InfluxDB server's memory usage from 2 GB up to 13 GB.

This InfluxDB server was built manually with Go 1.4.3:

> show diagnostics
name: build
-----------
Branch  Build Time      Commit                                          Version
0.13                    e57fb88a051ee40fd9277094345fbd47bb4783ce        0.13.0


name: network
-------------
hostname
logstorage.beget.ru


name: runtime
-------------
GOARCH  GOMAXPROCS      GOOS    version
amd64   12              linux   go1.4.3


name: system
------------
PID     currentTime                     started                         uptime
120     2016-05-16T10:12:37.831934558Z  2016-05-15T22:03:01.435989947Z  12h9m36.395944872s

> 

But when I use the stock influxd from the official deb package, I have the same problems: the same queries start using a lot of memory, sometimes leading to OOMs.

Similar problem here.

RHEL 7.2, 64GB Memory.
~1400 Telegraf clients with sysstat plugin configured.

With version 0.12.2 we see about 20% mem_commit.

After upgrading to 0.13, I killed InfluxDB once mem_commit reached 10,000% and was still climbing, then downgraded to 0.12.2.

Hi. In our case, on the other hand, memory consumption dropped significantly after the upgrade, from tens of gigabytes down to single-digit GBs. But now most of the data is inaccessible, probably because of errors like:

[store] ... Failed to open shard ... error opening memory map for file ...

UPDATE:
Removing the empty tombstones brought the missing data back, but then memory consumption skyrocketed back to tens of GBs, which eventually leads to a crash and an InfluxDB restart. I put a 30 GB limit on the service and it is always reached, even though there is hardly a GB of data in the database!
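For reference, a service-level memory cap like the one mentioned above can be set with a systemd drop-in. The file path and `influxdb.service` unit name below are assumptions for a stock package install; adjust to your environment:

```ini
# Hypothetical drop-in: /etc/systemd/system/influxdb.service.d/memory.conf
[Service]
MemoryLimit=30G
```

After adding the file, run `systemctl daemon-reload` and restart the service. Note that a hard cap makes the kernel OOM-kill the process at the limit rather than preventing the growth itself.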

Also experiencing this issue: after removing all empty tombstones, memory usage is through the roof. Is there any update?

I experienced a similar growth in CPU usage after upgrading to 1.0.0, but it went down again by ~20% after upgrading to 1.0.2. Now it's around the same level as before the 1.0.0 upgrade.

This should be improved with 1.1.0.

To whom it may concern:

I've done an upgrade to the latest 1.1 version and I can't see any CPU usage difference between 0.11 and 1.1.

So I can recommend installing the new version to see if it works fine in your environment.
