InfluxDB: failed to store statistics: timeout (1.2.0)

Created on 21 Feb 2017 · 21 comments · Source: influxdata/influxdb

Hi guys,
I've seen several tickets reporting this error, but none with a solution. Because it still happens on InfluxDB 1.2.0, I'm opening another one.

__System info:__

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
$:/var/lib/influxdb/data$ uname -a
Linux influx 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 9 10:09:10 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

$ influxd version
InfluxDB v1.2.0 (git: master b7bb7e8359642b6e071735b50ae41f5eb343fd42)

32 GB RAM
50 GB SSD for /var/lib/influxdb
4 cores

__Steps to reproduce:__

  1. systemctl start influxd

  2. Wait a few minutes
__Actual behavior:__
2017-02-21T16:37:50Z failed to store statistics: timeout service=monitor

Queries and writes work correctly, although some writes time out.

Thank you,
Claudiu

Labels: 1.x, wontfix


If you need any more information, feel free to ask.

We need more information in order to diagnose what is going on. Can you update the issue description with the instructions listed in our issue template? Profile data when the timeouts occur would be useful.

OK, I have just installed InfluxDB on a new VM; here is the info:

__System info:__

$ uname -a
Linux influx 3.10.0-327.10.1.el7.x86_64 #1 SMP Sat Jan 23 04:54:55 EST 2016 x86_64 x86_64 x86_64 GNU/Linu

$ cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.2 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.2"
PRETTY_NAME="Red Hat Enterprise Linux"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.2:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.2
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.2"
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Red Hat Enterprise Linux Server release 7.2 (Maipo)

$ influxd version
InfluxDB v1.2.0 (git: master b7bb7e8359642b6e071735b50ae41f5eb343fd42)

$ df -h  # this is an SSD disk
/dev/mapper/vg_influx-influx_data   49G   63M   46G   1% /var/lib/influxdb

__Steps to reproduce:__

  1. wget https://dl.influxdata.com/influxdb/releases/influxdb-1.2.0.x86_64.rpm
  2. sudo yum localinstall influxdb-1.2.0.x86_64.rpm
  3. sudo systemctl start influxdb

__Expected behavior:__
service=monitor shouldn't time out.

__Actual behavior:__

 2017-02-21T19:58:48Z retention policy shard deletion check commencing service=retention
feb 21 21:03:10 influx[1078]: [I] 2017-02-21T20:03:10Z failed to store statistics: timeout service=monitor
feb 21 21:03:20 influx[1078]: [I] 2017-02-21T20:03:20Z failed to store statistics: timeout service=monitor

__Additional info:__
There is no custom configuration: just install and run. I'm not writing to or reading from InfluxDB at all.
logs: https://gist.github.com/claubrz/02b973e8d4c6ab198d5689a09ff8943d
block: https://gist.github.com/claubrz/1411ca3c371f6e24cb2a64d6cf05a691
goroutine: https://gist.github.com/claubrz/a737d399596c161ef6356bbe188cc538
heap: https://gist.github.com/claubrz/338153a34a9288336c3c9484ce8607e4
vars: https://gist.github.com/claubrz/5a5f0420c3d9e4bdc442a1bb7fa1b283
iostat: https://gist.github.com/claubrz/9668562d7fdc1a277d159c4d9962599c
shards:
name: _internal
id database  retention_policy shard_group start_time           end_time             expiry_time          owners
-- --------  ---------------- ----------- ----------           --------             -----------          ------
1  _internal monitor          1           2017-02-21T00:00:00Z 2017-02-22T00:00:00Z 2017-03-01T00:00:00Z
2  _internal monitor          2           2017-02-22T00:00:00Z 2017-02-23T00:00:00Z 2017-03-02T00:00:00Z

stats: https://gist.github.com/claubrz/6cb969b66a32955614a68ab1091fce62
diagnostics: https://gist.github.com/claubrz/eded534953d006754dd989c4052a76c0

I hope this helps.

Regards,
Claudiu

+1 keep getting this as well

+1 same for 1.2.2

I'm also getting this error occasionally, and after a few days no more data is written to InfluxDB. If I restart InfluxDB, it works again for a few days and then stops receiving data until I restart it again.
Edit: I'm using InfluxDB 1.3.0. I downgraded to 1.2.4 to see if it still happens and will report back in a few days.

+1 same for 1.2.4

I've also been seeing this since 1.3.1, I think:

Aug 21 18:07:00 ubik influxd[2669]: [I] 2017-08-21T15:07:00Z failed to store statistics: timeout service=monitor
Aug 21 18:07:10 ubik influxd[2669]: [I] 2017-08-21T15:07:10Z failed to store statistics: timeout service=monitor
Aug 21 18:07:20 ubik influxd[2669]: [I] 2017-08-21T15:07:20Z failed to store statistics: timeout service=monitor
Aug 21 18:07:30 ubik influxd[2669]: [I] 2017-08-21T15:07:30Z failed to store statistics: timeout service=monitor
Aug 21 18:07:40 ubik influxd[2669]: [I] 2017-08-21T15:07:40Z failed to store statistics: timeout service=monitor
Aug 21 18:07:50 ubik influxd[2669]: [I] 2017-08-21T15:07:50Z failed to store statistics: timeout service=monitor
Aug 21 18:08:00 ubik influxd[2669]: [I] 2017-08-21T15:08:00Z failed to store statistics: timeout service=monitor
Aug 21 18:08:10 ubik influxd[2669]: [I] 2017-08-21T15:08:10Z failed to store statistics: timeout service=monitor
Aug 21 18:08:20 ubik influxd[2669]: [I] 2017-08-21T15:08:20Z failed to store statistics: timeout service=monitor
Aug 21 18:08:30 ubik influxd[2669]: [I] 2017-08-21T15:08:30Z failed to store statistics: timeout service=monitor
Aug 21 18:08:40 ubik influxd[2669]: [I] 2017-08-21T15:08:40Z failed to store statistics: timeout service=monitor
Aug 21 18:08:50 ubik influxd[2669]: [I] 2017-08-21T15:08:50Z failed to store statistics: timeout service=monitor
Aug 21 18:09:00 ubik influxd[2669]: [I] 2017-08-21T15:09:00Z failed to store statistics: timeout service=monitor
Aug 21 18:09:10 ubik influxd[2669]: [I] 2017-08-21T15:09:10Z failed to store statistics: timeout service=monitor
Aug 21 18:09:20 ubik influxd[2669]: [I] 2017-08-21T15:09:20Z failed to store statistics: timeout service=monitor
Aug 21 18:09:30 ubik influxd[2669]: [I] 2017-08-21T15:09:30Z failed to store statistics: timeout service=monitor
Aug 21 18:09:50 ubik influxd[2669]: [I] 2017-08-21T15:09:50Z failed to store statistics: timeout service=monitor
Aug 21 18:10:00 ubik influxd[2669]: [I] 2017-08-21T15:10:00Z failed to store statistics: timeout service=monitor
Aug 21 18:10:10 ubik influxd[2669]: [I] 2017-08-21T15:10:10Z failed to store statistics: timeout service=monitor

Disabling monitor.store-enabled makes the issue go away.
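For reference, that change is a one-line edit in the `[monitor]` section of `influxdb.conf`. This is a minimal sketch, assuming the stock config layout and the option names visible in the `SHOW DIAGNOSTICS` output later in this thread (`store-enabled`, `store-database`, `store-interval`); `influxd` reads its config at startup, so it needs a restart for the change to apply.

```toml
[monitor]
  # Stop influxd from periodically writing its own runtime statistics
  # into store-database (_internal by default) every store-interval (10s by default).
  # SHOW STATS / SHOW DIAGNOSTICS still report in-memory values; only the writes stop.
  store-enabled = false
```

In the Docker image the same setting can be passed as the environment variable `INFLUXDB_MONITOR_STORE_ENABLED=false`, following the same `INFLUXDB_<SECTION>_<KEY>` pattern as the `INFLUXDB_GRAPHITE_ENABLED` variable shown later in the thread.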

Same here. After a few hours of operation, every write times out. No workaround seems to help: restarting fixes it temporarily, but then it gets stuck again.

> use some_database
> insert some_metric,whatever=foo,uuid=d1bffb9339bf value=20.5 1503395025755619588
ERR: {"error":"timeout"}
  • A single host
  • Using tsm1
  • InfluxDB version: 1.3.0
  • Linux 4.12.4-1-ARCH
  • ~ 20 writes / minute, 1-3 connections
  • Memory usage seems really high for the use case, although it doesn't hit OOM.
  • CPU usage is flat
  • Example load average: 0.73, 1.45, 0.90
             total       used       free     shared    buffers     cached
Mem:          7.7G       6.6G       1.0G       2.1M       1.6G       3.0G
-/+ buffers/cache:       2.0G       5.6G
Swap:         5.8G         0B       5.8G

SHOW STATS

> show diagnostics
name: build
Branch Build Time Commit                                   Version
------ ---------- ------                                   -------
master            76124df5c121e411e99807b9473a03eb785cd43b 1.3.0

name: config
bind-address   reporting-disabled
------------   ------------------
127.0.0.1:8088 false

name: config-coordinator
log-queries-after max-concurrent-queries max-select-buckets max-select-point max-select-series query-timeout write-timeout
----------------- ---------------------- ------------------ ---------------- ----------------- ------------- -------------
0s                0                      0                  0                0                 0s            10s

name: config-cqs
enabled run-interval
------- ------------
true    1s

name: config-data
cache-max-memory-size cache-snapshot-memory-size cache-snapshot-write-cold-duration compact-full-write-cold-duration dir                    max-concurrent-compactions max-series-per-database max-values-per-tag wal-dir               wal-fsync-delay
--------------------- -------------------------- ---------------------------------- -------------------------------- ---                    -------------------------- ----------------------- ------------------ -------               ---------------
1073741824            26214400                   10m0s                              4h0m0s                           /var/lib/influxdb/data 0                          1000000                 100000             /var/lib/influxdb/wal 0s

name: config-graphite
enabled bind-address protocol database retention-policy batch-size batch-pending batch-timeout
------- ------------ -------- -------- ---------------- ---------- ------------- -------------
true    :2003        tcp      graphite                  5000       10            1s

name: config-httpd
bind-address enabled https-enabled max-connection-limit max-row-limit
------------ ------- ------------- -------------------- -------------
:8086        true    false         0                    0

name: config-meta
dir
---
/var/lib/influxdb/meta

name: config-monitor
store-database store-enabled store-interval
-------------- ------------- --------------
_internal      true          10s

name: config-precreator
advance-period check-interval enabled
-------------- -------------- -------
30m0s          10m0s          true

name: config-retention
check-interval enabled
-------------- -------
30m0s          true

name: config-subscriber
enabled http-timeout write-buffer-size write-concurrency
------- ------------ ----------------- -----------------
true    30s          1000              40

name: graphite:tcp::2003
local remote connect time
----- ------ ------------

name: network
hostname
--------
87e69fc37624

name: runtime
GOARCH GOMAXPROCS GOOS  version
------ ---------- ----  -------
amd64  4          linux go1.8.3

name: system
PID currentTime                    started                        uptime
--- -----------                    -------                        ------
1   2017-08-22T21:51:21.200698369Z 2017-08-22T11:46:38.093657177Z 10h4m43.107041749s

# cat /etc/influxdb/influxdb.conf
[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  engine = "tsm1"
  wal-dir = "/var/lib/influxdb/wal"

# env | grep INFLUX
INFLUXDB_GRAPHITE_ENABLED=true
INFLUXDB_ADMIN_ENABLED=true
INFLUXDB_VERSION=1.3.0

I get this with InfluxDB 1.3.7 as well.
November 15th 2017, 22:32:18.000 ERROR - failed to store statistics: timeout service=monitor - influx-log
November 15th 2017, 22:06:12.000 ERROR - failed to store statistics: timeout service=monitor - influx-log
November 15th 2017, 21:44:15.000 ERROR - failed to store statistics: timeout service=monitor - influx-log
November 15th 2017, 21:24:17.000 ERROR - failed to store statistics: timeout service=monitor - influx-log
November 15th 2017, 20:56:17.000 ERROR - failed to store statistics: timeout service=monitor - influx-log
November 15th 2017, 18:35:17.000 ERROR - failed to store statistics: timeout service=monitor - influx-log
November 15th 2017, 17:23:13.000 ERROR - failed to store statistics: timeout service=monitor - influx-log

I also get this sometimes (every 2-3 days), at midnight:

[I] 2017-12-06T00:00:10Z failed to store statistics: timeout service=monitor 
[httpd] 172.17.0.1 - root [06/Dec/2017:00:00:57 +0000] "POST /write?db=smarthome&p=%5BREDACTED%5D&precision=n&rp=&u=root HTTP/1.1" 500 20 "-" "-" 893c8ee6-da18-11e7-b64b-000000000000 10014690
[E] 2017-12-06T00:01:07Z [500] - "timeout" service=httpd | stderr
...

I am also getting this, but very infrequently (and with no pattern).
The most recent occurrences, on April 26th, were with InfluxDB version 1.5.2-1:

Mar 31 10:51:33 hostname influxd[1271]: ts=2018-03-31T09:51:33.073975Z lvl=info msg="failed to store statistics" log_id=077M7CBl000 service=monitor error=timeout
Mar 31 20:51:40 hostname influxd[1271]: ts=2018-03-31T19:51:40.213325Z lvl=info msg="failed to store statistics" log_id=077M7CBl000 service=monitor error=timeout
Apr 07 07:02:40 hostname influxd[3197]: ts=2018-04-07T06:02:40.101796Z lvl=info msg="failed to store statistics" log_id=07Egike0000 service=monitor error=timeout
Apr 26 14:42:10 hostname influxd[1258]: ts=2018-04-26T13:42:10.467667Z lvl=info msg="failed to store statistics" log_id=07hJ4RHG000 service=monitor error=timeout
Apr 26 17:30:10 hostname influxd[1258]: ts=2018-04-26T16:30:10.575370Z lvl=info msg="failed to store statistics" log_id=07hJ4RHG000 service=monitor error=timeout

+1.
I happened to get this on version 1.6.0.
One thing I noticed: another process was writing to disk (PostgreSQL writes) when InfluxDB started throwing these errors. I'm not sure whether that is related, but I'm mentioning it in case someone else sees a similar correlation.

I have the same problem. I'm running the latest InfluxDB, v1.7.2, on a Raspberry Pi 3.
=> The problem started after I updated to the latest version in December; my previous update was around July (maybe a bit earlier).
=> I also had a power outage in November, but the system worked OK afterwards.

When the issues started, I noticed that some of the files under /var/lib/influxdb/ were suddenly owned by root, so InfluxDB could not write to them. As a consequence, InfluxDB made the system unresponsive and used >90% CPU (it is usually at 0.3%).
=> I ran chown on the /var/lib/influxdb directory, but the system was unresponsive again within 1 hour.
=> I have now rebooted the system and hope the error is gone (hope dies last).

This is the last message before InfluxDB crashed:

Dez 26 07:00:22 OpenHabCat influxd[606]: ts=2018-12-26T06:00:20.087443Z lvl=error msg="[500] - \"timeout\"" log_id=0Caz0HYW000 service=httpd
Dez 26 07:00:32 OpenHabCat influxd[606]: ts=2018-12-26T06:00:30.151848Z lvl=info msg="failed to store statistics" log_id=0Caz0HYW000 service=monitor error=timeout

@cbarzu what is your setting for wal-fsync-delay in influxdb.conf?
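For context, `wal-fsync-delay` lives in the `[data]` section; one of the diagnostics dumps later in this thread shows it at its default of `0s`, which fsyncs the WAL on every write. The sketch below shows a non-default value, assuming slow or contended disks; `100ms` is purely illustrative, at the upper end of the 0-100ms range the InfluxDB 1.x configuration docs suggest for non-SSD disks.

```toml
[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
  # Wait this long before fsyncing the WAL so several writes share one fsync.
  # "0s" (the default) fsyncs on every write; 100ms is an illustrative value only.
  wal-fsync-delay = "100ms"
```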

+1
I have the same issue on version 1.7.5 (Docker, latest tag).

2019-04-04T03:24:00.958588Z error [500] - "timeout" {"log_id": "0Ea_2oP0000", "service": "httpd"}
[httpd] 172.17.0.1 - admin [04/Apr/2019:03:24:00 +0000] "POST /write?db=waf_log&rp=autogen&precision=n&consistency=one HTTP/1.1" 500 20 "-" "okhttp/3.11.0" 171a2d77-5689-11e9-83dd-0242ac110002 10000297
2019-04-04T03:24:10.961376Z error [500] - "timeout" {"log_id": "0Ea_2oP0000", "service": "httpd"}
[httpd] 172.17.0.1 - admin [04/Apr/2019:03:24:10 +0000] "POST /write?db=waf_log&rp=autogen&precision=n&consistency=one HTTP/1.1" 500 20 "-" "okhttp/3.11.0" 1d10694b-5689-11e9-83de-0242ac110002 10000663

@dracula92107 1.7.5 is broken; either use 1.7.4 or wait for 1.7.6 (see #13010).

Thanks @conet,
I'll try it.

The fix for #13010 is in the 1.7 branch if you are building from source, and our plan is to tag and build 1.7.6 next week.

It's also useful to review the best practices for monitoring InfluxDB itself: http://docs.influxdata.com/platform/monitoring/influxdata-platform

As noted earlier in the thread, turning off monitor.store-enabled in the config resolved some of the pre-1.7 cases where timeout errors were being thrown. Turning it off removes some resource contention, but it also removes your ability to gather stats within the database itself. Still, if you are working in a resource-constrained environment to begin with, turning it off will help.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.
