Influxdb: [Bug] "engine is closed" - reported by "response.Error()"

Created on 12 Jan 2018 · 8 Comments · Source: influxdata/influxdb

Bug report

__System info:__

Ubuntu 17: uname -a: "Linux cncftest.io 4.10.0-42-generic #46-Ubuntu SMP Mon Dec 4 14:38:01 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux"
Influx 1.3.6: influx -version: "InfluxDB shell version: 1.3.6"

__Steps to reproduce:__

Happens very rarely, under heavy load, with up to 48 threads (the machine has 48 cores).

Piece of code:

  q := client.Query{
    Command:  query,
    Database: ctx.IDBDB,
  }
  response, err := con.Query(q)
  FatalOnError(err)
  FatalOnError(response.Error()) // <--- this line fails with "engine is closed"
  return response.Results

__Expected behavior:__ No error

__Actual behavior:__ error

__Additional info:__ I was not able to find a way to reproduce it; it happens once every few days.

Labels: area/storage, revisit in the future

lukaszgryglicki

Most helpful comment

Since this is marked as "revisit" and "more info", this might help in reproducing the error or in other cases where it occurs:
I was getting a lot of "engine is closed" errors during a scripted online restore process. In my case this happened because I ran a SELECT query immediately after the influxd restore command finished. So I gave InfluxDB a second (by sleeping in my program) before running my SELECT statement, and this fixed the "engine is closed" error for me.
My script works as follows:

1. Restore a portable backup to a temporary database.
2. Select the data of the temporary database into the existing database.
3. Drop the temporary database.

The error would always occur at step 2. So I put a one-second sleep between steps 1 and 2, and the error was gone.

iwittkau on 19 Apr 2018

👍3

All 8 comments

It would be great to at least know what "engine is closed" means. What causes that error?

lukaszgryglicki on 12 Jan 2018

"Engine is closed" usually means the system is just starting up or is shutting down. Can you check your logs to see if anything like that shows up? Each shard in the system is considered its own "engine", and a large system can sometimes take a while to open all of its shards.

jsternberg on 29 Jan 2018

I'll check the next time it happens (it hasn't happened for 5+ days now).
I'm 100% sure nothing is shutting down or starting, unless influxd sometimes does that without my knowledge?
Where should I look?

lukaszgryglicki on 29 Jan 2018

I would say the log file will give the best insight. If the server crashed for whatever reason and was starting up again, that would be a good indicator. If "engine is closed" happens a bunch of times and then just stops, that would be another sign. Since you say it hasn't happened for 5 days, I'm guessing the server has been running stably for at least 5 days and that's why you haven't encountered it.

jsternberg on 29 Jan 2018

Yes, it is very rare and only under heavy load.

lukaszgryglicki on 29 Jan 2018

The engine is closed error occurs when writes or queries run against a shard that is not open/ready. If you are getting this during a query, it's likely that the planning step picked an old shard and before the query ran on the shard, the retention service closed and started removing the shard.

jwilder on 1 Feb 2018

I can reproduce these errors by concurrently removing independent continuous queries, measurements, and retention policies.

In my tests, I have multiple measurements, each with a separate retention policy (call each pair A). Each measurement has a continuous query that inserts into another measurement, which has its own retention policy (call each triple B). For each group, I remove A (measurement, then RP) and B (CQ, measurement, then RP) concurrently. I get errors similar to these:

Statement error: shard 14: engine is closed
Statement error: shard 16: engine is closed

The error is returned when deleting the measurement in group B.

farshidtz on 4 May 2018
