InfluxDB: Renaming corrupt data files fails

Created on 11 Jun 2019 · 9 Comments · Source: influxdata/influxdb

Summary:

When the database engine detects a corrupt TSM data file during startup, renaming that file fails because the engine itself still holds a lock on the file.

Environment:

OS: Windows 7 x86
InfluxDB: 1.7.3

Expected behaviour:

The renaming of the corrupt data file succeeds. No file access violation error occurs.

Actual behaviour:

At startup, the database engine reads every data file and checks its integrity. When it detects a corrupt file, it tries to rename that file while the file stream used to read it is still open. This results in a file access violation error ("The process cannot access the file because it is being used by another process").

Steps to reproduce:

  1. Stop a running InfluxDB instance
  2. Manually corrupt a data (TSM) file on disk by modifying its content (e.g. delete the content and add a random string)
  3. Start InfluxDB again and monitor the startup log entries

Suggested fix:

The file stream for reading the data file needs to be closed prior to attempting to rename it.

Related log entries:

influxdb.log

1.x

All 9 comments

This may already be addressed by InfluxDB 1.7.7.

@it-vit please update to the latest influxdb and reopen if it is still an issue.

The issue still exists in InfluxDB 1.7.7, which I built from source with the latest Go release (1.12.7) on Windows x86.

Related log lines (for the full log output, see the attached file influxdb.log):
[2019-08-19 21:25:45,273] [2019-08-19 23:25:45.273] INFO Influx (7) - {"lvl":"info","ts":"2019-08-19T21:25:45.263715Z","msg":"InfluxDB starting","log_id":"0HMxBhgl000","version":"1.7.7","branch":"HEAD","commit":"f8fdf652f"}
[2019-08-19 21:25:45,274] [2019-08-19 23:25:45.274] INFO Influx (7) - {"lvl":"info","ts":"2019-08-19T21:25:45.263715Z","msg":"Go runtime","log_id":"0HMxBhgl000","version":"go1.12.7","maxprocs":3}

…..

[2019-08-19 21:25:45,445] [2019-08-19 23:25:45.445] ERROR Influx (7) - {"lvl":"error","ts":"2019-08-19T21:25:45.441733Z","msg":"Cannot read corrupt tsm file, renaming","log_id":"0HMxBhgl000","engine":"tsm1","service":"filestore","path":"c:\RuntimeDataDir\PCU\db\influxdb\data\DataHistory\short\0\000000008-000000002.tsm","id":0,"error":"indirectIndex: not enough data for max time"}
[2019-08-19 21:25:45,446] [2019-08-19 23:25:45.446] ERROR Influx (7) - {"lvl":"error","ts":"2019-08-19T21:25:45.441733Z","msg":"Cannot rename corrupt tsm file","log_id":"0HMxBhgl000","engine":"tsm1","service":"filestore","path":"c:\RuntimeDataDir\PCU\db\influxdb\data\DataHistory\short\0\000000008-000000002.tsm","id":0,"error":"rename c:\RuntimeDataDir\PCU\db\influxdb\data\DataHistory\short\0\000000008-000000002.tsm c:\RuntimeDataDir\PCU\db\influxdb\data\DataHistory\short\0\000000008-000000002.tsm.bad: The process cannot access the file because it is being used by another process."}
[2019-08-19 21:25:45,447] [2019-08-19 23:25:45.447] INFO Influx (7) - {"lvl":"info","ts":"2019-08-19T21:25:45.442733Z","msg":"Failed to open shard","log_id":"0HMxBhgl000","service":"store","trace_id":"0HMxBi80000","op_name":"tsdb_open","db_shard_id":0,"error":"[shard 0] cannot rename corrupt file c:\RuntimeDataDir\PCU\db\influxdb\data\DataHistory\short\0\000000008-000000002.tsm: rename c:\RuntimeDataDir\PCU\db\influxdb\data\DataHistory\short\0\000000008-000000002.tsm c:\RuntimeDataDir\PCU\db\influxdb\data\DataHistory\short\0\000000008-000000002.tsm.bad: The process cannot access the file because it is being used by another process."}

…...

[2019-08-19 21:25:45,460] [2019-08-19 23:25:45.460] INFO Influx (7) - {"lvl":"info","ts":"2019-08-19T21:25:45.443733Z","msg":"Listening for signals","log_id":"0HMxBhgl000"}

Thanks for looking into this!

influxdb.log

Please reopen that issue

Please reopen this issue. Patch as follows:

````patch
--- a/tsdb/engine/tsm1/file_store.go
+++ b/tsdb/engine/tsm1/file_store.go
@@ -542,6 +542,8 @@ func (f *FileStore) Open() error {
 			// the file, and continue loading the shard without it.
 			if err != nil {
 				f.logger.Error("Cannot read corrupt tsm file, renaming", zap.String("path", file.Name()), zap.Int("id", idx), zap.Error(err))
+				file.Close()
+
 				if e := os.Rename(file.Name(), file.Name()+"."+BadTSMFileExtension); e != nil {
 					f.logger.Error("Cannot rename corrupt tsm file", zap.String("path", file.Name()), zap.Int("id", idx), zap.Error(e))
 					readerC <- &res{r: df, err: fmt.Errorf("cannot rename corrupt file %s: %v", file.Name(), e)}
````

@runner-mei: Thanks for the patch!

@russorat: Please reopen. It would be great if this fix were included in the next release. Thanks!

Thanks!

@Elbehery Can you fix #15527?
