Etcd: rebooting an etcd member causes wal: failed to allocate space

Created on 17 Jun 2019 · 4 Comments · Source: etcd-io/etcd

One of our etcd nodes went down. After rebooting, it cannot start and reports:

etcd wal: failed to allocate space when creating a new WAL (structure needs cleaning)

At first we thought it was caused by a broken file system, so we changed data-dir to a location on a different partition, but the problem persists. So I suspect it is caused by something else.

This is the boot log:
boot_log

area/question

All 4 comments

@yuchengwu hi, did you check whether there is any issue with disk space? The error "structure needs cleaning" is possibly related to file corruption. This looks like something that needs a fix in your environment; see https://github.com/etcd-io/etcd/blob/master/wal/file_pipeline.go#L81. Thanks!

Hi @spzala, we checked disk space and only 1% was used. We also tried removing this member and adding it back, but the issue persisted, and as I mentioned before, we even changed data-dir to a different partition, so it's probably not caused by file corruption.

I agree that it's more likely related to our environment. The issue went away after the server was rebooted again. Unfortunately, we don't know why the server rebooted several times unexpectedly without human intervention.

This is more likely caused by a misconfigured environment than by an etcd issue, so I am closing this. Feel free to reopen it if you encounter a similar situation and find more useful clues.

@yuchengwu thanks for the details and closing the issue!
