Etcd: rebooting an etcd member causes wal: failed to allocate space

Created on 17 Jun 2019 · 4 Comments · Source: etcd-io/etcd

One of our etcd nodes went down. After rebooting, it cannot start and reports:

etcd wal: failed to allocate space when creating a new WAL (structure needs cleaning)

At first we thought it was caused by a broken file system, so we changed data-dir to a location on a different partition, but the problem persisted. So I suspect it is caused by something else.

This is the boot log:
boot_log

yuchengwu

All 4 comments

@yuchengwu hi, did you check whether there is any issue with disk space? The error "structure needs cleaning" is possibly related to file corruption. This seems to need a fix in your env: https://github.com/etcd-io/etcd/blob/master/wal/file_pipeline.go#L81 Thanks!

spzala on 27 Jun 2019

Hi @spzala, we checked the disk space and only 1% was used. We also tried removing this member and adding it back, but the issue still exists. And as I mentioned before, we even changed the data-dir to a different partition, so it's probably not caused by file corruption.

yuchengwu on 28 Jun 2019

I agree that it's more likely related to our env. The issue went away after the server was rebooted again. Unfortunately, we don't know why the server rebooted several times unexpectedly without human intervention.

This is more likely caused by a misconfigured env than by an etcd issue, so I am closing. Feel free to reopen it if you encounter a similar situation and find more valuable clues.

yuchengwu on 28 Jun 2019

@yuchengwu thanks for the details and closing the issue!

spzala on 28 Jun 2019
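
For anyone who hits the same error, the two checks discussed in this thread (free disk space, and whether space can be preallocated in the data directory) can be tried outside etcd. The following is only a rough sketch, not etcd's own code: the function name `check_preallocate` is made up for illustration, the 64 MB default is an assumption about the WAL segment size, and it relies on `os.posix_fallocate`, so it assumes a Linux host. On Linux, "structure needs cleaning" is the message for the `EUCLEAN` errno, which the file system itself returns.

```python
import errno
import os
import shutil
import tempfile

# Assumption: 64 MB approximates etcd's default WAL segment size; adjust as needed.
SEGMENT_BYTES = 64 * 1000 * 1000


def check_preallocate(data_dir, size=SEGMENT_BYTES):
    """Report free space in data_dir and try to preallocate `size` bytes there.

    Returns (free_bytes, error_name); error_name is None on success.
    """
    free_bytes = shutil.disk_usage(data_dir).free
    # Create the scratch file inside data_dir so the same file system is exercised.
    fd, path = tempfile.mkstemp(prefix="prealloc-check-", dir=data_dir)
    try:
        os.posix_fallocate(fd, 0, size)
        return free_bytes, None
    except OSError as exc:
        # An "EUCLEAN" result corresponds to "Structure needs cleaning" on Linux.
        return free_bytes, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        # Always clean up the scratch file, whether or not allocation succeeded.
        os.close(fd)
        os.remove(path)
```

Calling `check_preallocate("/var/lib/etcd")` (the path is an example, use your own data-dir) returns the free bytes and either `None` or an errno name. An `EUCLEAN` result would point at the file system rather than at etcd, which matches the conclusion reached above.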
