Cockroach: rocksdb: "Unable to Encode VersionEdit" preventing CockroachDB v1.1.x nodes from starting

Created on 7 Nov 2017 · 5 Comments · Source: cockroachdb/cockroach

One of our users mentioned on Gitter that 3 of 18 nodes in a v1.1.1 cluster were repeatedly failing to start with:

cockroach server exited with error: inspecting engines: error writing version to engine <no-attributes>=/mnt/cockroach: Corruption: Unable to Encode VersionEdit:VersionEdit { ... }

https://gitter.im/cockroachdb/cockroach?at=5a0071f232e080696e6f3793

The "Unable to Encode" error seems to be caused by an issue with empty ssts that was fixed upstream shortly after our rocksdb branch for v1.1.1 was created (https://github.com/facebook/rocksdb/issues/2478). We should cherry-pick that fix into a 1.1 patch release.

All 5 comments

@mberhault, @bdarnell I think we should cherry-pick this fix into 1.1.

SGTM

We saw this again last night on 1 of the 18 nodes in our cluster. This time all nodes were running 1.1.2, and each held a few GB of data besides the admin UI time series. The behavior was the same: we couldn't restart the node, so we had to decommission it and spin up a blank new one in its place.

E171123 13:38:27.980189 1 cli/error.go:68 cockroach server exited with error: inspecting engines: error writing version to engine =/mnt/cockroach: Corruption: Unable to Encode VersionEdit:VersionEdit {
PrevLogNumber: 0
NextFileNumber: 144553
LastSeq: 5892668735
DeleteFile: 0 140564
DeleteFile: 0 140566
DeleteFile: 0 140568
DeleteFile: 0 140577
DeleteFile: 0 140582
DeleteFile: 0 140585
DeleteFile: 0 140588
DeleteFile: 0 140591
DeleteFile: 0 140594
DeleteFile: 0 140597
DeleteFile: 0 140600
DeleteFile: 0 140603
DeleteFile: 0 140606
DeleteFile: 0 140609
DeleteFile: 0 140612
DeleteFile: 0 140615
DeleteFile: 0 140618
DeleteFile: 0 140621
DeleteFile: 0 140624
DeleteFile: 0 140627
DeleteFile: 0 140630
DeleteFile: 0 140633
DeleteFile: 0 140636
DeleteFile: 0 140639
DeleteFile: 0 140642
DeleteFile: 0 140645
DeleteFile: 0 140648
DeleteFile: 0 140651
DeleteFile: 0 140654
DeleteFile: 0 140657
DeleteFile: 0 140660
DeleteFile: 0 140663
DeleteFile: 4 140540
DeleteFile: 4 140546
DeleteFile: 4 140553
DeleteFile: 4 140554
DeleteFile: 4 140555
AddFile: 4 144551 31765413 '0169B5727266746100' seq:5892268419, type:1 .. '047473641263722E73746F72652E76616C636F756E74000189F80668013800' seq:5892667849, type:2
AddFile: 4 144552 17255756 '047473641263722E73746F72652E76616C636F756E74000189F80668013900' seq:5892668445, type:2 .. 'BB8F1264616C30322E696C616E64636C6F75642E636F6D3A75726E3A76636C6F75643A6F72673A32623539626661612D643261382D343439632D383130372D3661323639313137636631320001127265706F7274696E6700011484A7978F4284D191673F1275726E3A696C616E643A7461736B3A33343764363038652D663039312D343665342D396632362D356363303239623337633166000100' seq:72057594037927935, type:15
AddFile: 4 144550 1049 (bad) .. (bad)
ColumnFamily: 0
}
Error: cockroach server exited with error: inspecting engines: error writing version to engine =/mnt/cockroach: Corruption: Unable to Encode VersionEdit:VersionEdit {

...

Failed running "start"
I1123 13:38:28.073122 31616 executor.cpp:925] Command exited with status 1 (pid: 31625)
I1123 13:38:29.074844 31624 process.cpp:1068] Failed to accept socket: future discarded
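
For triage, a dump like the one above can be scanned for the file entries that have no printable key range. The following is a minimal sketch in Python, assuming the text layout seen in this log (AddFile: level, file number, size, smallest key, "..", largest key, with "(bad)" printed in place of a key bound). The layout is inferred from this thread only, and the helper name find_bad_add_files is hypothetical; it is not part of CockroachDB or RocksDB.

```python
import re

# Matches lines such as "AddFile: 4 144550 1049 (bad) .. (bad)".
# Assumption: this layout is inferred from the log in this thread,
# not taken from any RocksDB format specification.
ADD_FILE = re.compile(r"^AddFile: (\d+) (\d+) (\d+) (.*?) \.\. (.*)$")


def find_bad_add_files(dump):
    """Return (level, file_number, size) for AddFile entries with a "(bad)" key bound."""
    bad = []
    for line in dump.splitlines():
        m = ADD_FILE.match(line.strip())
        if m is None:
            continue  # DeleteFile, ColumnFamily, and other lines are ignored
        level, number, size, smallest, largest = m.groups()
        if smallest.startswith("(bad)") or largest.startswith("(bad)"):
            bad.append((int(level), int(number), int(size)))
    return bad


# A few lines copied from the dump in this thread.
sample = """\
DeleteFile: 4 140555
AddFile: 4 144551 31765413 '0169B5727266746100' seq:5892268419, type:1 .. '047473641263722E73746F72652E76616C636F756E74000189F80668013800' seq:5892667849, type:2
AddFile: 4 144550 1049 (bad) .. (bad)
ColumnFamily: 0
"""

print(find_bad_add_files(sample))  # [(4, 144550, 1049)]
```

On the dump in this report, this would single out file 144550 at level 4 (1049 bytes), the only entry without a valid key range.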

Thanks for the report @bsnyder788.

@mberhault Let's make sure the cherry-pick gets queued up for 1.1.4 release.

This has been cherry-picked for 1.1.4 which should be out in 2-3 weeks.
