ClickHouse: DB::Exception: Cannot create table from metadata file when no valid replicas

Created on 19 Apr 2017 · 5 Comments · Source: ClickHouse/ClickHouse

We have a single ClickHouse server with a ReplicatedMergeTree table on it.
We recently had a disk failure (it's a VM on Vultr, so I don't know the details). After the restart, ClickHouse found a mismatch between ZooKeeper and the on-disk data. It tried to fetch the missing parts, failed, and finally stopped with a DB::Exception.

The instructions here helped us continue operating with a MergeTree table over the on-disk data: https://groups.google.com/forum/#!topic/clickhouse/ETMmRj__wDk
I think a failure to initialize one table should not cause the whole server to shut down.

ClickHouse version: 1.1.54164


Log

2017.04.19 04:40:01.802254 [ 1 ]  : Starting daemon with revision 54164
2017.04.19 04:40:01.805189 [ 1 ]  Application: starting up
2017.04.19 04:40:01.819077 [ 1 ]  Application: Loading metadata.
2017.04.19 04:40:01.819950 [ 1 ]  DatabaseOrdinary (system): Total 0 tables.
2017.04.19 04:40:01.820185 [ 1 ]  DatabaseOrdinary (default): Total 1 tables.
2017.04.19 04:40:01.825477 [ 2 ]  BackgroundProcessingPool: Create BackgroundProcessingPool with 16 threads
2017.04.19 04:40:02.163172 [ 2 ]  default.decisions (StorageReplicatedMergeTree): Fetching missing part 20170418_20170418_250177_250177_0
2017.04.19 04:40:02.163267 [ 2 ]  default.decisions (StorageReplicatedMergeTree): Fetching missing part 20170418_20170418_250170_250170_0
...
2017.04.19 04:40:02.165187 [ 2 ]  default.decisions (StorageReplicatedMergeTree): Fetching missing part 20170418_20170418_250259_250259_0
2017.04.19 04:40:02.165199 [ 2 ]  default.decisions (StorageReplicatedMergeTree): Fetching missing part 20170418_20170418_250134_250229_22
2017.04.19 04:40:02.191543 [ 1 ]  Application: DB::Exception: Cannot create table from metadata file /mnt/clickhouse/data/metadata/default//decisions.sql, error: DB::Exception: The local set of parts of table decisions doesn't look like the set of parts in ZooKeeper. There are 109 unexpected parts (23 of them is not just-written), 0 unexpectedly merged parts, 0 missing obsolete parts, 137 missing parts, stack trace:
0. clickhouse-server(StackTrace::StackTrace()+0x16) [0x1220266]
1. clickhouse-server(DB::Exception::Exception(std::string const&, int)+0x1f) [0xf85f5f]
2. clickhouse-server(DB::StorageReplicatedMergeTree::checkParts(bool)+0x1c80) [0x12d19f0]
3. clickhouse-server(DB::StorageReplicatedMergeTree::StorageReplicatedMergeTree(std::string const&, std::string const&, bool, std::string const&, std::string const&, std::string const&, std::shared_ptr, DB::NamesAndTypesList const&, DB::NamesAndTypesList const&, std::unordered_map, std::equal_to, std::allocator > > const&, DB::Context&, std::shared_ptr&, std::string const&, std::shared_ptr const&, unsigned long, DB::MergeTreeData::MergingParams const&, bool, DB::MergeTreeSettings const&)+0x12eb) [0x12d957b]
4. clickhouse-server(DB::StorageReplicatedMergeTree::create(std::string const&, std::string const&, bool, std::string const&, std::string const&, std::string const&, std::shared_ptr, DB::NamesAndTypesList const&, DB::NamesAndTypesList const&, std::unordered_map, std::equal_to, std::allocator > > const&, DB::Context&, std::shared_ptr&, std::string const&, std::shared_ptr const&, unsigned long, DB::MergeTreeData::MergingParams const&, bool, DB::MergeTreeSettings const&)+0xcb) [0x12da76b]
5. clickhouse-server(DB::StorageFactory::get(std::string const&, std::string const&, std::string const&, std::string const&, DB::Context&, DB::Context&, std::shared_ptr&, std::shared_ptr, DB::NamesAndTypesList const&, DB::NamesAndTypesList const&, std::unordered_map, std::equal_to<std::string>, std::allocator > > const&, bool, bool) const+0x1be5) [0x128d615]
6. clickhouse-server(DB::createTableFromDefinition(std::string const&, std::string const&, std::string const&, DB::Context&, bool, std::string const&)+0x1b4) [0x1d4ac44]
7. clickhouse-server() [0x113bd12]
8. clickhouse-server() [0x113c7a8]
9. clickhouse-server() [0x113cdd4]
10. clickhouse-server(ThreadPool::worker()+0x141) [0x12289e1]
11. clickhouse-server() [0x31d626f]
12. /lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f53d1740064]
13. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f53d0d6862d]

2017.04.19 04:40:02.191606 [ 1 ]  Application: shutting down
2017.04.19 04:40:02.720813 [ 1 ]  ~ZooKeeper: Closing ZooKeeper session
2017.04.19 04:40:02.721112 [ 1 ]  ~ZooKeeper: Removing 0 watches
2017.04.19 04:40:02.721129 [ 1 ]  ~ZooKeeper: Removed watches
2017.04.19 04:40:02.721200 [ 3 ]  BaseDaemon: Stop SignalListener thread
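
For anyone who wants to alert on this condition, here is a minimal Python sketch that pulls the part counts out of the exception text. The wording of the message is taken from the log in this report (revision 54164); other ClickHouse versions may phrase it differently, so treat the regular expression as an assumption. The function name `parse_part_mismatch` and the result keys are made up for illustration and are not part of ClickHouse.

```python
import re
from typing import Dict, Optional

# Pattern copied from the exception text in the log above; other server
# versions may word the message differently (assumption).
SUMMARY_RE = re.compile(
    r"There are (?P<unexpected>\d+) unexpected parts "
    r"\((?P<not_just_written>\d+) of them is not just-written\), "
    r"(?P<unexpectedly_merged>\d+) unexpectedly merged parts, "
    r"(?P<missing_obsolete>\d+) missing obsolete parts, "
    r"(?P<missing>\d+) missing parts"
)


def parse_part_mismatch(message: str) -> Optional[Dict[str, int]]:
    """Return the part counts from the mismatch message, or None if absent."""
    # Pasted logs are sometimes hard-wrapped, so drop newlines before matching.
    flat = message.replace("\n", "")
    match = SUMMARY_RE.search(flat)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}
```

Called on the exception line from this report, it yields the five counters as integers (for example, `missing` is 137), which a monitoring script could compare against its own limits before deciding whether to page someone.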

st-need-info

All 5 comments

This is intended as a "safety threshold".

When the difference between the local data and the reference data in ZooKeeper is large enough, the server by default refuses to start, so that you can restore manually. To activate automatic restore, set the force_restore_data flag before starting the server. This is as simple as:

sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data

This "safety threshold" rarely helps and is useless in most cases: you simply activate automatic data restore and continue. But we believe it will be helpful in cases of configuration errors.
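
If the recovery step is scripted rather than typed by hand, the `touch` command above could be sketched in Python roughly as follows. The helper name `request_force_restore` is hypothetical. The default directory is the one from the command above; on a server with a non-default data path (the log in this report shows `/mnt/clickhouse/data/`), the flags directory is presumably under that path instead, which is an assumption to verify against your config. Run it as the `clickhouse` user, as the `sudo -u clickhouse` in the original command does.

```python
from pathlib import Path

# Flag file name as given in the comment above.
FLAG_NAME = "force_restore_data"


def request_force_restore(flags_dir: str = "/var/lib/clickhouse/flags") -> Path:
    """Create the force_restore_data flag file and return its path.

    flags_dir must already exist; a missing directory raises
    FileNotFoundError instead of being created silently.
    """
    flag = Path(flags_dir) / FLAG_NAME
    flag.touch(exist_ok=True)  # same effect as `touch` on the flag file
    return flag
```

The helper only creates the file. Starting the server afterwards is still a separate step.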

What happens if I set the force_restore_data flag?
Does ClickHouse remove mentions of the missing parts from ZooKeeper?
Or, in another case, when ZooKeeper is empty, does ClickHouse fill ZooKeeper with the actual data?
I am just trying to understand what the source of truth is here: the data on disk or the data in ZooKeeper?
Does ClickHouse try to do its best and keep as much data as possible?

@gelin this process is documented here: https://clickhouse.yandex/docs/en/operations/table_engines/replication/#recovery-after-failures

@gelin do you have any further questions?

No.
