ClickHouse marks parts as broken when ReplicatedMergeTreePartCheckThread cannot allocate memory
Parts that are in use by queries are marked as broken when ReplicatedMergeTreePartCheckThread cannot allocate memory. ClickHouse then starts fetching these parts from a replica, but the parts are not actually broken (I can rename them and attach them to a test table, and everything works fine).
How to reproduce
Error message and/or stacktrace
2019.08.01 08:51:06.998053 [ 42 ] {} <Error> default.event_shard (ReplicatedMergeTreePartCheckThread): DB::CheckResult DB::ReplicatedMergeTreePartCheckThread::checkPart(const String&): Code: 173, e.displayText() = DB::ErrnoException: Allocator: Cannot malloc 64.00 KiB., errno: 12, strerror: Cannot allocate memory, Stack trace:
2019.08.01 08:51:06.998113 [ 42 ] {} <Error> default.event_shard (ReplicatedMergeTreePartCheckThread): Part 20190801_20190801_1194_2517_7 looks broken. Removing it and queueing a fetch.
2019.07.22 12:35:55.747581 [ 50 ] {} <Error> default.rawlog_shard (ReplicatedMergeTreePartCheckThread): void DB::ReplicatedMergeTreePartCheckThread::checkPart(const String&): Code: 173, e.displayText() = DB::ErrnoException: Allocator: Ca$
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x16) [0x78e07d6]
1. /usr/bin/clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x22) [0x3a0d472]
2. /usr/bin/clickhouse-server(DB::throwFromErrno(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int)+0x171) [0x78c6841]
3. /usr/bin/clickhouse-server() [0x6c8e48c]
4. /usr/bin/clickhouse-server() [0x6c8ea7a]
5. /usr/bin/clickhouse-server(DB::DataTypeNumberBase<unsigned int>::deserializeBinaryBulk(DB::IColumn&, DB::ReadBuffer&, unsigned long, double) const+0x168) [0x6c941c8]
6. /usr/bin/clickhouse-server(DB::checkDataPart(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::MergeTreeIndexGranularity const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::a$
7. /usr/bin/clickhouse-server(DB::checkDataPart(std::shared_ptr<DB::MergeTreeDataPart const>, bool, std::vector<std::shared_ptr<DB::IDataType const>, std::allocator<std::shared_ptr<DB::IDataType const> > > const&, std::vector<std::shared$
8. /usr/bin/clickhouse-server(DB::ReplicatedMergeTreePartCheckThread::checkPart(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x6e1) [0x709ff01]
9. /usr/bin/clickhouse-server(DB::ReplicatedMergeTreePartCheckThread::run()+0xf5) [0x70a0ca5]
10. /usr/bin/clickhouse-server(DB::BackgroundSchedulePoolTaskInfo::execute()+0xfa) [0x6b945aa]
11. /usr/bin/clickhouse-server(DB::BackgroundSchedulePool::threadFunction()+0x6a) [0x6b94c8a]
12. /usr/bin/clickhouse-server() [0x6b94d09]
13. /usr/bin/clickhouse-server(ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)+0x1af) [0x78e6a0f]
14. /usr/bin/clickhouse-server() [0xb79aaef]
15. /lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f6dd0d9c064]
16. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f6dd03c462d]
(version 19.8.3.8 (official build))
2019.07.22 12:35:55.747658 [ 50 ] {} <Error> default.rawlog_shard (ReplicatedMergeTreePartCheckThread): Part 20190720_20190721_718396_757812_11 looks broken. Removing it and queueing a fetch.
We also have this problem with version 19.16.2.2:
2019.12.06 10:07:23.784032 [ 25 ] {} <Error> default.rawlog_shard: DB::StorageReplicatedMergeTree::queueTask()::<lambda(DB::StorageReplicatedMergeTree::LogEntryPtr&)>: Code: 173, e.displayText() = DB::ErrnoException: Allocator: Cannot realloc from 1.00 MiB to 2.00 MiB., errno: 12, strerror: Cannot allocate memory: (while reading column t_useragent): (while reading from part /var/data/clickhouse/data/default/rawlog_shard/20191202_221193_221284_4/ from mark 0 with max_rows_to_read = 8192): Cannot fetch required block. Stream MergeTreeSequentialBlockInputStream, part 2, Stack trace:
0. 0x55d7173677b0 StackTrace::StackTrace() /usr/bin/clickhouse
1. 0x55d717378d43 DB::ErrnoException::ErrnoException(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int, std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&) /usr/bin/clickhouse
2. 0x55d716eb88bf DB::throwFromErrno(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, int) /usr/bin/clickhouse
3. 0x55d7173957ff Allocator<false, false>::realloc(void*, unsigned long, unsigned long, unsigned long) /usr/bin/clickhouse
4. 0x55d71a629e03 ? /usr/bin/clickhouse
5. 0x55d71a62e735 DB::DataTypeString::deserializeBinaryBulk(DB::IColumn&, DB::ReadBuffer&, unsigned long, double) const /usr/bin/clickhouse
6. 0x55d71af242c1 DB::MergeTreeReader::readData(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::IDataType const&, DB::IColumn&, unsigned long, bool, unsigned long, bool) /usr/bin/clickhouse
7. 0x55d71af249ac DB::MergeTreeReader::readRows(unsigned long, bool, unsigned long, DB::Block&) /usr/bin/clickhouse
8. 0x55d71aa6159d DB::MergeTreeSequentialBlockInputStream::readImpl() /usr/bin/clickhouse
9. 0x55d71a57a7f7 DB::IBlockInputStream::read() /usr/bin/clickhouse
10. 0x55d71ac8cd51 DB::ColumnGathererStream::fetchNewBlock(DB::ColumnGathererStream::Source&, unsigned long) /usr/bin/clickhouse
11. 0x55d71a86bbb1 void DB::ColumnGathererStream::gather<DB::ColumnString>(DB::ColumnString&) /usr/bin/clickhouse
12. 0x55d71ac8e26f DB::ColumnGathererStream::readImpl() /usr/bin/clickhouse
13. 0x55d71a57a7f7 DB::IBlockInputStream::read() /usr/bin/clickhouse
14. 0x55d71aa08293 DB::MergeTreeDataMergerMutator::mergePartsToTemporaryPart(DB::FutureMergedMutatedPart const&, DB::MergeListEntry&, DB::TableStructureReadLockHolder&, long, DB::DiskSpace::Reservation*, bool, bool) /usr/bin/clickhouse
15. 0x55d71a958d40 DB::StorageReplicatedMergeTree::tryExecuteMerge(DB::ReplicatedMergeTreeLogEntry const&) /usr/bin/clickhouse
16. 0x55d71a9683bb DB::StorageReplicatedMergeTree::executeLogEntry(DB::ReplicatedMergeTreeLogEntry&) /usr/bin/clickhouse
17. 0x55d71a9684f1 ? /usr/bin/clickhouse
18. 0x55d71aaa5c51 DB::ReplicatedMergeTreeQueue::processEntry(std::function<std::shared_ptr<zkutil::ZooKeeper> ()>, std::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&, std::function<bool (std::shared_ptr<DB::ReplicatedMergeTreeLogEntry>&)>) /usr/bin/clickhouse
19. 0x55d71a93e64f DB::StorageReplicatedMergeTree::queueTask() /usr/bin/clickhouse
20. 0x55d71a9b3694 DB::BackgroundProcessingPool::threadFunction() /usr/bin/clickhouse
21. 0x55d71a9b400a ? /usr/bin/clickhouse
22. 0x55d7173b1d5c ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
23. 0x55d71d0cf1e0 ? /usr/bin/clickhouse
24. 0x7fa77b93c064 start_thread /lib/x86_64-linux-gnu/libpthread-2.19.so
25. 0x7fa77b26562d clone /lib/x86_64-linux-gnu/
CC @alesapin
We should remove all the logic from checkDataPart except reading files and validating checksums.
All the unneeded logic from checkDataPart was removed.
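To illustrate the idea behind the suggestion above (only read files and validate checksums, and do not treat an out-of-memory condition as corruption), here is a minimal standalone sketch. It is not ClickHouse code: CheckOutcome, classifyCheck, fileMatchesChecksum and the FNV-1a checksum are hypothetical names and stand-ins chosen for illustration, and the real checkDataPart may be structured quite differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <new>
#include <string>
#include <system_error>

// Hypothetical outcome of a part check; not ClickHouse's actual type.
enum class CheckOutcome { Ok, Broken, RetryLater };

constexpr std::uint64_t FNV_OFFSET = 14695981039346656037ULL;

// FNV-1a, used here only as a stand-in for the real part checksum.
inline std::uint64_t fnv1a(std::uint64_t hash, const char * data, std::size_t size)
{
    for (std::size_t i = 0; i < size; ++i)
    {
        hash ^= static_cast<unsigned char>(data[i]);
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Stream the file through a fixed-size stack buffer, so the check itself
// needs no heap allocation proportional to the data being verified.
inline bool fileMatchesChecksum(const std::string & path, std::uint64_t expected)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    char buf[4096];
    std::uint64_t hash = FNV_OFFSET;
    while (in.read(buf, sizeof(buf)) || in.gcount() > 0)
        hash = fnv1a(hash, buf, static_cast<std::size_t>(in.gcount()));
    return hash == expected;
}

// Run a check and classify the result. Memory exhaustion says nothing about
// the data on disk, so it maps to RetryLater instead of Broken.
template <typename Check>
CheckOutcome classifyCheck(Check && check)
{
    try
    {
        return check() ? CheckOutcome::Ok : CheckOutcome::Broken;
    }
    catch (const std::bad_alloc &)
    {
        return CheckOutcome::RetryLater;
    }
    catch (const std::system_error & e)
    {
        if (e.code() == std::errc::not_enough_memory)
            return CheckOutcome::RetryLater;
        // Assumption of this sketch: any other error still counts as broken.
        return CheckOutcome::Broken;
    }
}
```

A caller would wrap the per-file check, for example classifyCheck([&] { return fileMatchesChecksum(path, expected); }), and only remove the part and queue a fetch when the outcome is Broken.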