Hi all
I am using ClickHouse 19.3.6 on CentOS 7.4. I am new to ClickHouse and am having trouble storing Kafka data via a materialized view.
Following the documentation, I created a Kafka engine table, a MergeTree table, and a materialized view:
1.
CREATE TABLE BI.S_V3 (
transactionnumber UInt64,
warehousenumber Nullable(String)
)
ENGINE = Kafka SETTINGS kafka_broker_list = 'XXXXXX:9092',
kafka_topic_list = 'bi_realtime_shippingcarrierhistory',
kafka_group_name = 'test_v3',
kafka_format = 'JSONEachRow',
kafka_row_delimiter = '\n';
2.
CREATE TABLE BI.F_V3 (
transactionnumber UInt64,
warehousenumber Nullable(String),
date_key Date) ENGINE = MergeTree(date_key, (transactionnumber), 8192);
3.
CREATE MATERIALIZED VIEW BI.V_V3 TO BI.F_V3 AS SELECT *, toDate(now()) AS date_key FROM BI.S_V3;
It's fine to execute SELECT * FROM BI.S_V3; I can read the Kafka data from the Kafka engine table. But when I create the materialized view, the ClickHouse service goes down. I have to remove V_V3.sql from /var/lib/clickhouse/metadata/BI/ before I can restart the service.
I tried ClickHouse 19.1.3 and 19.3.6 and hit the same issue in both versions.
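For reference, the same materialized view written with an explicit column list instead of SELECT * would look like this (a minimal sketch reusing the column names from the tables above; I have not confirmed whether it behaves any differently with respect to this crash):

CREATE MATERIALIZED VIEW BI.V_V3 TO BI.F_V3 AS
SELECT
    transactionnumber,
    warehousenumber,
    toDate(now()) AS date_key
FROM BI.S_V3;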
Here is the error log:
2019.03.04 15:46:39.356000 [ 46 ] {} <Trace> StorageKafka (S_V3): Committing message with offset 2450202
2019.03.04 15:46:39.356021 [ 46 ] {} <Trace> StorageKafka (S_V3): Polled batch of 1 messages
2019.03.04 15:46:39.356097 [ 42 ] {de558461-b3f1-4a23-941f-fb9f91088ff1} <Trace> StorageKafka (S_V3): Committing message with offset 2450203
2019.03.04 15:46:39.356184 [ 42 ] {de558461-b3f1-4a23-941f-fb9f91088ff1} <Debug> MemoryTracker: Peak memory usage (for query): 1.02 MiB.
2019.03.04 15:46:39.356226 [ 42 ] {de558461-b3f1-4a23-941f-fb9f91088ff1} <Debug> MemoryTracker: Peak memory usage (total): 1.02 MiB.
2019.03.04 15:46:39.356240 [ 42 ] {de558461-b3f1-4a23-941f-fb9f91088ff1} <Information> TCPHandler: Processed in 0.014 sec.
2019.03.04 15:47:18.364107 [ 42 ] {0cb1e0a2-0c7b-4257-b613-fc856308fa7c} <Debug> executeQuery: (from [::1]:52820) CREATE MATERIALIZED VIEW BI.V_V3 TO BI.F_V3 AS SELECT *, toDate(now())as date_key from BI.S_V3;
2019.03.04 15:47:18.365229 [ 42 ] {0cb1e0a2-0c7b-4257-b613-fc856308fa7c} <Debug> MemoryTracker: Peak memory usage (for query): 4.06 KiB.
2019.03.04 15:47:18.365266 [ 42 ] {0cb1e0a2-0c7b-4257-b613-fc856308fa7c} <Debug> MemoryTracker: Peak memory usage (total): 4.06 KiB.
2019.03.04 15:47:18.365299 [ 42 ] {0cb1e0a2-0c7b-4257-b613-fc856308fa7c} <Information> TCPHandler: Processed in 0.001 sec.
2019.03.04 15:47:18.865476 [ 29 ] {} <Debug> StorageKafka (S_V3): Started streaming to 1 attached views
2019.03.04 15:47:18.891144 [ 29 ] {} <Trace> StorageKafka (S_V3): Polled batch of 65536 messages
2019.03.04 15:47:18.919831 [ 47 ] {} <Error> BaseDaemon: ########################################
2019.03.04 15:47:18.919901 [ 47 ] {} <Error> BaseDaemon: (from thread 29) Received signal Segmentation fault (11).
2019.03.04 15:47:18.919912 [ 47 ] {} <Error> BaseDaemon: Address: NULL pointer.
2019.03.04 15:47:18.919919 [ 47 ] {} <Error> BaseDaemon: Access: read.
2019.03.04 15:47:18.919926 [ 47 ] {} <Error> BaseDaemon: Address not mapped to object.
2019.03.04 15:47:18.947618 [ 47 ] {} <Error> BaseDaemon: 0. clickhouse-server(DB::JSONEachRowRowInputStream::read(std::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&)+0x30) [0x57e3600]
2019.03.04 15:47:18.947662 [ 47 ] {} <Error> BaseDaemon: 1. clickhouse-server(DB::BlockInputStreamFromRowInputStream::readImpl()+0xa4) [0x5a6b144]
2019.03.04 15:47:18.947674 [ 47 ] {} <Error> BaseDaemon: 2. clickhouse-server(DB::IBlockInputStream::read()+0x1f5) [0x5343945]
2019.03.04 15:47:18.947683 [ 47 ] {} <Error> BaseDaemon: 3. clickhouse-server(DB::KafkaBlockInputStream::readImpl()+0x19) [0x5d7a3c9]
2019.03.04 15:47:18.947689 [ 47 ] {} <Error> BaseDaemon: 4. clickhouse-server(DB::IBlockInputStream::read()+0x1f5) [0x5343945]
2019.03.04 15:47:18.947704 [ 47 ] {} <Error> BaseDaemon: 5. clickhouse-server(DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)+0x60) [0x535c8f0]
2019.03.04 15:47:18.947710 [ 47 ] {} <Error> BaseDaemon: 6. clickhouse-server(DB::StorageKafka::streamToViews()+0x62c) [0x5d6f56c]
2019.03.04 15:47:18.947716 [ 47 ] {} <Error> BaseDaemon: 7. clickhouse-server(DB::StorageKafka::streamThread()+0x8b) [0x5d6f9bb]
2019.03.04 15:47:18.947721 [ 47 ] {} <Error> BaseDaemon: 8. clickhouse-server(DB::BackgroundSchedulePool::TaskInfo::execute()+0xd8) [0x58610d8]
2019.03.04 15:47:18.947727 [ 47 ] {} <Error> BaseDaemon: 9. clickhouse-server(DB::BackgroundSchedulePool::threadFunction()+0x62) [0x5861982]
2019.03.04 15:47:18.947732 [ 47 ] {} <Error> BaseDaemon: 10. clickhouse-server() [0x58619f4]
2019.03.04 15:47:18.947738 [ 47 ] {} <Error> BaseDaemon: 11. clickhouse-server(ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)+0x199) [0x5da11e9]
2019.03.04 15:47:18.947743 [ 47 ] {} <Error> BaseDaemon: 12. clickhouse-server() [0x65c572f]
2019.03.04 15:47:18.947747 [ 47 ] {} <Error> BaseDaemon: 13. /lib64/libpthread.so.0(+0x7dd5) [0x7f30d6724dd5]
2019.03.04 15:47:21.807462 [ 42 ] {a89390f3-7d4e-472d-aa07-a827be851954} <Debug> executeQuery: (from [::1]:52820) SHOW TABLES;
2019.03.04 15:47:21.808194 [ 42 ] {a89390f3-7d4e-472d-aa07-a827be851954} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
2019.03.04 15:47:21.808303 [ 42 ] {a89390f3-7d4e-472d-aa07-a827be851954} <Debug> executeQuery: Query pipeline:
@BigBig-joe can you provide some examples of your messages (JSONEachRow)?
Yes, I fetch only the first 2 columns. The full message looks like this:
{"transactionnumber":177732656,"warehousenumber":"09","trackingnumber":"74899998266877023548","Weight":1,"DimensionWeight":0,"BoxNumber":"1 ","OrderType":101,"InvoiceNumber":"432730694","ServiceType":"114","CalculatedCharge":3.36,"ShippingCarrierCharge":3.24,"BilledShippingCharge":null,"ShippingCarrierType":"FEDEX","CreateTime":"2019-02-25T07:42:55.880Z","Status":"Y","CompanyCode":"1003","EDIBillingDate":null,"DeliveredDateTime":"2019-03-01T15:29:00.000Z","PromisedDeliveredDay":null,"CustomerNumber":"57909506","DropShipID":166181835,"ItemNumber":"","Zone":"5 ","NetWeight":0.35,"ZipCode":"33563-6502","CurrencyCode":"USD","CurrencyExchangeRate":1,"CurrencyCalculatedCharge":3.36,"SubCode":1,"ShippingID":1,"Country":"USA","InternalProcessExpectTime":null,"InternalProcessActualTime":null,"CarrierProcessExpectTime":null,"CarrierProcessActualTime":null,"CarrierProcessFromNightTime":null,"AddressType":"R","ConsolidatedSO":"N","SOType":null}
Did you build ClickHouse on your own, or did you download a package from somewhere?
I downloaded it from Altinity:
https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-common-19.3.6-1.el7.x86_64.rpm
It looks like a bug for sure. I will try to reproduce. There are other mentions of problems when parsing messages from Kafka in this CH version. Will fix them all soon.
We have the same problem with ClickHouse v19.3.7 and materialized views consuming from tables with the Kafka engine: the CH server crashes with a segmentation fault.
In case it helps: we are seeing segmentation faults using Kafka + materialized views with Kafka tables from 2 different Kafka clusters. For some reason, when we removed some tables and kept only the ones from one cluster, the segfaults disappeared. Not sure why that makes a difference; just an observation.
v19.5.3 has been more stable for me (48 hours into testing). I raised some similar concerns in #5085.
We have the same problem with ClickHouse v19.8.3.8 and materialized views consuming from tables with the Kafka engine (Protobuf): the CH server crashes with a segmentation fault.
I don't know if it is the same bug, but I have a segfault with the Kafka engine, Protobuf format, and an MV. It seems to happen when I send a message that triggers a conversion error (for example, a Protobuf int64 field mapped to a DateTime column where the value exceeds UInt32 capacity); see the sketch after the logs below. I tried to build a simpler test case but didn't succeed. Here is what I get in the logs:
2019.07.08 09:48:39.903730 [ 15 ] {} <Debug> StorageKafka (KAFKA_flow_data): Started streaming to 1 attached views
2019.07.08 09:48:40.405813 [ 15 ] {} <Trace> StorageKafka (KAFKA_flow_data): Polled batch of 1 messages
2019.07.08 09:48:40.406625 [ 15 ] {} <Trace> StorageKafka (KAFKA_flow_data): Re-joining claimed consumer after failure
2019.07.08 09:48:40.406915 [ 15 ] {} <Error> void DB::StorageKafka::streamThread(): Code: 436, e.displayText() = DB::Exception: Could not convert value '1562579319311' from protobuf field 'date_time' to data type 'UInt32', Stack trace:
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x16) [0x7285206]
1. /usr/bin/clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x22) [0x39a84d2]
2. /usr/bin/clickhouse-server(DB::ProtobufReader::ConverterBaseImpl::cannotConvertValue(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe9) [0x6d979a9]
3. /usr/bin/clickhouse-server(unsigned int DB::ProtobufReader::ConverterBaseImpl::numericCast<unsigned int, unsigned long>(unsigned long)+0xa4) [0x6da54d4]
4. /usr/bin/clickhouse-server(DB::ProtobufReader::ConverterFromNumber<4, unsigned long>::readDateTime(long&)+0x41) [0x6da55d1]
5. /usr/bin/clickhouse-server(DB::DataTypeDateTime::deserializeProtobuf(DB::IColumn&, DB::ProtobufReader&, bool, bool&) const+0x3b) [0x65c82cb]
6. /usr/bin/clickhouse-server(DB::ProtobufRowInputStream::read(std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&)+0x131) [0x6accf41]
7. /usr/bin/clickhouse-server(DB::BlockInputStreamFromRowInputStream::readImpl()+0x168) [0x6d76c98]
8. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6591b08]
9. /usr/bin/clickhouse-server(DB::KafkaBlockInputStream::readImpl()+0x28) [0x72627a8]
10. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6591b08]
11. /usr/bin/clickhouse-server(DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)+0x6b) [0x65b004b]
12. /usr/bin/clickhouse-server(DB::StorageKafka::streamToViews()+0x5cd) [0x725ca9d]
13. /usr/bin/clickhouse-server(DB::StorageKafka::streamThread()+0x1ba) [0x725cfea]
[...]
2019.07.08 09:48:40.907242 [ 11 ] {} <Debug> StorageKafka (KAFKA_flow_data): Started streaming to 1 attached views
2019.07.08 09:48:45.909893 [ 11 ] {} <Trace> StorageKafka (KAFKA_flow_data): Re-joining claimed consumer after failure
2019.07.08 09:48:45.910362 [ 11 ] {} <Error> void DB::StorageKafka::streamThread(): Code: 444, e.displayText() = DB::Exception: Protobuf messages are corrupted or don't match the provided schema. Please note that Protobuf stream is length-delimited: every message is prefixed by its length in varint., Stack trace:
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x16) [0x7285206]
1. /usr/bin/clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x22) [0x39a84d2]
2. /usr/bin/clickhouse-server() [0x6d930ed]
3. /usr/bin/clickhouse-server(DB::ProtobufReader::SimpleReader::readFieldNumber(unsigned int&)+0xe5) [0x6d94125]
4. /usr/bin/clickhouse-server(DB::ProtobufReader::readColumnIndex(unsigned long&)+0x3f) [0x6d94b1f]
5. /usr/bin/clickhouse-server(DB::ProtobufRowInputStream::read(std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&)+0xd4) [0x6accee4]
6. /usr/bin/clickhouse-server(DB::BlockInputStreamFromRowInputStream::readImpl()+0x168) [0x6d76c98]
7. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6591b08]
8. /usr/bin/clickhouse-server(DB::KafkaBlockInputStream::readImpl()+0x28) [0x72627a8]
9. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6591b08]
10. /usr/bin/clickhouse-server(DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)+0x6b) [0x65b004b]
11. /usr/bin/clickhouse-server(DB::StorageKafka::streamToViews()+0x5cd) [0x725ca9d]
[...]
(The "Protobuf messages are corrupted" error above is repeated 3 times; not sure why.)
[...]
2019.07.08 09:48:57.496783 [ 9 ] {} <Debug> StorageKafka (KAFKA_flow_data): Started streaming to 1 attached views
2019.07.08 09:49:02.999140 [ 9 ] {} <Trace> StorageKafka (KAFKA_flow_data): Polled batch of 0 messages
2019.07.08 09:49:02.999408 [ 48 ] {} <Error> BaseDaemon: ########################################
2019.07.08 09:49:02.999480 [ 48 ] {} <Error> BaseDaemon: (version 19.9.2.4 (official build)) (from thread 9) Received signal Segmentation fault (11).
2019.07.08 09:49:02.999497 [ 48 ] {} <Error> BaseDaemon: Address: NULL pointer.
2019.07.08 09:49:02.999509 [ 48 ] {} <Error> BaseDaemon: Access: read.
2019.07.08 09:49:02.999517 [ 48 ] {} <Error> BaseDaemon: Unknown si_code.
2019.07.08 09:49:03.067852 [ 48 ] {} <Error> BaseDaemon: 0. /usr/bin/clickhouse-server() [0x376894d]
2019.07.08 09:49:03.067913 [ 48 ] {} <Error> BaseDaemon: 1. /usr/bin/clickhouse-server(DB::ProtobufRowInputStream::read(std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&)+0x131) [0x6accf41]
2019.07.08 09:49:03.067924 [ 48 ] {} <Error> BaseDaemon: 2. /usr/bin/clickhouse-server(DB::BlockInputStreamFromRowInputStream::readImpl()+0x168) [0x6d76c98]
2019.07.08 09:49:03.067935 [ 48 ] {} <Error> BaseDaemon: 3. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6591b08]
2019.07.08 09:49:03.067942 [ 48 ] {} <Error> BaseDaemon: 4. /usr/bin/clickhouse-server(DB::KafkaBlockInputStream::readImpl()+0x28) [0x72627a8]
2019.07.08 09:49:03.067949 [ 48 ] {} <Error> BaseDaemon: 5. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x188) [0x6591b08]
2019.07.08 09:49:03.067959 [ 48 ] {} <Error> BaseDaemon: 6. /usr/bin/clickhouse-server(DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)+0x6b) [0x65b004b]
2019.07.08 09:49:03.067981 [ 48 ] {} <Error> BaseDaemon: 7. /usr/bin/clickhouse-server(DB::StorageKafka::streamToViews()+0x5cd) [0x725ca9d]
2019.07.08 09:49:03.067988 [ 48 ] {} <Error> BaseDaemon: 8. /usr/bin/clickhouse-server(DB::StorageKafka::streamThread()+0x1ba) [0x725cfea]
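As for the conversion error itself: the value 1562579319311 looks like epoch milliseconds, which cannot fit into the UInt32 that backs a DateTime column. One way to sidestep that (a minimal sketch; the table, view, broker, topic, and schema names below are hypothetical, only the date_time field name is taken from my setup, and it is assumed to carry epoch milliseconds) is to receive the protobuf int64 as UInt64 in the Kafka table and convert it in the materialized view:

CREATE TABLE flow_data_queue (
    date_time UInt64  -- raw protobuf int64, assumed to be epoch milliseconds
) ENGINE = Kafka SETTINGS
    kafka_broker_list = 'broker:9092',
    kafka_topic_list = 'flow_data',
    kafka_group_name = 'flow_data_group',
    kafka_format = 'Protobuf',
    kafka_schema = 'flow_data.proto:FlowData';

CREATE MATERIALIZED VIEW flow_data_mv TO flow_data AS
SELECT toDateTime(intDiv(date_time, 1000)) AS date_time  -- milliseconds -> seconds, then to DateTime
FROM flow_data_queue;

That at least avoids the Code 436 conversion error; whether it also avoids the subsequent segfault I cannot say.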
@filimonov If it's already fixed, why not close this?