Hello :)
I created a Kafka engine table with JSONAsString.
However, the log says JSONAsString can only be used with String, not with Nullable(String).
Can't I use a Nullable type?
CREATE TABLE lake.kafka_message
(
    message Nullable(String)
)
ENGINE = Kafka
SETTINGS kafka_broker_list = '192.168.123.60:9092',
         kafka_topic_list = 'lake.public.member',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONAsString';
2020.08.10 15:41:06.346055 [ 5937 ] {} <Error> void DB::StorageKafka::threadFunc(): Code: 49, e.displayText() = DB::Exception: This input format is only suitable for tables with a single column of type String., Stack trace (when copying this message, always include the lines below):
0. Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x12405650 in /usr/bin/clickhouse
1. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0xa2423fd in /usr/bin/clickhouse
2. ? @ 0xfc82327 in /usr/bin/clickhouse
3. ? @ 0xfc81337 in /usr/bin/clickhouse
4. DB::FormatFactory::getInputFormat(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::ReadBuffer&, DB::Block const&, DB::Context const&, unsigned long, std::__1::function<void ()>) const @ 0xfb6cd00 in /usr/bin/clickhouse
5. DB::KafkaBlockInputStream::readImpl() @ 0xf7fbdb2 in /usr/bin/clickhouse
6. DB::IBlockInputStream::read() @ 0xedf3d1d in /usr/bin/clickhouse
7. DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::__1::atomic<bool>*) @ 0xee172fe in /usr/bin/clickhouse
8. DB::StorageKafka::streamToViews() @ 0xf7dc077 in /usr/bin/clickhouse
9. DB::StorageKafka::threadFunc() @ 0xf7dcaa9 in /usr/bin/clickhouse
10. DB::BackgroundSchedulePoolTaskInfo::execute() @ 0xefa3e39 in /usr/bin/clickhouse
11. DB::BackgroundSchedulePool::threadFunction() @ 0xefa42c2 in /usr/bin/clickhouse
12. ? @ 0xefa43f2 in /usr/bin/clickhouse
13. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xa270f67 in /usr/bin/clickhouse
14. ? @ 0xa26f4a3 in /usr/bin/clickhouse
15. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
16. __clone @ 0xfe8dd in /usr/lib64/libc-2.17.so
(version 20.6.3.28 (official build))
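As the exception text says, JSONAsString only works with a single column of plain String type, so dropping the Nullable wrapper is enough to make the table itself valid. A minimal sketch based on the DDL above (broker, topic, and group names are taken from the original example):

```sql
-- JSONAsString requires exactly one non-Nullable String column,
-- so the message column is declared as plain String.
CREATE TABLE lake.kafka_message
(
    message String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = '192.168.123.60:9092',
         kafka_topic_list = 'lake.public.member',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONAsString';
```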
Can I ask why you need Nullable there? Do you expect that some messages will not contain a JSON object?
Hi :)
I'm using Debezium as the Kafka source connector in a data lake project.
Here is the whole data flow:
PostgreSQL or MariaDB (source DB) -> Kafka source connector (Debezium) -> Kafka cluster -> Kafka sink connector (Confluent) -> PostgreSQL or etc. (target DB)
A tombstone message is produced when I delete a row.
When a row is deleted, two messages are created and sent to Kafka: one is the deleted row's information (a JSON object), and the other is null (called a tombstone).
Thanks to the tombstone message, the row is automatically deleted in the target DB.
For one target DB, we will use PostgreSQL to gather data and want to keep it the same as the source DB (only for gathering data from multiple source DBs).
For a time series DB (another target DB), I am considering ClickHouse. CH's compression rate is great and it is amazing for aggregation. It will store all events from the multiple source DBs. However, we want to use a single Kafka cluster: I think it is a waste of resources to run two Kafka clusters, and it would put more load on the source DBs.
So, I need a feature that can handle null (tombstone) messages. If it simply ignored null messages, that would also be great.
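One possible workaround, sketched here under the assumption that Debezium tombstones arrive in ClickHouse as empty message bodies: keep the Kafka table's column as plain String and filter out empty rows in a materialized view. The table name `lake.member_events` and view name `lake.member_events_mv` are hypothetical, not from the original post:

```sql
-- Hypothetical sketch: land Kafka messages in a MergeTree table,
-- letting a materialized view drop empty (tombstone) messages.
CREATE TABLE lake.member_events
(
    message String
)
ENGINE = MergeTree
ORDER BY tuple();

CREATE MATERIALIZED VIEW lake.member_events_mv TO lake.member_events AS
SELECT message
FROM lake.kafka_message
WHERE length(message) > 0;  -- skip tombstones (assumed to surface as '')
```

Whether tombstones actually surface as empty strings (rather than being rejected by the format parser) would need to be verified against your ClickHouse version.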
Can I ask why you need Nullable there? Do you expect that some messages will not contain a JSON object?
@filimonov
Also, in IBM Change Data Capture (another change data capture tool, just like Debezium), deleted rows are sent as NULL messages: because there is no updated or inserted data, the message becomes NULL.
This behaviour is common among change data capture tools.
Clients can infer the deleted row from the KEY part of the message.