Describe the bug
Error parsing JSON in the Kafka engine
How to reproduce
Create a table with engine = Kafka and kafka_format = 'JSONEachRow' (a minimal sketch is shown below).
Post a JSON message to Kafka.
Check the server log.
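For reference, a minimal sketch of such a table definition; the database, table, broker, topic, and group names below are placeholders, not taken from this report:
create table if not exists default.kafka_json_queue
(
id Int64,
message String
)
engine = Kafka SETTINGS
kafka_broker_list = 'broker1:9092',
kafka_topic_list = 'sometopic',
kafka_group_name = 'somegroup',
kafka_format = 'JSONEachRow';
Posting a message such as {"id": 1, "message": "test"} to the topic should then trigger the parse error shown below in the server log.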
Error message and/or stacktrace
2019.02.19 11:53:54.931253 [ 32 ] {}
, Stack trace:
Additional context
We have seen a similar Kafka problem with the "CSV" format after moving from 18.16.1 to 19.3.4:
2019.02.19 00:58:12.444114 [ 33 ] {} <Error> void DB::StorageKafka::streamThread(): Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected , before: \0: (at row 2)
Could not print diagnostic info because two last rows aren't in buffer (rare case)
, Stack trace:
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x16) [0x6f13346]
1. /usr/bin/clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x22) [0x3399d82]
2. /usr/bin/clickhouse-server(DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&)+0x19f) [0x6f39cef]
3. /usr/bin/clickhouse-server(DB::CSVRowInputStream::read(std::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension
&)+0x272) [0x69ceab2]
4. /usr/bin/clickhouse-server(DB::BlockInputStreamFromRowInputStream::readImpl()+0x15c) [0x69cae5c]
5. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x25a) [0x61a510a]
6. /usr/bin/clickhouse-server(DB::KafkaBlockInputStream::readImpl()+0x28) [0x6ee8168]
7. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x25a) [0x61a510a]
8. /usr/bin/clickhouse-server(DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)+0x77) [0x61c1c37]
9. /usr/bin/clickhouse-server(DB::StorageKafka::streamToViews()+0x627) [0x6eded27]
10. /usr/bin/clickhouse-server(DB::StorageKafka::streamThread()+0x1a8) [0x6edf388]
11. /usr/bin/clickhouse-server(DB::BackgroundSchedulePool::TaskInfo::execute()+0xef) [0x67744bf]
12. /usr/bin/clickhouse-server(DB::BackgroundSchedulePool::threadFunction()+0xba) [0x677528a]
13. /usr/bin/clickhouse-server() [0x6775469]
14. /usr/bin/clickhouse-server(ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)+0x1e2) [0x6f1d4b2]
15. /usr/bin/clickhouse-server() [0xacbfecf]
16. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f7e56a646db]
17. /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f7e55fe388f]
We are seeing a similar message on ClickHouse 19.3.4.
The error message should at least output the table/Kafka topic it relates to, so we know where to even begin debugging.
2019.02.19 19:22:05.594152 [ 31 ] {} <Error> void DB::StorageKafka::streamThread(): Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected { before: \0: (at row 2)
, Stack trace:
0. /usr/bin/clickhouse-server(StackTrace::StackTrace()+0x16) [0x6f13346]
1. /usr/bin/clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x22) [0x3399d82]
2. /usr/bin/clickhouse-server(DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&)+0x19f) [0x6f39cef]
3. /usr/bin/clickhouse-server(DB::JSONEachRowRowInputStream::readJSONObject(std::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&)+0x5a) [0x66e064a]
4. /usr/bin/clickhouse-server(DB::JSONEachRowRowInputStream::read(std::vector<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COWPtr<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&)+0x13d) [0x66e1afd]
5. /usr/bin/clickhouse-server(DB::BlockInputStreamFromRowInputStream::readImpl()+0x15c) [0x69cae5c]
6. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x25a) [0x61a510a]
7. /usr/bin/clickhouse-server(DB::KafkaBlockInputStream::readImpl()+0x28) [0x6ee8168]
8. /usr/bin/clickhouse-server(DB::IBlockInputStream::read()+0x25a) [0x61a510a]
9. /usr/bin/clickhouse-server(DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)+0x77) [0x61c1c37]
10. /usr/bin/clickhouse-server(DB::StorageKafka::streamToViews()+0x627) [0x6eded27]
11. /usr/bin/clickhouse-server(DB::StorageKafka::streamThread()+0x1a8) [0x6edf388]
12. /usr/bin/clickhouse-server(DB::BackgroundSchedulePool::TaskInfo::execute()+0xef) [0x67744bf]
13. /usr/bin/clickhouse-server(DB::BackgroundSchedulePool::threadFunction()+0xba) [0x677528a]
14. /usr/bin/clickhouse-server() [0x6775469]
15. /usr/bin/clickhouse-server(ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)+0x1e2) [0x6f1d4b2]
16. /usr/bin/clickhouse-server() [0xacbfecf]
17. /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f36ff2c36db]
18. /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f36fe84288f]
The error message should at least output the table/Kafka topic it relates to, so we know where to even begin debugging.
100% agree; printing the topic or table name is really missing here for any debugging.
Checked with release 19.3.5 after the above-referenced #4431 was closed... The problem in this case remains.
I confirm.
This issue has rendered ClickHouse unusable for anyone using Kafka. We cannot upgrade to 19.x until this is fixed.
Adding the option kafka_row_delimiter = '\n' to the Kafka engine definition fixes this problem.
We are trying to break the problem down... This is what we know at the moment:
How we are using the Kafka-ClickHouse combination in 18.16.1:
The message sent to Kafka is:
- a CSV message delimited with ",", around 20 fields in total
- the CSV fields are plain, with no quotes for string fields: NOT "stringdata1","stringdata2",8000 but stringdata1,stringdata2,8000
- every message contains only one row, ending with \n
We have the DDL as:
create table if not exists test5pba.testeventdata_queueraw
(
tenantid Int64,
eventtime Int64,
field3 String,
field4 String,
...
...
...
field20 String
)
engine = Kafka('server1:9092,server2:9092', 'topicname', 'chraw', 'CSV', '');
What I see as interesting is that we had to send messages with a trailing \n while defining kafka_row_delimiter as '' (empty)... and ClickHouse was correctly processing all records.
Moving forward to ClickHouse 19, we tried to use the same configuration as in 18.16.1, just using the ClickHouse 19 Docker image: no changes in metadata, tables not redefined, just a plain change of the clickhouse-server version in the docker-compose file.
We got this error message:
2019.02.19 00:58:12.444114 [ 33 ] {} <Error> void DB::StorageKafka::streamThread(): Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected , before: \0: (at row 2)
Could not print diagnostic info because two last rows aren't in buffer (rare case)
, Stack trace:
...
We also tried some ideas (unsuccessful):
- change from the old to the new Kafka syntax
before:
engine = Kafka('server1:9092,server2:9092', 'topicname', 'chraw', 'CSV', '');
after the change:
engine = Kafka SETTINGS kafka_broker_list = 'server1:9092,server2:9092',
kafka_topic_list = 'topicname',
kafka_group_name = 'chraw',
kafka_format = 'CSV',
kafka_num_consumers = 1,
kafka_row_delimiter = '\0',
kafka_skip_broken_messages = 1
We also tried:
- removing the line with "kafka_row_delimiter = '\n'"
- setting "kafka_row_delimiter = '\n'"
and using the original Kafka messages as in 18.
And this is successful in CH 19.3.6:
engine = Kafka SETTINGS kafka_broker_list = 'server1:9092,server2:9092',
kafka_topic_list = 'topicname',
kafka_group_name = 'chraw',
kafka_format = 'CSV',
kafka_num_consumers = 1,
kafka_row_delimiter = '\n',
kafka_skip_broken_messages = 1
and changing the format of the messages sent to Kafka (we HAD TO REMOVE the trailing \n in each Kafka message). They are still one-liners; there is no batching in a single Kafka message.
To summarize:
- in the old ClickHouse 18 we had to send the Kafka messages WITH a trailing end-of-line \n (although the delimiter in the old-style Kafka engine definition was set to '');
- in ClickHouse 19 we had to send the Kafka messages WITHOUT a trailing end-of-line (and define kafka_row_delimiter = '\n' in the Kafka SETTINGS), as illustrated below.
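Roughly, the difference looks like this (field values are invented for illustration; \n marks a literal trailing newline in the message payload):
ClickHouse 18.16.1, old-style engine definition with delimiter '': message payload is stringdata1,stringdata2,8000\n
ClickHouse 19.3.x, kafka_row_delimiter = '\n' in SETTINGS: message payload is stringdata1,stringdata2,8000 (no trailing newline)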
@vladimir77 @pdeva @sobolevsv could you help narrow it down: your Kafka message type (CSV/JSON/..., single line/batch, using end-of-line or not), DDL definition, ...
We use JSON, not sure about line endings.
On CH 18.16.1 the following config works fine: ENGINE = Kafka SETTINGS kafka_broker_list = '', kafka_topic_list = 'read-events', kafka_group_name = '', kafka_format = 'JSONEachRow'.
It stops working on CH 19.3.5 with the error 'Cannot parse input: expected { before: \0'.
Adding kafka_row_delimiter = '\n' fixes the problem (a sketch of the full engine clause follows).
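For anyone hitting the same thing, a minimal sketch of a fixed JSONEachRow engine definition; the broker and group names here are placeholders rather than the reporter's actual (externally configured) values:
ENGINE = Kafka SETTINGS
kafka_broker_list = 'broker1:9092',
kafka_topic_list = 'read-events',
kafka_group_name = 'somegroup',
kafka_format = 'JSONEachRow',
kafka_row_delimiter = '\n'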
kafka_broker_list and kafka_group_name are empty strings here; the actual values are defined in the custom config file config.d/kafka.xml.
kafka_row_delimiter = '\n' indeed fixes the issue.
However, adding \n at the end of every message instead of kafka_row_delimiter = '\n' does not help.
Should be fixed since 19.5.2.6