ClickHouse 19.15.3.6: Distributed + ReplicatedMergeTree table cannot write to materialized columns

Created on 17 Oct 2019 · 11 comments · Source: ClickHouse/ClickHouse

I set up a 3-machine cluster with the latest 19.15.3.6 on Ubuntu 18.04 LTS

and created a ReplicatedMergeTree table like

create table rdlog
(
    act String,
    campaignId Int32,
    ... -- omitted columns
    date String materialized formatDateTime(toDateTime(timestamp / 1000), '%y%m%d%H'),
    eventTime DateTime materialized toDateTime(timestamp / 1000)
)
engine = ReplicatedMergeTree('/clickhouse/tables/{shard}/rdlog', '{replica}')
PARTITION BY toYYYYMMDD(eventTime)
ORDER BY (eventTime, traceId)
SAMPLE BY traceId
SETTINGS index_granularity = 8192;

and a Distributed table on top of it:

create table rdlog_all
(
    act String,
    campaignId Int32,
    ... -- omitted columns
    date String materialized formatDateTime(toDateTime(timestamp / 1000), '%y%m%d%H'),
    eventTime DateTime materialized toDateTime(timestamp / 1000)
)
engine = Distributed(rd2_cluster, 'default', 'rdlog', rand());
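
For reference, the ingest went through the Distributed table with ordinary INSERTs. A minimal sketch follows, assuming hypothetical types for the elided timestamp (epoch milliseconds, Int64) and traceId (UInt64) columns; the MATERIALIZED date/eventTime columns are not listed, since they are supposed to be computed server-side:

-- hedged sketch: the timestamp/traceId types are assumptions,
-- the real column list is elided in the DDL above
insert into rdlog_all (act, campaignId, timestamp, traceId)
values ('click', 42, 1571299200000, 123456789);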

When I launched the ingest process, there were tons of errors like:

executeQuery: Code: 10, e.displayText() = DB::Exception: Not found column date in block

and as you can see, the date column is a MATERIALIZED one.

I checked the metadata in ZooKeeper and the date column is there.

And when I downgraded to 19.14.7.15-stable, it worked again as expected.

I hope this gets fixed soon, thanks.

Labels: bug, prio-major, v19.15

All 11 comments

Related #5429
/cc @azat

Maybe I missed something, but I cannot reproduce this, and there is a test for that change

@liaoxu can you describe how you INSERT your data?

> Maybe I missed something, but I cannot reproduce this

Replicated?

> Replicated?

Nope, reproduced with Replicated; will take a look (I also found another problem with that patch, due to AddingDefaultBlockOutputStream).

Backtrace for the problem (the exception is thrown from DB::Block::getByName while the server reads an incoming Native block for the INSERT):

(gdb) bt
#0  0x00005633b059c29e in __cxa_throw ()
#1  0x00005633aba2da74 in DB::Block::getByName (this=this@entry=0x7f4c1bb17e30, name=...)
    at /usr/include/c++/9/ext/new_allocator.h:89
#2  0x00005633aec6f252 in DB::NativeBlockInputStream::readImpl (this=0x7f4c1bb17c10)
    at ../dbms/src/DataStreams/NativeBlockInputStream.cpp:159
#3  0x00005633aec61814 in DB::IBlockInputStream::read (this=0x7f4c1bb17c10)
    at ../dbms/src/DataStreams/IBlockInputStream.cpp:61
#4  0x00005633abd5624a in DB::TCPHandler::receiveData (this=0x7f4c1bba7000)
    at /usr/include/c++/9/bits/shared_ptr_base.h:1020
#5  0x00005633abd56ce3 in DB::TCPHandler::receivePacket (this=0x7f4c1bba7000)
    at ../dbms/programs/server/TCPHandler.cpp:855
#6  0x00005633abd56e2e in DB::TCPHandler::readDataNext (this=0x7f4c1bba7000,
    poll_interval=@0x7f3ea7c8a0a8: 10000000, receive_timeout=@0x7f3ea7c8a0a4: 300)
    at ../dbms/programs/server/TCPHandler.cpp:406
#7  0x00005633abd5733e in DB::TCPHandler::readData (this=0x7f4c1bba7000, connection_settings=...)
    at ../dbms/programs/server/TCPHandler.cpp:437
#8  0x00005633abd5756e in DB::TCPHandler::processInsertQuery (this=0x7f4c1bba7000,
    connection_settings=...) at ../dbms/programs/server/TCPHandler.cpp:463
#9  0x00005633abd58b35 in DB::TCPHandler::runImpl (this=0x7f4c1bba7000)
    at ../dbms/programs/server/TCPHandler.cpp:257
#10 0x00005633abd58d7c in DB::TCPHandler::run (this=0x7f4c1bba7000)
    at ../dbms/programs/server/TCPHandler.cpp:1223
#11 0x00005633af9875e0 in Poco::Net::TCPServerConnection::start (this=<optimized out>)
    at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
#12 0x00005633af987cdd in Poco::Net::TCPServerDispatcher::run (this=0x7f4c9885e340)
    at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:114
#13 0x00005633afed4911 in Poco::PooledThread::run (this=0x7f45dfd53e80)
    at ../contrib/poco/Foundation/src/ThreadPool.cpp:214
#14 0x00005633afed288c in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>)
    at ../contrib/poco/Foundation/include/Poco/SharedPtr.h:380
#15 0x00005633b06188e0 in execute_native_thread_routine ()
#16 0x00007f4c98f38fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#17 0x00007f4c98e652ef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@filimonov can you please take a look at the following https://github.com/azat/ClickHouse/commit/ed4c1820a2129017012c26d57a2812d98edecd9c patch?

> Maybe I missed something, but I cannot reproduce this, and there is a test for that change
>
> @liaoxu can you describe how you INSERT your data?

That was not my case.
I made one *_all (Distributed) table on machineA to act as a proxy for the 3 ReplicatedMergeTree tables on machine[A-C]. Only the ReplicatedMergeTree table on machineA (the same machine as the "proxy") is inserted into correctly; the other 2 nodes emit "Not found column" errors.
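
(As a hedged way to sanity-check such a topology, assuming only the rd2_cluster name from the DDL above, system.clusters shows which shard is local to the node running the query:

select cluster, shard_num, replica_num, host_name, is_local
from system.clusters
where cluster = 'rd2_cluster';

On machineA, the row with is_local = 1 would correspond to the only shard that received inserts correctly.)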

> Only the ReplicatedMergeTree table on machineA (the same machine as the "proxy") is inserted into correctly; the other 2 nodes emit "Not found column" errors.

Got it, thanks!
The attached patch should resolve the issue (but please wait until it is merged first).

> @filimonov can you please take a look at the following azat@ed4c182 patch?

Looks OK, but the test should be added.

It would be better not to materialize it at all on the Distributed side (the expressions can be, for example, costly dictionary lookups), and there are quite a lot of theoretically possible corner cases (e.g. what if the underlying table has a normal column, but on the Distributed side it is defined as MATERIALIZED, or vice versa). But removing the column before passing the block forward should work / cover the most common use case, as in the sketch below.
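
As a sketch of that suggestion (not the actual patch; the timestamp and traceId types are assumptions, since those columns are elided in the original DDL), the Distributed table could simply omit the MATERIALIZED columns and let the underlying ReplicatedMergeTree tables compute them:

create table rdlog_all
(
    act String,
    campaignId Int32,
    timestamp Int64, -- assumed type; elided in the original DDL
    traceId UInt64   -- assumed type
)
engine = Distributed(rd2_cluster, 'default', 'rdlog', rand());

The trade-off is that date and eventTime would then only be selectable through the local rdlog tables, not through rdlog_all.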

Also - do we know why it was working in 19.14 and which change introduced that regression?

More generally: having materialized columns on the Distributed side is confusing (they can't actually be materialized in Distributed, since Distributed doesn't store anything; columns can only be materialized at the underlying levels). Maybe it would be less confusing to translate MATERIALIZED columns into something like a READONLY column on the Distributed side. But that sounds like a more complicated task / feature.

> Looks OK, but the test should be added.

The test is in #7377, but it needs some improvements; that will be done later.

> It would be better not to materialize it at all on the Distributed side

Indeed.

> Also - do we know why it was working in 19.14 and which change introduced that regression?

Bisecting ClickHouse is not a pleasant thing to do, but maybe @liaoxu can do it?

> Also - do we know why it was working in 19.14 and which change introduced that regression?

The following patches introduced this:

  • 959744fede85dd0a5b10b41d2addb2ad70a3a6c6
  • 7ddc8a6dded3ccb8f9007c543ee5780585767269

Cc: @KochetovNicolai

