I set up a 3-machine cluster with the latest 19.15.3.6 on Ubuntu 18.04 LTS
and created a ReplicatedMergeTree table and a Distributed table like this:
create table rdlog
(
act String,
campaignId Int32,
... -- other columns omitted
date String materialized formatDateTime(toDateTime(timestamp / 1000), '%y%m%d%H'),
eventTime DateTime materialized toDateTime(timestamp / 1000)
)
engine = ReplicatedMergeTree('/clickhouse/tables/{shard}/rdlog', '{replica}')
    PARTITION BY toYYYYMMDD(eventTime)
    ORDER BY (eventTime, traceId)
    SAMPLE BY traceId
    SETTINGS index_granularity = 8192;
and
create table rdlog_all
(
act String,
campaignId Int32,
... -- other columns omitted
date String materialized formatDateTime(toDateTime(timestamp / 1000), '%y%m%d%H'),
eventTime DateTime materialized toDateTime(timestamp / 1000)
)
engine = Distributed(rd2_cluster, 'default', 'rdlog', rand());
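For reference, the ingest inserts directly into rdlog_all and never lists the MATERIALIZED columns. A minimal sketch of such an INSERT (the literal values are illustrative, and I'm assuming the omitted columns include timestamp and traceId, since the MATERIALIZED expressions, ORDER BY and SAMPLE BY reference them):
-- timestamp is in milliseconds, as implied by the "timestamp / 1000" expressions above
insert into rdlog_all (act, campaignId, traceId, timestamp)
values ('click', 42, 1234567890, 1571200000000);
The date and eventTime columns are never part of the column list; they are supposed to be computed from timestamp.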
When I launched the ingest process, there were tons of "Not found column" errors.
As you can see, the date column is a MATERIALIZED one;
I checked the metadata in ZooKeeper and found the date column there.
When I downgraded to 19.14.7.15 stable, it worked as expected again.
I hope this can be fixed soon, thanks.
Related #5429
/cc @azat
Maybe I missed something, but I cannot reproduce this, and there is a test for that change
@liaoxu can you describe how you INSERT your data?
Maybe I missed something, but I cannot reproduce this
Replicated?
Replicated?
Nope, reproduced with replicated, will take a look (also found another problem with that patch due to AddingDefaultBlockOutputStream)
Backtrace for the problem:
(gdb) bt
#0 0x00005633b059c29e in __cxa_throw ()
#1 0x00005633aba2da74 in DB::Block::getByName (this=this@entry=0x7f4c1bb17e30, name=...)
at /usr/include/c++/9/ext/new_allocator.h:89
#2 0x00005633aec6f252 in DB::NativeBlockInputStream::readImpl (this=0x7f4c1bb17c10)
at ../dbms/src/DataStreams/NativeBlockInputStream.cpp:159
#3 0x00005633aec61814 in DB::IBlockInputStream::read (this=0x7f4c1bb17c10)
at ../dbms/src/DataStreams/IBlockInputStream.cpp:61
#4 0x00005633abd5624a in DB::TCPHandler::receiveData (this=0x7f4c1bba7000)
at /usr/include/c++/9/bits/shared_ptr_base.h:1020
#5 0x00005633abd56ce3 in DB::TCPHandler::receivePacket (this=0x7f4c1bba7000)
at ../dbms/programs/server/TCPHandler.cpp:855
#6 0x00005633abd56e2e in DB::TCPHandler::readDataNext (this=0x7f4c1bba7000,
poll_interval=@0x7f3ea7c8a0a8: 10000000, receive_timeout=@0x7f3ea7c8a0a4: 300)
at ../dbms/programs/server/TCPHandler.cpp:406
#7 0x00005633abd5733e in DB::TCPHandler::readData (this=0x7f4c1bba7000, connection_settings=...)
at ../dbms/programs/server/TCPHandler.cpp:437
#8 0x00005633abd5756e in DB::TCPHandler::processInsertQuery (this=0x7f4c1bba7000,
connection_settings=...) at ../dbms/programs/server/TCPHandler.cpp:463
#9 0x00005633abd58b35 in DB::TCPHandler::runImpl (this=0x7f4c1bba7000)
at ../dbms/programs/server/TCPHandler.cpp:257
#10 0x00005633abd58d7c in DB::TCPHandler::run (this=0x7f4c1bba7000)
at ../dbms/programs/server/TCPHandler.cpp:1223
#11 0x00005633af9875e0 in Poco::Net::TCPServerConnection::start (this=<optimized out>)
at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
#12 0x00005633af987cdd in Poco::Net::TCPServerDispatcher::run (this=0x7f4c9885e340)
at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:114
#13 0x00005633afed4911 in Poco::PooledThread::run (this=0x7f45dfd53e80)
at ../contrib/poco/Foundation/src/ThreadPool.cpp:214
#14 0x00005633afed288c in Poco::ThreadImpl::runnableEntry (pThread=<optimized out>)
at ../contrib/poco/Foundation/include/Poco/SharedPtr.h:380
#15 0x00005633b06188e0 in execute_native_thread_routine ()
#16 0x00007f4c98f38fb7 in start_thread (arg=<optimized out>) at pthread_create.c:486
#17 0x00007f4c98e652ef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
@filimonov can you please take a look at the following https://github.com/azat/ClickHouse/commit/ed4c1820a2129017012c26d57a2812d98edecd9c patch?
Maybe I missed something, but I cannot reproduce this, and there is a test for that change
@liaoxu can you describe how you INSERT your data?
That was not my case.
I made one *_all (Distributed) table on machineA to act as a proxy for 3 ReplicatedMergeTree tables on machine[A-C], and only the ReplicatedMergeTree table on machineA (the same machine as the "proxy") can be inserted into correctly; the other 2 nodes emit "Not found column" errors.
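For completeness, one way to double-check which shard the proxy machine considers local (and hence why only machineA's table receives rows) is to look at system.clusters; a sketch, assuming the cluster name rd2_cluster from the DDL above:
-- only the machineA entry should report is_local = 1;
-- the other two shards are reached over the network, and that is where the errors show up
select cluster, shard_num, host_name, is_local
from system.clusters
where cluster = 'rd2_cluster';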
only the ReplicatedMergeTree table on machineA (the same machine as the "proxy") can be inserted into correctly; the other 2 nodes emit "Not found column" errors.
Got it, thanks!
The attached patch should resolve the issue (but please wait for it to be merged first).
@filimonov can you please take a look at the following azat@ed4c182 patch?
Looks OK, but the test should be added.
It would be better to not materialize it at all on the Distributed side (it can involve, for example, costly dictionary lookups), and there are quite a lot of theoretically possible corner cases (like: what if the underlying table has a normal column and on the Distributed table it's defined as MATERIALIZED, or vice versa). But removing the column before passing it forward should work / cover the most common use case.
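To make that corner case concrete, here is a hypothetical mismatched pair of definitions (table and column names are illustrative only, not taken from the report above):
-- underlying table: val is an ordinary stored column
create table t_local (key UInt64, val String) engine = MergeTree order by key;
-- Distributed proxy: the same column is declared MATERIALIZED,
-- so INSERTs through the proxy never carry val explicitly
create table t_all (key UInt64, val String materialized concat('k', toString(key)))
engine = Distributed(rd2_cluster, 'default', 't_local', rand());
Here the shards store val while the proxy synthesizes it, so the two sides can disagree about which columns an INSERT block should contain.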
Also - do we know why it was working in 19.14 and which change introduced that regression?
More generally: having MATERIALIZED columns on the Distributed side is confusing (i.e. they can't actually be materialized in the Distributed table, since Distributed doesn't store anything; columns can only be materialized at the underlying level). Maybe it would be less confusing to translate MATERIALIZED columns into something like READONLY columns on the Distributed side. But that sounds like a more complicated task / feature.
Looks OK, but the test should be added.
The test is in #7377, but it needs some improvements; that will be done later.
It would be better to not materialize it at all on the Distributed side
Indeed.
Also - do we know why it was working in 19.14 and which change introduced that regression?
Bisecting ClickHouse is not a pleasant thing to do, but maybe @liaoxu can do this?
Also - do we know why it was working in 19.14 and which change introduced that regression?
The following patches introduced this:
Cc: @KochetovNicolai