I'm trying to work around #11482 by having a nicely sorted primary key that can be read forward.
The goal is to be able to query "last X things" efficiently.
My schema looks like this:
CREATE TABLE bamboozle (
timestamp DateTime,
traceID String,
service LowCardinality(String)
) ENGINE MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (service, -toUnixTimestamp(timestamp))
Then I insert some data from an existing table:
insert into bamboozle (timestamp, traceID, service)
select timestamp, traceID, service from jaeger_index_snapshot
At first it works ok:
:) select traceID from bamboozle order by service, -toUnixTimestamp(timestamp) limit 10
select traceID
from bamboozle
order by service, -toUnixTimestamp(timestamp)
limit 10
ββtraceIDβββββββββββββββββββββββββββ
β 261f98a0d951b359c6aa9b24fe727202 β
β ab466ff17e0625d2042813b2f7cfb5cf β
β 43cd20538f502512200e30ab88db0b16 β
β b1569ea5d34ec77fa560d917a34ba5d8 β
β 21312c6bfdc1c851ebf16946128a39c7 β
β 23a74f78d0c488c435906e731d2b9220 β
β 98e456ed535b2cf9611d6c8005e09bf3 β
β d80223dd1de2acaee82d9d39db7b9fde β
β aa3471d4b141e9ca52bb23fac8046908 β
β 2048af53bfd4352410d70edde5415ece β
ββββββββββββββββββββββββββββββββββββ
:) select traceID from bamboozle order by service, -toUnixTimestamp(timestamp) limit 10
select traceID
from bamboozle
order by service, -toUnixTimestamp(timestamp)
limit 10
ββtraceIDβββββββββββββββββββββββββββ
β 261f98a0d951b359c6aa9b24fe727202 β
ββββββββββββββββββββββββββββββββββββ
ββtraceIDβββββββββββββββββββββββββββ
β acec67327efbe0c4ab11ac25c73568f3 β
β 101c9d8d4835593c3e5802acd27cfdac β
β 79d5f8339075e1ea55839b19d8598d85 β
β 4d797cbb1f5abb77105290a0051ddc7f β
β 47676c62de62a6140d38f834c4d5b938 β
β 1a660e52f3529e8a1d022112290fe643 β
β 0920be5617252aba563f252e3a4b4b5d β
β a6db4f497644495e51da8dad4eb43dca β
β 1faa3b6febbcf746059c3f1c93f176fb β
ββββββββββββββββββββββββββββββββββββ
But pretty quickly it breaks down:
:) select traceID from bamboozle order by service, -toUnixTimestamp(timestamp) limit 10
select traceID
from bamboozle
order by service, -toUnixTimestamp(timestamp)
limit 10
Query:
select traceID from bamboozle order by service, -toUnixTimestamp(timestamp) limit 10
Received exception from server:
Code: 171, e.displayText() = DB::Exception: Block structure mismatch in QueryPipeline stream: different number of columns:
timestamp DateTime UInt32(size = 0), traceID String String(size = 0), service LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1)))
timestamp DateTime UInt32(size = 0), traceID String String(size = 0), service LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))), negate(toUnixTimestamp(timestamp)) Int64 Int64(size = 0) (version 20.4.4.18)
This is on 20.4.4.18, as you can see. On 20.1.4.14 I see:
Code: 15. DB::Exception: Received from 2400:cb00:36:1053::ac45:1815:9000. DB::Exception: Column 'negate(toUnixTimestamp(timestamp))' already exists.
Maybe there's a better way to achieve what I want, I'm totally open to that.
This bug was fixed in #10715. Will be available in 20.3.12, 20.4.6 and 20.1.16.
@CurtizJ, I'm still having trouble querying the table. My goal is to read the data in index order, so my query looks like this:
select traceID
from jaeger_index_v2
where service = 'edgeworker'
and -toUnixTimestamp(timestamp) <= -toUnixTimestamp(now() - 60)
order by service, -toUnixTimestamp(timestamp)
limit 10
It looks kind of awkward, but it's an equivalent of the following query:
select traceID
from jaeger_index_v2
where service = 'edgeworker'
and timestamp >= now() - 60
order by service, -toUnixTimestamp(timestamp)
limit 10
ClickHouse cannot figure out that it's ok to do the second one in index order, that's why I have to resort to the awkward first variant. The problem is that the awkward option errors out:
Received exception from server:
Code: 10, e.displayText() = DB::Exception: Not found column timestamp in block. There are only columns: traceID, service, negate(toUnixTimestamp(timestamp)), lessOrEquals(negate(toUnixTimestamp(timestamp)), negate(toUnixTimestamp(minus(now(), 60)))) (version 20.6.1.1)
The problem seems to only exist when ingestion is happening. After I stop the ingestion and optimize the table with final, the queries start working again.
I've rechecked v20.6.1.4050-testing and the issue seems to be gone now π