Clickhouse: Transformations in primary key confuse Clickhouse

Created on 9 Jun 2020  Β·  4Comments  Β·  Source: ClickHouse/ClickHouse

I'm trying to work around #11482 by having a nicely sorted primary key that can be read forward.

The goal is to be able to query "last X things" efficiently.

My schema looks like this:

CREATE TABLE bamboozle (
  timestamp DateTime,
  traceID String,
  service LowCardinality(String)
) ENGINE MergeTree()
PARTITION BY toDate(timestamp)
ORDER BY (service, -toUnixTimestamp(timestamp))

Then I insert some data from an existing table:

insert into bamboozle (timestamp, traceID, service)
select timestamp, traceID, service from jaeger_index_snapshot 

At first it works ok:

 :) select traceID from bamboozle order by service, -toUnixTimestamp(timestamp) limit 10

select traceID
  from bamboozle
 order by service, -toUnixTimestamp(timestamp)
 limit 10


β”Œβ”€traceID──────────────────────────┐
β”‚ 261f98a0d951b359c6aa9b24fe727202 β”‚
β”‚ ab466ff17e0625d2042813b2f7cfb5cf β”‚
β”‚ 43cd20538f502512200e30ab88db0b16 β”‚
β”‚ b1569ea5d34ec77fa560d917a34ba5d8 β”‚
β”‚ 21312c6bfdc1c851ebf16946128a39c7 β”‚
β”‚ 23a74f78d0c488c435906e731d2b9220 β”‚
β”‚ 98e456ed535b2cf9611d6c8005e09bf3 β”‚
β”‚ d80223dd1de2acaee82d9d39db7b9fde β”‚
β”‚ aa3471d4b141e9ca52bb23fac8046908 β”‚
β”‚ 2048af53bfd4352410d70edde5415ece β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 :) select traceID from bamboozle order by service, -toUnixTimestamp(timestamp) limit 10

select traceID
  from bamboozle
 order by service, -toUnixTimestamp(timestamp)
 limit 10


β”Œβ”€traceID──────────────────────────┐
β”‚ 261f98a0d951b359c6aa9b24fe727202 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€traceID──────────────────────────┐
β”‚ acec67327efbe0c4ab11ac25c73568f3 β”‚
β”‚ 101c9d8d4835593c3e5802acd27cfdac β”‚
β”‚ 79d5f8339075e1ea55839b19d8598d85 β”‚
β”‚ 4d797cbb1f5abb77105290a0051ddc7f β”‚
β”‚ 47676c62de62a6140d38f834c4d5b938 β”‚
β”‚ 1a660e52f3529e8a1d022112290fe643 β”‚
β”‚ 0920be5617252aba563f252e3a4b4b5d β”‚
β”‚ a6db4f497644495e51da8dad4eb43dca β”‚
β”‚ 1faa3b6febbcf746059c3f1c93f176fb β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

But pretty quickly it breaks down:

:) select traceID from bamboozle order by service, -toUnixTimestamp(timestamp) limit 10

select traceID
  from bamboozle
 order by service, -toUnixTimestamp(timestamp)
 limit 10


Query:
select traceID from bamboozle order by service, -toUnixTimestamp(timestamp) limit 10


Received exception from server:
Code: 171, e.displayText() = DB::Exception: Block structure mismatch in QueryPipeline stream: different number of columns:
timestamp DateTime UInt32(size = 0), traceID String String(size = 0), service LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1)))
timestamp DateTime UInt32(size = 0), traceID String String(size = 0), service LowCardinality(String) ColumnLowCardinality(size = 0, UInt8(size = 0), ColumnUnique(size = 1, String(size = 1))), negate(toUnixTimestamp(timestamp)) Int64 Int64(size = 0) (version 20.4.4.18)

This is on 20.4.4.18, as you can see. On 20.1.4.14 I see:

Code: 15. DB::Exception: Received from 2400:cb00:36:1053::ac45:1815:9000. DB::Exception: Column 'negate(toUnixTimestamp(timestamp))' already exists.
bug

All 4 comments

Maybe there's a better way to achieve what I want, I'm totally open to that.

This bug was fixed in #10715. Will be available in 20.3.12, 20.4.6 and 20.1.16.

@CurtizJ, I'm still having trouble querying the table. My goal is to read the data in index order, so my query looks like this:

select traceID
  from jaeger_index_v2
 where service = 'edgeworker'
   and -toUnixTimestamp(timestamp) <= -toUnixTimestamp(now() - 60)
 order by service, -toUnixTimestamp(timestamp)
 limit 10

It looks kind of awkward, but it's an equivalent of the following query:

select traceID
  from jaeger_index_v2
 where service = 'edgeworker'
   and timestamp >= now() - 60
 order by service, -toUnixTimestamp(timestamp)
 limit 10

ClickHouse cannot figure out that it's ok to do the second one in index order, that's why I have to resort to the awkward first variant. The problem is that the awkward option errors out:

Received exception from server:
Code: 10, e.displayText() = DB::Exception: Not found column timestamp in block. There are only columns: traceID, service, negate(toUnixTimestamp(timestamp)), lessOrEquals(negate(toUnixTimestamp(timestamp)), negate(toUnixTimestamp(minus(now(), 60)))) (version 20.6.1.1)

The problem seems to only exist when ingestion is happening. After I stop the ingestion and optimize the table with final, the queries start working again.

I've rechecked v20.6.1.4050-testing and the issue seems to be gone now πŸŽ‰

Was this page helpful?
0 / 5 - 0 ratings