Clickhouse: Access outer fields in subquery

Created on 21 Nov 2019 · 4Comments · Source: ClickHouse/ClickHouse

Hi, how can I query something like this (like in MySQL) with 2 MergeTree tables
SELECT TABLE_A.col1_from_a, (SELECT col1_from_b FROM TABLE_B WHERE col2_from_b = TABLE_A.col2_from_a LIMIT 1) AS some_alias FROM TABLE_A WHERE <some filter from TABLE_A>

It queries a field from TABLE_B using a field from TABLE_A.
I tried using JOIN like this
SELECT TABLE_A.col1_from_a, TABLE_B.col1_from_b FROM TABLE_A ANY LEFT JOIN TABLE_B ON TABLE_A.col2_from_a = TABLE_B.col2_from_b WHERE <some filter from TABLE_A>
but it showed processing all the rows from TABLE_B (tens of millions of rows).
col2_from_a is the sorting key of TABLE_A, col2_from_b is the sorting key of TABLE_B.
Any suggestion would be appreciated.

question

Source

qza1800

All 4 comments

Hi, how can I query something like this

CH does not support correlated subqueries.

I tried using JOIN like this
but it showed processing all the rows from TABLE_B (tens of millions of rows).

It's actually the best way (HashJoin) to solve this on BigData.

col2_from_a is the sorting key of TABLE_A, col2_from_b

HashJoin does not use sorting keys

Try

SELECT TABLE_A.col1_from_a, TABLE_B.col1_from_b 
   FROM TABLE_A 
   ANY LEFT JOIN (select col2_from_b TABLE_B where col2_from_b in 
      (select col2_from_a from TABLE_A where <some filter from TABLE_A>))
   ON TABLE_A.col2_from_a = TABLE_B.col2_from_b 
WHERE <some filter from TABLE_A>

Or you can use external dictionary with source=TABLE_B.
Though such dictionary (tens of millions of rows) could use 5-100GB RAM.

den-crane on 21 Nov 2019

❤1

6697

filimonov on 21 Nov 2019

👍1

@den-crane Thank you so much for your useful information.
Actually I thought about the query you suggested, but I'm trying to create a View which <some filter from TABLE_A> is coming later.

qza1800 on 25 Nov 2019