Hi, how can I query something like this (like in MySQL) with 2 MergeTree tables
SELECT
TABLE_A.col1_from_a,
(SELECT col1_from_b FROM TABLE_B WHERE col2_from_b = TABLE_A.col2_from_a LIMIT 1) AS some_alias
FROM TABLE_A
WHERE <some filter from TABLE_A>
It queries a field from TABLE_B using a field from TABLE_A.
I tried using JOIN like this
SELECT
TABLE_A.col1_from_a,
TABLE_B.col1_from_b
FROM TABLE_A
ANY LEFT JOIN TABLE_B ON TABLE_A.col2_from_a = TABLE_B.col2_from_b
WHERE <some filter from TABLE_A>
but it showed processing all the rows from TABLE_B (tens of millions of rows).
col2_from_a is the sorting key of TABLE_A, col2_from_b is the sorting key of TABLE_B.
Any suggestion would be appreciated.
Hi, how can I query something like this
CH does not support correlated subqueries.
I tried using JOIN like this
but it showed processing all the rows from TABLE_B (tens of millions of rows).
It's actually the best way (HashJoin) to solve this on BigData.
col2_from_a is the sorting key of TABLE_A, col2_from_b
HashJoin does not use sorting keys
Try
SELECT TABLE_A.col1_from_a, TABLE_B.col1_from_b
FROM TABLE_A
ANY LEFT JOIN (select col2_from_b TABLE_B where col2_from_b in
(select col2_from_a from TABLE_A where <some filter from TABLE_A>))
ON TABLE_A.col2_from_a = TABLE_B.col2_from_b
WHERE <some filter from TABLE_A>
Or you can use external dictionary with source=TABLE_B.
Though such dictionary (tens of millions of rows) could use 5-100GB RAM.
@den-crane Thank you so much for your useful information.
Actually I thought about the query you suggested, but I'm trying to create a View which <some filter from TABLE_A> is coming later.
6697
So I think I should close this issue here. Thank you.