Clickhouse: One Way to Bypass Memory Limit for Join ?

Created on 22 Mar 2019 · 2Comments · Source: ClickHouse/ClickHouse

Based on the document on join and global join, clickhouse seems to cache the right table in memory and then perform join.

I am running clickhouse - 19.4.1.3 in docker with 13 GB memory. I hit a memory limit for this query

select *
from small_table
JOIN large_table
using nid
INTO OUTFILE 'out.csv' FORMAT CSVWithNames;

But I am able to complete it if I switch the table

select *
from large_table 
JOIN small_table
using nid
INTO OUTFILE 'out.csv' FORMAT CSVWithNames;

The result is written into file on the fly. I have no issues if I run either query in sqlite. Is it possible to internally switch the right/left tables ?

question

Source

yu45020

Most helpful comment

I have no issues if I run either query in sqlite.

sqlite is OLTP database and able to use NestedLoop (and index access)
CH is OLAP database and always uses HashJoin (and has no index access)
So CH always puts the right table into memory.

Is it possible to internally switch the right/left tables

No. CH always puts the right table into memory. Almost all queries look like from HUGETABLE left outer join SMALLTABLE
IMHO: CH does not have an cost-based-optimizer and does not need one because HUGETABLE is the left by nature.

den-crane on 22 Mar 2019

👍3

All 2 comments

I have no issues if I run either query in sqlite.

sqlite is OLTP database and able to use NestedLoop (and index access)
CH is OLAP database and always uses HashJoin (and has no index access)
So CH always puts the right table into memory.