Based on the document on join and global join, clickhouse seems to cache the right table in memory and then perform join.
I am running clickhouse - 19.4.1.3 in docker with 13 GB memory. I hit a memory limit for this query
select *
from small_table
JOIN large_table
using nid
INTO OUTFILE 'out.csv' FORMAT CSVWithNames;
But I am able to complete it if I switch the table
select *
from large_table
JOIN small_table
using nid
INTO OUTFILE 'out.csv' FORMAT CSVWithNames;
The result is written into file on the fly. I have no issues if I run either query in sqlite. Is it possible to internally switch the right/left tables ?
I have no issues if I run either query in sqlite.
sqlite is OLTP database and able to use NestedLoop (and index access)
CH is OLAP database and always uses HashJoin (and has no index access)
So CH always puts the right table into memory.
Is it possible to internally switch the right/left tables
No. CH always puts the right table into memory. Almost all queries look like from HUGETABLE left outer join SMALLTABLE
IMHO: CH does not have an cost-based-optimizer and does not need one because HUGETABLE is the left by nature.
I have no ideas on database structures, but thank you very much !
Most helpful comment
sqlite is OLTP database and able to use NestedLoop (and index access)
CH is OLAP database and always uses HashJoin (and has no index access)
So CH always puts the right table into memory.
No. CH always puts the right table into memory. Almost all queries look like
from HUGETABLE left outer join SMALLTABLEIMHO: CH does not have an cost-based-optimizer and does not need one because HUGETABLE is the left by nature.