How is JOIN implemented in ClickHouse?
There is a Join.cpp file under the Interpreters directory, but it looks like all the join implementation details just insert data into blocks. Where is the main algorithmic part of the join?
Best Regards,
Hao
The right side of the JOIN is collected into a hash table in memory.
Then, while reading from the left side of the JOIN, we do lookups in that hash table to find the rows to be joined.
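A minimal sketch of this build/probe scheme (a classic hash join), assuming an inner equi-join on a single key column with rows represented as Python dicts. It only illustrates the general technique described above; it is not ClickHouse's actual code, which is C++ and supports many JOIN kinds and key types. The names `hash_join`, `orders` and `users` are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Hypothetical inner hash join: build on `right`, probe with `left`."""
    # Build phase: load the whole right side into an in-memory hash table,
    # mapping each key value to every right-side row with that key.
    table = defaultdict(list)
    for rrow in right:
        table[rrow[key]].append(rrow)

    # Probe phase: stream the left side row by row and look up matches.
    # .get() avoids inserting empty buckets for keys with no match.
    result = []
    for lrow in left:
        for rrow in table.get(lrow[key], []):
            result.append({**lrow, **rrow})
    return result


orders = [
    {"user_id": 1, "amount": 10},
    {"user_id": 2, "amount": 5},
    {"user_id": 1, "amount": 7},
]
users = [{"user_id": 1, "name": "a"}, {"user_id": 3, "name": "c"}]

print(hash_join(orders, users, "user_id"))
# [{'user_id': 1, 'amount': 10, 'name': 'a'}, {'user_id': 1, 'amount': 7, 'name': 'a'}]
```

Only the right side has to fit in memory; the left side is streamed, which is why the choice of which table goes on the right matters.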
Normally, in an RDBMS the optimizer takes care of the JOIN order (left, right). In ClickHouse, the table with low selectivity (the one contributing fewer rows) should be placed on the right side of the JOIN, because the right side is the one held in memory as a hash table; performance will be significantly better.
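To make the join-order advice concrete, here is a small sketch assuming that "low selectivity" here means the side that contributes fewer rows. It builds only the in-memory hash table (the build phase of a hash join) for each possible right side and counts how many rows must be held in memory. The helper names `build_hash_table` and `rows_held_in_memory` and the toy tables are hypothetical illustrations, not ClickHouse APIs.

```python
from collections import defaultdict


def build_hash_table(rows, key):
    # Hypothetical build phase: the right side of the JOIN is fully
    # materialized in memory, so the table grows with its row count.
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def rows_held_in_memory(table):
    # Total number of rows stored across all hash buckets.
    return sum(len(bucket) for bucket in table.values())


big = [{"id": i % 100, "v": i} for i in range(10_000)]
small = [{"id": i, "name": f"n{i}"} for i in range(100)]

# Small table on the right: only 100 rows are kept in the hash table.
print(rows_held_in_memory(build_hash_table(small, "id")))  # 100
# Big table on the right: all 10,000 rows are kept in the hash table.
print(rows_held_in_memory(build_hash_table(big, "id")))  # 10000
```

Either order produces the same joined rows for an inner join; what changes is the memory footprint of the build side, which is why putting the smaller table on the right pays off when no optimizer reorders the JOIN for you.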
https://github.com/yandex/ClickHouse/blob/e1271ae1f2a9bfb635ef83b23e48e9791195cce3/dbms/src/Interpreters/Join.h#L65-L120