Clickhouse: When I need to join twice, but Doing joins twice seems inefficient, is there another way?

Created on 4 Dec 2019  Β·  4Comments  Β·  Source: ClickHouse/ClickHouse

I'm sorry. before my question seems was not good. so I was update.

In short, I want to put context information(using mapping) in each src ip(key), dst ip(key). If some IP does not have any context, I'll put 'etc' instead of.

This mean I thought, First I have to join with src_ip and mapping list, and try one more to join dst_ip and mapping list.

I can join twice, but i jus I want to find a better way.

I just created a temporary table because I couldn't use my own table.

so I have two table. One is main table that have a lot of columns, and data continues to accumulate in real time. In here I make temporary table like below for asking. Use this default.test_src_dst table instead of the main table.

:) select src, dst from default.test_src_dst

SELECT
    src,
    dst
FROM default.test_src_dst

β”Œβ”€src─────┬─dst─────┐
β”‚ 1.1.1.1 β”‚ 4.4.4.4 β”‚
β”‚ 2.2.2.2 β”‚ 5.5.5.5 β”‚
β”‚ 3.3.3.3 β”‚ 6.6.6.6 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3 rows in set. Elapsed: 0.002 sec.

And, Create a batch of statistics tables for mapping and join the two tables. The table is default.test_mapping

:) select ip_list, context_list from test_mapping

SELECT
    ip_list,
    context_list
FROM test_mapping

β”Œβ”€ip_list─┬─context_list─┐
β”‚ 1.1.1.1 β”‚ client       β”‚
β”‚ 2.2.2.2 β”‚ server       β”‚
β”‚ 3.3.3.3 β”‚ client       β”‚
β”‚ 4.4.4.4 β”‚ server       β”‚
β”‚ 5.5.5.5 β”‚ client       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

5 rows in set. Elapsed: 0.003 sec.

The alternatives I tried :

  • method1
:) select a.src AS src, a.dst as dst, if(b.context_list=='', 'etc', b.context_list) as src_context, if(c.context_list=='', 'etc', c.context_list) as dst_context from test_src_dst a  any left join test_mapping b on a.src=b.ip_list any left join test_mapping c on a.dst=c.ip_list

SELECT
    a.src AS src,
    a.dst AS dst,
    if(b.context_list = '', 'etc', b.context_list) AS src_context,
    if(c.context_list = '', 'etc', c.context_list) AS dst_context
FROM test_src_dst AS a
ANY LEFT JOIN test_mapping AS b ON a.src = b.ip_list
ANY LEFT JOIN test_mapping AS c ON a.dst = c.ip_list

β”Œβ”€src─────┬─dst─────┬─src_context─┬─dst_context─┐
β”‚ 1.1.1.1 β”‚ 4.4.4.4 β”‚ client      β”‚ server      β”‚
β”‚ 2.2.2.2 β”‚ 5.5.5.5 β”‚ server      β”‚ client      β”‚
β”‚ 3.3.3.3 β”‚ 6.6.6.6 β”‚ client      β”‚ etc         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3 rows in set. Elapsed: 0.008 sec.
  • method2
:) with     if(context_list=='', 'etc', context_list) as src_context       ,  if(c.context_list=='', 'etc', c.context_list) as dst_context   select src, dst, src_context, dst_context from ( select * from test_src_dst a  any left join test_mapping b on a.src=b.ip_list ) d any left join test_mapping c on d.dst=c.ip_list

WITH
    if(context_list = '', 'etc', context_list) AS src_context,
    if(c.context_list = '', 'etc', c.context_list) AS dst_context
SELECT
    src,
    dst,
    src_context,
    dst_context
FROM
(
    SELECT *
    FROM test_src_dst AS a
    ANY LEFT JOIN test_mapping AS b ON a.src = b.ip_list
) AS d
ANY LEFT JOIN test_mapping AS c ON d.dst = c.ip_list

β”Œβ”€src─────┬─dst─────┬─src_context─┬─dst_context─┐
β”‚ 1.1.1.1 β”‚ 4.4.4.4 β”‚ client      β”‚ server      β”‚
β”‚ 2.2.2.2 β”‚ 5.5.5.5 β”‚ server      β”‚ client      β”‚
β”‚ 3.3.3.3 β”‚ 6.6.6.6 β”‚ client      β”‚ etc         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3 rows in set. Elapsed: 0.011 sec.
  • method3
:) with     (     select groupArray(ip_list), groupArray(context_list)      from test_mapping     ) as a     , a.1 as ip_arr     , a.2 as context_arr     , context_arr[indexOf(ip_arr, src)] as _src_context     , context_arr[indexOf(ip_arr, dst)] as _dst_context     , if(_src_context='', 'etc', _src_context) as src_context     , if(_dst_context='', 'etc', _dst_context) as dst_context select src, dst, src_context, dst_context from test_src_dst

WITH

    (
        SELECT
            groupArray(ip_list),
            groupArray(context_list)
        FROM test_mapping
    ) AS a,
    a.1 AS ip_arr,
    a.2 AS context_arr,
    context_arr[indexOf(ip_arr, src)] AS _src_context,
    context_arr[indexOf(ip_arr, dst)] AS _dst_context,
    if(_src_context = '', 'etc', _src_context) AS src_context,
    if(_dst_context = '', 'etc', _dst_context) AS dst_context
SELECT
    src,
    dst,
    src_context,
    dst_context
FROM test_src_dst

β”Œβ”€src─────┬─dst─────┬─src_context─┬─dst_context─┐
β”‚ 1.1.1.1 β”‚ 4.4.4.4 β”‚ client      β”‚ server      β”‚
β”‚ 2.2.2.2 β”‚ 5.5.5.5 β”‚ server      β”‚ client      β”‚
β”‚ 3.3.3.3 β”‚ 6.6.6.6 β”‚ client      β”‚ etc         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3 rows in set. Elapsed: 0.010 sec.

I like method3, but It's not work my real table, as i mention before my table so big, so It did not work above format. I just want to make query more compact, Because the original query is also very complex.
If I use a real table (has columns a lot), only method 2 will produce the right result.


When I try to join with query (method3) I got like this error

Code: 179, e.displayText() = DB::Exception: Different expressions with the same alias make_context:
(CAST(['server', 'server', 'client', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'client', 'server', 'server', 'client', 'client', 'client', 'server', 'client', 'server', 'server', 'client', 'client', 'server', 'client', 'server'], 'Array(String)'), CAST([{some IP array}], 'Array(String)')) AS make_context
and
(CAST(['server', 'server', 'client', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'server', 'server', 'server', 'server', 'client', 'server', 'server', 'server', 'client', 'server', 'server', 'client', 'client', 'client', 'server', 'client', 'server', 'server', 'client', 'client', 'server', 'client', 'server'], 'Array(String)'), CAST([some IP array], 'Array(String)')) AS make_context
(version 19.11.2.7 (official build))

question st-need-info

All 4 comments

There is no other way with this definition of the task.
I believe you don't need join at all. Use a wide table or external dictionaries.

I'm sorry. It seems my question was not good.
I just created a temporary table because I couldn't use my own table.

In short, I want to put context information in each src ip(key), dst ip(key). If some IP does not have any context, I'll put 'etc' instead of.

This mean I thought, First I have to join with src_ip and mapping list, and try one more to join dst_ip and mapping list.

I can join twice, but i jus I want to find a better way.

so I have two table. One is main table that have a lot of columns, and data continues to accumulate in real time. In here I make temporary table like below for asking. Use this default.test_src_dst table instead of the main table.

:) select src, dst from default.test_src_dst

SELECT
    src,
    dst
FROM default.test_src_dst

β”Œβ”€src─────┬─dst─────┐
β”‚ 1.1.1.1 β”‚ 4.4.4.4 β”‚
β”‚ 2.2.2.2 β”‚ 5.5.5.5 β”‚
β”‚ 3.3.3.3 β”‚ 6.6.6.6 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3 rows in set. Elapsed: 0.002 sec.

And, Create a batch of statistics tables for mapping and join the two tables. The table is default.test_mapping

:) select ip_list, context_list from test_mapping

SELECT
    ip_list,
    context_list
FROM test_mapping

β”Œβ”€ip_list─┬─context_list─┐
β”‚ 1.1.1.1 β”‚ client       β”‚
β”‚ 2.2.2.2 β”‚ server       β”‚
β”‚ 3.3.3.3 β”‚ client       β”‚
β”‚ 4.4.4.4 β”‚ server       β”‚
β”‚ 5.5.5.5 β”‚ client       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

5 rows in set. Elapsed: 0.003 sec.

And I tried like this

method1

:) select a.src AS src, a.dst as dst, if(b.context_list=='', 'etc', b.context_list) as src_context, if(c.context_list=='', 'etc', c.context_list) as dst_context from test_src_dst a  any left join test_mapping b on a.src=b.ip_list any left join test_mapping c on a.dst=c.ip_list

SELECT
    a.src AS src,
    a.dst AS dst,
    if(b.context_list = '', 'etc', b.context_list) AS src_context,
    if(c.context_list = '', 'etc', c.context_list) AS dst_context
FROM test_src_dst AS a
ANY LEFT JOIN test_mapping AS b ON a.src = b.ip_list
ANY LEFT JOIN test_mapping AS c ON a.dst = c.ip_list

β”Œβ”€src─────┬─dst─────┬─src_context─┬─dst_context─┐
β”‚ 1.1.1.1 β”‚ 4.4.4.4 β”‚ client      β”‚ server      β”‚
β”‚ 2.2.2.2 β”‚ 5.5.5.5 β”‚ server      β”‚ client      β”‚
β”‚ 3.3.3.3 β”‚ 6.6.6.6 β”‚ client      β”‚ etc         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3 rows in set. Elapsed: 0.008 sec.

method2

:) with     if(context_list=='', 'etc', context_list) as src_context       ,  if(c.context_list=='', 'etc', c.context_list) as dst_context   select src, dst, src_context, dst_context from ( select * from test_src_dst a  any left join test_mapping b on a.src=b.ip_list ) d any left join test_mapping c on d.dst=c.ip_list

WITH
    if(context_list = '', 'etc', context_list) AS src_context,
    if(c.context_list = '', 'etc', c.context_list) AS dst_context
SELECT
    src,
    dst,
    src_context,
    dst_context
FROM
(
    SELECT *
    FROM test_src_dst AS a
    ANY LEFT JOIN test_mapping AS b ON a.src = b.ip_list
) AS d
ANY LEFT JOIN test_mapping AS c ON d.dst = c.ip_list

β”Œβ”€src─────┬─dst─────┬─src_context─┬─dst_context─┐
β”‚ 1.1.1.1 β”‚ 4.4.4.4 β”‚ client      β”‚ server      β”‚
β”‚ 2.2.2.2 β”‚ 5.5.5.5 β”‚ server      β”‚ client      β”‚
β”‚ 3.3.3.3 β”‚ 6.6.6.6 β”‚ client      β”‚ etc         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3 rows in set. Elapsed: 0.011 sec.

method3

:) with     (     select groupArray(ip_list), groupArray(context_list)      from test_mapping     ) as a     , a.1 as ip_arr     , a.2 as context_arr     , context_arr[indexOf(ip_arr, src)] as _src_context     , context_arr[indexOf(ip_arr, dst)] as _dst_context     , if(_src_context='', 'etc', _src_context) as src_context     , if(_dst_context='', 'etc', _dst_context) as dst_context select src, dst, src_context, dst_context from test_src_dst

WITH

    (
        SELECT
            groupArray(ip_list),
            groupArray(context_list)
        FROM test_mapping
    ) AS a,
    a.1 AS ip_arr,
    a.2 AS context_arr,
    context_arr[indexOf(ip_arr, src)] AS _src_context,
    context_arr[indexOf(ip_arr, dst)] AS _dst_context,
    if(_src_context = '', 'etc', _src_context) AS src_context,
    if(_dst_context = '', 'etc', _dst_context) AS dst_context
SELECT
    src,
    dst,
    src_context,
    dst_context
FROM test_src_dst

β”Œβ”€src─────┬─dst─────┬─src_context─┬─dst_context─┐
β”‚ 1.1.1.1 β”‚ 4.4.4.4 β”‚ client      β”‚ server      β”‚
β”‚ 2.2.2.2 β”‚ 5.5.5.5 β”‚ server      β”‚ client      β”‚
β”‚ 3.3.3.3 β”‚ 6.6.6.6 β”‚ client      β”‚ etc         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3 rows in set. Elapsed: 0.010 sec.

I like method1, 3, but It's not work my real table, as i mention before my table so big, so It did not work above format. I just want to make query more compact, Because the original query is also very complex

If I use a real table (has columns a lot), only method 2 will produce the right result.

Could you make a sample to show it? Your sample returns the same result for each method.

If I use a real table (has columns a lot), only method 2 will produce the right result.

Could you make a sample to show it? Your sample returns the same result for each method.

It's works.
Before i got Error like this

Code: 179, e.displayText() = DB::Exception: Different expressions with the same alias make_context , Even Using same query, method3 query Sometime was work, Sometime was not

but when I change from a global table to a local (single) table, the query in method3 also seems to work. Thank you.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

opavader picture opavader  Β·  3Comments

bseng picture bseng  Β·  3Comments

fizerkhan picture fizerkhan  Β·  3Comments

SaltTan picture SaltTan  Β·  3Comments

innerr picture innerr  Β·  3Comments