Will ClickHouse support the WITH clause (common table expression, CTE)? Like https://www.postgresql.org/docs/9.1/static/queries-with.html
What I mean is that with a CTE I can define a temporary dataset and then query it multiple times within a single statement. Currently, I'm running the same query twice as two subqueries: for example, 1) to find distinct IDs and then 2) to get the actual data that can be joined on those IDs.
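For illustration, the PostgreSQL-style pattern I'm after would look roughly like this (a sketch; the table events and the columns id and payload are hypothetical):
WITH matching_ids AS
(
    -- Defined once, referenced as many times as needed within the statement.
    SELECT DISTINCT id
    FROM events
    WHERE payload LIKE '%error%'
)
SELECT *
FROM events
WHERE id IN (SELECT id FROM matching_ids)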
Bump.
@simPod I am interested in the same, but I've seen it mentioned in other issues that if you have identical subqueries, ClickHouse only actually runs it once.
ClickHouse has supported the WITH clause since mid-2017, but only for scalar values.
Sample syntax:
WITH
    [1, 2, 3, 4, 5, 6, 7, 8, 9] AS numbers,
    arrayMap(x -> x * 2, numbers) AS doubled_numbers,
    arrayMap(x -> x * x, numbers) AS squared_numbers,
    arrayMap((x, y) -> x + y, doubled_numbers, squared_numbers) AS summed_numbers
SELECT summed_numbers;
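A scalar subquery can be aliased the same way; for example, this sketch (adapted from the docs) computes per-table disk usage from the built-in system.parts table:
WITH
    (
        SELECT sum(bytes)
        FROM system.parts
        WHERE active
    ) AS total_disk_usage
SELECT
    (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage,
    table
FROM system.parts
GROUP BY table
ORDER BY table_disk_usage DESC
LIMIT 10;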
@derekperkins cool, good to know, thanks!
Are there any plans to support WITH clause for non-scalar values?
Not planned for the first half of 2019.
Thanks for the update, your work is appreciated!
Is there any trick to make a temporary table work like WITH?
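Something along these lines, perhaps (a sketch; the table src and its columns are hypothetical):
-- The shared subquery runs once; the temporary table lives for the
-- session (Memory engine) and can be read any number of times.
CREATE TEMPORARY TABLE shared_ids AS
SELECT DISTINCT id
FROM src
WHERE ts >= '2019-01-01 00:00:00';

SELECT *
FROM src
WHERE id IN shared_ids;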
@alexey-milovidov is it true that
"if you have identical subqueries, ClickHouse only actually runs it once"?
Are there any preconditions for it to work?
I have created this query:
SELECT *
FROM (
SELECT (intDiv(toUInt32(`time`), 300) * 300) AS `time`,
avg(volume) AS volume
FROM (
SELECT `time`,
sum(volume) AS volume
FROM (
SELECT (intDiv(toUInt32(fetched_at), 60) * 60) AS `time`,
port_id,
avg("bits_in") / 100 AS volume
FROM counter_bit
WHERE port_id IN ('50871ea4-b52a-4179-8d2b-3a8616a0cbba', '7fcd5ffa-c4fb-4bc3-ab4e-26f47100e5ae') AND
fetched_at BETWEEN '2019-02-25 16:29:17' AND '2019-02-26 16:29:17'
GROUP BY `time`, port_id
)
GROUP BY `time`
)
GROUP BY `time`
)
CROSS JOIN (
SELECT avg(volume) AS avg,
max((`time`, volume)).2 AS current,
max((volume, `time`)) AS max,
quantile(0.95)(volume) AS p95,
avg(volume) * count() * 60 AS sum
FROM (
SELECT `time`,
sum(volume) AS volume
FROM (
SELECT `time`,
sum(volume) AS volume
FROM (
SELECT (intDiv(toUInt32(fetched_at), 60) * 60) AS `time`,
port_id,
avg("bits_in") / 100 AS volume
FROM counter_bit
WHERE port_id IN ('50871ea4-b52a-4179-8d2b-3a8616a0cbba', '7fcd5ffa-c4fb-4bc3-ab4e-26f47100e5ae') AND
fetched_at BETWEEN '2019-02-25 16:29:17' AND '2019-02-26 16:29:17'
GROUP BY `time`, port_id
)
GROUP BY `time`
UNION ALL
SELECT intDiv(
toUInt32(
toDateTime('2019-02-25 16:29:17') + number * 60
),
60
) * 60 AS `time`,
toFloat64(0) AS volume
FROM numbers(toUInt64(((toDateTime('2019-02-26 16:29:17') - toDateTime('2019-02-25 16:29:17')) / 60)))
)
GROUP BY `time`
)
)
ORDER BY `time`
where the inner SELECT is identical:
SELECT (intDiv(toUInt32(fetched_at), 60) * 60) AS `time`,
port_id,
avg("bits_in") / 100 AS volume
FROM counter_bit
WHERE port_id IN ('50871ea4-b52a-4179-8d2b-3a8616a0cbba', '7fcd5ffa-c4fb-4bc3-ab4e-26f47100e5ae') AND
fetched_at BETWEEN '2019-02-25 16:29:17' AND '2019-02-26 16:29:17'
GROUP BY `time`, port_id
But the number of rows that need to be processed is doubled, so the result is not reused.
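Aside on the max((`time`, volume)).2 trick in the query above: the maximum over tuples is taken lexicographically, so .2 extracts the volume belonging to the latest time. The same can be written more directly with argMax (a sketch against a hypothetical table t with time and volume columns):
-- argMax returns the value of the first argument on the row
-- where the second argument is maximal.
SELECT argMax(volume, `time`) AS current
FROM t;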
@derekperkins Could you please point me to the issue or a link supporting your claim that ClickHouse only actually runs identical subqueries once? Thank you
@alexey-milovidov I noticed that the WITH clause speeds up my query, which involves a big bitmap filter (cardinality 99M), from 394s to 93s. Could you explain why this happens? Thanks!
The original query:
SELECT
toDate ( order_complete_time ) AS rowData,
city_tier AS columnData,
count( DISTINCT mid_seq ),
toDecimal128 ( SUM( order_total_sales ), 0 ),
count( 1 ),
toDecimal128 ( avg( order_total_sales ), 0 ),
countIf (
1 = hasAny ( [ '2434', '1440', '2435', '4937', '3449', '3452', '1600', '1100', '1403', '3466' ], detail.p_item_code )
AND 0 = hasAll ( [ '2434', '1440', '2435', '4937', '3449', '3452', '1600', '1100', '1403', '3466' ], detail.p_item_code )
) AS addonGC,
avgIf (
order_total_sales,
(
1 = hasAny ( [ '2434', '1440', '2435', '4937', '3449', '3452', '1600', '1100', '1403', '3466' ], detail.p_item_code )
AND 0 = hasAll ( [ '2434', '1440', '2435', '4937', '3449', '3452', '1600', '1100', '1403', '3466' ], detail.p_item_code )
)
) AS addonAC,
countIf (
1 = hasAny ( [ '2434', '1440', '2435', '4937', '3449', '3452', '1600', '1100', '1403', '3466' ], detail.p_item_code )
AND 0 = hasAll ( [ '2434', '1440', '2435', '4937', '3449', '3452', '1600', '1100', '1403', '3466' ], detail.p_item_code )
) / count( 1 ) AS addonRate
FROM
dist_cdp_orders_test
WHERE
toDate ( order_complete_time ) GLOBAL IN (
SELECT
toDate ( order_complete_time ) AS ROW
FROM
dist_cdp_orders_test
WHERE
( order_complete_time >= '2019-07-01 00:00:00' AND order_complete_time <= '2019-07-31 23:59:59' )
AND ( bitmapContains ( ( SELECT mid_seqs FROM cdp_segments WHERE seg_id = '8afe10136d333333016d339320100006' LIMIT 1 ), mid_seq ) )
GROUP BY
ROW
ORDER BY
ROW DESC
LIMIT 0,
10
)
AND ( order_complete_time >= '2019-07-01 00:00:00' AND order_complete_time <= '2019-07-31 23:59:59' )
AND ( bitmapContains ( ( SELECT mid_seqs FROM cdp_segments WHERE seg_id = '8afe10136d333333016d339320100006' LIMIT 1 ), mid_seq ) )
GROUP BY
rowData,
columnData
ORDER BY
rowData DESC
Rewritten with the WITH clause:
WITH
[ '2434', '1440', '2435', '4937', '3449', '3452', '1600', '1100', '1403', '3466' ] AS item_codes,
(SELECT mid_seqs FROM cdp_segments WHERE seg_id = '8afe10136d333333016d339320100006' LIMIT 1) AS bm
SELECT
toDate ( order_complete_time ) AS rowData,
city_tier AS columnData,
count( DISTINCT mid_seq ),
toDecimal128 ( SUM( order_total_sales ), 0 ),
count( 1 ),
toDecimal128 ( avg( order_total_sales ), 0 ),
countIf (
1 = hasAny ( item_codes, detail.p_item_code )
AND 0 = hasAll ( item_codes, detail.p_item_code )
) AS addonGC,
avgIf (
order_total_sales,
(
1 = hasAny ( item_codes, detail.p_item_code )
AND 0 = hasAll ( item_codes, detail.p_item_code )
)
) AS addonAC,
countIf (
1 = hasAny ( item_codes, detail.p_item_code )
AND 0 = hasAll ( item_codes, detail.p_item_code )
) / count( 1 ) AS addonRate
FROM
dist_cdp_orders_test
WHERE
toDate ( order_complete_time ) GLOBAL IN (
WITH (SELECT mid_seqs FROM cdp_segments WHERE seg_id = '8afe10136d333333016d339320100006' LIMIT 1) AS bm
SELECT
toDate ( order_complete_time ) AS ROW
FROM
dist_cdp_orders_test
WHERE
( order_complete_time >= '2019-07-01 00:00:00' AND order_complete_time <= '2019-07-31 23:59:59' )
AND ( bitmapContains ( bm, mid_seq ) )
GROUP BY
ROW
ORDER BY
ROW DESC
LIMIT 0,
10
)
AND ( order_complete_time >= '2019-07-01 00:00:00' AND order_complete_time <= '2019-07-31 23:59:59' )
AND ( bitmapContains ( bm, mid_seq ) )
GROUP BY
rowData,
columnData
ORDER BY
rowData DESC;
This task is still relevant, but it is not assigned yet. There is a high chance that we will assign it in the coming weeks.
@alexey-milovidov
will this end up like
WITH (SELECT a, b FROM t) as t1
SELECT * FROM t1 WHERE a=1
To reuse subquery results, please save a kid who has suffered so much!!!
WITH (SELECT a, b, c FROM test) AS a_table
Temporary tables: https://clickhouse.yandex/docs/en/query_language/create/#temporary-tables
This is not a very appropriate solution.
There are no other solutions. So.
Let's help this poor kid: https://github.com/ClickHouse/ClickHouse/pull/14771
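For reference, the standard CTE form being requested throughout this thread looks like this (a sketch over a hypothetical table t):
WITH t1 AS
(
    SELECT a, b
    FROM t
)
SELECT *
FROM t1
WHERE a = 1;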
Thx.~