Use case
Somethimes, we need a predefined rows and columns like this:
ββaββ¬βbββββββ
β 1 β one β
β 2 β two β
β 3 β three β
βββββ΄ββββββββ
With CH, we can do this:
SELECT 1 AS a, 'one' AS b
UNION ALL
SELECT 2, 'two'
UNION ALL
SELECT 3, 'three'
ββaββ¬βbββββ
β 1 β one β
βββββ΄ββββββ
ββaββ¬βbββββ
β 2 β two β
βββββ΄ββββββ
ββaββ¬βbββββββ
β 3 β three β
βββββ΄ββββββββ
Or this:
SELECT *
FROM
(
SELECT
[1, 2, 3] AS a,
['one', 'two', 'three'] AS b
)
ARRAY JOIN
a,
b
ββaββ¬βbββββββ
β 1 β one β
β 2 β two β
β 3 β three β
βββββ΄ββββββββ
But considering CH query sytanx being the ritches of all, it seem like this is too much to write.
Describe the solution you'd like
PostgreSQL has a shorter way to do it: VALUES LIST:
VALUES (1, 'one'), (2, 'two'), (3, 'three');
That's it.
We can implement it with a little bit different syntax:
SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))
We can implement it with a little bit different syntax:
SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))
That would be great! Now I can do so:
SELECT *
FROM (
SELECT toUInt64(tupleElement(arr, 1)) as digit_number,
tupleElement(arr, 2) as string_number
FROM (
SELECT arrayJoin(
array(
tuple(1, 'one'),
tuple(2, 'two'),
tuple(3, 'three')
)
) as arr)
)
Less typing and no subselects:
WITH
[(1, 'one'), (2, 'two'), (3, 'three')] AS rows_array,
arrayJoin(rows_array) AS row_tuple
SELECT
row_tuple.1 AS number_decimal,
row_tuple.2 AS number_string
ββnumber_decimalββ¬βnumber_stringββ
β 1 β one β
β 2 β two β
β 3 β three β
ββββββββββββββββββ΄ββββββββββββββββ
3 rows in set. Elapsed: 0.009 sec.
-- or (to force types):
WITH
[(toUInt64(1), 'one'), (2, 'two'), (3, 'three')] AS rows_array,
arrayJoin(rows_array) AS row_tuple
SELECT
row_tuple.1 AS number_decimal,
row_tuple.2 AS number_string
BTW: creating tuple + groupArray with later array join + tuple unwrap is quite a popular operation in the absence of window functions. Maybe some syntax sugar for that could be good.
Add support for case-insensitive table functions in TableFunctionFactory similar to FunctionFactory, AggregateFunctionFactory.
Implement VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three')) as a table function.
Parse first argument as table structure; for the rest arguments - evaluate constant expressions as Field and insert them into the corresponding Columns.
Create something like StorageOneBlock that will capture single Block in constructor and return OneBlockInputStream while reading.
The syntax is unusual in that it uses table definition inside a string literal. This also makes it somewhat troublesome to parse. Can we determine column types from the first row? It would also be consistent with the above array example.
I've already thinking about it.
The syntax is unusual in that it uses table definition inside a string literal.
It is consistent with file, url, hdfs table functions.
Can we determine column types from the first row?
No. For example: (1, 'one'), (256, 'two') - will derive UInt8 from first row.
It would also be consistent with the above array example.
We can use least common supertype (the same way as it works for arrays).
But it will be unconvenient for Date and DateTime:
('2019-01-01', '2019-01-01 01:02:03')
We can use least common supertype (the same way as it works for arrays).
Determine column type from first row.
If it is not explicit, use supertype:
VALUES (1, 2.3, 'one', '2019-01-02', '2019-01-02 01:02:03)),
(2, 3.4, 'two', '2019-01-03', '2019-01-02 01:02:04'),
(3, 4.5, 'three', '2019-01-03', '2019-01-02 01:02:05')
-- inferred types: Int64, Float64, String, String
But it will be unconvenient for Date and DateTime:
('2019-01-01', '2019-01-01 01:02:03')
If it is explicitly denoted, use denoted type
VALUES (toUInt8(1), toFloat32(2.3), 'one', toDate('2019-01-02'), toDateTime('2019-01-02 01:02:03')),
(2, 3.4, 'two', '2019-01-03', '2019-01-02 01:02:04'),
(3, 4.5, 'three', '2019-01-03', '2019-01-02 01:02:05'),
-- inferred types: UInt8, Float32, Date, DateTime
Since types are inferred from only first row, we use toTYPE functions only in first row.
Alternatives for explicit types
VALUES (1, 2.3, 'one', '2019-01-02', '2019-01-02 01:02:03)),
(2, 3.4, 'two', '2019-01-03', '2019-01-02 01:02:04'),
(3, 4.5, 'three', '2019-01-03', '2019-01-02 01:02:05')
AS (a UInt8, b, Float32, c Date, d DateTime);
The following one is my favorite.
TABLE name(types..) will be optional
TABLE temp(a UInt8, b, Float32, c Date, d DateTime)
VALUES (1, 2.3, 'one', '2019-01-02', '2019-01-02 01:02:03)),
(2, 3.4, 'two', '2019-01-03', '2019-01-02 01:02:04'),
(3, 4.5, 'three', '2019-01-03', '2019-01-02 01:02:05')
It was implemented in proposed form: https://github.com/yandex/ClickHouse/issues/5984#issuecomment-513411725
The following one is my favorite.
TABLE name(types..) will be optional
That's more beautiful but adds special syntax. We still have the possibility to implement it :)
Most helpful comment
Add support for case-insensitive table functions in TableFunctionFactory similar to FunctionFactory, AggregateFunctionFactory.
Implement
VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))as a table function.Parse first argument as table structure; for the rest arguments - evaluate constant expressions as Field and insert them into the corresponding Columns.
Create something like StorageOneBlock that will capture single Block in constructor and return OneBlockInputStream while reading.