Clickhouse: Implement VALUES List

Created on 11 Jul 2019 · 8Comments · Source: ClickHouse/ClickHouse

Use case
Somethimes, we need a predefined rows and columns like this:

┌─a─┬─b─────┐
│ 1 │ one   │
│ 2 │ two   │
│ 3 │ three │
└───┴───────┘

With CH, we can do this:

SELECT 1 AS a, 'one' AS b
UNION ALL
SELECT 2, 'two'
UNION ALL
SELECT 3, 'three'

┌─a─┬─b───┐
│ 1 │ one │
└───┴─────┘
┌─a─┬─b───┐
│ 2 │ two │
└───┴─────┘
┌─a─┬─b─────┐
│ 3 │ three │
└───┴───────┘

Or this:

SELECT *
FROM
(
    SELECT
        [1, 2, 3] AS a,
        ['one', 'two', 'three'] AS b
)
ARRAY JOIN
    a,
    b

┌─a─┬─b─────┐
│ 1 │ one   │
│ 2 │ two   │
│ 3 │ three │
└───┴───────┘

But considering CH query sytanx being the ritches of all, it seem like this is too much to write.

Describe the solution you'd like
PostgreSQL has a shorter way to do it: VALUES LIST:

VALUES (1, 'one'), (2, 'two'), (3, 'three');

That's it.

feature

Source

ramazanpolat

Most helpful comment

Add support for case-insensitive table functions in TableFunctionFactory similar to FunctionFactory, AggregateFunctionFactory.

Implement VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three')) as a table function.

Parse first argument as table structure; for the rest arguments - evaluate constant expressions as Field and insert them into the corresponding Columns.

Create something like StorageOneBlock that will capture single Block in constructor and return OneBlockInputStream while reading.

alexey-milovidov on 20 Jul 2019

👍3

All 8 comments

We can implement it with a little bit different syntax:

SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))

alexey-milovidov on 12 Jul 2019

👍2

We can implement it with a little bit different syntax:
SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))

That would be great! Now I can do so:

SELECT *
FROM (
         SELECT toUInt64(tupleElement(arr, 1)) as digit_number,
                tupleElement(arr, 2)           as string_number
         FROM (
               SELECT arrayJoin(
                              array(
                                      tuple(1, 'one'),
                                      tuple(2, 'two'),
                                      tuple(3, 'three')
                                  )
                          ) as arr)
         )

sklyar on 12 Jul 2019

👍2

Less typing and no subselects:

WITH 
    [(1, 'one'), (2, 'two'), (3, 'three')] AS rows_array, 
    arrayJoin(rows_array) AS row_tuple
SELECT 
    row_tuple.1 AS number_decimal, 
    row_tuple.2 AS number_string

┌─number_decimal─┬─number_string─┐
│              1 │ one           │
│              2 │ two           │
│              3 │ three         │
└────────────────┴───────────────┘

3 rows in set. Elapsed: 0.009 sec.

-- or (to force types):

WITH 
    [(toUInt64(1), 'one'), (2, 'two'), (3, 'three')] AS rows_array, 
    arrayJoin(rows_array) AS row_tuple
SELECT 
    row_tuple.1 AS number_decimal, 
    row_tuple.2 AS number_string

BTW: creating tuple + groupArray with later array join + tuple unwrap is quite a popular operation in the absence of window functions. Maybe some syntax sugar for that could be good.

filimonov on 13 Jul 2019

Add support for case-insensitive table functions in TableFunctionFactory similar to FunctionFactory, AggregateFunctionFactory.

Implement VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three')) as a table function.

Parse first argument as table structure; for the rest arguments - evaluate constant expressions as Field and insert them into the corresponding Columns.

Create something like StorageOneBlock that will capture single Block in constructor and return OneBlockInputStream while reading.

alexey-milovidov on 20 Jul 2019

👍3

The syntax is unusual in that it uses table definition inside a string literal. This also makes it somewhat troublesome to parse. Can we determine column types from the first row? It would also be consistent with the above array example.

akuzm on 26 Jul 2019

I've already thinking about it.

The syntax is unusual in that it uses table definition inside a string literal.

It is consistent with file, url, hdfs table functions.

Can we determine column types from the first row?

No. For example: (1, 'one'), (256, 'two') - will derive UInt8 from first row.

It would also be consistent with the above array example.

We can use least common supertype (the same way as it works for arrays).
But it will be unconvenient for Date and DateTime:
('2019-01-01', '2019-01-01 01:02:03')

alexey-milovidov on 26 Jul 2019

We can use least common supertype (the same way as it works for arrays).

Determine column type from first row.

If it is not explicit, use supertype:

VALUES (1, 2.3, 'one', '2019-01-02', '2019-01-02 01:02:03)), 
       (2, 3.4, 'two', '2019-01-03', '2019-01-02 01:02:04'),
       (3, 4.5, 'three', '2019-01-03', '2019-01-02 01:02:05')
-- inferred types: Int64, Float64, String, String

But it will be unconvenient for Date and DateTime:
('2019-01-01', '2019-01-01 01:02:03')

If it is explicitly denoted, use denoted type

VALUES (toUInt8(1), toFloat32(2.3), 'one', toDate('2019-01-02'), toDateTime('2019-01-02 01:02:03')), 
       (2, 3.4, 'two', '2019-01-03', '2019-01-02 01:02:04'),
       (3, 4.5, 'three', '2019-01-03', '2019-01-02 01:02:05'),
-- inferred types: UInt8, Float32, Date, DateTime

Since types are inferred from only first row, we use toTYPE functions only in first row.

Alternatives for explicit types

VALUES (1, 2.3, 'one', '2019-01-02', '2019-01-02 01:02:03)), 
       (2, 3.4, 'two', '2019-01-03', '2019-01-02 01:02:04'),
       (3, 4.5, 'three', '2019-01-03', '2019-01-02 01:02:05')
AS (a UInt8, b, Float32, c Date, d DateTime);

The following one is my favorite.

TABLE name(types..) will be optional

TABLE temp(a UInt8, b, Float32, c Date, d DateTime)
VALUES (1, 2.3, 'one', '2019-01-02', '2019-01-02 01:02:03)), 
       (2, 3.4, 'two', '2019-01-03', '2019-01-02 01:02:04'),
       (3, 4.5, 'three', '2019-01-03', '2019-01-02 01:02:05')

ramazanpolat on 26 Jul 2019

It was implemented in proposed form: https://github.com/yandex/ClickHouse/issues/5984#issuecomment-513411725