The following query should work:
SELECT
TIMESTAMPTOSTRING(ROWTIME, 'yyyy'),
sum(1)
FROM clickstream
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY TIMESTAMPTOSTRING(ROWTIME, 'yyyy');
Currently, it fails with:
Query terminated
@tomlinsonr currently, GROUP BY only accepts column names.
To clarify, we would like group by to work with scalar functions as well, essentially we are asking this question in the context of a BI tool permitting business users to perform ad-hoc analysis on kafka streams and tables.
Any update on this one guys? This is a very common use case in BI. User will create a derived field (e.g. foo = TIMESTAMPTOSTRING(ROWTIME, 'yyyy') on the fly. They will then use this is an analytic SQL statement that will GROUP BY foo. We want to enable this without materializing the function in a stream / table first.
@miguno - any way to get this moving?
@apurvam @hjafarpour
Working on a a fix... duplicate of #1849?
Fixed. You can now group by scalar functions. You can also, optionally, use the _exact_ same clause in the select statement. So these are now supported:
-- with the group by in the select statement:
SELECT
TIMESTAMPTOSTRING(ROWTIME, 'yyyy'),
sum(1)
FROM clickstream
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY TIMESTAMPTOSTRING(ROWTIME, 'yyyy');
-- without the group by in the select statement:
SELECT
COUNT(*)
FROM clickstream
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY TIMESTAMPTOSTRING(ROWTIME, 'yyyy');
Most helpful comment
Fixed. You can now group by scalar functions. You can also, optionally, use the _exact_ same clause in the select statement. So these are now supported: