SELECT queries with windows in them should return windowing data.
This query ("1"):
SELECT userid, sum(1)
FROM clickstream
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY userid;
should return exactly what is returned from these queries ("2"):
CREATE TABLE t1 AS
SELECT userid, sum(1)
FROM clickstream
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY userid;
SELECT * FROM t1;
Currently, (1) returns:
9 | 1
5 | 4
36 | 2
24 | 5
27 | 6
and (2) returns:
1511910830000 | 28 : Window{start=1511910830000 end=-} | 28 | 3
1511910825000 | 7 : Window{start=1511910825000 end=-} | 7 | 10
1511910825000 | 8 : Window{start=1511910825000 end=-} | 8 | 9
1511910825000 | 21 : Window{start=1511910825000 end=-} | 21 | 8
1511910830000 | 26 : Window{start=1511910830000 end=-} | 26 | 4
Any update on this guys?
@hjafarpour @apurvam
@tomlinsonr just to be clear - are you talking about the columns that are returned (ROWTIME, ROWKEY), or the different data values that you're seeing (1 vs 3, 4 vs 10, etc).
@rmoff. Correct. We would like to see the ROWTIME and ROWKEY returned.
I agree, this would be a useful feature. I guess the syntax would be something like
SELECT ROWTIME, ROWKEY, userid, sum(1)
FROM clickstream
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY userid;
@tomlinsonr : We just merged the related PR https://github.com/confluentinc/ksql/pull/1993, where we provide direct access to window start time and window end time via built-in functions.
Given an existing stream such as:
CREATE STREAM TEST (ID bigint)
WITH (kafka_topic='test_topic', value_format='DELIMITED', key='ID');
You can now write something like:
CREATE STREAM s1 AS
SELECT id, WindowStart(), WindowEnd()
FROM test
WINDOW TUMBLING (SIZE 30 SECONDS)
GROUP BY id;
Could you take a look and let us know whether this would also help you here?
Most helpful comment
I agree, this would be a useful feature. I guess the syntax would be something like