Currently KSQL only allows adjusting the timeframe during which late or out-of-order data can be allocated to the correct ("older") window by setting a Streams config property, which is not at all clear or obvious to a SQL user. Therefore this is an enhancement request to allow setting this value for a query with some new SQL query syntax - basically a grammar extension to specify what a Kafka Streams user knows as SessionWindows.gracePeriod(<duration>). One easy-to-grok way to present that might be to allow an optional extra parameter in the WINDOW clause, something like "WINDOW TUMBLING (SIZE 5 MINUTES UNTIL 8 HOURS)" for example.
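To make the requested behavior concrete, here is a minimal Python sketch of how a grace period decides whether a late record still lands in its ("older") tumbling window. It is a simplified model and not KSQL or Kafka Streams code: the function name `count_per_window`, its parameters, and the unit-less integer timestamps are all assumptions made for illustration.

```python
from collections import defaultdict


def count_per_window(events, size, grace):
    """Count events per tumbling window (hypothetical helper, simplified model).

    events: event timestamps, listed in arrival order.
    size:   window size, in the same unit as the timestamps.
    grace:  how long after a window's end late records are still accepted.
    Returns (counts keyed by window start, list of dropped timestamps).
    """
    counts = defaultdict(int)
    dropped = []
    stream_time = 0  # highest event timestamp observed so far
    for ts in events:
        stream_time = max(stream_time, ts)
        window_start = ts - (ts % size)
        window_end = window_start + size
        # Assumption: a window closes once stream time reaches end + grace.
        if stream_time >= window_end + grace:
            dropped.append(ts)
        else:
            counts[window_start] += 1
    return dict(counts), dropped


# Timestamps 3 and 8 arrive out of order, after newer records were seen.
arrivals = [1, 4, 12, 3, 25, 8]
for grace in (0, 10, 20):
    print(grace, count_per_window(arrivals, size=10, grace=grace))
```

With no grace both late records are dropped, and a larger grace lets them be counted in the window starting at 0, so the value changes the query result and is not just a tuning knob.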
@blueedgenick - as a side note - we were thinking about encapsulating streaming properties in the WITH clause; similar to how css attributes can be referenced in html i.e.
WITH(stream-properties: 'commit.interval.ms'='2000', 'cache.max.bytes.buffering'='10000000')
I'm not sure about the syntax ;(
This approach would mean we can capture the non-standard SQL property bits without expanding the grammar too much, keep them in one place, and also make such properties atomically referenced/encapsulated rather than sessionized.
This would be especially useful where stream-stream join timing needs to be controlled, as well as for event-emission behavior.
@bluemonk3y interesting point - perhaps there are some streaming properties which are semantically significant - like window retention - and others which are more like tuning parameters (e.g. commit intervals). the former group make sense to me as sql syntax while the second set definitely don't. (i don't consider inclusion in the WITH clause to be part of the syntax for the purpose of this discussion). That said, i'm leery of cramming too much junk into the WITH clause - there's already quite a few other things we are going to want to put in there for various purposes which have no other clean way to set them at all, whereas the streams properties at least have another mechanism available (the 'session variables' accessible via SET)
I think it makes sense to add UNTIL to the WINDOW clause
I am with @dguy and @blueedgenick . Adding UNTIL to the WINDOW clause is very intuitive in this context.
Agreed.
But I'd also add that there are further places where a user would like to have more control on how late-arriving data is being handled (not just windowed aggregations).
Any news on this?
Just adding a note on another suggested syntax for this:
SELECT regionid, COUNT(*) FROM pageviews
WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS, RETAIN FOR 2 DAYS)
WHERE UCASE(gender)='FEMALE' AND LCASE (regionid) LIKE '%_6'
GROUP BY regionid;
i like that. it also occurs to me that since we first discussed this, Kafka Streams more formally introduced the "grace period" notion, which i think is really what is wanted from the KSQL perspective, so we could adopt the same terminology here as well so as to be consistent and ease cross-over learning and usage: WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS, GRACE PERIOD 2 DAYS) ?
I love the idea of tunable retention but I'm not sure that expressing this via syntax would be ideal. I agree with @bluemonk3y about encapsulating this configuration using WITH parameters. I think we should be extremely conservative about introducing non-standard syntax.
As an aside, I think we should establish a simple set of guiding principles around what should be represented with syntax and what should be configuration (e.g. WITH properties). And personally I think these principles could be something along the lines of:
* Anything that is needed for runtime query evaluation should be expressed using syntax.
* Everything else should be configuration.
If you'll humor me for a moment with those principles in mind, they would imply that retention is independent of runtime query evaluation and therefore not suitable to be represented via syntax.
You can disregard this :)
@derekjn how would one configure multiple subqueries with different grace periods (eg in UNION ALL) using WITH?
@PeterLindner we don't currently support subqueries, but the best option would probably be to just separate your continuous queries so that you could apply one retention period to each query. Personally I don't think that the UX around this should be optimized for multiple retention periods within a single query, especially given that we don't support subqueries as of today.
Personally, I think that retention time should be part of the syntax because compared to commit.interval.ms (that is a non-functional property) the retention time (from my point of view) is a functional property that may lead to a different result.
@derekjn Can you elaborate what you mean by "runtime query evaluation"? And why do you think that retention time does not belong to this category?
Actually guys I may be talking about something totally different here :) You can disregard!
To avoid confusion, I just updated the ticket title to "grace period".
Note that in older KS versions, there was just a single parameter that unified grace period and retention time (and both were effectively always set to the same value). We only later split it up into two parameters that can be set independently.
Btw: https://github.com/confluentinc/ksql/issues/4157 duplicates this ticket.
Closing as duplicate