Ksql: Use a deterministic name that is consistent across KSQL servers for aggregation state stores

Created on 13 Mar 2018 · 5 Comments · Source: confluentinc/ksql

Currently, state stores for windowed aggregations (and their underlying topics) are named "KSQL_Agg_Query_" + System.currentTimeMillis(). This name is determined when building the streams topology for the query, which means the name is most likely different any time the topology is built. This means different KSQL servers (or even one server across reboots) won't share a changelog topic, which would cause us to lose data after a rebalance.
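To make the problem concrete, here is a minimal sketch of the two naming schemes. Only the `"KSQL_Agg_Query_"` prefix and the use of `System.currentTimeMillis()` come from the description above. The class name `StoreNames`, both method names, and the `queryId` parameter are hypothetical, and the sketch assumes a query id that is identical on every server, which is not something KSQL is confirmed to provide here.

```java
// Illustrative only: not the actual KSQL implementation.
final class StoreNames {
    // Prefix quoted in the issue description.
    private static final String PREFIX = "KSQL_Agg_Query_";

    // Behaviour described in the issue: the suffix depends on the moment
    // the topology is built, so two builds generally produce two names.
    static String timestampedName() {
        return PREFIX + System.currentTimeMillis();
    }

    // Hypothetical alternative: derive the suffix from a query id that is
    // assumed to be the same on every server and across restarts.
    static String deterministicName(String queryId) {
        return PREFIX + queryId;
    }

    public static void main(String[] args) {
        // Depends on the wall clock at build time.
        System.out.println(timestampedName());
        // Depends only on the (assumed stable) query id.
        System.out.println(deterministicName("query_1"));
    }
}
```

Because the state store name also determines the changelog topic name, only the second scheme lets a restarted or rebalanced server find the topic that an earlier topology build wrote to.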

bug

All 5 comments

I'm looking at what ACLs are needed on the Kafka cluster for KSQL, and having non-deterministic topic names means you have to give KSQL produce and consume rights on _every_ topic in the cluster - not ideal!

So I'm a big +1 on this - ideally for GA

Once this is fixed we should update the integration test: SecureIntegrationTest.shouldRunQueryWithChangeLogsOnKafkaClusterWithCorrectAcls() and docs

Question: as a short-term workaround would it be possible to use wildcards for ACLs, assuming that KSQL has deterministic topic prefixing at least?

@rodesai to differentiate different servers, could we use server ids as a suffix?

@guozhangwang the intention here is to differentiate different queries, not servers. Generating consistent unique query ids across servers is a bigger problem that I'll open another issue for. For now, let's use this issue to track omitting the timestamp from windowed aggregates, as we already do for non-windowed aggregates.
