Ksql: Support non-key joins.

Created on 3 Feb 2020  路  17Comments  路  Source: confluentinc/ksql

With https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable complete, we should be able to enhance ksqlDB to support non-key joins!

engine enhancement

Most helpful comment

@big-andy-coates I was actually just about to email you about this, as I see you've done a lot of work on the joins to date. I contributed the PR (with some help from many fine people) for KIP-213 mentioned above, and I've been looking at how to get this into KSQL. I've been looking under the hood of KSQL and I admit that I don't have a firm handle on it yet, and was wondering where this sort of work sits in terms of priority for the Confluent folks working on this product. I'm interested in helping if I can, but I admit that I know next to nothing about KSQL's engine.

Thanks Andy!

All 17 comments

@big-andy-coates , @vpapavas Is support for non key table join coming to ksqlDB in 0.8.0 release ? We have a project going into production in second quarter of this year and its heavily dependent on table joins which always dont share same primary key. Without this feature GlobalKTable and custom kafka streams may be our only option (which we were trying to avoid with ksql platform)

Hi @entechlog, unfortunately foreign key joins are not coming in 0.8.0. How about creating additional tables that have as primary key the join key you are interested in? Simulate the behavior of secondary indeces basically by creating extra tables by hand that have the schema you need for each join you are interested in,

@vpapavas ,
I don't think creating an additional table will always help, Unless I am missing something. Please see below example.

Here I need to join the orders table every time when an order comes in with user table and both has different PK's. I can't make USER_ID as PK in orders table, then it will retain only one order record, Unless the PK's are same ksqlDB is not letting to join the tables.

TBL_ORDERS
_ORDER_ID - Primary Key_
ITEM_ID
ITEM_DESC
QUANTITY
PRICE
USER_ID

TBL_USER
_USER_ID - Primary Key_
USER_NAME
USER_EMAIL
USER_PHONE

@entechlog you have to convert your orders table to a stream partitioned by user ID, then you can perform a stream - table join on user ID, repartition back to order ID and convert back to a table
Note however that changes to the user table would not apply to older orders, if you need that I believe the only way to do this is to use the collect_set udaf to collect all order ids grouped by user ID (yields a table), perform a stream-table join with a user stream to get a message for each user modification and then explode an repetition on the collected order ids to propagate the modifications to the orders, then join that stream with the original orders table and insert into the orders stream and again create a table from it.
I hope you could follow :wink: note that I haven't tried that myself yet, but I will have to do something similar for my use case once I get my data into Kafka, so please tell me if it worked.

@PeterLindner, Thanks for the input. We are trying several things and will keep you posted on what we end up doing. The example was just to share the idea but we are dealing with much complex with multiple self joins and self joins based on result from aggregation of all historical data.

@big-andy-coates I was actually just about to email you about this, as I see you've done a lot of work on the joins to date. I contributed the PR (with some help from many fine people) for KIP-213 mentioned above, and I've been looking at how to get this into KSQL. I've been looking under the hood of KSQL and I admit that I don't have a firm handle on it yet, and was wondering where this sort of work sits in terms of priority for the Confluent folks working on this product. I'm interested in helping if I can, but I admit that I know next to nothing about KSQL's engine.

Thanks Andy!

Bumping the question @big-andy-coates

@PeterLindner collect_set has a limitation of just 1000 elements in the array. Did you implement non-key joins with any other work-around?

@bhamur unfortunately not, for my use case I knew, that I'd have at most 3 records in the set

@entechlog did you find a solution for the join with foreign-key ?

I have the same problem and I was wondering if you found a solution yet ? :)

@big-andy-coates I was actually just about to email you about this, as I see you've done a lot of work on the joins to date. I contributed the PR (with some help from many fine people) for KIP-213 mentioned above, and I've been looking at how to get this into KSQL. I've been looking under the hood of KSQL and I admit that I don't have a firm handle on it yet, and was wondering where this sort of work sits in terms of priority for the Confluent folks working on this product. I'm interested in helping if I can, but I admit that I know next to nothing about KSQL's engine.

Thanks Andy!

Hi @bellemare, sorry for the delay in answering your question!

To be honest, I'm not 100% sure where this comes in terms of priority. I know its not part of the next quarter's roadmap. Beyond that, I'm not sure. @MichaelDrogalis or @derekjn may be able to comment more.

It would be great to have you contribute, if you feel able! First steps would be to write up the design proposal: https://github.com/confluentinc/ksql/blob/master/design-proposals/README.md.

Andy

A KIP/patch for this would be 馃槏

Unfortunately, I don't have the time to devote to this in the next 3 months due to personal obligations. We also aren't using KSql where I work at the moment, so I can't reasonably get cycles there to address it.

Understandable. We appreciate your contribution to Streams, and we will leverage your work by exposing it in ksqlDB within the foreseeable future, just no firm date as yet. What this space!

I'll keep my eyes and ears open! That being said, if something changes, this would probably be one of the first things I would work on :)

is there some updates for this feature?

Hey @rtrive, nothing yet. But patches welcome from anyone that wants to work on this. 鉂わ笍

Was this page helpful?
0 / 5 - 0 ratings