Ksql: Improve Testability of KSQL applications

Created on 29 Oct 2018  ·  9 Comments  ·  Source: confluentinc/ksql

It would be great to get feedback from users as to the tooling they'd like to see and how they see it being used.

  • Please give us a +1 (thumbs up) on this message if you would like better support for testing KSQL applications.
  • Please write a comment yourself to indicate which of the listed options you'd prefer, and why. If none of the listed options are what you need, please tell us what would.

Motivation

Improved tooling to move from development to production: KSQL should provide better tooling to help with the development lifecycle of a KSQL application, specifically around testing the application. Here, "KSQL application" refers to a headless (aka non-interactive) KSQL cluster that is executing a SQL file containing one or more SQL statements. This is the preferred setup for production environments.

Possible Options

Option 1: Testkit (executable) that takes your queries from a SQL file, plus input data and expected output data to verify that actual output matches expected output

Advantages:

  • This approach should be simple and easy to understand. In particular, unlike Option 3 below, the user would not need knowledge of Java/Scala to test their KSQL applications.

What we have today:

KSQL already has something similar to this in the form of the QueryTranslationTest. This test runs through many test cases defined in JSON test files, where each test case defines the SQL to execute, the input, and the expected output. However, it doesn't currently support testing multiple queries, and it isn't packaged in a way that would make it easy for others to use.
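As a rough illustration of the shape such a test case could take, the sketch below pairs SQL statements with input records and expected output records, then checks actual output against the expectation. The field names (`statements`, `inputs`, `outputs`), the sample stream, and the `compare_outputs` helper are assumptions made for this example, not the actual QueryTranslationTest format or any KSQL API. The step that would run the statements is left out entirely.

```python
import json

# Hypothetical test-case layout: the SQL to execute, the input records,
# and the expected output records. Field names are illustrative only.
TEST_CASE = json.loads("""
{
  "name": "filter large orders",
  "statements": [
    "CREATE STREAM orders (id BIGINT, amount DOUBLE) WITH (kafka_topic='orders', value_format='JSON');",
    "CREATE STREAM large_orders AS SELECT * FROM orders WHERE amount > 100;"
  ],
  "inputs": [
    {"topic": "orders", "key": "1", "value": {"ID": 1, "AMOUNT": 50.0}},
    {"topic": "orders", "key": "2", "value": {"ID": 2, "AMOUNT": 250.0}}
  ],
  "outputs": [
    {"topic": "LARGE_ORDERS", "key": "2", "value": {"ID": 2, "AMOUNT": 250.0}}
  ]
}
""")


def compare_outputs(expected, actual):
    """Return a list of human-readable mismatches; an empty list means a pass."""
    mismatches = []
    if len(expected) != len(actual):
        mismatches.append(f"expected {len(expected)} records, got {len(actual)}")
    # Compare records position by position, since output order matters.
    for i, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            mismatches.append(f"record {i}: expected {exp}, got {act}")
    return mismatches
```

A test runner would feed `TEST_CASE["inputs"]` through the statements by whatever means it has, collect the records written to the output topic, and fail the test if `compare_outputs` returns anything.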

The QueryTranslationTest is also capable of capturing the Kafka Streams topology each statement generates and using this to detect differences between releases of KSQL. Such differences can indicate that compatibility-breaking changes have been introduced. It may also be useful to have something similar to ensure that a change to a SQL file is compatible with the previous version of the application, i.e. that state store and repartition topic names remain consistent.
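The compatibility check described above amounts to comparing a stored topology description against a freshly generated one. Below is a minimal sketch of that comparison using Python's standard `difflib`. The topology text is a simplified, made-up stand-in for what Kafka Streams would print, and `topology_diff` is a hypothetical helper, not part of KSQL.

```python
import difflib

# Simplified, made-up topology descriptions for two versions of an application.
PREVIOUS = """
Source: KSTREAM-SOURCE-0000000000 (topics: [orders])
Processor: KSTREAM-FILTER-0000000001
Sink: KSTREAM-SINK-0000000002 (topic: LARGE_ORDERS)
"""

CURRENT = """
Source: KSTREAM-SOURCE-0000000000 (topics: [orders])
Processor: KSTREAM-FILTER-0000000001
Processor: KSTREAM-MAPVALUES-0000000002
Sink: KSTREAM-SINK-0000000003 (topic: LARGE_ORDERS)
"""


def topology_diff(previous, current):
    """Return unified-diff lines between two topology descriptions.

    An empty list means the descriptions are textually identical.
    """
    prev_lines = previous.strip().splitlines()
    curr_lines = current.strip().splitlines()
    return list(
        difflib.unified_diff(
            prev_lines, curr_lines, fromfile="previous", tofile="current", lineterm=""
        )
    )


if __name__ == "__main__":
    for line in topology_diff(PREVIOUS, CURRENT):
        print(line)
```

A purely textual diff is deliberately strict: any renumbered node shows up as a change, which is the kind of signal that could point to renamed state stores or repartition topics. Whether a given difference actually breaks compatibility would still need human review.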

Option 2: Approach similar to ksql-datagen tool

Another testing approach people might want is to use an enhanced version of the ksql-datagen tool, though this tends to create randomised data.

Option 3: Embedding KSQL as a library into a Java/Scala application for purposes of testing

This is related to GH-734: Support using KSQL as library to write streaming applications (aka KSQL embedded mode). That is, to use KSQL as a library similar to how Kafka Streams is being used.

Advantages:

  • Leverage the full power of Java/Scala testing tools to help with testing your KSQL application.

    • The big downside of this approach, conversely, is that it is impractical, or at least very inconvenient, for those users who are using KSQL precisely because it does NOT require them to write any Java/Scala/JVM code.

Here, the user would write a JVM application in which KSQL would be used as a library (this would thus depend on GH-734 being implemented first).

While the details of this setup would have to be hashed out, a few users have already expressed their interest in such functionality. Typically, it's the same group of KSQL users who'd prefer to not only test but also develop and deploy KSQL embedded inside a JVM application, i.e. the same group of users that would like to see the functionality in GH-734: Support using KSQL as library to write streaming applications (aka KSQL embedded mode). See https://github.com/confluentinc/ksql/issues/734#issuecomment-426870851 for an example provided by a user in favor of GH-734.

Other options?

Please share any such additional options in your comments below.

code-lifecycle-ux enhancement operability

All 9 comments

@big-andy-coates - the ksql-datagen tool doesn't create random data: you provide it with bounds and it generates data within those bounds (numeric, enum, etc.). It is very powerful.

Option 1 is the ultimate. It should not be language oriented, and it should provide validation of inputs, outputs, etc. Augmenting this with the option to use datagen would be nice (provide a datagen schema and expected output).

Option 3 is the natural starting point, before evolving to 2 and 3.

It should be pointed out that these are smoke tests and not unit tests. Or are you proposing to embed ZK, Kafka, and SR?

Option 1 would be ideal. Not everyone is skilled or has the time to set up JVM testing methods. Especially in a big company/enterprise it would be useful to consider "testers" as a different team.

@bluemonk3y Option 1 is kinda there already. We already have such tests for internal use. They don't require Kafka, SR, etc. A good first step would be to extract that testing infra into a module that could be used by clients. For reference: https://github.com/confluentinc/ksql/tree/master/ksql-engine/src/test/resources/query-validation-tests

@big-andy-coates

Option 1: would be really great, especially since it would allow data engineers to define and run test cases. Also, what's already there looks nice and is more than a good starting point.

Option 3: is of course what I would prefer from a developer's perspective. When it comes to automated testing and deployment in order to reach production readiness, it's OK that an approach requires developer skills. It's one thing to allow data analysts to do interactive ad hoc stream processing without Java/Scala skills, but a different story to prepare production-ready releases - maybe non-dev folks won't like me for this statement :) which is fine.

Option 2: I see this as an addition, but only for generating test data, so I wouldn't mix / couple the testing aspects with data generation capabilities.

+1 for Option 1 from an offline discussion with a user.

Option 1 from an offline discussion with a user

That was me, can confirm the 👍

+1 for Option 1

Version one of the testing tool described in Option 1 has been implemented and will be available in the 5.3 release.

great news. many THX @hjafarpour for your hard work!
