KSQL: Support ksql-datagen for custom data sets

Created on 25 Apr 2018 · 9 comments · Source: confluentinc/ksql

Can I use ksql-datagen for my own data sets? It looks like I can only use one of 'orders', 'users', or 'pageviews'. Is there a way to use custom data (or at least a workaround for now)? I think this would also be very helpful for the community to generate test data for Kafka.

Most helpful comment

Yes, you can. If you create an Avro schema and pass it to datagen, it generates random data according to that schema:

./ksql-datagen schema=impressions.avro format=delimited topic=impressions key=impressionid

impressions.avro could look like this.
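The schema file attached to the original comment is not preserved on this page. As a purely hypothetical sketch (the record name, every field name, and every arg.properties value below are assumptions, not the commenter's file), an impressions.avro could be a record whose fields carry arg.properties hints for the random generator. The one firm constraint is that a field named impressionid must exist, because the command above passes key=impressionid. JSON has no comment syntax, so the Avro "doc" attribute carries the hypothetical label inside the schema.

```json
{
  "name": "impressions",
  "type": "record",
  "doc": "Hypothetical example schema for ksql-datagen; field names and ranges are illustrative only.",
  "fields": [
    {"name": "impressiontime", "type": {"type": "long", "arg.properties": {"range": {"min": 1500000000000, "max": 1530000000000}}}},
    {"name": "impressionid", "type": {"type": "string", "arg.properties": {"regex": "impression_[1-9][0-9]{0,2}"}}},
    {"name": "userid", "type": {"type": "string", "arg.properties": {"regex": "user_[1-9][0-9]?"}}},
    {"name": "adid", "type": {"type": "string", "arg.properties": {"regex": "ad_[1-9]"}}}
  ]
}
```

Saved as impressions.avro, it would be passed with the same command as above; each generated row should then contain those four columns, with impressionid used as the message key.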

All 9 comments

Where is the pageviews data set located? In the documentation, this command generates data with a special format:

/home/lenovo/apache/confluent/bin/ksql-datagen quickstart=pageviews format=delimited topic=pageviews maxInterval=100 propertiesFile=/home/lenovo/apache/confluent/etc/ksql/datagen.properties

I'm new to this topic. Can I use only the KSQL datagen example to generate my own data sets with a specific structure, such as IP, timestamp, and value fields?

Yes. You can create your own Avro schema and pass it to ksql-datagen, as shown a few messages above.
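For the IP, timestamp, and value structure asked about, a minimal sketch might look like this. The record name (readings), the field names, the IP regex, and the numeric bounds are all hypothetical. The regex and range keys are arg.properties options of the avro-random-generator library (linked later in this thread), assuming ksql-datagen passes them through unchanged. The field is called eventtime rather than timestamp to avoid clashing with a keyword in downstream queries.

```json
{
  "name": "readings",
  "type": "record",
  "doc": "Hypothetical schema: an IP string, an epoch-millisecond timestamp, and a numeric value.",
  "fields": [
    {"name": "ip", "type": {"type": "string", "arg.properties": {"regex": "192\\.168\\.1\\.[1-9][0-9]?"}}},
    {"name": "eventtime", "type": {"type": "long", "arg.properties": {"range": {"min": 1524614400000, "max": 1527292800000}}}},
    {"name": "value", "type": {"type": "double", "arg.properties": {"range": {"min": 0.0, "max": 100.0}}}}
  ]
}
```

The doubled backslashes are JSON string escapes, so the generator sees the regex 192\.168\.1\.[1-9][0-9]? and should emit addresses from 192.168.1.1 to 192.168.1.99. With the file saved as readings.avsc (a hypothetical name), the invocation would follow the earlier pattern, e.g. ksql-datagen schema=readings.avsc format=delimited topic=readings key=ip.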

Is there any documentation for the "arg.properties" fields supported by the ksql-datagen tool? I want the data to be generated from a given set of predefined values (enums), but I didn't find a suitable option.

I tried the following schema, but it doesn't work:

{
  "name": "gender",
  "type": {
    "type": ["null", "string"],
    "arg.properties": {
      "regex": "{MALE|FEMALE|OTHER}"
    }
  }
}

Command:

ksql-datagen topic=users schema=users.avsc format=avro key=userid

Error:

Exception in thread "main" org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected character ('}' (code 125)): was expecting a colon to separate field name and value

@bajaj-varun You may find what you are looking for in https://github.com/confluentinc/avro-random-generator
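For the enum question specifically, that project's options property picks each generated value from a fixed list, which appears to match what is wanted. Two observations about the attempt above, both hedged: the posted snippet is well-formed JSON on its own, so the "expecting a colon" parse error probably comes from elsewhere in users.avsc; and in a regex, braces are repetition quantifiers, so alternation would be written (MALE|FEMALE|OTHER) instead. The field below is a sketch to place inside the record's fields array. It assumes the null branch of the original union is not needed, which may not hold for your schema.

```json
{
  "name": "gender",
  "doc": "Hypothetical field: each generated value is one of the listed options.",
  "type": {
    "type": "string",
    "arg.properties": {
      "options": ["MALE", "FEMALE", "OTHER"]
    }
  }
}
```

Keeping the regex approach should also work if it is changed to "regex": "(MALE|FEMALE|OTHER)". The options form is easier to read and cannot produce unexpected strings.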

Where should the Avro file be located if everything runs in Docker containers via this Docker file? I have placed my Avro file in both the connect and broker containers and given the path as schema=club.avsc, but I still get a 'File not found' error.