I am trying to see whether I can build a data processing pipeline using Tekton CD, triggered by, say, an event on a Kafka queue. Example task steps would include text extraction from files, running AI/ML predictions, classification, or tagging on the text extracted by earlier tasks, metadata processing, and so on. Finally, the pipeline would push the result as an event to Kafka and store the data in S3. I am comparing this option with running Apache Beam pipelines (https://beam.apache.org/documentation/pipelines/design-your-pipeline/) with the Spark runner (Spark operator on K8s), or using Argo Workflows on K8s. Please advise. The step containers are likely to be large (CPU/RAM), since they will need to handle text extraction from large files (10-100 MB).
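To make the shape of the pipeline concrete, here is a rough sketch of the stages as plain Python functions chained in order. Everything in it is a placeholder assumption for illustration: the `Document` fields, the step names, the keyword-based tagger standing in for a real model, and the outgoing event layout. It does not talk to Kafka or S3; it only shows how state would pass from step to step.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """State handed from one step to the next (hypothetical shape)."""
    key: str
    raw: bytes
    text: str = ""
    tags: List[str] = field(default_factory=list)
    metadata: Dict[str, object] = field(default_factory=dict)


def extract_text(doc: Document) -> Document:
    # Placeholder: a real step would call a text-extraction tool here.
    doc.text = doc.raw.decode("utf-8", errors="replace")
    return doc


def classify(doc: Document) -> Document:
    # Placeholder for an ML model: tag by simple keyword match.
    keywords = {"invoice": "finance", "contract": "legal"}
    lowered = doc.text.lower()
    doc.tags = sorted({tag for word, tag in keywords.items() if word in lowered})
    return doc


def add_metadata(doc: Document) -> Document:
    # Record basic facts about the input and the extracted text.
    doc.metadata = {
        "size_bytes": len(doc.raw),
        "word_count": len(doc.text.split()),
    }
    return doc


# Steps run strictly in this order, like sequential tasks in a pipeline.
STEPS: List[Callable[[Document], Document]] = [extract_text, classify, add_metadata]


def run_pipeline(event: dict) -> str:
    """Run every step in order and return the outgoing event as JSON."""
    doc = Document(key=event["key"], raw=event["payload"])
    for step in STEPS:
        doc = step(doc)
    # A real pipeline would publish this event to Kafka and write the data to S3.
    return json.dumps(
        {"key": doc.key, "tags": doc.tags, "metadata": doc.metadata},
        sort_keys=True,
    )
```

In Tekton, Argo Workflows, or Beam, each of these functions would roughly correspond to one task or transform, with the incoming Kafka event supplying `key` and `payload`.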
Not knowing too much about Tekton, I think this is a great idea. My use case is very similar to yours (1000 files of 100 MB each that need text analysis), and I was planning on using RabbitMQ. My intuition is that using a Tekton pipeline would be less of a headache.
I'll post back if it works out. Please keep us up to date on your efforts as well! :)
Obviously, there's nothing in Tekton that will actively prevent you from building whatever kinds of pipelines you want. However, while I don't think it would be impossible to do what you're suggesting, I also don't think Tekton Pipelines is geared towards your goals. Tekton's focus is very much on providing building blocks and working implementations of continuous integration and delivery primitives.
I'm going to close this issue since I believe the general answer to your question is "yes", but the more pragmatic answer is "no one's actively supporting that use case right now".
Thanks Scott for your quick reply!