Potential user here. I'm interested in using Kedro, but we use Ray Distributed instead of PySpark for our execution engine. Do your pipelines support this?
Hi, thank you for your interest for Kedro! We don't natively support Ray DataSet nor RayRunner so far, but you can add both as a custom dataset/runner. See https://kedro.readthedocs.io/en/latest/06_nodes_and_pipelines/02_pipelines.html?#using-a-custom-runner for using a custom runner, and https://kedro.readthedocs.io/en/latest/07_extend_kedro/01_custom_datasets.html for custom dataset.
Update: I confirmed that Ray works fine :)
Just a quick note on this. @crypdick followed this tutorial from @dataengineerone, except instead of using multiprocessing inside each node he used ray.
Thanks for investigating this @crypdick ✨
Most helpful comment
Update: I confirmed that Ray works fine :)