drake is a very interesting package to me, and I wonder whether there is a Python package similar to it. I saw Metaflow mentioned in the issues, but Metaflow does not seem to support caching. I also found a package called bionic that looks similar, but drake has already been in development for a few years. Any thoughts on how to get something similar in Python?
Good question. https://github.com/pditommaso/awesome-pipeline lists a bunch of excellent pipeline tools, and several are implemented in Python. The ones most similar to drake are probably going to be the ones resembling GNU Make. Snakemake is the closest one to drake I have looked at.
What I am wondering is how many of those Python-based Make-like tools support a function-oriented style, static code analysis to automatically detect dependencies, and abstraction of data as Python objects. (drake does all three for R.)
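As a rough illustration of the "static code analysis" point, here is a minimal sketch. It is not drake's implementation and not any Python library's API. A hypothetical plan maps target names to expression strings. The standard-library `ast` module finds which other targets each expression mentions, and `graphlib.TopologicalSorter` (Python 3.9+) orders the targets. The function names in the plan (`load_data`, `clean_data`, `fit_model`, `summarize`) are made up and are never called.

```python
import ast
from graphlib import TopologicalSorter

# Hypothetical plan in the spirit of a drake plan: target name -> command.
plan = {
    "raw": "load_data()",
    "clean": "clean_data(raw)",
    "model": "fit_model(clean, alpha=0.1)",
    "report": "summarize(model, clean)",
}


def find_dependencies(plan):
    """Map each target to the other targets its command references."""
    deps = {}
    for target, command in plan.items():
        tree = ast.parse(command, mode="eval")
        # Collect every bare identifier in the expression (function names
        # included); keyword-argument names like `alpha` are not Name nodes.
        names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
        # Keep only identifiers that are other targets in the plan.
        deps[target] = names & (plan.keys() - {target})
    return deps


deps = find_dependencies(plan)
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['raw', 'clean', 'model', 'report']
```

Because dependencies come from parsing the commands rather than from explicit declarations, adding `raw` to the `report` command would change the graph automatically.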
Metaflow is more like Airflow than Make. Unless you are trying to resume a run that crashed, Metaflow does not skip up-to-date targets, and it assumes you are already resigned to rerunning all your targets from scratch. Tools like that tend to focus on reproducibly packaging up and sharing complete end-to-end runs among the members of a data science team. That is valuable, but different from what drake is trying to do.
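To make "skipping up-to-date targets" concrete, here is a minimal Make-like sketch under simplifying assumptions. It is not how drake, Metaflow, or any other tool actually works. Each target gets a fingerprint built from its command text plus its upstream values (hashed via `pickle`, which is only stable for simple values within one process). A target is rebuilt only when that fingerprint changes. The `make` and `fingerprint` helpers and the in-memory `cache` dict are hypothetical. This sketch also ignores edits to function bodies, which a fuller tool would need to track as well.

```python
import hashlib
import pickle


def fingerprint(command, upstream):
    """Hash a target's command text together with its upstream values."""
    h = hashlib.sha256(command.encode())
    for name in sorted(upstream):
        h.update(name.encode())
        h.update(pickle.dumps(upstream[name]))
    return h.hexdigest()


def make(plan, deps, order, env, cache):
    """Build targets in order, skipping any whose fingerprint is unchanged."""
    values, built = {}, []
    for target in order:
        upstream = {d: values[d] for d in deps[target]}
        key = fingerprint(plan[target], upstream)
        if target in cache and cache[target][0] == key:
            values[target] = cache[target][1]  # up to date: reuse cached value
        else:
            # Evaluate the command with user functions as globals and
            # upstream target values as locals.
            values[target] = eval(plan[target], dict(env), upstream)
            cache[target] = (key, values[target])
            built.append(target)
    return values, built


# Hypothetical user functions and plan.
env = {"load": lambda: [3, 1, 2], "total": sum}
plan = {"raw": "load()", "result": "total(raw)"}
deps = {"raw": set(), "result": {"raw"}}
order = ["raw", "result"]

cache = {}
values, built = make(plan, deps, order, env, cache)
print(built)  # ['raw', 'result']
values, built = make(plan, deps, order, env, cache)
print(built)  # []
plan["result"] = "max(raw)"
values, built = make(plan, deps, order, env, cache)
print(built, values["result"])  # ['result'] 3
```

The second call rebuilds nothing, and after editing one command only that target reruns. A tool that always reruns from scratch would rebuild both targets every time.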
@ajing: You might want to take a look at Ploomber; it also skips up-to-date tasks: https://github.com/ploomber/ploomber
Disclaimer: I'm the author