Hi,
I noticed that pandas is pinned to < 1.0.0 in Airflow's dependencies. This has started to affect other dependencies in my pipelines, and conflicts will gradually become harder to resolve.
Do you have an idea when Airflow will become compatible with pandas 1.0.0?
Cheers
Thanks for opening your first issue here! Be sure to follow the issue template!
Good point. I found that the pandas < 1.0.0 release (0.25.3) came out on October 31, 2019 (https://pandas.pydata.org/docs/whatsnew/v0.25.3.html), and I think that is new enough, although I know that 1.0.0 is a major version change.
Maybe you could open a draft PR to upgrade the version, since you have experience with it from your daily usage, @JPFrancoia?
I created the PR, but jeez the contribution process is convoluted...
I tried testing as much as possible locally. Let's see what Circle CI has to say.
Closing since PR was merged.
@JPFrancoia Sorry about the process, Airflow is a big project with lots of moving parts. We're always trying to make it friendlier for new contributors!
For future changes: you can just create a PR, no need to create an issue first.
Indeed, @JPFrancoia, we have quite a process, and I think you hit the hardest part of it. We've been thinking about and discussing how to solve the dependency problems, and it's still not perfect. One day maybe we will make it super easy. Thanks for the feedback; for me that's a sign we need to do a better job at it. But for now, it is a bit convoluted just to keep us safe from transitive dependency problems. You can read a bit about why it is so complex here:
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#airflow-dependencies
TL;DR: Airflow is a bit of both, a library and an application. The current approach tries to accommodate both at the same time (keeping dependencies open for the library and pinned for the application).
I think the problem can eventually be solved in an easier way only if we actually split the "Airflow application" from the "Airflow library" and treat their dependencies differently. I think it's possible and we have a number of ideas for how to do it, but it is not a priority for Airflow 2.0. Maybe in Airflow 2.1 we can do something about it.
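To make the library-versus-application distinction concrete, here is a minimal sketch. It is not Airflow's actual setup: the names `LIBRARY_RANGE`, `APPLICATION_PINS`, and `pin_within_range`, and all version numbers, are made up for illustration. The idea is that the "library" side declares an open range, the "application" side pins one tested version, and the pin has to stay inside the range.

```python
def parse(version):
    """Turn a dotted version like '1.0.3' into (1, 0, 3) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


# "Library" style (hypothetical values): an open range, so that projects
# installing this package alongside others keep some freedom of choice.
# Lower bound is inclusive, upper bound is exclusive.
LIBRARY_RANGE = {"pandas": ("0.17.1", "2.0.0")}

# "Application" style (hypothetical values): one exact version per dependency,
# the one the deployed application was actually tested with.
APPLICATION_PINS = {"pandas": "1.0.3"}


def pin_within_range(name):
    """Check that the pinned version falls inside the declared open range."""
    low, high = LIBRARY_RANGE[name]
    return parse(low) <= parse(APPLICATION_PINS[name]) < parse(high)


print(pin_within_range("pandas"))
```

This simple parser only handles plain numeric versions with the same number of components; real tooling follows the full version-specifier rules.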
I understand, thanks for providing more explanations.
Indeed, separating the application and library parts of Airflow seems to be a good idea.
To give you a bit of context, I was trying to get AWS Data Wrangler (https://github.com/awslabs/aws-data-wrangler) and Airflow to coexist. AWS Data Wrangler moves fast and only supports pandas > 1.0. Ultimately it was possible, but it was a rabbit hole of dependencies and I ended up modifying a setup.py by hand.
Since Airflow is so versatile, I imagine people are trying, or will try, to plug different libraries on top of it, so this dependency issue will probably happen again. But it's nice that you're thinking ahead!
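A rough sketch of why the two requirements could not be met at once, assuming only the constraints mentioned in this thread (`pandas < 1.0.0` on the Airflow side, `pandas > 1.0` on the AWS Data Wrangler side). The function names and the candidate list are hypothetical, and the comparison is a simplification of what pip really does.

```python
def parse(version):
    """Parse '1.0' or '1.0.3' into a 3-part tuple, padding with zeros."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def airflow_accepts(version):
    # Constraint as quoted in this issue: pandas < 1.0.0
    return parse(version) < parse("1.0.0")


def wrangler_accepts(version):
    # Constraint as described above: pandas > 1.0
    return parse(version) > parse("1.0")


# A few sample pandas versions to test against both constraints.
candidates = ["0.25.3", "1.0.0", "1.0.3"]

compatible = [v for v in candidates
              if airflow_accepts(v) and wrangler_accepts(v)]
print(compatible)  # prints [] because no candidate satisfies both
```

Any version below 1.0.0 fails the second check and any version above 1.0 fails the first, so the list stays empty until one side relaxes its bound.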
Thanks for the support.