After much thought and conferring with others, I'd like to formally propose that we split the current master branch of DataFrames out into its own package, tentatively called DataFrames2.
Advantages:
- Nullable may change nontrivially (see https://github.com/JuliaLang/Juleps/pull/21)
- Nullable-based backend is more difficult to work with at the moment (see #1148)
- Nullable-backend version, the current version will continue to be more actively maintained
- release-0.8 branch
Disadvantages:
- `using DataFrames2` is kind of unfortunate as a long-term name unless we re-merge the packages at some point
Fixing DataArrays is definitely not trivial, but I think overall this is the best course of action for both the developers and the users. I'd love to hear your thoughts, including further advantages or disadvantages not covered here.
That sounds fine to me, but I would wait until DataArrays works on Julia 0.6. That's really the prerequisite before we can consider the old DataFrames framework viable for at least one more release.
Also, in practical terms, I guess it would be better to rename this repo to DataFrames2 so that we don't lose open issues/PRs, which contain lots of useful discussions and still-valid points. Then we can create a new DataFrames package from the release-0.8 branch (including git history).
This seems like a more workable approach.
> I guess it would be better to rename this repo to DataFrames2 so that we don't lose open issues/PRs, which contain lots of useful discussions and still-valid points. Then we can create a new DataFrames package from the release-0.8 branch (including git history).
I'd recommend the other way around: leave this repo in place, since the majority of the issues are w.r.t. the DataArrays backend. We could switch the GitHub default branch in the short term and (optionally) roll back master once there's a separate new repo for the NullableArrays backend?
I don't really like this idea, since in the end we'll have two projects with useful history: this one, plus DataFrames2 with the issues/PRs filed during the transition. The way forward is DataFrames2, so it should retain the history; DataFrames 0.8 is a dead end, and we won't care about it in a few months.
I don't like the idea that the long-term name will be DataFrames2. If we make the split, can we agree that DataFrames2 is planned to be merged back into DataFrames at some point?
> The way forward is DataFrames2
Uncertainty about this is why this issue was filed. Isn't it still unclear whether the implementation that's on master is going to be the long term usable solution? This is taking time and the original February plan is nearly here, without the picture on the ground having changed much.
> I don't like the idea that the long-term name will be DataFrames2. If we make the split, can we agree that DataFrames2 is planned to be merged back into DataFrames at some point?
Yes, of course my idea would be to deprecate DataFrames at some point, and later replace it with DataFrames2.
> Uncertainty about this is why this issue was filed. Isn't it still unclear whether the implementation that's on master is going to be the long term usable solution? This is taking time and the original February plan is nearly here, without the picture on the ground having changed much.
Even if the implementation based on Nullable and NullableArray has to change significantly (which isn't certain yet), it is clear that the final implementation will be closer to DataFrames2 than to DataFrames. So IMO it makes sense to consider the current master as the way forward, even if we end up releasing DataFrames3.
I'm strongly in favor of this proposal.
In terms of naming, what about naming the new version DataTable?
I'd be fine with a name like DataTable. If we did something like that then perhaps we wouldn't need to merge it back with DataFrames at all; the Nullable-based version could just live under a new name. That said, I'd also be fine having a DataFrames2 and merging it back with DataFrames at some point.
Glad to see there's support for this. Action items (which as I write them are snowballing) are as follows:
- Bump the DataArrays version requirement in the DataArrays-backed DataFrames and tag it
- Upper bound all registered versions of DataFrames in METADATA to Julia 0.6(-dev?)
- `@~` will go away in 0.6 (see https://github.com/JuliaLang/julia/pull/20420)
- Nullable-based backend
- `@formula` to handle the `@~` change _in both repos_
Longer term items:
Seems simple enough... maybe?! 😵
This issue seems to have served its purpose, so I'm going to go ahead and close it. Thanks everyone for your help and feedback!