DataFrames.jl: Do we need both columns and eachcol?

Created on 6 Nov 2018 · 14 Comments · Source: JuliaData/DataFrames.jl

We currently have the columns function (not exported) and the eachcol function (exported), which essentially serve the same purpose. Do we need to keep both, and what should the design direction be here?

CC @nalimilan

All 14 comments

eachcol is different from columns as it yields (name, vector) tuples. But indeed I've wondered for some time whether we shouldn't deprecate eachcol and export columns. We could also keep both: it doesn't really hurt, and iterating with eachcol might be a bit faster than doing df[name] repeatedly (only significant when there are only a few rows, though).

I am aware that they are a bit different, but I meant that eachcol covers all use cases of columns and should not be much slower, so columns is not actually needed if we keep eachcol.

For me the only downside is that it makes the design of the DataFrames.jl package more complex to have two types that serve almost the same purpose: Cols and DFColumnIterator (it is complex enough already, as I have learned while working on getindex 😄).

OK. But eachcol is also inconvenient if you just want to apply a function to all columns in a data frame.

Yes, but columns is not exported, so it does not really matter (unless we decide to export it - and that is why I am asking, because I am unsure what is best).

Additional points to gather the facts (the first of them was discussed earlier):

  • columns conflicts with Tables.columns (I know that it is not exported there, but it might nevertheless be confusing);
  • we have an implementation which effectively creates a Cols object for a SubDataFrame and accesses the columns field for a DataFrame - so it is not a unified interface;
  • I have checked the usage of columns in the package (which is enough, as this function is internal) and there are actually not many places where it is used.

+1 to the SubDataFrame concern. Since eachcol is an iterator, it can be used with a SubDataFrame easily. But columns returns a vector of vectors, making it (I think) infeasible for a SubDataFrame.

@pdeffebach Note that currently columns(sdf), where sdf is a SubDataFrame, is defined and returns an object of type Cols.

Ah, that makes sense. Sorry for the confusion. Yeah, it makes sense that Cols is a bit too one-off to be user-facing.

OK. I know what I would do (and fortunately it is functionally non-breaking):

  • remove Cols type;
  • add an additional true/false parameter to DFColumnIterator type;
  • the parameter determines whether the iterator returns only column values or both column names and column values, as follows:

    • true makes the iterator return tuples (colname, coldata) (current behavior of DFColumnIterator);

    • false makes the iterator return vectors with coldata (current behavior of Cols);

  • eachcol(df) returns DFColumnIterator{T, true};
  • columns(df) returns DFColumnIterator{T, false};
  • we export columns function.
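
The proposal above can be sketched in plain Julia without DataFrames.jl. This is only an illustration of the type-parameter idea, not the package's implementation: ToyColumnIterator, toy_eachcol, and toy_columns are hypothetical names, and a NamedTuple of vectors stands in for a data frame.

```julia
# Hypothetical stand-in for the proposed DFColumnIterator{T, Bool}.
# `T` is the type of the wrapped table; `WithNames` selects what is yielded.
struct ToyColumnIterator{T<:NamedTuple, WithNames}
    table::T
end

# Illustrative counterparts of eachcol(df) and columns(df) from the proposal.
toy_eachcol(t::NamedTuple) = ToyColumnIterator{typeof(t), true}(t)
toy_columns(t::NamedTuple) = ToyColumnIterator{typeof(t), false}(t)

Base.length(itr::ToyColumnIterator) = length(itr.table)

# WithNames == true: yield (colname, coldata) tuples.
function Base.iterate(itr::ToyColumnIterator{T, true}, i::Int=1) where {T}
    i > length(itr.table) && return nothing
    return ((keys(itr.table)[i], itr.table[i]), i + 1)
end

# WithNames == false: yield only coldata.
function Base.iterate(itr::ToyColumnIterator{T, false}, i::Int=1) where {T}
    i > length(itr.table) && return nothing
    return (itr.table[i], i + 1)
end

table = (a = [1, 2, 3], b = [4.0, 5.0, 6.0])

named = collect(toy_eachcol(table))   # elements are (Symbol, Vector) tuples
plain = collect(toy_columns(table))   # elements are the column vectors
sums = map(sum, toy_columns(table))   # apply a function to every column
```

Because the Bool is part of the type, dispatch picks the right iterate method, so a single type can cover both behaviors without a runtime branch.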

@nalimilan and @pdeffebach - if I have a green light on this, I will implement it as a PR once #1585 is merged (it also touches this code, so I want to avoid rebasing).

Makes sense.

OK - I will implement it, taking https://github.com/JuliaData/DataFrames.jl/pull/1587/#issuecomment-437106277 into account to make it consistent (for this I will hold off on this PR until #1587 is merged, so that I can then update the documentation in one shot).

Welp, I guess this opens the door to rows(df) with similar logic. I think we can make an argument that it's not needed, because we don't really have a row index and the row number isn't that important in the scheme of things.

A type-stable iterator over rows of a data frame (passed as named tuples) could be useful if it allowed specializing the user-provided function. But that can be discussed separately.
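
A rough sketch of what such a row iterator could look like, again in plain Julia rather than DataFrames.jl. ToyRowIterator is a hypothetical name, the columns are assumed to be a non-empty NamedTuple of equal-length vectors, and this makes no claim about how the package would implement it.

```julia
# Hypothetical row iterator over a NamedTuple of equal-length vectors.
# The column names and types are part of `T`, so the compiler has the
# information it needs to infer a concrete NamedTuple type for each row.
struct ToyRowIterator{T<:NamedTuple}
    cols::T
end

# Assumes at least one column; all columns are assumed to have the same length.
Base.length(itr::ToyRowIterator) = length(first(itr.cols))

function Base.iterate(itr::ToyRowIterator, i::Int=1)
    i > length(itr) && return nothing
    # `map` over a NamedTuple keeps the names, so each row is a NamedTuple.
    row = map(col -> col[i], itr.cols)
    return (row, i + 1)
end

cols = (id = [1, 2, 3], score = [0.5, 1.5, 2.5])

rows = collect(ToyRowIterator(cols))
# The user-provided function accesses fields by name on each row.
total = sum(row -> row.id * row.score, ToyRowIterator(cols))
```

Passing rows as named tuples lets the user-provided function be compiled for the specific column types, which is the specialization the comment refers to.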

Closing, as this has been implemented and merged, consistent with Julia 1.1.
