DataFrames.jl: DataFrameRow should be more NamedTuple-like

Created on 19 Dec 2019 · 14 Comments · Source: JuliaData/DataFrames.jl

DataFrameRow is basically a mutable NamedTuple, so it should support:

splatting into kwargs positions foo(x; row...)
splatting into named tuple literal (; row..., a=1, b=2)
Converting into NamedTuple
merge(row, nt), merge(row, row2), merge(nt, row)
pairs(row)
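
For reference, here is a minimal sketch of how a plain NamedTuple already behaves for each of the operations listed above. It uses only Base Julia; the NamedTuple `row` stands in for a DataFrameRow and `foo` is a hypothetical function, so this illustrates the requested behaviour rather than anything DataFrames.jl does.

```julia
# A plain NamedTuple standing in for a row (assumption: not a real DataFrameRow).
row = (x = 1, y = 2)

# Hypothetical function taking keyword arguments named like the fields.
foo(a; x, y) = a + x + y

# Splatting into keyword-argument position.
kw = foo(10; row...)

# Splatting into a named tuple literal.
lit = (; row..., a = 1, b = 2)

# Merging: later fields override earlier ones with the same name.
merged = merge(row, (y = 20, z = 30))

# Iterating name => value pairs.
ps = collect(pairs(row))
```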

cc @willtebbutt

oxinabox

👍2

All 14 comments

I'm curious what you are doing that requires a DataFrameRow to be used like that? I always imagined a DataFrameRow should be basically a vector that can be indexed with a symbol.

pdeffebach on 19 Dec 2019

I'm trying to achieve the following:

map a function over each row of a DataFrame, which spits out a couple of new fields.
Create a new DataFrame containing the existing data, with the new fields concatenated on the end.

One way to do this is to have map(eachrow(df)) do row ... end return a Vector of NamedTuples (i.e. a table), where each NamedTuple contains the existing and new fields; this can then be used to construct a new DataFrame. To do this, you need to convert a DataFrameRow to a NamedTuple, which can currently be achieved as follows:

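# Combine the existing column names and values with the new ones
new_names = vcat(names(row), vector_of_new_names)
new_values = vcat(convert(Vector, row), vector_of_corresponding_values)
# Splat name => value pairs into a NamedTuple (the names must be Symbols)
(; Pair.(Symbol.(new_names), new_values)...)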

willtebbutt on 19 Dec 2019

Currently the way to convert DataFrameRow to NamedTuple is to call copy on it.

Also pairs(::DataFrameRow) is defined.
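
A minimal sketch of the workflow described earlier, built on the copy conversion mentioned above. The data frame `df` and the new column `total` are made up for illustration, and the sketch assumes that a DataFrame can be constructed from a Vector of NamedTuples.

```julia
using DataFrames

# Illustrative data; the column names are assumptions.
df = DataFrame(a = [1, 2], b = [3, 4])

# copy materializes each DataFrameRow as a NamedTuple,
# which merge can then extend with the newly computed field.
rows = map(eachrow(df)) do row
    merge(copy(row), (total = row.a + row.b,))
end

# A Vector of NamedTuples is a table, so it can be turned back into a DataFrame.
new_df = DataFrame(rows)
```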

So I have the following questions (here I am rather OK to add it):

do you want NamedTuple(::DataFrameRow) to be defined?
do you want convert(::Type{NamedTuple}, ::DataFrameRow) to be defined?
do you want convert(::Type{Tuple}, ::DataFrameRow) to be defined? (Tuple(::DataFrameRow) exists)

And do you want to have these (here I am more reluctant, as you can always call copy on DataFrameRow to get what you want, so this seems not to be strictly needed; the only drawback is that merge is needed for splatting):

merge allowing arbitrary mixing of NamedTuple and DataFrameRow and producing a NamedTuple (I am reluctant because it is tricky to get it 100% right and not stress the compiler at the same time)

bkamins on 19 Dec 2019

👍1

Wait, copy is how to convert a DataFrameRow into a NamedTuple?
What is the logic behind that?

oxinabox on 19 Dec 2019

DataFrameRow is a view, and views in Base are materialized by copy (e.g. standard views, the linear algebra wrappers around arrays, etc.). We materialize to a NamedTuple; what other data type would you find suitable for this?
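
A small Base-only sketch of that convention, with illustrative variable names: copy turns an array view and a lazy transpose wrapper into independent, concrete arrays.

```julia
v = [1, 2, 3, 4]

# A view shares memory with its parent array.
w = view(v, 2:3)

# copy materializes the view into an independent Vector.
c = copy(w)

# Mutating the parent is visible through the view but not through the copy.
v[2] = 20

# transpose returns a lazy wrapper; copy materializes it into a Matrix.
t = transpose([1 2; 3 4])
m = copy(t)
```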

bkamins on 19 Dec 2019

I guess that makes sense.

oxinabox on 19 Dec 2019

In summary, I will add only convert and NamedTuple methods and omit merge. OK?

CC @nalimilan

bkamins on 20 Dec 2019

What about making (; row..., a=1) work?
That's what I want most

oxinabox on 20 Dec 2019

Why not if that's not too hard, but first converting to NamedTuple will also work and it's not too hard to do.

(FWIW, the kind of operation you describe really sounds like a job for the select we've been talking about for some time.)

nalimilan on 20 Dec 2019

Yeah, @willtebbutt was saying that when we were talking about this in person

oxinabox on 20 Dec 2019

So what is the conclusion - do we want this splatting (which implies implementing merge) or not?

As I have noted, the needed definition will be a bit tricky (though doable). The first natural implementation:

merge(b::Union{NamedTuple, DataFrameRow}...) = merge(NamedTuple.(b)...)

is not good due to method ambiguities, so I would have to define special cases for 1, 2 and more than 2 positional arguments.
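
A rough sketch of the special-case approach, not the code that was eventually merged. `Row` is a hypothetical stand-in for DataFrameRow that simply wraps a NamedTuple, and `tont` and `merge_all` are made-up helpers. Only the one- and two-argument methods are added to Base.merge; the helper folds over longer argument lists instead of defining a variadic method.

```julia
# Hypothetical stand-in for DataFrameRow: a wrapper around a NamedTuple.
struct Row{T<:NamedTuple}
    data::T
end

# Hypothetical helper: convert either kind of argument to a NamedTuple.
tont(x::NamedTuple) = x
tont(r::Row) = r.data

# One argument.
Base.merge(a::Row) = tont(a)

# Two arguments: every combination that involves at least one Row.
# Each signature is more specific than Base's merge(::NamedTuple, itr) fallback.
Base.merge(a::Row, b::Row) = merge(tont(a), tont(b))
Base.merge(a::Row, b::NamedTuple) = merge(tont(a), b)
Base.merge(a::NamedTuple, b::Row) = merge(a, tont(b))

# More than two arguments: fold pairwise from the left, starting from an
# empty NamedTuple so the result is always a NamedTuple.
merge_all(xs::Union{NamedTuple,Row}...) = foldl(merge, xs; init = NamedTuple())

r = Row((a = 1, b = 2))
merged = merge_all((z = 0,), r, (c = 3,))
```

Keeping each two-argument signature concrete avoids the overlap with Base's merge(::NamedTuple, itr) method that a single Union-typed variadic definition runs into.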

bkamins on 20 Dec 2019

If it's not harder than that, then yes, sounds worth it.

nalimilan on 20 Dec 2019

OK - I will also add the merge method (there will probably be several of them, but this is all that is needed)

bkamins on 20 Dec 2019

👍2

See #2060

bkamins on 20 Dec 2019
