See:
https://github.com/JuliaData/DataFrames.jl/issues/1935#issuecomment-586967550
for a discussion.
This applies to: combine, by, select, transform and filter.
Continuing the discussion from #1935, it's still unclear to me how this interacts with broadcasted selection .=> transform pairs.
Consider a DataFrame with an :age column and a large number of measurement columns (for example, let's say [:x, :y, :z]). Perhaps you want to see what the correlation is between age and each other column.
# knowing what they're called in advance, you can do
Splat.(:age, [:x, :y, :z]) .=> cor
# if instead, we don't know what the measurement columns are
# named, we might use a `Regex` or `Not`
Splat.(:age, Not(:age)) .=> cor
This raises some ambiguity. Syntactically, you can't broadcast Splat (without a specialized method) because Not(:age) doesn't have a length and doesn't yet know what columns it's selecting. Similar ambiguity arises if you're trying to work with pairwise combinations of columns.
I don't think all of these necessarily need to be accommodated, since such a niche use case might not warrant syntactic shorthand, but I raise it for consideration of where to draw the line.
We could add broadcasting to Splat in the future, but I do not find it a crucial functionality for now. In such complex cases I think it is cleaner to just write:
by(df, :col) do sdf
    whatever_you_need_to_do_with_your_sdf
end
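For the age-correlation example specifically, another option that avoids broadcasting over a selector is to resolve the selector to concrete column names first and then build one pair per column. The following is only a sketch, assuming a DataFrames.jl release where auto-splatting is the default and `names(df, cols)` accepts a selector such as `Not(:age)`; the data and the `age_*_cor` output names are made up for illustration.

```julia
using DataFrames, Statistics

# Toy data (illustrative only): one :age column plus measurement columns.
df = DataFrame(age = [20.0, 30.0, 40.0, 50.0],
               x   = [1.0, 2.0, 3.0, 4.0],
               y   = [4.0, 3.0, 2.0, 1.0])

# Resolve the selector eagerly, so we know which columns it picks.
measurements = names(df, Not(:age))   # column names as strings

# One `source columns => function => target name` pair per measurement column.
# With auto-splatting, `cor` receives the two columns as positional arguments.
ops = [["age", c] => cor => "age_$(c)_cor" for c in measurements]

result = combine(df, ops...)
```

Because the selector is resolved before the pairs are built, no special broadcasting method is needed; the cost is one extra line compared with the proposed shorthand.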
We have decided to auto-splat by default (to lower compilation cost).
The actions are the following:
- … NamedTuple), my initial idea is NT, but it does not seem very nice, maybe Table is a better name.
- … NamedTuple API)
- … by/combine API (this is the hard part, as we have to go through a deprecation period here)
In order to resolve this we need to:
- … NamedTuple; is NT OK
- … combine has to be updated
We have settled on auto-splatting.
So now the only thing to track is that at some point in the future we should add an NT wrapper that allows passing a named tuple to a function instead of auto-splatting.
NT is probably not a great name, so the crucial thing is to come up with a good one.
I can't think of a better name than NT but am definitely looking forward to this functionality.
An added benefit is that passing a named tuple of vectors means that any function written for generic Tables will work when we pass a named tuple. People can write generic functions using Query or TableOperations and it will work.
This is a good point! Maybe we should settle for NT if no one has a better name (it seems pretty natural and is short).
NT sounds really too obscure. We never use acronyms like this. WithNames? Anything with an explicit name would be better IMHO.
So maybe AsTable?
@nalimilan - this is the last pending decision for the 1.0 release. Everything else is decided and implemented - just waiting for reviews and merging.
So do we want to go for the AsTable name? (Or do you prefer WithNames?) I prefer AsTable, as it is a bit shorter and the user does not actually care whether the table is a named tuple or something else. We pass a NamedTuple for performance.
Alternatively, we can decide to make a 0.21 release without this feature and discuss it after the release (but possibly before 1.0 - we will probably have several months to think about it).
What is your opinion on this?
AsTable sounds good.
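To make the difference between the two calling conventions concrete, here is a small sketch. It assumes the wrapper is used as `AsTable(cols) => fun`, which is the form later DataFrames.jl releases expose, and that grouping goes through `combine(groupby(...))`; the data and the `:total` column name are illustrative only.

```julia
using DataFrames

# Toy data (illustrative only).
df = DataFrame(g = [1, 1, 2], x = [1, 2, 3], y = [10, 20, 40])
gdf = groupby(df, :g)

# Default (auto-splat): the function receives the selected columns
# as separate positional arguments.
splatted = combine(gdf, [:x, :y] => ((x, y) -> sum(x .+ y)) => :total)

# With AsTable: the function receives a single NamedTuple of column
# vectors, so columns are accessed by name.
as_table = combine(gdf, AsTable([:x, :y]) => (nt -> sum(nt.x .+ nt.y)) => :total)
```

Both calls compute the same per-group totals. The `AsTable` form is the one that lets a function written against a generic named-tuple-of-vectors table be passed in unchanged.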