DataFrames.jl: select!(df, Not(tuple)) does not work

Created on 30 Oct 2019 · 8 Comments · Source: JuliaData/DataFrames.jl

The issue is pretty simple:

unwanted = (:col1, :col2, :col3)
select!(df, Not(unwanted))

will fail (getindex error)

unwanted = [:col1, :col2, :col3]
select!(df, Not(unwanted))

will work fine.

Maybe the behaviour of the second example (a vector of names) could be generalised to other iterables, such as tuples?
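Until then, a workaround is to turn the tuple into a vector before wrapping it in `Not`. This is a minimal sketch that assumes a current DataFrames.jl release (1.x) and a hypothetical four-column `df`:

```julia
using DataFrames

# Hypothetical example data frame with four columns.
df = DataFrame(col1 = 1:3, col2 = 4:6, col3 = 7:9, col4 = 10:12)

# The column names to drop, stored in a Tuple as in the report above.
unwanted = (:col1, :col2, :col3)

# `collect` turns the Tuple into a Vector{Symbol}, which `Not` accepts.
select!(df, Not(collect(unwanted)))

println(propertynames(df))  # [:col4]
```

Splatting the tuple into a vector literal, `[unwanted...]`, works equally well.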


This is intentional. Note that arrays behave the same way, for example:

julia> using InvertedIndices

julia> x = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> x[Not(1)]
2-element Array{Int64,1}:
 2
 3

julia> x[Not((1,2))]
ERROR: ArgumentError: invalid index: (1, 2) of type Tuple{Int64,Int64}
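The same fix applies to the array case. This is a minimal sketch, assuming InvertedIndices.jl is installed: collecting the tuple into a vector gives `Not` an index type it accepts.

```julia
using InvertedIndices

x = [1, 2, 3]

# A Tuple of positions to exclude.
idx = (1, 2)

# Not accepts a Vector of indices, so collect the Tuple first.
println(x[Not(collect(idx))])  # [3]

# Writing the vector literal directly is equivalent.
println(x[Not([1, 2])])        # [3]
```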

OK, then it's fine, and thanks for the explanation. It was the first time I was trying to drop columns; I was using delete!, which prompted me to use this Not syntax that I didn't know.

Thank you so much! The documentation does not mention that tuples are not allowed.

I spent two hours trying to figure out how to remove a column from a data frame.
I am not sure why such a simple task has to be so complicated.

I urge the developers to use column labels more often than column indices.
That is the purpose of dataframes. Indices are meant more for Arrays.

Removing a column by index is very dangerous.
If a table schema has changed, you will remove the wrong columns.

Additionally, using select instead of delete is also confusing.
Please consider bringing back delete (with column names).

Thank you!

> The documentation does not mention that tuples are not allowed.

Tuples are not allowed as indices in Julia in general. If you feel this should be mentioned somewhere in the DataFrames.jl documentation, please feel free to make a PR.
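This applies to plain Base arrays too, with no extra packages. The sketch below uses a hypothetical three-element vector: a vector of positions is a valid index, while a tuple of positions is rejected.

```julia
x = [10, 20, 30]

# A Vector of positions selects several elements.
println(x[[1, 2]])  # [10, 20]

# A Tuple is not a valid index in Base Julia; this throws an ArgumentError.
err = try
    x[(1, 2)]
catch e
    e
end
println(err isa ArgumentError)  # true
```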

> Removing a column by index is very dangerous.

Agreed. We simply support both options, as sometimes it is more convenient to use a column number.
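To illustrate the schema-change risk raised above, here is a minimal sketch assuming DataFrames.jl 1.x and a hypothetical three-column table. Inserting a new column at the front shifts positions, so `Not(2)` silently drops a different column, while `Not(:price)` keeps targeting the intended one.

```julia
using DataFrames

# Hypothetical table: column 2 is :price.
df = DataFrame(id = 1:3, price = [9.5, 7.0, 3.2], qty = [1, 2, 3])

# Simulate a schema change: a new column is inserted at position 1,
# shifting every existing column one place to the right.
changed = insertcols!(copy(df), 1, :region => ["N", "S", "E"])

# Dropping by position now removes :id instead of :price.
by_index = select(changed, Not(2))

# Dropping by name still removes the intended column.
by_name = select(changed, Not(:price))

println(propertynames(by_index))  # [:region, :price, :qty]
println(propertynames(by_name))   # [:region, :id, :qty]
```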

> Additionally, using select instead of delete is also confusing.

That is why we specifically give an example of it in the Getting Started part of the manual (supposedly the first thing one reads when learning DataFrames.jl). You can find it here.

I will also add these examples to the docstrings of select and select! to make sure it is clear.
Note that this pattern of dropping columns is standard in, e.g., dplyr.

@nalimilan - do you think reverting deletecols! and deletecols is justified? (I think having select/select! and getindex do this job is enough.)

Thanks, bkamins, for the timely and caring reply.
I find your reply highly informative.

I am new to Julia (switching from Python).
I will go through the link you provided.

Thanks again!

The linked section only has a df[:, Not(:col)] example, so I will also add an example using select, as noted above.
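For reference, a minimal sketch of the three ways to drop a column by name that come up in this thread, assuming DataFrames.jl 1.x and a hypothetical data frame with a column `:col`:

```julia
using DataFrames

# Hypothetical example data.
df = DataFrame(a = 1:2, b = 3:4, col = 5:6)

# getindex with Not: returns a new data frame without :col.
v1 = df[:, Not(:col)]

# select with Not: also returns a new data frame; df is unchanged.
v2 = select(df, Not(:col))

# select! with Not: drops :col from df in place.
select!(df, Not(:col))

println(propertynames(v1))  # [:a, :b]
println(propertynames(df))  # [:a, :b]
```

The first two leave the original untouched, which suits exploratory work; `select!` is the in-place option the original question was about.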

Yes, please!

See https://github.com/JuliaData/DataFrames.jl/pull/2011 (please leave a comment if something is not clear)
