DataFrames.jl: deleterows! error

Created on 5 Apr 2018  ·  13 Comments  ·  Source: JuliaData/DataFrames.jl

I got this error using Julia v0.6.2, DataFrames v0.11.5, and CSV v0.2.4:

using CSV
dat = CSV.read("example.csv")
using DataFrames
deleterows!(dat, 2)
ERROR: MethodError: no method matching deleteat!(::WeakRefStrings.WeakRefStringArray{WeakRefString{UInt8},1,Missings.Missing}, ::Int64)
Closest candidates are:
  deleteat!(::Array{T,1} where T, ::Integer) at array.jl:875
  deleteat!(::Array{T,1} where T, ::Any) at array.jl:913
  deleteat!(::BitArray{1}, ::Integer) at bitarray.jl:961
  ...
Stacktrace:
[1] deleterows!(::DataFrames.DataFrame, ::Int64) at /Users/ane/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:779

Strangely enough, we (@pbastide) can make the error go away like this:

dat = CSV.read("example.csv", 
      types=[Union{Missing, String},
             Union{Missing, Float64},
             Union{Missing, Float64}])
deleterows!(dat, 2) # no error, correct result

The example file is a standard csv file:

julia> dat = CSV.read("example.csv")
4×3 DataFrames.DataFrame
│ Row │ tipNames    │ sword_index │ preference │
├─────┼─────────────┼─────────────┼────────────┤
│ 1   │ Xalvarezi   │ 0.65        │ missing    │
│ 2   │ Xbirchmanni │ 0.275       │ -0.33      │
│ 3   │ Xclemenciae │ 0.564       │ 0.44       │
│ 4   │ Xcontinens  │ 0.3         │ missing    │

julia> eltypes(dat)
3-element Array{Type,1}:
 Union{Missings.Missing, String} 
 Union{Float64, Missings.Missing}
 Union{Float64, Missings.Missing}

All 13 comments

That's because a WeakRefStringArray doesn't support deleteat!. Would be worth filing an issue there.

CSV.jl should probably stop returning WeakRefStringArray columns by default too, since it creates a lot of complications (https://github.com/JuliaData/CSV.jl/issues/180#issuecomment-373667822).

It looks like deleteat! accepts Vector but not AbstractVector. So maybe the solution is to make a PR to Base.

Not all AbstractVectors are mutable. Two notable examples are UnitRanges, e.g. 1:5, and StaticArrays from the StaticArrays package. So it's not safe to assume that deleteat! should work for any AbstractVector.
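
A minimal sketch of the point above, using only Base and no packages: a range is an AbstractVector but cannot be resized in place, while collecting it into a Vector gives a copy that supports deleteat!. The variable names are illustrative only.

```julia
r = 1:5                        # a UnitRange is an immutable AbstractVector
println(r isa AbstractVector)  # it is an AbstractVector all the same

# deleteat! has no method for ranges, so the call throws a MethodError
err = try
    deleteat!(r, 2)
catch e
    e
end
println(err isa MethodError)

# collect materializes the range as a Vector, which can be resized
v = collect(r)
deleteat!(v, 2)
println(v)
```

This is the same reason the dat2[:,:] workaround later in the thread succeeds: it works on copied, ordinary Vector columns rather than on the original column type.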

I hit a similar issue again. Not sure if this is an issue for DataFrames or CSV, but it's a problem for deleterows!:

julia> using CSV

julia> using DataFrames

julia> dat1 = DataFrame(
           tipNames = ["Xalvarezi","Xandersi","Xbirchmanni","Xclemenciae"],
           sword_index = [0.65, 0.35, 0.275, 0.564],
           preference = [missing, missing, -0.33, 0.44]);

julia> CSV.write("issue_deleterows.csv", dat1);

julia> deleterows!(dat1, 2) # all is good
3×3 DataFrame
│ Row │ tipNames    │ sword_index │ preference │
│     │ String      │ Float64     │ Float64⍰   │
├─────┼─────────────┼─────────────┼────────────┤
│ 1   │ Xalvarezi   │ 0.65        │ missing    │
│ 2   │ Xbirchmanni │ 0.275       │ -0.33      │
│ 3   │ Xclemenciae │ 0.564       │ 0.44       │

julia> dat2 = CSV.read("issue_deleterows.csv") # should be the same as dat1
4×3 DataFrame
│ Row │ tipNames    │ sword_index │ preference │
│     │ String      │ Float64     │ Float64⍰   │
├─────┼─────────────┼─────────────┼────────────┤
│ 1   │ Xalvarezi   │ 0.65        │ missing    │
│ 2   │ Xandersi    │ 0.35        │ missing    │
│ 3   │ Xbirchmanni │ 0.275       │ -0.33      │
│ 4   │ Xclemenciae │ 0.564       │ 0.44       │

julia> deleterows!(dat2, 2)
ERROR: MethodError: no method matching deleteat!(::CSV.Column{String,String}, ::Int64)
Closest candidates are:
  deleteat!(::Array{T,1} where T, ::Integer) at array.jl:1175
  deleteat!(::Array{T,1} where T, ::Any) at array.jl:1212
  deleteat!(::BitArray{1}, ::Integer) at bitarray.jl:909
  ...
Stacktrace:
 [1] (::getfield(DataFrames, Symbol("##111#112")){Int64})(::CSV.Column{String,String}) at /Users/ane/.julia/packages/DataFrames/CZrca/src/dataframe/dataframe.jl:953
 [2] foreach(::getfield(DataFrames, Symbol("##111#112")){Int64}, ::Array{AbstractArray{T,1} where T,1}) at ./abstractarray.jl:1866
 [3] deleterows!(::DataFrame, ::Int64) at /Users/ane/.julia/packages/DataFrames/CZrca/src/dataframe/dataframe.jl:953
 [4] top-level scope at none:0

This is with Julia v1.1.1.

I will use this workaround for now:

julia> dat3 = dat2[:,:];

julia> deleterows!(dat3, 2) # all good
3×3 DataFrame
│ Row │ tipNames    │ sword_index │ preference │
│     │ String      │ Float64     │ Float64⍰   │
├─────┼─────────────┼─────────────┼────────────┤
│ 1   │ Xalvarezi   │ 0.65        │ missing    │
│ 2   │ Xbirchmanni │ 0.275       │ -0.33      │
│ 3   │ Xclemenciae │ 0.564       │ 0.44       │

Please refer to CSV.jl documentation:

  • CSV.read("issue_deleterows.csv") returns a read-only DataFrame
  • if you want a mutable DataFrame use DataFrame(CSV.File("issue_deleterows.csv")) or CSV.read("issue_deleterows.csv", copycols=true)

@quinnj Don't you agree now that the default should be to make a copy? People keep getting confused by that.

Ha, I see! Thanks a bunch. Yes, I had not heard about this option, and I could not see how the various versions of the data differed. Many thanks!

Nope; I still think we'd get just as many "performance" issues if copying was the default.

I think we can close this issue. It seems what @quinnj implemented in CSV.jl will stay 😄.

Could DataFrames detect the issue and give a friendlier error message? My first thought on seeing "no method matching deleteat!" was of a bug in DataFrames.jl.

I think this should be handled not in DataFrames.jl but in the package that defines the specific vector type (in this case, CSV.jl).

The reason is that DataFrames.jl accepts whatever vector type you give it and cannot know the exact API of that type. Only the package that defines the concrete vector type can safely decide what the right error message is (in this case probably something like "CSV.jl by default produces read-only columns, so they cannot be resized in place").

Yes it would make sense to implement this in CSV.
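
As a rough sketch of what such a package-side error could look like: the ReadOnlyColumn type below is hypothetical and only stands in for a read-only column type such as CSV.Column. It is not CSV.jl's actual implementation, and the wording of the message is an assumption.

```julia
# Hypothetical read-only column type (an assumption for illustration)
struct ReadOnlyColumn{T} <: AbstractVector{T}
    data::Vector{T}
end

# Minimal read-only AbstractVector interface: size and element access
Base.size(c::ReadOnlyColumn) = size(c.data)
Base.getindex(c::ReadOnlyColumn, i::Int) = c.data[i]

# Replace the bare MethodError with an error that says why resizing
# fails and how to get a mutable copy
function Base.deleteat!(c::ReadOnlyColumn, inds)
    throw(ArgumentError("this column is read-only and cannot be resized " *
                        "in place; make a mutable copy first, e.g. collect(column)"))
end

col = ReadOnlyColumn(["Xalvarezi", "Xandersi", "Xbirchmanni"])

# Capture the error message instead of letting the error propagate
msg = try
    deleteat!(col, 2)
catch e
    sprint(showerror, e)
end
println(msg)

# A collected copy is an ordinary Vector and can be resized
mutable_col = collect(col)
deleteat!(mutable_col, 2)
println(mutable_col)
```

Because the method is defined on the column type itself, DataFrames.jl would not need to know anything about it: the call made inside deleterows! would simply surface the friendlier message.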
