I got this error, using julia v0.6.2, DataFrames v0.11.5 and CSV v0.2.4:
using CSV
dat = CSV.read("example.csv")
using DataFrames
deleterows!(dat, 2)
ERROR: MethodError: no method matching deleteat!(::WeakRefStrings.WeakRefStringArray{WeakRefString{UInt8},1,Missings.Missing}, ::Int64)
Closest candidates are:
deleteat!(::Array{T,1} where T, ::Integer) at array.jl:875
deleteat!(::Array{T,1} where T, ::Any) at array.jl:913
deleteat!(::BitArray{1}, ::Integer) at bitarray.jl:961
...
Stacktrace:
[1] deleterows!(::DataFrames.DataFrame, ::Int64) at /Users/ane/.julia/v0.6/DataFrames/src/dataframe/dataframe.jl:779
Strangely enough, we (@pbastide) can make the error go away like this:
dat = CSV.read("example.csv",
types=[Union{Missing, String},
Union{Missing, Float64},
Union{Missing, Float64}])
deleterows!(dat, 2) # no error, correct result
The example file is a standard csv file:
julia> dat = CSV.read("example.csv")
4ร3 DataFrames.DataFrame
โ Row โ tipNames โ sword_index โ preference โ
โโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโค
โ 1 โ Xalvarezi โ 0.65 โ missing โ
โ 2 โ Xbirchmanni โ 0.275 โ -0.33 โ
โ 3 โ Xclemenciae โ 0.564 โ 0.44 โ
โ 4 โ Xcontinens โ 0.3 โ missing โ
julia> eltypes(dat)
3-element Array{Type,1}:
Union{Missings.Missing, String}
Union{Float64, Missings.Missing}
Union{Float64, Missings.Missing}
That's because a WeakRefStringArray doesn't support deleteat!. Would be worth filing an issue there.
CSV.jl should probably stop returning WeakRefStringArray columns by default too, since it creates a lot of complications (https://github.com/JuliaData/CSV.jl/issues/180#issuecomment-373667822).
It looks like deleteat! accepts Vector but not AbstractVector. So maybe the solution is to make a pr to Base.
Not all AbstractVectors are mutable. Two notable examples are UnitRanges, e.g. 1:5, and StaticArrays from the StaticArrays package. So it's not safe to assume that deleteat! should work for any AbstractVector.
I hit a similar issue again. Not sure if this is an issue for DataFrames or CSV, but it's a problem for deleterows!:
julia> using CSV
julia> using DataFrames
julia> dat1 = DataFrame(
tipNames = ["Xalvarezi","Xandersi","Xbirchmanni","Xclemenciae"],
sword_index = [0.65, 0.35, 0.275, 0.564],
preference = [missing, missing, -0.33, 0.44]);
julia> CSV.write("issue_deleterows.csv", dat1);
julia> deleterows!(dat1, 2) # all is good
3ร3 DataFrame
โ Row โ tipNames โ sword_index โ preference โ
โ โ String โ Float64 โ Float64โฐ โ
โโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโค
โ 1 โ Xalvarezi โ 0.65 โ missing โ
โ 2 โ Xbirchmanni โ 0.275 โ -0.33 โ
โ 3 โ Xclemenciae โ 0.564 โ 0.44 โ
julia> dat2 = CSV.read("issue_deleterows.csv") # should be the same as dat1
4ร3 DataFrame
โ Row โ tipNames โ sword_index โ preference โ
โ โ String โ Float64 โ Float64โฐ โ
โโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโค
โ 1 โ Xalvarezi โ 0.65 โ missing โ
โ 2 โ Xandersi โ 0.35 โ missing โ
โ 3 โ Xbirchmanni โ 0.275 โ -0.33 โ
โ 4 โ Xclemenciae โ 0.564 โ 0.44 โ
julia> deleterows!(dat2, 2)
ERROR: MethodError: no method matching deleteat!(::CSV.Column{String,String}, ::Int64)
Closest candidates are:
deleteat!(::Array{T,1} where T, ::Integer) at array.jl:1175
deleteat!(::Array{T,1} where T, ::Any) at array.jl:1212
deleteat!(::BitArray{1}, ::Integer) at bitarray.jl:909
...
Stacktrace:
[1] (::getfield(DataFrames, Symbol("##111#112")){Int64})(::CSV.Column{String,String}) at /Users/ane/.julia/packages/DataFrames/CZrca/src/dataframe/dataframe.jl:953
[2] foreach(::getfield(DataFrames, Symbol("##111#112")){Int64}, ::Array{AbstractArray{T,1} where T,1}) at ./abstractarray.jl:1866
[3] deleterows!(::DataFrame, ::Int64) at /Users/ane/.julia/packages/DataFrames/CZrca/src/dataframe/dataframe.jl:953
[4] top-level scope at none:0
This is with julia v1.1.1.
I will use this workaround for now:
julia> dat3 = dat2[:,:];
julia> deleterows!(dat3, 2) # all good
3ร3 DataFrame
โ Row โ tipNames โ sword_index โ preference โ
โ โ String โ Float64 โ Float64โฐ โ
โโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโค
โ 1 โ Xalvarezi โ 0.65 โ missing โ
โ 2 โ Xbirchmanni โ 0.275 โ -0.33 โ
โ 3 โ Xclemenciae โ 0.564 โ 0.44 โ
Please refer to CSV.jl documentation:
CSV.read("issue_deleterows.csv") returns a read-only DataFrameDataFrame use DataFrame(CSV.File("issue_deleterows.csv")) or CSV.read("issue_deleterows.csv", copycols=true)@quinnj Don't you agree now that the default should be to make a copy? People keep getting confused by that.
Ha, I see! Thanks a bunch. Yes, I had not heard about this option, and I could not see how the various versions of the data differed. Many thanks!
Nope; I still think we'd get just as many "performance" issues if copying was the default.
I think we can close this issue. It seems what @quinnj implemented in CSV.jl will stay ๐.
Could DataFrames detect the issue and give a friendlier error message? My first thought on seeing "no method matching deleteat!" was of a bug in DataFrames.jl.
This should be handled not in DataFrames.jl but in the package defining a specific vector type I think (in this case it is CSV.jl).
The reason is that DataFrames.jl accepts whatever vector type you give it and cannot know what is the exact API of this type, only the package that defines the concrete vector type can safely decide what is the right error message (in this case probably something like "CSV.jl by default produces read only columns so they cannot be resized in-place").
Yes it would make sense to implement this in CSV.