Using DataFrames v0.11.0:
julia> row = DataFrame(a="foo")
1×1 DataFrames.DataFrame
│ Row │ a   │
├─────┼─────┤
│ 1   │ foo │
julia> df = similar(row, 0)
0×1 DataFrames.DataFrame
julia> append!(df, row)
ERROR: Column eltypes do not match
Stacktrace:
[1] append!(::DataFrames.DataFrame, ::DataFrames.DataFrame) at /tmp/julia/v0.6/DataFrames/src/dataframe/dataframe.jl:765
julia> DataFrames.columns(row)
1-element Array{Any,1}:
String["foo"]
julia> DataFrames.columns(df)
1-element Array{Any,1}:
Union{Missings.Missing, String}[]
Maybe we should update the append! element check to use subtype checks:
all(issubtype(el2, el1) for (el1, el2) in zip(eltypes(df1), eltypes(df2)))
I guess we should change similar to respect column eltypes.
Using subtype checks in append! also makes sense, but note that in your example it wouldn't help since it's the first eltype which is a subtype of the second one. A more general solution would be to attempt a conversion (i.e. call copy! and see what happens), but it will be tricky to handle when conversion fails midway.
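Since the direction of the subtype relation matters here, it can help to check it directly on the two element types from the report. This is a minimal sketch, assuming Julia 1.x (where `Missing` lives in Base and `<:` replaces `issubtype`); `eltypes_compatible` is a hypothetical helper written for illustration, not part of DataFrames.

```julia
# Hypothetical helper (not a DataFrames function): true when every source
# eltype is a subtype of the corresponding destination eltype.
eltypes_compatible(dest_types, src_types) =
    length(dest_types) == length(src_types) &&
    all(S <: D for (D, S) in zip(dest_types, src_types))

# The two element types that appear in the report.
String <: Union{Missing, String}   # true
Union{Missing, String} <: String   # false

# The same relation through the helper, in both argument orders.
eltypes_compatible([Union{Missing, String}], [String])   # true
eltypes_compatible([String], [Union{Missing, String}])   # false
```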
@omus The behavior of similar should now be correct on master, thank you for the report. I'll leave this open until we address append!
When loading a file with CSV.jl, it happens quite often that the columns of the data frame are unions with Missing.
This makes it difficult to append a new row.
How am I expected to create a data frame whose columns are Union{Float64, Missings.Missing}?
julia> eltypes(DataFrame(c = 0.1::Union{Float64, Missings.Missing}))
1-element Array{Type,1}:
Float64
It seems I'll have to copy a row from the data frame, change its values, and then maybe it will append...
It would be much better to be able to append a data frame with Float64 columns to one whose columns are Union{Float64, Missings.Missing}.
@JonWel if you pass in a vector you can set the element type appropriately:
julia> eltypes(DataFrame(c = Union{Float64, Missings.Missing}[0.1]))
1-element Array{Type,1}:
Union{Float64, Missings.Missing}
@omus Thanks, I think I'm almost there.
One oddity remains:
julia> eltypes(df1)
Any[Union{Float64, Missings.Missing}[280.12, 285.12], Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}["B", "B"], Union{Float64, Missings.Missing}[0.0, 0.06], Union{Float64, Missings.Missing}[201900.0, 239700.0], Union{Float64, Missings.Missing}[2.0481e5, 3.12964e5]]
julia> eltypes(df2)
Any[Union{Float64, Missings.Missing}[280.12], CategoricalArrays.CategoricalString{UInt32}["B"], Union{Float64, Missings.Missing}[0.0], Union{Float64, Missings.Missing}[NaN], Union{Float64, Missings.Missing}[NaN]]
I tried something like this, but it fails at the append step:
julia> eltype(Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}[df1[2,3]])
Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}
julia> eltypes(df1)
Any[Union{Float64, Missings.Missing}[280.12, 285.12], Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}["B", "B"], Union{Float64, Missings.Missing}[0.0, 0.06], Union{Float64, Missings.Missing}[201900.0, 239700.0], Union{Float64, Missings.Missing}[2.0481e5, 3.12964e5]]
julia> eltypes(df2)
Any[Union{Float64, Missings.Missing}[280.12], Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}["B"], Union{Float64, Missings.Missing}[0.0], Union{Float64, Missings.Missing}[NaN], Union{Float64, Missings.Missing}[NaN]]
julia> append!(df1,df2)
ERROR: LoadError: MethodError: no method matching append!(::CategoricalArrays.CategoricalArray{Union{Missings.Missing, String},1,UInt32,String,CategoricalArrays.CategoricalString{UInt32},Missings.Missing}, ::Array{Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing},1})
I worked around the issue by imposing a much simpler type when reading the CSV file: CSV.read(file, types=Dict("colx"=>String)). I can do this because that column never has missing values.
We definitely need to relax this eltypes(df1) == eltypes(df2) || error("Column eltypes do not match") check. At the very least it should allow Union{T,Missing}, checking that there are no missing values in df2 columns for which df1 does not allow for missing values.
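The relaxed check described above could look roughly like the following on a single pair of column vectors. This is only a sketch, assuming Julia 1.3 or later (for the exported `nonmissingtype`); `can_append` is a hypothetical name and not an existing DataFrames function.

```julia
# Hypothetical helper: can every value in `src` be stored in `dest`
# without conversion errors caused by missing values?
function can_append(dest::AbstractVector, src::AbstractVector)
    D, S = eltype(dest), eltype(src)
    # Source eltype already fits, e.g. Float64 into Union{Float64, Missing}.
    S <: D && return true
    # Source allows missing but destination does not: accept only if the
    # non-missing part fits and no missing value is actually present.
    return nonmissingtype(S) <: D && !any(ismissing, src)
end

can_append(Union{Float64, Missing}[], [0.1])               # true
can_append([0.1], Union{Float64, Missing}[0.2])            # true
can_append([0.1], Union{Float64, Missing}[0.2, missing])   # false
```

Applied per column, a check of this shape would accept the Float64 into Union{Float64, Missing} case from this thread while still rejecting a source column that really contains missing values.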
We could even go one step further and simply rely on the append! methods on column vectors to do the conversion, but if a failure happens some columns may have been mutated already. Maybe calling resize! on all columns in case of failure would be enough.
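A rough sketch of that rollback idea, written against plain vectors of columns rather than a real DataFrame. `append_columns!` is a hypothetical helper, and the sketch assumes all destination columns start with the same length; it is not how DataFrames implements `append!`.

```julia
# Hypothetical helper: append each source column to the matching destination
# column. If any append! throws, shrink every destination column back to its
# original length so that no column is left partially mutated.
function append_columns!(dest, src)
    length(dest) == length(src) || throw(ArgumentError("column counts differ"))
    nrows = isempty(dest) ? 0 : length(first(dest))
    try
        for (d, s) in zip(dest, src)
            append!(d, s)   # the column's own append! handles conversion
        end
    catch
        for d in dest
            resize!(d, nrows)   # undo any growth, full or partial
        end
        rethrow()
    end
    return dest
end

cols = Any[Float64[1.0, 2.0], String["a", "b"]]

# The second column cannot hold `missing`, so this throws after the first
# column has already grown; the rollback restores both lengths.
try
    append_columns!(cols, Any[[3.0], [missing]])
catch err
    println("append failed: ", typeof(err))
end

length.(cols)   # both columns are back to length 2
```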
I've filed PR https://github.com/JuliaData/DataFrames.jl/pull/1432.
This can be closed, right? I get this now:
julia> row = DataFrame(a="foo")
1×1 DataFrame
│ Row │ a      │
│     │ String │
├─────┼────────┤
│ 1   │ foo    │
julia> df = similar(row, 0)
0×1 DataFrame
julia> eltypes(df)
1-element Array{Type,1}:
String
(BTW, it would make sense to print the column types even when there are no rows.)
The issue has been addressed. I would suggest adding an explicit test with similar as the #1432 PR doesn't actually do that.
#1432 didn't change similar, but I don't remember which PR fixed it.
@nalimilan is this fixed now?
Yes but it would be nice to add a test.
Closing as #1686 adds the tests.