DataFrames.jl: `similar(::DataFrame, 0)` changes column types

Created on 27 Nov 2017  ·  14 Comments  ·  Source: JuliaData/DataFrames.jl

Using DataFrames v0.11.0:

julia> row = DataFrame(a="foo")
1×1 DataFrames.DataFrame
│ Row │ a   │
├─────┼─────┤
│ 1   │ foo │

julia> df = similar(row, 0)
0×1 DataFrames.DataFrame


julia> append!(df, row)
ERROR: Column eltypes do not match
Stacktrace:
 [1] append!(::DataFrames.DataFrame, ::DataFrames.DataFrame) at /tmp/julia/v0.6/DataFrames/src/dataframe/dataframe.jl:765

julia> DataFrames.columns(row)
1-element Array{Any,1}:
 String["foo"]

julia> DataFrames.columns(df)
1-element Array{Any,1}:
 Union{Missings.Missing, String}[]

All 14 comments

Maybe we should update the append! element check to use subtype checks:

all(issubtype(el2, el1) for (el1, el2) in zip(eltypes(df1), eltypes(df2)))

I guess we should change similar to respect column eltypes.

Using subtype checks in append! also makes sense, but note that in your example it wouldn't help since it's the first eltype which is a subtype of the second one. A more general solution would be to attempt a conversion (i.e. call copy! and see what happens), but it will be tricky to handle when conversion fails mid-way.
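
To make the direction of the subtype relation concrete, here is a small sketch using the two eltypes from the report. It uses only Base and assumes Julia 1.0 or later, where Missing is defined in Base rather than in the Missings package. The variable names are made up for illustration.

```julia
# Assumes Julia 1.0 or later (Missing lives in Base there).
dest_eltype = Union{Missing, String}   # eltype of the column in `df`
src_eltype  = String                   # eltype of the column in `row`

# Strict equality is what the failing check in append! used.
println(dest_eltype == src_eltype)     # false

# The narrower type is a subtype of the Union ...
println(src_eltype <: dest_eltype)     # true

# ... but the Union is not a subtype of the narrower type.
println(dest_eltype <: src_eltype)     # false
```

So a subtype check only passes when the source column's eltype is the narrower of the two. Appending a Union{Missing, String} column onto a String column would still be rejected, even when it contains no missing values.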

@omus The behavior of similar should now be correct on master, thank you for the report. I'll leave this open until we address append!

When loading a file with CSV.jl, the columns of the resulting dataframe quite often have a union with missing as their eltype.
This makes it difficult to append a new row.

How am I expected to create a dataframe whose columns are Union{Float64, Missings.Missing}?

julia> eltypes(DataFrame(c = 0.1::Union{Float64, Missings.Missing}))
1-element Array{Type,1}:
 Float64

Seems I'll have to copy a row from the dataframe, change its value, and then maybe it will append ...

It would be much better to be able to append a dataframe with Float64 columns to a dataframe whose columns are Union{Float64, Missings.Missing}.

@JonWel if you pass in a vector you can set the element type appropriately:

julia> eltypes(DataFrame(c = Union{Float64, Missings.Missing}[0.1]))
1-element Array{Type,1}:
 Union{Float64, Missings.Missing}
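
A short Base-only sketch of why the scalar version above reports Float64 while the vector version keeps the Union. It assumes Julia 1.0 or later, where Missing is in Base; on the Julia 0.6 setup used in this thread the same type came from Missings.

```julia
# Assumes Julia 1.0 or later (Missing lives in Base there).

# `value::T` on the right-hand side is a type assertion, not a conversion:
# it checks that 0.1 isa Union{Float64, Missing} and returns 0.1 unchanged.
x = 0.1::Union{Float64, Missing}
println(typeof(x))                                # Float64

# A vector built from that value therefore infers Float64 as its eltype.
println(eltype([x]))                              # Float64

# Prefixing the literal with the desired eltype keeps the Union.
v = Union{Float64, Missing}[0.1]
println(eltype(v) == Union{Float64, Missing})     # true

# Such a vector accepts missing values later on.
push!(v, missing)
println(length(v))                                # 2
```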

@omus Thanks, I think I'm almost there.

One oddity remains:

julia> eltypes(df1)
Any[Union{Float64, Missings.Missing}[280.12, 285.12], Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}["B", "B"], Union{Float64, Missings.Missing}[0.0, 0.06], Union{Float64, Missings.Missing}[201900.0, 239700.0], Union{Float64, Missings.Missing}[2.0481e5, 3.12964e5]]
julia> eltypes(df2)
Any[Union{Float64, Missings.Missing}[280.12], CategoricalArrays.CategoricalString{UInt32}["B"], Union{Float64, Missings.Missing}[0.0], Union{Float64, Missings.Missing}[NaN], Union{Float64, Missings.Missing}[NaN]]

I tried something like this, but it fails at the append step:

julia> eltype(Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}[df1[2,3]])
Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}

julia> eltypes(df1)
Any[Union{Float64, Missings.Missing}[280.12, 285.12], Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}["B", "B"], Union{Float64, Missings.Missing}[0.0, 0.06], Union{Float64, Missings.Missing}[201900.0, 239700.0], Union{Float64, Missings.Missing}[2.0481e5, 3.12964e5]]
julia> eltypes(df2)
Any[Union{Float64, Missings.Missing}[280.12], Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}["B"], Union{Float64, Missings.Missing}[0.0], Union{Float64, Missings.Missing}[NaN], Union{Float64, Missings.Missing}[NaN]]

julia> append!(df1,df2)
ERROR: LoadError: MethodError: no method matching append!(::CategoricalArrays.CategoricalArray{Union{Missings.Missing, String},1,UInt32,String,CategoricalArrays.CategoricalString{UInt32},Missings.Missing}, ::Array{Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing},1})

I worked around the issue by imposing a much simpler type when reading the CSV file: CSV.read(file,types=Dict("colx"=>String)). I can do this because that column never has missing values.

We definitely need to relax this eltypes(df1) == eltypes(df2) || error("Column eltypes do not match") check. At the very least it should allow Union{T,Missing}, checking that there are no missing values in df2 columns for which df1 does not allow for missing values.
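
As a rough illustration of the relaxed check described above, here is a sketch that works on single columns represented as plain vectors. `can_append` is a hypothetical helper, not a DataFrames.jl function, and the code assumes Julia 1.0 or later.

```julia
# Hypothetical helper, not part of DataFrames.jl.
# Assumes Julia 1.0 or later (Missing lives in Base there).
# `dest` and `src` stand in for one column of df1 and df2 respectively.
function can_append(dest::AbstractVector, src::AbstractVector)
    T, S = eltype(dest), eltype(src)
    # Identical or narrower source eltype: always acceptable.
    S <: T && return true
    # Source eltype is at most Union{T, Missing}: acceptable only if
    # no missing value is actually present in the source column.
    return S <: Union{T, Missing} && !any(ismissing, src)
end

println(can_append(Union{String, Missing}[], ["foo"]))                     # true
println(can_append(String[], Union{String, Missing}["foo"]))               # true
println(can_append(String[], Union{String, Missing}["foo", missing]))      # false
println(can_append(String[], [1]))                                         # false
```

Note that the second branch has to scan the source column, so this check is no longer a pure type comparison.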

We could even go one step further and simply rely on the append! methods on column vectors to do the conversion, but if a failure happens some columns may have been mutated already. Maybe calling resize! on all columns in case of failure would be enough.
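
The resize!-based rollback could look roughly like the sketch below. It treats a data frame as a plain vector of column vectors and uses only Base, so `append_columns!` is a hypothetical stand-in and not the actual DataFrames.jl implementation. Julia 1.0 or later is assumed.

```julia
# Hypothetical helper, not DataFrames.jl code.
# Assumes Julia 1.0 or later (Missing lives in Base there).
# `dest_cols[i]` receives the contents of `src_cols[i]`.
function append_columns!(dest_cols::Vector, src_cols::Vector)
    length(dest_cols) == length(src_cols) ||
        throw(ArgumentError("column counts do not match"))
    original_lengths = map(length, dest_cols)
    try
        for (dest, src) in zip(dest_cols, src_cols)
            append!(dest, src)   # converts elements, throws if it cannot
        end
    catch
        # Undo partial work: shrink every column back to its old length.
        for (dest, n) in zip(dest_cols, original_lengths)
            resize!(dest, n)
        end
        rethrow()
    end
    return dest_cols
end

dest = Any[Union{Float64, Missing}[1.0], String["a"]]
ok   = Any[[2.0], Union{String, Missing}["b"]]
bad  = Any[[3.0], Union{String, Missing}[missing]]

append_columns!(dest, ok)
println(map(length, dest))        # [2, 2]

try
    append_columns!(dest, bad)    # second column cannot hold missing
catch err
    println("append failed: ", typeof(err))
end
println(map(length, dest))        # [2, 2]
```

In the failing call the first column has already grown by the time the second one throws, which is exactly the partial mutation the comment worries about. Resizing every column to its recorded length restores a consistent state.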

This can be closed, right? I get this now:

julia> row = DataFrame(a="foo")
1×1 DataFrame
│ Row │ a      │
│     │ String │
├─────┼────────┤
│ 1   │ foo    │

julia> df = similar(row, 0)
0×1 DataFrame


julia> eltypes(df)
1-element Array{Type,1}:
 String

(BTW, it would make sense to print the column types even when there are no rows.)

The issue has been addressed. I would suggest adding an explicit test with similar as the #1432 PR doesn't actually do that.

#1432 didn't change similar, but I don't remember which PR fixed it.

@nalimilan is this fixed now?

Yes but it would be nice to add a test.

Closing as #1686 adds the tests.
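
For readers who want to check the behavior themselves, a regression test along these lines should cover the original report. This is only a sketch written against DataFrames.jl 1.x (an assumption; the thread itself predates it) and is not the test that #1686 added.

```julia
# Sketch only; assumes DataFrames.jl 1.x is installed.
# This is not the test added in #1686.
using DataFrames, Test

@testset "similar keeps column eltypes" begin
    row = DataFrame(a = ["foo"])
    df = similar(row, 0)

    # Same columns, zero rows, and the eltype is still String.
    @test size(df) == (0, 1)
    @test eltype(df.a) == String

    # The append! call from the original report now succeeds.
    append!(df, row)
    @test df == row
end
```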
