DataFrames.jl: missing vs nothing

Created on 20 Jun 2019 · 10 comments · Source: JuliaData/DataFrames.jl

Hello,

After asking on Stack Overflow (https://stackoverflow.com/questions/56684447/convert-a-julia-dataframe-column-with-string-to-one-with-int-and-missing-values/56685891?noredirect=1#comment99940131_56685891), I think this should in fact be discussed here.

I need to convert the following DataFrame

julia> df = DataFrame(:A=>["", "2", "3"], :B=>[1.1, 2.2, 3.3])

which looks like

3×2 DataFrame
│ Row │ A      │ B       │
│     │ String │ Float64 │
├─────┼────────┼─────────┤
│ 1   │        │ 1.1     │
│ 2   │ 2      │ 2.2     │
│ 3   │ 3      │ 3.3     │

I would like to convert column A from Array{String,1} to an array of Int with missing values.

I tried

julia> df.A = tryparse.(Int, df.A)
3-element Array{Union{Nothing, Int64},1}:
  nothing
 2
 3

julia> df
3×2 DataFrame
│ Row │ A      │ B       │
│     │ Union… │ Float64 │
├─────┼────────┼─────────┤
│ 1   │        │ 1.1     │
│ 2   │ 2      │ 2.2     │
│ 3   │ 3      │ 3.3     │

julia> eltype(df.A)
Union{Nothing, Int64}

but column A ends up with elements of type Union{Nothing, Int64}.

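nothing (of type Nothing) and missing (of type Missing) are two different kinds of values: nothing is what tryparse returns when parsing fails, while missing is what DataFrames uses to represent missing data.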
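To make the distinction concrete, here is a minimal Base-only sketch (the vector `v` and the flag `threw` are illustrative names, not from the thread). `missing` propagates through arithmetic and comparisons and can be skipped in reductions, whereas `nothing` is a "no value" sentinel on which most operations throw an error:

```julia
# `missing` models an unknown data value: it propagates through operations.
@assert ismissing(missing + 1)
@assert ismissing(missing == 1)

# Reductions can skip missing values explicitly.
v = [missing, 2, 3]
@assert sum(skipmissing(v)) == 5

# `nothing` signals "no value", e.g. a failed parse...
@assert tryparse(Int, "") === nothing

# ...and arithmetic on it raises a MethodError instead of propagating.
threw = try
    nothing + 1
    false
catch err
    err isa MethodError
end
@assert threw
```

This is why a column of Union{Nothing, Int64} does not behave like a column with missing data in DataFrames.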

The answer on Stack Overflow suggests the following solution:

julia> df.A = map(df.A) do x
           val = tryparse(Int, x)
           val === nothing ? missing : val  # replace `nothing` with `missing`
       end
3-element Array{Union{Missing, Int64},1}:
  missing
 2
 3

Although this answers my question perfectly, I don't think it's something we can expect DataFrames users to write.

Maybe we should have a function that replaces nothing with missing, or, as another approach, an alternative definition of tryparse that returns missing.

What is your opinion?

Kind regards

All 10 comments

I noticed that replacing nothing with missing can be done with:

df.A = replace(df.A, nothing=>missing)

Maybe the documentation should include such a DataFrame example (string values, tryparse to parse them as Int, then replace).
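A sketch of that pipeline, using a plain vector `raw` as a stand-in for `df.A` so it runs without DataFrames installed (the variable names are illustrative). One caveat, stated as an assumption: whether `replace` also drops `Nothing` from the element type of its result has varied across Julia versions, so the sketch broadcasts `identity` to recompute a tight element type from the actual values:

```julia
# Stand-in for the issue's `df.A` column: strings, one of them unparsable.
raw = ["", "2", "3"]

# Parse; failures become `nothing`, eltype is Union{Nothing, Int}.
parsed = tryparse.(Int, raw)

# Swap every `nothing` for `missing`.
replaced = replace(parsed, nothing => missing)

# `identity.` rebuilds the array, deriving its eltype from the values present,
# so any leftover `Nothing` in the eltype is dropped.
A = identity.(replaced)

@assert ismissing(A[1])
@assert A[2:3] == [2, 3]
@assert eltype(A) == Union{Missing, Int}
```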

Having tryparse return missing directly when a string can't be parsed would simplify this; see https://github.com/JuliaLang/julia/issues/32378

How about:

tryparsem(T, str) = something(tryparse(T, str), missing)  # missing instead of nothing on failure
df.A = tryparsem.(Int, df.A)
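A runnable sketch of the `tryparsem` helper on a plain vector `col` standing in for `df.A` (the name `col` is illustrative). Note that the target type has to be passed in the broadcast call, since `tryparsem` takes two arguments; types are treated as scalars by broadcasting, just like in `tryparse.(Int, df.A)`:

```julia
# `something(x, y...)` returns its first argument that is not `nothing`,
# so a failed parse falls through to `missing`.
tryparsem(T, str) = something(tryparse(T, str), missing)

col = ["", "2", "3"]        # stand-in for df.A
A = tryparsem.(Int, col)    # the target type is passed explicitly

@assert ismissing(A[1])
@assert A[2] == 2 && A[3] == 3
@assert eltype(A) == Union{Missing, Int}
```

Compared with the map-based answer from Stack Overflow, this produces the same Union{Missing, Int64} element type in a single broadcast.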

I didn't know about the something function. Thanks for the idea.
Should tryparsem be included in Base, in DataFrames.jl, or left to user code?
This idea is too clever not to be part of a package or the language itself :wink:

I think the reason it isn't included is precisely how simple it is. As long as you're aware of the something function (and its missing counterpart, coalesce), there are some really quick ways to convert between nothing and missing.
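Both directions can be sketched with broadcasting over plain vectors (`a` and `b` are illustrative stand-ins for columns). `something` skips `nothing` and `coalesce` skips `missing`, so each returns the supplied fallback in place of the value it skips:

```julia
# nothing -> missing: `something` skips `nothing` and returns the fallback.
a = [nothing, 2, 3]
a_missing = something.(a, missing)

# missing -> nothing: `coalesce` skips `missing` and returns the fallback.
b = [missing, 2, 3]
b_nothing = coalesce.(b, nothing)

@assert ismissing(a_missing[1]) && a_missing[2:3] == [2, 3]
@assert b_nothing[1] === nothing && b_nothing[2:3] == [2, 3]
```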

@scls19fr, can this be closed given the solution proposed by @quinnj?

I'm still wondering what should be done here, and I don't think simply closing this is the best course of action.

At the very least, the documentation should be improved to mention this idiom.

But I still don't see why we couldn't or shouldn't add such a function, however simple it is.

Providing such a function in Base or in DataFrames.jl would encourage developers to use the same function name, which (IMHO) is good practice for code readability.

I am asking because this functionality is not specific to DataFrames.jl. It should live in Base or Missings.jl (you could probably first discuss it in Missings.jl, as that is where experimental missing-related functionality is implemented before it is introduced in Base).

OK, it seems you opened a very similar issue: https://github.com/JuliaData/Missings.jl/issues/61

Yes, but we were not sure what the best way to do it was 😞.
