DataFrames.jl: 32-bit BoundsError

Created on 8 Oct 2019  ·  5 Comments  ·  Source: JuliaData/DataFrames.jl

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (i686-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9980H CPU @ 2.30GHz
  WORD_SIZE: 32
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

I've run into an issue when joining two large DataFrames. The CI runners for the 32-bit versions of Julia 1.0, 1.1, and 1.2 all fail, while the 64-bit versions work fine.

Stacktrace from the CI:

BoundsError: attempt to access 1365559-element Array{Int32,1} at index [0]
  Stacktrace:
   [1] getindex at ./array.jl:731 [inlined]
   [2] group_rows(::DataFrame, ::Bool, ::Bool, ::Bool) at /mnt/builds/DcGs9yxw/0/{REDACTED}/depot/packages/DataFrames/0Em9Q/src/dataframerow/utils.jl:255
   [3] group_rows at /mnt/builds/DcGs9yxw/0/{REDACTED}/depot/packages/DataFrames/0Em9Q/src/dataframerow/utils.jl:248 [inlined]
   [4] #join#237(::Array{Symbol,1}, ::Symbol, ::Bool, ::Nothing, ::Tuple{Bool,Bool}, ::Function, ::DataFrame, ::DataFrame) at /mnt/builds/DcGs9yxw/0/{REDACTED}/depot/packages/DataFrames/0Em9Q/src/abstractdataframe/join.jl:344
   [5] (::getfield(Base, Symbol("#kw##join")))(::NamedTuple{(:on, :makeunique),Tuple{Array{Symbol,1},Bool}}, ::typeof(join), ::DataFrame, ::DataFrame) at ./none:0
...

I've been experimenting in VirtualBox with a 32-bit Ubuntu instance. Below is an example of how I am using DataFrames. Note that on my VirtualBox instance this example causes a SIGABRT.

using DataFrames
using DataFramesMeta

df_1 = DataFrame(A=1:2000000)
df_1 = @transform(df_1, time=first.(:A))
df_2 = DataFrame(A=1:2000000)
df_2 = @transform(df_2, time=first.(:A))

join(df_1, df_2, on=[:time, :A], makeunique=true)

After spending some time looking at group_rows, I was able to create another example that reproduces the above stacktrace. It always crashes after g_ix=57979.

using DataFrames

df = DataFrame(A=1:2000000)
groups = Vector{Int}(undef, nrow(df))
ngroups, rhashes, gslots, sorted = DataFrames.row_group_slots(ntuple(i -> df[i], ncol(df)), Val(true), groups, false)
stops = zeros(Int, ngroups)

for g_ix in groups
    stops[g_ix] += 1
end

All 5 comments

I decided to dig into this a little by looking at the row_group_slots function: https://github.com/JuliaData/DataFrames.jl/blob/b0d8a87dd8edfadfb458a2121eda78210ec13e0f/src/dataframerow/utils.jl#L102

It looks like the rhashes vector contains 450 collisions here, so 450 elements of the groups array end up set to 0. Since Julia arrays are 1-based, stops[g_ix] += 1 throws a BoundsError as soon as it runs with g_ix == 0.
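To make the failure mode concrete, here is a minimal sketch that does not touch DataFrames internals. The groups vector below is made-up data with one entry left at 0, standing in for a row that never received a group index; count_groups! is a hypothetical helper that mirrors the loop from the reproduction above.

```julia
# Made-up data: four rows, three groups, and one row left unassigned (0).
groups = [1, 2, 0, 3]
ngroups = 3
stops = zeros(Int, ngroups)

# Hypothetical helper mirroring the counting loop from the reproduction.
function count_groups!(stops, groups)
    for g_ix in groups
        stops[g_ix] += 1   # index 0 is out of bounds for a 1-based Vector
    end
    return stops
end

# Capture the error instead of letting it abort the script.
err = try
    count_groups!(stops, groups)
    nothing
catch e
    e
end

# A quick diagnostic: how many rows were never assigned a group?
unassigned = count(iszero, groups)

println(err isa BoundsError)   # the zero entry triggers the error
println(unassigned)            # number of unassigned rows
```

Running count(iszero, groups) on the real groups vector from the reproduction is a cheap way to confirm that unassigned rows, not the counting loop itself, are the problem.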

@nalimilan - you probably have the most experience with this part of the code base (if you are not available, please let me know and I will have a look at this issue).

I spent some more time looking into this just now. I took this code from hash_rows and hashrows_cols!.

using DataFrames

df = DataFrame(A=1:2000000)
tup = ntuple(i -> df[i], ncol(df))
rhashes = zeros(UInt, length(tup[1]))

for (i, col) in enumerate(tup)
    @inbounds for j in eachindex(rhashes)
        el = col[j]
        rhashes[j] = hash(el, rhashes[j])
    end
end

nrow(df) - length(unique(rhashes))  # 450 collisions

The root cause is most likely here: https://github.com/JuliaLang/julia/blob/master/base/hashing2.jl#L30

EDIT: In my example the first instance of this issue can be replicated with:

hash(40237, 0x00000000)
hash(57970, 0x00000000)

Both evaluate to 0x38b05917.
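For anyone who wants to look for such pairs on their own machine, the following is a rough sketch. first_collision is a hypothetical helper, not part of Base or DataFrames; it returns the first two distinct values that share a hash, or nothing if there are none. The hashfn keyword is only there so the helper can be exercised with a deliberately weak hash. Results depend on the platform: the pair reported above was seen on a 32-bit build, and a 64-bit build would not be expected to show the same values.

```julia
# Hypothetical helper: find the first pair of distinct values with equal hashes.
function first_collision(vals; hashfn = v -> hash(v, zero(UInt)))
    seen = Dict{UInt,eltype(vals)}()      # hash => first value that produced it
    for v in vals
        h = hashfn(v)
        if haskey(seen, h) && !isequal(seen[h], v)
            return (seen[h], v, h)        # (earlier value, later value, shared hash)
        end
        seen[h] = v
    end
    return nothing                         # no collision among vals
end

# Exercise the helper with a deliberately weak hash so a collision is guaranteed.
weak_hash(v) = UInt(v % 3)
demo = first_collision(1:5; hashfn = weak_hash)
println(demo)

# On the affected platform one would call, for example:
# first_collision(1:2_000_000)
```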

Interesting. Yes, hash collisions are expected to happen, and the code is supposed to be able to handle them. Apparently, I broke that by moving this break to the wrong place at https://github.com/JuliaData/DataTables.jl/pull/79:
https://github.com/JuliaData/DataFrames.jl/blob/b0d8a87dd8edfadfb458a2121eda78210ec13e0f/src/dataframerow/utils.jl#L135
Can you check whether https://github.com/JuliaData/DataFrames.jl/pull/1979 fixes it? If so, we should try to add tests for that (hopefully without using too much memory on Travis/AppVeyor).
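As an illustration of what "handling collisions" means here, below is a minimal open-addressing sketch with linear probing. It is not the DataFrames.jl implementation and does not reproduce the misplaced break; assign_groups and its hashfn argument are hypothetical names. The point it shows is that when two different keys land in the same slot, the loop must keep probing until it finds either an equal key or an empty slot, so every row ends up with a nonzero group index.

```julia
# Sketch only: assign a group index to each value using open addressing.
function assign_groups(vals::AbstractVector, hashfn = hash)
    n = length(vals)
    sz = nextpow(2, max(2n, 8))   # power-of-two table, at most half full
    mask = sz - 1
    slots = zeros(Int, sz)        # 0 = empty, otherwise first row of a group
    groups = zeros(Int, n)        # group index per row
    ngroups = 0
    for i in 1:n
        slot = Int(hashfn(vals[i]) & mask) + 1
        while true
            first_row = slots[slot]
            if first_row == 0
                # Empty slot: row i starts a new group.
                ngroups += 1
                slots[slot] = i
                groups[i] = ngroups
                break
            elseif isequal(vals[first_row], vals[i])
                # Same key already seen: reuse its group.
                groups[i] = groups[first_row]
                break
            end
            # Different key in this slot (a collision): probe the next slot.
            slot = (slot & mask) + 1
        end
    end
    return groups, ngroups
end

# Force every key into the same slot to exercise the collision path.
bad_hash(x) = UInt(0)
groups, ngroups = assign_groups([10, 20, 10, 30, 20], bad_hash)
println(groups)    # [1, 2, 1, 3, 2]
println(ngroups)   # 3
```

A test along these lines, using a hash function that collides on purpose, would exercise the collision path on small inputs, which may help with the memory concern for CI.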

Just tested #1979, and it resolves the issue!
