It seems like there should be a faster way to iterate over the rows of a DataFrame:
julia> d = [(a=rand(),b=rand()) for _ in 1:10^6];
julia> df = DataFrame(d);
julia> function f(xs)
           s = 0.0
           for x in xs
               s += x.a * x.b
           end
           s
       end
f (generic function with 1 method)
julia> function g(xs)
           s = 0.0
           for x in eachrow(xs)
               s += x.a * x.b
           end
           s
       end
g (generic function with 1 method)
julia> @btime f($d)
  577.269 μs (0 allocations: 0 bytes)
249855.20496448214
julia> @btime g($df)
105.782 ms (6998979 allocations: 122.05 MiB)
249855.20496448386
julia> versioninfo()
Julia Version 1.3.0-DEV.0
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-8.0.0 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 4
JULIA_CMDSTAN_HOME = /tmp/cmdstan-2.19.1
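A DataFrame is not type stable: the column types are not part of its type, so the compiler cannot infer the type of x.a inside the loop. Unfortunately, this is the slowdown you will experience.
If you use a function barrier, you can speed it up and avoid the allocations: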
julia> g(xs) = _g(Tables.rows(xs))
g (generic function with 1 method)
julia> function _g(xsr)
           s = 0.0
           for x in xsr
               s += x.a * x.b
           end
           s
       end
_g (generic function with 1 method)
julia> @btime g($df)
1.089 ms (16 allocations: 688 bytes)
249854.8799023578
Alternatively, you could extract the vectors you want to work with and pass them to an inner function behind a function barrier; this should also be fast.
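A minimal sketch of that second approach, assuming the columns are named a and b as in the example above. The names sum_products and h are hypothetical, chosen only for illustration, and the tiny DataFrame stands in for the real data:

```julia
using DataFrames

# Inner kernel: it receives concretely typed vectors, so the compiler
# can specialize the loop on their element types.
function sum_products(a::AbstractVector, b::AbstractVector)
    s = 0.0
    for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end

# Outer function: the column lookups are not inferable, but they happen
# only once per call; the call to sum_products is the function barrier.
h(df::AbstractDataFrame) = sum_products(df.a, df.b)

df = DataFrame(a = [1.0, 2.0, 3.0], b = [4.0, 5.0, 6.0])
h(df)  # 1*4 + 2*5 + 3*6 = 32.0
```

The cost of the type instability is paid once when the columns are extracted rather than once per row, which is why this pattern should avoid the per-iteration allocations seen with eachrow.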