Elixir: flat_map is slower than map |> flatten (and erlangs :lists.flatmap)

Created on 2 Aug 2016 · 5Comments · Source: elixir-lang/elixir

(this is not strictly a bug but a somewhat surprising performance degradation I found, hope the issue tracker is still the appropriate place :))

Environment

tobi@airship ~/github/benchee $ elixir -v
Erlang/OTP 19 [erts-8.0] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Elixir 1.3.2
tobi@airship ~/github/benchee $ uname -a
Linux airship 3.19.0-32-generic #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Current behavior

In my benchmark Enum.flat_map is a lot slower (~1.6) than doing map first and then flattening the list, which is unexpected as flat_map is specialized and therefore ought to be faster (I think and it is in many languages). Also :lists.flatmap is 2.7 times faster, so there should be some room for improvements.

It is entirely possible that I misconfigured something, am not aware of some performance implications. Happy to try and help, also feel free to run it yourself

without further ado, the benchmark (using benchee):

list = Enum.to_list(1..10_000)
map_fun = fn(i) -> [i, i * i] end

Benchee.run(%{time: 8}, %{
  ":lists.flatmap" => fn -> :lists.flatmap(map_fun, list) end,
  "flat_map"        => fn -> Enum.flat_map(list, map_fun) end,
  "map |> flatten"  => fn -> list |> Enum.map(map_fun) |> List.flatten end
})

and the result:

tobi@airship ~/github/benchee $ mix run samples/flat_map.exs 
Benchmark suite executing with the following configuration:
warmup: 2.0s
time: 8.0s
parallel: 1
Estimated total run time: 30.0s

Benchmarking :lists.flatmap...
Benchmarking flat_map...
Benchmarking map |> flatten...

Name                     ips        average    deviation         median
:lists.flatmap       1832.87       545.59μs    (±10.08%)       527.00μs
map |> flatten       1075.51       929.79μs    (±13.91%)       908.00μs
flat_map              707.04      1414.34μs     (±8.72%)      1374.50μs

Comparison: 
:lists.flatmap       1832.87
map |> flatten       1075.51 - 1.70x slower
flat_map              707.04 - 2.59x slower

Expected behavior

flat_map should be faster than map |> flatten and ideally somewhere in the ballpark of :lists.flatmap.

Source

PragTob

Most helpful comment

You sir, are amazing.

# old benchmark with 10_000 element list
Name                       ips        average    deviation         median
flat_map_list v2       1849.65       540.64μs    (±12.58%)       520.00μs
:lists.flatmap         1828.21       546.98μs    (±11.81%)       525.00μs
flat_map v1            1361.04       734.73μs     (±5.06%)       715.00μs
map.flatten            1089.10       918.19μs    (±12.98%)       907.00μs
flat_map                695.84      1437.12μs     (±9.17%)      1422.00μs

Comparison: 
flat_map_list v2       1849.65
:lists.flatmap         1828.21 - 1.01x slower
flat_map v1            1361.04 - 1.36x slower
map.flatten            1089.10 - 1.70x slower
flat_map                695.84 - 2.66x slower

# medium sized (chose 200 list elements)
Name                       ips        average    deviation         median
flat_map_list v2     114712.36         8.72μs    (±91.11%)         8.00μs
:lists.flatmap       107221.70         9.33μs    (±86.29%)         9.00μs
flat_map v1           86379.92        11.58μs    (±58.57%)        11.00μs
map.flatten           75570.66        13.23μs    (±49.44%)        13.00μs
flat_map              47269.18        21.16μs    (±43.72%)        19.00μs

Comparison: 
flat_map_list v2     114712.36
:lists.flatmap       107221.70 - 1.07x slower
flat_map v1           86379.92 - 1.33x slower
map.flatten           75570.66 - 1.52x slower
flat_map              47269.18 - 2.43x slower

# small list (10 elements)
# as this is very fast, benchee repeats the execution n times until it takes at least ~10 microsceonds as below I believe there won't be accurate measures, so there is some overhead incurred by that
Name                       ips        average    deviation         median
flat_map_list v2    2017197.14         0.50μs    (±16.27%)         0.48μs
:lists.flatmap      1811855.41         0.55μs    (±26.58%)         0.51μs
flat_map v1         1489038.79         0.67μs    (±17.98%)         0.65μs
map.flatten         1470887.23         0.68μs   (±288.22%)         0.70μs
flat_map             829364.49         1.21μs   (±250.98%)         1.00μs

Comparison: 
flat_map_list v2    2017197.14
:lists.flatmap      1811855.41 - 1.11x slower
flat_map v1         1489038.79 - 1.35x slower
map.flatten         1470887.23 - 1.37x slower
flat_map             829364.49 - 2.43x slower

Concrete implementations etc. in this commit

edit: forgot to mention, both versions are significantly faster for all use cases and v2 is even the fastest which is :rocket: :rocket: :rocket: - which is amazing :D It could have taken you at most 18 minutes or so :)

PragTob on 2 Aug 2016

🎉4

All 5 comments

So Enum.flat_map never calls ++ and uses a reversal, that should theoretically consume less memory. flat_map also accepts any enumerable to be flattened while :lists can afford to work only on lists. In any case, can you please try these versions?

  # v1
  def flat_map(enumerable, fun) when is_function(fun, 1) do
    Enum.reduce(enumerable, [], fn(entry, acc) ->
      case fun.(entry) do
        list when is_list(list) -> :lists.reverse(list, acc)
        other -> Enum.reduce(other, acc, &[&1 | &2])
      end
    end) |> :lists.reverse
  end

  # v2
  def flat_map_list([h | t], fun) do
    case fun.(h) do
      list when is_list(list) -> list ++ flat_map_list(t, fun)
      other -> Enum.to_list(other) ++ flat_map_list(t, fun)
    end
  end

  def flat_map_list([], fun) when is_function(fun, 1) do
    []
  end

It would also be nice to know how they compare against smaller inputs. :)

josevalim on 2 Aug 2016

👍4

@josevalim will do! Thanks and :heart: as always for your quick interaction and general awesomeness :)

PragTob on 2 Aug 2016

You sir, are amazing.

# old benchmark with 10_000 element list
Name                       ips        average    deviation         median
flat_map_list v2       1849.65       540.64μs    (±12.58%)       520.00μs
:lists.flatmap         1828.21       546.98μs    (±11.81%)       525.00μs
flat_map v1            1361.04       734.73μs     (±5.06%)       715.00μs
map.flatten            1089.10       918.19μs    (±12.98%)       907.00μs
flat_map                695.84      1437.12μs     (±9.17%)      1422.00μs

Comparison: 
flat_map_list v2       1849.65
:lists.flatmap         1828.21 - 1.01x slower
flat_map v1            1361.04 - 1.36x slower
map.flatten            1089.10 - 1.70x slower
flat_map                695.84 - 2.66x slower

# medium sized (chose 200 list elements)
Name                       ips        average    deviation         median
flat_map_list v2     114712.36         8.72μs    (±91.11%)         8.00μs
:lists.flatmap       107221.70         9.33μs    (±86.29%)         9.00μs
flat_map v1           86379.92        11.58μs    (±58.57%)        11.00μs
map.flatten           75570.66        13.23μs    (±49.44%)        13.00μs
flat_map              47269.18        21.16μs    (±43.72%)        19.00μs

Comparison: 
flat_map_list v2     114712.36
:lists.flatmap       107221.70 - 1.07x slower
flat_map v1           86379.92 - 1.33x slower
map.flatten           75570.66 - 1.52x slower
flat_map              47269.18 - 2.43x slower

# small list (10 elements)
# as this is very fast, benchee repeats the execution n times until it takes at least ~10 microsceonds as below I believe there won't be accurate measures, so there is some overhead incurred by that
Name                       ips        average    deviation         median
flat_map_list v2    2017197.14         0.50μs    (±16.27%)         0.48μs
:lists.flatmap      1811855.41         0.55μs    (±26.58%)         0.51μs
flat_map v1         1489038.79         0.67μs    (±17.98%)         0.65μs
map.flatten         1470887.23         0.68μs   (±288.22%)         0.70μs
flat_map             829364.49         1.21μs   (±250.98%)         1.00μs

Comparison: 
flat_map_list v2    2017197.14
:lists.flatmap      1811855.41 - 1.11x slower
flat_map v1         1489038.79 - 1.35x slower
map.flatten         1470887.23 - 1.37x slower
flat_map             829364.49 - 2.43x slower

Concrete implementations etc. in this commit

PragTob on 2 Aug 2016

🎉4

Thank you for benchmarking! :heart:

josevalim on 2 Aug 2016

Thank you! :)

PragTob on 3 Aug 2016

Was this page helpful?

0 / 5 - 0 ratings