Elixir: File.stream! too slow reading from text file

Created on 23 Apr 2020 · 9 comments · Source: elixir-lang/elixir

Environment

$ elixir --version
Erlang/OTP 22 [erts-10.7.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

Elixir 1.9.4 (compiled with Erlang/OTP 22)

Elixir running inside a container with the latest Docker version:

$ docker --version
Docker version 19.03.8, build afacb8b

Current behavior

Given the following module:

defmodule TestSpeed do
  @compile {:inline, chop: 1}

  @path "...."

  defp chop(binary), do: :binary.part(binary, 0, byte_size(binary) - 1)

  def read(), do: @path |> File.read!() |> String.split("\n")

  def stream(), do: @path |> File.stream!([:read_ahead]) |> Enum.map(&chop/1)

  def stream_10Mb(), do: @path |> File.stream!([read_ahead: 10_000_000]) |> Enum.map(&chop/1)
end

In a fresh IEx session I get the following times:

Interactive Elixir (1.9.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Benchee.run(
...(1)>   %{
...(1)>     "read()" => fn -> TestSpeed.read() end,
...(1)>     "stream()" => fn -> TestSpeed.stream() end,
...(1)>     "stream_10Mb()" => fn -> TestSpeed.stream_10Mb() end
...(1)>   }
...(1)> )
Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
Number of Available Cores: 4
Available memory: 1.94 GB
Elixir 1.9.4
Erlang 22.3.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 21 s

Benchmarking read()...
Benchmarking stream()...
Benchmarking stream_10Mb()...

Name                    ips        average  deviation         median         99th %
read()                84.86       0.0118 s    ±13.86%       0.0114 s       0.0201 s
stream()              0.177         5.64 s    ±46.01%         5.64 s         7.47 s
stream_10Mb()        0.0299        33.45 s     ±0.00%        33.45 s        33.45 s

Comparison:
read()                84.86
stream()              0.177 - 478.32x slower +5.62 s
stream_10Mb()        0.0299 - 2838.13x slower +33.43 s

The file used in this test has 57,169 lines of text, the longest line is about 100 characters, and the file totals 5,281,128 bytes.

The problem seems to be strongly related with the internals of File.stream!.

I observed this result when running inside a Docker container on both Linux and macOS.

Expected behavior

The use of File.stream! in this example should be within the same order of magnitude as File.read!, if not faster.

Also, such a large increase in duration from a 10 MB read-ahead buffer was totally unexpected. In fact, I did not expect a bigger buffer to significantly reduce the duration of File.stream! here in the first place: given the default buffer size of 64 KB and the file size in this example, the default configuration only makes about 80 more calls to the OS to read the file, which should not add much to the duration.
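As a sanity check on that estimate, the arithmetic (sizes taken from above):

```elixir
file_size = 5_281_128        # bytes, from the report
default_buffer = 64 * 1024   # default read-ahead buffer size
large_buffer = 10_000_000    # the 10 MB buffer used in stream_10Mb()

# Rough number of OS reads needed to consume the whole file.
reads_default = div(file_size, default_buffer) + 1
reads_large = div(file_size, large_buffer) + 1

IO.puts("default buffer: ~#{reads_default} reads; 10 MB buffer: #{reads_large} read(s)")
# default buffer: ~81 reads; 10 MB buffer: 1 read(s)
```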


All 9 comments

Streams are almost always slower than their eager operations. The point of a stream is never performance but rather to allow you to traverse on the data without loading it all into memory. If you have a 1GB file, you most likely want to stream over it, to avoid loading it all and because the cost of processing each line usually offsets the difference.

Finally, don't use the shell to benchmark, as code there is evaluated rather than compiled. Don't use containers to measure file system operations either: some system calls can be considerably slower inside a container than others, so you may end up benchmarking the container's implementation of those syscalls.
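To illustrate the intended use case, a minimal sketch (the sample file is built on the fly, so the snippet is self-contained) where streaming keeps memory bounded because only one line is in flight at a time:

```elixir
# Build a small sample log so the sketch runs anywhere.
path = Path.join(System.tmp_dir!(), "stream_demo.log")
File.write!(path, Enum.map_join(1..1_000, "", fn i ->
  if rem(i, 10) == 0, do: "ERROR line #{i}\n", else: "INFO line #{i}\n"
end))

# Lazily filter and count; the whole file is never loaded into memory.
count =
  path
  |> File.stream!()
  |> Stream.filter(&String.contains?(&1, "ERROR"))
  |> Enum.count()

IO.puts(count)  # 100
File.rm!(path)
```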

@josevalim I created an Elixir script testspeed.exs file with the following code

Benchee.run(
  %{
    "read()" => fn -> TestSpeed.read() end,
    "stream()" => fn -> TestSpeed.stream() end,
    "stream_10Mb()" => fn -> TestSpeed.stream_10Mb() end
  }
)

and executed it outside of containers and of IEx, and the results are similar:

$ mix run testspeed.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.9.1
Erlang 22.3.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 21 s

Benchmarking read()...
Benchmarking stream()...
Benchmarking stream_10Mb()...

Name                    ips        average  deviation         median         99th %
read()               116.32      0.00860 s    ±16.41%      0.00816 s       0.0129 s
stream()               0.23         4.38 s    ±45.83%         4.38 s         5.80 s
stream_10Mb()        0.0638        15.68 s     ±0.00%        15.68 s        15.68 s

Comparison:
read()               116.32
stream()               0.23 - 509.50x slower +4.37 s
stream_10Mb()        0.0638 - 1823.74x slower +15.67 s

So no Docker, no IEx, still same results.

These results show that File.stream! seems unusable for reading even a 5 MB file, let alone a 1 GB one.

Interestingly, I have no problems with File.stream! for writing to a file.

Could you please check what is going on? Thanks.

I have done some quick benchmarks; the main slowdown seems to come from :file.read_line/2, with the stream wrapping adding a further ~3x on top.

File.read!("file.txt") |> String.split("\n") takes 41ms (1x).

The following code executes in 6210ms (151x):

defmodule Foo do
  def foo do
    {:ok, file} = :file.open('file.txt', [:read, :raw, :binary, :read_ahead])
    loop(file, [])
  end

  def loop(file, acc) do
    case :file.read_line(file) do
      {:ok, data} ->
        loop(file, [data | acc])
      :eof ->
        :file.close(file)
        Enum.reverse(acc)
    end
  end
end

File.stream!("file.txt") |> Enum.to_list takes 18975ms (463x).

The test file was generated with:

string = "foobarbaz"
io = Enum.map(1..58679, fn _ -> [Enum.map(1..10, fn _ -> string end), "\n"] end)
File.write!("file.txt", io)
This was on Erlang/OTP 22 [erts-10.4] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [hipe] with Elixir 1.11.0-dev.

I have more strange results using the same test file as before:

defmodule FileDemo do
  @path "/Users/ericmj/temp/file.txt"

  def read(), do: @path |> File.read!() |> String.split("\n")

  def stream1(), do: @path |> File.stream!() |> Stream.map(&to_string/1) |> Stream.run()

  def stream2(), do: @path |> File.stream!() |> Stream.map(&to_string/1) |> Enum.to_list()

  def stream3(), do: @path |> File.stream!() |> Enum.to_list()

  def benchmark() do
    Benchee.run(
      %{
        "read" => fn -> read() end,
        "stream1" => fn -> stream1() end,
        "stream2" => fn -> stream2() end,
        "stream3" => fn -> stream3() end
      }
    )
  end
end

mix run -e "FileDemo.benchmark"
Compiling 1 file (.ex)
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
Number of Available Cores: 12
Available memory: 16 GB
Elixir 1.11.0-dev
Erlang 22.0

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 28 s

Benchmarking read...
Benchmarking stream1...
Benchmarking stream2...
Benchmarking stream3...

Name              ips        average  deviation         median         99th %
read           151.85      0.00659 s    ±15.45%      0.00641 s       0.0101 s
stream1         29.98       0.0334 s     ±3.16%       0.0331 s       0.0389 s
stream2          0.53         1.89 s    ±19.44%         2.10 s         2.10 s
stream3          0.33         3.07 s    ±61.31%         2.15 s         5.24 s

Comparison:
read           151.85
stream1         29.98 - 5.07x slower +0.0268 s
stream2          0.53 - 286.26x slower +1.88 s
stream3          0.33 - 466.92x slower +3.07 s

stream2 is 50x slower than stream1 even though the only difference is Enum.to_list vs Stream.run. I would not think allocating a list would make that much of a difference.
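For anyone who wants to poke at this locally, a self-contained sketch of the stream1-vs-stream2 shape (file size is arbitrary, and the size of the timing gap depends on the OTP version in use):

```elixir
# Build a throwaway file with 10k lines of ~90 characters each.
path = Path.join(System.tmp_dir!(), "stream_vs_list.txt")
File.write!(path, Enum.map_join(1..10_000, "", fn _ -> String.duplicate("x", 90) <> "\n" end))

# Stream.run discards each line as it goes; Enum.to_list keeps them all alive.
{t_run, :ok} = :timer.tc(fn -> path |> File.stream!() |> Stream.run() end)
{t_list, lines} = :timer.tc(fn -> path |> File.stream!() |> Enum.to_list() end)

IO.puts("Stream.run: #{t_run} µs, Enum.to_list: #{t_list} µs (#{length(lines)} lines)")
File.rm!(path)
```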

I get similar results to @ericmj. I was trying to come up with a more comparable demo, which is how we discovered the discrepancy with Stream.run.

For comparison's sake:

# lib/file_demo.exs
defmodule FileDemo do
  @path "a_2mb.csv"

  def read(), do: @path |> File.read!() |> String.split() |> Enum.map(&to_string/1)
  def stream(), do: @path |> File.stream!() |> Stream.map(&to_string/1) |> Enum.to_list()
  def benchmark() do
    Benchee.run(
      %{
        "read" => fn -> read() end,
        "stream" => fn  -> stream() end
      }
    )
  end
end

# run.exs
FileDemo.benchmark()

# results of mix run run.exs

Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Number of Available Cores: 8
Available memory: 16 GB
Elixir 1.10.0
Erlang 22.2.4
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s
Benchmarking read...
Benchmarking stream...
Name             ips        average  deviation         median         99th %
read          106.12        9.42 ms     ±7.96%        9.25 ms       11.76 ms
stream          1.85      539.71 ms    ±45.58%      423.90 ms      956.81 ms
Comparison: 
read          106.12
stream          1.85 - 57.27x slower +530.29 ms

Changing the file to a larger, 98 MB CSV made the stream disproportionately slower.

Name             ips        average  deviation         median         99th %
read            3.03    0.00550 min    ±13.23%    0.00518 min    0.00710 min
stream       0.00327       5.10 min     ±0.00%       5.10 min       5.10 min

Comparison: 
read            3.03
stream       0.00327 - 928.27x slower +5.10 min

It looks like this is more of a BEAM issue than an Elixir one, as I am only able to trigger this behavior when my programs deal with refc binaries while reading files.

Here I am testing one file with 100k lines where each line contains 64 a's (which fits in a heap binary), and then another file with 100k lines of 65 a's (which requires a refc binary):

defmodule Test do
  def parse(name) do
    Path.join(["priv", name])
    |> File.stream!
    |> Stream.map(&String.trim/1)
  end

  def f1(name) do
    parse(name)
    |> Enum.map(fn x -> x end)
  end

  def f2(name) do
    parse(name)
    |> Stream.map(&(String.split_at(&1, 32)))
    |> Enum.map(fn {x, _y} -> {x} end)
  end

  def f3(name) do
    parse(name)
    |> Stream.map(&(String.split_at(&1, 32)))
    |> Enum.map(fn {_x, y} -> {y} end)
  end

  def f4(name) do
    parse(name)
    |> Stream.map(&(String.split_at(&1, 32)))
    |> Enum.map(fn {x, y} -> :binary.copy(x) <> :binary.copy(y) end)
  end

  def f5(name) do
    parse(name)
    |> Stream.map(&(String.split_at(&1, 32)))
    |> Enum.map(fn {x, y} -> {:binary.copy(x), :binary.copy(y)} end)
    |> Enum.map(fn {x, y} -> x <> y end)
  end

  def test do
    # NOTE for extra fun swap the order of f4 and f5 (i.e. f1 f2 f3 f5 f4):
    Enum.each(~w(f1 f2 f3 f4 f5)a, fn f ->
      IO.puts("* #{f}:")
      Enum.each(~w(100k_64b.csv 100k_65b.csv), fn name ->
        {t, _} = :timer.tc(fn -> apply(__MODULE__, f, [name]) end)
        IO.puts("#{name}: #{t/1_000_000}s")
      end)
      IO.puts("")
    end)
  end
end
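The test files themselves are not included; a minimal generator consistent with the description above (filenames and the priv/ directory are taken from the snippet):

```elixir
# Generate the two test files: 100k lines of 64 a's (heap-binary sized)
# and 100k lines of 65 a's (refc-binary sized).
File.mkdir_p!("priv")

for {name, width} <- [{"100k_64b.csv", 64}, {"100k_65b.csv", 65}] do
  data = Enum.map(1..100_000, fn _ -> [String.duplicate("a", width), "\n"] end)
  File.write!(Path.join("priv", name), data)
end
```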

The results I get with Erlang/OTP 23 [erts-11.0] are:

  • f1:
    100k_64b.csv: 0.19416s
    100k_65b.csv: 12.127937s

  • f2:
    100k_64b.csv: 0.489538s
    100k_65b.csv: 0.470504s

  • f3:
    100k_64b.csv: 0.499139s
    100k_65b.csv: 13.927376s

  • f4:
    100k_64b.csv: 21.047894s
    100k_65b.csv: 16.871461s

  • f5:
    100k_64b.csv: 0.606711s
    100k_65b.csv: 0.567491s

The last case is especially funny: it's faster to split a line of 65 characters into two parts, put them into a tuple, and then concatenate them again than to just copy the whole line into a list! 😂

Note that if I call File.stream! with something like read_ahead: 1 or encoding: :utf8, all these tests finish in ~1 second (I'm currently using a read_ahead of 100 lines as a workaround for the project I am working on).
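The heap/refc boundary can be made visible with :binary.referenced_byte_size/1: chopping the newline off a 65-character line (as the chop/1 in the original report does) leaves a sub-binary that still pins its larger parent. A small sketch:

```elixir
# A 66-byte binary is larger than 64 bytes, so it is a refc binary.
line = String.duplicate("a", 65) <> "\n"

# binary_part/3 creates a sub-binary that references the parent,
# it does not copy the data out.
trimmed = binary_part(line, 0, byte_size(line) - 1)

IO.puts(byte_size(trimmed))                     # 65
IO.puts(:binary.referenced_byte_size(trimmed))  # 66, the whole parent is pinned
```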

Alright, now I am able to reproduce this issue without File.stream!, using mostly Erlang calls:

defmodule Test do
  def parse(fd, acc \\ []) do
    case IO.each_binstream(fd, :line) do
      {[x], fd} ->
        parse(fd, [String.trim(x) | acc])
      {:halt, fd} ->
        :file.close(fd)
        :lists.reverse(acc)
    end
  end

  def test do
    Enum.each(~w(100k_64b.csv 100k_65b.csv), fn name ->
      {t, _} = :timer.tc(fn ->
        {:ok, fd} = :file.open("priv/"<>name, [:raw, :read_ahead, :binary])
        parse(fd)
      end)
      IO.puts("#{name}: #{t/1_000_000}s")
    end)
  end
end

(IO.each_binstream is defined in Elixir's IO module)

The results are pretty much the same as with my f1 case above:
100k_64b.csv: 0.188176s
100k_65b.csv: 12.099765s

EDIT: I got the same results with a "pure" Erlang version:

  def parse(fd, acc \\ []) do
    case :file.read_line(fd) do
      {:ok, x} ->
        parse(fd, [String.trim(x) | acc])
      :eof ->
        :file.close(fd)
        :lists.reverse(acc)
    end
  end

Last night I reported this in the Erlang's issue tracker (ERL-1347), and it is already confirmed to be a GC quirk.
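Until that lands upstream, copying each line out of its parent binary with :binary.copy/1 works as a mitigation (a sketch, not the fix itself; the temporary file path is illustrative):

```elixir
# Build a file of 1k lines of 65 a's, the refc-binary case from above.
path = Path.join(System.tmp_dir!(), "copy_demo.txt")
File.write!(path, Enum.map_join(1..1_000, "", fn _ -> String.duplicate("a", 65) <> "\n" end))

# :binary.copy/1 detaches each line from its parent refc binary,
# so the GC can reclaim the parent promptly.
lines =
  path
  |> File.stream!()
  |> Stream.map(&:binary.copy(String.trim_trailing(&1, "\n")))
  |> Enum.to_list()

IO.puts(length(lines))                            # 1000
IO.puts(:binary.referenced_byte_size(hd(lines)))  # 65, no parent is pinned
File.rm!(path)
```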

Thank you @pera for the amazing work. I was planning to do all of this but you beat me to it and did a fantastic job! ❤️
