Ecto: Repository/Connection pool seems to hang

Created on 19 Mar 2019  路  3Comments  路  Source: elixir-ecto/ecto

From time to time we experience an issue in our production environment where we stop being able to execute database queries. We measure the amount of queries we do with telemetry and it drops to 0. The only way we have been able to mitigate this is by restarting our entire application. Just restarting our repository didn't resolve the issue.

We have been unable to find the root cause and so far we haven't been able to reproduce this somewhere else than production. This makes it quite hard for us to identify the bug or creating a reproducible scenario. Doing queries from the console also doesn't work anymore, but for me the strange thing is that it actually "hangs" the console and we don't see any logs related to that. Perhaps there is a good explanation of this which we've been unable to come up with.

We use fairly default config with a pool size of 10.

Any pointers to what steps we can take to further investigate/debug this would be appreciated.

Environment

  • Elixir version: 1.7
  • Erlang version: 21
  • Database and version: Postgres 9.4 (with BDR 1.0)
  • Ecto version: 3.0.6
  • Ecto SQL version: 3.0.4
  • Database adapter and version: Postgrex 0.14.1
  • Operating system: Ubuntu Xenial

Current behavior

When we query using Repo.query("SELECT 1") from iex the console seems to hang forever. It doesn't return anything.

Expected behavior

I would expect to receive an error or some log line why it doesn't return anything.

Most helpful comment

All 3 comments

@tverlaan a couple things you can try are:

  1. I would do Repo.checkout when it enters such state to see if the issue is on the pool or maybe waiting for the database

  2. Once you get to such state, you can do this:

iex> pid = spawn(fn -> Repo.query("SELECT 1") end)
iex> Process.info(pid)
iex> Process.info(pid, :current_stacktrace)

So we can see where it is stuck.

We had it happening again and got the stacktrace:

iex> pid = spawn(fn ->Repo.query("SELECT 1") end)
#PID<0.2564.2148>
iex> {Process.info(pid), Process.info(pid, :current_stacktrace)}
{[
   current_function: {DBConnection.Holder, :checkout_call, 5},
   initial_call: {:erlang, :apply, 2},
   status: :waiting,
   message_queue_len: 0,
   links: [],
   dictionary: [],
   trap_exit: false,
   error_handler: :error_handler,
   priority: :normal,
   group_leader: #PID<51483.75.0>,
   total_heap_size: 233,
   heap_size: 233,
   stack_size: 36,
   reductions: 334,
   garbage_collection: [
     max_heap_size: %{error_logger: true, kill: true, size: 0},
     min_bin_vheap_size: 46422,
     min_heap_size: 233,
     fullsweep_after: 65535,
     minor_gcs: 0
   ],
   suspending: []
 ],
 {:current_stacktrace,
  [
    {DBConnection.Holder, :checkout_call, 5,
     [file: 'lib/db_connection/holder.ex', line: 232]},
    {DBConnection.Holder, :checkout, 2,
     [file: 'lib/db_connection/holder.ex', line: 58]},
    {DBConnection, :checkout, 3, [file: 'lib/db_connection.ex', line: 964]},
    {DBConnection, :run, 6, [file: 'lib/db_connection.ex', line: 1274]},
    {DBConnection, :parsed_prepare_execute, 5,
     [file: 'lib/db_connection.ex', line: 498]},
    {DBConnection, :prepare_execute, 4,
     [file: 'lib/db_connection.ex', line: 491]},
    {Postgrex, :query_prepare_execute, 4, [file: 'lib/postgrex.ex', line: 182]}
  ]}}
Was this page helpful?
0 / 5 - 0 ratings

Related issues

mcginleyr1 picture mcginleyr1  路  5Comments

jordi-chacon picture jordi-chacon  路  4Comments

kelostrada picture kelostrada  路  3Comments

yordis picture yordis  路  4Comments

atsheehan picture atsheehan  路  4Comments