From time to time we experience an issue in our production environment where we stop being able to execute database queries. We measure the amount of queries we do with telemetry and it drops to 0. The only way we have been able to mitigate this is by restarting our entire application. Just restarting our repository didn't resolve the issue.
We have been unable to find the root cause and so far we haven't been able to reproduce this somewhere else than production. This makes it quite hard for us to identify the bug or creating a reproducible scenario. Doing queries from the console also doesn't work anymore, but for me the strange thing is that it actually "hangs" the console and we don't see any logs related to that. Perhaps there is a good explanation of this which we've been unable to come up with.
We use fairly default config with a pool size of 10.
Any pointers to what steps we can take to further investigate/debug this would be appreciated.
When we query using Repo.query("SELECT 1") from iex the console seems to hang forever. It doesn't return anything.
I would expect to receive an error or some log line why it doesn't return anything.
@tverlaan a couple things you can try are:
I would do Repo.checkout when it enters such state to see if the issue is on the pool or maybe waiting for the database
Once you get to such state, you can do this:
iex> pid = spawn(fn -> Repo.query("SELECT 1") end)
iex> Process.info(pid)
iex> Process.info(pid, :current_stacktrace)
So we can see where it is stuck.
We had it happening again and got the stacktrace:
iex> pid = spawn(fn ->Repo.query("SELECT 1") end)
#PID<0.2564.2148>
iex> {Process.info(pid), Process.info(pid, :current_stacktrace)}
{[
current_function: {DBConnection.Holder, :checkout_call, 5},
initial_call: {:erlang, :apply, 2},
status: :waiting,
message_queue_len: 0,
links: [],
dictionary: [],
trap_exit: false,
error_handler: :error_handler,
priority: :normal,
group_leader: #PID<51483.75.0>,
total_heap_size: 233,
heap_size: 233,
stack_size: 36,
reductions: 334,
garbage_collection: [
max_heap_size: %{error_logger: true, kill: true, size: 0},
min_bin_vheap_size: 46422,
min_heap_size: 233,
fullsweep_after: 65535,
minor_gcs: 0
],
suspending: []
],
{:current_stacktrace,
[
{DBConnection.Holder, :checkout_call, 5,
[file: 'lib/db_connection/holder.ex', line: 232]},
{DBConnection.Holder, :checkout, 2,
[file: 'lib/db_connection/holder.ex', line: 58]},
{DBConnection, :checkout, 3, [file: 'lib/db_connection.ex', line: 964]},
{DBConnection, :run, 6, [file: 'lib/db_connection.ex', line: 1274]},
{DBConnection, :parsed_prepare_execute, 5,
[file: 'lib/db_connection.ex', line: 498]},
{DBConnection, :prepare_execute, 4,
[file: 'lib/db_connection.ex', line: 491]},
{Postgrex, :query_prepare_execute, 4, [file: 'lib/postgrex.ex', line: 182]}
]}}
Fixed in https://github.com/elixir-ecto/db_connection/pull/191 and released in v2.0.6
Most helpful comment
Fixed in https://github.com/elixir-ecto/db_connection/pull/191 and released in v2.0.6