Erlang/OTP 21 [erts-10.0.5] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]
Elixir 1.7.2 (compiled with Erlang/OTP 21)
Erlang/OTP 21 [erts-10.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
Elixir 1.7.3 (compiled with Erlang/OTP 21)
We are getting deadlocks when running `mix compile` on slower machines. The issue started out happening very rarely and has become more pervasive, to the point that we are unable to compile on CI (CircleCI).
Compiling locally on my MacBook, without any resource restrictions, works fine. However, other developers on slower machines have seen the issue pop up more often.
We also see this issue when compiling in Docker without enough resources allocated: the problem appears when a user allocates only 2 cores and 4GB of memory, and increasing this to 4 cores and 8GB makes it go away. I believe all of our CircleCI build boxes have 2 cores available for compiling.
I have been able to reproduce this outside of Docker by limiting the Erlang VM to 2 scheduler threads (roughly, 2 cores) with `MIX_ENV=prod elixir --erl '+S 2' -S mix compile`. The issue happens with `+S 2` and `+S 3`, but not with `+S 4`.
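To find the threshold on a given machine, the `+S` runs can be scripted. The sketch below is a hypothetical helper (`SchedulerSweep` is not part of Elixir or Mix); it assumes `elixir` is on the `PATH` and that it is run from the project root. It simply starts one child VM per scheduler count and records the exit status, so a non-zero status marks the counts that fail.

```elixir
defmodule SchedulerSweep do
  # Hypothetical helper: runs `elixir --erl "+S N" <args>` once per
  # scheduler count N and returns {N, exit_status, stdout} for each run.
  # Stderr is not captured, so compiler errors still print to the terminal.
  def run(counts, args, env \\ []) do
    elixir = System.find_executable("elixir") || raise "elixir not found on PATH"

    for n <- counts do
      {output, status} = System.cmd(elixir, ["--erl", "+S #{n}" | args], env: env)
      {n, status, output}
    end
  end
end

# Usage from the project root, mirroring the command above. `--force` makes
# every run recompile all files instead of reusing the previous build:
#
#   SchedulerSweep.run([2, 3, 4], ["-S", "mix", "compile", "--force"], [{"MIX_ENV", "prod"}])
```

Running each count in a fresh VM matters here, because `+S` is fixed when the VM boots.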
The compile output is fairly long. Here is an abridged version skipping to the OTP app that the errors appear in:
==> partner
Compiling 58 files (.ex)
== Compilation error in file lib/partner/job_assignment.ex ==
** (CompileError) deadlocked waiting on module Partner.Job
(elixir) lib/kernel/typespec.ex:462: Kernel.Typespec.typespec/3
(elixir) lib/kernel/typespec.ex:744: anonymous fn/4 in Kernel.Typespec.fn_args/4
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
(elixir) lib/kernel/typespec.ex:744: Kernel.Typespec.fn_args/4
(elixir) lib/kernel/typespec.ex:733: Kernel.Typespec.fn_args/5
(elixir) lib/kernel/typespec.ex:300: Kernel.Typespec.translate_spec/7
== Compilation error in file lib/partner/estimated_rate_period_cost.ex ==
** (CompileError) deadlocked waiting on module Partner.JobAssignment
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job_series.ex ==
** (CompileError) deadlocked waiting on module Partner.Job
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:432: Ecto.Association.Has.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job_assignment_transition.ex ==
** (CompileError) deadlocked waiting on module Partner.JobAssignment
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job.ex ==
** (CompileError) deadlocked waiting on module Partner.JobAssignment
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:432: Ecto.Association.Has.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/actuals_note.ex ==
** (CompileError) deadlocked waiting on module Partner.JobAssignment
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/organization.ex ==
** (CompileError) deadlocked waiting on module Partner.Job
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:432: Ecto.Association.Has.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/flat_rate.ex ==
** (CompileError) deadlocked waiting on module Partner.Organization
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/profile.ex ==
** (CompileError) deadlocked waiting on module Partner.Organization
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/partner.ex ==
** (CompileError) deadlocked waiting on module Partner.Job
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:432: Ecto.Association.Has.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/client.ex ==
** (CompileError) deadlocked waiting on module Partner.Job
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:432: Ecto.Association.Has.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job_series_transition.ex ==
** (CompileError) deadlocked waiting on module Partner.JobSeries
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/location.ex ==
** (CompileError) deadlocked waiting on module Partner.Job
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:432: Ecto.Association.Has.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job_transition.ex ==
** (CompileError) deadlocked waiting on module Partner.Job
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job_series_participant.ex ==
** (CompileError) deadlocked waiting on module Partner.JobSeries
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job_participant.ex ==
** (CompileError) deadlocked waiting on module Partner.Job
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/rate_period_cost.ex ==
** (CompileError) deadlocked waiting on module Partner.JobAssignment
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job_series/schedule.ex ==
** (CompileError) deadlocked waiting on module Partner.JobSeries
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/estimated_job_assignment_cost.ex ==
** (CompileError) deadlocked waiting on module Partner.JobAssignment
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job_assignment_cost.ex ==
** (CompileError) deadlocked waiting on module Partner.JobAssignment
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
== Compilation error in file lib/partner/job_series/assignment.ex ==
** (CompileError) deadlocked waiting on module Partner.Profile
(elixir) lib/code.ex:1026: Code.ensure_compiled/1
(elixir) lib/code.ex:1046: Code.ensure_compiled?/1
lib/ecto/association.ex:719: Ecto.Association.BelongsTo.after_compile_validation/2
lib/ecto/schema.ex:1656: anonymous fn/4 in Ecto.Schema.__after_compile__/2
(elixir) lib/enum.ex:1925: Enum."-reduce/3-lists^foldl/2-0-"/3
lib/ecto/schema.ex:1654: Ecto.Schema.__after_compile__/2
Compilation failed because of a deadlock between files.
The following files depended on the following modules:
lib/partner/job_assignment.ex => Partner.Job
lib/partner/estimated_rate_period_cost.ex => Partner.JobAssignment
lib/partner/job_series.ex => Partner.Job
lib/partner/job_assignment_transition.ex => Partner.JobAssignment
lib/partner/job.ex => Partner.JobAssignment
lib/partner/actuals_note.ex => Partner.JobAssignment
lib/partner/organization.ex => Partner.Job
lib/partner/flat_rate.ex => Partner.Organization
lib/partner/profile.ex => Partner.Organization
lib/partner/partner.ex => Partner.Job
lib/partner/client.ex => Partner.Job
lib/partner/job_series_transition.ex => Partner.JobSeries
lib/partner/location.ex => Partner.Job
lib/partner/job_transition.ex => Partner.Job
lib/partner/job_series_participant.ex => Partner.JobSeries
lib/partner/job_participant.ex => Partner.Job
lib/partner/rate_period_cost.ex => Partner.JobAssignment
lib/partner/job_series/schedule.ex => Partner.JobSeries
lib/partner/estimated_job_assignment_cost.ex => Partner.JobAssignment
lib/partner/job_assignment_cost.ex => Partner.JobAssignment
lib/partner/job_series/assignment.ex => Partner.Profile
Ensure there are no compile-time dependencies between those files and that the modules they reference exist and are correctly named
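The summary above is effectively a wait-for graph. As a debugging aid, the hypothetical `DeadlockReport` module below (not part of Elixir) parses those `file => module` lines and follows the edges until a module repeats, which exposes the actual cycle. It assumes each file defines the module named after its path (`lib/partner/job_assignment.ex` defines `Partner.JobAssignment`). That assumption may not hold for every file in a real project.

```elixir
defmodule DeadlockReport do
  # Turns the "file => Module" lines of the deadlock summary into a map.
  # Lines without " => " (headers, hints) are skipped by the pattern match.
  def parse(text) do
    for line <- String.split(text, "\n", trim: true),
        [file, awaited] <- [String.split(line, " => ", parts: 2)],
        into: %{} do
      {String.trim(file), String.trim(awaited)}
    end
  end

  # Assumption: conventional layout, so the leading "lib" is dropped and each
  # path segment is camelized (lib/partner/job_assignment.ex -> Partner.JobAssignment).
  def module_for(file) do
    file
    |> Path.rootname()
    |> Path.split()
    |> Enum.drop(1)
    |> Enum.map(&Macro.camelize/1)
    |> Enum.join(".")
  end

  # Follows "module waits on module" edges from `start` until a module
  # repeats; returns the modules forming that cycle, or nil if the chain ends.
  def find_cycle(waits, start) do
    edges = Map.new(waits, fn {file, awaited} -> {module_for(file), awaited} end)
    walk(edges, start, [])
  end

  defp walk(edges, mod, seen) do
    cond do
      mod in seen -> seen |> Enum.reverse() |> Enum.drop_while(&(&1 != mod))
      Map.has_key?(edges, mod) -> walk(edges, Map.fetch!(edges, mod), [mod | seen])
      true -> nil
    end
  end
end
```

With the full list above, every chain (for example `Partner.JobSeries.Assignment` to `Partner.Profile` to `Partner.Organization` to `Partner.Job`) ends in the same two-module cycle, `Partner.Job` and `Partner.JobAssignment`.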
==> search
Compiling 58 files (.ex)
== Compilation error in file lib/search/listeners/partner_listener.ex ==
** (UndefinedFunctionError) function Partner.JobAssignment.__schema__/1 is undefined (module Partner.JobAssignment is not available)
Partner.JobAssignment.__schema__(:source)
lib/notification/pg_setup.ex:69: Notification.PgSetup.channel_name/1
lib/search/listeners/partner_listener.ex:33: (module)
(stdlib) erl_eval.erl:677: :erl_eval.do_apply/6
We don't (to my knowledge) have any circular dependencies between OTP apps; however, a lot of modules depend on `Partner.JobAssignment`, as it is an Ecto schema that is used widely in the application.
I would expect compilation to work even on the slowest of machines.
I'm not really sure how to debug this any further, but I am glad to gather any more information that would help diagnose the issue.
The next step would be to isolate this into a smaller app so we can reproduce the issue. However, if the issue is resource utilization, I can see this being hard.
Alternatively, if it is a possibility for you, you could give me access to
the repo. Feel free to reach out to me privately to figure this out. We do
try to avoid this, but given the severity of this issue, I think it
warrants an exception.
Thanks for the willingness to help with this so readily.
It would probably be most useful to get you access to the real code; we're working on getting approval for that. In the meantime, we have been able to gather some additional information.
A coworker @stwf noticed that the first error message mentions typespecs.
== Compilation error in file lib/partner/job_assignment.ex ==
** (CompileError) deadlocked waiting on module Partner.Job
(elixir) lib/kernel/typespec.ex:462: Kernel.Typespec.typespec/3
We were able to get the codebase building again on slow machines with the following change:
```diff
diff --git a/apps/partner/lib/partner/job_assignment.ex b/apps/partner/lib/partner/job_assignment.ex
index bf64141b..48ea1daf 100644
--- a/apps/partner/lib/partner/job_assignment.ex
+++ b/apps/partner/lib/partner/job_assignment.ex
@@ -164,7 +164,7 @@ defmodule Partner.JobAssignment do
@doc """
Check if the job related to this job assignment is filled.
"""
- @spec job_filled?(%Job{}, t(), boolean) :: boolean
+ @spec job_filled?(Job.t(), t(), boolean) :: boolean
def job_filled?(%Job{state: "filled"}, %__MODULE__{}, _skip_job_filled_check = false), do: true
def job_filled?(%Job{state: _}, %__MODULE__{}, _skip_job_filled_check = false), do: false
def job_filled?(_job, _job_assignment, true), do: false
@@ -174,7 +174,7 @@ defmodule Partner.JobAssignment do
A job could be "unfilled" and required 1 CHI and 1 CDI and have 1 CHI accepted. This would be "unfilled" overall
but the job would be "filled" for CHI.
"""
- @spec job_filled_for_cdi_or_chi?(%Job{}, t()) :: boolean
+ @spec job_filled_for_cdi_or_chi?(Job.t(), t()) :: boolean
def job_filled_for_cdi_or_chi?(job = %Job{}, job_assignment = %__MODULE__{}) do
case job.state in @to_accepted_state_limiters do
true ->
```
The typespec docs don't really make much of a distinction between using `%MyStruct{}` and `MyStruct.t()` when `MyStruct` is a module with both a `defstruct` and an `@type t`. I gather, though, that when a spec uses `%MyStruct{}`, the compiler has to compile the referenced module to figure out the type from the struct. When using `MyStruct.t()`, it can skip compiling that module at that point and just grab the type information out of `@type t`.
The modules we're dealing with in our codebase are Ecto schemas that have explicit types defined, but they are recursive: a `Job` 'has many' `JobAssignments`, and a `JobAssignment` 'belongs to' a `Job`.
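A minimal sketch of the workaround pattern, using plain structs in place of Ecto schemas (the `Example.Job` and `Example.JobAssignment` names and fields are made up for illustration). Each module exposes an `@type t`, and specs refer to the other module only through that remote type, while struct patterns such as `%Job{}` stay in the function heads. As far as I know, a remote type like `Job.t()` is not resolved at compile time (tools such as Dialyzer check it later), which would explain why it does not make the compiler wait on the other module.

```elixir
defmodule Example.Job do
  alias Example.JobAssignment

  # A job "has many" assignments; the field type is the remote JobAssignment.t().
  defstruct state: "unfilled", assignments: []
  @type t :: %__MODULE__{state: String.t(), assignments: [JobAssignment.t()]}
end

defmodule Example.JobAssignment do
  alias Example.Job

  # An assignment "belongs to" a job.
  defstruct job: nil
  @type t :: %__MODULE__{job: Job.t() | nil}

  # The spec uses Job.t() instead of %Job{}; the struct pattern stays in the heads.
  @spec job_filled?(Job.t(), t(), boolean) :: boolean
  def job_filled?(%Job{state: "filled"}, %__MODULE__{}, false), do: true
  def job_filled?(%Job{}, %__MODULE__{}, false), do: false
  def job_filled?(_job, _assignment, true), do: false
end
```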
I'll include a bit of sample code below that sort of illustrates the relationship and what is happening:
sample/lib/sample.ex:

```elixir
defmodule Sample do
  alias Sample.{MyModuleA, MyModuleB}

  @spec run :: String.t()
  def run do
    %MyModuleB{}
    |> MyModuleB.add_message(%MyModuleA{str: "foo"})
    |> MyModuleB.echo()
  end
end
```

sample/lib/sample/my_module_a.ex:

```elixir
defmodule Sample.MyModuleA do
  alias Sample.MyModuleB

  defstruct b: nil, str: ""

  @type t :: %__MODULE__{b: MyModuleB.t() | nil, str: String.t()}

  @spec add_to_b(t(), %MyModuleB{}) :: %MyModuleB{}
  def add_to_b(my_module_a = %__MODULE__{}, my_module_b = %MyModuleB{}) do
    %MyModuleB{my_module_b | a: my_module_a}
  end
end
```

sample/lib/sample/my_module_b.ex:

```elixir
defmodule Sample.MyModuleB do
  alias Sample.MyModuleA

  defstruct a: nil

  @type t :: %__MODULE__{a: MyModuleA.t() | nil}

  @spec echo(t) :: String.t()
  def echo(%__MODULE__{a: %MyModuleA{str: s}}), do: s

  # Using this spec instead of the next one deadlocks the compiler:
  # @spec add_message(t(), %MyModuleA{}) :: t
  @spec add_message(t(), MyModuleA.t()) :: t
  def add_message(my_module_b = %__MODULE__{}, my_module_a = %MyModuleA{}) do
    %__MODULE__{my_module_b | a: my_module_a}
  end
end
```
This sample compiles as-is. However, if I switch the comment on the `@spec` lines in `Sample.MyModuleB`, the compiler deadlocks. In this example, it deadlocks _regardless_ of how fast the machine is or how many cores it can compile on.
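To compare both variants without editing files by hand, the hypothetical `SpecDeadlockRepro` module below (illustrative only, not part of Elixir) writes a trimmed-down copy of `MyModuleA` and `MyModuleB` into a temporary directory and hands them to `Kernel.ParallelCompiler.compile/2`, the same parallel compiler that `mix compile` drives. The `struct_spec?` flag switches the second argument of `add_message/2`'s spec between `%A{}` and `A.t()`. Whether the struct variant deadlocks depends on the Elixir version (this report is against 1.7), so no particular result is assumed for it.

```elixir
defmodule SpecDeadlockRepro do
  # Compiles two mutually dependent modules in parallel. `namespace` keeps
  # each run's modules distinct; `struct_spec?` picks the spec style in B.
  def run(namespace, struct_spec?) do
    a_type = if struct_spec?, do: "%A{}", else: "A.t()"

    a_src = """
    defmodule #{namespace}.A do
      alias #{namespace}.B
      defstruct b: nil, str: ""
      @type t :: %__MODULE__{b: B.t() | nil, str: String.t()}
      @spec add_to_b(t(), %B{}) :: %B{}
      def add_to_b(a = %__MODULE__{}, b = %B{}), do: %{b | a: a}
    end
    """

    b_src = """
    defmodule #{namespace}.B do
      alias #{namespace}.A
      defstruct a: nil
      @type t :: %__MODULE__{a: A.t() | nil}
      @spec add_message(t(), #{a_type}) :: t()
      def add_message(b = %__MODULE__{}, a = %A{}), do: %{b | a: a}
    end
    """

    dir = Path.join(System.tmp_dir!(), "spec_repro_#{System.unique_integer([:positive])}")
    File.mkdir_p!(dir)

    files =
      for {name, src} <- [{"a.ex", a_src}, {"b.ex", b_src}] do
        path = Path.join(dir, name)
        File.write!(path, src)
        path
      end

    # Returns {:ok, modules, warnings} on success, {:error, errors, warnings} otherwise.
    Kernel.ParallelCompiler.compile(files)
  end
end
```

On an affected version, `SpecDeadlockRepro.run("ReproStructType", true)` would be expected to report the same "deadlocked waiting on module" errors; the namespace argument only exists so repeated runs in one VM don't redefine the same modules.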
I'm guessing in our real-world scenario we have some relationship like this; however, the dependencies aren't directly circular, but rather pass through a few intermediary modules.
I would almost expect our code to always deadlock; what surprises me now is that it doesn't deadlock in environments with more concurrency available.
For now, we have a workaround, and hopefully this issue will be illuminating for anyone else who runs into this problem.
Ultimately, it would be nice for the compiler's behavior to be consistent regardless of concurrency: it should either always deadlock or never deadlock. Additionally, it may be useful to have some information in the typespec docs or in the deadlock error message pointing to what might cause this. I would be glad to contribute to either of those if you think they make sense. In the meantime, we'll keep working on getting clearance to let José take a peek at the code, to see if we can get a more complete solution.
@jeffutter this is a great example because (IIRC) it was not supposed to deadlock at all, as the Elixir compiler should be able to solve those cases as they are both invoked AFTER the structs are defined. You have given me enough to further investigate the issue for now, thanks!
@jeffutter I'd be willing to help, since I have some familiarity with your code base 😉.
IIRC, using the structs instead of the `t` type was a reaction to Dialyzer not resolving the `t` types in previous versions and reporting them as unknown types.
@KronicDeth yeah you're right. I have had problems with Dialyzer not resolving t() types and had to revert to structs.
Closing in favor of #8377.