A test fails on aarch64 with the following traceback:
Randomized with seed 343444
==> mix (ex_unit)
Excluding tags: [windows: true]
--rpc-eval : RPC failed with reason :nodedown
.........................................................................................................................................................................................................................................................................................................................
1) test executes rpc instructions (Mix.Tasks.ReleaseTest)
test/mix/tasks/release_test.exs:268
match (=) failed
code: assert {pid, 0} = System.cmd(script, ["pid"])
right: {"", 1}
stacktrace:
test/mix/tasks/release_test.exs:282: anonymous fn/1 in Mix.Tasks.ReleaseTest."test executes rpc instructions"/1
(mix) lib/mix/project.ex:352: Mix.Project.in_project/4
(elixir) lib/file.ex:1542: File.cd!/2
test/test_helper.exs:119: MixTest.Case.in_fixture/3
test/mix/tasks/release_test.exs:269: (test)
...........................................................................................................................................................................................................................................................................................................................
Finished in 622.5 seconds (10.8s on load, 611.6s on tests)
9 doctests, 620 tests, 1 failure
I am expecting the test to pass. The issue has occurred in the past, and we disabled tests on the aarch64 architecture for that reason.
This issue appears to occur on the armv7 (32-bit) architecture as well.
I would like to help with this.
Thanks for the report! Do you have an environment or a container or similar
that we can use to reproduce the errors?
Also, can you please check the result of System.pid() on those systems?
For those willing to investigate this, I would recommend creating a new
project, assembling a release, then start the release and invoke the
José Valim
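The reproduction steps suggested above (new project, assemble a release, start it, query it) could look roughly like the shell sketch below. The comment is cut off after "invoke the", so this sketch assumes the pid subcommand that the failing test exercises. The app name repro and the helper names release_script and reproduce are hypothetical, and the release path assumes the default layout for MIX_ENV=prod.

```shell
#!/bin/sh
# Sketch only: helper names and the app name are hypothetical.

# Default location of the release script for an app built with MIX_ENV=prod.
release_script() {
  printf '_build/prod/rel/%s/bin/%s\n' "$1" "$1"
}

# Create a project, assemble a release, start it, query its pid, stop it.
# The body runs in a subshell so the `cd` does not leak into the caller.
reproduce() (
  app="${1:-repro}"
  mix new "$app" || return 1
  cd "$app" || return 1
  MIX_ENV=prod mix release || return 1
  script=$(release_script "$app")

  # Start the release in the background.
  "$script" daemon || return 1

  # Retry `pid` a few times and print every exit code, since the
  # failure discussed in this thread appears to be timing related.
  attempt=1
  while [ "$attempt" -le 10 ]; do
    pid=$("$script" pid)
    status=$?
    echo "attempt $attempt: exit=$status pid=$pid"
    [ "$status" -eq 0 ] && break
    attempt=$((attempt + 1))
    sleep 1
  done

  # Stop the node last, after all queries.
  "$script" stop
)
```

Running reproduce in a scratch directory on an affected machine prints one line per attempt, which shows whether pid fails outright or only until the node has booted.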
Ping?
I also have another suggestion. Can you please try running epmd -daemon before calling make test to see if it fixes the failure? Thanks.
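The suggestion above can be tried with a small wrapper. This is a minimal shell sketch, assuming epmd is on PATH; run_with_epmd is a hypothetical helper name, not part of Erlang or Elixir. It relies on epmd -names exiting non-zero when no daemon is reachable.

```shell
#!/bin/sh
# run_with_epmd CMD [ARGS...]
# Make sure the Erlang Port Mapper Daemon is running, then run CMD.
# Hypothetical helper for illustration only.
run_with_epmd() {
  if ! command -v epmd >/dev/null 2>&1; then
    echo "epmd not found on PATH" >&2
    return 127
  fi

  # `epmd -names` fails when it cannot connect to a local daemon;
  # in that case start one in the background.
  if ! epmd -names >/dev/null 2>&1; then
    epmd -daemon || return 1
  fi

  "$@"
}
```

Usage: run_with_epmd make test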
Apologies for the late reply, I've been fairly busy with $dayjob and a few other projects.
I suspect this is another flaky test (similar to another bug I reported before the 1.9.0 release). I cannot reproduce it consistently in an aarch64 dev container, but Drone CI has failed multiple times.
It seems like a proper heisenbug.
@dunielpls, if it makes any difference, I saw this problem when building a Docker image based on openshift/base-centos7 when Docker was running in VirtualBox, but the problem seemed to go away when switching to Hyper-V. Exact same Dockerfile with the same build steps, exact same tag being built. Something nasty and timing-related that trips up one platform but not the other?
@ilkka if you call epmd -daemon before running the tests in the image that fails, does it solve the problem? I am asking because I have had issues with epmd not being started automatically in the past, and maybe those systems are triggering it.
Hm, I got the same traceback again: https://cloud.drone.io/alpinelinux/aports/8411/3/1
Is there a good and easy way to disable that one test without patching the code?
> Hm, I got the same traceback again: https://cloud.drone.io/alpinelinux/aports/8411/3/1
Can you try on the v1.9 branch which has a possible fix?
> Is there a good and easy way to disable that one test without patching the code?
Only by changing the source, sorry.
@josevalim
It works on my Windows 10 machine with Hyper-V; however, Docker Hub still fails to build this with epmd -daemon:
Dockerfile (based on Alpine Linux 3.10.1, including patches for LibreSSL and Erlang):
FROM wouterklijn/erlang:22.0.7
SHELL ["/bin/ash", "-euxo", "pipefail", "-c"]
ENV ELIXIR_VERSION="1.9.1" \
    ELIXIR_SHA256="94daa716abbd4493405fb2032514195077ac7bc73dc2999922f13c7d8ea58777"
WORKDIR /usr/src/elixir-${ELIXIR_VERSION}
RUN echo "Installing Elixir ${ELIXIR_VERSION}" \
    # Install Elixir build tools
    && apk update \
    && apk add --no-cache --virtual .elixir_build_tools \
        git="2.22.0-r0" \
        make="4.2.1-r2" \
    # Download and unpack Elixir
    && ELIXIR_SRC_URL="https://github.com/elixir-lang/elixir/archive/v${ELIXIR_VERSION}.tar.gz" \
    && ELIXIR_SRC_ARCHIVE="/tmp/elixir-src-${ELIXIR_VERSION}.tar.gz" \
    && wget -O "${ELIXIR_SRC_ARCHIVE}" "${ELIXIR_SRC_URL}" \
    && echo "${ELIXIR_SHA256} ${ELIXIR_SRC_ARCHIVE}" | sha256sum -c \
    && tar -xzf "${ELIXIR_SRC_ARCHIVE}" -C "$(pwd)" --strip-components=1 \
    # Build and install Elixir
    && make -j"$(nproc)" \
    && epmd -daemon \
    && make -j"$(nproc)" test \
    && make -j"$(nproc)" install \
    && make -j"$(nproc)" clean \
    # Delete Elixir build tools
    && apk del .elixir_build_tools
RUN echo "Installing Hex" \
    && mix local.hex --force
CMD ["iex"]
Output snippet:
--rpc-eval : RPC failed with reason :nodedown
1) test executes rpc instructions (Mix.Tasks.ReleaseTest)
test/mix/tasks/release_test.exs:272
match (=) failed
code: assert {pid, 0} = System.cmd(script, ["pid"])
right: {"", 1}
stacktrace:
test/mix/tasks/release_test.exs:286: anonymous fn/1 in Mix.Tasks.ReleaseTest."test executes rpc instructions"/1
(mix) lib/mix/project.ex:352: Mix.Project.in_project/4
(elixir) lib/file.ex:1542: File.cd!/2
test/test_helper.exs:120: MixTest.Case.in_fixture/3
test/mix/tasks/release_test.exs:273: (test)
Could it be as simple as a previous assertion closing the port?
assert System.cmd(script, ["stop"]) == {"", 0}
Either way, this behavior should be consistent across platforms.
I can confirm this was the culprit for failing builds. Swapping the assertions around, having "stop" last, solves the issues on my end.
I wonder why it was passing on other platforms, where I would expect it to fail. Maybe someone could shed some light on this?
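The ordering described above (query the node first, stop it last) can be sketched outside the test suite as a shell function. This is only an illustration: check_release is a hypothetical helper, and it assumes the release script accepts the rpc, pid, and stop subcommands used in the test.

```shell
#!/bin/sh
# check_release SCRIPT
# Query a running release and stop it only at the very end.
# Hypothetical helper, not part of Mix or the Elixir test suite.
check_release() {
  script="$1"

  # The node must still be up for `rpc` and `pid` to reach it.
  out=$("$script" rpc "ReleaseTest.hello_world") || return 1
  [ "$out" = "hello world" ] || return 1

  pid=$("$script" pid) || return 1
  [ -n "$pid" ] || return 1

  # Stop last: once the node is down, `pid` and `rpc` can fail
  # with :nodedown, as in the traceback.
  "$script" stop || return 1

  printf '%s\n' "$pid"
}
```

Calling pid after stop is what the original assertion order did, which matches the {"", 1} result in the traceback.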
@wuhkuh can you please send a PR with the changes you have made? Or simply share a commit? I would love to take a look at it so I can answer your questions correctly. Thanks!
This is the patch file I used to fix the failing tests:
--- lib/mix/test/mix/tasks/release_test.exs
+++ lib/mix/test/mix/tasks/release_test.exs
@@ -281,10 +281,9 @@ defmodule Mix.Tasks.ReleaseTest do
         open_port(script, ['start'])
         wait_until_decoded(Path.join(root, "RELEASE_BOOTED"))
         assert System.cmd(script, ["rpc", "ReleaseTest.hello_world"]) == {"hello world\n", 0}
-        assert System.cmd(script, ["stop"]) == {"", 0}
-
         assert {pid, 0} = System.cmd(script, ["pid"])
         assert pid != "\n"
+        assert System.cmd(script, ["stop"]) == {"", 0}
       end)
     end)
   end
If this looks good, I can send in a PR later today.
Yes, please do send a PR!
Awesome! Now I can pull it and apply it to get Elixir 1.9.0 in Alpine Linux' repositories.
I was planning to look at this but I've been on vacation and otherwise been terribly busy.
Thank you both @wuhkuh and @josevalim! :)