Elixir: Mix test "--rpc-eval : RPC failed with reason :nodedown" fails.

Created on 26 Jun 2019 · 16 Comments · Source: elixir-lang/elixir

Environment

  • Elixir & Erlang/OTP versions (elixir --version):
    Erlang/OTP 22.0.4, Elixir 1.9.0 (the issue has occurred with previous versions too)
  • Operating system:
    Alpine Linux edge (rolling release) aarch64/arm64

Current behavior

A test fails on aarch64 with the following traceback:

Randomized with seed 343444
==> mix (ex_unit)
Excluding tags: [windows: true]

--rpc-eval : RPC failed with reason :nodedown
.........................................................................................................................................................................................................................................................................................................................

  1) test executes rpc instructions (Mix.Tasks.ReleaseTest)
     test/mix/tasks/release_test.exs:268
     match (=) failed
     code:  assert {pid, 0} = System.cmd(script, ["pid"])
     right: {"", 1}
     stacktrace:
       test/mix/tasks/release_test.exs:282: anonymous fn/1 in Mix.Tasks.ReleaseTest."test executes rpc instructions"/1
       (mix) lib/mix/project.ex:352: Mix.Project.in_project/4
       (elixir) lib/file.ex:1542: File.cd!/2
       test/test_helper.exs:119: MixTest.Case.in_fixture/3
       test/mix/tasks/release_test.exs:269: (test)

...........................................................................................................................................................................................................................................................................................................................

Finished in 622.5 seconds (10.8s on load, 611.6s on tests)
9 doctests, 620 tests, 1 failure

Expected behavior

I expect the test to pass. The issue has occurred before, and we disabled the tests on the aarch64 architecture for that reason.

Mix Needs more info

All 16 comments

This issue appears to occur on the armv7 (32-bit) architecture as well.

Ref: https://cloud.drone.io/alpinelinux/aports/7875/5/1

I would like to help with this.

Thanks for the report! Do you have an environment or a container or similar
that we can use to reproduce the errors?

Also, can you please check the result of System.pid() in those systems?

For those willing to investigate this, I would recommend creating a new
project, assembling a release, then starting the release and invoking the
bin/my_app pid command to see if it works.
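
A minimal sketch of the System.pid() check requested above, assuming Elixir 1.9 or later, where System.pid/0 is available. The variable names are illustrative only. It could be saved to a script and run with elixir on the affected machine:

```elixir
# System.pid/0 returns the operating system PID of the running VM as a string.
os_pid = System.pid()

# A healthy result should parse as an integer with nothing left over.
parsed = Integer.parse(os_pid)

IO.puts("System.pid() returned #{inspect(os_pid)}, parsed as #{inspect(parsed)}")
```

An empty string or a value that does not parse would point at the runtime rather than at the release script.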

José Valim

Ping?

I also have another suggestion. Can you please try running epmd -daemon before calling make test to see if it fixes the failure? Thanks.
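
Before and after trying epmd -daemon, it may help to confirm whether epmd is actually reachable from the Erlang VM. The sketch below uses :net_adm.names/0, which asks the local epmd for its registered nodes. It only reports what it finds and makes no claim about how the test suite itself starts epmd:

```elixir
# Ask the epmd instance on the local host which node names are registered.
# Returns {:ok, [{name, port}]} when epmd answers, {:error, reason} otherwise.
epmd_status = :net_adm.names()

case epmd_status do
  {:ok, names} ->
    IO.puts("epmd is reachable; registered nodes: #{inspect(names)}")

  {:error, reason} ->
    IO.puts("epmd is not reachable: #{inspect(reason)}")
end
```

If this prints an error inside the failing image, epmd not being started is a likely suspect. If it prints a node list, the cause probably lies elsewhere.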

Apologies for the late reply, I've been fairly busy with $dayjob and a few other projects.

I suspect this is another flaky test (similar to another bug I reported before the release of 1.9.0). I am not able to reproduce it consistently in an aarch64 dev container, but it failed multiple times on Drone CI.

It seems like a proper heisenbug.

@dunielpls, if it makes any difference, I saw this problem when building a Docker image based on openshift/base-centos7 when Docker was running in VirtualBox, but the problem seemed to go away when switching to Hyper-V. Exact same Dockerfile with the same build steps, exact same tag being built. Something nasty and timing-related that trips up one platform but not the other?

@ilkka if you call epmd -daemon before running the tests in the image that fails, does it solve the problem? I am asking because I have had issues with epmd not being started automatically in the past, and maybe those systems are triggering it.

Hm, I got the same traceback again: https://cloud.drone.io/alpinelinux/aports/8411/3/1

Is there a good and easy way to disable that one test without patching the code?

> Hm, I got the same traceback again: https://cloud.drone.io/alpinelinux/aports/8411/3/1

Can you try on the v1.9 branch which has a possible fix?

> Is there a good and easy way to disable that one test without patching the code?

Only by changing the source, sorry.

@josevalim
It works on my Windows 10 machine with Hyper-V; however, Docker Hub still fails to build this with epmd -daemon:

Dockerfile (based on Alpine Linux 3.10.1, including patches for LibreSSL and Erlang):

FROM wouterklijn/erlang:22.0.7
SHELL ["/bin/ash", "-euxo", "pipefail", "-c"]

ENV ELIXIR_VERSION="1.9.1" \
    ELIXIR_SHA256="94daa716abbd4493405fb2032514195077ac7bc73dc2999922f13c7d8ea58777"

WORKDIR /usr/src/elixir-${ELIXIR_VERSION}

RUN echo "Installing Elixir ${ELIXIR_VERSION}" \
# Install Elixir build tools
  && apk update \
  && apk add --no-cache --virtual .elixir_build_tools \
    git="2.22.0-r0" \
    make="4.2.1-r2" \
# Download and unpack Elixir
  && ELIXIR_SRC_URL="https://github.com/elixir-lang/elixir/archive/v${ELIXIR_VERSION}.tar.gz" \
  && ELIXIR_SRC_ARCHIVE="/tmp/elixir-src-${ELIXIR_VERSION}.tar.gz" \
  && wget -O "${ELIXIR_SRC_ARCHIVE}" "${ELIXIR_SRC_URL}" \
  && echo "${ELIXIR_SHA256}  ${ELIXIR_SRC_ARCHIVE}" | sha256sum -c \
  && tar -xzf "${ELIXIR_SRC_ARCHIVE}" -C "$(pwd)" --strip-components=1 \
# Build and install Elixir
  && make -j"$(nproc)" \
  && epmd -daemon \
  && make -j"$(nproc)" test \
  && make -j"$(nproc)" install \
  && make -j"$(nproc)" clean \
# Delete Elixir build tools
  && apk del .elixir_build_tools

RUN echo "Installing Hex" \
  && mix local.hex --force

CMD ["iex"]

Output snippet:

--rpc-eval : RPC failed with reason :nodedown

  1) test executes rpc instructions (Mix.Tasks.ReleaseTest)
     test/mix/tasks/release_test.exs:272
     match (=) failed
     code:  assert {pid, 0} = System.cmd(script, ["pid"])
     right: {"", 1}
     stacktrace:
       test/mix/tasks/release_test.exs:286: anonymous fn/1 in Mix.Tasks.ReleaseTest."test executes rpc instructions"/1
       (mix) lib/mix/project.ex:352: Mix.Project.in_project/4
       (elixir) lib/file.ex:1542: File.cd!/2
       test/test_helper.exs:120: MixTest.Case.in_fixture/3
       test/mix/tasks/release_test.exs:273: (test)

Could it be as simple as a previous assertion closing the port?

assert System.cmd(script, ["stop"]) == {"", 0}

Either way, this behavior should be consistent across platforms.

I can confirm this was the culprit for failing builds. Swapping the assertions around, having "stop" last, solves the issues on my end.

I wonder why it was passing (where I would expect it to fail) on other platforms. Maybe someone could shed some light on this?
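
The "--rpc-eval : RPC failed with reason :nodedown" line in the output suggests that the pid command is answered over RPC by the running release, so it cannot succeed once the node has stopped. The sketch below reproduces that error shape without a release. The node name is hypothetical and deliberately points at nothing, which stands in for an already stopped release. This is an illustration, not the release script's actual code:

```elixir
# Hypothetical node name: no node is registered under it, which mimics
# calling into a release after "stop" has taken it down.
stopped_node = :"release_already_stopped@127.0.0.1"

# :rpc.call/4 returns {:badrpc, reason} when the target node is unreachable.
rpc_result = :rpc.call(stopped_node, System, :pid, [])

IO.inspect(rpc_result, label: "rpc result")
```

One possible explanation for the platform difference, unconfirmed in this thread, is timing: if stop returns before the node has fully shut down, a pid call issued immediately afterwards can still be answered on faster machines.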

@wuhkuh can you please send a PR with the changes you have made? Or simply share a commit? I would love to take a look at it so I can answer your questions correctly. Thanks!

This is the patch file I used to fix the failing tests:

--- lib/mix/test/mix/tasks/release_test.exs
+++ lib/mix/test/mix/tasks/release_test.exs

@@ -281,10 +281,9 @@ defmodule Mix.Tasks.ReleaseTest do
         open_port(script, ['start'])
         wait_until_decoded(Path.join(root, "RELEASE_BOOTED"))
         assert System.cmd(script, ["rpc", "ReleaseTest.hello_world"]) == {"hello world\n", 0}
-        assert System.cmd(script, ["stop"]) == {"", 0}
-
         assert {pid, 0} = System.cmd(script, ["pid"])
         assert pid != "\n"
+        assert System.cmd(script, ["stop"]) == {"", 0}
       end)
     end)
   end

If this looks good, I can send in a PR later today.

Yes, please do send a PR!

Awesome! Now I can pull it and apply it to get Elixir 1.9.0 in Alpine Linux' repositories.

I was planning to look at this but I've been on vacation and otherwise been terribly busy.

Thank you both @wuhkuh and @josevalim! :)
