Nixpkgs: firefox fails to build on latest master if compiled on an AMD EPYC 7401P CPU

Created on 4 Apr 2020 · 21 Comments · Source: NixOS/nixpkgs

Describe the bug
Firefox fails to build on latest master (78bfdbb291fd20df0f0f65061ee3081610b0a48f). We originally suspected rust-cbindgen to be the culprit (as the derivation before it was in the binary cache).

So for some reason it must have succeeded on Hydra at least once, but the build is clearly flaky in some way. We haven't yet determined whether it's luck, compiling on the right CPUs, or something else. :-/

free(): invalid pointer
In file included from /build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Latin1.h:17,
                 from /build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/TextUtils.h:13,
                 from /build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Utf8.h:19,
                 from /build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/RecordReplay.h:16,
                 from /build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Atomics.h:22,
                 from /build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/nsISupportsImpl.h:24,
                 from /build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/nsISupportsUtils.h:14,
                 from /build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/nsISupports.h:79,
                 from /build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/nsIEventTarget.h:10,
                 from /build/firefox-74.0.1/xpcom/base/MemoryTelemetry.h:12,
                 from /build/firefox-74.0.1/xpcom/base/MemoryTelemetry.cpp:7,
                 from Unified_cpp_xpcom_base1.cpp:2:
/build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Tuple.h: In instantiation of 'struct mozilla::detail::TupleImpl<5, long unsigned int, long unsigned int, long unsigned int, long unsigned int, bool, bool, mozilla::dom::UserActivation::State, bool, RefPtr<mozilla::dom::FeaturePolicy>, unsigned int, nsID, bool, bool, bool, mozilla::dom::GVAutoplayRequestStatus, mozilla::dom::GVAutoplayRequestStatus, float, mozilla::dom::OrientationType>':
/build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Tuple.h:114:8:   recursively required from 'struct mozilla::detail::TupleImpl<1, bool, bool, nsILoadInfo::CrossOriginEmbedderPolicy, nsILoadInfo::CrossOriginOpenerPolicy, long unsigned int, long unsigned int, long unsigned int, long unsigned int, bool, bool, mozilla::dom::UserActivation::State, bool, RefPtr<mozilla::dom::FeaturePolicy>, unsigned int, nsID, bool, bool, bool, mozilla::dom::GVAutoplayRequestSta>
/build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Tuple.h:114:8:   required from 'struct mozilla::detail::TupleImpl<0, nsTString<char16_t>, bool, bool, nsILoadInfo::CrossOriginEmbedderPolicy, nsILoadInfo::CrossOriginOpenerPolicy, long unsigned int, long unsigned int, long unsigned int, long unsigned int, bool, bool, mozilla::dom::UserActivation::State, bool, RefPtr<mozilla::dom::FeaturePolicy>, unsigned int, nsID, bool, bool, bool, mozilla::dom::GVAutoplayR>
/build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Tuple.h:224:7:   required from 'class mozilla::Tuple<nsTString<char16_t>, bool, bool, nsILoadInfo::CrossOriginEmbedderPolicy, nsILoadInfo::CrossOriginOpenerPolicy, long unsigned int, long unsigned int, long unsigned int, long unsigned int, bool, bool, mozilla::dom::UserActivation::State, bool, RefPtr<mozilla::dom::FeaturePolicy>, unsigned int, nsID, bool, bool, bool, mozilla::dom::GVAutoplayRequestStatus, mo>
/build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/dom/SyncedContext.h:162:14:   required from 'struct mozilla::dom::syncedcontext::FieldStorage<void, nsTString<char16_t>, bool, bool, nsILoadInfo::CrossOriginEmbedderPolicy, nsILoadInfo::CrossOriginOpenerPolicy, long unsigned int, long unsigned int, long unsigned int, long unsigned int, bool, bool, mozilla::dom::UserActivation::State, bool, RefPtr<mozilla::dom::FeaturePolicy>, unsigned int, nsID, bool, bool, >
/build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/dom/BrowsingContext.h:128:3:   required from here
/build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Tuple.h:114:8: internal compiler error: Aborted
  114 | struct TupleImpl<Index, HeadT, TailT...>
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
make[3]: *** [/build/firefox-74.0.1/config/rules.mk:729: Unified_cpp_xpcom_base1.o] Error 1
make[3]: Leaving directory '/build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/xpcom/base'
make[2]: *** [/build/firefox-74.0.1/config/recurse.mk:74: xpcom/base/target-objects] Error 2
make[3]: Leaving directory '/build/firefox-74.0.1/obj-x86_64-pc-linux-gnu/netwerk/cache2'

Full build log (xz compressed)

To Reproduce
Steps to reproduce the behavior:

  1. nix-build -A firefox

Expected behavior
The package to be built.

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute: firefox

cc @bhipple @alyssais @andir

bug

All 21 comments

Is this still a flaky failure? I've just compiled it from master (again) and it worked. I did one run on an AMD CPU and another on an Intel CPU... not sure if that is relevant (had some discussions with flokli about that).

I also just built the latest firefox off of master. If it helps, I was using an AWS c5.9xlarge instance, which is Intel.

I can reproducibly get the build to fail on said box with an "AMD EPYC 7401P 24-Core Processor", even with --option cores 1. (Linux 5.5.9)

Maybe worth opening a bug upstream, listing the hardware on which we could reproduce? I mean, we can rule out software differences, as the whole toolchain is described in the nixpkgs checkout…

I can reproducibly get the build to fail on said box with a "AMD EPYC
7401P 24-Core Processor".

My build failures are... also on AMD EPYC 7401P 24-Core Processor, since
the revision I posted before.

Build does not fail on a Ryzen 3900X.

I've seen this fail on another build server which is an AMD EPYC. I've tested it on an Intel Xeon E3-1275v6, and that doesn't fail.

I've seen this fail on another build server which is an AMD EPYC. I've
tested it on an Intel Xeon E3-1275v6, and that doesn't fail.

Can you tell us which EPYC it was? (Try lscpu.)
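For anyone collecting hardware details for a report, the CPU model string can also be read from /proc/cpuinfo instead of lscpu. This is only a minimal sketch: `cpu_model` is a hypothetical helper name, and it assumes an x86 Linux machine where /proc/cpuinfo has a "model name" field.

```shell
# Print the first "model name" entry from a cpuinfo-formatted file.
# Defaults to /proc/cpuinfo; an alternative file can be passed as $1
# (for example a saved copy from another machine).
cpu_model() {
  sed -n 's/^model name[[:space:]]*:[[:space:]]*//p' "${1:-/proc/cpuinfo}" | head -n 1
}

# Example:
#   cpu_model
#   cpu_model ./cpuinfo-from-build-server.txt
```

Only the first entry is printed because /proc/cpuinfo repeats the model name once per logical core.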

Building Firefox from source on the same EPYC as before in a Fedora
container worked for me, so looks like maybe something to do with us?

I've seen this fail on another build server which is an AMD EPYC. I've
tested it on an Intel Xeon E3-1275v6, and that doesn't fail.

Can you tell us which EPYC it was? (Try lscpu.)

It's an EPYC 7401P.

I've run creduce on an example crashing file and got it down to a reasonable 1216 bytes, though I'm not entirely sure how much of this is necessary to reproduce it, as it's not 100% reliable. https://gist.github.com/puckipedia/31f2cb955274a173dddc6817cb91ba71 contains both the reduced test case and the command I used to reproduce it (in a blank environment). It takes a few tries, but eventually crashes with free(): invalid pointer

I can reproduce this on a Threadripper 1920X with Linux 5.4.28.

Running some more creduce (and crashing *it* as well), i ended up with a smaller (but less reliable) test case, stored in the same gist. I can actually only get this to reproduce with a larger command line, which i also stored there. This is a super confusing error.

@puckipedia
Ryzen 2700X (zen+): can't reproduce.
GCC output: https://0x0.st/iuhH.txt
uname -a: https://0x0.st/iuh8.txt
/proc/cpuinfo: https://0x0.st/iuhK.txt
ran it with watch -n1 ./test.sh for a good 5 minutes to no results.
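Because the crash is intermittent, a loop that stops at the first failing run may be easier to work with than watch, since it keeps the failing output and reports how many attempts it took. The following is a minimal sketch, not the script from the gist: `run_until_fail` is a hypothetical helper, and it assumes the reproducer (here ./test.sh) exits non-zero when the compiler crashes.

```shell
# Re-run a command until it fails, up to a maximum number of attempts.
# Usage: run_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 0 if a failure was observed, 1 if every attempt succeeded.
run_until_fail() {
  local max_attempts="$1"
  shift
  local attempt=1
  local output
  while [ "$attempt" -le "$max_attempts" ]; do
    # Capture stdout and stderr so the crash message can be shown afterwards.
    if ! output=$("$@" 2>&1); then
      echo "failed on attempt ${attempt}"
      printf '%s\n' "$output"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "no failure in ${max_attempts} attempts"
  return 1
}

# Example (assumes ./test.sh is the reproducer script):
#   run_until_fail 300 ./test.sh
```

A run that never fails within the attempt limit does not show that a CPU is unaffected; it only puts a number on how long the test was tried.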

Does anyone know what the status is for this?

According to #85997, this still seems to be an issue. I have moved my builds to another box, so didn't investigate further. It might be a good idea to open a bug somewhere upstream, but I don't know where :-D

reddit.com/r/AyyMD ?

In all seriousness, probably should report it to the gcc-bugs mailing list

A workaround is to build Firefox using clangStdenv.

I don't have access to the hardware anymore. Can anyone with it open a bug report there?

Does this problem reproduce with any other GCC build?

On Fri, 19 Jun 2020 at 00:22, Florian Klink <[email protected]> wrote:

I don't have access to the hardware anymore. Can anyone with it open a bug
report there?

