dart --version): 2.2.0I've been using dart server for a few months now and really enjoying it, and have recently rolled it out into production for a project with a relatively small user base. For the most part, things are going smoothly! However, I think I've hit what may be a rough edge at the intersection of isolates + "shared" HttpServer functionality (think "cluster" functionality from node.js). The shortest way to put it is as follows: when one isolate (running one of multiple "shared" HttpServer instances) exits, a SEPARATE and seemingly RANDOM other isolate (also running a "shared" HttpServer instance on the same port) dies. With --enable-asserts, it's possible to see that that other isolate is asserting on _NativeSocket.multiplex (dart:io/runtime/binsocket_patch.dart:906:18).
I've created a stripped down repository containing the minimum code necessary for repro: https://github.com/emeyex/dart-iso-server-crash-repro.
I'm not sure if this should be multiple issues, or if that assert failure is the root of the hard crash as well. Please let me know if I can provide any more info.
We've just had a hard crash on one of our production instances, it's hard to tell for sure but it seems related to the above scenario, as it occurred right after a worker restart. However the crash dump looks a little bit different:
server_1 | ===== CRASH =====
server_1 | version=2.2.0 (Unknown timestamp) on "linux_x64"
server_1 | si_signo=Segmentation fault(11), si_code=1, si_addr=0x56489d603e30
server_1 | thread=2828, isolate=server.dart:_workerEntryPoint()(0x56489afcc900)
server_1 | [0x0000564851a3c059] dart::ClassTable::UpdateLiveNew(long, long)
server_1 | [0x0000564851a3c059] dart::ClassTable::UpdateLiveNew(long, long)
server_1 | [0x0000564851b84c67] dart::RawObject::VisitPointersPredefined(dart::ObjectPointerVisitor*, long)
server_1 | [0x0000564851d68443] dart::Scavenger::IterateStoreBuffers(dart::Isolate*, dart::ScavengerVisitor*)
server_1 | [0x0000564851d685da] dart::Scavenger::IterateRoots(dart::Isolate*, dart::ScavengerVisitor*)
server_1 | [0x0000564851d691bb] dart::Scavenger::Scavenge()
server_1 | [0x0000564851d5d3b5] dart::Heap::CollectNewSpaceGarbage(dart::Thread*, dart::Heap::GCReason)
server_1 | [0x0000564851d5beb3] dart::Heap::AllocateNew(long)
server_1 | [0x0000564851ad80f2] dart::Object::Allocate(long, long, dart::Heap::Space)
server_1 | [0x0000564851b3b95c] dart::Instance::New(dart::Class const&, dart::Heap::Space)
server_1 | [0x0000564851bbeb76] dart::DRT_AllocateObject(dart::NativeArguments)
server_1 | [0x00007f96896c0fa8] Unknown symbol
server_1 | [0x00007f9668dc015f] Unknown symbol
server_1 | [0x00007f966870a4f7] Unknown symbol
server_1 | [0x00007f9668560812] Unknown symbol
server_1 | [0x00007f96685603a1] Unknown symbol
server_1 | [0x00007f966873a009] Unknown symbol
server_1 | [0x00007f9668739b6a] Unknown symbol
server_1 | [0x00007f96808df5e7] Unknown symbol
server_1 | [0x00007f966870be62] Unknown symbol
server_1 | [0x00007f966870bd4b] Unknown symbol
server_1 | [0x00007f966870633a] Unknown symbol
server_1 | [0x00007f9668706115] Unknown symbol
server_1 | [0x00007f9668705980] Unknown symbol
server_1 | [0x00007f966870b9c5] Unknown symbol
server_1 | [0x00007f966870b5c8] Unknown symbol
server_1 | [0x00007f966870a716] Unknown symbol
server_1 | [0x00007f9668560812] Unknown symbol
server_1 | [0x00007f96685603a1] Unknown symbol
server_1 | [0x00007f966873a009] Unknown symbol
server_1 | [0x00007f9668739b6a] Unknown symbol
server_1 | [0x00007f96808e90bd] Unknown symbol
server_1 | [0x00007f966870be62] Unknown symbol
server_1 | [0x00007f966870bd4b] Unknown symbol
server_1 | [0x00007f966870633a] Unknown symbol
server_1 | [0x00007f9668706115] Unknown symbol
server_1 | [0x00007f9668705980] Unknown symbol
server_1 | [0x00007f966870b9c5] Unknown symbol
server_1 | [0x00007f966870b5c8] Unknown symbol
server_1 | [0x00007f966870a716] Unknown symbol
server_1 | [0x00007f9668560812] Unknown symbol
server_1 | [0x00007f96685603a1] Unknown symbol
server_1 | [0x00007f966873a009] Unknown symbol
server_1 | [0x00007f9668739b6a] Unknown symbol
server_1 | [0x00007f96808ea9e6] Unknown symbol
server_1 | [0x00007f966870be62] Unknown symbol
server_1 | [0x00007f966870bd4b] Unknown symbol
server_1 | [0x00007f966870633a] Unknown symbol
server_1 | [0x00007f9668706115] Unknown symbol
server_1 | [0x00007f9668705980] Unknown symbol
server_1 | [0x00007f966870b9c5] Unknown symbol
server_1 | [0x00007f966870b5c8] Unknown symbol
server_1 | [0x00007f966870a716] Unknown symbol
server_1 | [0x00007f9668709406] Unknown symbol
server_1 | [0x00007f9668709154] Unknown symbol
server_1 | [0x00007f9668df66b4] Unknown symbol
server_1 | [0x00007f9668df640d] Unknown symbol
server_1 | [0x00007f9668df5a1d] Unknown symbol
server_1 | [0x00007f966873c562] Unknown symbol
server_1 | [0x00007f966873c423] Unknown symbol
server_1 | [0x00007f9668708d9b] Unknown symbol
server_1 | [0x00007f9668708945] Unknown symbol
server_1 | [0x00007f966870881d] Unknown symbol
server_1 | [0x00007f9668703b7c] Unknown symbol
server_1 | [0x00007f9668de9942] Unknown symbol
server_1 | [0x00007f96896c146c] Unknown symbol
server_1 | [0x0000564851a6c327] dart::DartEntry::InvokeFunction(dart::Function const&, dart::Array const&, dart::Array const&, unsigned long)
server_1 | [0x0000564851a6f6f6] dart::DartLibraryCalls::HandleMessage(dart::Object const&, dart::Instance const&)
server_1 | [0x0000564851aa058e] dart::IsolateMessageHandler::HandleMessage(dart::Message*)
server_1 | [0x0000564851ad19ee] dart::MessageHandler::HandleMessages(dart::MonitorLocker*, bool, bool)
server_1 | [0x0000564851ad2036] dart::MessageHandler::TaskCallback()
server_1 | [0x0000564851bfff17] dart::ThreadPool::Worker::Loop()
server_1 | [0x0000564851bffda4] dart::ThreadPool::Worker::Main(unsigned long)
server_1 | [0x0000564851b6de39] Unknown symbol
server_1 | -- End of DumpStackTrace
We are now getting that last crash dump pretty frequently when we close and then restart a worker isolate on our prod server. We're more than likely doing something wrong to exacerbate the situation, but still it seems like if _we're_ the ones doing something wrong, it should trigger an exception/error in dart, correct?
Also, I wanted to re-iterate that the originally posted sample yields a 100% repro of a dart assertion failure simply by running dart --enable-asserts bin/server.dart... not sure if the two are related but there definitely seem to be some issues around isolate closing/starting, or else it's very possible we're doing something terribly wrong as well. Please let me know if I can provide any more info!
I can reproduce this on Linux with your reproduction. If I were to guess this is a race between shared socket implementation and isolate shutdown.
@sortie Can you take a look as dart:io maintainer?
I can confirm I can reproduce the issue as well. I'll take a closer look.
===== CRASH =====
version=2.2.0 (Tue Feb 26 15:04:32 2019 +0100) on "linux_x64"
si_signo=Segmentation fault(11), si_code=1, si_addr=0x8
thread=125709, isolate=(null)((nil))
[0x000055a346542871] Unknown symbol
[0x000055a346542871] Unknown symbol
[0x000055a346541944] dart::bin::EventHandlerImplementation::HandleInterruptFd()
[0x000055a34654200c] dart::bin::EventHandlerImplementation::HandleEvents(epoll_event*, int)
[0x000055a3465420db] dart::bin::EventHandlerImplementation::Poll(unsigned long)
[0x000055a346561a8e] Unknown symbol
-- End of DumpStackTrace
Aborted
Any update on this issue? We just updated one of our services to dart2 and the vm crashing on this pretty frequently (like every couple of hours).
@sortie are you looking at this issue, we seem to have another reproduction of it (see above).
Any update on this issue would be much appreciated!
re-assigning issue to @ZichangG as @sortie is busy.
@emeyex I tested with dart --enable-asserts bin/server.dart for more than 2 hours but failed to reproduce the issue. How long do I expect to see the crash? Do I miss anything?
@ZichangG If you run with asserts, it can be easy to miss the output as it scrolls by so quickly. Try running without asserts maybe? I repro'd it again within 5 minutes of running (now with dart 2.5.2):
===== CRASH =====
si_signo=Segmentation fault: 11(11), si_code=1, si_addr=0x8
GET / 127.0.0.1 -> Hello from worker: 1
version=2.5.2 (Tue Oct 8 16:17:11 2019 +0200) on "macos_x64"
thread=2563, isolate=(null)(0x0)
pc 0x000000010b504c41 fp 0x00007000049195b0 dart::bin::EventHandlerImplementation::SendData(long, long long, long long)
pc 0x000000010b5039fd fp 0x0000700004919880 dart::bin::EventHandlerImplementation::HandleInterruptFd()
pc 0x000000010b50440a fp 0x0000700004919ee0 dart::bin::EventHandlerImplementation::EventHandlerEntry(unsigned long)
pc 0x000000010b52bdbd fp 0x0000700004919f10 dart::bin::Thread::GetMaxStackSize()
pc 0x00007fff66c5a2eb fp 0x0000700004919f30 _pthread_body
pc 0x00007fff66c5d249 fp 0x0000700004919f50 _pthread_start
pc 0x00007fff66c5940d fp 0x0000700004919f78 thread_start
-- End of DumpStackTrace
Just to clarify, you're running dart bin/server.dart as well as dart bin/client.dart at the same time?
I can reproduce! Let me take a look later.
../../runtime/bin/eventhandler.h: 555: error: expected: entry != NULL
version=2.6.0-edge.0449905e2de69724e5fe4acc9c7589c8cab4046b (Tue Oct 22 13:40:01 2019 +0000) on "linux_x64"
thread=257467, isolate=(null)((nil))
pc 0x0000560a1651590c fp 0x00007ffa7007a730 dart::Profiler::DumpStackTrace(void*)
pc 0x0000560a16131f32 fp 0x00007ffa7007a810 dart::Assert::Fail(char const*, ...)
GET / 127.0.0.1 -> Hello from worker: 2
pc 0x0000560a1611d35c fp 0x00007ffa7007a850 /usr/local/google/home/zichangguo/dart-sdk/sdk/out/DebugX64/dart+0x17f735c
pc 0x0000560a1611bdb2 fp 0x00007ffa7007ab40 dart::bin::EventHandlerImplementation::HandleInterruptFd()
pc 0x0000560a1611c5bc fp 0x00007ffa7007ac40 dart::bin::EventHandlerImplementation::HandleEvents(epoll_event*, int)
GET / 127.0.0.1 -> Hello from worker: 1
pc 0x0000560a1611c7b1 fp 0x00007ffa7007ae40 dart::bin::EventHandlerImplementation::Poll(unsigned long)
pc 0x0000560a16175cc5 fp 0x00007ffa7007ae70 /usr/local/google/home/zichangguo/dart-sdk/sdk/out/DebugX64/dart+0x184fcc5
-- End of DumpStackTrace
Aborted
A brief update for root cause.
In this case, each isolate will create its own socket to bind to the same address. In VM, we only keep one copy of OS socket if that's shared(Of course, address and port should be same). This socket will be used by different dart sockets.
When one of isolates get killed, it calls finalizer to clean up. Finalizer needs to close the socket living inside dead isolate. But it didn't store all connected dart sockets so ends up closing a random dart socket. This causes
when one isolate (running one of multiple "shared" HttpServer instances) exits, SEPARATE and seemingly RANDOM other isolate (also running a "shared" HttpServer instance on the same port) dies.
I will work on a fix. Thanks for patience.
@ZichangG exciting to see your commit 鈽濓笍- does it seem to fix the issue? Any idea on when that might land in a dart release?
This pr should fix the issue. It seems to be rolled into 2.6.1.
@ZichangG I just upgraded to 2.6.1 and tried it out, but I still got the crash using my little test harness. I'm not quite sure how to verify whether your commit is in that version, but looking at the changelog (https://github.com/dart-lang/sdk/commit/796d6cc344b93c0432b404173438e0343fc6c2b8#diff-4ac32a78649ca5bdd8e0ba38b7006a1e), it seems like maybe it's not in yet?
This will be included in the 2.7 release, which will be early December 2019. If this is very urgent, you can request for another release before that.
That's great news. We're fine waiting til December, great to know that it's on its way 馃憤
Excellent news!
@ZichangG Do you know when this will be available in the dev channel? Or is it already in 2.7.0-dev.0.0?
I'm not very familiar with release process. But 2.7 has not been released. I'll close this issue for now. If you encounter any problem, feel free to reopen.
@a-siva Can you answer their question regarding dart 2.7?
The fix is not in 2.7.0-dev-0.0, it will be there in the next dev release.
Awesome, thank you!
Most helpful comment
A brief update for root cause.
In this case, each isolate will create its own socket to bind to the same address. In VM, we only keep one copy of OS socket if that's shared(Of course, address and port should be same). This socket will be used by different dart sockets.
When one of isolates get killed, it calls finalizer to clean up. Finalizer needs to close the socket living inside dead isolate. But it didn't store all connected dart sockets so ends up closing a random dart socket. This causes
I will work on a fix. Thanks for patience.