Mailsync processes running in the background leak massive numbers of OS handles. All of the leaked handles are event handles that are never signaled.
Essentially every operation on an account (including sync) increases the handle count, by anywhere from 1 to over 100k. Simply opening a message that Mailspring needs to download creates more than 1,000 handles.
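To make the growth easy to see, here's a minimal watcher sketch (my illustration, not something from the original logs) that polls the count with the Win32 GetProcessHandleCount API; point it at the mailsync PID and trigger a sync:

```cpp
// Minimal sketch (illustration only): poll a process's handle count
// with GetProcessHandleCount while reproducing the leak in Mailspring.
#include <windows.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv) {
    // PID of the mailsync process, passed on the command line.
    DWORD pid = (argc > 1) ? (DWORD)strtoul(argv[1], nullptr, 10)
                           : GetCurrentProcessId();
    HANDLE process = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
    if (!process) return 1;

    for (;;) {
        DWORD count = 0;
        if (GetProcessHandleCount(process, &count))
            printf("open handles: %lu\n", (unsigned long)count);
        Sleep(5000);  // sample every 5 seconds
    }
}
```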
Log file is available here
Over time this leads to millions of open handles, which causes excessive memory usage and a slight drop in performance when opening unread messages. OS functionality also degrades: after a certain number of handles, the NtQuerySystemInformation(SystemHandleInformation) API for enumerating all handles open in the OS starts to fail, and debugging/security/anti-cheat tools cease to work correctly.
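For the API degradation: enumerating all handles goes through the undocumented information class 16 with a grow-and-retry loop, so huge handle counts hurt every caller. A sketch of that loop (my own illustration, with the usual caveats about undocumented APIs):

```cpp
// Sketch of the usual NtQuerySystemInformation(SystemHandleInformation)
// retry loop: grow the buffer until the call stops returning
// STATUS_INFO_LENGTH_MISMATCH. With millions of leaked handles the
// snapshot becomes enormous, and tools that cap their buffer size
// (or hit allocation limits) start failing at exactly this point.
#include <windows.h>
#include <winternl.h>
#include <vector>

typedef NTSTATUS (NTAPI *QueryFn)(ULONG, PVOID, ULONG, PULONG);
const ULONG    kSystemHandleInformation  = 16;  // undocumented class
const NTSTATUS kStatusInfoLengthMismatch = (NTSTATUS)0xC0000004;

std::vector<BYTE> QueryAllHandles() {
    QueryFn query = reinterpret_cast<QueryFn>(GetProcAddress(
        GetModuleHandleW(L"ntdll.dll"), "NtQuerySystemInformation"));
    if (!query) return {};
    std::vector<BYTE> buffer(1 << 20);  // start with 1 MB
    for (;;) {
        ULONG needed = 0;
        NTSTATUS status = query(kSystemHandleInformation, buffer.data(),
                                (ULONG)buffer.size(), &needed);
        if (status == kStatusInfoLengthMismatch) {
            // Snapshot didn't fit; grow (with slack, since the handle
            // table keeps changing) and retry.
            buffer.resize(needed ? needed + (1 << 16) : buffer.size() * 2);
            continue;
        }
        if (status < 0) buffer.clear();  // some other failure
        return buffer;
    }
}
```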
I encountered this issue with two Gmail accounts I tested.

- No
- Operating system: Windows 10 x64, version 16299
- Mailspring version: 1.0.9

Bug?

- No
- Can't determine; all my accounts are on Gmail
- Any message, any Gmail account
Hey! Thanks for the detailed report; this is really interesting. Let me see if this is also happening on Mac and Linux; it's definitely odd that it'd be Windows-specific.
Quick update: I've been able to reproduce this. Inspecting the mailsync process in WinDbg shows that all of these leaked event handles are created here:
```
0x00007ffa47cc59d4: ntdll!NtCreateEvent+0x0000000000000014
0x00000000538146e2: wow64!whNtCreateEvent+0x0000000000000062
0x0000000053816245: wow64!Wow64SystemServiceEx+0x0000000000000155
0x0000000053801c87: wow64cpu!ServiceNoTurbo+0x000000000000000b
0x000000005381bdb2: wow64!RunCpuSimulation+0x0000000000000022
0x000000005381bcc0: wow64!Wow64LdrpInitialize+0x0000000000000120
0x00007ffa47c96e91: ntdll!_LdrpInitialize+0x00000000000000dd
0x00007ffa47c96d5e: ntdll!LdrInitializeThunk+0x000000000000000e
0x0000000076f8707c: ntdll_76f10000!NtCreateEvent+0x000000000000000c
0x00000000767fccd6: KERNELBASE!CreateEventExW+0x0000000000000066
0x00000000767fcc33: KERNELBASE!CreateEventA+0x0000000000000033
0x00000000658a28c1: pthreadVC2!pthread_timechange_handler_np+0x00000000000015c4
--------------------------------------
Handle = 0x00000000000ef940 - OPEN
Thread ID = 0x0000000000000f04, Process ID = 0x00000000000007f0
```
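(That dump came from WinDbg's handle tracing; roughly this workflow, for anyone who wants to reproduce it:)

```
!htrace -enable      $$ start recording a stack trace for every handle operation
$$ ...trigger an account sync in Mailspring...
!htrace -diff        $$ list handles opened since -enable, with their call stacks
```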
It looks like this particular event is broadcast to condition_variables (which mailsync uses pretty extensively) so they can evaluate whether they should wake (https://www.sourceware.org/pthreads-win32/manual/pthread_timechange_handler_np.html).
It's unclear why these aren't getting cleaned up, and also unclear why pthread_timechange_handler_np is being called so frequently. Its docs say "To improve tolerance against operator or time service initiated system clock changes... this routine can be called by an application when it receives a WM_TIMECHANGE message from the system.", which doesn't sound like something that should be happening a zillion times.
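For context, here's a sketch of the usage the pthreads-win32 manual describes (my illustration of the documented pattern, not mailsync code):

```cpp
// Sketch of the documented pattern: when an app learns the system
// clock changed (WM_TIMECHANGE), it runs pthread_timechange_handler_np,
// which broadcasts every condition variable so threads in timed waits
// re-check their absolute deadlines.
#include <windows.h>
#include <pthread.h>

LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp) {
    if (msg == WM_TIMECHANGE) {
        // The manual suggests running the handler on its own thread so
        // the message loop isn't blocked while every CV is broadcast.
        pthread_t t;
        if (pthread_create(&t, NULL, pthread_timechange_handler_np, NULL) == 0)
            pthread_detach(t);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wp, lp);
}
```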
Interesting - this page notes that virtual machine time sync can do this (https://www.greyware.com/kb/KB2015.401.asp: "Another possible cause of unexpected clock change are time sync features of virtual machines. Be sure that the VMWare Tools/Hyper-V Integration Services Time Sync features are turned off for your VMs.")
Any chance you're running Mailspring on Windows inside a VM? (I'm using VMware Fusion.)
No, but I have Hyper-V enabled. It always runs a hypervisor underneath my host system, but as far as I know no time sync features touch the host machine.
Hey! Thanks for the quick reply. In my VM, I disabled Time Synchronization from the virtual machine settings and it didn't have any effect, but disabling Hyper-V in Windows (by running `bcdedit /set hypervisorlaunchtype off` in an elevated command prompt) fixed the issue.
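For anyone else testing, both directions (a reboot is needed for the change to take effect):

```
:: disable the hypervisor (takes effect after a reboot)
bcdedit /set hypervisorlaunchtype off

:: re-enable it afterwards
bcdedit /set hypervisorlaunchtype auto
```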
Definitely need to find a workaround for this (since Hyper-V is enabled by default I think...) but this should narrow it down a lot!
I just tried disabling Hyper-V according to your suggestion and was still able to reproduce the bug.
Also, your htrace stack trace is a bit misleading: `pthreadVC2!pthread_timechange_handler_np+0x00000000000015c4` is in fact inside the exported function `pthread_mutex_init` (the large offset is a hint that the symbol attribution is off).
Edit: after a quick look, I think the handles are created by the constructor `mailcore::AutoreleasePool::AutoreleasePool(void)`. The corresponding destructor neither directly nor indirectly calls `pthread_mutex_destroy`.
Edit2: not sure if this is the matching source, but could this be the culprit? The destructor does not free the mutex.
https://github.com/MailCore/mailcore2/blob/f708ce74e23b61ec6e5ae958eba0b8bcd8831a1e/src/core/basetypes/MCObject.cpp#L43
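To spell the pattern out, a simplified sketch of what I think is happening (my illustration, not the actual mailcore2 source):

```cpp
#include <pthread.h>

// Simplified sketch of the suspected pattern (not the real MCObject
// source). Per the htrace stack above, pthreads-win32 creates a Win32
// event when initializing a mutex, so every init without a matching
// destroy leaks a kernel handle.
class Object {
public:
    Object()  { pthread_mutex_init(&mLock, NULL); }  // allocates the event
    ~Object() {
        // BUG: missing pthread_mutex_destroy(&mLock); the event handle
        // leaks every time an Object (e.g. an AutoreleasePool) dies.
    }
private:
    pthread_mutex_t mLock;
};
```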
Not sure if GitHub sends notifications about comment edits, so let me ping: @bengotow
Hey! Ahh, good catch! That MCObject lock is definitely suspicious. Just read it over and it seems like there should be a call to `pthread_mutex_destroy(&mLock);` in the object destructor. I'm going to insert it and see if things look better.
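i.e. a sketch of the one-line change in MCObject.cpp:

```cpp
Object::~Object()
{
    // Release the mutex so pthreads-win32 can free the Win32 event
    // it created in pthread_mutex_init.
    pthread_mutex_destroy(&mLock);
}
```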
I recompiled the app with that additional line and it 1) doesn't crash and 2) prevents the handle count from increasing indefinitely. (I also turned Hyper-V back on and confirmed that I see the 20k+ open handles without the change.)
I think that's a wrap! Really glad we got to the bottom of this. We've had some reports of Mailspring causing Windows to "hang for an extended period of time" when the computer wakes from sleep, and I think it was caused by this issue (having a zillion leaked handles around when WM_TIMECHANGE is emitted).
I'm gonna PR this change into Mailcore2 and we'll see if @dinhviethoa has any thoughts.
Go ahead and send this PR! Thanks a lot!
Hey! The fix for this has shipped in 1.0.10 - thanks for the help getting this narrowed down and fixed quickly.