As soon as you are able, upgrade to kernel 5.6.13. Joplin will not work properly otherwise, for the reasons described below.
This issue has been resolved, but I'm leaving it open for the time being so users who need it can still find it while kernel upgrades are rolling out.
Alright, so this is a continuation of #2518. The bug report became horribly bloated and unmanageable. This is an attempt to fix that.
Affected: users on various Linux distributions, all running Linux kernel 5.5 or 5.6 and their point releases.
To find out which kernel you are running and get other useful distribution information, please run uname -a in your terminal and paste the output here.
Arch Linux
Fedora 31+
Solus
Clear Linux
Void Linux
Debian Testing
Joplin's Sidebar, Notebook panel, and Notes freeze during various syncing tasks.
Perform normal tasks with Joplin and allow it to start syncing. It will eventually freeze during syncing and/or decrypting of notes. The most reliable way to trigger this bug manually is to make some edits to a note and then click the Synchronize button a few times until you can no longer interact with the UI.
The Linux kernel maintainers pushed a commit that changed how asynchronous input/output works, which led to Electron and a small handful of other frameworks having this issue. It doesn't look like the issue will be fixed upstream anytime soon.
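Workarounds: switch to the linux-lts kernel package, or go to Tools=>Options, open the Web-clipper tab, enable and then disable the Web Clipper service, and click Apply and then Back.
[I] justa@lame-ass-host~ []> uname -a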
Linux lame-ass-host 5.6.4-artix1-1 #1 SMP PREEMPT Fri, 17 Apr 2020 14:57:51 +0000 x86_64 GNU/Linux
My tests primarily use the Nextcloud sync target. I do have a davfs mount point set up, but due to the age of my system and possibly my filesystem type (btrfs), it is extremely slow and not at all ideal for Filesystem syncing.
Late January 25th Early 26th - credit @taw00
Deadlock Tests - set various Max Connection Settings under Tools=>Options=>Synchronization=>Show Advanced Settings
- 1 Tests: StanczakDominik , bedwardly-down
- 20 Tests: bedwardly-down , m-angelov
Daniel Souza's Code Fix Test - comment out (or run only when the platform is not Linux) specific parts of synchronizer.js to see if that solves the issue
Does Encryption matter for triggering this bug? - credits to @figue
5.7-RC5 test - waiting on confirmation from other users.
5.6.13 with Epoll Fixes test - Switching to Filesystem sync with a locally mounted Nextcloud instance solved my issue here. 5.6.13 seems to solve the core issues.
When switching the web clipper on and off, what happens is that a server is started then stopped. Do you have any idea why that would unfreeze the UI?
If we could understand better what's happening maybe we could add a workaround. Like now I'm thinking we could integrate some dummy server that the sync process would start and stop at regular intervals. Don't know if that would help but could be worth a try.
In the original bug report, there was some code work early on, and one tester found that parts of synchronizer.js could be commented out and the issue wouldn't occur. I believe it was mainly the part that checks the server for changes. Another user commented that the way the code was written was part of why the kernel commit caused this problem. They theorized that the synchronization service was going to "sleep" and wasn't being woken up like it was in 5.4.
There weren't enough users working on that aspect of testing, and my kernel research wasn't really useful to them, so they eventually stopped and no one picked it up from there.
@hexclover what do you think of @laurent22's thoughts? Weren't you the one that found the synchronizer.js stuff originally?
Hmmm. I just found, by using strace -f -etrace=epoll_ctl, that toggling the web clipper server generates these calls to epoll_ctl reliably:
[pid 18485] epoll_ctl(3, EPOLL_CTL_ADD, 53, {EPOLLOUT, {u32=53, u64=53}}) = 0
[pid 18485] epoll_ctl(3, EPOLL_CTL_DEL, 53, 0x7ffd94c703e0) = 0
[pid 18485] epoll_ctl(3, EPOLL_CTL_ADD, 64, {EPOLLIN, {u32=64, u64=64}}) = 0
[pid 18485] epoll_ctl(3, EPOLL_CTL_DEL, 64, 0x7ffd94c713b0) = 0
The first 3 are issued by enabling and the last by disabling; perhaps they are related to creating (destroying) the server and making it start (stop) listening on some port. It may help explain why this trick unfreezes the UI.
This still does not tell us whether Joplin/Node.js/Electron/... is to blame, though. I don't have much time to look into it.
BTW, after managing to trigger the freeze once or twice today, I haven't been able to trigger it again following the same steps. I think the lack of a reliable way to replicate the bug really adds to the difficulty of debugging.
P. S. I think I wasn't the first one to locate synchronizer.js, at least not the first one to mention it in #2518 :-)
You didn't locate it, sure. Remember, I prodded you in the right direction with it since I had previously worked with it a bit? Either way, that information is extremely valuable, and thanks for getting back to me, @hexclover. I hope you're safe. ;)
Also, the difficulty of reliably triggering the bug could be due to all of the updates the kernel developers make to "fix" bugs that appear from previous patches.
EDIT: I can definitely verify that Joplin calls EPOLL while syncing. I found the Process ID for its main thread and during syncing, I got this output:
[I] justa@lame-ass-host~ []> pidof joplin
5950 5948 5935 5915 5913 5908 5905
[I] justa@lame-ass-host~ []> sudo strace -p 5950 -etrace=epoll_ctl
[sudo] password for justa:
strace: Process 5950 attached
epoll_ctl(3, EPOLL_CTL_DEL, 56, 0x7ffc84f076e0) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 56, {EPOLLOUT, {u32=56, u64=56}}) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 45, 0x7ffc84f076e0) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 45, {EPOLLOUT, {u32=45, u64=45}}) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 56, {EPOLLIN, {u32=56, u64=56}}) = -1 EEXIST (File exists)
epoll_ctl(3, EPOLL_CTL_MOD, 56, {EPOLLIN, {u32=56, u64=56}}) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 45, {EPOLLIN, {u32=45, u64=45}}) = -1 EEXIST (File exists)
epoll_ctl(3, EPOLL_CTL_MOD, 45, {EPOLLIN, {u32=45, u64=45}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 56, {EPOLLIN, {u32=56, u64=56}}) = 0
epoll_ctl(3, EPOLL_CTL_MOD, 45, {EPOLLIN, {u32=45, u64=45}}) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 49, 0x7ffc84f076e0) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 52, 0x7ffc84f076e0) = 0
If you hadn't posted your output, I wouldn't have found out how to take it a bit further.
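When comparing traces like the two above, it can help to tally the epoll_ctl calls per file descriptor instead of reading them line by line. This is a minimal Node.js sketch, assuming the line format strace printed above (with or without the [pid N] prefix); parseEpollCtl and summarizeByFd are hypothetical helper names, not part of Joplin or strace.

```javascript
// Hypothetical helpers for reading output captured with
// "strace -f -etrace=epoll_ctl" or "strace -p PID -etrace=epoll_ctl".
// Matches lines such as:
//   [pid 18485] epoll_ctl(3, EPOLL_CTL_ADD, 53, {EPOLLOUT, {u32=53, u64=53}}) = 0
const EPOLL_CTL_RE =
  /epoll_ctl\((\d+),\s*EPOLL_CTL_(ADD|DEL|MOD),\s*(\d+),\s*(.*)\)\s*=\s*(-?\d+)/;

function parseEpollCtl(line) {
  const match = EPOLL_CTL_RE.exec(line);
  if (!match) return null; // not an epoll_ctl line
  const [, epfd, op, fd, args, ret] = match;
  // ADD/MOD carry an event struct like "{EPOLLIN, ...}"; DEL carries a pointer.
  const events = /^\{([A-Z_|]+),/.exec(args);
  return {
    epfd: Number(epfd),
    op,
    fd: Number(fd),
    events: events ? events[1] : null,
    ok: Number(ret) === 0, // -1 means the call failed (e.g. EEXIST)
  };
}

// Groups calls by watched file descriptor, in order; failed calls get a "!".
function summarizeByFd(lines) {
  const byFd = new Map();
  for (const line of lines) {
    const call = parseEpollCtl(line);
    if (!call) continue;
    const ops = byFd.get(call.fd) || [];
    ops.push(call.ok ? call.op : `${call.op}!`);
    byFd.set(call.fd, ops);
  }
  return byFd;
}
```

For example, the EEXIST line for fd 56 in the trace above, followed by its EPOLL_CTL_MOD, summarizes as ADD!, MOD, which makes the "add failed, modify instead" pattern easy to spot.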
Could it be a deadlock issue when accessing many files simultaneously? I wonder if the bug still happens if you set the max number of simultaneous downloads to 1?
I can check.
Or conversely, maybe by increasing it to 20 or more it would be possible to consistently replicate the bug, which could make fixing it easier.
That's the Max Simultaneous Connections part under Synchronization, isn't it?
Setting Max Connections to 1 shows no signs of the bug, just extra remote items created and deleted during its initial check. I created 123 duplicates of a test note with a simple image in it, fully synced, deleted them, then fully synced again. After it finished syncing, it sent 124 remote items and then deleted all of them, even though it had already done that in the previous step.
EDIT: @hexclover, can you test this out with me? I'm getting some interesting results. If I set Max Simultaneous Connections to either 1 or 20, the bug isn't appearing at all so far, while leaving it at the default of 5 definitely shows the issue. I wonder if something behaves differently because of that number, or if the default settings are where the issue lies?
I noticed a while back that leaving the sync target set to Dropbox without changing it caused a similar issue when no credentials were entered or OAuth (or whatever it uses) wasn't completed.
@laurent22, the issue finally happened at the end of a 1024-note sync (with a 5 MB video file attached for good measure) with Max Connections at 20. Using strace as before, but attached to all Joplin processes, I saw that when this bug shows up, epoll calls stop until the Web Clipper fix is applied. It took almost 10 minutes for the bug to show up, and when it did, the app completely froze for a couple of minutes. Settings still worked, so I was able to get it back up and running.
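The deadlock question above hinges on how many transfers run at once. Purely as an illustration of what a "max simultaneous connections" limit does in general (this is not how Joplin schedules its sync, which isn't established in this thread), here is a small Node.js sketch; runWithLimit is a hypothetical name.

```javascript
// Hypothetical sketch, not Joplin's implementation: run async tasks with at
// most `limit` of them in flight at once. A limit of 1 fully serializes them.
async function runWithLimit(tasks, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start
  let inFlight = 0; // tasks currently awaiting
  let peak = 0; // highest concurrency observed, useful for checking the limit

  async function worker() {
    // Each worker pulls the next task until none are left. Taking the index
    // is synchronous, so two workers never start the same task.
    while (next < tasks.length) {
      const index = next++;
      inFlight++;
      peak = Math.max(peak, inFlight);
      try {
        results[index] = await tasks[index]();
      } finally {
        inFlight--;
      }
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { results, peak };
}
```

Results keep the order of the input tasks regardless of which finishes first; the worker-pool shape caps concurrency without a separate queue.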
Finally, something I can actually help debug! I'll try Max Connections = 1 over the next few days and see whether the bug comes back.
Testing is fully welcome. The fact that Laurent, the lead Joplin dev here, is on this is a good thing. Let's try to get this bug tackled in some form while he has time to do something about it. :D
Also, @StanczakDominik , can I get you to run uname -a in your terminal and have that pasted here so we can keep track of what systems are affected and have been tested on? Thanks.
But of course:
dominik@dell ~ % uname -a
Linux dell 5.6.6-arch1-1 #1 SMP PREEMPT Tue, 21 Apr 2020 10:35:16 +0000 x86_64 GNU/Linux
(I believe I still haven't rebooted after updating stuff, so I should probably do that...)
It has just frozen again, with File System/Syncthing for synchronization. I tried the Web Clipper toggle flick and it worked well to restore everything to normal, as usual.
Also, did the syncing freeze on a setting of 1 for you? @StanczakDominik
Yes, still keeping it on max connections = 1. Web clipper extension active, firefox turned on.
Firefox isn't actually required here, but glad to get that information. :smile_cat:
I have the same issues and a max connection of 20 solves it for me for now. I'll test and see how it turns out.
$ uname -a
Linux dellicious 5.6.6-942.native #1 SMP Tue Apr 21 03:03:21 PDT 2020 x86_64 GNU/Linux
Glad to hear. What distribution are you on so I can add it to the affected systems in original post?
Also, setting it to 20 drastically slowed down the bug appearing for me, but it still appeared after something like 900 items synced.
That might be the reason; I usually don't have that much to sync at once. I only have about 180 items (Markdown files) at the moment.
Hey, all. Glad to see the activity on this issue (and even more that you're doing well).
I've tried to set the max connections to 20, but around the 10th sync the app froze. I used the Clipper workaround, synced again and this time it froze on the 3rd attempt. It's doing the same thing with max connections set to 1. I have 440 notes and 67 resources, most of which are images.
I'm running 5.6.4-arch1-1, so it's not an optimal test, but at the moment I'm running a thing and I'll be able to restart sometime tomorrow :) I'll update you when I try the newest kernel.
@m-angelov, glad you're here. Can't wait. Also, how do you like the issue tracker upgrade?
Hi, I am also facing this bug on 5.6.5-arch3-1, I have not enabled syncing and yet still UI freezes.
After reading the issue, I have set Max Connections = 1, so I will report back how it goes and will check logs constantly.
Just to be clear the fix is to start, then stop the web clipper, is that right? Do you need to wait a bit before the moment you start and then stop it?
I'll add a dummy background server on Linux that will be started and stopped at regular intervals while sync is active, so just want to make sure I'll replicate the same start/stop sequence.
That's correct, and there's no need to wait. Just enable and then disable it before returning to syncing.
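A minimal Node.js sketch of the "dummy background server" idea discussed above, assuming a throwaway loopback HTTP server is a fair stand-in for the Web Clipper service; createServerPulser, the 10-second default interval, and the immediate close are illustrative assumptions, not Joplin code. Starting and stopping a listening socket should make the event loop register and then remove a descriptor with epoll, but whether that reproduces the exact epoll_ctl sequence in the traces above is untested.

```javascript
const http = require("http");

// Hypothetical sketch: while sync is active, periodically start a throwaway
// local HTTP server and stop it again, mirroring the Web Clipper
// enable/disable toggle. The default interval is an arbitrary assumption.
function createServerPulser(intervalMs = 10000) {
  let timer = null;

  function pulseOnce() {
    const server = http.createServer((req, res) => res.end());
    // Ignore errors (e.g. no free port); the next pulse simply tries again.
    server.on("error", () => {});
    // Port 0 lets the OS pick a free port; bind to loopback only.
    server.listen(0, "127.0.0.1", () => {
      // Stop right away: users reported no wait is needed between on and off.
      server.close();
    });
  }

  return {
    start() {
      if (timer !== null) return; // already pulsing
      timer = setInterval(pulseOnce, intervalMs);
      // In Node, don't keep the process alive just for this timer.
      if (typeof timer.unref === "function") timer.unref();
    },
    stop() {
      if (timer === null) return;
      clearInterval(timer);
      timer = null;
    },
    isRunning() {
      return timer !== null;
    },
    pulseOnce, // exposed for manual testing
  };
}
```

The intended usage is to call start() when a sync begins and stop() when it finishes, so the pulsing only happens during the window where the freeze has been observed.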
It seems that things are moving forward, but I'll still give an update after upgrading everything. Kernel is 5.6.6-arch1-1, Joplin's version is 1.0.199 AppImage. Syncing with a local dir, MaxConnections set as 1. The issue is still manifesting.
When syncing there are no EPOLL events in strace, no matter if the sync is successful, or the issue is present. The WebServer gives this output:
### when started
epoll_ctl(3, EPOLL_CTL_ADD, 67, {EPOLLOUT, {u32=67, u64=67}}) = 0
epoll_ctl(3, EPOLL_CTL_DEL, 67, 0x7fff1844b680) = 0
epoll_ctl(3, EPOLL_CTL_ADD, 67, {EPOLLIN, {u32=67, u64=67}}) = 0
### when stopped
epoll_ctl(3, EPOLL_CTL_DEL, 67, 0x7fff1844c960) = 0
Some things that I unfortunately can't replicate, but can describe:
With 5.6.4 and Joplin 1.0.193, when syncing was going normally, I was getting EPOLL events in strace. But when the bug occurred, there were no EPOLL events; maybe it's stuck right before hitting that point, or while trying to pass it? When I did the web server thing, it produced the four lines above and then continued with the EPOLLs for the sync.
@bedwardly-down The tracker looks great, it's very well organized and clear. Thank you for investing so much time in this!
@laurent22 Thank you for turning your attention to this issue, which while not critical is very frustrating.
Stay safe!
After some days of testing, I have to say that setting max connections to 20 isn't the solution. I encountered several random sync/save problems. Sometimes everything seems alright, but then the GUI doesn't refresh anymore and changes are not saved. Slowly, the initial "Well, OK, then I'll restart the Web Clipper." turns into real frustration. Starting and stopping the Web Clipper doesn't always solve the problem; sometimes I have to close and reopen Joplin. I use Joplin as my main note-taking tool and to write how-tos and other stuff, but for the last few weeks it has been a pain to use, because you can't be sure whether what you've written will be saved or lost.
I'm not sure how, but let me know if I can help.
My situation is the same as yours: max connections doesn't do anything, and toggling the Web Clipper is always the solution for me after the GUI freezes and I can't write, save, or navigate through Joplin's notes.
Yeah, that's how it's gone for me as well. Max connections don't really help.
Perhaps another impactful factor might be that I'm using file system sharing with Syncthing?
Me too. I have _Synchronization interval_ disabled and sync/save only with CTRL + s.
It happens to me with Nextcloud sync, so Syncthing isn't really a factor.
Using placeholders and testing, I found that syncing hangs in this section of synchronizer.js:
const listResult = await this.api().delta('', {
    context: context,
    // allItemIdsHandler() provides a way for drivers that don't have a delta API to
    // still provide delta functionality by comparing the items they have to the items
    // the client has. Very inefficient but that's the only possible workaround.
    // It's a function so that it is only called if the driver needs these IDs. For
    // drivers with a delta functionality it's a noop.
    allItemIdsHandler: async () => {
        return BaseItem.syncedItemIds(syncTargetId);
    },
    wipeOutFailSafe: Setting.value('sync.wipeOutFailSafe'),
    logger: this.logger(),
});
Also MaxConnections = 1 was not helpful at all. I'm syncing to folder with autosync disabled and I'm consistently reproducing the issue by manually syncing.
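One way to narrow down which await never returns, without changing behavior, is to wrap suspect promises so they log when they take too long. This is a hypothetical debugging helper (warnIfSlow is not part of Joplin), and the 30-second threshold in the usage comment is an arbitrary assumption.

```javascript
// Hypothetical debugging helper (not part of Joplin): logs a warning if a
// promise has not settled after `ms` milliseconds. The original promise is
// returned unchanged, so the wrapped call behaves exactly as before.
function warnIfSlow(promise, ms, label, log = console.warn) {
  const timer = setTimeout(() => {
    log(`[warnIfSlow] "${label}" still pending after ${ms} ms`);
  }, ms);
  // In Node, don't keep the process alive just for this diagnostic timer.
  if (timer && typeof timer.unref === "function") timer.unref();
  const clear = () => clearTimeout(timer);
  // Clear the timer on either outcome. This side chain handles rejections, so
  // failures are still reported only through the returned promise.
  promise.then(clear, clear);
  return promise;
}

// Possible use around the delta call quoted above (inside an async method):
// const listResult = await warnIfSlow(this.api().delta('', { ... }), 30000, 'api().delta');
```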
I use filesystem sync and I can confirm I have the same issue. Max connections does not help:
Linux HP 5.6.7-arch1-1 #1 SMP PREEMPT Thu, 23 Apr 2020 09:13:56 +0000 x86_64 GNU/Linux
@danielsouzat, your findings match those of a couple of other users in the old report. Are those comments your own, or are they what's in the code?
@bedwardly-down what's in the code.
Somewhat related is issue #2191.
https://github.com/laurent22/joplin/issues/2191
A workaround would be to disable periodic sync, but the app will still sync every time the focus changes to a different notebook or note, regardless of the chosen setting.
The app is unusable right now in a production environment. With WireGuard merged in Linux 5.6, there's a strong incentive not to stay on Linux 5.4.
WireGuard could definitely be something that would go hand in hand with Joplin. I just perused its website and I can see the appeal. https://www.wireguard.com/
As this is definitely related to an unknown upstream issue, I updated almost all dependencies to their latest versions. I had to fix a couple of issues, but I got it all working, or at least I got rid of all errors and warnings. That alone did not solve this issue but made it more difficult to trigger. Later I found a workaround for the issue, and I'm no longer reproducing it in my development environment.
I commented out line 331 of synchronizer.js:
await this.checkSyncTargetVersion_();
_Linux 5.6.8-1-MANJARO SMP PREEMPT GNU/Linux_
Could someone confirm this fix? Does it work 100% when it's applied?
As mentioned in the PR we can't remove this line as it will be needed when the sync target structure gets upgraded, but maybe we can fix this some other way.
I'm heading to work but could try it out when I get home this evening. Also, @danielsouzat, have you looked into whether there's a library that could be used only in the Linux client and might have this fixed?
Since this is a Linux-only issue, there's no point in risking breaking other platforms; your fix would break mobile along with Windows and Mac. Thoughts on this as an option, @laurent22?
@laurent22 It worked 100% of the time while I was exhaustively syncing to a folder and navigating through notes and notebooks. It would still be great to know whether it also works 100% for other sync drivers.
A better approach would be to update the code, but that may take a while as we don't know why it isn't working. I may look into it at another time. The version.txt value could be stored in memory at startup or when the sync target changes, so all that IO could be avoided.
@bedwardly-down I limited the workaround to the Linux platform with an if statement, and I can limit it further with 'uname -r' so it only targets Linux > 5.5.
@danielsouzat, I really should have checked the actual commit. I didn't read the changes made, just the comments. Again, maybe there's a way to work around it with another library that only loads when Linux is the platform and acts as a kind of band-aid.
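A minimal sketch of the in-memory idea above, assuming the version can be read once per sync target and reused until the target changes or its structure is upgraded (the case where the check is still needed). SyncTargetVersionCache and readVersion are hypothetical names; readVersion stands in for whatever actually reads version.txt.

```javascript
// Hypothetical sketch: keep the sync target version in memory instead of
// reading version.txt on every sync. Names are illustrative, not Joplin's.
class SyncTargetVersionCache {
  constructor(readVersion) {
    this.readVersion = readVersion; // (syncTargetId) => Promise<version>
    this.cache = new Map(); // syncTargetId -> Promise<version>
  }

  get(syncTargetId) {
    if (!this.cache.has(syncTargetId)) {
      // Cache the promise itself so concurrent callers share a single read.
      const pending = Promise.resolve(this.readVersion(syncTargetId)).catch((error) => {
        // Don't cache failures, unless the entry was already replaced.
        if (this.cache.get(syncTargetId) === pending) this.cache.delete(syncTargetId);
        throw error;
      });
      this.cache.set(syncTargetId, pending);
    }
    return this.cache.get(syncTargetId);
  }

  // Call when the sync target changes or its structure is upgraded.
  invalidate(syncTargetId) {
    this.cache.delete(syncTargetId);
  }
}
```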
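A kernel-version guard along those lines could look like the following Node.js sketch. The names are hypothetical, and the affected range (5.5.0 up to, but not including, 5.6.13) is an assumption combining the versions reported at the top of this issue with the 5.6.13 fix, not an authoritative list. os.release() returns the same string as uname -r on Linux.

```javascript
const os = require("os");

// Hypothetical guard (names are illustrative, not from Joplin). The range is
// an assumption based on reports in this thread.
const FIRST_AFFECTED = [5, 5, 0];
const FIRST_FIXED = [5, 6, 13];

// "5.6.4-artix1-1" -> [5, 6, 4]; returns null if the string is not recognized.
function parseKernelRelease(release) {
  const match = /^(\d+)\.(\d+)(?:\.(\d+))?/.exec(release);
  if (!match) return null;
  return [Number(match[1]), Number(match[2]), Number(match[3] || 0)];
}

// Returns -1, 0 or 1, comparing [major, minor, patch] arrays.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

function needsEpollWorkaround(platform = process.platform, release = os.release()) {
  if (platform !== "linux") return false;
  const version = parseKernelRelease(release);
  if (version === null) return false;
  return compareVersions(version, FIRST_AFFECTED) >= 0 &&
    compareVersions(version, FIRST_FIXED) < 0;
}
```

One caveat: some distributions don't put the patch level in the release string. The Debian Testing report in this thread shows 5.6.0-1-amd64 for a 5.6.7 kernel, so a check like this would misjudge such systems once they reach 5.6.13 or later.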
Same issue here on:
5.6.8-1-MANJARO
Debian 10 Testing
Joplin is unusable on both. After starting Joplin, the first few syncs work just fine, but on the 4th or 5th sync it gets stuck in a constant sync and shows that it is trying to sync one note. You can still navigate notes but can't edit them. This issue has been happening for the last 2-3 releases.
Web Clipper is disabled. Full reinstall (with cache and old files cleaning) does not help.
The workaround doesn't work for me (Archlinux, fully upgraded and clean compilation with patch).
Commenting out this line doesn't work for me either.
Kernel 5.6.8-arch1-1, Joplin 1.0.204
Some (probably) unrelated things:
- In 1.0.199, I wasn't able to do a JEX format export. Now it's working.
On the 1.0.199 JEX issue, I believe that was a known bug across all desktop platforms.
I haven't tested the workaround listed by @danielsouzat yet but it sounds like it's not viable either.
I can verify it doesn't work with Nextcloud sync. In fact, the app doesn't sync at all or attempt to do anything useful on my end.
On my Linux laptop, where I was testing the workaround, I use filesystem synchronization, but the directory is in a Nextcloud folder. Is this related?
I looked over my notes from the previous tests, and a good place to check is: _https://github.com/laurent22/joplin/issues/2518#issuecomment-590805717_ and the following comments by me and others.
I commented out the whole DELTA part in the synchronizer.js, as this was solving the issue last time. I did ~40 syncs successfully, and then about 30 minutes later while editing a note, it got stuck. So I'm at a loss.
/rant - I started looking into Org-mode, the desperation is setting in :D
@m-angelov in my case sync works a few times, and then stuck, as you said. I think the problem is in another part.
Could someone confirm danielsouzat's fix? https://github.com/laurent22/joplin/pull/3165/files It would be good to confirm if this is really the culprit as that could help working towards a fix.
Trying out danielsouzat's fix on Arch Linux 5.6.10-arch1-1 #1 SMP PREEMPT Sat, 02 May 2020 19:11:54 +0000 x86_64 GNU/Linux. Will report back.
Testing on Arch 5.6.8-arch-1-1 also.
@laurent22 it seems that the fix is not working for me, as well as for figue and bedwardly-down.
@figue commenting out the stuff in DELTA worked for syncing when we were last looking into this, but I haven't tested it in detail now. But this is a rather crude way to do it, because it actually was pinpointed to fs-driver-node.js, see here. My tests also led to the function "fs.stat", which comes from node.js. And a disclaimer - I'm not 100% positive that this is the source of the issue, just pointing to our previous findings.
Can I block pyro209 from posting on here? It's a bot that keeps posting other people's personal info or spam here. I don't see an option for it. Just delete and report.
@laurent22, I'm taking the morning off from work, so I'll update the testing section in the main post with our findings.
EDIT: a small handful of links to tested comments have been added to the Testing Section.
Did you report him? I don't think it's possible to block per project, but normally GitHub is quick to look at spammers.
Confirming that commenting out await this.checkSyncTargetVersion_(); does not solve the problem on Arch Linux 5.6.10-arch1-1 with Joplin 1.0.201.
I can also confirm this affects me, however when I disable syncing it doesn't occur. I am on Void Linux.
Edit:
$ uname -a
Linux void 5.6.7_1 #1 SMP Thu Apr 23 17:59:10 UTC 2020 x86_64 GNU/Linux
Could you share what kernel you're running with uname -a? I want to keep track of what distros and kernel releases are having issues for new people to see in original post. That way we hopefully don't have a massive flood of reports for every distro under the sun.
Can we use an old version as workaround?
@figue, do you mean the version of Joplin? If you're using encryption, older versions won't decrypt due to a new encryption algorithm. Also, this issue has been around since at least 1.0.189.
@bedwardly-down Yes, I mean an old Joplin version... That's unfortunate. I have to close and reopen Joplin all the time to get it to sync properly. So if I decrypt everything in the meantime, should it be OK?
@figue, I'm not 100% sure, especially if you have the mobile apps. I believe that the latest app isn't compatible with older Desktop Joplin versions in other areas, but I could be wrong. If you would like to still test it out and make sure you have a full backup of everything important (since there is always the possibility you could lose some data in the process), feel free to do it. Testing that could be useful for figuring out Electron versions affected.
I have an idea for another test I could run myself. :D
EDIT: Because Joplin is built around Electron, swapping it for something like nwjs (replaced webpack) is not actually a straightforward process. I was thinking that if the desktop's backend could be swapped out, that might be useful. We'd then know whether Electron really is a factor in the equation.
Hello. I came across this problem on Debian Testing. It currently runs 5.6.0-1-amd64 #1 SMP Debian 5.6.7-1, but the problem began with the 5.5 kernel. I use the latest version of Joplin, installed via AppImage.
In my setup I use filesystem synchronization with syncthing taking care of distributing that folder to other machines.
The problem was mostly solved by disabling automatic synchronization. But from time to time it still freezes when I try to manually synchronize the notes.
After some time (I don't know exactly how long, could be hours), Joplin froze again without encryption.
Is that on an older version of Joplin, or just with auto sync turned off (which doesn't actually turn it off)? I need to find the post, but short of fully disconnecting from the internet, there's no way to have Joplin not sync at all.
Sorry. Joplin 1.0.201, Archlinux. Same version and package as before. I just decrypted my data. Auto-sync is on.
Added to tests section. Thanks, @figue .
Hi, I'm having same problems. After some time the UI becomes unresponsive. Can't tell if sync triggers it. Seems random to me.
Using Joplin 1.0.201 on Linux 5.6.11-arch1-1. Syncing to filesystem.
Joplin 1.0.201, Archlinux. With sync disabled, freeze still happens at some point, leaving Joplin minimized.
I first noticed the bug in 1.0.179 or 1.0.178 on Arch. It could be useful to have a "start date" as well.
I am now running Fedora 32 Kernel 5.6.7 and still having the same problem I had on Arch. God bless your souls for troubleshooting this bad boy.
@matcharles we have a rough idea thanks to @taw00, the Fedora RPM build maintainer. Let me check some things out and I'll post my findings in the main notes.
@hexclover and @m-angelov, since you have been a part of this whole thing for a good while and have been instrumental in finding out what's going on here: @taw00 on the forums, along with another user on the original issue, brought up something we haven't focused on but really should:
Is this issue actually a part of Electron or somewhere else that needs to be addressed upstream?
I'm not sure how to fully test that out, which is a huge part of why I haven't steered in that direction yet. If you have any ideas about where to go with it and how to go about it, feel free to pitch in. If we can figure out exactly where it is upstream outside of the kernel, and can test and prove that it's there, we can try to get it fixed there too. Joplin isn't a typical app, and the implementations it relies on aren't very common, so I think it's safe to assume the issue hasn't been fixed upstream yet simply because it isn't seen enough in the wild.
Confirming that the freeze happens with sync disabled on Arch linux 5.6.11-arch1-1 with Joplin 1.0.201
Upon a suggestion on the forums, I've built 5.7-rc5 kernel and am getting better results. The problem is still there but the app is not freezing like it was. Instead, it's just hanging on syncing for a bit longer than was previously happening. Can anyone else verify this?
EDIT: after about 4 or 5 syncs of 512 notes with full screen screenshots from my laptop attached to them, 5.7 did fully show this bug with no way to cancel but full ui functionality otherwise. It's been stuck on Synchronize for the last hour or so, so if someone else can test this out, please post your findings.
5.5 has reached its end-of-life release and, judging by how the kernel maintainers have handled releases for a while now, will be fully replaced across the board by 5.6 from this point on.
I had a freeze just now while I was editing. As I launched Joplin under strace a few hours ago, I can confirm the EPOLL events posted before. When the freeze happens, no EPOLL events are shown, but if you do the workaround of starting and stopping the Web Clipper, I see this:
[9199:0513/160807.897630:ERROR:buffer_manager.cc(488)] [.DisplayCompositor]GL ERROR :GL_INVALID_OPERATION : glBufferData: <- error from previous GL command
[pid 9204] epoll_ctl(3, EPOLL_CTL_ADD, 50, {EPOLLOUT, {u32=50, u64=50}}) = 0
[pid 9204] epoll_ctl(3, EPOLL_CTL_DEL, 50, 0x7ffc10c168e0) = 0
[pid 9204] epoll_ctl(3, EPOLL_CTL_ADD, 50, {EPOLLIN, {u32=50, u64=50}}) = 0
[pid 9204] epoll_ctl(3, EPOLL_CTL_DEL, 50, 0x7ffc10c178b0) = 0
Excuse me if this doesn't help at all...
@figue, any new information there beyond what's already been established here?
No. I can use any of these kernels if we need to discard something in the kernel side:
5.4.39
5.6.12 <-- running this now
4.19.102
Just tell me which I have to use. Thanks!
@figue, you're fine. If there's any way you can test 5.7, that'd be useful. I want to make sure my test isn't a fluke. That's why I'm trying to get second verifications for the tests run.
I have a repo with 5.7-rc5. Will install now.
Hey, unfortunately I'm not able to compile 5.7, so I'll have to wait for it to drop on Arch.
Regarding testing where the bug might be (upstream or else), I'm also unable to help much, as I'm not really a developer, and along that I'm not very up to speed with JS, Electron, etc. If there are specific things I can do - tests, bug reproducing, etc - I would gladly help.
I now have an interesting editor bug, tomorrow I'll post a video of it.
I've been running 5.7.0-rc5-1-mainline for 2 days and Joplin seems to be working, with no freezes at all...
figue@pluto ~ % grep joplin =(ps aux)
figue 2404 0.0 0.0 10332 2860 ? S may14 0:00 /bin/bash /usr/bin/joplin-desktop
figue 2405 0.1 1.3 592716 160276 ? Sl may14 1:50 /usr/share/joplin/joplin
figue 2408 0.0 0.3 189844 40596 ? S may14 0:00 /usr/share/joplin/joplin --type=zygote
figue 2410 0.0 0.0 189844 7044 ? S may14 0:00 /usr/share/joplin/joplin --type=zygote
figue 2433 0.0 0.8 326820 99192 ? Sl may14 0:18 /usr/share/joplin/joplin --type=gpu-process --field-trial-handle=14674961718266204706,18361054305212390926,131072 --disable-features=SpareRendererForSitePerProcess --gpu-preferences=KAAAAAAAAAAgAAAgAAAAAAAAYAAAAAAAEAAAAAAAAAAAAAAAAAAAAAgAAAAAAAAA --service-request-channel-token=16053664216136761665
figue 2446 0.0 0.4 240172 47980 ? Sl may14 0:00 /usr/share/joplin/joplin --type=utility --field-trial-handle=14674961718266204706,18361054305212390926,131072 --disable-features=SpareRendererForSitePerProcess --lang=es --service-sandbox-type=network --service-request-channel-token=13132157487631428044 --shared-files=v8_context_snapshot_data:100,v8_natives_data:101
figue 2454 0.1 1.8 631096 216452 ? Sl may14 2:50 /usr/share/joplin/joplin --type=renderer --field-trial-handle=14674961718266204706,18361054305212390926,131072 --disable-features=SpareRendererForSitePerProcess --lang=es --app-path=/usr/share/joplin/resources/app.asar --node-integration --no-sandbox --no-zygote --background-color=#fff --num-raster-threads=2 --enable-main-frame-before-activation --service-request-channel-token=4754917868446620008 --renderer-client-id=5 --no-v8-untrusted-code-mitigations --shared-files=v8_context_snapshot_data:100,v8_natives_data:101
@figue, thanks for the test. I'm not sure exactly why I was getting the results I was. I also built my kernel from source using the default config for my system and synced a large number of notes at once over multiple manual syncs.
EDIT: One thing that does concern me is that 5.7-rc5 still has the bad commit in it and it doesn't look like it was reverted yet. Even though you and one other user stated that it seemed to fix the issue for you, I'd like to still get quite a few more test results to make sure.
Is anyone else here running Wayland instead of X11 as their main display server? I just discovered that Joplin won't run without the Xwayland X11 compatibility package installed. I'm starting to use the Brave browser again, since syncing can now be enabled and used from a testing menu. Brave apparently freezes the same way at times because of how Xwayland writes debug info. I wasn't using Wayland as my main one until about two weeks ago, so my current test could be bunk because of that.
EDIT: I just tested 5.7-rc5 on X11 without Wayland and the bug showed up in full on the first sync. I'm not sure what made it work for at least two users but it's not fixed in 5.7 for me. I'd like to see more tests on both Wayland and X11 to make sure, but I honestly think that the 5.7 success results could be from pure optimism or other factors that don't apply to all of us. I want this issue resolved as much as you all do.
Anyone that gets positive results, please share as much detail as possible about your system configurations. That means share what Desktop Environment you're using, drivers, etc. The more information we know, the better we can determine what will resolve this issue. Thanks.
@bedwardly-down @figue I just took a look at the changelog of 5.6.13 and found two commits (both from upstream=5.7) which claim to fix lost wakeups introduced by the bad commit (fs/epoll: remove unnecessary wakeups of nested epoll).
So it's possible that the problem is already(!) addressed in 5.6.13. I think we really should give it a shot.
That would be the patch proposed by the maintainers of the commit that caused all of this headache to begin with. I don't remember who exactly brought it up in the original issue tracker, but at the time, the patch didn't fix the issue when I tested it. Now that it's had some time to be tested internally, it may indeed resolve the issue, if the main reason it didn't originally was something else that also had to be patched in.
@hexclover, I tested 5.6.13 and the UI no longer freezes on it, but syncing is still broken. As I remember it, these are the same results I got from my patch test last time. https://youtu.be/KoovSUEeB4A
Report from my side: I've been running 5.6.13 for a couple of hours (normal workload only, no stress testing), and just now I tried duplicating a note 1000+ times, syncing, then deleting all of them and syncing again. No problem so far.
@bedwardly-down I find it a bit odd that the UI does not freeze but sync does. Remember that we found out that (possibly) most filesystem-related operations freeze when the bug occurs? But since you are able to open other notes in the test, I suppose Joplin is still able to read note content from the disk.
I think, when the problem occurs, if you are able to
and Joplin remembers the changes you've just made, then maybe it's not this specific issue that is causing Joplin to freeze in your test.
It does seem like the issue I'm running into with the current test is something else. Upon restarting and syncing Joplin, I ended up getting this error in the dev console with debug on:
/tmp/.mount_Joplin1F…/useFormNote.js:134 Uncaught (in promise) Error: Cannot find note with ID: c14eaa02e3154fd8a1cb600863479d91
at /tmp/.mount_Joplin1F…/useFormNote.js:134
at Generator.next (<anonymous>)
at fulfilled (/tmp/.mount_Joplin1F…ls/useFormNote.js:5)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:85)
/tmp/.mount_Joplin1F…essageHandler.js:29 Got ipc-message: noteRenderComplete
[undefined]
/tmp/.mount_Joplin1F…/useFormNote.js:134 Uncaught (in promise) Error: Cannot find note with ID: 6b91a2f1ec6747a58cbe0c1901b221ef
at /tmp/.mount_Joplin1F…/useFormNote.js:134
at Generator.next (<anonymous>)
at fulfilled (/tmp/.mount_Joplin1F…ls/useFormNote.js:5)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:85)
/tmp/.mount_Joplin1F…essageHandler.js:29 Got ipc-message: noteRenderComplete
[undefined]
It is possible that it was trying to sync notes that didn't exist or weren't fully synced to begin with. I've deleted my test notes and fully synced my devices, so I plan on restarting the test from scratch.
@hexclover, after restarting my computer and deleting all of my test notes, this particular issue isn't showing up for me now. I'm just getting long syncs where Cancel doesn't work, but the sync does eventually complete. We need more people to verify that it works, and more information.
Switching between Wayland and X11 doesn't seem to make a difference for me this time around, so the display server may be a factor in a future issue but not in this bug. If enough users say this works for them, it may be good enough.
This looks like #3004, so the issue of the sync operation not progressing doesn't seem to be fixed; the kernel upgrade seems to fix only the UI freeze.
@ebayer, thanks for sharing that. I agree that my results could be explained by that, and I'm currently testing filesystem sync with my Nextcloud instance mounted via davfs.
@hexclover and @figue, I have a solution for my results here. If I can get you two along with ebayer to verify for me, I think I can safely say that both issues are resolved. :smile_cat:
I have noticed a freezing issue in the past. Currently running Kernel: 5.6.11-1-MANJARO, with Joplin 1.0.201 (prod, linux). Over the last couple of days all seems to be good.
Thank you guys for helping me figure out this issue. I'm still not having any issues whatsoever with Joplin, so I think 5.6.13 fixed this particular issue. With that said, I've decided to back out of Linux testing for the time being and go back to Windows. I have real-world career goals outside of this that require it and just want to focus on them, especially since the whole Covid thing could still cause me to lose my livelihood, despite things somewhat going back to normal. I hope you guys understand. :smile_cat:
I also have not encountered the freezing issue with 5.6.13-arch1-1!
Definitely! Take care of yourself, focus on what you need to! Thank you for all your persistence in leading the effort against this issue.
It's OK, @bedwardly-down.
In the end, if the issue is fixed in the kernel, it's only a matter of time before everybody has the fix.
Kernel 5.6.13 seems to fix it for me as well!
@bedwardly-down thank you for the time and effort you've invested in this! I know it may be a bummer that it ended up being fixed on its own, but I learned a lot from this bug hunt, so no effort was wasted in my book. Thank you once again, stay safe and excelsior!
Thanks guys. Nothing was lost here, for sure. One thing I learned is that Joplin still has a ways to go before being a permanent solution, at least for Linux users. And that's not the app's fault; it's really par for the course for the entire platform.
Yup. Two days now and this issue seems to have disappeared (Fedora 30, 31, and 32, at the very least). Knowing precisely which commit or GitHub issue solved this would be nice, but, well, it is gone. Maybe close this?
Thanks everyone for their contributions and thanks in particular to @bedwardly-down for chasing this down and unifying all the disparate issues.
Let's close this so that we can hope it is put to bed forever. :)
Problem seems solved for me too, Fedora 32.
Me too: solved on 5.6.14-xanmod1.
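Since the reports in this thread line up with the kernel release string (for example 5.6.4-artix1-1 freezing, while 5.6.13-arch1-1 and 5.6.14-xanmod1 work), a small check of the `uname -r` value can tell you whether you're in the range discussed here. This is a minimal Python sketch, not an official tool: the function names are hypothetical, and the affected range (5.5.0 up to, but not including, 5.6.13) is an assumption drawn from this thread. Distribution kernels may backport fixes without changing the version number, and 5.7 release candidates gave mixed results above, so the check only flags 5.5.x and pre-5.6.13 5.6.x kernels and says nothing else about newer ones.

```python
from __future__ import annotations

import platform
import re

# Assumption based on this thread: 5.5.x and 5.6.x kernels before 5.6.13 are
# affected; 5.6.13 picked up the two upstream epoll lost-wakeup fixes.
AFFECTED_FROM = (5, 5, 0)
FIXED_IN = (5, 6, 13)


def parse_kernel_version(release: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from a `uname -r` string like '5.6.4-artix1-1'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch component (e.g. '5.7-rc5') is treated as 0.
    return int(major), int(minor), int(patch or 0)


def possibly_affected(release: str) -> bool:
    """True if the release falls in the hypothetical affected range [5.5.0, 5.6.13)."""
    return AFFECTED_FROM <= parse_kernel_version(release) < FIXED_IN


if __name__ == "__main__":
    release = platform.release()  # on Linux, the same string `uname -r` prints
    if possibly_affected(release):
        status = "possibly affected"
    else:
        status = "not in the 5.5/5.6 range discussed here"
    print(f"{release}: {status}")
```

On the Artix machine from the original report, `platform.release()` would return 5.6.4-artix1-1, so the script would print `5.6.4-artix1-1: possibly affected`. Comparing integer tuples rather than raw strings avoids the trap where "5.6.13" sorts before "5.6.4" lexically.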
@taw00, see @hexclover's earlier comment: the 5.6.13 changelog includes two commits (both from upstream 5.7) that claim to fix lost wakeups introduced by the bad commit (fs/epoll: remove unnecessary wakeups of nested epoll). After rereading that post: the original patch only partially fixed the issue when I tested it for the original bug report, which is why it didn't help when I patched it in. The second commit that shipped alongside it must have fixed the remaining problems caused by the original.
If I close this, will users who need this info still be able to see it? I think it may be safer to leave it open a little while longer but lock comments for the time being.
Closing now that it's been almost 2 weeks since this was resolved. If anyone starts having this issue again, it can be reopened. Just @ me on the forums. Thanks all.