https://github.com/signalapp/Signal-Desktop/issues/1990
When starting Signal-Desktop, it takes a very long time to sync mesages with the Android App side (that is started all the time) (like 10 minutes or so).
Actual Result:
Expected Result:
No need for this. it is just loading message screen
Signal Version: 1.22.0
Operating System: Linux Ubuntu 18.04.1 LTS
Linked Device Version: Android version 8.0.0
https://debuglogs.org/26e1ab0f1c1ffea70c533b4e9fd5255d9a768c64f48e887ec85a7db0b6de988a
https://debuglogs.org/991da447629cf63321b0235bf71149b85e3822a123d61cfa11287475c24c310a
I took a look at your log and found that processing 10 messages could take up to 30s. In my testing, primarily on macOS, that should take under a second. Digging into one example, I do see three pretty large blank spots in the logging, adding up to 25.5s total - first, I'm not sure where those delays are coming from, and second, that still doesn't get us to our target of <1s per 10 messages.
So, where do we go from here? I think I need to do some linux-specific testing, because the performance characteristics are surprisingly different.
Also, if you could provide your detailed machine specs, that would help us set a baseline between different linux machines.
INFO 2019-02-22T04:18:50.348Z onProgress: Message count is 100
INFO 2019-02-22T04:18:50.348Z received unidentified sender message
INFO 2019-02-22T04:18:50.349Z Worker job 713 (arrayBufferToStringBase64) succeeded in 1ms
INFO 2019-02-22T04:18:50.655Z SQL channel job 1942 (removeUnprocessed) succeeded in 309ms
INFO 2019-02-22T04:18:50.655Z SQL channel job 1943 (getMessagesBySentAt) succeeded in 309ms
INFO 2019-02-22T04:18:50.656Z No message for delivery receipt +[REDACTED]461 1550807631834
INFO 2019-02-22T04:18:50.656Z SQL channel job 1945 (updateUnprocessedWithData) succeeded in 307ms
INFO 2019-02-22T04:18:50.656Z SQL channel job 1944 (getItemById) succeeded in 308ms
INFO 2019-02-22T04:18:51.097Z SQL channel job 1947 (createOrUpdateSession) succeeded in 389ms
INFO 2019-02-22T04:18:51.100Z received unidentified sender message
INFO 2019-02-22T04:18:51.101Z Worker job 714 (arrayBufferToStringBase64) succeeded in 0ms
INFO 2019-02-22T04:18:51.723Z Worker job 715 (arrayBufferToStringBase64) succeeded in 1ms
>>> delay of 11s. you can see it in the SQL job delays below this point
INFO 2019-02-22T04:19:02.957Z SQL channel job 1949 (removeUnprocessed) succeeded in 10583ms
INFO 2019-02-22T04:19:02.958Z SQL channel job 1950 (getItemById) succeeded in 10580ms
INFO 2019-02-22T04:19:02.959Z Worker job 716 (arrayBufferToStringBase64) succeeded in 0ms
INFO 2019-02-22T04:19:02.959Z SQL channel job 1951 (updateUnprocessedWithData) succeeded in 11857ms
>>> delay of 1.5s
INFO 2019-02-22T04:19:04.453Z SQL channel job 1952 (saveUnprocessed) succeeded in 11237ms
INFO 2019-02-22T04:19:04.454Z queueing envelope 19a3dbb3-2609-4741-8267-1a17e726ed7b
INFO 2019-02-22T04:19:04.454Z SQL channel job 1953 (getItemById) succeeded in 1245ms
INFO 2019-02-22T04:19:04.457Z Worker job 717 (arrayBufferToStringBase64) succeeded in 0ms
INFO 2019-02-22T04:19:04.458Z SQL channel job 1954 (saveUnprocessed) succeeded in 2723ms
INFO 2019-02-22T04:19:04.459Z queueing envelope b3d8a6c4-1759-4d7e-805b-278ef54294d9
INFO 2019-02-22T04:19:05.460Z SQL channel job 1955 (createOrUpdateSession) succeeded in 2076ms
INFO 2019-02-22T04:19:05.461Z message from +[REDACTED]818.1 1550808882249 (05444234-1ef9-445f-9e2f-4dae356b5822)
INFO 2019-02-22T04:19:05.462Z SQL channel job 1956 (saveUnprocessed) succeeded in 1488ms
INFO 2019-02-22T04:19:05.462Z queueing envelope 54704fcc-1083-42c0-95ea-4c6998e2fbcd
>>> delay of 3s
INFO 2019-02-22T04:19:08.999Z SQL channel job 1958 (removeUnprocessed) succeeded in 3960ms
INFO 2019-02-22T04:19:08.999Z SQL channel job 1959 (getItemById) succeeded in 3946ms
INFO 2019-02-22T04:19:08.999Z SQL channel job 1957 (getUnprocessedById) succeeded in 3961ms
INFO 2019-02-22T04:19:09.000Z Worker job 718 (arrayBufferToStringBase64) succeeded in 0ms
INFO 2019-02-22T04:19:10.548Z SQL channel job 1961 (createOrUpdateSession) succeeded in 1548ms
INFO 2019-02-22T04:19:10.549Z got configuration sync message
INFO 2019-02-22T04:19:10.550Z received unidentified sender message
INFO 2019-02-22T04:19:10.550Z Worker job 719 (arrayBufferToStringBase64) succeeded in 0ms
>>> delay of 11s
INFO 2019-02-22T04:19:12.277Z SQL channel job 1963 (createOrUpdateItem) succeeded in 1727ms
INFO 2019-02-22T04:19:14.771Z SQL channel job 1967 (removeUnprocessed) succeeded in 2892ms
INFO 2019-02-22T04:19:14.772Z SQL channel job 1965 (createOrUpdateItem) succeeded in 4223ms
INFO 2019-02-22T04:19:14.773Z SQL channel job 1966 (createOrUpdateItem) succeeded in 4223ms
INFO 2019-02-22T04:19:14.773Z SQL channel job 1964 (createOrUpdateItem) succeeded in 4224ms
INFO 2019-02-22T04:19:14.773Z SQL channel job 1969 (updateUnprocessedWithData) succeeded in 4224ms
INFO 2019-02-22T04:19:14.773Z SQL channel job 1968 (getItemById) succeeded in 4224ms
INFO 2019-02-22T04:19:15.091Z SQL channel job 1971 (createOrUpdateSession) succeeded in 283ms
INFO 2019-02-22T04:19:15.092Z received unidentified sender message
INFO 2019-02-22T04:19:15.093Z Worker job 720 (arrayBufferToStringBase64) succeeded in 0ms
INFO 2019-02-22T04:19:15.272Z SQL channel job 1973 (removeUnprocessed) succeeded in 180ms
INFO 2019-02-22T04:19:15.272Z SQL channel job 1974 (getItemById) succeeded in 180ms
INFO 2019-02-22T04:19:15.272Z SQL channel job 1975 (updateUnprocessedWithData) succeeded in 180ms
INFO 2019-02-22T04:19:15.813Z SQL channel job 1977 (createOrUpdateSession) succeeded in 511ms
INFO 2019-02-22T04:19:15.814Z data message from +[REDACTED]883.1 1550808893299 (cac9b79d-e7f1-4fdc-880e-2d344096fa6b)
INFO 2019-02-22T04:19:15.815Z received unidentified sender message
INFO 2019-02-22T04:19:15.817Z Worker job 721 (arrayBufferToStringBase64) succeeded in 0ms
ERROR 2019-02-22T04:19:15.842Z background onError: MessageCounterError: Message key not found. The counter was repeated or the key was not filled.
at file://[REDACTED]/app.asar/js/libtextsecure.js:36416:25
ERROR 2019-02-22T04:19:15.843Z queueEnvelope error handling envelope 03964ec6-5984-4b3c-b4c1-3d0277af658e : MessageCounterError: Message key not found. The counter was repeated or the key was not filled.
at file://[REDACTED]/app.asar/js/libtextsecure.js:36416:25
INFO 2019-02-22T04:19:15.843Z received unidentified sender message
ERROR 2019-02-22T04:19:15.844Z Top-level unhandled promise rejection: MessageCounterError: Message key not found. The counter was repeated or the key was not filled.
INFO 2019-02-22T04:19:16.309Z SQL channel job 1982 (updateUnprocessedWithData) succeeded in 492ms
INFO 2019-02-22T04:19:17.339Z SQL channel job 1983 (createOrUpdateSession) succeeded in 1230ms
INFO 2019-02-22T04:19:17.340Z SQL channel job 1984 (removeUnprocessed) succeeded in 1497ms
INFO 2019-02-22T04:19:17.341Z SQL channel job 1985 (getItemById) succeeded in 1497ms
INFO 2019-02-22T04:19:17.341Z SQL channel job 1986 (getItemById) succeeded in 293ms
INFO 2019-02-22T04:19:17.382Z PUT https://textsecure-service.whispersystems.org/v1/messages/+[REDACTED]883 (unauth)
ERROR 2019-02-22T04:19:17.386Z background onError: MessageCounterError: Message key not found. The counter was repeated or the key was not filled.
at file://[REDACTED]/app.asar/js/libtextsecure.js:36416:25
ERROR 2019-02-22T04:19:17.386Z queueEnvelope error handling envelope 2c0e07a2-11ae-4dab-b46b-6b858bab8aae : MessageCounterError: Message key not found. The counter was repeated or the key was not filled.
at file://[REDACTED]/app.asar/js/libtextsecure.js:36416:25
INFO 2019-02-22T04:19:17.386Z received unidentified sender message
ERROR 2019-02-22T04:19:17.387Z Top-level unhandled promise rejection: MessageCounterError: Message key not found. The counter was repeated or the key was not filled.
INFO 2019-02-22T04:19:17.637Z SQL channel job 1987 (removeUnprocessed) succeeded in 251ms
INFO 2019-02-22T04:19:17.637Z SQL channel job 1988 (getItemById) succeeded in 251ms
INFO 2019-02-22T04:19:17.927Z PUT https://textsecure-service.whispersystems.org/v1/messages/+[REDACTED]883 200 Success
INFO 2019-02-22T04:19:18.302Z SQL channel job 1990 (createOrUpdateSession) succeeded in 632ms
INFO 2019-02-22T04:19:18.302Z delivery receipt from +[REDACTED]049.1 1550807631834
INFO 2019-02-22T04:19:18.303Z received unidentified sender message
INFO 2019-02-22T04:19:18.303Z SQL channel job 1991 (getMessageBySender) succeeded in 376ms
INFO 2019-02-22T04:19:18.307Z Update conversation 'expireTimer' {"id":"group([REDACTED]{.h)","expireTimer":86400,"source":"handleDataMessage"}
INFO 2019-02-22T04:19:18.315Z SQL channel job 1992 (getUnprocessedById) succeeded in 13ms
INFO 2019-02-22T04:19:18.317Z Worker job 722 (arrayBufferToStringBase64) succeeded in 2ms
INFO 2019-02-22T04:19:18.664Z SQL channel job 1993 (removeUnprocessed) succeeded in 238ms
INFO 2019-02-22T04:19:18.664Z SQL channel job 1994 (getMessagesBySentAt) succeeded in 238ms
INFO 2019-02-22T04:19:18.664Z No message for delivery receipt +[REDACTED]049 1550807631834
INFO 2019-02-22T04:19:18.665Z SQL channel job 1997 (updateUnprocessedWithData) succeeded in 348ms
INFO 2019-02-22T04:19:18.665Z SQL channel job 1996 (saveMessage) succeeded in 350ms
INFO 2019-02-22T04:19:18.666Z SQL channel job 1995 (getItemById) succeeded in 362ms
INFO 2019-02-22T04:19:18.667Z next message expires 2019-02-22T05:28:46.591Z
INFO 2019-02-22T04:19:18.920Z SQL channel job 2000 (updateConversation) succeeded in 252ms
INFO 2019-02-22T04:19:18.921Z SQL channel job 2001 (getItemById) succeeded in 229ms
INFO 2019-02-22T04:19:18.936Z Add notification {"conversationId":"group([REDACTED]{.h)","isExpiringMessage":true,"messageSentAt":1550808893299}
INFO 2019-02-22T04:19:18.936Z Update notifications: {"shouldClearNotifications":false,"shouldPlayNotificationSound":false,"shouldShowNotifications":false,"type":"disabled","isNotificationGroupingSupported":true}
INFO 2019-02-22T04:19:19.360Z SQL channel job 2004 (createOrUpdateSession) succeeded in 438ms
INFO 2019-02-22T04:19:19.360Z delivery receipt from +[REDACTED]883.1 1550807631834
INFO 2019-02-22T04:19:19.361Z received unidentified sender message
INFO 2019-02-22T04:19:19.879Z SQL channel job 2005 (removeUnprocessed) succeeded in 820ms
INFO 2019-02-22T04:19:20.349Z SQL channel job 2006 (createOrUpdateItem) succeeded in 572ms
INFO 2019-02-22T04:19:20.350Z SQL channel job 2009 (getMessagesBySentAt) succeeded in 520ms
INFO 2019-02-22T04:19:20.350Z No message for delivery receipt +[REDACTED]883 1550807631834
INFO 2019-02-22T04:19:20.351Z SQL channel job 2008 (removeUnprocessed) succeeded in 990ms
INFO 2019-02-22T04:19:20.351Z SQL channel job 2007 (getUnprocessedById) succeeded in 991ms
ERROR 2019-02-22T04:19:20.351Z updateCache: Didn't find item a454a543-3f36-40c4-8d71-0f0d4c54a0de in cache to update
INFO 2019-02-22T04:19:20.351Z SQL channel job 2010 (getItemById) succeeded in 990ms
INFO 2019-02-22T04:19:20.810Z SQL channel job 2012 (createOrUpdateSession) succeeded in 433ms
INFO 2019-02-22T04:19:20.811Z delivery receipt from +[REDACTED]461.2 1550807631834
INFO 2019-02-22T04:19:20.812Z onProgress: Message count is 110
Linux 4.15.0-45-generic
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (family: 0x6, model: 0x3a, stepping: 0x9)
16MB RAM
Please note that all my colleagues with different OSes ( ARCH , DEBIAN, Ubuntu) have the same problems with the Signal-Desktop, that is more than at least 6 people that I know of and they are all complaining about the same thing. So I very much doubt that the problem is due to my setup or my particular machine. Most people will just try the Signal-Desktop and dismiss it and never use it ever again without reporting any issues, so that is why you may be under the impression that only few people are saying something about this issue.
After running five tests on three machines I believe the problem is I/O related.
All machines are running 64 bit Xubuntu 18.04.
All machines got their internet connection over Ethernet, all have a 1Gbps interface, the download speed I get from my ISP is about 200Mb/s.
| Test | Machine | TTL | TTLM | TTMS | Message count | CPU | RAM | Notes | Log |
|------|---------|------|---------------------------------------|-------|---------------|-----|-------|------------------------------|----------------------------------------------------------------------------------------|
| 1 | 1 | 0:01 | 3:10 | 20:00 | 1030 | 10% | 705MB | Constant freezing, warm boot | https://debuglogs.org/99e941c3fbedcc8b87cf08b7900774d0206427b935831f823016a0fee3356558 |
| 2 | 2 | 0:06 | 0:06, 120 messages already downloaded | 0:50 | 1030 | 50% | 439MB | Cold boot | https://debuglogs.org/1a3f1327d432d7796c8e5925b29e868190b5d331f6a08248e45b6c194a501300 |
| 3 | 3 | 0:10 | 3:30 | 19:40 | 1030 | 20% | 560MB | Constant freezing, warm boot | https://debuglogs.org/99bc9a57ac9f6ac6ee8ab17e72f1eee65e20416eddf78f2322da42f0801b0273 |
| 4 | 3* | 0:01 | 0:05 | 1:08 | 1030 | N/A | N/A | Warm boot | N/A |
| 5 | 3* | 0:01 | 0:01 | 1:00 | 1010 | 75% | 675MB | Warm boot | https://debuglogs.org/ba46fc1c6262880f447dbb4eb57791bed184dbecc6d83565223e100fb7954397 |
Legend:
I tested running in RAM twice because I forgot to collect the logs the first time.
The most apples-to-apples comparison is the ten year old Toshiba running on a mechanical drive vs running in RAM.
Have the same problem on KDE Neon 5.15 from a long time. Recently updated signal to 1.22.0. In recent changes saw "Performance improvements help make Signal Desktop launch faster than ever". Thought that it is related to this issue. Indeed Signal is restarting immedietaly, but only when it is synchronized with android app. When I close desktop app and write several dozen messages with my friends then it again take few minutes to start desktop app
@dibok (and others who comment in the future) Your comments are not useful without debug logs. Please provide your debug logs and machine specs so we can get detailed information on the performance issues you're having.
@m-burg Wow, thanks for those tests! Looks like you only get reasonable performance with an SSD or RAM disk. Is that three minutes just to get to the 'Loading N' screen on machines 1 and 3? Wow.
When I don't have to sync a thousand messages it is not quite as bad, but it still stays frozen on "Loading..." for a while.
Does your mac have an SSD? I expect that you would get similar performance to machines 1 and 3 if you run your tests with Signal's data directory on an external mechanical drive.
@scottnonnenberg-signal Sorry for not having debug logs. But I have an Lenovo Ideapad Z370 with an Core-i5-2410M, 4GB DDR3 RAM and I used to have a spinning HDD with 5400 RPM and the Signal Desktop was barely useable. I changed the hard drive to an Samsung 860 EVO at the end of November and everything is just perfect since then.
I had never issues with my MacBook 12' 2016 and the Signal Desktop and would say it was due to the SSD.
This looks just like the log I posted in #3172, so this is probably what I'm experiencing as well
@scottnonnenberg-signal I can't give you more information than what I gave you before.
All I can say is that everytime I either start the application and there are lots of messages to sync with the Android app or someone writes a message my drive is being accessed like crazy and frankly speaking I wonder why.
I'm shooting in the dark here of course but some of these look promising to me, in particular WAF (in combination with synchronous=NORMAL) and ensuring that bulk inserts use transactions:
Maybe the problem is just worse on Linux due to the different environment (e.g. filesystem).
WAF could help because a lot of time is spent on locking during the startup phase:
> strace -C -tt -f signal-desktop
(...)
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
68,83 74,425698 1094 67980 4679 futex
15,70 16,977804 374 45366 1 epoll_wait
11,32 12,242458 420 29111 poll
1,59 1,715209 343041 5 wait4
0,53 0,572783 5 107352 88096 recvmsg
0,44 0,478264 5 82931 gettid
0,38 0,415437 7 53651 3 read
0,23 0,248866 13 18869 write
0,23 0,247595 6 38633 lseek
0,19 0,201868 11 18045 2 sendto
0,11 0,119595 6 18363 8 ioctl
0,07 0,073579 9 7870 madvise
0,05 0,049093 92 532 fsync
0,04 0,039849 10 3952 pread64
0,04 0,038085 6 5646 writev
(...)
------ ----------- ----------- --------- --------- ----------------
100.00 108,135207 546505 94540 total
But I don't know how much this says after all.
Startup took about 5 Minutes and there are about 200 GET requests in the log, so I guess there were also about 200 loading processes.
Notes:
saveUnprocessed takes consistently more than 1 second.Signal Version: 1.22.0
uname -a: Linux host 5.0.0-arch1-1-ARCH #1 SMP PREEMPT Mon Mar 4 14:11:43 UTC 2019 x86_64 GNU/Linux
Linux Distro: Arch Linux
Signal Package: AUR Package
@He-Ro, how fast is your internet connection? Also, do you know if fstrim was running in the background? On my laptop with an SSD performance completely tanks when fstrim is running as it takes up pretty much all I/O.
Hey @m-burg. I have a fully encrypted disk and I do not have trim or discard options enabled. I also do not have fstrim configured to run.
The stats of my internet connection are as followed: 14ms ping, 25.02 Mbps Downstream, 5.43 Mbps Upstream as tested on speedtest.net. The server is the one hosted at my German ISP, so this is only testing the connection from there to my home).
Going to the speedtest.net server in New York I get a 99ms ping, 15.23 Mbps Downstream and 4.69 Mbps Upstream.
For me on Debian 9 it is very slow to start as well, though I have fast internet.
Server close to home: 35ms ping, 103.23Mbps down, 22.61Mbps up.
New York 111ms ping, 97.51Mbps down, 20.00Mbps up.
Just adding a "me too" to help bump the priority on this. Running Ubuntu 19.04, kernel: 5.0.0-15-generic. 540 messages in, probably 20 minutes or so wait time. This happens every time I restart the app. Internet speed is in the 150Mbps range.
@augray, @Geblaat, is Signals data directory located on an SSD or on a mechanical drive?
I'm running Linux Fedora 30, and experience the same issue. The disk access is extreme high. See for more details: https://github.com/signalapp/Signal-Desktop/issues/2733#issuecomment-439125283 and the rest of that issue.
@m-burg: it鈥檚 a weird hybrid drive-mechanical with an SSD cache
@augray, @Geblaat, is Signals data directory located on an SSD or on a mechanical drive?
The whole system is on a SSD, including Signals data directory.
Same issue here. Brand new Core i7 system, Signal is located on SSD drive. Definitely not a system or hardware limitation. Some sort of locking is occurring while loading messages. Same issue as others, have 500 or so messages, takes 3-5 minutes to start up as it is processing them. I did run strace on the process this time:
[pid 30508] futex(0x7ffc9a05b138, FUTEX_WAKE_PRIVATE, 2147483647) = 1
[pid 30501] <... futex resumed> ) = 0
[pid 30508] futex(0x7fb15ba9b9a8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 30501] futex(0x7ffc9a05b0e8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 30501] gettid() = 30501
[pid 30501] gettid() = 30501
[pid 30501] futex(0x7ffc9a05b138, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=96437479} <unfinished ...>
[pid 30505] <... epoll_wait resumed> [{EPOLLIN, {u32=32, u64=32}}], 32, -1) = 1
[pid 30505] recvmsg(32, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\340\0\0\0\20\0\1\0\0\0\0\0\0\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 224
[pid 30505] gettid() = 30505
[pid 30505] gettid() = 30505
[pid 30505] gettid() = 30505
[pid 30505] futex(0x7fb15ba9b9a8, FUTEX_WAKE_PRIVATE, 2147483647) = 1
[pid 30508] <... futex resumed> ) = 0
[pid 30505] epoll_wait(27, <unfinished ...>
[pid 30508] futex(0x7fb15ba9b958, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 30505] <... epoll_wait resumed> [], 32, 0) = 0
[pid 30508] <... futex resumed> ) = 0
[pid 30505] epoll_wait(27, [], 32, 0) = 0
[pid 30505] epoll_wait(27, <unfinished ...>
[pid 30508] gettid() = 30508
[pid 30508] gettid() = 30508
[pid 30508] sendto(32, "\340\32\0\0\20\0\1\0\0\0\0\0\0\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6880, MSG_NOSIGNAL, NULL, 0) = 6880
[pid 30508] futex(0x3923a2cfd668, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 30511] <... futex resumed> ) = 0
[pid 30508] <... futex resumed> ) = 1
[pid 30511] futex(0x3923a2cfd570, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30508] futex(0x3923a2cfd570, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 30511] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 30508] futex(0x7fb15ba9b9a8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 30505] <... epoll_wait resumed> [{EPOLLIN, {u32=32, u64=32}}], 32, -1) = 1
[pid 30511] futex(0x3923a2cfd570, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 30505] recvmsg(32, <unfinished ...>
[pid 30511] gettid( <unfinished ...>
[pid 30505] <... recvmsg resumed> {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="x\0\0\0\20\0\1\0\0\0\0\0\0\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 120
[pid 30511] <... gettid resumed> ) = 30511
[pid 30511] gettid( <unfinished ...>
[pid 30505] gettid( <unfinished ...>
[pid 30511] <... gettid resumed> ) = 30511
[pid 30505] <... gettid resumed> ) = 30505
[pid 30511] futex(0x7fb15ba9b9a8, FUTEX_WAKE_PRIVATE, 2147483647 <unfinished ...>
[pid 30505] gettid( <unfinished ...>
[pid 30511] <... futex resumed> ) = 1
[pid 30508] <... futex resumed> ) = 0
[pid 30511] gettid( <unfinished ...>
[pid 30508] futex(0x7fb15ba9b958, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 30511] <... gettid resumed> ) = 30511
[pid 30508] <... futex resumed> ) = 0
[pid 30511] gettid( <unfinished ...>
[pid 30505] <... gettid resumed> ) = 30505
[pid 30511] <... gettid resumed> ) = 30511
[pid 30508] gettid( <unfinished ...>
[pid 30511] futex(0x3923a2d33be0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30508] <... gettid resumed> ) = 30508
[pid 30505] gettid( <unfinished ...>
[pid 30508] futex(0x3923a2dd3e10, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 30505] <... gettid resumed> ) = 30505
[pid 30505] futex(0x3923a2dd3e10, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 30508] <... futex resumed> ) = 0
[pid 30505] <... futex resumed> ) = 1
[pid 30508] futex(0x3923a2dd3e10, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 30505] futex(0x3923a2d33be0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 30508] gettid( <unfinished ...>
[pid 30511] <... futex resumed> ) = 0
[pid 30508] <... gettid resumed> ) = 30508
[pid 30511] gettid( <unfinished ...>
[pid 30508] gettid( <unfinished ...>
[pid 30511] <... gettid resumed> ) = 30511
[pid 30508] <... gettid resumed> ) = 30508
[pid 30511] futex(0x3923a2d33be0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 30508] futex(0x7ffc9a05b138, FUTEX_WAKE_PRIVATE, 2147483647 <unfinished ...>
[pid 30511] <... futex resumed> ) = 0
[pid 30508] <... futex resumed> ) = 1
[pid 30511] futex(0x3923a2cfd66c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 30505] <... futex resumed> ) = 1
[pid 30501] <... futex resumed> ) = 0
[pid 30508] gettid( <unfinished ...>
[pid 30505] epoll_wait(27, <unfinished ...>
[pid 30508] <... gettid resumed> ) = 30508
[pid 30505] <... epoll_wait resumed> [], 32, 0) = 0
[pid 30508] gettid( <unfinished ...>
[pid 30505] epoll_wait(27, <unfinished ...>
[pid 30508] <... gettid resumed> ) = 30508
[pid 30505] <... epoll_wait resumed> [], 32, 0) = 0
[pid 30508] futex(0x7fb15ba9b9a8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 30501] futex(0x7ffc9a05b0e8, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 30505] epoll_wait(27, <unfinished ...>
[pid 30501] gettid() = 30501
[pid 30501] gettid() = 30501
[pid 30501] futex(0x7ffc9a05b138, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=79731484} <unfinished ...>
[pid 30505] <... epoll_wait resumed> [{EPOLLIN, {u32=32, u64=32}}], 32, -1) = 1
[pid 30505] recvmsg(32, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\340\0\0\0\20\0\1\0\0\0\0\0\0\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 224
[pid 30505] gettid() = 30505
[pid 30505] gettid() = 30505
[pid 30505] gettid() = 30505
[pid 30505] futex(0x7fb15ba9b9a8, FUTEX_WAKE_PRIVATE, 2147483647) = 1
[pid 30505] epoll_wait(27, <unfinished ...>
[pid 30508] <... futex resumed> ) = 0
[pid 30505] <... epoll_wait resumed> [], 32, 0) = 0
[pid 30505] epoll_wait(27, <unfinished ...>
[pid 30508] futex(0x7fb15ba9b958, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 30505] <... epoll_wait resumed> [], 32, 0) = 0
[pid 30505] epoll_wait(27, <unfinished ...>
[pid 30508] <... futex resumed> ) = 0
[pid 30508] gettid() = 30508
[pid 30508] gettid() = 30508
If that can help, I had an instance closed since 25 days.
It took more than 40min to start, most of the time, my hard drive activity light was very active.
Processed a bit more than 1000 "messages" (the count on startup which isn't actual messages, maybe the log show the number if actual messages)
logs:
https://debuglogs.org/c2974035af16292734665ad649592fe5e371d0cad4b213aeafb10444348d735b
OS: Antergos Linux (Arch based)
version: 1.25.3
Is there anything else that I can do/test?
Maybe someone can create a debug version which separates the different steps like downloading (internet I/0), processing (CPU/RAM), storing messages (disk I/O).
This could help finding out where exactly the delays occur...
@m-burg How did you send so many messages to yourself? Can you provide us the script?
@darkdragon-001 I just manually used the note to self feature.
You might be able to use signal-cli to make it more automated.
@m-burg Thanks, I will look into it.
@scottnonnenberg-signal Did you check @real-or-random 's comment about sqlite transactions? I think this could make a huge impact if you don't use it already!
@darkdragon-001
As far as I understand, the current structure of the code does not make it easy to add more batch transactions (than already used).
For WAF, there's a PR https://github.com/signalapp/Signal-Desktop/pull/3395, which is inactive for a month now because I couldn't find time to work on this more. Feel free to contribute. I should find time next month, I hope.
I am also experiencing extreme loading times in the Linux desktop app. My hard drive has been writing the whole time, judging by the sounds and activity light.. Right now it is over two hours and has only gotten to "Loading messages. 20 so far..."
Machine specs: Intel Core i5-4460, 16GB ddr3, some not-fancy hard drive (not an SSD).
I am not sure if this is the correct place to share this information, as there seems to be multiple open start up performance issues on the tracker, and I have not dug enough to know more about the issue. Please let me know if I should comment elsewhere and if there is more information I can provide.
Edit: Also, the internet where my desktop is right now is very fast, but intermittently drops the connection for a short period. I have considered that Signal desktop does not handle this well.
I got an idea why performance of SQLite could be an issue on my normally quite fast setup: BtrFS filesystem with COW. I think this problem becomes more severe with more people using modern file systems and Ubuntu even including ZFS in the GUI installer for 19.10.
While obviously some work needs to be done on SQLite site, it makes sense to use all currently offered optimizations.
@real-or-random I am seeing your work in the PR and I hope this will speed-up the process!
I confirm that I'm also using [encrypted] Btrfs.
Could it be that the slowness is caused by SSD's where TRIM is not enabled? I experience the slowness on an intensively used Fedora laptop, where the disk filled up several times to near 100%. I enabled fstrim now, I will keep you informed on whether the startup times improve or not. Maybe other Linux users experiencing this issue could also check if they are using SSD's without TRIM enabled.
For more info see:
https://lukas.zapletalovi.com/2013/11/how-to-trim-your-ssd-in-fedora-19.html
https://blog.christophersmart.com/2013/06/05/trim-on-lvm-on-luks-on-ssd/
https://blog.christophersmart.com/2016/05/12/trim-on-lvm-on-luks-on-ssd-revisited/
Well I don't know which issue is the most current one on the problem, I initially commented on #2733 (https://github.com/signalapp/Signal-Desktop/issues/2733#issuecomment-433626206). But whatever you guys have done in the (I'd say) last 2-3 weeks (?) it now works like a charm - nice work!
I am confirming that since Signal 1.28.0 I have not encountered unacceptable slowness during start-up any more (I am on a fully up-to-date Fedora 31 distribution). This might also be related to my previous comment about trimming the SSD. But from my perspective the issue is fixed. Many thanks to you all for working on this!
For me hard drive LED is still quite active during loading messages, but it is indeed a lot faster since 1.28
While loading messages is a lot faster now (~10 per second), messages get lost. I was away for two months only with my phone, when I got back my laptop just received messages from the first three weeks of my trip but doesn't show any newer messages. I got a few _Note to Self_ with _Message encrypted for non-existing session_ though...
I'm going to close this, since a few of you have confirmed performance improvements. We'd like people to enter new bugs once things get this big anyway.
@darkdragon-001 please enter a new bug - that doesn't have anything to do with performance. It is quite interesting that you say that you only got the first three weeks of messages while desktop was offline about eight weeks. Remember, the server queue for any device is 1000 messages, so for that kind of offline time, missing messages are likely. We'd just expect the more recent messages (from the end of your trip) to be the set you end up with.
Most helpful comment
After running five tests on three machines I believe the problem is I/O related.
Method
Machines
Machine 1
Machine 2
Machine 3
All machines are running 64 bit Xubuntu 18.04.
All machines got their internet connection over Ethernet, all have a 1Gbps interface, the download speed I get from my ISP is about 200Mb/s.
Test results
| Test | Machine | TTL | TTLM | TTMS | Message count | CPU | RAM | Notes | Log |
|------|---------|------|---------------------------------------|-------|---------------|-----|-------|------------------------------|----------------------------------------------------------------------------------------|
| 1 | 1 | 0:01 | 3:10 | 20:00 | 1030 | 10% | 705MB | Constant freezing, warm boot | https://debuglogs.org/99e941c3fbedcc8b87cf08b7900774d0206427b935831f823016a0fee3356558 |
| 2 | 2 | 0:06 | 0:06, 120 messages already downloaded | 0:50 | 1030 | 50% | 439MB | Cold boot | https://debuglogs.org/1a3f1327d432d7796c8e5925b29e868190b5d331f6a08248e45b6c194a501300 |
| 3 | 3 | 0:10 | 3:30 | 19:40 | 1030 | 20% | 560MB | Constant freezing, warm boot | https://debuglogs.org/99bc9a57ac9f6ac6ee8ab17e72f1eee65e20416eddf78f2322da42f0801b0273 |
| 4 | 3* | 0:01 | 0:05 | 1:08 | 1030 | N/A | N/A | Warm boot | N/A |
| 5 | 3* | 0:01 | 0:01 | 1:00 | 1010 | 75% | 675MB | Warm boot | https://debuglogs.org/ba46fc1c6262880f447dbb4eb57791bed184dbecc6d83565223e100fb7954397 |
Legend:
I tested running in RAM twice because I forgot to collect the logs the first time.
The most apples-to-apples comparison is the ten year old Toshiba running on a mechanical drive vs running in RAM.