Signal-android: Backup notification and temp file disappear, no backup created

Created on 22 Jul 2018  ·  50Comments  ·  Source: signalapp/Signal-Android


Bug description

After noticing that there hasn't been a successful backup since 18 Apr, manually started backup.
(No automatic backups have been created since then, either.)
Backup notification disappears whilst count continues.
Went into the cache folder, ran backup again... temp file disappears at the same time notification disappears, whilst counting continues.
(Note: Chat backups to external storage is checked as this is a fairly low memory device. Cache folder not on the external storage, which may be relevant.)
(The notification did not disappear at the same point in the count.)

Steps to reproduce

  • Go to Settings - Chat and Media
  • Press Create Backup

Actual result: Notification disappears whilst count continues, backup created incomplete
Expected result: Backup created with all messages

Screenshots

Device info


Device: Sony Xperia Z3+ (ivy)/LineageOS 14.1-nightly
Android version: 7.1.2
Signal version: 4.24.2

Link to debug log


https://debuglogs.org/c18d9a0c31e00bb5241263549ac811a0b324f57c210a83692b18c7b91d4df632

https://debuglogs.org/c847f944dacc9d649d7ccfc3640c32ef6e3771b621382c3e11dfc8abda5da5c9

All 50 comments

We removed the ability to backup to external SD cards in 4.23.0. Is it possible your backup doesn't fit in internal storage?

I can send a screenshot that still says external storage on the current beta.
Also, the point at which the backup fails is not similar each (consecutive) time, as noted above, which would indicate that it's not from lack of storage. The last successful backup was around 150M and there is significantly more free space than that.

@elflng In Android, the space programatically referred to as "external storage" is essentially internal storage. It's a reference to this method. It is confusing though, and the user-facing string should be probably be changed to indicate that it's not on the external SD card.

Give that, when you say you have "significantly more free space than [150M]", is that true for internal storage, or external storage?

Both. I am able to download lineageos updates (~450M) to internal storage without low memory warnings, and this should be a fraction of that.

-elf

On 24 July 2018 19:20:00 GMT+03:00, Greyson Parrelli notifications@github.com wrote:

@elflng In Android, the space programatically referred to as "external
storage" is essentially internal storage. It's a reference to this
method
.
It is confusing though, and the user-facing string should be probably
be changed to indicate that it's not on the external SD card.

Give that, when you say you have "significantly more free space than
[150M]", is that true for internal storage, or external storage?

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/signalapp/Signal-Android/issues/8024#issuecomment-407465981

Update: after the most recent update, same behaviour EXCEPT it says the backup is successful and a file is created - but i imagine it is incomplete as the temp file disappears when it is still counting.

I have the same problem. I get the notification that a backup is being created, and Chats and Media -> Create backup shows that a backup was created. However, checking both my internal storage and my SD card, one doesn't even _have_ the Backup folder in the Android/data/org.thoughtcrime.securesms/files folder, and the other one is empty. So _if_ the backup is being created, I'm not quite sure where it is.

The backup should be located in the root of your internal storage in a
folder called "Signal".

On Sat, Jul 28, 2018, 9:55 AM ಚಿರಾಗ್ ನಟರಾಜ್ notifications@github.com
wrote:

I have the same problem. I get the notification that a backup is being
created, and Chats and Media -> Create backup shows that a backup was
created. However, checking both my internal storage and my SD card, one
doesn't even have the Backup folder in the
Android/data/org.thoughtcrime.sms/files folder, and the other one is
empty. So if the backup is being created, I'm not quite sure where it
is.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
https://github.com/signalapp/Signal-Android/issues/8024#issuecomment-408609489,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AjlVq-D633QOIcanDex9io_v3HF3dZf8ks5uLG07gaJpZM4VZ40C
.

Oh man, thank you :pray: It seems like it actually worked :blush:

Today's beta still has the issue.

Tried turning off backups and turning back on to see if it was a state issue. It's not.

Question: is there a max count of messages backed up enforced somewhere? It looks at the moment like each backup is stopping around the 8000 mark. (Is there a 2^13 constant in there somewhere?)

Problem still exists on today's update.

@elflng For what it's worth, I have over 40K messages and the backup works fine. I wonder if one of your messages happens to be corrupted...?

Are you sure they are all being backed up and that the backup process isn't aborting in midstream? This is precisely the bug that we are discussing... an incomplete backup file is being created, which aborts in midstream, though the counter continues.

If you are sure all your messages are being backed up, it is an excellent question and one for which i do not know how to test.

(If i could install signal on a non-phone device (eg tablet) and try to restore the backup, i imagine it would be possible to determine which message was corrupt, although there does not seem to be consistency as to which message number the process aborts on.)

Suggestions?

-elf

On 6 August 2018 15:43:00 GMT+03:00, "ಚಿರಾಗ್ ನಟರಾಜ್" notifications@github.com wrote:

@elflng For what it's worth, I have over 40K messages and the backup
works fine. I wonder if one of your messages happens to be
corrupted...?

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/signalapp/Signal-Android/issues/8024#issuecomment-410694696

@elflng The backup file is certainly being created in /sdcard/Signal/, which I imagine wouldn't happen should the backup be aborted midstream...right?

I dont know if it's relevant, but i thought of this because of the last comment by @chiraag-nataraj . I write in Hebrew and English, all the names in my addressbook now are in Hebrew but some were in English initially, and the server stored names may be in a variety of languages. Does this matter at all?

Also, is there a way to pull out all the contents of the active signal db and then repopulate it? I'm at a loss for how to test this without diving into the code, which i wont have the time to do for several days.

@elflng fwiw I have contact names in Devanagari, Bangla, Telugu, Tamil, Kannada, Malayalam, Mandarin, Cyrillic, Greek, and various other languages (which use the Latin script).

No, that is exactly what is happening. The file is created but it is incomplete - it is created after <16000 messages (much less but thats a loose limit), whilst the counter continues but no additional messages are written.

On 6 August 2018 16:02:11 GMT+03:00, "ಚಿರಾಗ್ ನಟರಾಜ್" notifications@github.com wrote:

@elflng The backup file is certainly being created in
/sdcard/Signal/, which I imagine wouldn't happen should the backup be
aborted midstream...right?

--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
https://github.com/signalapp/Signal-Android/issues/8024#issuecomment-410700090

Oh, my counter finishes normally and the "Last backup created" gets updated. So as far as I can tell, my backup is finishing normally.

Does the notification of "creating backup" persist until the counter finishes?

Yes, it does.

Mine doesn't. Just tried again - loses at around 7000 this time.

It's also interesting that you say your backup is 150M. Mine (of 40K+ messages) is around...16M? I suppose that might just be a function of what you send...

Never mind. I was confusing it with my WhatsApp backups. My Signal backups are 367M.

https://debuglogs.org/5b249fc7e998eeb3a43da5fde562abfdfcb0030ca1b99ace03c95cbeae05241e

I got a debug log by waiting to press submit until the icon disappeared, without the context switch noise from changing windows.

For fun, i turned off backups whilst the counter was running (after the notification disappeared and the incomplete backup was created). The counter continued, and continued after backups were turned back on. At least it says correctly "never" for last backup time.

Problem persists in today's update. Debug log taken immediately after notification disappears:
https://debuglogs.org/c847f944dacc9d649d7ccfc3640c32ef6e3771b621382c3e11dfc8abda5da5c9

Problem persists in today's beta update.
It seems from the logs that there may be unidentified message types causing the error, if i understood correctly. Havent had time to dig into the code because of a death in the family, apologies.

Problem persists in today's beta.

Problem persists in today's beta.

Problem persists in second beta push today.
https://debuglogs.org/e00ac58ceb3c394ca7c6fa0a4877eef6063ee3be3ea3f490552b5d5990ac6849
is a new log with some more info

@elflng That log contains the deletion of a 2-day-old backup file, which makes it look like backup files are in fact being created correctly:

2018-08-23 00:36:47.376 GMT+03:00 I BackupUtil: Deleting: /storage/emulated/0/Signal/Backups/signal-2018-08-21-03-02-00.backup

Edit: sorry, I missed your earlier comment. So, files not being created is no longer the problem? Please correct me if I'm misunderstanding something. How did you confirm there's still an issue? Did you try restoring a backup on another phone or an Android emulator?

Problem persists in today's release.
The problem is that the backup is being created prematurely with ~1/4 of the correct messages, based on the counter, for unknown reasons. I would like to write something that can log each message into a separate file before passing it to the handlers to see where the problem is, and to simul record whether the backup process is still active.

Problem persists in today's update.
I was finally able to get a log exactly at the fail point:
https://debuglogs.org/ed16f2957246c0748c07dcc45d08e8f809b6a09169006be3694e94a66fc7b2bd

@greyson-signal It seems that a timeout for an mms decoder shuts down the connection. All the timeout vars seem to be hardcoded - is there any way to adjust them from within the program to test?

This in and of itself is a bug - a single write timeout shouldn't kill a receiving thread with multiple writers.

@greyson-signal

We removed the ability to backup to external SD cards in 4.23.0. Is it possible your backup doesn't fit in internal storage?

Can I ask what was the motivation and what is a workaround? My signal data is ~12 GB, so it does not fit on phones memory, to SD card it'd fit easily. I'm switching phones so I desparately need a backup. How can I proceed?

@breznak The directory we were persisting backups to on external SD cards would be deleted upon app uninstall, which created a ton of problems and broke a lot of assumptions of how people expect backups to work.

I guess if I was in your situation, I'd:

  1. Migrate as much stuff as possible to my new phone
  2. Delete all of the migrated stuff from my old phone (hopefully at this point there's only signal left on your old phone)
  3. If you have enough space at this point, that's great.
  4. If not, go to Settings > Chats and media > Message trimming. Turn it on with a very high number, and then slowly lower it until your data is small enough that you can fit your backup on your device.

@greyson-signal I certainly have enough space in the phone's regular memory - see above - and I think I have at least some of the bugs identified. Is there a way to override timeout settings?

@elflng Looking at:
https://debuglogs.org/ed16f2957246c0748c07dcc45d08e8f809b6a09169006be3694e94a66fc7b2bd

Not sure what you're referring to by

It seems that a timeout for an mms decoder shuts down the connection.

Copy-paste the log sequence you're referring to. However, if you're referring to issues with the WebSocketConnection, the rest of the log implies you're experiencing some network issues, which should be unrelated to the backup process.

Here's an example of a relevant snippet. Looks like the backup job starts and then finished ~1min later, with no reported error. This could just be a failure on our part to log something.

2018-08-28 11:02:21.248 GMT+03:00 I LocalBackupJob: Executing backup job...
2018-08-28 11:03:07.435 GMT+03:00 W WebSocketConnection: Sending keep alive...
2018-08-28 11:03:08.029 GMT+03:00 W WebSocketConnection: WSC onMessage()
2018-08-28 11:03:08.029 GMT+03:00 W WebSocketConnection: Message Type: 2
2018-08-28 11:03:12.449 GMT+03:00 W MessageRetrievalService: Application level read timeout...
2018-08-28 11:03:12.452 GMT+03:00 D MessageRetrievalService: Network requirement: true, active activities: 1, push pending: 0, gcm disabled: false
2018-08-28 11:03:12.456 GMT+03:00 I MessageRetrievalService: Reading message...
2018-08-28 11:03:32.203 GMT+03:00 I BackupUtil: Deleting: /storage/emulated/0/Signal/Backups/signal-2018-08-27-03-04-35.backup

Questions:

  • How, specifically, are you determining the backup file is incomplete?
  • Do you have any OS-specific settings that may be limiting apps running in the background? I'm not super-familiar with the specific feature set of Lineage, but I know that Samsung and Huawei both have extra settings for restricting background work.
  • If possible, could you take a video of what happens when you click "Create backup"? I just want to be clear on what's happening with the interaction between the backup running and the foreground notification.

I'm not referring to the websocket connection. Look at the top of the log, where you can see all of the failed writes (nosubscriber events). Find all the java.io.ioexceptions and eofexceptions and look at their contexts. The first one occurs after an mms timeout.
I'm sorry I'm not pasting but it's a huge paste when I tried, and giving you search terms is more useful.

There shouldn't be any relevant network problems - it's on wifi, on the desk, not moving, full signal. The cell reception is very poor but shouldn't be remotely relevant to signal.

I have ~40k messages. The backup file writes and the notification disappears when the counter reads ~10k, after ~1 min each time.

I have no OS limitations that I am aware of, in general, and I have Signal as a trusted app in terms of privacy setting and execution.

I will figure out how to take a video and post.

Thank you.

(Problem continues in today's update - and as a relevant question, can the timeouts be set manually or do they require recompilation? Always taking 1 minute before the process stops is a bit of a red flag to me.)

@greyson-signal I have a video but it has some personal info - how can i post it to you privately?

Ok, let's go through the exceptions I think you're talking about. Looking at:
https://debuglogs.org/ed16f2957246c0748c07dcc45d08e8f809b6a09169006be3694e94a66fc7b2bd

Look at the top of the log, where you can see all of the failed writes (nosubscriber events)

08-28 11:08:00.115 12707 12751 D EventBus: No subscribers registered for event class org.thoughtcrime.securesms.backup.FullBackupBase$BackupEvent

These events are posted so the UI can updated with messages like "n messages backed up" or whatever the UI says. I'm assuming no subscribers registered means that the app was backgrounded and broadcasting an event that no one is listening too. Should be harmless.

Find all the java.io.ioexceptions

2018-08-28 06:55:50.305 GMT+03:00 W SignalServiceMessageSender: java.io.IOException: No connection!
    at org.whispersystems.signalservice.internal.websocket.WebSocketConnection.sendRequest(WebSocketConnection.java:134)
    at org.whispersystems.signalservice.api.SignalServiceMessagePipe.send(SignalServiceMessagePipe.java:119)
    at org.whispersystems.signalservice.api.SignalServiceMessageSender.sendMessage(SignalServiceMessageSender.java:656)
    at org.whispersystems.signalservice.api.SignalServiceMessageSender.sendReceipt(SignalServiceMessageSender.java:136)
    at org.thoughtcrime.securesms.jobs.SendReadReceiptJob.onRun(SendReadReceiptJob.java:56)
    at org.thoughtcrime.securesms.jobmanager.JobConsumer.runJob(JobConsumer.java:74)
    at org.thoughtcrime.securesms.jobmanager.JobConsumer.run(JobConsumer.java:47)

Signal is unable to send a message because of a broken socket (i.e. the websocket stuff I was referring to). Not ideal, and may be a separate matter, but it's unrelated to backup.

and eofexceptions

2018-08-28 08:45:20.257 GMT+03:00 W WebSocketConnection: java.io.EOFException
    at okio.RealBufferedSource.require(RealBufferedSource.java:60)
    at okio.RealBufferedSource.readByte(RealBufferedSource.java:73)
    at okhttp3.internal.ws.WebSocketReader.readHeader(WebSocketReader.java:113)
    at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:97)
    at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:265)
    at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:204)
    at okhttp3.RealCall$AsyncCall.execute(RealCall.java:153)
    at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
    at java.lang.Thread.run(Thread.java:761)

Similar deal. Websocket reading issues. Unrelated to backup.

The first one occurs after an mms timeout

08-28 11:08:12.480 12707 12961 W MessageRetrievalService: Application level read timeout...

Also goes hand-in-hand with the other socket problems. The socket likely broke due to a timeout, causing all of the other exceptions. This could happen for any number of reasons. Your phone could be trying to use cellular network instead of Wi-Fi, idk. Unless you're experiencing consistent message sending/receiving problems, this was all likely some transient issue that is acting as a red herring. Regardless, network problems should have no impact on backups.

If I'm misunderstanding something, please copy-paste a line from the log. Even a single line will at least include a timestamp so I can jump to where you're looking at. You don't have to copy the whole thing.

Also, just saw you're running Lineage nightlies. Lineage has historically caused weird bugs through buggy OS behavior -- I imagine these problems are intensified on nightlies. Maybe try switching to a stable release or something? Also let us know if you're running any other OS-modification type things, like Xposed. Those are typically another source of problems.

The app was never backgrounded - that's what happens after the file is written and the notification disappears whilst it is still processing.

I'm running a Sony Xperia Z3+ - the only releases are the nightlies, which haven't given me any problems (except once, 9 months ago).

I am running Xposed and XprivacyLua, with no hooks/mods on Signal.

Where is it sending messages to? The only messages at the times I have logged are between the main process and the backup process - nothing else has been sent or received - which is why I thought it had to do with the backup. What other pipes were there?

https://debuglogs.org/20cb95369ca2d76235d8ddf32ffba45ab3e99b167b97cb325ead4e1606ea636c

Taken just now, in airplane mode, to filter out any garbage from network. Similar set of exceptions with no possibility of network - so I assume that the pipe is indeed the backup process, as originally stated. (I had done this earlier to make that determination but didn't log it.)

(Also, I'm wrong - it is backgrounded in the process of reading the log and submitting it - but I go straight to the menu -> advanced screen as soon as i start the backup, not at the point that the notification disappears, so if that part is from backgrounding, it would have happened much earlier in the log.)

Where is it sending messages to?

You can look around timestamp 2018-08-28 07:12:50.375 GMT+03:00. These logs cover several hours, so it's likely messages just being sent at some other time.

so I assume that the pipe is indeed the backup process, as originally stated

I can assure you that the websocket has nothing to do with the backup process. What's likely happening is that app has been woken up (either by the backup starting or via you opening the app to start a backup), and when that happens the app will attempt to establish the websocket connection for the purpose of having it available to send/receive messages. Which now it can't do, since you're in airplane mode. Please don't worry about websocket connection. The backup does not use sockets or pipes in any way.

Unfortunately the only thing to be learned from this backup is that there isn't enough logging to determine what's happening. The entire relevant section of the log is this:

2018-09-01 03:12:17.998 GMT+03:00 I LocalBackupJob: Executing backup job...
2018-09-01 03:12:18.011 GMT+03:00 I PersistentAlarmManagerListener: class org.thoughtcrime.securesms.service.LocalBackupListener scheduling for: 1535847138005
2018-09-01 03:12:51.072 GMT+03:00 I BackupUtil: Deleting: /storage/emulated/0/Signal/Backups/signal-2018-08-30-03-06-20.backup

In other words, the backup starts and then finishes ~30s later without any exceptions on our part (the deletion of the backup happens here, after the backup has finished). Now, there could, in fact, be some sort of bug on our part, the logs just don't help us figure it out. Every other exception we've talked about is outside the scope of that backup. The logs are in chronological order. So if you see stuff that is happening outside of this block, chances are that it's irrelevant.

Given that, there's nothing really left to be said about this until we have more information. More information could be gathered by:

  • Logging more stuff.
  • Attempting to restore your backup on another device, and seeing what happens.
  • Doing a backup on another device, importing that backup on your problem device, and then attempting to do a backup on the problem device.
  • Changing the OS your phone is running and trying again.
  • Uninstalling Xposed or any other OS-modifications and trying again.

If I copy everything in /data/data/org.thoughtcrime.securesms to my computer with android studio, and do some custom signal builds for extra logging and testing, will it work or will it reject the db?
(The first thing of course being to run it in the emulator with no modifications to see if it fails there too... if so, i think we can rule out xposed/lineage/whatever as causes)
I've not tried transplanting live data into the emulator before, hence the question.

If I copy everything in /data/data/org.thoughtcrime.securesms to my computer with android studio, and do some custom signal builds for extra logging and testing, will it work or will it reject the db?

Without the Keystore key that's used for encrypting the database, Signal will crash. Not sure how transferring the key would work. Instead, you could try restoring one of your backup files in the emulator and see what happens.

The point is that i want to see if it fails in the emulator in the same way it does on device. Any way to retrieve the keystore key?

On 3 September 2018 4:57:04 GMT+03:00, haffenloher notifications@github.com wrote:

If I copy everything in /data/data/org.thoughtcrime.securesms to my
computer with android studio, and do some custom signal builds for
extra logging and testing, will it work or will it reject the db?

Without the Keystore key that's used for encrypting the database,
Signal will crash. Not sure how transferring the key would work.
Instead, you could try restoring one of your backup files in the
emulator and see what happens.

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/signalapp/Signal-Android/issues/8024#issuecomment-417979995

Further, how can i get a message count easily? And if i use the sdk, i dont think i can sign the (debuggable, verbose logging) output, so i have no idea what will happen if i try to run it. I generally avoid android/java dev. Some pointers would be nice if im going to dive in.

Oke, reflashed without xposed - the notification stays now. The file is somewhat larger than before, but still smaller than I would expect. Going to figure out which xposed module is affecting the notification and see if the backup is any good. (The total message count seemed smaller than i remembered.)

Oke. So many xposed modules cause the problem but xposed itself does not.

Without xposed, the backup size jumped by about 30M (and no, i havent been sending pics or anything like that).

With xposed and modules reinstalled, the file size stays at the new (larger) size, but the notification goes away in the middle of the counter again.

So a) why the size jump and b) why does the counter seem to lose its sync with the actual writing with hooks unrelated to the issue?

The issue is still partially open - the counter does not reflect the actual count for some reason - but the backup file seems oke (though the sha256sum does not match even though the size is the same and the contents should be the same).

Was this page helpful?
0 / 5 - 0 ratings