Operating System (Linux/Mac/Windows/iOS/Android):
Android 10 on Fairphone FP2 (Lineage OS 17.1 build QQ3A.200805.001 from 2020-08-14).
Delta Chat Version:
1.12.2.
Expected behavior:
Detect active airplane mode.
Actual behavior:
Active airplane mode is not detected.
Steps to reproduce the problem:
-- Enable airplane mode.
Screenshots:
N/A.
Logs:
This was logged during airplane mode:
08-17 15:29:42.212 6355 6407 W DeltaChat: src/imap/idle.rs:110: Idle wait errored: Io(Os { code: 110, kind: TimedOut, message: "Connection timed out" })
08-17 15:29:42.232 6355 6407 W DeltaChat: src/scheduler.rs:147: IMAP IDLE protocol failed to init/complete: io: Broken pipe (os error 32)
08-17 15:29:42.276 6355 6407 I DeltaChat: src/imap/select_folder.rs:32: Expunge messages in "DeltaChat".
08-17 15:29:42.277 6355 6407 W DeltaChat: src/imap/mod.rs:342: failed to close folder: NoSession
08-17 15:29:42.277 6355 6407 W DeltaChat: src/imap/idle.rs:110: Idle wait errored: Io(Os { code: 110, kind: TimedOut, message: "Connection timed out" })
08-17 15:29:42.277 6355 6407 E DeltaChat: Antwort von IMAP posteo.de:143: io: could not resolve address `("posteo.de", 143)` Einige Anbieter platzieren zus盲tzliche Informationen im Posteingang, die Sie z.B. im Web-Frontend abrufen k枚nnen. Bitten Sie Ihren Anbieter oder FreundInnen um Rat, wenn es Probleme gibt.
08-17 15:29:42.278 6355 6407 W DeltaChat: src/imap/mod.rs:421: failed to setup imap handle: IMAP Connection Failed params: io: could not resolve address `("posteo.de", 143)`
08-17 15:29:42.278 6355 6407 W DeltaChat: src/scheduler.rs:130: imap connection failed: IMAP Connection Failed params: [email protected] imap:[email protected]:***:posteo.de:143:cert_automatic smtp:[email protected]:***:posteo.de:587:cert_automatic AUTH_NORMAL 0x4IMAP_STARTTLS 0x100SMTP_STARTTLS 0x10000
08-17 15:29:42.278 6355 6407 I DeltaChat: src/imap/idle.rs:136: IMAP-fake-IDLEing...
08-17 15:29:42.279 6355 6407 W DeltaChat: src/scheduler.rs:147: IMAP IDLE protocol failed to init/complete: io: Broken pipe (os error 32)
08-17 15:29:42.281 6355 6407 I DeltaChat: src/imap/select_folder.rs:32: Expunge messages in "Sent".
08-17 15:29:42.281 6355 6407 W DeltaChat: src/imap/mod.rs:342: failed to close folder: NoSession
08-17 15:29:42.282 6355 6407 E DeltaChat: Antwort von IMAP posteo.de:143: io: could not resolve address `("posteo.de", 143)` Einige Anbieter platzieren zus盲tzliche Informationen im Posteingang, die Sie z.B. im Web-Frontend abrufen k枚nnen. Bitten Sie Ihren Anbieter oder FreundInnen um Rat, wenn es Probleme gibt.
08-17 15:29:42.283 6355 6407 W DeltaChat: src/imap/mod.rs:421: failed to setup imap handle: IMAP Connection Failed params: io: could not resolve address `("posteo.de", 143)`
08-17 15:29:42.283 6355 6407 W DeltaChat: src/scheduler.rs:130: imap connection failed: IMAP Connection Failed params: [email protected] imap:[email protected]:***:posteo.de:143:cert_automatic smtp:[email protected]:***:posteo.de:587:cert_automatic AUTH_NORMAL 0x4IMAP_STARTTLS 0x100SMTP_STARTTLS 0x10000
08-17 15:29:42.283 6355 6407 I DeltaChat: src/imap/idle.rs:136: IMAP-fake-IDLEing...
08-17 15:29:42.334 6355 6407 W DeltaChat: src/imap/idle.rs:110: Idle wait errored: Io(Os { code: 110, kind: TimedOut, message: "Connection timed out" })
08-17 15:29:42.354 6355 6773 W Adreno-EGL: <qeglDrvAPI_eglGetConfigAttrib:607>: EGL_BAD_ATTRIBUTE
08-17 15:29:42.372 6355 6407 W DeltaChat: src/scheduler.rs:147: IMAP IDLE protocol failed to init/complete: io: Broken pipe (os error 32)
08-17 15:29:42.374 6355 6407 I DeltaChat: src/job.rs:1077: loading job for Imap-thread
08-17 15:29:42.398 6355 6407 I DeltaChat: src/imap/select_folder.rs:32: Expunge messages in "INBOX".
08-17 15:29:42.398 6355 6407 W DeltaChat: src/imap/mod.rs:342: failed to close folder: NoSession
08-17 15:29:42.414 6355 6407 E DeltaChat: Antwort von IMAP posteo.de:143: io: could not resolve address `("posteo.de", 143)` Einige Anbieter platzieren zus盲tzliche Informationen im Posteingang, die Sie z.B. im Web-Frontend abrufen k枚nnen. Bitten Sie Ihren Anbieter oder FreundInnen um Rat, wenn es Probleme gibt.
08-17 15:29:42.415 6355 6407 W DeltaChat: src/imap/mod.rs:421: failed to setup imap handle: IMAP Connection Failed params: io: could not resolve address `("posteo.de", 143)`
08-17 15:29:42.415 6355 6407 W DeltaChat: src/scheduler.rs:130: imap connection failed: IMAP Connection Failed params: [email protected] imap:[email protected]:***:posteo.de:143:cert_automatic smtp:[email protected]:***:posteo.de:587:cert_automatic AUTH_NORMAL 0x4IMAP_STARTTLS 0x100SMTP_STARTTLS 0x10000
08-17 15:29:42.415 6355 6407 I DeltaChat: src/imap/idle.rs:136: IMAP-fake-IDLEing...
08-17 15:29:42.212 6355 6407 W DeltaChat: src/imap/idle.rs:110: Idle wait errored: Io(Os { code: 110, kind: TimedOut, message: "Connection timed out" })
08-17 15:29:42.232 6355 6407 W DeltaChat: src/scheduler.rs:147: IMAP IDLE protocol failed to init/complete: io: Broken pipe (os error 32)
08-17 15:29:42.276 6355 6407 I DeltaChat: src/imap/select_folder.rs:32: Expunge messages in "DeltaChat".
08-17 15:29:42.277 6355 6407 W DeltaChat: src/imap/mod.rs:342: failed to close folder: NoSession
Referring to https://github.com/deltachat/deltachat-core-rust/pull/1704 , I guess that Expunge should not be called after IMAP IDLE has failed, correct?
Interesting.
What would be the expected behavior? Wifi was active (as far as DC can tell) so there probably was a connection.
Would you want DC to be inactive during Airplane mode even if there is an active Wifi?
Wifi was active (as far as DC can tell) so there probably was a connection.
@angelo-fuchs No, just the WiFi (WLAN) interface of the phone was active, but not connected.
@gerryfrancis I see.
It looks like a bug here. ApplicationDcContext#isNetworkConnected checks ConnectivityManager#getActiveNetworkInfo and NetworkInfo#isConnected which should only return true if there is a connection that can be used.
We need to check if isNetworkConnected is called everywhere it should be.
@angelo-fuchs I think, I have found a hint: Usually the message "No network." appears in Delta Chat when it is opened and a connection to the internet is not available. This does not happen when I reboot the phone and enable airplane mode immediately before I open the app. (I waited approximately 15 seconds before I did so.)
This is what happens when I enable airplane mode:
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: Work [ id=267c5f01-3903-4fdd-af96-adea247d5fbb, tags={ org.thoughtcrime.securesms.connect.FetchWorker } ] was cancelled
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: java.util.concurrent.CancellationException: Task was cancelled.
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: at androidx.work.impl.utils.futures.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:1184)
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: at androidx.work.impl.utils.futures.AbstractFuture.getDoneValue(AbstractFuture.java:514)
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: at androidx.work.impl.utils.futures.AbstractFuture.get(AbstractFuture.java:475)
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: at androidx.work.impl.WorkerWrapper$2.run(WorkerWrapper.java:298)
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: at androidx.work.impl.utils.SerialExecutor$Task.run(SerialExecutor.java:91)
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
08-18 21:36:12.238 6226 6276 I WM-WorkerWrapper: at java.lang.Thread.run(Thread.java:919)
@angelo-fuchs Is that correct?
We need to check if isNetworkConnected is called everywhere it should be.
In general, by design, we do _not_ check if network is available or not before doing something on the network.
the reason for that is that these checks are not reliable. in flaky networks, the shift between "network available" and "no network" may be so fast, that there is not much sense to check the state before trying doing something. the state may have changed after we checked the state ;)
so, the approach of the core is just to "keep trying" - of course, with larger backoff timeouts. only if android signals that there is "potentially some network state change" - eg. because of wifi connected, we pass these information to the core, that resets the backoff timeouts then (roughly).
(just some general background information, i did not read through all logs in this issue, might be there is another bug that is fixable from android side)
@r10s Thank you for your explanation, I thought something similar after reading the logs, which went in the "try and error" direction. ;) Nevertheless I believe that the issue is caused by the cancellation of FetchWorker. (See my comment above.) When this happens, we have these two lines afterwards:
08-17 15:29:42.212 6355 6407 W DeltaChat: src/imap/idle.rs:110: Idle wait errored: Io(Os { code: 110, kind: TimedOut, message: "Connection timed out" })
08-17 15:29:42.232 6355 6407 W DeltaChat: src/scheduler.rs:147: IMAP IDLE protocol failed to init/complete: io: Broken pipe (os error 32)
FetchWorker was "killed" before by enabling airplane mode, but it seems that the rest is not aware of it, so procedures are continuing (and failing):
08-17 15:29:42.276 6355 6407 I DeltaChat: src/imap/select_folder.rs:32: Expunge messages in "DeltaChat".
08-17 15:29:42.277 6355 6407 W DeltaChat: src/imap/mod.rs:342: failed to close folder: NoSession
Of course, this is just my personal explanation of what is happening in the background, because I cannot read the code. ;)
OK, regarding FetchWorker, it seems that I followed the wrong path, because the task is no longer cancelled after Delta Chat is excluded from battery optimization.
But the remaining question is: Why are Expunge operations still initiated shortly after "Connection timed out"?
On Tue, Aug 18, 2020 at 14:07 -0700, bjoern wrote:
so, the approach of the core is just to "keep trying" - of course, with larger backoff timeouts.
Are you sure we do exponential backoff for smtp and imap connection attempts?
Are you sure we do exponential backoff for smtp and imap connection attempts?
no :)
my comment was more about that network-state-checking might not be as helpful as one might assume.
however, the job is retried at some point in time, imu, this implies trying to set up the connection not later than that. but maybe earlier for $reason.
@r10s OK, that seems to be the case. But what I still cannot understand is why Expunge operations are called immediately after "Connection timed out" each and every time, just milliseconds afterwards. To me it looks like a trigger, but still it does not make sense to launch further network operations immediately after it has been clear that no network is available.
r10s:
We need to check if isNetworkConnected is called everywhere it should be.
In general, by design, we do _not_ check if network is available or not before doing something on the network.
the reason for that is that these checks are not reliable. in flaky networks, the shift between "network available" and "no network" may be so fast, that there is not much sense to check the state before trying doing something. the state may have changed after we checked the state ;)
so, the approach of the core is just to "keep trying" - of course, with larger backoff timeouts. only if android signals that there is "potentially some network state change" - eg. because of wifi connected, we pass these information to the core, that resets the backoff timeouts then (roughly).
I can confirm this simple "keep trying" and therefore I opened #1580. For a while now I'm trying to get a better approach than that. But shure @r10s is right: network available information from device is not as reliable as we could think! I'm faced very often by a flaky network so this issue came up.
I added some more logs to the code to see what's going on.
My new approach is based on the "network available" status again. But simply control this information from device events is not sufficient. It needs to be feed by connect and retry incidents too to come to a more reliable information!
Any network activity is then done or not based on that new status.
I'm working on that and see some better results meanwhile, but I'm still not confident with it !
get_backoff_time_offset(tries) function returns a very random backoff time. I replaced it by a more predictable version:
fn get_backoff_time_offset(tries: u32) -> i64 {
//~let n = 2_i32.pow(tries - 1) * 60;
//~let mut rng = thread_rng();
//~let r: i32 = rng.gen();
//~let mut seconds = r % (n + 1);
//~if seconds < 3 {
//~seconds = 3;
//~}
let seconds = match tries {
0 => 3,
1..=4 => tries * 60,
5..=6 => tries * 180,
7..=9 => tries * 600,
10..=12 => tries * 1200,
13..=17 => tries * 2400,
_ => tries * 3600,
};
seconds as i64
}
keep trying ...