I have an Android app written in Kotlin that talks to an Express Node.js API via mutual authentication. The app is designed to keep collecting data while the screen is off or on, survive reboots, etc. This is done with the user's consent.
This is occurring on an Android 8.1 Samsung tablet that has no other apps installed (other than the standard Samsung installs).
After the screen has been off for some time (anywhere from 20 minutes to 3 hours), we get the exception below until we restart the app or recreate the Retrofit object. We left it running in this error state for more than 8 hours and it never recovered. The foreground service is still running and collecting data without an issue.
The issue is not a server problem: I had a shell script curl the server with the same parameters, and that worked for ~16 hours.
We have tried:
100 items, which translates to a ~120 KB body)
Workaround:
When this exception is thrown, we create a new Retrofit service, drain all connections, then resume posting to the API.
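The workaround above can be sketched in plain Kotlin roughly as follows. This is only an illustration of the pattern, not the code from the app: `RebuildingHolder` is a hypothetical helper, and the factory lambda stands in for `RetrofitFactory.makeRetrofitService(context, true)`. In a real client, idle sockets held by the old `OkHttpClient` could additionally be dropped with OkHttp's `ConnectionPool.evictAll()`.

```kotlin
import javax.net.ssl.SSLException

/**
 * Hypothetical helper: holds a service instance and replaces it with a fresh
 * one from [factory] whenever a call fails with an SSLException.
 */
class RebuildingHolder<T : Any>(private val factory: () -> T) {
    @Volatile
    private var current: T = factory()

    /** Number of times the service has been rebuilt so far. */
    @Volatile
    var rebuilds: Int = 0
        private set

    /**
     * Runs [call] against the current instance. On SSLException the instance
     * is replaced and the exception is rethrown, so the caller decides
     * whether and when to retry.
     */
    fun <R> use(call: (T) -> R): R {
        val service = current
        try {
            return call(service)
        } catch (e: SSLException) {
            rebuild(service)
            throw e
        }
    }

    @Synchronized
    private fun rebuild(failed: T) {
        // Rebuild only if no other caller has already replaced the failed instance.
        if (current === failed) {
            current = factory()
            rebuilds++
        }
    }
}
```

Rethrowing keeps the retry decision with the caller, which matches the description above of draining in-flight connections before posting resumes.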
Versions used:
implementation 'com.squareup.retrofit2:retrofit:2.6.1'
implementation 'com.squareup.retrofit2:converter-gson:2.6.1'
implementation 'com.squareup.okhttp3:okhttp:4.2.0'
implementation 'com.squareup.okhttp3:logging-interceptor:4.2.0'
implementation 'com.squareup.okhttp3:okhttp-brotli:4.2.0'
Stack Trace:
javax.net.ssl.SSLException: Read error: ssl=0xa65d7940: I/O error during system call, Connection reset by peer
at com.android.org.conscrypt.NativeCrypto.SSL_read(Native Method)
at com.android.org.conscrypt.SslWrapper.read(SslWrapper.java:391)
at com.android.org.conscrypt.ConscryptFileDescriptorSocket$SSLInputStream.read(ConscryptFileDescriptorSocket.java:567)
at okio.InputStreamSource.read(Okio.kt:102)
at okio.AsyncTimeout$source$1.read(AsyncTimeout.kt:159)
at okio.RealBufferedSource.indexOf(RealBufferedSource.kt:349)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.kt:222)
at okhttp3.internal.http1.Http1ExchangeCodec.readHeaderLine(Http1ExchangeCodec.kt:210)
at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.kt:181)
at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.kt:105)
at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.kt:82)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:112)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:37)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:112)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:87)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:82)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:112)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:87)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:84)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:112)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:71)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:112)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:87)
at com.someCompany.someApp.api.RetrofitFactory$logoutInterceptor$$inlined$invoke$1.intercept(Interceptor.kt:75)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:112)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:87)
at okhttp3.brotli.BrotliInterceptor.intercept(BrotliInterceptor.kt:39)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:112)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:87)
at okhttp3.logging.HttpLoggingInterceptor.intercept(HttpLoggingInterceptor.kt:215)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:112)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:87)
at com.someCompany.someApp.api.RetrofitFactory$headersInterceptor$$inlined$invoke$1.intercept(Interceptor.kt:75)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:112)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:87)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.kt:184)
at okhttp3.RealCall$AsyncCall.run(RealCall.kt:136)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1162)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
at java.lang.Thread.run(Thread.java:764)
What the server sees:
BadRequestError: request aborted
at IncomingMessage.onAborted (/app/node_modules/raw-body/index.js:231:10)
at emitNone (events.js:106:13)
at IncomingMessage.emit (events.js:208:7)
at abortIncoming (_http_server.js:445:9)
at socketOnClose (_http_server.js:438:3)
at emitOne (events.js:121:20)
at TLSSocket.emit (events.js:211:7)
at _handle.close (net.js:561:12)
at Socket.done (_tls_wrap.js:360:7)
at Object.onceWrapper (events.js:315:30)
at emitOne (events.js:116:13)
at Socket.emit (events.js:211:7)
at TCP._handle.close [as _onclose] (net.js:561:12)
RetrofitFactory:
object RetrofitFactory {
    fun makeRetrofitService(context: Context, isAuthenticated: Boolean = false): RetrofitService {
        val baseUrl: String = if (BuildConfig.DEBUG) {
            "https://api.staging.someCompany.com/"
        } else {
            "https://api-api.someCompany.com/"
        }
        return Retrofit.Builder().baseUrl(baseUrl)
            .addConverterFactory(GsonConverterFactory.create(makeGson()))
            .client(makeClient(context, isAuthenticated))
            .build().create(RetrofitService::class.java)
    }

    private fun makeGson(): Gson {
        return GsonBuilder().excludeFieldsWithModifiers(Modifier.TRANSIENT).create()
    }

    fun makeClient(context: Context, isAuthenticated: Boolean): OkHttpClient {
        val hostnameVerifier = HostnameVerifier { hostname, _ ->
            HttpsURLConnection.getDefaultHostnameVerifier().run {
                if (BuildConfig.DEBUG) {
                    hostname == "relay-api.staging.someCompany.com"
                } else {
                    hostname == "relay-api.someCompany.com"
                }
            }
        }
        val sslAndMgr = if (isAuthenticated) {
            clientAuthSslContext(context)
        } else {
            basicSslContext(context)
        }
        return OkHttpClient.Builder()
            .connectTimeout(15, TimeUnit.SECONDS)
            .readTimeout(15, TimeUnit.SECONDS)
            .sslSocketFactory(sslAndMgr.first, sslAndMgr.second)
            .addInterceptor(headersInterceptor())
            .addInterceptor(loggingInterceptor())
            .addInterceptor(BrotliInterceptor)
            .addInterceptor(logoutInterceptor())
            .hostnameVerifier(hostnameVerifier)
            .retryOnConnectionFailure(true)
            .build()
    }

    private fun loggingInterceptor() = HttpLoggingInterceptor().apply {
        level =
            if (BuildConfig.DEBUG) HttpLoggingInterceptor.Level.BODY else HttpLoggingInterceptor.Level.NONE
    }

    private fun logoutInterceptor() = Interceptor { chain ->
        val mainResponse = chain.proceed(chain.request())
        if (mainResponse.code == 401) {
            Certificates(someApp().applicationContext).logout()
        }
        mainResponse
    }

    private fun headersInterceptor() = Interceptor { chain ->
        chain.proceed(
            (chain.request().newBuilder()
                .addHeader("Accept", "application/json")
                .addHeader("Accept-Language", "en")
                .addHeader("Content-Type", "application/json").build()
            )
        )
    }

    private fun basicSslContext(context: Context): Pair<SSLSocketFactory, X509TrustManager> {
        val certMgr = Certificates(context)
        val keyStore = KeyStore.getInstance(KeyStore.getDefaultType())
        keyStore.load(null, null)
        keyStore.setCertificateEntry("serverCert", certMgr.loadServerChain())
        val trustMgrFactory =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm())
        trustMgrFactory.init(keyStore)
        val sslContext = SSLContext.getInstance("TLS")
        sslContext.init(null, trustMgrFactory.trustManagers, null)
        return Pair(sslContext.socketFactory, trustMgrFactory.trustManagers[0] as X509TrustManager)
    }

    private fun clientAuthSslContext(context: Context): Pair<SSLSocketFactory, X509TrustManager> {
        val certMgr = Certificates(context) // This is a helper that loads certificates/PKs as needed
        val keyStore = KeyStore.getInstance(KeyStore.getDefaultType())
        keyStore.load(null, null)
        keyStore.setCertificateEntry("ca", certMgr.loadServerChain())
        keyStore.setCertificateEntry("client", certMgr.loadClientCert())
        keyStore.setKeyEntry(
            "private",
            certMgr.loadPrivateKey(),
            null,
            arrayOf(certMgr.loadClientCert(), certMgr.loadCA())
        )
        val kmf: KeyManagerFactory = KeyManagerFactory.getInstance("X509")
        kmf.init(keyStore, null)
        val trustMgrFactory =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm())
        trustMgrFactory.init(keyStore)
        val sslContext = SSLContext.getInstance("TLS")
        sslContext.init(kmf.keyManagers, trustMgrFactory.trustManagers, null)
        // DelegatingSSLSocketFactory from https://github.com/square/okhttp/blob/master/okhttp/src/test/java/okhttp3/DelegatingSSLSocketFactory.java
        return Pair(object : DelegatingSSLSocketFactory(sslContext.socketFactory) {
            @Throws(
                IOException::class
            )
            override fun configureSocket(sslSocket: SSLSocket): SSLSocket {
                sslSocket.sslParameters.needClientAuth = true
                sslSocket.needClientAuth = true
                return super.configureSocket(sslSocket)
            }
        }, trustMgrFactory.trustManagers[0] as X509TrustManager)
    }
}
Where we actually call the api:
val res = CoroutineScope(Dispatchers.IO).launch {
    try {
        // call Room DB and get list of items out
        val body = BulkRelayRequest(items.toTypedArray())
        val response = apiService.postBulkRelayLogs(body)
        if (response.isSuccessful) {
            // update room DB
            Log.i(TAG, "Posted ***** with ids ${response.body()!!.payload.ids}")
        } else {
            // failure :(
            val eParams = Bundle()
            firebaseAnalytics.logEvent("API_SEND_FAILED", eParams)
            Log.e(TAG, "Error: failed to post ****** to API")
        }
    } catch (e: SSLException) {
        isDraining = true
        apiService = RetrofitFactory.makeRetrofitService(context, true)
        Log.e(TAG, "SSL Error occurred. Draining connections and restarting retrofit")
    } catch (e: Throwable) {
        val eParams = Bundle()
        firebaseAnalytics.logEvent("API_SEND_FAILED", eParams)
        Log.e(TAG, "Error: failed to post *** to API and threw an exception", e)
    } finally {
        connectionsActive--
    }
}
res.ensureActive()
Can you reproduce on other devices?
Yes, on other Samsung 8.1 devices. We were starting to test on other versions, but we developed the workaround, so we concluded our testing.
It's unclear what action we should take on this. Can you provide an executable test case to reproduce?
Square maintains this library, and part of maintaining it is fixing bugs. You're more familiar with the library and would be able to get to the bottom of this much faster. If it was something obvious, sure, but I have no idea what the root cause of this exception is.
Me neither to be honest.
Does the problem occur when you use HttpsURLConnection to talk to the same server?
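For anyone who wants to run that comparison, a minimal sketch of an equivalent JSON POST over `HttpsURLConnection` might look like the following. The function names `prepareJsonPost` and `postJson` are hypothetical, and the `SSLSocketFactory` parameter is assumed to be the same client-auth factory the OkHttp client is built with. The custom hostname verifier from the original code is not replicated here.

```kotlin
import java.net.URL
import javax.net.ssl.HttpsURLConnection
import javax.net.ssl.SSLSocketFactory

/**
 * Prepares a JSON POST over HttpsURLConnection. Nothing is sent until the
 * caller writes the body or reads the response.
 */
fun prepareJsonPost(
    url: String,
    socketFactory: SSLSocketFactory? = null // assumed: the app's client-auth factory
): HttpsURLConnection {
    val conn = URL(url).openConnection() as HttpsURLConnection
    conn.requestMethod = "POST"
    conn.doOutput = true
    conn.connectTimeout = 15_000 // same 15 s timeouts as the OkHttp client
    conn.readTimeout = 15_000
    conn.setRequestProperty("Accept", "application/json")
    conn.setRequestProperty("Content-Type", "application/json")
    if (socketFactory != null) conn.sslSocketFactory = socketFactory
    return conn
}

/** Sends [json] as the request body and returns the HTTP status code. */
fun postJson(conn: HttpsURLConnection, json: String): Int {
    try {
        conn.outputStream.use { it.write(json.toByteArray(Charsets.UTF_8)) }
        return conn.responseCode
    } finally {
        conn.disconnect()
    }
}
```

If the same SSLException appears with this path after the screen has been off, that would point away from OkHttp's connection handling and toward the platform SSL stack or the network.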
@atotalnoob Your setup is incredibly custom; we won't be able to reproduce this without being able to run your client code against your server. Generally I can guess at 3 classes of problems:
1) Your client code e.g. something broken with your cert selection logic or similar.
2) OkHttp - client auth and SSL Session management.
3) Bug in Android SSL etc.
To test 3 you could try conscrypt-android, e.g. https://github.com/square/okhttp/pull/5473/files. This only helps if a later Conscrypt release has fixed it.
To confirm 2 we need an isolated reproduction against MockWebServer.
To confirm 1 you are on your own at the moment. Add more debugging, tell us what would help etc. Maybe pull out more bits to reuse like the TrustManager, SSLContext and tell us at which point exactly the workaround stops working.
Lastly extract this code and run it on the JVM and use the println style SSL tracing they provide.
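On a desktop JVM, the SSL tracing mentioned above is controlled by the `javax.net.debug` system property. A minimal sketch, assuming the extracted client code runs in the same process; the helper name `enableSslTracing` is hypothetical:

```kotlin
/**
 * Turns on the JVM's built-in TLS debug output. This should be called before
 * the first TLS connection is made, since the JSSE implementation reads the
 * property when it initializes.
 */
fun enableSslTracing(categories: String = "ssl,handshake") {
    System.setProperty("javax.net.debug", categories)
}
```

Passing `-Djavax.net.debug=ssl,handshake` on the command line has the same effect. This applies to the desktop JVM only, not to Android's Conscrypt-based stack.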
I will work on 2 and 3 and report back.
No action to take on this for now. Will reopen once we have new information.
I have reproduced the same issue.

When retryOnConnectionFailure(true/false) is called on the OkHttpClient builder (from the okhttp3 package), the client sporadically throws the SSLException.
@atotalnoob, you should try removing this call from your OkHttpClient.Builder and see if the problem disappears.
Regards
@swankjesse - you have new information. Please reopen.
@maxpinto - Thanks for this, currently testing it on devices
Can you provide an executable test case to reproduce this? We use the device's built-in SSL stack and it looks like this device has an SSL bug.
Guys, I have the same issue. I can reproduce it on Samsung EVERY time on my Wi-Fi network (just wait 15-20 min), and sometimes on other devices when they are on a plain mobile connection.
Build info:
build.manufacturer: samsung
build.model: SM-A705FN
build.cpu_abi: arm64-v8a
okHttpVersion = '3.12.1'
retrofitVersion = '2.5.0'
I don't use retryOnConnectionFailure, only a retry operator in the Rx chain.
@NooAn upgrade to 4.4.0?
@swankjesse of course I did ;) I did it again and got the same result.
I suppose the problem is Doze mode, because it no longer reproduces when I have a USB connection, or when I switch off battery saver on the Samsung. Also, when I hit this bug and then connected my phone to a computer via USB, the network connection was restored. There is also some difference between Wi-Fi and mobile networks.
So, once the app has been in the background, I always get java.net.SocketTimeoutException: timeout, except in the few cases above.
The problem still exists for us, even with the suggested fix.
We also get it on non-Samsung devices.
We also get the issue in the foreground, not when the device goes into Doze.
I'm having the same experience described above. It's always either SocketTimeoutException or SSLException, and retryOnConnectionFailure(true) doesn't appear to change anything. If I catch the exception and "manually" execute the operation again, it will sometimes succeed after the first try has failed.
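The "manual" retry described above could be sketched as a small helper like this. It is an illustration only: `retryOnConnectionErrors` and its parameters are hypothetical names, and the backoff values are arbitrary.

```kotlin
import java.io.IOException
import java.net.SocketTimeoutException
import javax.net.ssl.SSLException

/**
 * Runs [block] up to [maxAttempts] times, retrying only on the two exception
 * types reported in this thread. Any other exception propagates immediately.
 */
fun <T> retryOnConnectionErrors(
    maxAttempts: Int = 3,
    initialDelayMs: Long = 500,
    block: (attempt: Int) -> T
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMs = initialDelayMs
    var lastError: IOException? = null
    for (attempt in 1..maxAttempts) {
        try {
            return block(attempt)
        } catch (e: IOException) {
            if (e !is SSLException && e !is SocketTimeoutException) throw e
            lastError = e
        }
        if (attempt < maxAttempts) {
            Thread.sleep(delayMs) // simple blocking backoff between attempts
            delayMs *= 2
        }
    }
    // Reached only when every attempt failed with a retryable exception.
    throw checkNotNull(lastError)
}
```

Inside a coroutine, a suspending `delay` would be a better fit than `Thread.sleep`. As the comment above notes, a retry only sometimes succeeds, so this reduces the symptom rather than addressing the cause.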
This likely has to do with Android's Doze mode. If the app has battery optimization disabled, the issue almost never happens but it's challenging to inform users to do so.