Ktor: Sporadic OkHttp errors after upgrading to ktor 1.3.1

Created on 9 Mar 2020  Â·  7Comments  Â·  Source: ktorio/ktor

Ktor Version and Engine Used (client or server and name)

implementation("io.ktor:ktor-client-core-jvm:1.3.1")
implementation("io.ktor:ktor-client-core:1.3.1")
implementation("io.ktor:ktor-client-jackson:1.3.1")
implementation("io.ktor:ktor-client-logging-jvm:1.3.1")
implementation("io.ktor:ktor-client-okhttp:1.3.1")
implementation("io.ktor:ktor-jackson:1.3.1")
implementation("io.ktor:ktor-metrics-micrometer:1.3.1")
implementation("io.ktor:ktor-metrics:1.3.1")
implementation("io.ktor:ktor-server-host-common:1.3.1")
implementation("io.ktor:ktor-server-netty:1.3.1")

Describe the bug

We've started seeing sporadic OkHttp exceptions in our tests when we upgraded from 1.3.0 to 1.3.1. Upon downgrading back to 1.3.0 (with no other changes) the tests were fine again.

The exception is:

java.io.EOFException: \n not found: limit=0 content=…
    at o.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:240)
    at o.i.h.Http1ExchangeCodec.readHeaderLine(Http1ExchangeCodec.java:242)
    at o.i.h.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.java:213)
    ... 20 common frames omitted
Wrapped by: java.io.IOException: unexpected end of stream on http://called-service/...
    at o.i.h.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.java:236)
    at o.i.c.Exchange.readResponseHeaders(Exchange.java:115)
    at o.i.h.CallServerInterceptor.intercept(CallServerInterceptor.java:94)
    at o.i.h.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
    at o.i.c.ConnectInterceptor.intercept(ConnectInterceptor.java:43)
    at o.i.h.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
    at o.i.h.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
    at o.i.c.CacheInterceptor.intercept(CacheInterceptor.java:94)
    at o.i.h.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
    at o.i.h.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
    at o.i.h.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at o.i.h.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
    at o.i.h.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
    at o.i.h.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
    at o.i.h.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:221)
    at o.RealCall$AsyncCall.execute(RealCall.java:172)
    at o.i.NamedRunnable.run(NamedRunnable.java:32)
    at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(Thread.java:834)

It looks like the HTTP client receives an EOF while reading the response headers from calling http://called-service/... I don't think the culprit is the remote web server though, because this server is called from other services and it's only the one with ktor 1.3.1 that throws these errors. And this one is also fine with ktor 1.3.0.

This issue has already been raised before in the OkHttp repository, for example here.

From what I can see, both ktor 1.3.0 and 1.3.1 both depend on com.squareup.okhttp3:okhttp:3.14.2, so maybe this is due to a change in the way ktor uses OkHttp?

To Reproduce

Not easy to reproduce, unfortunately, because the error only occurs in a small percentage of requests (~5%). I'm hoping someone else observes this issue and can come up with a reproducible test.

Client bug

Most helpful comment

I've also got this issue.

  • Ktor 1.3.2
  • OkHttp 4.4.1
  • Ktor config is as simple as HttpClient(OkHttp).

The issues does not occur if I send a Connection: close header with each request.
Given that it's likely an issue related to connection reuse.

I've managed to reproduce it on Android as well as on a regular JVM.
With the following code, the second request fails 100% for me:

HttpClient(OkHttp).use { httpClient ->
    coroutineScope {
        repeat(10) {
            try {
                val data = httpClient.request<ByteArray>("https://***hidden***")
                println("Received ${data.size} bytes.")
            } catch (e: Throwable) {
                e.printStackTrace()
            }

            delay(5_000)
        }
    }
}

Unfortunately I cannot share the URL (publicly).

If the delay between requests is >= 5 seconds and < 5 minutes, then every other request will fail.

  • 5 minutes is the default maximum keep-alive time of OkHttpClient.
  • 5 seconds is probably some server-internal keep-alive limit.

My guess is that the request fails with that EOFException if a connection is still in the connection pool for reuse but once it's actually used the server closes it because it has timed out server-side. Due to that only every other request is affected.

What's confusing is that I don't have that issue if I use OkHttpClient directly. The same test (request + 5 second waiting + request) works perfectly fine there, so the problem is limited to Ktor for me.

Update

Ktor automatically sets OkHttpClient's retryOnConnectionFailure to false.
As per its documentation, this is one of the cases that is handled by the retry logic:

Stale pooled connections. The ConnectionPool reuses sockets to decrease request latency, but these connections will occasionally time out.

When I set retryOnConnectionFailure to false in an OkHttpClient I can reproduce the issue there even without Ktor!

So the workaround is pretty simple:

HttpClient(OkHttp) {
    engine {
        config {
            retryOnConnectionFailure(true)
        }
    }
}

All 7 comments

Do you have a preconfigured okhttp client?

Is the "called-service" running under ktor as well? Is it upgraded as well?

The called service is running a Wiremock server.

We don't use a "preconfigured" OkHttp client, but we do some heavy customisation on the HttpClient. I'm going to test with some simplified settings and will let you know.

Update:

I can confirm the errors still occur when HttpClient is simply configured with:

HttpClient(OkHttp) {
    install(JsonFeature) {
        serializer = JacksonSerializer(jacksonObjectMapper(), ::jacksonConfiguration)
    }
}

I just got this error on this app with 1.3.2 using OkHttp. It's seemingly random. Maybe once every 50 requests.

java.io.IOException: unexpected end of stream on https://api.github.com/...
        at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.java:236)
        at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.java:115)
        at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:94)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:43)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:221)
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:172)
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
        at java.lang.Thread.run(Thread.java:923)
     Caused by: java.io.EOFException: \n not found: limit=0 content=…
        at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:240)
        at okhttp3.internal.http1.Http1ExchangeCodec.readHeaderLine(Http1ExchangeCodec.java:242)
        at okhttp3.internal.http1.Http1ExchangeCodec.readResponseHeaders(Http1ExchangeCodec.java:213)
        at okhttp3.internal.connection.Exchange.readResponseHeaders(Exchange.java:115) 
        at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:94) 
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) 
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:43) 
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) 
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) 
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) 
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) 
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) 
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) 
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) 
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) 
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) 
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) 
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:221) 
        at okhttp3.RealCall$AsyncCall.execute(RealCall.java:172) 
        at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641) 
        at java.lang.Thread.run(Thread.java:923) 

I've also got this issue.

  • Ktor 1.3.2
  • OkHttp 4.4.1
  • Ktor config is as simple as HttpClient(OkHttp).

The issues does not occur if I send a Connection: close header with each request.
Given that it's likely an issue related to connection reuse.

I've managed to reproduce it on Android as well as on a regular JVM.
With the following code, the second request fails 100% for me:

HttpClient(OkHttp).use { httpClient ->
    coroutineScope {
        repeat(10) {
            try {
                val data = httpClient.request<ByteArray>("https://***hidden***")
                println("Received ${data.size} bytes.")
            } catch (e: Throwable) {
                e.printStackTrace()
            }

            delay(5_000)
        }
    }
}

Unfortunately I cannot share the URL (publicly).

If the delay between requests is >= 5 seconds and < 5 minutes, then every other request will fail.

  • 5 minutes is the default maximum keep-alive time of OkHttpClient.
  • 5 seconds is probably some server-internal keep-alive limit.

My guess is that the request fails with that EOFException if a connection is still in the connection pool for reuse but once it's actually used the server closes it because it has timed out server-side. Due to that only every other request is affected.

What's confusing is that I don't have that issue if I use OkHttpClient directly. The same test (request + 5 second waiting + request) works perfectly fine there, so the problem is limited to Ktor for me.

Update

Ktor automatically sets OkHttpClient's retryOnConnectionFailure to false.
As per its documentation, this is one of the cases that is handled by the retry logic:

Stale pooled connections. The ConnectionPool reuses sockets to decrease request latency, but these connections will occasionally time out.

When I set retryOnConnectionFailure to false in an OkHttpClient I can reproduce the issue there even without Ktor!

So the workaround is pretty simple:

HttpClient(OkHttp) {
    engine {
        config {
            retryOnConnectionFailure(true)
        }
    }
}

Please check the following ticket on YouTrack for follow-ups to this issue. GitHub issues will be closed in the coming weeks.

This issue still persist on ktor release 1.5.2.

A work-around can be a custom ConnectionPool like the following:

HttpClient(OkHttp) {
    engine {
        clientCacheSize = 0
        config {
            connectionPool(ConnectionPool(5, 10, TimeUnit.SECONDS))
        }
    }
}

The default should be ConnectionPool(5, 5, TimeUnit.MINUTES).

Was this page helpful?
0 / 5 - 0 ratings