ktor-client-apache: "Decoder shouldn't be completed while the coroutine is on suspension" exceptions

Created on 21 Apr 2020  路  14Comments  路  Source: ktorio/ktor

Ktor Version and Engine Used (client or server and name)
ktor client 1.3.2, Apache

Describe the bug
After upgrading from 1.2.6 to 1.3.2, requests in our JVM client app started sporadically failing with

Caused by: java.util.concurrent.CancellationException: Failed to execute request
    at kotlinx.coroutines.ExceptionsKt.CancellationException(Exceptions.kt:22)
    at io.ktor.client.engine.apache.ApacheResponseConsumerDispatching.failed(ApacheResponseConsumerDispatching.kt:131)
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.executionFailed(DefaultClientExchangeHandlerImpl.java:99)
    at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:426)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.exception(HttpAsyncRequestExecutor.java:155)
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:276)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:226)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
    ... 1 more
Caused by: java.lang.IllegalStateException: Decoder shouldn't be completed while the coroutine is on suspension
    at io.ktor.client.engine.apache.ApacheResponseConsumerDispatching.consumeContent(ApacheResponseConsumerDispatching.kt:104)
    at org.apache.http.impl.nio.client.MainClientExec.consumeContent(MainClientExec.java:329)
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.consumeContent(DefaultClientExchangeHandlerImpl.java:157)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:325)
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
    ... 8 more

Perhaps https://github.com/ktorio/ktor/commit/f9eab658d3fc6a788008b1fb53ea87804a8ca280 is relevant?

To Reproduce
Run many ktor-client requests in production.

Expected behavior
No internal errors.

bug

Most helpful comment

Hi @e5l & co: we have been seeing this issue quite frequently but it was sporadic and evasive as mentioned in @glasser 鈥榮 initial report. After plenty of investigation we finally managed to find a stable way to reproduce the issue. It seems to be caused by specific response lengths (e.g. of 8175+1, 12263+1 or 16351+1 bytes).

On my Mac/JDK 8 setup I can reproduce the issue with this call:

HttpClient(Apache).request<String>("https://raw.githubusercontent.com/ssp/ktor/4f1986df68e3594714ea12949c8af8274be99d01/ktor-client/ktor-client-apache/jvm/test/io/ktor/client/engine/apache/8175.txt")

provided in this branch:

https://github.com/ssp/ktor/tree/8175-decoder

Unfortunately I couldn鈥檛 figure out the root cause yet as the issue seems to disappear when I start using breakpoints and I am not familiar with the inner workings of the client.

All 14 comments

Hi @e5l & co: we have been seeing this issue quite frequently but it was sporadic and evasive as mentioned in @glasser 鈥榮 initial report. After plenty of investigation we finally managed to find a stable way to reproduce the issue. It seems to be caused by specific response lengths (e.g. of 8175+1, 12263+1 or 16351+1 bytes).

On my Mac/JDK 8 setup I can reproduce the issue with this call:

HttpClient(Apache).request<String>("https://raw.githubusercontent.com/ssp/ktor/4f1986df68e3594714ea12949c8af8274be99d01/ktor-client/ktor-client-apache/jvm/test/io/ktor/client/engine/apache/8175.txt")

provided in this branch:

https://github.com/ssp/ktor/tree/8175-decoder

Unfortunately I couldn鈥檛 figure out the root cause yet as the issue seems to disappear when I start using breakpoints and I am not familiar with the inner workings of the client.

Ah, multiples of 4088? Does 4088 have the same problem?

(Grepping the repo for 4088 finds that it is a magic number used in some of the ktor-io byte channel stuff...)

@glasser no it didn't seem to happen for 4088, 8176 was the first occurrence my test found

Hi @ssp, thanks for the investigation!

I'm getting a similar error with a system that makes a 20,000 queries it happens every time. If I reduce that number to 10,000 it happens less often (but not by a factor 0.5)

@e5l any news about this? Unfortunately I didn鈥檛 understand the low-level handling of the data in the code, so I am not sure I can offer help here. As I keep seeing this error frequently with the ktor clients I use: is there any way to work around it?

The issue is fixed in master and will be available in the next release!
Closed

thanks @e5l, looking forward to that release :D

Any idea on when the next release will be? I don't know how to work around this one.

@e5l Please don't feel the need to reply if you're too busy. My team is currently dealing with this issue in one of our production systems, and if you could point me to the commit(s) which address the issue, that would be most helpful.

Hey, we're going to release the fix on 17 Aug.

We are also running into this issue. I'd just like to question why the release of the fix will take until 17th Aug, if the patch is already implemented. Not wanting to sound rude or anything - I also don't know the development process behind ktor - but if the goal is that ktor shall be used in enterprise-critical production systems - which I assume it is - such patches should be made available as hotfix as fast as possible.

@weickmanna Fully agree and understand, and part of our efforts with Ktor is precisely this. We'll be sharing our roadmap very soon. In regard to this actual hotfix, the issue is that we've got a bunch of open things and it's kind of difficult right now to do a release, but part of our plan is to actually improve this and streamline the process to make sure it's faster and more frequent. I'm sorry for the problems cause, but bear with us for a little bit. Thanks.

Was this page helpful?
0 / 5 - 0 ratings