Ktor: Opening a client websocket session freezes after many failed attempts

Created on 13 Feb 2019 · 11 Comments · Source: ktorio/ktor

Ktor Version

1.1.2

Ktor Engine Used(client or server and name)

CIO client

JVM Version, Operating System and Relevant Context

JVM 1.8, macOS

Feedback

For my Android app, I am trying to make the WebSocket client reconnect after a failure, in case the user has no internet connection or a bad one. If calling HttpClient.wss or HttpClient.ws fails, I catch the error, delay for a short time, and try again. The problem is that after a certain number of reconnect attempts (usually between 5 and 30), calling HttpClient.wss or HttpClient.ws causes the program to stall (no errors are thrown).
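The reconnect strategy described above (catch the failure, wait briefly, try again) can be sketched roughly as follows. This is a minimal sketch and not the app's actual code: `retryWithDelay`, `maxAttempts`, and `delayMillis` are hypothetical names, and the lambda that throws is only a stand-in for a call to HttpClient.wss.

```kotlin
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking

// Hypothetical helper: runs `block` until it succeeds or `maxAttempts` is reached,
// suspending for `delayMillis` between failed attempts.
suspend fun <T> retryWithDelay(
    maxAttempts: Int,
    delayMillis: Long,
    block: suspend (attempt: Int) -> T
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var lastError: Exception? = null
    for (attempt in 1..maxAttempts) {
        try {
            return block(attempt)
        } catch (e: CancellationException) {
            throw e // never swallow cancellation, or the coroutine cannot be stopped
        } catch (e: Exception) {
            lastError = e
            if (attempt < maxAttempts) delay(delayMillis)
        }
    }
    // Every attempt failed: rethrow the most recent error.
    throw lastError!!
}

fun main() = runBlocking {
    // Stand-in for HttpClient.wss(...): fails twice, then "connects".
    val result = retryWithDelay(maxAttempts = 5, delayMillis = 100) { attempt ->
        if (attempt < 3) throw java.io.IOException("simulated connection failure")
        "connected on attempt $attempt"
    }
    println(result) // prints "connected on attempt 3"
}
```

Bounding the number of attempts is a design choice of this sketch; an app that should reconnect indefinitely would loop until cancelled instead.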

I think it is related to ktor improperly closing the WebSocket session after a call to HttpClient.wss or HttpClient.ws fails. I think I have simplified the issue as much as I can.
For example, forcing an error by using a bad WebSocket host address:

fun main() = runBlocking {
    for (i in 0 .. 100) {
        GlobalScope.launch(Dispatchers.Default) {
            connect(i)
        }.join()
    }
    return@runBlocking
}

suspend fun connect(i: Int){
    val client = HttpClient(CIO).config {
        install(WebSockets)
    }
    try {
        client.wss(host = "test.badaddress.org") {
            send(Frame.Text("Hello World"))

            for (message in incoming.map { it as? Frame.Text }.filterNotNull()) {
                println(message.readText())
                break
            }
        }
    } catch (t: Throwable) {
        // Intentionally ignored: the connection is expected to fail for the bad host.
    }
    println(i)
    client.close()
}

produces the following output and then stalls:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

But if you comment out client.close() in the above program, the full program runs, producing the numbers 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100

Additionally, if the above program is modified to use a valid WebSocket host address, for example by replacing host = "test.badaddress.org" with host = "echo.websocket.org", it runs as expected, producing

Hello World
0
Hello World
1
Hello World
2
Hello World
...
97
Hello World
98
Hello World
99
Hello World
100
bug

All 11 comments

Hi @luca992, thanks for the report.
The issue is fixed in master https://github.com/ktorio/ktor/commit/2b56035c611100f62fbc9374ea3c2610f5b67286 :)

@e5l hey I should have mentioned I saw that commit and I tried running the above program using ktor built from master. I still get the same result.

Thanks for the notice, I'll try to investigate.

@e5l
I don't know whether this is the same issue or not (maybe I should report it to the coroutines maintainers). Please take a look.
This code:

while (true) {
    Thread.sleep(1000)
    CoroutineScope(Dispatchers.Default).launch {
        println("started")
        delay(100)
        println("delay ended")
        HttpClient(CIO).close()
        println("created and closed")
    }
}
thread {
    Thread.sleep(10000000)
}

prints:

started
delay ended
created and closed
started
delay ended
created and closed
started
delay ended
created and closed
started
delay ended
created and closed

And that's it. Nothing more is printed: after just four iterations the application hangs (number of CPU threads == 4). If you remove .close(), everything works fine (except for the memory leak from the engine) and keeps running indefinitely. Is this related to Ktor or to coroutines?

@avently I also encountered that issue. I figured it was related, and I was too lazy to submit a separate issue haha

Hi @luca992, it looks like the WebSocket freeze is caused by map:

  1. The map waits for the WebSocket to close.
  2. The client closes the connection after exiting the client block.

Could you try closing the connection manually?

Hi @avently, it's a separate bug in kotlinx.coroutines (reported and should be fixed soon):

Reproducer without ktor:

        while (true) {
            Thread.sleep(1000)
            CoroutineScope(Dispatchers.Default).launch {
                println("started")
                delay(100)
                println("delay ended")
                ExperimentalCoroutineDispatcher(4).close()
                println("created and closed")
            }
        }

@e5l
Well, I do not think map is the issue, as that block of code isn't even called when using a bad address (host = "test.badaddress.org"). To make sure, I commented out

/*send(Frame.Text("Hello World"))
for (message in incoming.map { it as? Frame.Text }.filterNotNull()) {
    println(message.readText())
    break
}*/

and got the same result.

And sorry, what do you mean by manually closing the connection? How would I do that?

You could call close() on the WebSocket session manually.

@e5l I would try that, but I don't think the WebSocket session instance is exposed for me to access if an error like java.nio.channels.UnresolvedAddressException is thrown. The callback block (block: suspend DefaultClientWebSocketSession.() -> Unit) is never run. Or is it accessible elsewhere?

edit: Actually, I do not think calling close on the WebSocket session is even possible. An error like java.nio.channels.UnresolvedAddressException is thrown when block() is run in io.ktor.client.features.websocket.HttpClient.webSocketRawSession. When an error is thrown there, it prevents the ClientWebSocketSession from being created.
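To illustrate why the session cannot be closed from the calling side in this case, here is a simplified model. It is not ktor's actual implementation: `FakeSession` and `openSession` are hypothetical stand-ins. When the connection step throws, the session object is never constructed and the block never runs, so the caller has nothing to call close() on.

```kotlin
// Hypothetical stand-in for a WebSocket session; not a ktor type.
class FakeSession {
    var closed = false
        private set

    fun close() {
        closed = true
    }
}

// Hypothetical model of a "connect, then run the block" helper.
// If connecting throws, no session is created and `block` is never invoked.
fun openSession(connect: () -> Unit, block: FakeSession.() -> Unit) {
    connect() // may throw, e.g. for an unresolvable address
    val session = FakeSession()
    try {
        session.block()
    } finally {
        session.close() // only reachable once a session exists
    }
}

fun main() {
    var blockRan = false
    try {
        openSession(
            connect = { throw java.nio.channels.UnresolvedAddressException() },
            block = { blockRan = true }
        )
    } catch (e: java.nio.channels.UnresolvedAddressException) {
        println("connect failed, block ran: $blockRan") // prints "connect failed, block ran: false"
    }
}
```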

edit 2:
After further debugging, I believe the issue is not with the WebSocket feature. I'm pretty sure it is the same coroutines issue @avently reported: https://github.com/Kotlin/kotlinx.coroutines/issues/990. It looks like it has been fixed.

I built the latest coroutines dev branch from source, and my sample program runs properly. I can confirm this issue was due to the coroutines issue, so I'll close this. I'm guessing it shouldn't be too long until a new version of coroutines is out with the fix.
