Kotlinx.coroutines: withContext from inside a cancelled coroutine should not be executed

Created on 31 Jan 2019  路  13Comments  路  Source: Kotlin/kotlinx.coroutines

Consider the following example:

runBlocking {
    println("before withContext")
    withContext(EmptyCoroutineContext) { println("withContext 1") }
    println("cancelling")
    coroutineContext.cancel()
    withContext(EmptyCoroutineContext) { println("withContext 2") }
    withContext(EmptyCoroutineContext) { println("withContext 3") }
    println("after withContext")
}

Currently this outputs the following:

before withContext
withContext 1
cancelling
withContext 2

kotlinx.coroutines.JobCancellationException: Job was cancelled

That is, the second withContext is executed despite the fact the parent coroutine has already been cancelled by the time the child is created.

This behavior seems a bit counterintuitive, and it is also inconsistent with how a similar example using a Default coroutine dispatcher works:

runBlocking {
    println("before withContext")
    withContext(Dispatchers.Default) { println("withContext 1") }
    println("cancelling")
    coroutineContext.cancel()
    withContext(Dispatchers.Default) { println("withContext 2") }
    withContext(Dispatchers.Default) { println("withContext 3") }
    println("after withContext")
}

Output:

before withContext
withContext 1
cancelling

kotlinx.coroutines.JobCancellationException: Job was cancelled

Note that async { ... }.await() works fine with either of the dispatchers:

runBlocking {
    println("before async.await")
    async(EmptyCoroutineContext) { println("async.await 1") }.await()
    println("cancelling")
    coroutineContext.cancel()
    async(EmptyCoroutineContext) { println("async.await 2") }.await()
    async(EmptyCoroutineContext) { println("async.await 3") }.await()
    println("after async.await")
}

Output:

before async.await
async.await 1
cancelling

kotlinx.coroutines.JobCancellationException: Job was cancelled

My suggestion is to make the behavior of child coroutines created using different builders consistent so that a child doesn't start executing if a parent has already been cancelled by the time the child is created.

BTW IDEA has an quick fix suggesting to replace async { ... }.await() with withContext, which now happens to change the semantics w.r.t. cancellation when a child uses the same dispatcher.

breaking change enhancement

Most helpful comment

@elizarov I see the problem that NonCancellable in launch/async not only tells us about cancellation, but also implicitly detaches the coroutine scope from parent. As I understand your posts about structured concurrency, the key point of scopes is "parent scope will be completed only after all child scopes". And it is true with coroutineScope { launch { ... } }. But it is not with coroutineScope { launch(NonCancellable) { ... } }. Although child coroutine still running, parent will be completed instantly. I think this is very unintuitive behaviour.

All 13 comments

This happens so by design. The reason for such behavior is to enable withContext(NonCancellable) { ... } to execution suspending code inside the coroutine despise its cancellation. Do this explanation help?

@elizarov Couldn't a special case be added for NonCancellable instead? Maybe it could also work by checking if resulting context (from current + passed argument) is active?

@elizarov thanks, the explanation makes sense. Although I would expect the NonCancellable thing to be a special-case check, just as @LouisCAD suggests, so that the general behavior is consistent across the contexts/builders.

@elizarov If withContext should call always, then why third withContext(EmptyCoroutineContext) {...} did not executed? Moreover this is very unexpectedly that async(ctx) { body }.await() and withContext(ctx) { body } has different behaviour. For example, I did spent 30 minutes just now to find a bug. The reason was different behaviour between async(NonCancellable) { body } and withContext(NonCancellable) { async { body } }. I don't understand why first hangs as expected, but the next one just ignoring NonCancellable context and completes instantly (same with launch instead of async).

fun main() = runBlocking<Unit> {
    launch {
        withContext(NonCancellable) {
            async {
                delay(10000000000)
            }
        }
        coroutineContext.cancel()
    }
}
fun main() = runBlocking<Unit> {
    launch {
        async(NonCancellable) {
            delay(10000000000)
        }
        coroutineContext.cancel()
    }
}

Unfortunately, the ship might have sailed for this particular decision, but I do see the problem it causes.

The way it was originally designed is that withContext(NonCancellable) { ... } is a synonym for withContext(Job()) { ... }, for you can use any of them to get a context that is detached for the parent's cancellation. It is also consistent with how you can use async and launch in that you can do either launch(Job()) { ... } or launch(NonCancellable) { ... } to launch a new coroutine in a scope, yet detach it from parent.

In the experimental version we also had an optional start parameters to withContext, so that you can use for both cancellable and non-cancellable start at will. However, the devil is in defaults. Just adding an option does not protect users from making mistakes.

The reason your third block is not executed, is because withContext checks on cancellation on completion of the block, before returning.

I do not have easy solution out of this, but to provide a separate withContext { ... } builder somehow aptly named with a different behavior that checks for cancellation before running the block. Maybe we could, indeed, tweak the behavior of existing withContext, so that it checks the cancellation of the _new_ context before it executes the code. We'll have to study the impact of this change.

@elizarov Maybe this could be the behavior of the new invoke operator currently in PR #980?

I've investigated what happens if we change this behavior and check for cancellation on the entrance to withContext. No test that we have had failed. I interpret this as, although this is a breaking change, it should have a minor impact and would do it mostly for good. withContext is already "cancellable" (on the return path when switching dispatcher), so making it more cancellable will just make it more explicit, actually helping people to avoid subtle bugs they might have. So I'm adding PR with a change.

While we talk about the cancelation semantics of withContext, an issue came up on slack yesterday:
https://kotlinlang.slack.com/archives/C1CFAFJSK/p1550766435021300

The issue is, that canceling a coroutine at the end of a withContext block leaves the returned object in an unowned state. This makes failure to release a ressource in the following code 1) surprising and 2) implicit:

import kotlinx.coroutines.*

class Ressource {

    init {
        println("Initializing ressource")
        Thread.sleep(1000)
        println("Slow initialization done")
    }

    fun release() {
        println("Released")
    }

}

suspend fun initWithContext() {
    var ressource: Ressource? = null
    try {
        ressource = withContext(Dispatchers.Default) {
            Ressource()
        }
    } finally {
        ressource?.run {
            release()
        } ?: println("Ressource not released")
    }
}

fun main() = runBlocking {
    val initJob = launch {
        initWithContext()
    }
    yield()
    initJob.cancel()
}

Which outputs:

Initializing ressource
Slow initialization done
Ressource not released

@uluckas Having this YouTrack issue resolved would allow to avoid the resource leak demonstrated above the following way:

import kotlinx.coroutines.*

class Resource {

    init {
        println("Initializing resource")
        Thread.sleep(1000)
        println("Slow initialization done")
    }

    fun release() {
        println("Released")
    }

}

suspend fun initWithContext() {
    val resource: Resource
    try {
        withContext(Dispatchers.Default) {
            resource = Resource()
        }
    } finally {
        resource.release()
    }
}

fun main() = runBlocking {
    val initJob = launch {
        initWithContext()
    }
    yield()
    initJob.cancel()
}

~In fact, you can already resolve this issue by changing back the local resource to a nullable var.~
Assigning a resource after another suspension point will always be risky.

EDIT: My bad, even with contracts for suspend functions, this needs to be a var because the variable may have not been initialized the compiler will say.

@uluckas withContext was supposed to be cancellable by design and this issue goes into opposite direction -- make it more cancellable. I've also added a better documentation as a part of this PR. See: https://github.com/Kotlin/kotlinx.coroutines/blob/e16efed51867d18cc4a2c5a416e9bf8ba2c77c49/kotlinx-coroutines-core/common/src/Builders.common.kt#L121

This is also related to a bunch of other problems (see #279 and #845).

@elizarov I see the problem that NonCancellable in launch/async not only tells us about cancellation, but also implicitly detaches the coroutine scope from parent. As I understand your posts about structured concurrency, the key point of scopes is "parent scope will be completed only after all child scopes". And it is true with coroutineScope { launch { ... } }. But it is not with coroutineScope { launch(NonCancellable) { ... } }. Although child coroutine still running, parent will be completed instantly. I think this is very unintuitive behaviour.

@Dougrinch this is indeed the case. Let's move discussion on overriding Job in the context into #1001

@uluckas withContext was supposed to be cancellable by design and this issue goes into opposite direction -- make it more cancellable. I've also added a better documentation as a part of this PR. See:

kotlinx.coroutines/kotlinx-coroutines-core/common/src/Builders.common.kt

Thanks @elizarov for the update

Was this page helpful?
0 / 5 - 0 ratings