Kotlinx.coroutines: Coroutines Android Slow Launch Times

Created on 19 Dec 2019 · 14 comments · Source: Kotlin/kotlinx.coroutines

Every Android developer knows that the time spent in Application.onCreate() is critical.
Therefore, I tried to speed up my app's startup by offloading some initialisation work to a coroutine.
However, my profiling results were surprising.

First Try: Dispatchers.IO

val start = System.currentTimeMillis()
GlobalScope.launch(Dispatchers.IO) {
    doWork()
}
val time = System.currentTimeMillis() - start
Timber.d("Launch time: $time")

I consistently saw times between 20 and 30 milliseconds.

Second Try: Dispatchers.Default

Same as the code above, but with Dispatchers.Default instead of Dispatchers.IO.
This also took more than 20 milliseconds.
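
For completeness, this is the same measurement with only the dispatcher swapped:

val start = System.currentTimeMillis()
GlobalScope.launch(Dispatchers.Default) {
    doWork()
}
val time = System.currentTimeMillis() - start
Timber.d("Launch time: $time")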

Third Try: Multiple Coroutine Launches

For the sake of profiling, I measured the launch time of multiple coroutines:

val start = System.currentTimeMillis()
for (counter in 1..5) {
    GlobalScope.launch(Dispatchers.IO) {
        Timber.d("Hello world from coroutine %s", counter)
    }
}
val time = System.currentTimeMillis() - start
Timber.d("Launch time: $time")

The average launch time per coroutine is a little better, but still not good:
it took more than 50 milliseconds to launch those five coroutines.

Fallback to AsyncTask

I ended up using a classic background thread instead of a coroutine. The thread is abstracted by android.os.AsyncTask:

class PrimitiveAsyncTask(val runnable: () -> Unit) : AsyncTask<Void, Void, Void>() {
    override fun doInBackground(vararg params: Void?): Void? {
        runnable()
        return null
    }
}

fun doAsync(runnable: () -> Unit) {
    PrimitiveAsyncTask(runnable).execute()
}

doAsync {
    doWork()
}

As you would expect, this piece of code took less than 1 millisecond to launch the AsyncTask.

My Setup

I use the following libraries:
org.jetbrains.kotlinx:kotlinx-coroutines-core
org.jetbrains.kotlinx:kotlinx-coroutines-android

As outlined below, I tested this with versions 1.1.1 and 1.3.3.

waiting for clarification

All 14 comments

There's a lot of information about this ;)

For simplicity, you should use coroutines 1.3.3, which works with ProGuard and R8 without any extra configuration.

As for the rest, quoting milliseconds without hardware details, and probably measured in a debug build, does not mean a lot :(

Can you run your code under a profiler and give us a more detailed picture of what is going on there, please? I suspect that what you are seeing in your first try and the others is that the dispatcher in kotlinx.coroutines creates threads lazily, so the first time you use it, it has to create at least two threads. In your third try, when you launch several coroutines in a loop, it has to create even more threads, since it does not know the nature of the tasks you are submitting. Creating those threads could be the source of the latencies you see.

What we can do as an improvement is offload the creation of additional threads to a background thread, but we will still have to create at least one thread on first use.
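
One rough way to check this hypothesis (a sketch, not code from this thread) is to time the first and the second launch separately; if lazy thread creation is the culprit, only the first measurement should be large:

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.launch
import kotlin.system.measureTimeMillis

fun main() {
    // The first launch pays for creating the dispatcher's threads (plus other
    // one-time initialization); the second launch should be much cheaper.
    val firstLaunch = measureTimeMillis {
        GlobalScope.launch(Dispatchers.IO) { /* no-op */ }
    }
    val secondLaunch = measureTimeMillis {
        GlobalScope.launch(Dispatchers.IO) { /* no-op */ }
    }
    println("First launch: $firstLaunch ms, second launch: $secondLaunch ms")
}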

I have now changed the setup of my experiments:
Instead of measuring time in the Application.onCreate() of my Android app, I have written instrumented unit tests.
Although those tests still run on an Android device, they do not launch any activity or service. As a result, they have less noise and I expect better reproducibility.
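
A minimal sketch of such an instrumented test (illustrative only; the class and test names are mine, and the actual tests are in the repository linked below):

import androidx.test.ext.junit.runners.AndroidJUnit4
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.GlobalScope
import kotlinx.coroutines.launch
import org.junit.Test
import org.junit.runner.RunWith
import kotlin.system.measureNanoTime

// Runs on a device/emulator, but does not start any activity or service.
@RunWith(AndroidJUnit4::class)
class CoroutineLaunchTimeTest {

    @Test
    fun measureSingleLaunchOnIoDispatcher() {
        // Measures only the cost of launch() itself, not the coroutine body.
        val nanos = measureNanoTime {
            GlobalScope.launch(Dispatchers.IO) { /* no-op */ }
        }
        println("Single coroutine Dispatchers.IO: ${nanos / 1_000_000.0} milliseconds")
    }
}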

The results are not as bad as before, but still unsatisfying.
Here are the launch times:

Single coroutine Dispatchers.Main: 5.252 milliseconds
Single coroutine Dispatchers.Unconfined: 5.756 milliseconds
Single coroutine Dispatchers.Default: 7.066 milliseconds
Single coroutine Dispatchers.IO: 8.581 milliseconds

50 coroutines Dispatchers.Main: 5.801 milliseconds
50 coroutines Dispatchers.Unconfined: 32.751 milliseconds
50 coroutines Dispatchers.Default: 10.39 milliseconds
50 coroutines Dispatchers.IO: 51.269 milliseconds

Compare this with traditional threads:

Single AsyncTask: 0.125 milliseconds
50 AsyncTasks: 0.451 milliseconds

EDIT:
I produced these results with the tests in this hello world application:
https://github.com/fkirc/coroutines-performance-tests

I have now run the same Unit tests with Coroutines version 1.3.3.
However, I did not see any significant performance improvements compared to 1.1.1.
Hence, I believe that this is a genuine issue that needs further investigation.

This line is especially puzzling:

Single coroutine Dispatchers.Unconfined: 5.756 milliseconds

It looks like it has nothing to do with the scheduler after all. Either this test setup has some flaw or it is an artefact of some Android initialization/loading implementation details.

It turned out that my app was doing way too much stuff during startup, which rendered the measurements unstable.
Therefore, I created this hello world application to get reproducible results:
https://github.com/fkirc/coroutines-performance-tests

Besides the Android tests, this repo also contains plain JUnit tests.
Using this repo, I made the following core observations:

  • Pure JVM coroutine launches are as slow as Android coroutine launches, if not slower.
  • The extremely slow launches only occur in the JUnit test that is executed first. Changing the test execution order changes the results.

This indicates that the majority of the time is spent on the initialization of the coroutine library.
Subsequent coroutine launches are still slower than I would expect, but they no longer take >= 5 milliseconds.

Perhaps there is some performance problem in the coroutines initialization code.
An initialization time of 20 milliseconds might be fine for backend workloads, but this is not so good for Android.

I was able to implement a very simple workaround: Launch a dummy coroutine via an AsyncTask from Application.onCreate(), in order to initialize the coroutines library ahead of time.

class MyApplication : Application() {

    override fun onCreate() {
        super.onCreate()
        preloadCoroutinesLibrary()
    }

    private fun preloadCoroutinesLibrary() {
        PrimitiveAsyncTask.doAsync {
            // Preload coroutine library asynchronously to avoid slow launches afterwards.
            GlobalScope.launch(Dispatchers.IO) {} // Dummy coroutine.
            GlobalScope.launch(Dispatchers.Default) {} // Dummy coroutine.
        }
    }
}

class PrimitiveAsyncTask(val runnable: () -> Unit) : AsyncTask<Void, Void, Void>() {

    companion object {
        fun doAsync(runnable: () -> Unit) {
            PrimitiveAsyncTask(runnable).execute()
        }
    }

    override fun doInBackground(vararg params: Void?): Void? {
        runnable()
        return null
    }
}

For me, this workaround is sufficient for the time being.
Nevertheless, it might be advisable to further optimize the initialization code of the coroutines library.
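
As a side note (not part of the original comment): android.os.AsyncTask is deprecated as of API level 30, so the same preload could presumably be done with a plain thread, e.g. by replacing the body of preloadCoroutinesLibrary() with:

private fun preloadCoroutinesLibrary() {
    // A plain thread instead of AsyncTask; all that matters is getting the
    // first coroutine launches off the main thread.
    Thread {
        GlobalScope.launch(Dispatchers.IO) {} // Dummy coroutine.
        GlobalScope.launch(Dispatchers.Default) {} // Dummy coroutine.
    }.start()
}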

Can you, please, retest with the latest version of coroutines? Did the startup time improve? Can you benchmark what kind of initialization code inside the coroutines library is taking the most time?

I have now updated https://github.com/fkirc/coroutines-performance-tests to coroutines-1.3.5 and repeated the experiments.

The most critical measurement is the first GlobalScope.launch(Dispatchers.IO).
All subsequent coroutine launches are significantly faster than the first one.
The results for the first GlobalScope.launch(Dispatchers.IO) are as follows:

  • ~ 30 milliseconds for PlainJVMCoroutineLaunchTests.
  • ~ 15 milliseconds for AndroidCoroutineLaunchTests.

Compared to coroutines-1.3.3, I cannot see a significant difference in coroutines-1.3.5.
As soon as I have time, I will fork the coroutines library to benchmark the internals.

In the meantime, my recommendation from above still stands:
I recommend "preloading" the coroutines library with a simple AsyncTask if launch performance is critical for an Android app.

I have the same issue, but only in debug mode (when the app is simply launched it's OK, about 7-8 ms). A simple task without a coroutine executes in about 1 ms for me, but with a coroutine... :) Look at this: the time was calculated via measureTimeMillis, and the same action was fired 8 times, as you can see below.

I/System.out: Loaded data 41ms
I/System.out: Loaded data 34ms
I/System.out: Loaded data 28ms
I/System.out: Loaded data 28ms
I/System.out: Loaded data 23ms
I/System.out: Loaded data 26ms
I/System.out: Loaded data 22ms
I/System.out: Loaded data 22ms

I also tried @fkirc's solution, but I see no difference.

I suggest reading the following articles ([1], [2]) to get familiar with other people's systematic approach to measuring startup overhead.

I've also followed the author's path and tried to analyze each contributor to the startup time.

TL;DR

  • Accidentally measuring a debug build is the most common pitfall in this kind of analysis, even when one is aware of it :)
  • On my low-end device (Xiaomi A3), coroutines are initialized in ~6 ms on average; consecutive launches have an amortized cost of 0.3-0.5 ms
  • Similar RxJava code (flowable + subscribeOn + subscribe) also initializes in roughly 6 ms (a rough sketch follows this list)
  • Coroutines initialization is pretty self-contained and most of the time is spent in class loading (which is expected for any large enough pile of classes). We can win around 10% by cutting out Random usages (to avoid calling PlatformImplementationsKt), but it isn't much. The only possible major improvement would be to throw away the benefits of our optimized CoroutineScheduler and use an ExecutorService preloaded by Zygote instead, but this is a questionable approach that would have to be measured separately.
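
A rough sketch of what such an RxJava 2 comparison might look like (assumed code, not the exact benchmark):

import io.reactivex.Flowable
import io.reactivex.schedulers.Schedulers
import kotlin.system.measureTimeMillis

fun main() {
    // First use of an RxJava pipeline: flowable + subscribeOn + subscribe,
    // timed the same way as the first coroutine launch.
    val firstRxUse = measureTimeMillis {
        Flowable.fromCallable { "work" }
            .subscribeOn(Schedulers.io())
            .subscribe()
    }
    println("First RxJava pipeline: $firstRxUse ms")
}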

I'm closing this as "not a problem", but if anyone believes that the current state of initialization is a serious issue, feel free to create a new GitHub issue with a systematic description of the problem and/or an analysis of potential solutions.

[1] https://medium.com/specto/android-startup-tip-dont-use-kotlin-coroutines-a7b3f7176fe5
[2] https://medium.com/specto/dont-run-benchmarks-on-a-debuggable-android-app-like-i-did-34d95331cabb

Thank you for the writeup.
I also agree that this issue is not a catastrophic problem, but rather a target for future optimizations.

Nevertheless, I don't like having to distinguish between debug and production builds.
We are all developers, and we need debug builds to troubleshoot problems.
If debug builds are "not accurate", then I consider that a serious platform problem rather than a "gotcha to remember". But anyway, this is probably outside the scope of this issue.

Could you please elaborate on why the initialization time difference is a real problem for development?

Semantically, a debug build behaves exactly like a release one. And in any language and toolchain, it is expected that debug builds are (often significantly) slower than release builds (or builds with -O3, etc.).

And optimizing code that runs in debug builds is a real black hole: it requires a lot of time, it duplicates compiler optimizations, it doesn't actually affect end users of Android applications in any way and, more importantly, it consumes time that could otherwise be spent on new features, stabilization of existing features, improving the performance of release builds, and so on.

Well, it is not a particularly large problem in this case, but rather a general "platform goal".
The fewer differences between debug and production, the easier it becomes to develop and debug.
But anyway, I think this is not in the scope of this particular issue.
