This is known error.
Some libraries have not done anything (https://github.com/google/gson/issues/924) other ones have implemented workaround (https://github.com/ReactiveX/RxJava/issues/3459).
Couple of stacktraces from my project:
com.mirrorai.app_issue_crash_5D5E4CFA0353000119B5FCFF34F93B11_DNE_0_v2.txt
com.mirrorai.app_issue_crash_5D61DBF600A40001134DD9BDD9BD8B29_DNE_0_v2.txt
I'm not convincing you to implement workaround like RxJava did but i think that at least it will be convenient to keep this issue to prevent questions from other users of coroutines library.
Proguard is turned off.
Coroutines version is 0.24.0.
Kotlin version is 1.2.60.
Let's discuss how we might approach working around this problem. We don't directly use Atomic*FieldUpdater classes. We rely on https://github.com/kotlin/kotlinx.atomicfu to "include" them via bytecode transformation. It is relatively straightforward for us to create a new transformation variant that changes atomic fields to a regular Atomic* classes. But how do we deliver the corresponding binaries? I see two options:
Atomic*) in the main code and VarHandle version in META-INF/version/9 for server-side folks running on JDK 9+. Pro: Everything works for everybody by default.
Con: Anyone running on JDK 8 or anyone targeting non-affected Android >5.0 will get slightly worse performance than they have now(sic!), because of additional dereference in Atomic* classes versus optimized Atomic*FieldUpdater code that we have now.
Atomic*FieldUpdater classes in the main code and VarHandle version in META-INF/version/9 and a _separate_ "Samsung-fiendly" jar file with a different _classifier_. Pro: By default if you depend on kotlinx-coroutines-core:<version> you get the best possible performance for your platform.
Con: If you target Android <= 5.0 with affected devices, you'll have to specify kotlinx-coroutines-core:<version>:samsung-workaround in your dependencies (might need a better name for a classifier).
What do you think?
Additional question: Anyone knows more details on the conditions under which Samsung devices crash with Atomic*FieldUpdater? What if we make those fields public, for example? Would it help? Are there any details on what exactly is broken on those Samsung Android 5.x devices?
It seems it happens on Samsung Android 5.0.x, not 5.x. If I'm not mistaken, the issue title should be edited accordingly.
Regardless, I think @digitalbuddha may have some additional information (or not) about this issue, he said he reached out to Samsung.
I was never able to get any info from Samsung as to why this issue occurred. Once rxjava applied their fix we never saw issue again
I strongly dislike the idea of making kotlinx.coroutines performance worse on all devices and JDK < 9 projects just because Samsung f*d up.
Android app bundles may, probably with Google's collaboration, support different compiled code for different android versions. If this can be done in some way, then apps published with app bundles may use a version free of problematic code for API 21 devices, and use the regular version for all other versions (API 20 and lower, and API 22 and up).
This would work with the second option @elizarov mentioned.
I know this is more complex than the RxJava approach, but it's less compromising.
Another solution is to compile a version of the app with the samsung workaround with a minSdk and a maxSdk of 21, and publish two other versions, one for maxSdk 19 (20 being Android Wear…), and one for minSdk 22. But here again, I think it should be automated using app bundles (which can probably be extended since they use protobuf internally), so we don't have to manage three different version codes.
Where I'm starting to wonder how a samsung api 21 workaround version would work though, is when it comes to libraries. I don't want to have to publish multiple versions of my android libraries that use kotlinx.coroutines because of Samsung, and I would even less want to do it on a non Android library. Would there be a way to avoid transitive dependencies hell when using the samsung workaround?
In the case 2, the kotlinx-coroutines-core:<version>:samsung-workaroundis the only viable option for an Android developer
I continue to vote for doing nothing. This is not a library's problem.
If someone wants to fix this, write an Android Gradle plugin transform that bytecode-rewrites these into the slower format such that you fix all libraries at once and can opt-in and out based on things like minSdkVersion, splits, and bundles.
@JakeWharton I think you're right. I just submitted an issue on Android's issue tracker for this fix to be integrated into Android Gradle plugin: https://issuetracker.google.com/issues/112526256
That's not quite what I said... AGP already has a pluggable bytecode transformation model so this doesn't need to (and likely won't) be implemented by Google. Anyone can do this who is willing. Samsung should do it, actually.
Google certified these devices, and Samsung didn't even respond to Mike when he was working for New York Times.
So "anyone" willing can do it (as long as you are fluent in bytcode among other things), yes.
However, I think at the moment, only Google can integrate such a solution targeting only certain devices with app bundles, so other devices (or other API levels at least) don't pay the performance hit.
Transforms have access to variant information so again I disagree and hope the Android Gradle team doesn't do it. Variants have the API information which is the best you'll get in terms of targeting. It has nothing to do with app bundles other than that makes it easier to deal with the API splits.
hope the Android Gradle team doesn't do it
Why so?
How would you make such a bytecode transform targeting only API 21 Samsung devices work in practice without app bundles? I mean, without the burden of having to setup the build manually to produce multiple apks.
And if using an app bundle to separate the dex files having the fix for Samsung API 21 and the ones not needing it, how would it work in practice? I know zero documentation that explains this, and I don't even know if current app bundle spec supports it.
It doesn't, which is why I said it wasn't related to bundles. If the variant has a minSdk lower than 23 you fix the bug. That's it. All of this can be done with variants and transforms already. the AGP team has far more impactful work to do. This would have already been fixed in AGP years ago if the problem was pervasive enough. It isn't, and I'm actually helping your case by _not_ fixing libraries which further proves the rarity of occurrence and justifies their ignorance of the problem.
It's minSdk lower than 22 actually, this issue affects only Android 5.0.x (API 21) Samsung devices. I added a comment to the issue on Android's issue tracker to request ability to split per API levels for bundle built apps, so anyone can make a gradle plugin fixing this, without making the performance worse for non affected API levels.
The problem is that it is hard (and in general impossible) to write a universal bytecode transformer that takes code using Atomic*FieldUpdater classes and rewrites it into Atomic* classes. However, it is not that hard for us as a part of atomicfu project, because we control what kind of code patterns are allowed around atomics, so we can define a new transformation variant and produce a separate version of the jar for those users of kotlinx.coroutines that need to target buggy Samsung 5.0.x devices. Doing nothing is not a great option for us, since we don't want to let our Android users down, so reading this feedback I'm inclining to go with option two.
We can even call the alternative classifier android and setup up dependencies in such a way that when you depend on kotlinx-coroutines-android:<version>, then it transitively brings kotlinx-coroutines-core:<version>:android with it.
Yes, yes, I know that Android did nothing wrong and it is all Samsung's fault, but I don't think that a bit of performance degradation due to replacement of Atomic*FieldUpdater with a separate Atomic* instances would be really noticeable in a typical Android coroutine usage scenarios.
We can even call the alternative classifier android and setup up dependencies in such a way that when you depend on
kotlinx-coroutines-android:<version>, then it transitively bringskotlinx-coroutines-core:<version>:androidwith it.
I have trouble understanding that part.
Do you mean that we'll be able to choose whether we use the version with the fix or not (depending on whether we support the devices in the build), and it'll automatically edit the transitive dependencies of libraries relying on kotlinx.coroutines?
@LouisCAD Editing dependencies of other libraries is tricky. You'll have to manually exclude the dependency this library has and include the dependency you want. That is the downside of "alternative classifier" version. Maybe we can automate this part by writing a small custom Gradle plugin that does this replacement or maybe we can use Gradle metadata feature to do it. We'll need to study it better.
IIRC, there's something called dependency resolution strategy in gradle
that can make this easy enough.
On Tue, Aug 14, 2018, 12:14 PM Roman Elizarov notifications@github.com
wrote:
@LouisCAD https://github.com/LouisCAD Editing dependencies of other
libraries is tricky. You'll have to manually exclude the dependency this
library has and include the dependency you want. That is the downside of
"alternative classifier" version. Maybe we can automate this part by
writing a small custom Gradle plugin that does this replacement or maybe we
can use Gradle metadata feature to do it. We'll need to study it better.—
You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub
https://github.com/Kotlin/kotlinx.coroutines/issues/490#issuecomment-412823498,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGpvBWZYwiQXLyf1GvRP956eWMnRaOkdks5uQqLtgaJpZM4V61LM
.
So, I just got someone trying CameraCoroutines which uses kotlinx.coroutines 0.24.0 on a Samsung Galaxy S5 (SM-G900F) running Android 5.0, and it didn't crash with the error from this report, so it seems it doesn't affect all Samsung devices running Android 5.0.x.
Therefore, we need additional info to know which device models have been seen affected by this bug. @asfdfdfd and @digitalbuddha, could you give affected device models?
Same story here. We've found an in house Samsung Android 5.0.x device and we had not managed to make it crash so far.
This was 3+ years ago. Only one I have reference to is SM-T805Y
I have no device that i could use to reproduce this crash but i've received it via Crashlytics from these devices:
Galaxy S4
Galaxy Note3
Galaxy S5
Galaxy Alpha
Galaxy Tab4 10.0
@asfdfdfd You said you received it on Galaxy S5 devices, but we couldn't reproduce on one running Android 5.0 (model SM-G900F).
Since Samsung makes multiple variants of their devices, could you share the exact model numbers?
Also, if you have any clue about which calls to kotlinx.coroutines cause the issue, please tell us.
So, I got someone else test the same CameraCoroutines project using kotlinx.coroutines 0.24.0 on a Samsung Galaxy S4 (GT-I9505), running Android 5.0.1, and there was no issue on it, so I'm starting to wonder if all of kotlinx.coroutines is affected, or if only some part of it is, or if it's that Samsung was not consistent in its bug distribution.
If anyone has a clue about what call leads to the failure (whether on the Atomic*FieldUpdater or kotlinx.coroutines library), please tell so a reproducer can be setup.
@LouisCAD
could you share the exact model numbers
Crashlytics has not any additional info about devices, but i was able to find something in Google Play Console:
Samsung Galaxy S4 (ks01ltelgt), Android 5.0
Samsung Galaxy S5 (klte), Android 5.0
Samsung Galaxy Note3 (ha3g), Android 5.0
Samsung Galaxy S4 (jflte), Android 5.0
if you have any clue about which calls to kotlinx.coroutines cause the issue
I have absolutely no idea. I've looked at all these places one more time and found nothing suspicious. One of crashes comes from something like:
class MyClass {
init {
launch { loadData() }
}
private fun loadData {
// Non-coroutine code here.
}
}
I do really think build variants are the way to go as JakeWharton said. There are other issues related to Samsung that I am already using build variants to fix. It's so common in Android to have per API variants, that it is in the docs.
It may be solved using some plugin to do transformation or maybe just multiple jars, excluding transitive dependency with Gradle and choosing one jar for each variant, but the thing is, Android already has tools to deal with it.
App bundles don't look like it has anything to do with transforming and compiling itself, also it's quite new and most people I know don't use it yet.
Also, anyone has any statistics on, probably, how many people is affected right now? Maybe, I hope, it isn't worth the effort anymore.
@asfdfdfd Could you reply to comment #3 here on Android's issue tracker?
Also, @elizarov, you can list the workarounds also requested in this comment on Android's issue tracker
@LouisCAD done
Okay, now i've got crash on Samsung Galaxy S5 with Android 6.0.1
@asfdfdfd Can you reproduce it in Samsung Remote Test lab? https://developer.samsung.com/remotetestlab/rtlDeviceList.action
Bumping this to see if there's still something in the work?
This is now my first crash sources it's pretty rare but got dozens of variations of crashes on those devices.
There is nothing in the work.
We don't mind to workaround this, if it is possible, but we need a stable repro and we failed to find a device which crashes with AFU.
Note that we need a reproducer in order to make workaround with AtomicReferenceFieldUpdater (e.g. by making target fields public), we are not going to get rid of A*FU
@qwwdfsad I can't reproduce and the issue is certainly a very rare one got about 50 crashes in 10 different stack traces on a week timeline. Can't give all numbers but I'm still at more than 99,99% crash free sessions.
I have a tons of Samsung users but unfortunately I do not have the way to have cross linked stats about phone model and OS version :( And most of those devices also runs 4.x and 5.1. But I'm pretty confident that this is not a 100% repro thing.
There's many different crashes in many different places. Don't know if gather all the crashes could help.
I still min SDK 16 as a large user base with old phones as remotes, so can't move to min 22, but I totally can have a special build for SDK 16 -> 21 that is slower via an external library, and a nominal one for API 22.
Don't know if can be related but see my comment here: https://github.com/Kotlin/kotlinx.coroutines/issues/830#issuecomment-441619668 despite using both workaround there was still some race, with hope the issues are somehow related.
@LouisCAD i was not able to reproduce this issue in Samsung Remote Test Lab. Probably i'll have to buy Samsung device. 🤔
If the issue didn't reproduce on a device from their test lab, then you
have no guarantee a device you'd buy would reproduce it.
On Tue, Dec 4, 2018, 7:46 PM asfdfdfd notifications@github.com wrote:
@LouisCAD https://github.com/LouisCAD i was not able to reproduce this
issue in Samsung Remote Test Lab. Probably i'll have to buy Samsung device.
🤔—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/Kotlin/kotlinx.coroutines/issues/490#issuecomment-444211958,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGpvBVIhppil0jYXG7VBvreeCe6PWtmDks5u1sL5gaJpZM4V61LM
.
Yeah, but i have more strange Samsung bugs (not coroutines related) so it will be useful anyway.
@elizarov Is there something special about coroutine Mutex and atomicFu?
I have introduced a Mutex as a temporary solution and only use simples mutex.lock() mutex.unlock().
On some samsung devices Android 5 it seems that this does not trigger the normal coroutine random crash we know about but just wait for ever on the first lock. Is there a way to have the crash happen?
So after a few more hours in prod, Mutex don't work on some (or all hard to tell) Samsung Android 5 at 100%, not random like this issue, 100% reproductible on those devices (that unfortunately I don't own).
There is nothing special about Mutex and AFU; Mutex implementation uses AFU under the hood
We have the same issue on around 150/200 of our users on some Samsung devices in the latest week.
This decreases a bit our QoS.
The ratio between users and crash is 1:~1
We can't reproduce the problem locally and it happens when the first class which references AFU is instantiated (in our case a Channel).
We'd like to continue to use the coroutines features in our code but we'd like to find a solution to this problem.
Considering Samsung probably won't ever release a fix for these Atomic*FieldUpdater, have you planned to provide a separate dependency which replaces Atomic*FieldUpdater with Atomic*?
If not, do you think it's a viable solution to use DexClassLoader to load the Atomic*FieldUpdater classes written by us with the same methods if the Atomic*FieldUpdater classes are not found when the app starts?
All the calls to java.misc.Unsafe inside Atomic*FieldUpdater classes can be done through reflection.
@Fondesa We have not find time yet to implement Atomic*FieldUpdater -> Atomic* transformation in our atomicfu project: https://github.com/Kotlin/kotlinx.atomicfu
Has anyone found any workarounds or patches that I can apply?
Workaround - use Rx. Lol.
As a side note - it does not seem a 100% crash rate for affected users (seems to just happen once time per user), so you could just ignore the issue.
I confirm it happens only once per user.
Today i've noticed Samsung users with Android 7 that has java.lang.NoSuchMethodError
com.mirrorai.app_issue_crash_5D64FE1501DE000129234B4EA7EF93F5_DNE_0_v2.txt
@asfdfdfd Samsung Galaxy A3 2016 on Android 7 more specifically. Do you have this device at hand?
Here's what I can tell from my users:
Doesn't happen on A5 2016 on Android 7.
Doesn't happen on A3 2017 on Android 8.
@LouisCAD
Do you have this device at hand?
No.
We built a version of kotlinx.coroutines that doesn't rely on AFU by disabling bytecode transformation in atomicfu plugin. And then we replaced all coroutines dependencies with non-AFU ones using the following code:
configurations.all {
resolutionStrategy {
dependencySubstitution {
all { DependencySubstitution dependency ->
if (dependency.requested instanceof ModuleComponentSelector) {
if (dependency.requested.group == 'org.jetbrains.kotlinx' && dependency.requested.module.startsWith('kotlinx-coroutines-')) {
dependency.useTarget "$dependency.requested.group:$dependency.requested.module:$kotlinxCoroutinesVersion-noafu"
}
}
}
}
}
The artifacts we built aren't published anywhere but you can easily build them yourself.
And its happened to me now (2nd time, first with rxjava 3 years ago). Dropbox is crashing when using ConflatedBroadcastChannel on samsung 5.0, 5.01, 5.02 devices. Would appreciate a fix (pretty please)
Any updates/workaround on this?
Most helpful comment
I continue to vote for doing nothing. This is not a library's problem.
If someone wants to fix this, write an Android Gradle plugin transform that bytecode-rewrites these into the slower format such that you fix all libraries at once and can opt-in and out based on things like minSdkVersion, splits, and bundles.