New crash in version 2.6.6 affecting 12 users with 79 reports. :/ It appears to be caused by uploading. I tried to contact the user to get more information, but we'll see if he responds.
java.lang.NullPointerException:
at fr.free.nrw.commons.modifications.ModificationsSyncAdapter.onPerformSync (ModificationsSyncAdapter.java:105)
at android.content.AbstractThreadedSyncAdapter$SyncThread.run (AbstractThreadedSyncAdapter.java:272)
(Yes, the stack trace is only 2 lines. Not sure why)
Android 5.1 | 60 | 75.9%
Android 7.0 | 10 | 12.7%
Android 8.0 | 5 | 6.3%
Android 6.0 | 2 | 2.5%
Others | 2 | 2.5%
A37f (A37f) | 59 | 74.7%
ZenFone 3 Max (ZC520TL) (ASUS_X008_1) | 8 | 10.1%
Mi A1 (tissot_sprout) | 3 | 3.8%
OnePlus3T (OnePlus3T) | 2 | 2.5%
Others | 7 | 8.9%
@psh @maskaravivek This appears to be caused by a NPE from the line:
if (contrib.getState() == Contribution.STATE_COMPLETED)
Reading the code in context I see something straight away
contributionCursor.moveToFirst();
contrib = contributionDao.fromCursor(contributionCursor);
if (contrib.getState() == Contribution.STATE_COMPLETED){
...
}
Firstly, we are not checking the return value of contributionCursor.moveToFirst() - it will return false if the cursor we are interacting with was empty (ie, we are unable to move anywhere). It's a common mistake to forget to check the return value of moveToFirst() and moveToNext() and leads to bugs every time.
I believe what's happening is that the cursor is empty, as there's a guard in fromCursor() -
if (cursor.getCount() > 0) {
return ...
}
return null;
That's the most obvious reason I can see for the contribution being null on the line you mention. I think it would also be valid to add a simple null check
if (contrib != null && contrib.getState() == Contribution.STATE_COMPLETED){
...
}
but I am curious about the relationship between "modifications" and "contributions" - how can the data be out of sync in such a way that there's no corresponding contribution for the modification in question?
Thanks @psh !
Unfortunately much of our modifications and contributions code is legacy code from 4+ years ago, and the logic behind them confuses me sometimes. :/ One of the reasons I can guess is that the contribution (upload) failed, but the modification (category selection) went through without checking for whether the upload worked.
However, if that was the case, that doesn't answer the question of why this only became an issue recently? Our first report of this crash ever was on 17 Jan, but the numbers were small (1 or 2 reports) before 21 Jan (2.6.6) where it exploded to 13 users and 132 reports.
I also wonder if the // This code is fraught with possibilities of race conditions, but lalalalala I can't hear you! line might be relevant to this problem, lol.
From @whym at #1090 (copying over here for continuity):
Looks good to me. Upload went through for me, but I had't experienced the crash, so this is kind of expected.
Any idea on helping people report this issue without making them experiencing crashes?
We could perhaps add a log message when a null is detected in one of those variables. (Well, it may not be much of help, but at least people who can use logcat can help us.)
Hi team, I am able to reproduce the bug with devices 5.0.2 and 4.42 API. It happens with category and without category uploads. When I follow debug break points, weirdly ModificationSyncAdapter is called even if I don't add any categories.
@neslihanturan Thanks, this is very helpful! :) Additionally, could you please link the image that hasn't had categories added yet?
@psh The recent changes to ModificationsSyncAdapter were mostly the Dao injections that were added in #1047 . Do you think that could be relevant to this problem?
Here: https://commons.wikimedia.org/wiki/File:Special_bread_from_Turkey.jpg
Hi team, since this is one of our most critical issue I tried to investigate. I think it is related with #1042 and @Inject's creating an instance when it is necessary. But I am not a Dagger expert yet to detect the problem. However, since I am able to upload form my beta account and not able to upload from my production, I prepared a comparison to catch something. Sharing with you and hoping it will make a flash of inspiration of you. I have marked the differences between called methods.



If you want a detailed investigation of any other informations such as object references etc. please let me know. It seems like it is all I can do to help this discussion.
Hi guys, apologies for my absence as I have been away this weekend. This is a fairly critical bug with the crash rate being 7.79% at the moment impacting ~300 user sessions. Unfortunately if a fix cannot be found soon we may need to revert the recent changes to ModificationsSyncAdapter for the time being. :(
If the revert has to be done, will the result of the revert be the new master, or will it be done in a separate branch?
Ah I didn't see the link above. So it's in https://github.com/commons-app/apps-android-commons/tree/2.6.7 ?
@whym Yeah, the revert is done in the 2.6.7 branch. That is also the branch that I will be releasing if it solves the crash for @neslihanturan . I am still unsure if it should be merged into master later though, or if we should keep it separate for the time being. What do you think?
Also, I should note that the 2.6.7 branch works well for me with the exception of the number of uploads (in the action bar) not being incremented after each upload. The revert seems to have broken that (AFAIK, no other changes were made, but I might have made a mistake in one of the conflict resolutions).
Oh dear. 2.6.7 also tries to add a category multiple times. :/ I am glancing through my revert additions/deletions in comparison to #1047 's, but they look accurate to me and I don't see any discrepancies, especially not anything that could cause this. Strange.

And I am still unable to upload, after revert.
Oh wow, a 5th "bistro" just got added to that category list, lol. I think we need to get a better understanding of how modifications work.
Okay here are test results, I went back from master to see if I am able to upload, all of the trials are refresh installs:
1- On commit 660439894533bf42c19d3128deb81549d60b1eaa, just before notifications merge, unable to upload with no error.
2- On commit 435e7900bdc5a1f14b8d1a3677f9c5f3d8ca5f7b
3- On commit 38d3edfe1da0ce2db55415adfb2d7491fbe5c83d
4- And on commit a38e465cd65bb452a64b1fb7cbe06814d7258b1f
5- Even on 7abb530207df4cf97a119e0a9cf15146708ff8aa this commit is from December...
Same thing happens. Upload fails with no crash. So maybe this problem is not related with out current changes as #1047, only a backend problem?
Talked to @neslihanturan about this. Some of our speculations:
At any rate, our crash rate currently is 12.7%, so waiting until we have a definite solution isn't an option. I am going to be creating a branch that rolls back to 2.6.5, cherry picking the 2 major fixes (the Dagger fix and description fix), and releasing 2.6.7 with that today (with fingers all crossed that it'll help, lol).
In the meantime, we will leave this issue open and it still remains the highest priority bug. We cannot do another release with the current master until we find out what's going on with this.
It seems that even the rollback isn't going to be that simple. I was cherry-picking the Dagger fix and wondering why there were so many conflicts and compile errors. It appears that the Dagger fix that we want (#1062 ) builds on top of other changes to those files (like #1047 ), so it will not be that easy to cherry-pick the Dagger fix independently of the other changes.
I have some thoughts on how we can prevent this problem in the future and will create an issue for that. But in the meantime, I'm not sure how we can handle this. I see two options:
@neslihanturan and @maskaravivek , thoughts?
How about re-releasing 2.6.5 as 2.6.7 without cherry-picking anything? Was it as bad or even worse?
Hi team, I will share something that I learned by change (trying to create a new account with my VPN IP). My account is not blocked, that's why I don't have any notifications and I am able to upload by browser. But I learned that IP address of my VPN is blocked (the one that I used on my Android device, not laptop). I am not able to test to upload via app without VPN (since there is still block from Turkey), and very unlcky only I could reproduced the bug. However, I think reverting will fix our urgent bug.
And I think as @whym suggested, we can use 2.6.5.
@whym The Dagger crash from 2.6.5 #1036 would make that a very unattractive option. Admittedly 6% crash rate is better than 12%, but still. If @maskaravivek can rework his PR to be independent of the other changes, that might be the best bet?
Interestingly, it seems like there might be a 3rd option. I closed #1090 because @neslihanturan , who was the only person able to reproduce the bug, said that while her crashes were fixed by that PR, her uploads failed. She tested this 6 days ago, and her IP block was done 10 days ago. So, there is a possibility (albeit not a guarantee) that #1090 actually does fix the problem. We thought it didn't because her uploads failed... but they might be failing because of the IP block instead.
Option 3 - reopen and merge #1090 in hopes that the problem will be fixed. This could be the ideal solution, however it is also the most risky - until Neslihan is able to fix the IP block, we have no way of knowing whether it truly does solve the problem or not.
@neslihanturan
But I learned that IP address of my VPN is blocked (the one that I used on my Android device, not laptop). I am not able to test to upload via app without VPN (since there is still block from Turkey),
You might be able to avoid this problem by getting whitelisted (IP block exemption).
You might be able to avoid this problem by getting whitelisted (IP block exemption).
Sweet! Too bad this is done via email, so we can't post in support of the application. @neslihanturan Please feel free to mention your involvement in this project in the email that you send them. Also feel free to have them contact me if they wish. Perhaps you could link to our grant proposal (which has your name on it and endorsements from reputable members of the community) as well?
If there is a problem with corrupted data, it should be simple enough to release a version of the app where we bump the version of the database by 1 and add steps in the migration script that fixes the data (for instance, finding and deleting orphan records in one of the tables).
After my VPN problem is fixed, I am no longer able to reproduce the crash:/
It appears that future uploads do not fail, either. :)
This is a good thing IMO, it means that the issue is much less severe than we thought.
I have made the NPE temp fix at #1090 , but we should continue trying to find the reason for the empty cursor. When we find it and have a fix ready for it, we can remove the null check from ModificationsSyncAdapter to see if our fix works.
Just in case anyone here is still trying to find the root cause for this issue (or any issue, for that matter) and isn't aware of git bisect, give it a try! You'll almost always find the root cause without a lot of effort. I've used it multiple times with fruitful results.
I don't think this still happens. If it does do reopen.
Most helpful comment
@neslihanturan
You might be able to avoid this problem by getting whitelisted (IP block exemption).