When replicating to a Tecno WX3, the IndexedDB database appears to die and errors show in the console. Reloading the app completes the startup process and the app becomes usable.
```
Publishing 'progress' event with: Object {purged: 2325, processed: 486, total: 486}
Purge complete, purged 2325 documents
Publishing 'optimise' event with: undefined
Database has a global failure DOMException: Transaction timed out due to inactivity.
Uncaught (in promise) n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
```
Steps to reproduce:
1. Use the user filysa on the purge-upgrade-muso-mali instance.
2. Log in to a fresh app install on a Tecno WX3, with chrome://inspect running against the mobile app.
What should happen:
The app should launch without error.
What actually happens:
The app replicates and purges documents, then the 404 error page is displayed and the error above appears in the log. Currently we have a solution that handles this error when it is unhandled, but in this case the error is being handled.
https://medic.slack.com/files/UBHER12S2/FFZ6U3HEJ/indexed_db_went_bad.log
Environment:
> Currently we have a solution to handle this when it is unhandled but this is being handled.
To clarify -- this quote refers to the changes from #5045, which prompt the user to reload the page for the unhandled rejection `DOMException: Failed to execute 'transaction' on 'IDBDatabase': The database connection is closing.`
The exception in this issue is `DOMException: Transaction timed out due to inactivity.` Although both are DOMExceptions, this isn't the same issue seen in Siaya. As written, I wouldn't expect to see a reload dialog for the error reported in this issue, whether handled or unhandled.
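To illustrate the distinction, a minimal sketch of how the two failures could be told apart (the function name and error shapes are illustrative, not the actual app code -- only the two DOMException messages come from the logs). The #5045 reload prompt keys off the "connection is closing" message, while this issue's error surfaces as a PouchDB error object with reason `TimeoutError`, so a check like this would never show the dialog for it:

```javascript
// Hypothetical classifier: should a "please reload" dialog be shown for
// this rejection? Only the "connection is closing" DOMException (the Siaya
// issue, handled by #5045) qualifies; the TimeoutError in this issue does not.
function shouldPromptReload(err) {
  if (!err) {
    return false;
  }
  const message = err.message || '';
  return message.includes('The database connection is closing');
}

// The Siaya-style failure: DOMException: Failed to execute 'transaction'
// on 'IDBDatabase': The database connection is closing.
const siayaError = new Error(
  "Failed to execute 'transaction' on 'IDBDatabase': The database connection is closing."
);

// The failure from this issue, as PouchDB reports it in the console.
const thisIssueError = {
  status: 500,
  name: 'indexed_db_went_bad',
  message: 'unknown',
  error: true,
  reason: 'TimeoutError', // DOMException: Transaction timed out due to inactivity.
};

shouldPromptReload(siayaError);     // true: reload dialog shown
shouldPromptReload(thisIssueError); // false: no dialog for this issue's error
```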
@SCdF As the resident IDB and purging expert does this ring any bells?
Originally I thought it might be the view cleanup that happens after purge, but looking at the whole log there is a lot more going on.
A slightly more complete look at that log is:
```
Purge function has changed, running purge
Publishing 'start' event with: Object {totalContacts: 486}
user
Object {name: "filysa", roles: Array[5], locale: "en"}
Publishing 'progress' event with: Object {purged: 0, processed: 1, total: 486}
user
Object {name: "filysa", roles: Array[5], locale: "en"}
Publishing 'progress' event with: Object {purged: N, processed: M, total: 486}
^ 486 times
Purge complete, purged 2325 documents
Publishing 'optimise' event with: undefined
Database has a global failure DOMException: Transaction timed out due to inactivity.
Uncaught (in promise) n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Uncaught (in promise) DOMException: Failed to execute 'transaction' on 'IDBDatabase': The database connection is closing.(…)
^ * 2800 times
n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
db.type() is deprecated and will be removed in a future version of PouchDB
Initiating changes service
Database has a global failure DOMException: Transaction timed out due to inactivity.
Error updating icons n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Error initialising watching for db changes n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Attempting changes initialisation in 5 seconds
Error initialising zscore util n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Error fetching settings n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Error loading facilities n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Error fetching settings n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Error getting tasks n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Error checking authorization n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
^ * 25 times
n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"} "Possibly unhandled rejection: {"status":500,"name":"indexed_db_went_bad","message":"unknown","error":true,"reason":"TimeoutError"}"
n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"} "Possibly unhandled rejection: {"status":500,"name":"indexed_db_went_bad","message":"unknown","error":true,"reason":"TimeoutError"}"
n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"} "Possibly unhandled rejection: {"status":500,"name":"indexed_db_went_bad","message":"unknown","error":true,"reason":"TimeoutError"}"
Error fetching form definitions n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Failed to retrieve forms n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Error fetching contact n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Initiating changes service
Error initialising watching for db changes n {status: 500, name: "indexed_db_went_bad", message: "unknown", error: true, reason: "TimeoutError"}
Attempting changes initialisation in 5 seconds
^ * many more
```
The key numbers here are:
- 2325 documents purged
- 486 contacts processed
- ~2800 `The database connection is closing` errors

I can't get these to line up perfectly, but they get suspiciously close to each other.
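The arithmetic behind that suspicion, under the speculative assumption (mine, not confirmed by the log) that each purged document and each processed contact costs roughly one IDB transaction:

```javascript
// Counts pulled straight from the console log above.
const purged = 2325;              // "Purge complete, purged 2325 documents"
const contactsProcessed = 486;    // "totalContacts: 486"
const closingErrors = 2800;       // "^ * 2800 times"

// If each purge and each per-contact read were one failed transaction,
// we'd expect roughly this many "connection is closing" errors.
const expected = purged + contactsProcessed; // 2811

// Not an exact match, but suspiciously close (off by 11).
const delta = Math.abs(expected - closingErrors);
```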
@newtewt can you confirm that for this user the purging actually occurred? That is to say, were the documents no longer accessible?
My hunch is that IndexedDB has broken either before purge has run, or because we do an all_docs at the start and idb is "not ready" somehow, and so none of the purges were actually successful.
In terms of the actual error, it's hard to track down. Apart from one specific usage that we aren't going to get to, it's just used whenever a transaction fails.
One option is to inject failures into pdb's code and narrow it down to the places where the promise chain doesn't catch.
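A minimal sketch of what that injection could look like (hypothetical helper against a fake db object, not PouchDB's actual internals): wrap one async method so it always rejects, then watch the console for "Possibly unhandled rejection" warnings to find the call sites with no catch.

```javascript
// Hypothetical fault-injection helper: replace an async method on a
// db-like object with one that always rejects, and return a function
// that restores the original so the injection can be undone.
function injectFailure(obj, methodName, error) {
  const original = obj[methodName];
  obj[methodName] = () => Promise.reject(error);
  return () => { obj[methodName] = original; };
}

// Example against a fake db: a caller with a .catch() handles the
// injected failure; a caller without one surfaces as an unhandled
// rejection in the console, pinpointing the broken promise chain.
const fakeDb = { allDocs: () => Promise.resolve({ rows: [] }) };
const restore = injectFailure(fakeDb, 'allDocs', new Error('injected TimeoutError'));

fakeDb.allDocs().catch(err => console.log('caught:', err.message));
restore();
```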
I'll confirm today that the docs are unavailable. I was going based on the console log that purging completed.
@SCdF purging completes properly. The home_visit docs are not available anymore after refreshing and getting into the app.
I'm trying to work out if this is a blocker for 3.3.0 or if we should add it to 3.5.0.
@newtewt Do you have any idea how often this would happen?
@SCdF I'm leaning towards bumping it - what do you think?
This happens consistently for the filysa user on first replication and purging, so it is reproducible. However, I logged in as user madinagu and didn't experience that error, and this user has more docs than the filysa user.
Have you tried wiping the filysa user db, or logging in as them on another device? It's possible the db has been corrupted and isn't recovering.
I'm wiping all the data in the Android app. I would need to try another device to be sure as I can't remember off the top of my head right now.
That's messed up. @SCdF Would you mind investigating this user to see if they have a dodgy doc or purge configuration that's messing them up? The goal is just to get 3.3.0 to the point of go/no-go so if in your opinion this isn't required then bump it out to 3.5.0 and we'll pick it off the stack when we get to it.
I'm on it (well, once I get creds etc)
So it took four different devices, but I managed to replicate it, on our old friend the Tecno Y4.
It's unclear why it happens on that device over the others, though it is the worst device I have.
@newtewt can you post a user who does succeed compared to this one, and specifically:
I'm trying to work out if there is some kind of breaking point ratio.
In the mean time, I'm going to pull this user's data down locally and I will try mucking with the code to see if I can produce any interesting results.
@SCdF Here are the two users I'm using on the wx3 and seeing the error. I tried on my pixel 2 and tecno J5 with no errors. Let me know if you want more.
user | total | number of docs after purging | sees error | device
-- | -- | -- | -- | --
filysa | 4792 | 2467 | yes | tecno wx3
madinagu | 14751 | 7255 | no | tecno wx3
So I'm guessing that we're using a WX3 because Muso people have them deployed?
I believe so. @abbyad mentioned that is what the CHWs are using in Mali.
So I'm going to remove the compaction (optimise) step.
We are already supposedly doing this on every write (and the code seems to exist in PouchDB), and manual compaction is seq based, so it's going to run over the whole changes feed again, which is a lot.
There is no reason this should cause crashes, but there is also no reason for us to manually compact if we're always compacting when we write.
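The reasoning above can be sketched as a small guard. `auto_compaction` is a real PouchDB constructor option and `db.compact()` is its real manual API, but `shouldRunOptimise` and the options shape are illustrative assumptions, not the app's actual code:

```javascript
// Hypothetical guard for the post-purge "optimise" step. With PouchDB's
// auto_compaction option set, old revisions are compacted on every write,
// so a manual db.compact() pass, which is seq-based and re-walks the whole
// changes feed, adds load without adding value.
function shouldRunOptimise(dbOptions) {
  return !(dbOptions && dbOptions.auto_compaction);
}

shouldRunOptimise({ auto_compaction: true }); // false: writes already compact
shouldRunOptimise({});                        // true: manual compaction needed
```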
@newtewt this build is available in purge-no-optimise branch build.
I'm going to test this locally as well, but you should test it on your phones when you get a chance.
So far this has been working fine on the purge-upgrade instance. I haven't seen the crash today. I've done initial replication a few times in the browser, on the Tecno WX3, and on the Tecno J5. I'd say let's push a PR, get another 3.3 beta, and give it another once-over.
@SCdF I'll leave it for you to merge and backport accordingly. The diff looks fine - we should probably delete the translations for "optimize" at the same time.
I made enough changes after @garethbowen's comment that I made a PR and have linked it for Gareth.
@garethbowen leave it to me to merge / forward port.
@newtewt @ngaruko I've merged this into 3.3.x so AT (and hopefully RT) can start. Wait for the build to finish and you'll be all good.
@SCdF To get AT unblocked I've ignored your directive and merged the PR. I also forward ported to 3.4.x and created a PR for you to merge into master.
Coolio, so:
**breathes out**
Replicating and purging are occurring as expected. Closing.