Currently migrations run on the server when api starts up; they are then marked as completed and never run again. This worked fine with always-online clients, but now that we are offline-first we cannot guarantee all the docs have been synced when the migration runs.
Work out how to migrate docs after the upgrade.
A couple of ideas: we could either run migrations on client side when the ddoc is updated, or on the server side on replication.
Running server side on replication: we'll never know when a client is up to date, which means we'd have to run each migration on each replicated document for all eternity. This would also require a rewrite of how we do migrations to deal with one doc at a time.
Running client side on update: this might slow down the handset on which the migration is running. We'd have to execute migrations before setting up db sync, which means using the changes feed since the last sync to determine which docs have been changed locally, then migrating them in the bootstrap, before launching the app. This also requires a rewrite of how we do migrations to deal with one doc at a time, and it means working out how to distribute migrations to the client.
Of the two options, I think running on the client side is the lesser of two evils, because the migrations will run once on every device and then never again.
Dave pointed out the pouchdb-migrate plugin, which could be useful. One of the options it allows is `since`, which means we may be able to migrate from the last synced sequence: we can use this pouchdb plugin to get all the unsynced docs, so we know which ones we need to migrate before push.
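As a rough illustration (not pouchdb-migrate itself, just the plain PouchDB changes feed; `lastSyncedSeq` and `runMigrations` are hypothetical names), a minimal sketch of migrating locally changed docs before push:

```js
// Migrate everything that changed locally since the last successful sync,
// before replication is set up. Assumes migrations have been rewritten to
// operate on one doc at a time.
const migrateUnsyncedDocs = async (db, lastSyncedSeq, runMigrations) => {
  const changes = await db.changes({
    since: lastSyncedSeq,
    include_docs: true
  });

  const migrated = [];
  for (const change of changes.results) {
    // runMigrations returns the migrated doc, or undefined if nothing changed
    const result = runMigrations(change.doc);
    if (result) {
      migrated.push(result);
    }
  }

  if (migrated.length) {
    await db.bulkDocs(migrated);
  }
};
```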
Ok... I think we can put this off until we need it, but here's the plan:
There are cons to client-side migration as well unfortunately.
This leads me to think about two changes:
- a `firstRun(db)` fn that does all that, and
- `filter(db, change)` and `migrate(db, change)` that are run over every change.

Pros:

Cons:
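For concreteness, a sketch of what that shape could look like (all names here are hypothetical, this is not an existing API):

```js
// Hypothetical migration module shape: firstRun for the initial bulk pass,
// filter + migrate for every subsequent change.
module.exports = {
  name: 'normalise-contact-parents', // hypothetical example migration

  // Runs once when the migration is first installed; free to use views / bulk docs.
  firstRun: (db) => {
    // bulk-migrate all existing docs here
  },

  // Cheap check: does this change need migrating at all?
  filter: (db, change) => change.doc && change.doc.type === 'clinic',

  // Migrate a single changed doc.
  migrate: (db, change) => {
    const doc = change.doc;
    doc.schema_version = 2; // e.g. stamp the new schema version
    return db.put(doc);
  }
};
```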
Thoughts @garethbowen ? Sorry for brain-dumping on your ticket…
@SCdF No worries - two brains dumping are better than one.
I don't much like your proposed solution because as you say, it needs to run on every doc forever, even if there's no way it could possibly make a difference (eg: 0.4 migrations running for projects that started on 2.11).
Let's voice chat about this some time - I feel like we're close to a solution.
OK so @garethbowen and I had a chatty chat.
Here is the current apogee of our collective thought, intermingled with my post-collective-thought confusion:
(each migration takes a doc and modifies it in place but doesn't write anything, which is used directly for both `_changes` and the initial bulk wrapper)

Things that are still gross / unknown:
Gareth: did I get that right?
@SCdF
does [documents are split by document type] matter anymore if they are server side? Would the optimisation be gating by type vs. gating by a filter fn per migration, and do we care either way, for simplicity or performance?
It doesn't matter much in practice but I think it matters conceptually. If we want to move to a world of data schemas then I think it makes sense to have versioned schemas per type. If we want to stay with schema-less data then we should stick with a filter function. It probably falls outside the scope of this work - we can introduce schemas later.
are there situations where we'd want to support blocking and non-blocking migrations?
Yeah sure. I think @estellecomment was looking at this at one point. I think the majority of our migrations could be non-blocking. However given they're ordered, if the last migration is blocking then we have to block on all migrations until the last one is executed.
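To make the ordering constraint concrete, a small sketch (hypothetical data structure) of working out which migrations api has to block on:

```js
// Because migrations run in order, everything up to and including the last
// blocking migration must finish before api can serve traffic.
const splitByBlocking = (migrations) => {
  const lastBlocking = migrations
    .map(m => m.blocking)
    .lastIndexOf(true);
  return {
    // If no migration is blocking, lastBlocking is -1 and this is empty.
    mustBlockOn: migrations.slice(0, lastBlocking + 1),
    background: migrations.slice(lastBlocking + 1)
  };
};

// e.g. [a (non-blocking), b (blocking), c (non-blocking)]
// => mustBlockOn: [a, b], background: [c]
```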
somehow api blocks on sentinel?
I'm not worried about where the code lives yet - the migrations might stay in api? This will be more clear once we've decided what we're doing...
I guess we'd potentially want to write two versions of the migration, one that uses bulk docs / views for speed, and one that deals with individual docs for changes.
I'd really rather not (complexity). However, if you have the meta doc store the schema version (or whatever) and the doc type, so you can deterministically work out whether you need to run migrations on a given doc, then you can have a view which returns all docs which should be run through a given migration. Then you can query that view for the first 100, run the batch through the migration map, bulk save, and query the view again. The query-the-view code could be written once in the migration runner, so all the migration writer has to do is write the mapping function.
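A PouchDB-style sketch of that loop, assuming a hypothetical view `medic/needs_migration` keyed by migration name that emits only docs which still need that migration:

```js
const BATCH_SIZE = 100;

// Run one migration to completion, 100 docs at a time.
// `migration.map(doc)` is the mapping function the migration writer supplies.
const runMigration = async (db, migration) => {
  for (;;) {
    const result = await db.query('medic/needs_migration', {
      key: migration.name,
      include_docs: true,
      limit: BATCH_SIZE
    });
    if (!result.rows.length) {
      return; // view is empty for this migration: we're done
    }
    const migrated = result.rows.map(row => migration.map(row.doc));
    await db.bulkDocs(migrated);
    // Saving updates the schema version, so these docs drop out of the view
    // and the next query returns the next batch.
  }
};
```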
we might want api to actually show some maintenance page for web users, and return a certain HTTP code for replicators, so we know we are in this state?
Re: non-blocking migrations, no I don't remember doing anything about that...
I get the point of schema version. Cool. Not quite sure it all works out when versions are server-only though, but maybe I'm missing some bits.
Assume there’s code v1, and schema v1, and we push an upgrade to code v2 and schema v2.
Server gets new code v2. Api blocks until migrations are run. Schema is now v2 for all docs. Api starts.
Meanwhile, offline, a client on v1 edits an existing doc (editedDoc) and creates a new doc (newDoc).
Client gets online, and gets code v2. Gets changes for migrated docs. Conflict on editedDoc (what happens??). Client pushes its changes up to server.
Server gets changes for editedDoc and newDoc.
newDoc has no schema version. Sentinel runs all migrations on it, then gives it a v2 schema version. All good.
editedDoc …
Changes are synced back down to client. Client gets editedDoc v2 and (if migration was necessary) newDoc v2.
re: conflicts in general
Our managing of conflicts remains the same before and after this change, and your scenario above plays out identically (in terms of conflicts) with the current migration scheme. We currently have no conflict resolution, so CouchDB will just pick one based on which has the highest hash (or lowest, I forget, it's not relevant).
Conflicts are not auto-detected at any point in Couch / our app. So Sentinel will not be aware that there is a conflict, and neither will the client. You have to explicitly detect them and manage them. Using your example above, you will either randomly keep or randomly lose the client's changes, and never notice.
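For reference, explicit detection looks something like this (a PouchDB-style sketch; deciding what to do with the losing revisions is the hard part):

```js
// Couch won't surface conflicts unless you ask for them explicitly.
const findConflicts = async (db, docId) => {
  const doc = await db.get(docId, { conflicts: true });
  if (!doc._conflicts || !doc._conflicts.length) {
    return []; // no conflicting revisions
  }
  // Fetch the losing revisions so we can merge or pick deliberately, instead
  // of letting Couch's deterministic winner silently discard the edits.
  return Promise.all(doc._conflicts.map(rev => db.get(docId, { rev })));
};
```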
I think dealing with this is really important. However, it's just as important now as it would be after this change. If anything, you could attempt to argue it's slightly _less_ important after this change, because having a metadata document that is only on the server side (where we could put schema version, transition history etc) may mean we can reduce the frequency that sentinel writes to client-facing documents.
@alxndrsn, please triage before the end of this sprint.
Still needs doing.
Deprioritised out of 3.0.0.
NB: in general I'd say we'd simply avoid doing this by not having massive migrations. However, it is likely that the flexible hierarchy work (#3639) will force us to migrate contacts, which in turn would be much easier to do if we solve this ticket.
(Up for discussion, but IMO unless @garethbowen is super duper sure we could leave this until we're sure that flexible hierarchy requires it)
As you say, this isn't required until we need to do a migration on data that can be changed on the phone (reports, places, or people). Flexible hierarchy is one feature that would really benefit from being able to efficiently and reliably migrate contact data so that all places have the same type. However, we could make it work in a backwards compatible way, so it's not technically required; it does, however, make the code much simpler and less error prone.
We have other examples of where our data structure is causing code complexity which need to be resolved eventually (messages vs reports, inlined contacts, etc) and these would also require efficient and reliable migrations. I think the best approach would be to bundle all these migrations together and solve this issue as part of a 4.0 release, which will be some time away yet, meaning that we don't need to solve this right now.
@alxndrsn @SCdF What do you think?
@garethbowen I agree with this. FWIW I think this is a complex enough problem that we don't have time to solve it before 2019 anyhow, given other priorities.
> FWIW I think this is a complex enough problem that we don't have time to solve it before 2019 anyhow, given other priorities.
:+1:
Removing from 3.2.0 as discussed.