Cht-core: Bulk delete errors on conflict

Created on 25 Apr 2018  路  28Comments  路  Source: medic/cht-core

Raised from #4113

The bulk delete should make sure that when it's done all documents that were asked to be deleted are deleted.

Currently this can fail to be true if there are conflicts, which not only stops the conflicted documents from being deleted, but also halt the bulk delete completely.

One solution would be to take conflicts and add those documents to the end of the queue to be retried later.

2 - Medium Bug

All 28 comments

Getting errors on beta-old even for a couple of reports...Version 2.15.0-beta.14. Is this a regression @garethbowen ?
image.

cc @rmhowe @SCdF

@ngaruko Can you switch to the network tab and see what the response is? I suspect you're getting an error or something which isn't valid JSON so is failing to parse.

Network tab...
Response
image

Request payload...
{"docs":[{"_id":"dd32e2d08c7def675f5de6ab960e3295"},{"_id":"dd32e2d08c7def675f5de6ab960e2c1d"},{"_id":"dd32e2d08c7def675f5de6ab960e211f"},{"_id":"dd32e2d08c7def675f5de6ab960e1e0c"},{"_id":"dd32e2d08c7def675f5de6ab960e1345"},{"_id":"dd32e2d08c7def675f5de6ab960e1280"},{"_id":"dd32e2d08c7def675f5de6ab960e02bf"}]}

@ngaruko api server logs would be helpful here.

Does this reproducibly break or just sometimes?

@SCdF It seems to break all the time. Easily reproducible on beta-old. Even for a couple of reports.
image.

@ngaruko working locally for me but can definitely see the bug on beta-old, can you post the api server logs?

I looked at the logs and I found this @SCdF @rmhowe

2018-05-29T11:19:30.854Z - error: TypeError: Cannot read property 'filter' of undefined
        at utils.updateParentContacts.then.updatedParents (/srv/storage/gardener/data/working_dir/aHR0cDovL2xvY2FsaG9zdDo1OTg0L21lZGljL19kZXNpZ24vbWVkaWMvbWVkaWMtYXBp/node_modules/medic-api/services/bulk-docs.js:79:41)
        at process._tickDomainCallback (internal/process/next_tick.js:129:7)

Interesting, it looks like the bulk-docs-utils package is not being updated for api. Not sure on the exact behaviour of npm install with respect to file: packages, or if perhaps the grunt-auto-install package is failing somewhere, or even how install is triggered when deploying to beta, will need to dig a little deeper. @SCdF do you know off the top of your head the details of how file: packages are updated and where install is triggered when deploying?

Edit: so it looks like npm install /some/path just creates a symlink to that directory, which is where I then get incredibly lost as I have no idea about the file structure on builds such as beta and what happens when symlinks are copied

This is a known issue with gardener being terrible: you basically have to delete api and sentinel before upgrading. People should know this, but it's very easy to forget, and that sounds like what has happened here.

@henokgetachew do you have any bright ideas for how to avoid people forgetting to do this in the future. Maybe we really do have to fix gardener and deploy a new version?

How involved and time consuming is the fix to Gardener?

Moving this back to AT.

How involved and time consuming is the fix to Gardener?

No idea. It would involve:

  • Investigating the code and fixing the issue
  • Knowing how to "deploy" / "package" / "something" gardener (no idea)
  • Knowing how to update old medic os to the new version
  • Then deploying that everywhere.

Anyway, if we want to do this we should raise a ticket about it, this has nothing specific to do with this ticket and this ticket still needs to be AT'd correctly (ie 2.15.0-beta.whatever needs to be deployed properly).

@ngaruko I've deleted api and sentinel and then upgraded beta-old.dev. If you can give this a test today to make sure everything looks good, that would be helpful as it's the last item waiting for AT for 2.15.0.

As far as updating gardener, I'm not sure if it's worthwhile given that we will be upgrading people to 3.0 fairly soon and gardener won't be used anymore. It does add a little extra effort to upgrades in the meantime, but we won't be doing many upgrades between now and 3.0's release.

@henokgetachew do you have any bright ideas for how to avoid people forgetting to do this in the future.

I can script it so that it's at least easier. But people would still have to remember to do it.

@henokgetachew / @sglangevin yeah that's the issue IMO: how do we make sure people remember this stupid manual step.

Having said that, I think at the moment we have two deploy situations:

  • Lots of people deploy alphas and betas
  • Henok deploys finals

Alpha and beta deployments are about to move completely to horti, and even for legacy ones (which I have no doubt will continue for months) worst case AT fails and we wonder why for a little bit: we at least haven't broken production.

So IMO as long as Henok is the only one who deploys finals, and he remembers / has a script then we're fine.

It still seems not to be working. Does the bulk-delete works for you @sglangevin on beta-old.dev?

@ngaruko What's in the log?

I wouldn't be surprised if the error is related to the bad report that was causing #4596. We should delete that report and see if that resolves the issue.

Same error as above @garethbowen.
And I am not sure if it is related to #4596 @sglangevin. I think this issue has to be looked at because at the moment, not even one report can be deleted, apparently. Is anyone able to delete a report on this instance?

@ngaruko I just deleted a report on that instance using the delete button and it worked fine. I'm wondering if bulk delete is failing due to the same report that is causing the other issue. Even if you aren't trying to delete that report, perhaps it is still breaking the bulk delete process in some way.

The bulk-delete-utils shared library on beta-old hasn't been updated. I'm looking in to it.

When I followed the instructions that Stefan has pinned in the #development channel in Slack and installed the next beta the code was updated and bulk delete appears to work for me. No idea what happened but it's possible the instructions weren't followed, or that the instructions don't work 100% of the time. Anyway, this is now unblocked and ready for you to test again @ngaruko

Thanks @garethbowen - I updated beta-old.dev last (I believe) and I did follow the instructions, so I'm not sure what happened. Glad it appears to be fixed!

Thanks @garethbowen and @sglangevin. Bulk delete is now working. While on this, I discovered another strange behavior. If a report is selected and the user uses a filter (I used 'date' and form type in here but will try other filters as well), the selected report still shows on the RHS with 'no report' on the LHS if no matching report (see screenshot 1) or reports matching the filter on the LHS. Not entirely related, I will raise an issue for this, unless there is a known issue related @sglangevin ?.

Screenshot 2 | Filter == form type | instance: beta-old.dev |version 2.15.0-beta.16
image

Screenshot 2 | Filter == form type | instance: alpha.dev |version 3.0.0
image

@ngaruko I think this is by design - we don't change the RHS when displaying search results on the LHS.

@rmhowe It appears this change was never forward ported to master - can you have a look please?

@garethbowen yep sorry must have forgotten to forward port it, having some issues with tests on master currently but will do once they're resolved

Was this page helpful?
0 / 5 - 0 ratings