Cht-core: Search doesn't work in Nepali or with accented characters

Created on 18 Apr 2017  路  22Comments  路  Source: medic/cht-core

On alpha.dev, with demo user (non-admin) :
image

image

Also confirmed with a random Contact in care-sindhuli, just in case my fake Nepali name was special, with admin user.

1 - High Bug Upgrading

Most helpful comment

@alxndrsn Yup, thanks, saw it! It's just that everybody keeps reviewing this when it's not in Code Review yet :)
I will call out an official code review soon, and yall can all knock yourself out.
Thanks for the input anyway!

All 22 comments

@bishwas-medic Is this blocking your 2.x upgrade?

I can do away with it for few weeks but I see this becoming a big problem after we've upgraded all instances to v2.x. Specially with the associated contact mess. To associate a contact to a user, we need to search using the contact's name and since we can't search in Nepali we can't associate any existing contacts in most of our projects to any users because all their names are in Nepali. The PMs will certainly not like it and nor will the partner staffs. So, not immediately blocking...but I'd love to see this fixed within this month

@bishwas-medic the upgrade should automatically set the associated contact for each place, so this should only affect new projects, not existing projects.

Also, if something is a bug that will prevent PMs or partners from doing their work, then I would call it blocking. We can prioritize this next iteration.

the upgrade should automatically set the associated contact for each place

@sglangevin How does this work? User names are matched with the existing People in the Contacts?

Thanks for moving it into next iteration.

@bishwas-medic I think I misunderstood your original comment. I meant that your primary contacts would all be migrated, as long as the facility table is filled out completely.

The users you had in 0.4 are "full access" users (national_admins) so they don't require an associated contact so I don't think this issue is relevant for them. Full access users can create people and places in the app without an associated contact. There was a regression on that but it is fixed in v2.11.1.

Are you planning to created any restricted users for the projects that are upgrading?

If the full_access users won't require to have associated contacts, that'd be handy to start with. My concern now is about the release date for v2.11.1.

But we do plan to have restricted users for partner's IT staff so that we can assign them limited permissions like no bulk delete, no contact delete etc and only the main project managers from Medic and the partner will have "full access".

@bishwas-medic let's take this conversation off of this thread. I'll respond to your questions on Slack.

@garethbowen , can your review the first wave? It's just refactoring at this point. I'd like to merge it in separately first.
https://github.com/medic/medic-webapp/pull/3486

(I can tag the commit with this issue number, or not, since it's not fixing the issue...)

Back to you @estellecomment

Words starting with non-ascii chars are not indexed. Removing that.

I've requested one change and one unit test.

@SCdF Can you also please review this PR? You added the regex that's being removed and I just wonder if you can remember why it was important.

Commented. tl;dr; yes it's intentional.

So, when you remove that intentional check for non-ascii characters, on lg.dev data (english and swahili, both ascii languages), you get these numbers of keys output:
contacts_by_freetext : 201,696 -> 201,786 (90 non-ascii keys added)
contacts_by_type_freetext : 201,696 -> 201,786 (90 non-ascii keys added)
reports_by_freetext : 287,788 -> 287,964 (176 non-ascii keys added)

So it's negligible. I'm leaving it.
(cc @SCdF )

Posted this a couple of days ago, but I guess it got missed. I think the following line can be deleted:

!key.match(/(^$)/)

@alxndrsn Yup, thanks, saw it! It's just that everybody keeps reviewing this when it's not in Code Review yet :)
I will call out an official code review soon, and yall can all knock yourself out.
Thanks for the input anyway!

Review time! Assigning to @garethbowen, but @SCdF and @alxndrsn if you want to take a look you're welcome to.

Whole PR : https://github.com/medic/medic-webapp/pull/3515
For your reviewing convenience, PR without the refactoring (first commit) so you can see the fix itself: https://github.com/medic/medic-webapp/compare/1424c69434c7c189e703709ff03b562fa841639c...3392-refactor-search-smore

Just a question about functionality: if I search for eleve in webapp, will 茅l猫ve be matched?

@alxndrsn that would be a really cool feature. We'd have to make the view generation a lot smarter though, and have it (and the client-side search code) down-grade 茅 to e and so on. I'm not sure if there is a small list of these anywhere that we could use in the view, or if it's massive (massive meaning a slower generating view).

馃憤 I wasn't sure if this was meant to be covered by this ticket, but sounds like we need a separate one.

For implementation, looks like this would work couch-side: https://stackoverflow.com/questions/990904/remove-accents-diacritics-in-a-string-in-javascript#answer-18391901

And we're already lower-casing, so we can ditch half the map.

Doesn't look like it's set up to handle non-latin characters, but perhaps we don't have projects for which that would be relevant yet.

We may need a separate ticket for normalising unicode inside this view as well (similar to https://github.com/medic/medic-webapp/issues/3302); it would be good, but tricky, to test this.

if I search for eleve in webapp, will 茅l猫ve be matched?

Yup!

Closing, thanks for the review(s)!

LGTM.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

SCdF picture SCdF  路  5Comments

alxndrsn picture alxndrsn  路  4Comments

abbyad picture abbyad  路  3Comments

diannakane picture diannakane  路  6Comments

abbyad picture abbyad  路  4Comments