Meilisearch: Accents work strangely with synonyms

Created on 14 Sep 2020 · 4 Comments · Source: meilisearch/MeiliSearch

Describe the bug
Some of my synonyms worked and some did not, and I believe I have identified a bug.
It seems that words with accents are not picked up from the synonym table (and/or the query is stripped of accents before synonyms are looked up?).

To Reproduce
Steps to reproduce the behavior:
Add synonyms:

Expected behavior
In a previous issue (#949) it was mentioned that there is no typo tolerance with synonyms. However, when synonyms are considered, the query now appears to be stripped of accents before it is looked up in the synonyms table (?)

bug

All 4 comments

Hey @mzperix,

I just looked into the code base, and it seems we forgot to unidecode the synonyms (that is, standardize them by removing accents and lowercasing words). As a result, words in the query don't match the non-standardized synonyms.

I advise you to do that by hand: remove accents and lowercase the words on both sides, until we fix that.
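
As a rough illustration of the manual workaround, the sketch below normalizes both sides of a synonyms mapping before it is sent to the settings. It only approximates unidecode: it strips combining accent marks via Unicode decomposition and lowercases the result, using only the Python standard library. The function names `normalize_term` and `normalize_synonyms` are made up for this example and are not part of MeiliSearch.

```python
import unicodedata
from typing import Dict, List


def normalize_term(term: str) -> str:
    """Lowercase a term and strip combining accent marks (approximation of unidecode)."""
    decomposed = unicodedata.normalize("NFKD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def normalize_synonyms(synonyms: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Normalize keys and values; keys that collide after normalization are merged."""
    normalized: Dict[str, List[str]] = {}
    for word, alternatives in synonyms.items():
        bucket = normalized.setdefault(normalize_term(word), [])
        for alternative in alternatives:
            alt_norm = normalize_term(alternative)
            if alt_norm not in bucket:
                bucket.append(alt_norm)
    return normalized


# Hypothetical input: accented, mixed-case synonyms as a user might write them.
payload = normalize_synonyms({"Kávé": ["Eszpresszó"], "Étudiant": ["Élève"]})
print(payload)
```

The resulting mapping is what would be registered as synonyms instead of the accented originals. Characters that have no decomposition (for example `ß` or `ø`) pass through unchanged here, so this is not a full replacement for a real transliteration step.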

We will fix that in the next release, thank you for this bug report :)

Hi, I wanted to work on this as my first issue! I was planning on using this crate, however, I'm unsure of exactly where the search queries are parsed in the code base. Thanks!

As we did with facets (which are lowercased), we will need to store the synonyms in two different places. On one side, the synonyms we currently store need to be de-unicoded; on the other, we need to keep the original user input and keep the two lists in sync, so that when the user requests the synonyms, the versions they originally entered are returned. I am currently implementing this.
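
The two-store idea can be sketched as follows. This is not MeiliSearch's implementation (which is written in Rust); it is a minimal Python model under the assumption that the normalized list is simply rebuilt from the original list on every change, which keeps the two in sync. `SynonymStore`, its methods, and `normalize_term` are hypothetical names introduced for this illustration.

```python
import unicodedata
from typing import Dict, List


def normalize_term(term: str) -> str:
    """Lowercase a term and strip combining accent marks (approximation of unidecode)."""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


class SynonymStore:
    """Keeps the user's original synonyms plus a normalized copy used for lookups."""

    def __init__(self) -> None:
        self._original: Dict[str, List[str]] = {}  # exactly as the user entered them
        self._lookup: Dict[str, List[str]] = {}    # normalized keys and values

    def set(self, word: str, alternatives: List[str]) -> None:
        self._original[word] = list(alternatives)
        self._rebuild()

    def remove(self, word: str) -> None:
        self._original.pop(word, None)
        self._rebuild()

    def _rebuild(self) -> None:
        # Rebuilding from the originals guarantees the two stores never drift apart.
        lookup: Dict[str, List[str]] = {}
        for word, alternatives in self._original.items():
            bucket = lookup.setdefault(normalize_term(word), [])
            for alternative in alternatives:
                alt_norm = normalize_term(alternative)
                if alt_norm not in bucket:
                    bucket.append(alt_norm)
        self._lookup = lookup

    def lookup(self, query_word: str) -> List[str]:
        """Search-time lookup: the query word is normalized the same way as the keys."""
        return list(self._lookup.get(normalize_term(query_word), []))

    def listing(self) -> Dict[str, List[str]]:
        """What the user gets back when requesting the synonyms: the original input."""
        return {word: list(alts) for word, alts in self._original.items()}
```

With this model, `store.set("Kávé", ["Eszpresszó"])` makes both `store.lookup("kave")` and `store.lookup("KÁVÉ")` find the synonym, while `store.listing()` still shows the accented spelling the user registered.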

In the end, it is not possible to do this in a straightforward manner without impacting users of non-Latin scripts. It involves work on the tokenizer. In the meantime, I suggest registering synonyms in a lowercase, de-unicoded format.

Re-opening this issue, and linking it to the tokenizer tracking issue (#624)
