Dexie.js: How to search an index with multiple words that contains those words?

Created on 14 Dec 2018  路  7Comments  路  Source: dfahlander/Dexie.js

Hey there! Loving Dexie so far. I have pretty much everything set up except for searching correctly.

I was wondering if there was a way to search words in a string (Example: 'Comparing similar JS frameworks') with a value such as 'Com frame'. It would then get that results, and any other row of data that contains those values.

I have used startsWithIgnoreCase, and it works good. But, I need something like 'containsIgnoreCase'.

I've seen the fullTextSearch.js file, (maybe I'm just not implementing things correctly) but I don't think it will work for my case.
I saw the Lunr example, and I'll be looking into that as well. Any advice or other examples that might help?

Most helpful comment

Good! Performance tip: store messageWords in lowercase and use

Promise.all (
  prefixes.map(prefix => 
    db.table.where('messageWords').startsWith(
      prefix.toLowerCase()
    ).limit(SOME_LIMIT).toArray()))

...instead. This will increase performance in the following three ways :

  1. Avoiding nocase search enables native b tree searching
  2. Avoiding anyOf enables Dexie to use IDBIndex.getAll() internally
  3. Limiting to a reasonable number of hits prohibits slow searches on short prefixes and many matches.

All 7 comments

In issue #775, there's a link to an updated piece of code that use lunr to make full text search. Lunr is still lexeme based, and most full text search systems are, which means if you search for "needle" in the text "do we have some needles" it will find it, since "needle" and "needles" are just different inflections.

You can combine lunr search with Dexie's startsWith() to also match prefixes of each lexeme (replace equals() with startsWith() in the sample of #775). But chances are that you may get more matches than you need.

In order to find any substring, like String.includes(), from a database without a full index scan, another more expensive indexing strategy must be used. I have no example of it, but can be implemented using trigram, see http://blog.scoutapp.com/articles/2016/07/12/how-to-make-text-searches-in-postgresql-faster-with-trigram-similarity

@dfahlander
Thanks for the reply! I saw that post before asking this question though.
Also, I figured it out. I'm just using the FullTextSearch.js file, then using text.split(' ') on my search query values.

So that way, I'm sending an array of values over to .startsWithAnyOfIgnoreCase, which will then look in my equivalent of FullTextSearch's messageWords variable. Gets all the results I want perfectly.

Good! Performance tip: store messageWords in lowercase and use

Promise.all (
  prefixes.map(prefix => 
    db.table.where('messageWords').startsWith(
      prefix.toLowerCase()
    ).limit(SOME_LIMIT).toArray()))

...instead. This will increase performance in the following three ways :

  1. Avoiding nocase search enables native b tree searching
  2. Avoiding anyOf enables Dexie to use IDBIndex.getAll() internally
  3. Limiting to a reasonable number of hits prohibits slow searches on short prefixes and many matches.

@dfahlander Thanks so much! I'll switch over to all this.

@dfahlander Having another issue. Do you have any ideas about how to get the search to grab results that are most similar?

Example: I have a card called "Enormous Baloth"
User types in "eno bal", that should theoretically be the card that shows up first, or is very high on the list. However, I mostly get results where the 2nd or 3rd word has "eno" or "bal".

Any ways around this?

EDIT:
Sorry about the close & reopen. Anyways, the example above is unrealistic.

Example 2: Let's say we have the same card, "Enormous Baloth".
A user enters "Enormous". The results vary, and Enormous Baloth is the ~25th result. The other results have parts of their name that contain "Enormous", but it's the 2nd, 3rd, 4th.... word.

How does Dexie determine what to sort by in this example?

As this is a common use case, I suppose there are algorithms out there (that is implemented in other libs or database engines). If I had time now, I would like to dive into them, but unfortunalety I haven't right now.

One simple thing you could do, would be to inspect the returned arrays of my previous snippet (the result after awaiting Promise.all(...)), and detect whenever the same object occurs in multiple results. Then manually sort those first. This would at least prioritise those results where both words matches.

Thanks for all the help! I'm getting closer to what I want to do with the data. This all helped a lot!

Was this page helpful?
0 / 5 - 0 ratings