Docfx: Full-search implementation concerns

Created on 1 Sep 2016 · 9 Comments · Source: dotnet/docfx

First of all full-search is awesome. Really cool. But let me criticize a bit.

  1. Why do we need search-stopwords.json?
    lunr already contains built-in stop words for English. But you remove the default stopWordFilter, then load a separate stop-words file and generate a filter from it. Why?
    That search-stopwords.json contains the same stop words as the default built-in filter!
    Moreover, the lunr add-ons for other languages (from https://github.com/MihaiValentin/lunr-languages) contain their own stop words.
  2. Why not build the index at build time? Why do you instead load the JSON at runtime and then add the items one by one to the index? It can be done (and usually is done) at build time. Then at runtime we can just load an index file:
    $.getJSON("index.json", function (data) { engine = lunr.Index.load(data); })
    That's all.
    I understand that you enrich search results with the title and keywords, which are absent from lunr's search results. But that can be done via an additional index file.
  3. No i18n.
    The index should be built with other languages in mind. lunr natively supports only English. To support additional languages we need to add the add-ons (from https://github.com/MihaiValentin/lunr-languages):
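    At build time:
    var lunr = require('lunr');
    // Register the stemmer support and language add-ons on the lunr object.
    require('./lunr.stemmer.support.js')(lunr);
    require('./lunr.ru.js')(lunr);
    require('./lunr.multi.js')(lunr);
    var lunrIdx = lunr(function() {
      // Enable the combined English + Russian pipeline for this index.
      this.use(lunr.multiLanguage('en', 'ru'));
      // config ref/fields
    });

    At runtime:
    // Register the multi-language pipeline before loading the serialized index.
    lunr.multiLanguage('en', 'ru');
    engine = lunr.Index.load(data);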

I can create a template to customize index building, but I think it should be possible without template customization. Also, please see #650: there are problems with the encoding of extracted keywords for indexing.
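The "additional index file" from point 2 could be a plain map from each document's ref to its display data. The following is only a minimal sketch: it assumes that a lunr search returns an array of `{ ref, score }` hits and that the side file maps each ref to `{ title, keywords }`. The `enrichHits` name, the `search-meta.json` file name and the sample data are hypothetical, not part of DocFX.

```javascript
// Hypothetical side file ("search-meta.json"), keyed by the same ref
// that the lunr index uses for each document.
var meta = {
  'api/Foo.html': { title: 'Class Foo', keywords: 'foo bar' },
  'articles/intro.html': { title: 'Introduction', keywords: 'getting started' }
};

// Join raw search hits with the display data from the side file.
// Hits whose ref is missing from the side file are dropped.
function enrichHits(hits, metaByRef) {
  return hits
    .filter(function (hit) {
      return Object.prototype.hasOwnProperty.call(metaByRef, hit.ref);
    })
    .map(function (hit) {
      var item = metaByRef[hit.ref];
      return {
        href: hit.ref,
        score: hit.score,
        title: item.title,
        keywords: item.keywords
      };
    });
}

// Stand-in for the array that engine.search('foo') would return.
var sampleHits = [
  { ref: 'api/Foo.html', score: 0.9 },
  { ref: 'missing.html', score: 0.1 }
];
var enriched = enrichHits(sampleHits, meta);
console.log(enriched);
```

With a split like this, the browser would only load two static files and do a cheap lookup per hit, instead of re-indexing every document.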

Area-Search enhancement

All 9 comments

Thanks for your comments, @evil-shrike; they're quite reasonable and insightful. I'm glad to share some background:

  1. For the first question: it's actually a way to solve issue #279, so that users can customize the stop words to avoid the problem.
  2. For the second one: the index.json generated by DocFX is not the kind of serialized data that lunr.Index.load needs. And wouldn't an additional index file make the process more complicated?
  3. For the third one: I agree with you, we should support other languages.

Thanks, @evil-shrike. Feel free to share here if you have more concerns.

"the index.json generated by DocFX is not the kind of serialized data that lunr.Index.load needs."

Sure, we have to build it, much as it is currently built at runtime, and only call index.toJSON at the end; then we'll have the index JSON for Index.load.
The need for an additional index file would be offset by the fact that we won't need stopwords.json (it'll be used at build time and embedded into the generated index).
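The build step described here might look roughly like the sketch below. It rests on several assumptions: the pages are available as an array of `{ href, title, keywords }` records, the `lunr` function is passed in by the caller, and the `buildSearchFiles` name is hypothetical. It is not how DocFX actually generates its output.

```javascript
// Build both output files from page records of the assumed shape
// { href, title, keywords }.
// `lunr` is passed in so the sketch does not assume how it is loaded,
// e.g. buildSearchFiles(require('lunr'), pages) in a Node build step.
function buildSearchFiles(lunr, pages) {
  var meta = {};
  var idx = lunr(function () {
    this.ref('href');
    this.field('title');
    this.field('keywords');
    // Custom stop words or language add-ons would be configured here.
    pages.forEach(function (page) {
      this.add(page);
      // Keep the display data next to the index, keyed by the same ref.
      meta[page.href] = { title: page.title, keywords: page.keywords };
    }, this);
  });
  return {
    // JSON.stringify calls idx.toJSON(), giving the input for lunr.Index.load.
    index: JSON.stringify(idx),
    meta: JSON.stringify(meta)
  };
}
```

The two returned strings would be written out as static files by the build; at runtime the page would then only need `lunr.Index.load` on the first one.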

If no stopwords.json exists, how can users customize the stop words? For example, what if a user wants to search for the word 'let', which is included in the default lunr.js stop words?

I understand. I meant that we don't need it at runtime (loading a file from the server) if the index is built at build time (with custom stop words).
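To illustrate the 'let' case, here is a rough sketch of a stop-word filter built from a customized list at build time. The short `defaultStopWords` array only stands in for lunr's real list, the helper names are hypothetical, and tokens are treated as plain strings; how such a function is registered in lunr's pipeline depends on the lunr version and is not shown.

```javascript
// A few entries standing in for lunr's built-in English stop-word list.
var defaultStopWords = ['a', 'and', 'let', 'the', 'to'];

// Return a copy of the list without the words users must be able to find.
function withoutWords(stopWords, searchable) {
  return stopWords.filter(function (word) {
    return searchable.indexOf(word) === -1;
  });
}

// Build a filter in the style of a pipeline step: it returns the token
// to keep it, or undefined to drop it.
function makeStopWordFilter(stopWords) {
  var lookup = Object.create(null);
  stopWords.forEach(function (word) { lookup[word] = true; });
  return function (token) {
    return lookup[token] ? undefined : token;
  };
}

var customFilter = makeStopWordFilter(withoutWords(defaultStopWords, ['let']));
var kept = ['let', 'the', 'index', 'load'].filter(function (token) {
  return customFilter(token) !== undefined;
});
console.log(kept); // 'let' survives, 'the' is dropped
```

Because the filter would run only while the index is being built, the customized list would never have to be shipped to the browser.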

We're experiencing issues with the second point.
It seems the search index is rebuilt every time you load or navigate to a page. This causes problems for docs sites with a medium or large index.json file. It takes almost 10 s for the lunr search index to be built, i.e. 10 seconds during which search does not work.
I agree that the lunr index should be built at build time and only loaded at runtime.

I'm trying to customize search-stopwords.json so that I can filter Italian stop words (we are writing documentation in Italian), but without success. I tried overriding the file inside my custom template and also setting the array directly inside search-worker.js, but apparently nothing happens and the index.json in the _site root is always huge. Could anybody explain to me how I can do it?

The performance of this runtime index processing isn't so good. The doc site I have has an index.json file of about 8.5 MB. This means that search isn't available for a minute or two while it's being processed.

It's mentioned above that it might be possible to do this processing at build time instead of at runtime in the browser. If that is possible, does anyone know how I can achieve that?

I would like to know how to go about this as well. We are concerned about processing large index.json files at runtime. Any updates on this? @qinezh

Checking back in on build-time indexes. Our search takes over a minute for the index to be built on most desktops. It looks like search is just broken, because users give up before the index is created.
