Mkdocs-material: Inconsistent search behavior

Created on 10 May 2019 · 4 Comments · Source: squidfunk/mkdocs-material

Description

Search appears to work with the beginning of a word, but not when the full word is queried.

Expected behavior

Query: "where"
Expected preview options: any heading, paragraph, etc. containing "where"

Actual behavior

[Video attachment]

Steps to reproduce the bug

  1. Create pages containing "which"
  2. Create pages containing "where"
  3. Create pages containing "what"
  4. Search for "wh" – you get appropriate results
  5. Search for "what" – you get appropriate results
  6. Search for "which" – you get no results.
  7. Search for "where" – you get no results.

Package versions

  • Python: Python 3.7.3
  • MkDocs: mkdocs, version 1.0.4
  • Material: Version: 4.2.0

Project configuration

site_name: 'DEDocketResearch'
theme: 'material'
extra_css: [extra.css]
extra_javascript: [extra.js]
nav:
  - '': 'index.md'
  - '1-1-DELADocketSurvey': '1-1-DELADocketSurvey/1-1-DELADocketSurvey.md'
  - '1-2-DEJMTScrapeAndSurvey': '1-2-DEJMTScrapeAndSurvey/1-2-DEJMTScrapeAndSurvey.md'
  - '1-3-LAJMTInitialComparison': '1-3-LAJMTInitialComparison/1-3-LAJMTInitialComparison.md'

System information

  • OS: macOS Sierra
  • Browser: Chrome

I've seen the related issues, but couldn't figure out whether or how they relate to this one.

This project is awesome and @squidfunk is so responsive!

question

All 4 comments

Have you seen #1097? It's definitely related to the Lunr.js stemmer. Furthermore, "which", "what", and "where" may be stopwords.

Stop words. Right. That makes sense: "wh" stems correctly (including results with "which"), but "which" as a whole is filtered out.
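To make the symptom concrete, here is a toy model in Python, not Lunr.js's code. It assumes a tiny stopword set chosen to mirror the report and applies the filter only to query terms before prefix matching: a partial term such as "wh" is not a stopword and survives, while the full word "which" is dropped and leaves nothing to search for. Where exactly Lunr.js and Material apply the filter is not modeled here.

```python
from __future__ import annotations

# Toy model only: a hand-picked stopword subset (assumption, chosen to mirror
# the report) and a filter applied to query terms. Not Lunr.js's implementation.
STOPWORDS = {"which", "where", "when", "the"}

DOCS = {
    "page-1": "which option applies",
    "page-2": "where the docket lives",
    "page-3": "what the survey found",
}


def search(query: str, use_stopwords: bool = True) -> list[str]:
    """Return ids of docs where every query term prefixes some word."""
    terms = query.lower().split()
    if use_stopwords:
        # A query term that is a stopword is discarded before matching.
        terms = [t for t in terms if t not in STOPWORDS]
    if not terms:
        return []  # every term was filtered out, so nothing is searched
    hits = []
    for doc_id, text in DOCS.items():
        words = text.split()
        # Prefix matching, similar in spirit to a trailing-wildcard query.
        if all(any(w.startswith(t) for w in words) for t in terms):
            hits.append(doc_id)
    return hits


print(search("wh"))     # ['page-1', 'page-2', 'page-3']
print(search("what"))   # ['page-3']
print(search("which"))  # []
print(search("which", use_stopwords=False))  # ['page-1']
```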

I'm looking at this https://github.com/olivernn/lunr.js/issues/212 – Any guidance on doing this within the theme? Do you expect I will have to rebuild?

For future reference, I'll lay out the process Material currently uses for localization and then sketch out how to achieve what you're asking for:

The English localization file partials/language/en.html is the base from which all other languages _extend_, which means that if a localization file does not specify a value for a placeholder, it always falls back to the respective English translation. This is particularly relevant for placeholders that were introduced after some of the localizations were submitted, such as the skip.to.content placeholder for the equally-titled button. French, for example, will show "Skip to content", as it doesn't specify a translation for that placeholder. Now, this file contains three placeholders that are used to configure search behavior:
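The fallback described above behaves like a layered lookup. A minimal Python sketch using `collections.ChainMap`, assuming hypothetical, heavily trimmed translation tables (the real localization files are Jinja templates, and the French entries shown are illustrative):

```python
from collections import ChainMap

# Hypothetical, heavily trimmed tables; the real files are Jinja templates
# (partials/language/*.html), not Python dicts.
EN = {
    "search.placeholder": "Search",
    "skip.to.content": "Skip to content",
    "search.language": "en",
}
FR = {
    "search.placeholder": "Rechercher",
    "search.language": "fr",
    # No "skip.to.content": the key was added to English later.
}


def translations(locale_table: dict) -> ChainMap:
    """Look a key up in the locale first, then fall back to English."""
    return ChainMap(locale_table, EN)


fr = translations(FR)
print(fr["search.placeholder"])  # Rechercher
print(fr["skip.to.content"])     # Skip to content (English fallback)
```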

https://github.com/squidfunk/mkdocs-material/blob/367fef75b26d3007df1f05f4b75dc7d41407c883/material/partials/language/en.html#L11-L13

Lunr.js provides stemmers for some languages through lunr-languages (which is integrated with Material), but not for all of the 36 languages Material currently supports. Because I wanted to support search in those languages too, I fiddled around with Lunr.js and found that if I disable stemming and the stopword filter, those languages can be searched as well. The search experience may not be as smooth as it is with English, but it's better than nothing. Here is Hebrew, for example:

https://github.com/squidfunk/mkdocs-material/blob/367fef75b26d3007df1f05f4b75dc7d41407c883/material/partials/language/he.html#L11-L13

This is why I pulled search configuration into the localization files. As those values must be accessible from JavaScript, they are defined as meta tags within the head section:

https://github.com/squidfunk/mkdocs-material/blob/367fef75b26d3007df1f05f4b75dc7d41407c883/material/base.html#L30-L42
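On the page, the script only has to read those tags back. As a rough stand-in, this Python sketch uses the standard-library `html.parser` to collect them; the `lang:` name prefix and the exact markup are assumptions for illustration, and in the browser this would be done with DOM queries instead. Note that every value arrives as a string.

```python
from html.parser import HTMLParser

# Assumed markup: one <meta name="lang:KEY" content="VALUE"> per placeholder.
# The "lang:" prefix is a hypothetical convention for this sketch.
HEAD = """
<head>
  <meta charset="utf-8">
  <meta name="lang:search.language" content="en">
  <meta name="lang:search.pipeline.stopwords" content="true">
  <meta name="lang:search.pipeline.trimmer" content="true">
</head>
"""


class LangMetaParser(HTMLParser):
    """Collect the content of <meta> tags whose name starts with a prefix."""

    def __init__(self, prefix: str = "lang:") -> None:
        super().__init__()
        self.prefix = prefix
        self.values: dict = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""  # <meta charset> has no name
        if name.startswith(self.prefix):
            key = name[len(self.prefix):]
            self.values[key] = attributes.get("content", "")


parser = LangMetaParser()
parser.feed(HEAD)
print(parser.values)
```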

This approach seems to work reasonably well. Some languages use stemmers from other languages, as those _seem_ to work well enough (Chinese and Korean use jp, Serbo-Croatian uses ro, etc.). Why _seem_? Because I don't speak those languages, but when integrating them I always search for some of the localized terms to check whether Lunr.js catches them on a best-effort basis.
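Put differently, each locale's search.language value decides which stemmer, if any, gets loaded. A hypothetical lookup table capturing the examples above; the locale codes (in particular "sh" for Serbo-Croatian) are assumptions, and `None` stands for an unset value, meaning no stemmer:

```python
from typing import Dict, Optional

# Hypothetical per-locale "search.language" values. Locale codes
# (especially "sh") are assumptions; None means "unset", i.e. no stemmer.
SEARCH_LANGUAGE: Dict[str, Optional[str]] = {
    "en": "en",
    "zh": "jp",  # Chinese borrows the Japanese stemmer
    "ko": "jp",  # Korean borrows the Japanese stemmer
    "sh": "ro",  # Serbo-Croatian borrows the Romanian stemmer
    "he": None,  # Hebrew: no stemmer at all
}


def stemmer_for(locale: str) -> Optional[str]:
    """Return the stemmer language to load, or None to skip stemming.

    A locale that does not set the key inherits the English value,
    mirroring how localizations extend en.html.
    """
    return SEARCH_LANGUAGE.get(locale, SEARCH_LANGUAGE["en"])


print(stemmer_for("ko"))  # jp
print(stemmer_for("he"))  # None
print(stemmer_for("xx"))  # en (hypothetical locale that does not set the key)
```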

So, to answer your question of how to disable stemming and stopword filtering: you just need to override partials/language/en.html and unset the three placeholders:

  • search.language: if this is set, the respective lunr.<language>.js file containing the stemmer for the language is loaded and initialized. Otherwise, no stemmer is loaded.
  • search.pipeline.stopwords: if this is set to false, the _stopword filter_ will be removed from the pipeline.
  • search.pipeline.trimmer: if this is set to false, the _trimmer_ will be removed from the pipeline.
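How the three placeholders shape the pipeline can be sketched as follows. This is a simplified Python model, not Lunr.js's API: the trimmer, stopword list, and stemmer are toy stand-ins, and any falsy value (false, empty, or missing) is treated as "unset", which removes the corresponding step.

```python
from typing import Callable, Dict, List, Optional

# Simplified stand-ins; none of these are Lunr.js functions.
STOPWORDS = {"which", "where", "when", "the"}  # tiny illustrative subset

Step = Callable[[str], Optional[str]]


def trimmer(token: str) -> str:
    # Strip surrounding punctuation (rough approximation of a trimmer).
    return token.strip(".,;:!?\"'()[]")


def stopword_filter(token: str) -> Optional[str]:
    # Returning None removes the token from further processing.
    return None if token in STOPWORDS else token


def toy_stemmer(token: str) -> str:
    # Placeholder for lunr.<language>.js; real stemmers are far more involved.
    return token[:-1] if token.endswith("s") else token


def build_pipeline(settings: Dict[str, object]) -> List[Step]:
    """Add each step only if its placeholder is set to a truthy value."""
    steps: List[Step] = []
    if settings.get("search.pipeline.trimmer"):
        steps.append(trimmer)
    if settings.get("search.pipeline.stopwords"):
        steps.append(stopword_filter)
    if settings.get("search.language"):
        steps.append(toy_stemmer)
    return steps


def run(pipeline: List[Step], tokens: List[str]) -> List[str]:
    """Push each token through every step; drop it if a step empties it."""
    out = []
    for tok in tokens:
        for step in pipeline:
            tok = step(tok)
            if not tok:
                break  # token was filtered out
        else:
            out.append(tok)
    return out


DEFAULT = {
    "search.language": "en",
    "search.pipeline.stopwords": True,
    "search.pipeline.trimmer": True,
}
NO_STOPWORDS = {**DEFAULT, "search.pipeline.stopwords": False}

print(run(build_pipeline(DEFAULT), ["which", "dockets?"]))       # ['docket']
print(run(build_pipeline(NO_STOPWORDS), ["which", "dockets?"]))  # ['which', 'docket']
```

With the default settings, "which" never makes it through; setting only search.pipeline.stopwords to false lets it pass while the (toy) stemmer stays active.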

Why is this not possible via mkdocs.yml? Because, up to now, nobody has needed it. In theory, we could make it configurable by adjusting partials/language.html, which is the entry point for localization, but I don't think that will be necessary. I want to keep configuration as lean as possible.

I hope that this shows that a lot of thought was put into how search works and how it can be localized and scaled to so many languages without much effort.

This does indeed show a lot of thought, and setting "search.pipeline.stopwords": false does fix the issue.

Thanks again for your work on, and continued attention to, this project.
