Minimal-mistakes: Search for 'search' in Docs - no results

Created on 3 Jan 2018 · 17 Comments · Source: mmistakes/minimal-mistakes

Went to learn about new search feature and how to use/enable.

Won't Fix Bug Support

All 17 comments

:man_shrugging: Not sure, possible that keyword is omitted from the Lunr.js index for some reason.

https://mmistakes.github.io/minimal-mistakes/docs/configuration/#site-search

The search feature is wonderful.

I was wondering if perhaps there were doc(s)/page(s) that were not being indexed/included. So this is because the term 'search' may be filtered by default by Lunr.js.

That's my guess at least. I believe there are "stop" words and this is likely one of them. Sure there is a way to override that but I haven't really dug into Lunr's docs too much.

@nickgarlis, who added the search functionality might know more about it.

I can confirm that I have "search" in one of my articles and when using Lunr it does not return any results. What about "Sear"?

The other thing about search is the index being built isn't exhaustive and doesn't contain the entire document's content. If it did the .js file would be pretty big and kill page load performance.

For reference, this is what the index builds for each document: title, excerpt (limited to the first 20 words of the body), categories, and tags. So if the keyword doesn't appear in the title, in a category/tag, or in the first 20 words of the content, it's not going to show up in the index. Which is what I believe is going on with "search".

title: {{ doc.title | jsonify }},
excerpt: {{ doc.content | strip_html | truncatewords: 20 | jsonify }},
categories: {{ doc.categories | jsonify }},
tags: {{ doc.tags | jsonify }},
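To illustrate why a keyword can go missing, here is a minimal sketch that roughly emulates the `strip_html | truncatewords: 20` step in plain JavaScript. The helper names (`stripHtml`, `truncateWords`, `buildIndexedText`) and the sample document are made up for illustration, and the real Liquid filters handle more edge cases than this.

```javascript
// Rough emulation of Liquid's `strip_html` filter (assumption: simple tags only).
function stripHtml(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Rough emulation of Liquid's `truncatewords` filter.
function truncateWords(text, limit = 20) {
  const words = text.trim().split(/\s+/);
  if (words.length <= limit) return words.join(" ");
  return words.slice(0, limit).join(" ") + "...";
}

// Collect the text that would end up in the index for one document:
// title, truncated excerpt, categories, and tags.
function buildIndexedText(doc) {
  const excerpt = truncateWords(stripHtml(doc.content), 20);
  return [doc.title, excerpt, ...doc.categories, ...doc.tags]
    .join(" ")
    .toLowerCase();
}

// Hypothetical document: the word "search" only appears after 25 other words.
const sampleDoc = {
  title: "Configuration",
  content: "<p>" + "word ".repeat(25) + "search</p>",
  categories: [],
  tags: []
};

const indexedText = buildIndexedText(sampleDoc);
console.log(indexedText.includes("search")); // false
```

Because "search" sits past the 20-word cutoff and is not in the title, categories, or tags, it never reaches the indexed text in this sketch.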

So interesting. Thanks for the reference.

Is the index definition tunable?

i.e. For the Docs site/section, indexing/identifying based on, e.g. Heading 1, 2, and 3.

@jaybe-jekyll No, it's not that sophisticated. It's simply taking {{ content }}, which is everything you put in the body of your Markdown file (below the YAML front matter), stripping it of all HTML, then truncating it to 20 words.

Depending on how you're using the theme you could create a duplicate of /assets/js/lunr-en.js and put it in your repo. Then modify this line excerpt: {{ doc.content | strip_html | truncatewords: 20 | jsonify }} to include more words. If your site isn't that large that's probably safe.

@mmistakes I am starting to think that using truncatewords while indexing content might not be that necessary. The index JSON is pre-built since we use Liquid to loop inside of our content and it's only about 28 kB. That itself is a huge performance gain. We could maybe get rid of truncatewords: 20 or use a larger number?

@nickgarlis Using the content in /test I get a lunr-en.js 46 kB file with truncatewords: 20. Removing the truncate it jumps up to 181 kB.
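One way to get a feel for this tradeoff before changing the theme is to measure the serialized excerpt size at different word limits. The sketch below is only an approximation under made-up assumptions: `excerptBytes` is a hypothetical helper, and the corpus is synthetic filler text, not the /test content.

```javascript
// Hypothetical helper: size in bytes of the JSON-serialized excerpts
// when each document body is cut to `wordLimit` words (null = no limit).
function excerptBytes(docs, wordLimit) {
  const excerpts = docs.map((doc) => {
    const words = doc.content.trim().split(/\s+/);
    const kept = wordLimit === null ? words : words.slice(0, wordLimit);
    return kept.join(" ");
  });
  return new TextEncoder().encode(JSON.stringify(excerpts)).length;
}

// Made-up corpus: 50 documents of 400 filler words each.
const corpus = Array.from({ length: 50 }, () => ({
  content: "lorem ".repeat(400)
}));

const truncatedSize = excerptBytes(corpus, 20);
const fullSize = excerptBytes(corpus, null);
console.log({ truncatedSize, fullSize });
```

Running something like this against a site's real post bodies would show how quickly the payload grows as the limit is raised.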

So I think we need to be careful here. I haven't run it through the /docs files, but those are pretty big and include way more posts. I could see the Lunr index quickly approaching 1 MB, which is a lot of extra JavaScript load to put on the pages.

And I know there are some users who have hundreds if not thousands of posts. That search index would be mammoth.

I'm fine with providing a basic search experience with Lunr. Then with #1416 start adding other search providers. Algolia is really nice and offloads these potentially large indexes. The tradeoff is you have to jump through some hoops to make it GitHub Pages compatible since it requires a plugin to push up the index, not to mention their free tier caps out at 10k records.

@mmistakes You are right, those files can get pretty big easily. I guess it depends on the user.
I don't know much about Algolia, though.

@nickgarlis So here's a question. I've been looking at some other implementations of Lunr.js with Jekyll and came across this one from CloudCannon.

They have things set up slightly differently, but I did notice that they aren't using Liquid to spit out JSON for both this.add and var store like you have. Which appears to bloat the .js file since it's spitting out a title, excerpt, category array, tag array, and id twice for every document.

In the CloudCannon example they only do that for store and pull the Lunr data from there.

Seems like this would help cut down the size of the index and open it up for increasing the amount of words captured from {{ content }}. They also throw a strip_newlines filter on {{ content }} which may or may not help shrink things down.

What are your thoughts?

Here's an example from my test lunr-en.js

First this.add item

   this.add({
          title: "Lhasa Apso",
          excerpt: "The Lhasa Apso (/ˈlɑːsə ˈæpsoʊ/ lah-sə ap-soh) is a non-sporting dog breed originating in Tibet. It was bred as an...",
          categories: [],
          tags: [],
          id: 0
      })

Which then appears in store

{
        "title": "Lhasa Apso",
        "url": "/test/pets/lhasa-apso/",
        "excerpt": "The Lhasa Apso (/ˈlɑːsə ˈæpsoʊ/ lah-sə ap-soh) is a non-sporting dog breed originating in Tibet. It was bred as an...",
        "teaser": null
      }

There are obviously some different fields, but excerpt is the same in both, and that is likely where the meat of the file bloat is going to come from.

@mmistakes I really like their implementation; however, I am not sure if there is going to be a performance gain this way. Don't you think it's better to have large JSON data that's generated with Liquid than having to loop inside of a huge JSON file using JavaScript (meaning the loop would take place upon loading the page) just to index the pages? Could this slow things down?

@nickgarlis I'm questioning more the DRYness of the current solution. The fact that there are 2 for loops generating similar JSON data in the same file seems wasteful to me. Especially since almost half of the JavaScript file's size could be saved if there was only a single source of JSON data.

I'm wondering if it can be done with a single for loop with all the post data (title, excerpt, tags, categories, etc.) instead of duplicating most of that twice. If there was a build process attached to this that minified the file and gzip'd it, maybe some of the duplicated text would be less of an issue. But that's not the case right now, unless you use a tool outside of Jekyll to do that after the fact.
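A minimal sketch of the single-source idea, assuming the Liquid loop emits one `store` array and the index is filled from it at load time. `addStoreToIndex` and `fakeBuilder` are hypothetical names, the second sample document is invented, and this is not necessarily how the theme or CloudCannon implements it.

```javascript
// Single source of data: in the real file this array would be emitted
// once by a Liquid for loop. The entries here are sample values.
const store = [
  {
    title: "Lhasa Apso",
    url: "/test/pets/lhasa-apso/",
    excerpt: "The Lhasa Apso is a non-sporting dog breed originating in Tibet.",
    categories: [],
    tags: []
  },
  {
    title: "Hypothetical Post",
    url: "/hypothetical-post/",
    excerpt: "An invented second entry for illustration.",
    categories: ["example"],
    tags: ["demo"]
  }
];

// Feed every stored document to anything that has an add() method.
// The array position doubles as the document id.
function addStoreToIndex(builder, docs) {
  docs.forEach((doc, id) => {
    builder.add({
      id: id,
      title: doc.title,
      excerpt: doc.excerpt,
      categories: doc.categories,
      tags: doc.tags
    });
  });
}

// Stand-in builder so the sketch runs without Lunr loaded.
const fakeBuilder = {
  added: [],
  add(doc) {
    this.added.push(doc);
  }
};

addStoreToIndex(fakeBuilder, store);
console.log(fakeBuilder.added.length); // 2

// Assumed usage with Lunr 2.x (not executed here):
// var idx = lunr(function () {
//   this.ref("id");
//   this.field("title");
//   this.field("excerpt");
//   addStoreToIndex(this, store);
// });
```

The post data is written to the file only once, at the cost of a short loop in the browser when the index is built.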

Not exactly the same thing but one of my older themes does something similar with a less sophisticated search script. Single JSON file with all the post data, that's queried.

https://mmistakes.github.io/so-simple-theme/search/

@mmistakes Yes, you're right, it's way better this way; you can test it out here. If you don't have anything else to add I can open a pull request. We could maybe make another script file like they've done in CloudCannon to store our data, which could then be used by both lunr-en.js and lunr-gr.js?

@nickgarlis I like it, great work!

Yeah think it makes sense to create a separate data store file that lunr-lang.js uses. That way it's clear what each file is doing. Do what you think makes sense. As far as naming it I don't have any special requirements. Maybe keep it prefixed with lunr- so it's clear it goes with the others as there will potentially be other search provider scripts.

Might even make sense to break up the Lunr stuff into its own folder in /assets/js/lunr/.
