Nim: allow github-like search syntax in documentation browser

Created on 18 Oct 2018 · 17 comments · Source: nim-lang/Nim

TLDR

Support GitHub-like syntax in the documentation search box, for example:

# uses a limit of 30 search results; uses exact instead of fuzzy search
exact: "my_exact_search_term" limit:30

details

(originally proposed here: https://github.com/nim-lang/Nim/issues/9406#issuecomment-431126906)

I really like @rayman22201's suggested exact: "my_exact_search_term" syntax, and we should allow future extensions in a way similar to GitHub issue search (see https://help.github.com/articles/searching-issues-and-pull-requests/).

It would solve a number of issues where there's no way to satisfy all needs (too many / too few search results, too fuzzy / too strict, etc.). Instead of guessing what the user wants, let the user specify it, e.g.:

[docgen] search box shows irrelevant symbols: comandlines => colors: colMaroon #9406
[docgen] The search box doesn't find certain symbols, eg: fmt, multisync #9198
It is not easy to find generic procs like system.find() - Nim forum (arbitrary hard limit of 19 search results)

The simple search box can expose all user options as plain text (as opposed to lots of UI buttons/checkboxes, etc.), the same as Google search or GitHub issue search. It's trivially extensible in case we need more options down the line, e.g.:

foobar # uses sane defaults, could be same as current search algorithm parameters
exact: "my_exact_search_term" limit:30 # uses a limit of 30 search results
regex: "color\w+"
fuzzy: kolor
in:doccomment "alias for"
in:params typedesc
foobar sort:alpha

Note: using the search query box to specify search options is better than a complex UI with buttons/checkboxes: it's easier to copy-paste / share and easier to maintain.
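A minimal sketch of how a query string like the examples above could be split into plain terms and `key:value` qualifiers, written in Python for illustration only (the actual search lives in dochack, which is Nim compiled to JS). The qualifier names come from the examples in this issue; `parse_query` and `KNOWN_QUALIFIERS` are hypothetical names. Unknown `key:value` tokens are kept as ordinary terms, so a plain search never breaks.

```python
import shlex

# Qualifier names taken from the examples in this issue (illustrative, not final).
KNOWN_QUALIFIERS = {"exact", "regex", "fuzzy", "in", "limit", "sort"}


def parse_query(query):
    """Split a search-box string into plain terms and key:value qualifiers."""
    try:
        # POSIX shell-style splitting keeps "quoted phrases" together.
        tokens = shlex.split(query)
    except ValueError:
        # Unbalanced quote: fall back to plain whitespace splitting.
        tokens = query.split()

    terms, options = [], {}
    i = 0
    while i < len(tokens):
        key, sep, value = tokens[i].partition(":")
        if sep and key in KNOWN_QUALIFIERS:
            if value == "" and i + 1 < len(tokens):
                # `exact: "term"` form: the value is the next token.
                value = tokens[i + 1]
                i += 1
            options[key] = value  # a repeated qualifier overwrites the earlier one
        else:
            terms.append(tokens[i])  # unknown prefixes stay ordinary search terms
        i += 1
    return {"terms": terms, "options": options}


print(parse_query('exact: "my_exact_search_term" limit:30'))
# {'terms': [], 'options': {'exact': 'my_exact_search_term', 'limit': '30'}}
print(parse_query('in:doccomment "alias for"'))
# {'terms': ['alias for'], 'options': {'in': 'doccomment'}}
```

Values stay strings here; converting `limit` to an integer and filling in defaults would be a separate step.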

All 17 comments

#9198 is a legitimate bug that I need to look into with the fuzzy matcher.

I'm not completely opposed to this idea, though I would like to get some other opinions / votes, because it adds a lot of complexity to the search.

I do have some thoughts and questions about it:

What would the default search mode be? A new user is not going to use this syntax instinctively. Most people will just put a term in the search box.

My original goal for the fuzzy search was to make the search more friendly for new users, not more complex.

There needs to be very clear and obvious documentation for this search syntax, and it needs to live in an easy-to-find place.
Some of these modes would be very complex to implement / require a big change to the search mechanism. Your "in:params" or "in:doccomment", for example, would require a ctags-like index built from the source code, or tapping into the AST in a similar way to nimsuggest. This is a hard problem.
Also, I think it's a non-goal to fully replicate GitHub search. This is a doc search, not a source code search. There are other / better tools for that.
The search mechanism currently uses the index that is generated from the docs as its database. It's not a full-text search, or a source code search, or anything that fancy.
It's actually fairly simple, and I think that is by design. I don't think we need to go overboard with search modes.
I think this idea is orthogonal to having a separate page that just shows search results. That is a standard practice that many websites use, including GitHub. It's familiar to users and reduces clutter on the actual doc pages. It's also much less complicated than having to remember to type "limit: 30" all the time.
Speaking of the "limit: 30" syntax, if you don't specify a limit, what is the default? If you forget one of these parameters, sensible defaults need to be in place. That's another reason I think a separate search results page is a better solution to this particular problem of limited search results.

rayman22201 on 19 Oct 2018

👍4

I didn't mean to overengineer this; I gave example features we could envision adding down the line, in a way that's relatively familiar (Google search box, GitHub issue search), extensible, and doesn't involve changing the UI, unlike what was suggested here, which involved adding checkboxes.

What would the default search mode be?
Speaking of the "limit: 30" syntax, if you don't specify a limit, what is the default

The same as what we're using currently, i.e., defaults that are as sane as possible.

timotheecour on 19 Oct 2018
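To make "sane defaults" concrete, here is a sketch of resolving whatever qualifiers the user typed against a table of defaults, assuming the qualifiers arrive as a dict of strings. The default limit of 19 mirrors the current hard limit mentioned in this issue, and fuzzy as the default mode mirrors the current matcher; the `mode`/`sort` keys, the `"score"` default, and `resolve_options` itself are assumptions for illustration.

```python
# Hypothetical defaults. limit=19 mirrors the current hard limit mentioned in this
# issue; "fuzzy" mirrors the current matcher; sort="score" is an assumption.
DEFAULTS = {"mode": "fuzzy", "limit": 19, "sort": "score"}


def resolve_options(options):
    """Fill in defaults for any qualifier the user did not type."""
    resolved = dict(DEFAULTS)
    # If several mode qualifiers are given, the last one in this tuple wins
    # (an arbitrary tie-break for the sketch).
    for mode in ("fuzzy", "regex", "exact"):
        if mode in options:
            resolved["mode"] = mode
    if "limit" in options:
        try:
            resolved["limit"] = max(1, int(options["limit"]))
        except ValueError:
            pass  # malformed limit such as "limit:abc": keep the default
    if "sort" in options:
        resolved["sort"] = options["sort"]
    return resolved


print(resolve_options({}))
# {'mode': 'fuzzy', 'limit': 19, 'sort': 'score'}
print(resolve_options({"exact": "my_exact_search_term", "limit": "30"}))
# {'mode': 'exact', 'limit': 30, 'sort': 'score'}
```

With this shape, a bare `foobar` query behaves exactly like today's search, which addresses the "what if you forget a parameter" concern.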

I didn't mean to overengineer this

Fair enough. Sorry if I came off aggressive. I just want to make sure the scope is well specified. This is the kind of feature that is an easy target for feature creep and yak shaving.

rayman22201 on 19 Oct 2018

I wonder whether it's realistic to search docs using JS only. Shouldn't we be moving to a separate search project à la Hoogle?

Big advantage of this would be that you could just make it work exactly as you wish and we wouldn't need this discussion. (Plus, better performance)

dom96 on 19 Oct 2018

Well, one could use wasm for performance, but yes, it'd still have limitations; in any case, I'm all for a search backed by a server instead of just JavaScript.
Additional features would be:

search across nimble packages (it's sometimes unclear in which package a symbol is defined, especially with nested dependencies)
render source code online with clickable links so you can "jump to definition"
option to serve it locally (hence, user can customize if needed)

regarding performance:
if the server is hosted (not local), it's not clear it would be faster because of network issues
if the server is local, then for sure, but we'd need an easy setup (à la nimble install noogle)

timotheecour on 19 Oct 2018

@dom96 Moving to a third party search engine is fine for Nim itself and the stdlib, but does nothing to help doc generation in the general case, or for private nim code that needs doc generation.

Are you suggesting that the search feature be removed from the generic case of doc generation completely, and let it be the user's problem?

Or are you suggesting implementing a server component to nim doc generation as @timotheecour seems to be implying? Which adds complexity in a different way.

I know pydoc and sphinx for python all provide search features for doc generation. What do other languages do?

The performance isn't really an issue imo. The current code searches the Nim index itself very quickly, and that's a pretty big project for Nim. It obviously will hit limits eventually, but for even the medium term future I don't see that as a huge issue.

rayman22201 on 19 Oct 2018

👍1

I think we should have a global documentation search that is not limited to the standard library, like this one for Go: https://godoc.org/

krux02 on 19 Oct 2018

Ideally the suggested docsearch server and nimsuggest (or whatever tool solves https://github.com/nim-lang/Nim/issues/8747) would share some code, so that we don't duplicate core functionality.

timotheecour on 19 Oct 2018

I think we should have a global documentation search that is not limited to the standard library, like this one for Go: https://godoc.org/

But what about private codebases?

Either way, this sounds waaaay out of scope for this issue.

If a docsearch server is something people want, it should be a separate feature request.
Integrating it into nimsuggest is a very good idea for this. But that belongs in a nimsuggest feature request, or even a new project entirely, not here.

Honestly my main concern is making the current solution work well enough for now, which aside from a few bugs seems to be the case, unless people tell me otherwise.

My intention with the fuzzy match was never to re-architect the search infrastructure.
It really wasn't a big change! Essentially all I did was replace the simple regex search with a slightly fancier function. Dochack.js is 99% the same as it has been since Araq first wrote it.

rayman22201 on 19 Oct 2018
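For readers unfamiliar with fuzzy matching, a generic subsequence-based scorer looks roughly like the sketch below. It is not the actual dochack.js algorithm, only an illustration of the technique: every query character must appear in order in the symbol name, with bonuses for consecutive matches and for matches at word or camelCase boundaries. The function names, bonus weights, and default limit of 19 are assumptions.

```python
def fuzzy_score(pattern, candidate):
    """Score candidate if all pattern chars occur in it in order (case-insensitive), else None.

    Assumes ASCII identifiers, so lowercasing does not change string lengths.
    """
    p, c = pattern.lower(), candidate.lower()
    score, start, prev = 0, 0, -2
    for ch in p:
        idx = c.find(ch, start)  # greedy: leftmost occurrence after the previous match
        if idx == -1:
            return None
        score += 1
        if idx == prev + 1:
            score += 2  # bonus for consecutive characters
        at_word_start = idx == 0 or not candidate[idx - 1].isalnum()
        at_camel_hump = idx > 0 and candidate[idx].isupper() and candidate[idx - 1].islower()
        if at_word_start or at_camel_hump:
            score += 3  # bonus for matching at a word or camelCase boundary
        prev, start = idx, idx + 1
    return score


def fuzzy_search(pattern, symbols, limit=19):
    """Return up to `limit` matching symbols, best score first, ties alphabetical."""
    hits = []
    for sym in symbols:
        s = fuzzy_score(pattern, sym)
        if s is not None:
            hits.append((s, sym))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [sym for _, sym in hits[:limit]]


print(fuzzy_search("fmt", ["fmt", "format", "colMaroon"]))  # ['fmt', 'format']
```

Greedy leftmost matching keeps this linear in the symbol length but can miss a better-scoring alignment; real fuzzy matchers often use dynamic programming to find the best one.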

Are you suggesting that the search feature be removed from the generic case of doc generation completely, and let it be the user's problem?

Or are you suggesting implementing a server component to nim doc generation as @timotheecour seems to be implying? Which adds complexity in a different way.

No no no no. I'm suggesting @timotheecour (or someone else who's passionate about this) create a separate project that acts as a search engine for Nim's stdlib + nimble packages.

It doesn't have to mean that we remove client-side doc search and it sure as hell doesn't mean adding a server component into nim doc (ever heard of separation of concerns?)

Ideally the suggested docsearch server and nimsuggest (or whatever tool solves #8747) would share some code, so that we don't duplicate core functionality.

~~Please no. Just create a separate clean project for this.~~ Maybe I'm imagining the worst case scenario here. I suppose a package that handles Nim code search that's usable by nimsuggest and any other project would be fine.

dom96 on 20 Oct 2018

@dom96
I agree with your points; one point though:

a package that handles Nim code search that's usable by nimsuggest

Can nimsuggest depend on an external nimble package, though?
Or should it be the other way around? The only thing that's needed is exposing the nimsuggest querying as an API;
I wonder whether depending on compilerapi is enough for external packages for that purpose?

timotheecour on 20 Oct 2018

That's one reason why putting everything in the Nim repo sucks. Eventually
we will want tools like nimsuggest to depend on nimble packages (unless
@Araq has some ideas on how to handle this)

Putting this functionality in nimsuggest and then having other packages
depend on it sucks too.


dom96 on 20 Oct 2018

here's what @rayman22201 and I discussed on IRC (https://gitter.im/nim-lang/Nim?at=5bca3e713844923661674718)

make the code that exports index.html also export indexdata.json (we can integrate the two later so that index.html reads indexdata.json)
this is 90% about fixing this pre-existing issue: [search] --json2 doesn' work with ./koch web (needed to get theindex.json for tooling) · Issue #8495 · nim-lang/Nim
add a command-line tool wrapping the docsearch utility (it could simply be an isMainModule block somewhere in tools/dochack/) as follows:

./docsearch --index:indexdata.json --query:foobar
./docsearch --index:indexdata.json --queryFile:queries.json
./docsearch --index:indexdata.json --queryFile:queries.json --evaluate:true

queries.json optionally contains ground truth for each query, allowing you to evaluate precision/recall (or your favorite metric from https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval))

tweak params and experiment with better fuzzy search; we can now get an objective score for these automatically instead of "guesstimating" how well it works with a few queries in the web UI

timotheecour on 20 Oct 2018
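A sketch of the `--evaluate` step from the plan above, assuming a hypothetical queries.json layout of `[{"query": ..., "expected": [...]}, ...]` where `expected` (the ground truth) is optional per query. It computes set-based precision and recall for each query and averages them. The file layout and the names `precision_recall`, `evaluate`, and `load_queries` are assumptions; a real tool might prefer ranked metrics from the linked article.

```python
import json


def precision_recall(returned, expected):
    """Set-based precision and recall for one query."""
    returned_set, expected_set = set(returned), set(expected)
    hits = len(returned_set & expected_set)
    precision = hits / len(returned_set) if returned_set else 0.0
    # Convention for the sketch: with no relevant items, recall is trivially 1.0.
    recall = hits / len(expected_set) if expected_set else 1.0
    return precision, recall


def evaluate(search, queries):
    """Average precision/recall over queries that carry ground truth ("expected")."""
    precisions, recalls = [], []
    for q in queries:
        if "expected" not in q:
            continue  # ground truth is optional per query
        p, r = precision_recall(search(q["query"]), q["expected"])
        precisions.append(p)
        recalls.append(r)
    if not precisions:
        return None
    n = len(precisions)
    return sum(precisions) / n, sum(recalls) / n


def load_queries(path):
    """Read a hypothetical queries.json: [{"query": ..., "expected": [...]}, ...]."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage with an in-memory index and a trivial substring search standing in for docsearch.
index = ["fmt", "format", "find", "colMaroon"]
queries = [
    {"query": "f", "expected": ["fmt", "format", "find"]},
    {"query": "fmt", "expected": ["fmt", "format"]},
    {"query": "zzz"},  # no ground truth: skipped by evaluate()
]
print(evaluate(lambda q: [s for s in index if q in s], queries))  # (1.0, 0.75)
```

Swapping the lambda for different fuzzy-matcher settings and comparing the two numbers is the "objective score" idea in miniature.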

@dom96 It doesn't have to mean that we remove client-side doc search and it sure as hell doesn't mean adding a server component into nim doc (ever heard of separation of concerns?)

@dom96 I completely agree with all the points about separation of concerns. I'm not sure if this can be so cleanly separated though.

@timotheecour : I wonder whether depending on compilerapi is enough for external packages for that purpose?

This is an important question. What is the right separation?

Is it:
compilerapi -> nimsuggest
compilerapi -> docsearch

or is it:
compilerapi -> nimsuggest -> docsearch

edit:
-> means API boundary / module dependency,
so a -> b means b depends on a

And of course the related question of whether those tools live inside the main Nim repo or are separate nimble packages are valid questions also. I think that comes down to how @Araq wants to organize the project at the end of the day.

A tangential but related question: how would the generated doc html interact / be affected by this?
It's still not clear to me.

Would you choose which search you want when building the docs? "docsearch server" or "client side search"?

If docsearch is a separate thing, then what is the interface? What does this look like?

If docsearch only works via command line or as an api server, then it just sounds like a new mode for nimsuggest akin to LSP.
If docsearch generates its own HTML, then it's essentially a competing doc generation tool. That sounds like unnecessary and confusing fragmentation.
Is there a third option I'm not thinking of? What are everyone's thoughts on this?

rayman22201 on 20 Oct 2018

[just answering on 1 aspect here]
in an ideal world, an external project should be able to do whatever's currently built into Nim (e.g. nim doc, nimsuggest, nimfix), possibly via compilerapi. I.e., Nim would have no "superpowers" or special privileges compared to what's possible in library code. This would allow customization without having to hack the Nim compiler or maintain a separate fork.
Whether this is possible or how hard it'd be to allow that, I don't know.

timotheecour on 20 Oct 2018

👍1

@timotheecour can you put an example at the top showing how you would like the search syntax to look? I would like the issue to start with a summary and an example. Links to external discussions are nice information, but they are very verbose and don't get to the point.

krux02 on 20 Oct 2018

added a TLDR

timotheecour on 21 Oct 2018
