Plots2: Brainstorming search results that are autosuggested and shown on results page

Created on 28 Feb 2018 · 15 Comments · Source: publiclab/plots2

Update: this is a long conversation and there are some next steps being broken out. Please continue to use this issue for brainstorming! Thanks :)

Original issue continues below:

Please describe the problem

The system that chooses and ranks autosuggested results is mysterious, and seems like a black box.

Autosuggested results are limited to 15 items of assorted content types, but do not provide an overview of Public Lab resources on a topic.

What did you expect to see that you didn't?

I expect to understand what the results mean.

Please show us where to look

The Search box in the menu bar

break-me-up enhancement more-detail-please

Source

ebarry

Most helpful comment

So I left some maybe not super helpful comments on https://github.com/publiclab/plots2/pull/3286 -- and just pulling it back here, I want to highlight that one of the questions we try to answer may need to be:

What is the best default sorting AND default search type for /each result type/ -- acknowledging that the best ordering for nodes might not make sense for profiles.

Make sense?
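
One way to picture "a default per result type" is a small lookup table. This is only a sketch: the type names come from this thread, but `DEFAULTS_BY_TYPE`, `FALLBACK`, `search_options`, and every value in the table are hypothetical placeholders, not anything that exists in plots2.

```ruby
# Hypothetical defaults: the values are placeholders to be decided, not real settings.
DEFAULTS_BY_TYPE = {
  'note'    => { order: 'natural', search: 'fulltext' },
  'wiki'    => { order: 'views',   search: 'fulltext' },
  'profile' => { order: 'recent',  search: 'username' }
}.freeze

# Used for any result type that has no entry of its own.
FALLBACK = { order: 'natural', search: 'fulltext' }.freeze

# Look up the defaults for a result type, letting the caller override them.
def search_options(type, overrides = {})
  DEFAULTS_BY_TYPE.fetch(type, FALLBACK).merge(overrides)
end

puts search_options('profile').inspect
puts search_options('note', { order: 'likes' }).inspect
```

Keeping the defaults in one table would make it easy to change the ordering for profiles without touching the ordering for nodes.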

jywarren on 6 Sep 2018

🎉1 👍1

All 15 comments

It is actually a black box! Full text search is a complex problem and we solve it with the "fulltext" module of MySQL, our database system; some pretty arcane (but thorough) documentation is here: https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html

It does seem we can tune/adjust it, though. There is, for example, a "natural language" option which attempts to algorithmically determine "relevance" -- https://dev.mysql.com/doc/refman/5.7/en/fulltext-natural-language.html

We use this fulltext feature on this line:

https://github.com/publiclab/plots2/blob/47b30044aa26027f7eb083bce56dd607a5f73d02/app/models/node.rb#L29

It does look like we could "turn on" natural language mode by making that say:

    Revision.where('MATCH(node_revisions.body, node_revisions.title) AGAINST(? IN NATURAL LANGUAGE MODE)', query)

We may also need to then add ordering by relevance -- so, I /think/ that would be:

    # Quote the query before interpolating it into the SELECT clause, then sort by the computed score.
    Revision.select("node_revisions.body, node_revisions.title, MATCH(node_revisions.body, node_revisions.title) AGAINST(#{Revision.connection.quote(query.to_s)} IN NATURAL LANGUAGE MODE) AS score")
      .where('MATCH(node_revisions.body, node_revisions.title) AGAINST(? IN NATURAL LANGUAGE MODE)', query)
      .order('score DESC')

It might take some testing out.

Would you like to try this out? I have to point out that I do NOT know what will happen. The documentation for "natural language" says:

Every correct word in the collection and in the query is weighted according to its significance in the collection or query. Thus, a word that is present in many documents has a lower weight, because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are combined to compute the relevance of the row. This technique works best with large collections.
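
To get a feel for the weighting idea in that quote, here is a rough in-memory sketch. It is not MySQL's actual formula, which is more involved; it only illustrates "rare words weigh more". The `docs` array is a made-up stand-in for rows of `node_revisions`, and all method names are hypothetical.

```ruby
# Simplified illustration only: MySQL's real relevance computation differs.

# Split text into lowercase word tokens.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Weight each word by rarity: a word found in fewer documents gets a larger weight.
def word_weights(docs)
  doc_count = docs.size.to_f
  doc_frequency = Hash.new(0)
  docs.each { |doc| tokenize(doc).uniq.each { |word| doc_frequency[word] += 1 } }
  doc_frequency.transform_values { |n| Math.log(doc_count / n) + 1.0 }
end

# Relevance of one document: for each query word, occurrences times the word's weight.
def relevance(doc, query, weights)
  counts = tokenize(doc).each_with_object(Hash.new(0)) { |word, acc| acc[word] += 1 }
  tokenize(query).uniq.sum { |word| counts[word] * weights.fetch(word, 0.0) }
end

docs = [
  'open hour on spectrometer calibration',
  'spectrometer build notes',
  'notes from the spectrometer workshop',
  'pumpkin carving notes'
]
weights = word_weights(docs)
ranked = docs.sort_by { |doc| -relevance(doc, 'spectrometer calibration', weights) }
puts ranked.first
```

Here "calibration" appears in one document and "spectrometer" in three, so a match on "calibration" counts for more.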

As to the second issue, --

...but do not provide an overview of Public Lab resources on a topic.

I expect to understand what the results mean.

How might we break this down a bit? Do you mean that you'd like to show a mix of types, or that you'd like to show explanatory information about what different types are?

Thanks!

jywarren on 28 Feb 2018

🎉1

I tested the above query and it does run, although again, I'm not super clear on how it works. But it'd be pretty easy to put it into production if you'd like!

jywarren on 28 Feb 2018

What I'd like to see in the auto-suggest is a list of search terms based on weight (popular, busy pages first). On the results page I would like to see keyword results weighted by relevance (popularity, whether the word in question is included in a tag or a title, etc), and then sorted by type (note, profile, question, comment, etc). I would then like to be able to search within the keyword results (say, I'm interested in spectrometers, but would like to narrow down my search to find examples of how they've been used in schools).

bronwen9 on 28 Feb 2018

Hi, Bronwen, thanks. Let's break this into separate features:

[ ] auto-suggest search ordered by popularity (is this # of views, or likes, or another preference?)
[x] results page ordered by relevance (popularity, whether the word in question is included in a tag or a title, etc)
[ ] results page displays each type (note, profile, question, comment, etc) separately -- like this, for example? https://publiclab.org/search/dynamic (that page doesn't work well yet)
[ ] ability to refine search within the keyword results (say, I'm interested in spectrometers, but would like to narrow down my search to find examples of how they've been used in schools) -- how would you specify this, do you think? Could you continue typing in the search input and see the results narrow more? Or is there another interface you'd like to suggest?
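
The last two items could look roughly like this on an in-memory result list. This is a sketch under assumptions: `Hit`, `group_by_type`, and `refine` are hypothetical names, and a real version would presumably run against the database rather than filter in Ruby.

```ruby
# Hypothetical shape for one search result.
Hit = Struct.new(:title, :type, :body)

# Split a flat result list into one list per type (note, profile, question, ...).
def group_by_type(hits)
  hits.group_by(&:type)
end

# Narrow an existing result list to hits that also mention a second term.
def refine(hits, extra_term)
  needle = extra_term.downcase
  hits.select { |hit| "#{hit.title} #{hit.body}".downcase.include?(needle) }
end

hits = [
  Hit.new('Spectrometer in the classroom', 'note', 'Used with students at two schools'),
  Hit.new('Spectrometer calibration', 'wiki', 'How to calibrate with a CFL bulb'),
  Hit.new('Which spectrometer for schools?', 'question', 'Looking for a kit')
]

grouped = group_by_type(hits)
narrowed = refine(hits, 'schools')
puts grouped.keys.inspect
puts narrowed.map(&:title).inspect
```

Refining on the already-fetched list matches the "keep typing and watch the results narrow" idea; a separate input field would call the same kind of filter.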

Thanks! This is super helpful.

jywarren on 28 Feb 2018

And for the second one up there, do you mean not "relevance" as is defined in my comment above about "natural language search" but a definition of popularity such as "likes" or "views"?

jywarren on 28 Feb 2018

I think we'd probably want to create a rubric for relevance that could include likes/views, but also weights results based on KIND of page (a wiki page with the search term in the title might always show up higher on a list than, say, a comment).
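
A rubric like that could be sketched as a single scoring function. Everything numeric below is a placeholder to be tuned, and `TYPE_WEIGHTS`, `TITLE_MATCH_BONUS`, `Result`, and `rubric_score` are hypothetical names used only for illustration.

```ruby
# Hypothetical weights per kind of page: placeholders, not tuned values.
TYPE_WEIGHTS = { 'wiki' => 3.0, 'note' => 2.0, 'question' => 2.0, 'comment' => 1.0 }.freeze
TITLE_MATCH_BONUS = 5.0

Result = Struct.new(:title, :type, :likes, :views)

# Combine page kind, title match, and popularity into one score.
def rubric_score(result, term)
  score = TYPE_WEIGHTS.fetch(result.type, 1.0)
  score += TITLE_MATCH_BONUS if result.title.downcase.include?(term.downcase)
  # Take logs so a heavily viewed page cannot drown out every other signal.
  score + Math.log(1 + result.likes) + Math.log(1 + result.views) / 10.0
end

results = [
  Result.new('Notes from a recent call', 'note', 12, 900),
  Result.new('Open Hour', 'wiki', 3, 400),
  Result.new('Re: open hour schedule', 'comment', 0, 10)
]
ranked = results.sort_by { |result| -rubric_score(result, 'open hour') }
ranked.each { |result| puts "#{result.type}: #{result.title}" }
```

With these placeholder numbers the wiki page with the term in its title comes out on top even though the note has more likes and views.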

One example where we're struggling with kinds of results is a search for "open hour". On our website, this search brings up 15 research notes in the auto suggest, and two research notes on the keyword search, but none of them direct to our Open Hour page. I do think a popularity ranking would help with this, and might be simpler than introducing a semantic search feature, but I can see either offering improvements.

When I perform the same search on Google (without boolean operators), I see a list of results that starts with our main open hour page, followed by items tagged with "openhour" and "open-hour", followed by links to pages for individual open hours. This would seem to be a sensible rubric for page-type sorting (provided that it's still possible to browse or narrow searches for all occurrences of a search term on our site).

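[image: openhour] https://user-images.githubusercontent.com/8331717/36850950-07ea6d18-1d36-11e8-8ed6-e80faf55bba4.gif

[image: openhour2] https://user-images.githubusercontent.com/8331717/36851397-1cfe5466-1d37-11e8-89bc-bc21bf98c4a7.gif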

bronwen9 on 1 Mar 2018

👍1

Cool - super helpful. I think there's probably a way to do a more complex ranking (maybe not Google-level PageRank, but something). However, I wonder if we could take a few proposals, make them testable, and examine the results. For example, it'd be pretty easy to set up views-based or likes-based ordering, and not much harder to do natural language relevance as I outlined above. If we made an option to view results for a given search query in all three, we could see which seems to work better for us.

If that sounds good, we can start those code changes and have something to look at in a week or so; what do you think of that as a next step? We could tackle this iteratively and look at more advanced search rubrics as a follow-up?

Thanks!!

jywarren on 1 Mar 2018

Ah, sorry for the late response, but I think that it would be great to try some of these. I think at some point we're going to need the ability to work with boolean operators (whether that's through additional search fields or allowing for more than one word or phrase in the field), but I think any of these options would help get us closer to understanding where things are going haywire in the existing search. Plus-one to trying all three!

bronwen9 on 7 Mar 2018

Work now ongoing in #2518 -- this will result in:

https://publiclab.org/search/pumpkins?order=natural for natural language match ordering
https://publiclab.org/search/pumpkins?order=likes for ordering by likes
https://publiclab.org/search/pumpkins?order=views for ordering by views
https://publiclab.org/search/pumpkins for ordering by original page creation dates

Soon!

(update: now live on the site!)
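
For anyone following along, the `order` parameter in those URLs can be thought of as a whitelist lookup. This is a plain-Ruby sketch, not the code from #2518: `Node`, `SORT_KEYS`, and `sort_results` are hypothetical, and "newest first" for the default is an assumption, since the direction isn't stated above.

```ruby
# Hypothetical in-memory stand-in for a search result row.
Node = Struct.new(:title, :created, :likes, :views, :score)

# Only these values of `order` are accepted; anything else falls back to creation date.
SORT_KEYS = {
  'natural' => :score,
  'likes'   => :likes,
  'views'   => :views
}.freeze

# Sort results, highest value first, by the field that `order_param` names.
def sort_results(nodes, order_param)
  key = SORT_KEYS.fetch(order_param.to_s, :created)
  nodes.sort_by { |node| node[key] }.reverse
end

nodes = [
  Node.new('Pumpkin spectrometry', 1_500_000_000, 2, 50, 0.9),
  Node.new('Pumpkin patch map', 1_510_000_000, 7, 20, 0.4),
  Node.new('Old pumpkin note', 1_400_000_000, 1, 300, 0.1)
]

%w[natural likes views].each do |order|
  puts "#{order}: #{sort_results(nodes, order).first.title}"
end
puts "default: #{sort_results(nodes, nil).first.title}"
```

Looking the parameter up in a fixed table means an unexpected value never reaches the sort itself.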

jywarren on 20 Mar 2018

🎉1

Hi, this needs some review and reorganization now that the above searches work -- @bronwen9 and @ebarry -- thanks for your help so far! Some additional steps might be:

[x] create a button or set of links to change the sorting on pages like https://publiclab.org/search/oil-spill
[ ] choose one of these as the default sorting for the typeahead auto-complete suggestions

Also just cleaning up the lead of this issue a bit or starting a new one with our next steps clearly laid out would be helpful! Thanks!

jywarren on 25 Mar 2018

🎉1 😄1

As the dynamic search work is upcoming (as per your original schedule), I'm not sure if this one is on your radar, @milaaraujo and @stefannibrasil -- what do you think?

jywarren on 21 Aug 2018

👍1

We have a few things to finish this week; we are planning to start working on improving the search next week!

stefannibrasil on 22 Aug 2018

@ebarry @bronwen9 @jywarren we started addressing some of your concerns in #3295. Please keep in mind that this PR is mostly on the front-end, but it will help with our planning! :)

stefannibrasil on 5 Sep 2018

🎉1 👍1

I have some notes to share with you, but I need to organize them better before sharing with you xD

stefannibrasil on 5 Sep 2018

🎉1 👍1
