Peertube: Improve search relevance

Created on 11 Oct 2018 · 12 comments · Source: Chocobozzz/PeerTube

For example, I want to find the video with this title: _JDLL2016 - Utilisation de Docker au dela du portable d'un codeur avec Kubernetes et Atomic - Michael Scherer_

From framatube.org if I search for:
_Utilisation de_
then I see the right video in the search results, which is good. But if I add another keyword, for example:
_Utilisation de atomic_
then the right video no longer shows up in the search results. All I did was add another keyword from the same title, so it should have narrowed down the results instead of dropping the video entirely.


All 12 comments

We have two different ways to search videos in the database:

  • Using a LIKE '%str%' pattern: this is why PeerTube finds the video in your first search
  • Using trigram similarity

In your second search, you leave out the words between Utilisation de and atomic, so the query is no longer a contiguous substring of the title and the LIKE '%str%' pattern does not return anything.

The similarity search does not return anything either, because the score is ~0.23 (it must be >= 0.3 to match).
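To make the two behaviours above concrete, here is a minimal Python sketch. It only approximates what pg_trgm does (lowercasing, splitting on non-alphanumerics, padding each word before cutting trigrams); the helper names `trigrams`, `similarity` and `like_match` are made up for illustration, and the substring check is assumed to be case-insensitive. It is not PeerTube's actual query code.

```python
import re

TITLE = ("JDLL2016 - Utilisation de Docker au dela du portable d'un codeur "
         "avec Kubernetes et Atomic - Michael Scherer")


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm trigram extraction (ASCII only, for brevity)."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def like_match(title: str, query: str) -> bool:
    """Stand-in for LIKE '%query%': a plain (case-insensitive) substring test."""
    return query.lower() in title.lower()


for query in ("Utilisation de", "Utilisation de atomic"):
    print(query, like_match(TITLE, query), round(similarity(TITLE, query), 2))
```

Every trigram of the second query is present in the title, but the title is long, so the shared trigrams are a small fraction of the total and the score stays under the 0.3 threshold.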

Maybe replacing the LIKE pattern with word_similarity (https://www.postgresql.org/docs/9.6/static/pgtrgm.html) would be better, but it is only available in PostgreSQL >= 9.6.

Usually people just search for keywords _(house, texas, buy)_ rather than complete sentences _(I wanna buy a house in texas)_, so I'd say use the similarity approach.

I don't think we'll get great search quality by relying on postgres to scan its tables with a word-similarity filter... I'd really prefer to interpret this issue as a request to integrate a proper fulltext search engine.

I think if you want the best search capabilities, we should make search run on Elasticsearch. We'd have to hook an ES connector up to the database and route all search queries through ES. Rather than trying to squeeze more accurate results out of the Postgres database, it's better to invest that time in implementing ES, IMHO.

Here is a nice one that connects ES to postgres for the best search functionality you will find. https://github.com/zombodb/zombodb

I don't think we'll get great search quality by relying on postgres to scan its tables with a word-similarity filter...

Why?

I'd really prefer to interpret this issue as a request to integrate a proper fulltext search engine.

PostgreSQL supports full text search, but I'm not sure I understand why it would be better than word_similarity on a trigram index.

I mean, we could perhaps eventually build an okay search based on postgres trigrams, but that’s a very low level primitive compared to what I would expect in a modern fulltext search.

For example, a search for “hotdog sandwiches” isn’t lexically especially close to “pork chop sandwiches” but on an index that includes semantic information like word synonyms they could be, and Lucene supports features like this that will help improve search quality.
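As a rough illustration of the synonym idea, here is a small Python sketch of query-time synonym expansion. The `SYNONYMS` table and the helper names are hypothetical and invented for this example; a real engine such as Lucene or Elasticsearch configures this through its analysis chain rather than through code like this.

```python
import re

# Hypothetical synonym table, invented for this example only.
SYNONYMS = {"hotdog": {"pork", "sausage"}}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand(terms: set[str], synonyms: dict[str, set[str]]) -> set[str]:
    """Add every known synonym of each query term to the term set."""
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


def shared_terms(query: str, document: str, synonyms=None) -> int:
    """Count terms shared by query and document, optionally after expansion."""
    query_terms = tokens(query)
    if synonyms:
        query_terms = expand(query_terms, synonyms)
    return len(query_terms & tokens(document))


print(shared_terms("hotdog sandwiches", "pork chop sandwiches"))
print(shared_terms("hotdog sandwiches", "pork chop sandwiches", SYNONYMS))
```

Without the table only "sandwiches" matches; with it, "pork" matches as well, which is the kind of signal a purely lexical trigram comparison cannot provide.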

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html

I still think the old title for this issue was more accurate :D
For instance, for some reason, if you search for _beapal_ (which is an account name [email protected]) the search will not show any search results.

we could perhaps eventually build an okay search based on postgres trigrams, but that’s a very low level primitive compared to what I would expect in a modern fulltext search.

I agree with that. We can keep improving the search results, but I think there will still be cases where search won't return the expected results.

One last example: on https://peertube.video, searching for e = m6 returns nothing, but e=m6 returns results.
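One possible mitigation for this kind of mismatch, assuming the only difference is the whitespace around the punctuation (the thread does not confirm the actual cause), is to normalize both the query and the indexed text the same way before comparing. A minimal Python sketch, with a hypothetical `normalize` helper:

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop spaces around punctuation, collapse remaining whitespace."""
    text = text.lower()
    # "e = m6" -> "e=m6": remove whitespace on either side of a punctuation mark
    text = re.sub(r"\s*([^\w\s])\s*", r"\1", text)
    return " ".join(text.split())


print(normalize("e = m6") == normalize("e=m6"))
```

Both spellings then map to the same string, so either one would hit the same indexed value.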

So I think a solution like Elasticsearch is relevant; however, it may complicate installation and the development environment.

Would a plugin be a better solution? I mean, basic search is okay and doesn't need an extra installation step, which is helpful. But if we want a better search, it would be up to us to install an Elasticsearch instance (or something else) and a plugin that replaces the search bar, calls Elasticsearch, runs the indexing...

One last example: on https://peertube.video, searching for e = m6 returns nothing, but e=m6 returns results.

I feel you. I use duckduckgo sometimes to search for peertube videos because the built-in search feature isn't reliable at this point in time.

Would a plugin be a better solution?

I personally don't think so. I think a good search engine should be part of core because it's an essential tool that many rely on.

I think a good search engine should be part of core because it's an essential tool that many rely on

I agree, as long as it does not make PeerTube installation more complex (i.e. requiring a running Elasticsearch instance first).

We have made some improvements to the search system since this issue was created. We'll use Elasticsearch as a third-party service to implement https://github.com/Chocobozzz/PeerTube/issues/824
