Gitea: Elastic Search Issue Indexer fuzzy search

Created on 19 Jun 2020 · 11 Comments · Source: go-gitea/gitea

Description

I have set up elasticsearch with the following settings:

[indexer]
REPO_INDEXER_ENABLED = true
ISSUE_INDEXER_TYPE: elasticsearch
ISSUE_INDEXER_CONN_STR: http://localhost:9200
ISSUE_INDEXER_NAME: gitea_issues

I have created a test issue with the text "bla bla bla mr. freeman" and I am trying to find it using the issue search. I've done the same thing on the try.gitea.io test instance:

Issue: https://try.gitea.io/thedoginthewok/test_issue_search/issues/1
Search: https://try.gitea.io/issues?type=your_repositories&repos=%5B%5D&sort=&state=open&q=freema

On the test instance, the issue is found successfully. On my instance, I can only find the issue if I search for the complete word freeman.

Is there any way to configure a fuzzy search for elastic?

Screenshots

My instance with search term freeman:
[screenshot]

My instance with search term freema:
[screenshot]

Labels: kind/enhancement, reviewed/confirmed

All 11 comments

Maybe you mean

[indexer]
REPO_INDEXER_ENABLED = true
ISSUE_INDEXER_TYPE = elasticsearch
ISSUE_INDEXER_CONN_STR = http://localhost:9200
ISSUE_INDEXER_NAME = gitea_issues

I've changed it to "=", but it behaves the same.

New log gist: https://gist.github.com/thedoginthewok/eaa51d81d8a82f13145ff7be1c56888b

This part is interesting to me:

2020/06/19 15:25:41 ...elastic/v7/client.go:848:dumpRequest() [T] POST /gitea_issues/_search HTTP/1.1
    Host: localhost:9200
    User-Agent: elastic/7.0.9 (linux-amd64)
    Transfer-Encoding: chunked
    Accept: application/json
    Content-Type: application/json
    Accept-Encoding: gzip

    b5
    {"from":0,"query":{"bool":{"must":[{"multi_match":{"fields":["title","content","comments"],"query":"freema"}},{"terms":{"repo_id":[1]}}]}},"size":50,"sort":[{"id":{"order":"asc"}}]}
    0

2020/06/19 15:25:41 ...elastic/v7/client.go:858:dumpResponse() [T] HTTP/1.1 200 OK
    Content-Type: application/json; charset=UTF-8

    {"took":4,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":0,"relation":"eq"},"max_score":null,"hits":[]}}
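For context on the logged request: a plain multi_match query analyzes the search string and matches whole terms, so "freema" does not match the indexed term "freeman". The sketch below rebuilds the logged request body and shows two variants that Elasticsearch accepts for multi_match, "fuzziness" and the "phrase_prefix" type. The function name build_issue_query and its mode parameter are hypothetical and only for illustration; this is not how Gitea builds its query.

```python
import json

FIELDS = ["title", "content", "comments"]


def build_issue_query(keyword, repo_ids, mode="exact", start=0, limit=50):
    """Build an issue search body shaped like the one in the log above.

    mode="exact"  reproduces the logged request (whole-term matching).
    mode="fuzzy"  adds "fuzziness": "AUTO" (edit-distance matching).
    mode="prefix" sets "type": "phrase_prefix" (last term is a prefix).
    """
    match = {"fields": FIELDS, "query": keyword}
    if mode == "fuzzy":
        match["fuzziness"] = "AUTO"
    elif mode == "prefix":
        # fuzziness cannot be combined with the phrase_prefix type,
        # so the two variants are kept as separate modes.
        match["type"] = "phrase_prefix"
    elif mode != "exact":
        raise ValueError("unknown mode: %r" % (mode,))
    return {
        "from": start,
        "query": {
            "bool": {
                "must": [
                    {"multi_match": match},
                    {"terms": {"repo_id": list(repo_ids)}},
                ]
            }
        },
        "size": limit,
        "sort": [{"id": {"order": "asc"}}],
    }


if __name__ == "__main__":
    print(json.dumps(build_issue_query("freema", [1], mode="fuzzy")))
```

The resulting body could be sent to POST /gitea_issues/_search by hand (for example with curl) to check whether either variant finds the test issue before any change to the indexer is considered.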

Try searching freema*

[screenshot]

Nope.
What is the try.gitea.io instance running on?

It just uses database search

So, probably with LIKE '%SEARCHTERM%'.
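The difference between the two behaviours can be illustrated with a small sketch. It assumes the database search is a substring match (as guessed above) and approximates the Elasticsearch standard analyzer with a simple lowercase word split; the real analyzer is more elaborate. All function names are hypothetical.

```python
import re


def tokenize(text):
    """Rough stand-in for a standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def substring_match(text, term):
    """Behaves like SQL LIKE '%term%' (case-insensitive here)."""
    return term.lower() in text.lower()


def token_match(text, term):
    """Behaves like a match query: any query token must equal an indexed token."""
    indexed = set(tokenize(text))
    return any(token in indexed for token in tokenize(term))


if __name__ == "__main__":
    issue = "bla bla bla mr. freeman"
    for term in ("freeman", "freema", "freema*"):
        print(term, substring_match(issue, term), token_match(issue, term))
```

Under these assumptions "freema" is found by the substring search but not by the token search, and "freema*" fares no better in the token search because the asterisk is dropped during tokenization instead of being treated as a wildcard.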

This is a bug in the elasticsearch indexer, right?
Or is it supposed to work this way?

Elastic search query should be improved

That's because of how we use Elasticsearch. Below is the mapping configuration from the source:

"mappings": {
    "properties": {
        "id": {
            "type": "integer",
            "index": true
        },
        "repo_id": {
            "type": "integer",
            "index": true
        },
        "title": {
            "type": "text",
            "index": true
        },
        "content": {
            "type": "text",
            "index": true
        },
        "comments": {
            "type": "text",
            "index": true
        }
    }
}

Should we change the configuration to resolve the problem?
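One possible direction, sketched under assumptions: index the text fields with an edge n-gram analyzer so that prefixes such as "freema" become indexed terms, while still analyzing the search string with the standard analyzer. The names prefix_filter and prefix_index and the gram sizes are made up for illustration and are not from the Gitea source. Edge n-grams only cover prefixes, not arbitrary substrings, and an existing index would need to be recreated and reindexed for a new analyzer to take effect.

```python
import json


def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the prefixes an edge n-gram filter would emit for one token."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index_body(min_gram=2, max_gram=10):
    """Index settings plus mappings with prefix-aware text fields (a sketch)."""
    text_field = {
        "type": "text",
        "analyzer": "prefix_index",     # hypothetical analyzer name
        "search_analyzer": "standard",  # do not n-gram the search string
    }
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "prefix_filter": {  # hypothetical filter name
                        "type": "edge_ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    "prefix_index": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "prefix_filter"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "id": {"type": "integer"},
                "repo_id": {"type": "integer"},
                "title": dict(text_field),
                "content": dict(text_field),
                "comments": dict(text_field),
            }
        },
    }


if __name__ == "__main__":
    print(edge_ngrams("freeman"))
    print(json.dumps(build_index_body(), indent=2))
```

With these settings the term "freeman" would be indexed together with its prefixes, so a search for "freema" could match without changing the query itself. The trade-off is a larger index.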

This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.

This issue has been automatically closed because of inactivity. You can re-open it if needed.

unstale
