The Bleve indexer is very inefficient: it uses a lot of disk space and a lot of memory. It also keeps the entire index mmap'ed at all times, which makes Gitea crash when I enable it on my 32-bit server: with just 1.4 GB of Git repositories, it crashes after generating 2 GB of index.
There are plenty of more popular and more efficient full-text search engines, starting with the ones built into Postgres / MySQL / SQLite (MySQL's is not the most efficient, but it still works). Then there's Elasticsearch and so on. Index sizes, relative to the size of the indexed data, are much smaller in Elasticsearch and Postgres.
We are refactoring the issue indexer; after that, we will start refactoring the code indexer. You can find some of the PRs, e.g. https://github.com/go-gitea/gitea/pull/6150
Hi, I've also seen in the configuration files that there are two values available for ISSUE_INDEXER_TYPE. What are the differences between "db" and "bleve"? Is it safe to change from "bleve" to "db" in the config file and then simply restart?
@adamcavendish db will use the database's LIKE operator to search issues. The change you describe is safe. But both types are inefficient.
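For anyone wondering what a LIKE-based issue search amounts to, here is a minimal sketch in Python with an in-memory SQLite database. It is not Gitea's actual code: the `issue` table, its columns, and the `search_issues_like` helper are made up for illustration.

```python
import sqlite3


def search_issues_like(conn, keyword):
    """Substring search over issue title and content using LIKE (hypothetical schema)."""
    # Escape LIKE wildcards so the keyword is matched literally.
    escaped = (
        keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    pattern = f"%{escaped}%"
    return conn.execute(
        "SELECT id, title FROM issue "
        "WHERE title LIKE ? ESCAPE '\\' OR content LIKE ? ESCAPE '\\' "
        "ORDER BY id",
        (pattern, pattern),
    ).fetchall()


like_conn = sqlite3.connect(":memory:")
like_conn.execute(
    "CREATE TABLE issue (id INTEGER PRIMARY KEY, title TEXT, content TEXT)"
)
like_conn.executemany(
    "INSERT INTO issue (title, content) VALUES (?, ?)",
    [
        ("Indexer crashes on 32-bit", "bleve mmap exhausts the address space"),
        ("Add Elasticsearch backend", "code search should support elasticsearch"),
        ("Typo in docs", "100% unrelated"),
    ],
)

print(search_issues_like(like_conn, "elasticsearch"))
```

A pattern with a leading `%` generally cannot use an ordinary B-tree index, so the database ends up scanning every row. That is probably why this approach is safe but not efficient.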
Proper search support could fix things like #5694, #5277, #3448, #2967, #2434, #8366, #8386, #7825, #10147 and #10764. Those might not be the "same" issue, but their cause is [probably] the current indexing/searching implementation.
And I guess it would also help to reduce the number of memory-related bugs (e.g. #4807).
Would love to help with the implementation for the elasticsearch code search backend! @lunny @jeblair
Just a note that SQLite has FTS indexing, and that it is quite efficient (I have gigabytes of plain text files indexed that way)
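A rough sketch of what SQLite FTS looks like in practice, again in Python. It assumes the SQLite library behind Python's `sqlite3` module was compiled with the FTS5 extension (common, but not guaranteed). The `doc_fts` table, its columns, and the `search_docs` helper are hypothetical.

```python
import sqlite3

fts_conn = sqlite3.connect(":memory:")
# FTS5 virtual table: every column is tokenized into an inverted index.
fts_conn.execute("CREATE VIRTUAL TABLE doc_fts USING fts5(path, body)")
fts_conn.executemany(
    "INSERT INTO doc_fts (path, body) VALUES (?, ?)",
    [
        ("notes/indexer.txt", "the bleve indexer keeps the whole index memory mapped"),
        ("notes/search.txt", "sqlite full text search uses an inverted index"),
        ("notes/misc.txt", "nothing relevant here"),
    ],
)


def search_docs(conn, query):
    """Return paths of matching documents, best match first.

    `query` is passed through as an FTS5 query string, so several bare
    words are combined with an implicit AND.
    """
    rows = conn.execute(
        "SELECT path FROM doc_fts WHERE doc_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


print(search_docs(fts_conn, "inverted index"))
```

Unlike LIKE, the MATCH query is answered from the token index, so it does not have to scan the text of every row.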
So no built-in, single binary SQLite search?
@rcarmo the issue index supports built-in SQLite search; the code index supports built-in bleve search.