I enabled the indexer. It has been running for a couple of days since then. I am able to search and get some results, but some searches return nothing on the Code search page, while grep finds tens of matches for the same term.
For the term "tool_set", the Code search page says "No source code matching your search term found."
Grepping the same code base (even after deleting the comment lines):
find -type f -name "*.py" -exec grep -i 'tool_set' {} \; |sed '/#/d' |wc -l
44
[indexer]
REPO_INDEXER_ENABLED = true
ISSUE_INDEXER_PATH: indexers/issues.bleve
REPO_INDEXER_PATH: indexers/repos.bleve
UPDATE_BUFFER_LEN: 20
MAX_FILE_SIZE: 1048576
The indexer itself can handle your case. I've specifically tested with tool_set and it was indexed correctly when I ran the indexer from scratch. The indexer is having some problems, however, because I'm getting errors in the log that I can't pinpoint, like:
2020/01/22 21:04:05 ...ndexer/code/queue.go:39:processRepoIndexerOperationQueue() [E] indexer.Index: exit status 1
/home/gprandi/src/code.gitea.io/gitea/modules/indexer/code/queue.go:39 (0x187f5f7)
processRepoIndexerOperationQueue: log.Error("indexer.Index: %v", err)
/home/gprandi/go/src/runtime/asm_amd64.s:1357 (0x46f5d0)
goexit: BYTE $0x90 // NOP
Which clogs the indexer queue. If I restart the instance and commit new changes to the repository, the indexer seems to pick them up correctly.
The indexer is expected to take a "long time" to build, but not _days_. It took a couple of minutes to build my indexes from scratch on 327 MB of repositories.
That is interesting. Where is a good place to see the indexer having issues? I grepped the Gitea log but there is not much that I can see:
https://paste.debian.net/hidden/3591c5ca/
I also wonder if there is a limit to the size of the indexer db. Mine is at 285 MB now and I have many repos in there.
My log configuration in app.ini:
[log]
MODE = file
MAX_DAYS = 15
LEVEL = Info
ROUTER = file
ROUTER_LOG_LEVEL = Trace
STACKTRACE_LEVEL = Error
XORM = file
REDIRECT_MACARON_LOG = true
[log.file.xorm]
FILE_NAME = xorm.log
(It's a little redacted, so maybe not all options make sense)
This separates the SQL (XORM) log from the other logs, making everything cleaner. I've also set up a trace to every error, so I know exactly where every log is produced.
To get a meaningful log I stopped Gitea and deleted the repos.bleve directory to force the system to rebuild it when restarted. You'll know it finished when it stops growing (which is _not necessarily_ when the log says it does... in fact my log was not useful about that).
Then I edited a file using the web UI, and when the indexer attempted to do its thing, it crashed.
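In case it helps, here is a rough way to script the "wait until it stops growing" check. It is only a sketch: wait_until_stable and dir_size are made-up helper names, the 30-second interval and the three quiet checks are arbitrary choices, and a quiet period is a hint that the rebuild finished, not proof.

```python
import os
import time
from typing import Callable


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file may disappear while the index is being rewritten.
                pass
    return total


def wait_until_stable(get_size: Callable[[], int],
                      interval: float = 30.0,
                      stable_checks: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll get_size() until it returns the same value stable_checks times in a row."""
    last = get_size()
    unchanged = 0
    while unchanged < stable_checks:
        sleep(interval)
        current = get_size()
        if current == last:
            unchanged += 1
        else:
            # Still growing (or shrinking): start counting again.
            unchanged = 0
            last = current
    return last
```

With the REPO_INDEXER_PATH from the config earlier in the thread, it could be called as wait_until_stable(lambda: dir_size("indexers/repos.bleve")), run from the Gitea working directory.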
(NOTE: your paste doesn't say much, unfortunately)
@guillep2k what's the gitea version?
@lunny I've tested on master as of today. (53f9dbfc7bd322a439bd6c6582d69506c7244384)
I also wonder if there is a limit to the size of the indexer db. Mine is at 285 MB now and I have many repos in there.
BTW, the indexes of my prod instance are 1.3GB from 1.4GB of repositories (working fine on Gitea 1.10.3).
@guillep2k I will test with the latest rc2 from today. I will delete the database and force it again.
Btw is there a way to force the indexer while gitea is running?
If by force you mean rebuild all, no, there isn't. But files are re-indexed with each commit (only the affected files, the whole file is re-indexed, not just the diff).
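To make the "whole file is re-indexed, not just the diff" part concrete, here is a toy model in Python. It is not Gitea's or Bleve's code: ToyIndex, its methods and the \w+ tokenizer are all assumptions made for illustration. The only point is that re-indexing a path throws away every old entry for that file before adding the new ones.

```python
import re
from collections import defaultdict
from typing import Dict, Set


class ToyIndex:
    """Hypothetical term -> file index; one entry per (term, file) pair."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.terms_by_path: Dict[str, Set[str]] = {}

    def remove_file(self, path: str) -> None:
        """Forget everything previously indexed for path."""
        for term in self.terms_by_path.pop(path, set()):
            self.postings[term].discard(path)

    def index_file(self, path: str, content: str) -> None:
        """Re-index the full content of path, replacing its old entries."""
        self.remove_file(path)
        # Assumed tokenizer: runs of word characters, lowercased.
        terms = set(re.findall(r"\w+", content.lower()))
        self.terms_by_path[path] = terms
        for term in terms:
            self.postings[term].add(path)

    def search(self, term: str) -> Set[str]:
        """Return the set of files containing term."""
        return set(self.postings.get(term.lower(), set()))
```

In this model a commit that touches a.py would trigger index_file("a.py", new_content) for that file only, and every other file's entries stay as they were.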
Hmm the latest rc2 fails on me with
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `name` FROM `user` WHERE `id`=? LIMIT 1 []interface {}{1} - took: 26.358µs
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `name` FROM `user` WHERE `id`=? LIMIT 1 []interface {}{1} - took: 44.862µs
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `name` FROM `user` WHERE `id`=? LIMIT 1 []interface {}{1} - took: 34.792µs
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `name` FROM `user` WHERE `id`=? LIMIT 1 []interface {}{1} - took: 27.177µs
2020/01/22 22:52:07 ...exer/code/indexer.go:54:func2() [I] PID: 3759700 Initializing Repository Indexer at: /opt/gitea/indexers/repos.bleve
2020/01/22 22:52:07 ...er/issues/indexer.go:142:func2() [I] PID 3759700: Initializing Issue Indexer: bleve
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `pull_request`.`id` FROM `pull_request` WHERE (status=?) []interface {}{1} - took: 158.714µs
2020/01/22 22:52:07 .../xorm/session_raw.go:78:queryRows() [I] [SQL] SELECT `id`, `repo_id`, `hook_id`, `uuid`, `type`, `url`, `signature`, `payload_content`, `http_method`, `content_type`, `event_type`, `is_ssl`, `is_delivered`, `delivered`, `is_succeed`, `request_content`, `response_content` FROM `hook_task` WHERE (is_delivered=?) []interface {}{false} - took: 213.028µs
2020/01/22 22:52:07 routers/init.go:122:GlobalInit() [I] SQLite3 Supported
2020/01/22 22:52:07 routers/init.go:46:checkRunMode() [I] Run Mode: Production
2020/01/22 22:52:07 ...ndexer/code/bleve.go:228:Close() [D] Closing repo indexer
2020/01/22 22:52:07 ...ndexer/code/bleve.go:235:Close() [I] PID: 3759700 Repository Indexer closed
2020/01/22 22:52:07 ...exer/code/indexer.go:63:func2() [F] PID: 3759700 Unable to initialize the Repository Indexer at path: /opt/gitea/indexers/repos.bleve Error: error parsing mapping JSON: unexpected end of JSON input
mapping contents:
/go/src/code.gitea.io/gitea/modules/indexer/code/indexer.go:63 (0x124255f)
/usr/local/go/src/runtime/asm_amd64.s:1357 (0x466c70)
Could you find the file rupture_sharded_meta.json in the indexer directory?
There is no rupture_sharded_meta.json
find -L -type f|grep -i rupt
./indexers/issues.bleve/rupture_meta.json
./indexers/repos.bleve/rupture_meta.json
@gerroon could you paste the content of those two files?
cat issues.bleve/rupture_meta.json repos.bleve/rupture_meta.json
{"version":1}{"version":4}
OK, I deleted the whole indexer thing and installed the latest nightly (v1.11.0-rc2). The database grew to 3 GB:
-rw-r--r-- 1 git git 47 Jan 23 00:04 index_meta.json
-rw-r--r-- 1 git git 13 Jan 23 00:04 rupture_meta.json
-rw------- 1 git git 3.0G Jan 23 08:27 store
However, it still can't find tool_set.
I did another search for builtin. It located about 30 results across all the Gitea-controlled repos. Since I do not have clones of all the repos, I ran the same search on the largest one I have cloned. The difference is huge, not even close (30 vs 542).
grep -ir "builtin" *|wc -l
542
One thing I am seeing is this process doing constant reading (holding 99% of the system I/O) without writing, and never giving up whatever it is doing:
183.27 K/s 0.00 B/s 0.00 % 95.49 % gitea web -c /opt/gitea/custom/conf/app.ini
The database store file was last updated about 4 hours ago. So whatever is being read from the disk is not being written back, given that the database file has not changed for about 4 hours?
Here is the lsof for gitea
1 unix 33206 type=STREAM
2 unix 33206 type=STREAM
3 REG 0x30 869488 66071775 /media/DRIVE/_TEMP/LOG/gitea/gitea.log
4 a_inode 0xe 0 8828 [eventpoll]
5 REG 0x30 31205 66071776 /media/DRIVE/_TEMP/LOG/gitea/macaron.log
6 REG 0x30 0 66071777 /media/DRIVE/_TEMP/LOG/gitea/router.log
7 REG 0x30 696800 66071778 /media/DRIVE/_TEMP/LOG/gitea/xorm.log
8 REG 0x822 2433024 6197630 /media/DRIVEB/opt/gitea/data/gitea.db
9 REG 0x822 0 6167383 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/LOCK
10 REG 0x822 28139 6167384 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/LOG
11 IPv6 *:3000
12 REG 0x822 39378 6163834 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/000102.log
13 REG 0x822 110 6163840 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/MANIFEST-000103
14 REG 0x822 15305 6209772 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/000037.ldb
15 REG 0x822 127 6207818 /media/DRIVEB/opt/gitea/data/queues/issue_indexer/000002.ldb
16 REG 0x822 0 6167673 /media/DRIVEB/opt/gitea/data/queues/task/LOCK
17 REG 0x822 26545 6197256 /media/DRIVEB/opt/gitea/data/queues/task/LOG
18 REG 0x822 0 6164917 /media/DRIVEB/opt/gitea/data/queues/task/000084.log
19 REG 0x822 70 6164929 /media/DRIVEB/opt/gitea/data/queues/task/MANIFEST-000085
20 REG 0x822 127 6167385 /media/DRIVEB/opt/gitea/data/queues/task/000002.ldb
21 REG 0x30 1048576 66074727 /media/DRIVE/GITEA/indexers/issues.bleve/store
22 REG 0x30 3211452416 66074729 /media/DRIVE/GITEA/indexers/repos.bleve/store
23 IPv6 localhost:3000->localhost:43982
cwd DIR 0x30 58 400815 /media/DRIVE/REPO/GITEA
mem REG 0x2b 66074727 /media/DRIVE/GITEA/indexers/issues.bleve/store (path dev=0,48)
mem REG 0x2b 66074729 /media/DRIVE/GITEA/indexers/repos.bleve/store (path dev=0,48)
rtd DIR 0x825 4096 2 /
txt REG 0x822 82951528 6056650 /media/DRIVEB/opt/gitea/gitea
It would be useful to have some logs for the time span of your tests.
EDIT: (I mean, for context)
I would like to, but there is a lot of personal information in the logs, a lot about my projects, issues, wikis, etc. If you can tell me what specifically you are looking for, such as crashes, I can definitely provide it. But I am not seeing any of those there.
I think I've found an important bug! But it should only manifest itself as repos not being _updated_ (creation of indexes from scratch should not be affected).
As for the error message in my instance:
2020/01/22 21:04:05 ...ndexer/code/queue.go:39:processRepoIndexerOperationQueue() [E] indexer.Index: exit status 1
I've been debugging and it turns out this error is expected, as I have one corrupt repo, so git show-ref -s returns... a silent exit status of 1. I believe this should not affect the indexing of other repos, because the error is logged and the indexer just continues processing its queue.
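For anyone who wants to check their own repos for the same "silent exit status" situation, a small Python sketch along these lines might do. run_and_report and describe are hypothetical helper names, not anything from Gitea; they just run a command and tell apart "failed with a message" from "failed without saying anything".

```python
import subprocess
from typing import List, Optional, Tuple


def run_and_report(cmd: List[str], cwd: Optional[str] = None) -> Tuple[int, str, str]:
    """Run cmd and return (exit status, stdout, stderr) without raising on failure."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


def describe(cmd: List[str], cwd: Optional[str] = None) -> str:
    """Summarize how cmd ended, flagging failures that print nothing to stderr."""
    status, _out, err = run_and_report(cmd, cwd=cwd)
    if status == 0:
        return "ok"
    if not err.strip():
        return f"exit status {status} (no error output)"
    return f"exit status {status}: {err.strip()}"
```

Something like describe(["git", "show-ref", "-s"], cwd="/path/to/repo.git") would then show whether a given repository produces the quiet failure described above (the path is a placeholder).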
About the bug I've mentioned, I'll post a PR momentarily.
Sounds good.
I just started from scratch again. This time I added an include files list so that the scope is limited, since I am mostly interested in txt and py files (my repos have a lot of binary files too). I will report back if that does any good.
OK, that did not work perfectly either. Here is the result from the Gitea Code search page for builtin.transform.
I am only including the results from the same repo in Code search and in the grep search.
One speculation I can make is that Code search seems to return only one result per file (compare it to the grep search), which could be one of the culprits, if not the whole problem.
MayaConfigV3_2/fa_hotkeys.py
View File
{"properties":
[("name", 'builtin.transform'),
],
QWER/QWER_Industry_Keymap.py
View File
{"properties":
[("name", 'builtin.transform'),
],
and here is from the terminal
grep -ir "builtin.transform" *
MayaConfigV3_2/fa_hotkeys.py:536: [("name", 'builtin.transform'),
MayaConfigV3_2/fa_hotkeys.py:2547: [("name", 'builtin.transform'),
MayaConfigV3_2/fa_hotkeys.py:2708: [("name", 'builtin.transform'),
MayaConfigV3_2/fa_hotkeys.py:2715: [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:1307: [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:1314: [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:1321: [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:2050: [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:2057: [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:2064: [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:5407: [("name", 'builtin.transform'),
QWER/QWER_Industry_Keymap.py:7050: [("name", 'builtin.transform'),
It is still not reporting anything about "tool_set" for the repo I listed above, but see what ack returns for that same repo:
ack tool_set *|wc -l
3849
Oh! 🤦‍♂️
The indexer indexes only the first instance of any term _per file_. It's not meant to be a full text search.
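To compare like with like, grep would have to count files rather than lines. A rough Python sketch of that comparison is below; count_matches and summarize are made-up names, the case-insensitive substring match only approximates grep -i, and nothing here reflects how the indexer itself matches terms.

```python
import os
from typing import Dict, Tuple


def count_matches(root: str, needle: str, suffix: str = ".py") -> Dict[str, int]:
    """Map each file under root to its number of lines containing needle (case-insensitive)."""
    needle = needle.lower()
    hits: Dict[str, int] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    n = sum(1 for line in fh if needle in line.lower())
            except OSError:
                # Unreadable file: skip it rather than abort the whole scan.
                continue
            if n:
                hits[path] = n
    return hits


def summarize(hits: Dict[str, int]) -> Tuple[int, int]:
    """Return (matching lines, matching files)."""
    return sum(hits.values()), len(hits)
```

For the builtin.transform grep listing earlier in the thread, this would give 12 matching lines but only 2 matching files, and 2 is what the Code search page showed for that repo.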
Interesting. Then maybe it is not even going to return partial results?
Here tool_set_by_name returns 2 results from the whole of Gitea, while grep can return many for a single repo. Maybe that explains in some way why "tool_set" returns none?
It _should_ return a result per file where it occurs, as long as it's in master (or whatever branch is your default) and "indexable" (i.e. not filtered out by your settings or... ahem... _perhaps your files are marked as executable_?). 😳
as per @guillep2k
Could this https://github.com/go-gitea/gitea/issues/9190#issuecomment-571563226 be related to this issue?
Re-checked the https://github.com/go-gitea/gitea/issues/9190#issuecomment-571563226 behavior with the latest upstream version, 1.12.0+dev-174-g5b17bb8f3.
Seems to be working now!
The repo index was updated after git push.
@vvrein I'm gonna close this as Fixed by #9965 and #9957