Core: Search performance regression

Created on 24 Mar 2019  路  18Comments  路  Source: flarum/core

1741 introduced a regression in search performance which needs to be looked at.

critical typregression

Most helpful comment

I had a feeling there was a reason I didn't use left join when I wrote the search code, and I am aware of performance differences between left and normal join. So that tipped me off that I should performance-test this change.

To do so, I just ran the search SQL for the keyword "beta" with and without left against the discuss.flarum.org database. The effect is big enough that you can see it directly on discuss.flarum.org (this page takes way too long to load)

All 18 comments

For my information, how did you discover this?

I had a feeling there was a reason I didn't use left join when I wrote the search code, and I am aware of performance differences between left and normal join. So that tipped me off that I should performance-test this change.

To do so, I just ran the search SQL for the keyword "beta" with and without left against the discuss.flarum.org database. The effect is big enough that you can see it directly on discuss.flarum.org (this page takes way too long to load)

Would it make sense to add a test that:

  • creates a predefined set of discussions with a specific keyword
  • measures response time of a search onto that keyword
  • reports in when the response time is out of bounds

/cc @franzliedke

The reason I ask is that we want to prevent to have detrimental performance consequences of PR's that look okayish at first.

Prioritised this, because searching anything on discuss is a huge dealbreaker 馃槀

I think, that it would be make sense to do elasticsearch integration.

I think, that it would be make sense to do elasticsearch integration.

Or firebase database could be an alternative which avoid to host flarum on dedicated servers or vps

@vingrad @laTruffe79 Debates on principles won't help us with this ticket, this is about a very concrete performance regression. So I will mark these messages as off-topic.

Please feel free to take this discussion to our forums, though. Thanks for your interest in Flarum!

$subquery = Post::whereVisibleTo($search->getActor())
            ->select('posts.discussion_id')
            ->selectRaw('SUM(MATCH('.$grammar->wrap('posts.content').') AGAINST (?)) as score', [$bit])
            ->selectRaw('SUBSTRING_INDEX(GROUP_CONCAT('.$grammar->wrap('posts.id').' ORDER BY MATCH('.$grammar->wrap('posts.content').') AGAINST (?) DESC, '.$grammar->wrap('posts.number').'), \',\', 1) as most_relevant_post_id', [$bit])
            ->where('posts.type', 'comment')
            ->where('posts.number', '1')
            ->where('posts.content', 'LIKE', '%'.$bit.'%')
            //->whereRaw('MATCH('.$grammar->wrap('posts.content').') AGAINST (? IN BOOLEAN MODE)', [$bit])
            ->groupBy('posts.discussion_id');

In this case, the CPU uses more, but a search function that works more healthy and more beautiful.

(Since LeftJoin uses too much CPU, I recommend using Join Only.)

@flarumalshain The leftJoin was added in #1741 to fix a bug - so we cannot "just" revert. But yes, that change introduced the performance problems.

@franzliedke I wonder though what鈥檚 worse? An unusable search, or a search that is faster but it just doesn鈥檛 search titles or whatever the issue was before. In other words, sure there was a bug before but at least search was faster. Maybe if no one can figure this out by beta 11 it should be reverted so that search works. Then, focus on a fix for the original bug.

@zerosonesfun Yep, that's what we decided in yesterday's meeting as well. We reverted the change now, and are planning to spend a bit more time on e.g. a better indexing solution in the next sprint.

@flarum/core Should we close this ticket in favor of a new edition of #1738?

@franzliedke, I think if the regression was fixed yes. We do need one for a search engine implementation, but there ought to be one if I'm not mistaken.

There's #509

Good idea, added that one to the 0.1 milestone. ElasticSearch is probably out of scope, but the driver infrastructure and a fairly naive indexing implementation could be doable.

I believe that if you add this code, it will be improved to like
->where('discussions.title', 'LIKE', '%'.$bit.'%')
I added but I got the error.

$subquery = Post::whereVisibleTo($search->getActor())
->select('posts.discussion_id')
->selectRaw('SUM(MATCH('.$grammar->wrap('posts.content').') AGAINST (?)) as score', [$bit])
->selectRaw('SUBSTRING_INDEX(GROUP_CONCAT('.$grammar->wrap('posts.id').' ORDER BY MATCH('.$grammar->wrap('posts.content').') AGAINST (?) DESC, '.$grammar->wrap('posts.number').'), \',\', 1) as most_relevant_post_id', [$bit])
->where('posts.type', 'comment')
->whereRaw('MATCH('.$grammar->wrap('posts.content').') AGAINST (? IN BOOLEAN MODE)', [$bit])
->groupBy('posts.discussion_id');

$subquery = Post::whereVisibleTo($search->getActor())
            ->select('posts.discussion_id')
            ->selectRaw('SUM(MATCH('.$grammar->wrap('posts.content').') AGAINST (?)) as score', [$bit])
            ->selectRaw('SUBSTRING_INDEX(GROUP_CONCAT('.$grammar->wrap('posts.id').' ORDER BY MATCH('.$grammar->wrap('posts.content').') AGAINST (?) DESC, '.$grammar->wrap('posts.number').'), \',\', 1) as most_relevant_post_id', [$bit])
            ->where('posts.type', 'comment')
            ->where('posts.number', '1')
            ->where('posts.content', 'LIKE', '%'.$bit.'%')
            //->whereRaw('MATCH('.$grammar->wrap('posts.content').') AGAINST (? IN BOOLEAN MODE)', [$bit])
            ->groupBy('posts.discussion_id');

In this case, the CPU uses more, but a search function that works more healthy and more beautiful.

(Since LeftJoin uses too much CPU, I recommend using Join Only.)

I believe that if you add this code, it will be improved to like
->where('discussions.title', 'LIKE', '%'.$bit.'%')
I added but I got the error.

$subquery = Post::whereVisibleTo($search->getActor())
->select('posts.discussion_id')
->selectRaw('SUM(MATCH('.$grammar->wrap('posts.content').') AGAINST (?)) as score', [$bit])
->selectRaw('SUBSTRING_INDEX(GROUP_CONCAT('.$grammar->wrap('posts.id').' ORDER BY MATCH('.$grammar->wrap('posts.content').') AGAINST (?) DESC, '.$grammar->wrap('posts.number').'), ',', 1) as most_relevant_post_id', [$bit])
->where('posts.type', 'comment')
->whereRaw('MATCH('.$grammar->wrap('posts.content').') AGAINST (? IN BOOLEAN MODE)', [$bit])
->groupBy('posts.discussion_id');

Was this page helpful?
0 / 5 - 0 ratings