Framework: WhereHas query is too slow

Created on 19 Mar 2017  路  26Comments  路  Source: laravel/framework

  • Laravel Version: 5.4
  • PHP Version: 7.0
  • Database Driver & Version: Mysql 5.6.33

Description:
As i tested, i founded out that whereHas query is much slower than the same query using join.
Imagine this domain model:
User - Like - Post - Tag
User likes posts
Post belongs to many tags
Tag belongs to many posts
Each post has several tags and each user may like several posts. We call a specified user interested in tag T if he/she had liked at least 10 posts which has tag T. Now, for a specified user U, we are going to find those tags which he/she is interested in.

With whereHas we have the following code:

App\Tag::whereHas('posts', function($q) {
    $q->whereHas('likes', function($q) {  // likes is a relation between posts and users (post_user)
        $q->where('users.id', 1);
    });
}, '>=', 10)->toSql()

Note that the relation between like and post is polymorphic.
It produces the following SQL statement:

select * from `tags` where 
    (select count(*) from `posts` 
        inner join 
            `post_tag` on `posts`.`id` = `post_tag`.`post_id` 
        where `tags`.`id` = `post_tag`.`tag_id` and 
            exists 
                (select * from `users`
                    inner join `likes` on `users`.`id` = `likes`.`user_id` 
                    where `posts`.`id` = `likes`.`likeable_id` and `likes`.`likeable_type` = 'App\\Post' and `users`.`id` = 1
                )
    ) >= 10

But the following equivalent SQL statement performs 10 times faster with joins:

select `tags`.*, COUNT(tags.id) AS interest from `tags` 
    inner join `post_tag` on `tags`.`id` = `post_tag`.`tag_id` 
    inner join `posts` on `post_tag`.`post_id` = `posts`.`id` 
    inner join `likes` on `likes`.`likeable_id` = `posts`.`id` and `likes`.`likeable_type` = 'App\\Post' 
    inner join `users` on `likes`.`user_id` = `users`.`id` where `users`.`id` = 1 
    group by `users`.`id`, `tags`.`id` having `interest` >= 10 order by `interest` desc

I think RDMSs do lots of work to perform joins faster and ORM should take advantage of this.

Most helpful comment

@vogtdominikgit

Laravel is creating a temporary column on literally every row in the database, in which it fills with true or false. If any indexes can be used, they are probably used after this. Any query that uses whereHas is probably going to be a full table scan.

Here is a similar query, except it does not induce full table scans.

Profile::whereIn('profiles.id', function($query) use ($mac_address) {
    $query->from('devices')
        ->select('devices.profile_id')
        ->where('devices.mac_address', $mac_address);
})

Further reading:

https://github.com/laravel/framework/issues/18415
https://github.com/laravel/framework/issues/3543
https://stackoverflow.com/questions/46785552/poor-wherehas-performance-in-laravel

All 26 comments

It could be because you are running a whereHas inside another whereHas

@AbiriAmir joins add up to what query is selecting, while exists doesn't. In other words, when you join, and in your PHP have something like $post->where('id', '>', 42) it will cause an error because id is ambiguous, and that is not desired behaviour. You would have to prefix every statement with the table name. For that reason, Eloquent ORM is not using joins, and never will.

If you need improved performance for a specific use case, just create a custom scope or a repository class, where you can use the query builder to add joins.
Btw you can use dot syntax to access nested relationships, like this: whereHas('posts.likes', '>=', 10).

Also, in most cases, exists is actually faster, because it doesn't order RDBMS to fetch any data, just check whether relevant rows exist, which often can be done by a simple index lookup.

For that reason, Eloquent ORM is not using joins, and never will.

I disagree with that @mrliptontea or maybe it is not clear enough

@Dylan-DPC I mean by itself, Eloquent cannot use joins because that breaks the API. Every statement would have to be prefixed with the table name in order to avoid errors like 'id' in where clause is ambiguous. But as a developer, you are free to modify queries however you please. Also, like I pointed above, in most cases exists, which is what Eloquent is using for whereHas, is faster.

@mrliptontea thanks for your answer but this block is not equivalent to my query.

whereHas('posts.likes', '>=', 10)

Also exists is only in the case that we are checking >=1 and in other cases count query will be executed which is 10 times slower in my test case.

Also, I think that there must be a solution for excluding the added fields in join statement

this block is not equivalent to my query.

Right, I didn't read your query through 馃槈

there must be a solution for excluding the added fields in join statement

I didn't mean that join is adding fields, you still you need to specify them in select, but it is creating ambiguity when two tables have a column of the same name, while subqueries don't do that. This simple fact makes subqueries usable for the ORM that will work in all cases. Need something more fine-tuned? Write you own query. Simple as that.

You would have to prefix every statement with the table name.

Actually, I don't see it as an impossible task. Eloquent could automatize it.

Currently a developer has to replace all with, whereHas and other calls with database queries - but then why do these even exist? When is it irrelevant to run 5 queries instead of 1? Maybe on a website for your family photos, or for your child's kindergarten. But in production it can have big network / processing impact, and (at least according to my memory) nor the documentation notes this threat, you only realize it when it's slow as hell, or if you are logging queries. And then you have to go back, replace all your calls, and change all the relevant code (since the query won't return models anymore).

Maybe I just do something the wrong way, but it's exactly what I had to do a time ago, because I naively thought Eloquent will generate an efficient query.

Actually, I don't see it as an impossible task. Eloquent could automatize it.

True, it could be done. However, it would require either you to provide a list of all the columns in the model or Laravel would have to run additional describe table query. Otherwise, it won't know which columns to prefix with what.

Currently a developer has to replace all with, whereHas and other calls with database queries

That is not true. As I stated above, in the majority of cases a subquery with exists, which is what Laravel does, is optimal for this use.

since the query won't return models anymore

When you use Eloquent query builder, it does. Obviously, to populate relationships correctly, it has to go through relationships but you can mix and match, i.e. you can have a join in the query, to fetch only desired rows, and still use eager loading.

There is no definitive list of rules on how you should write your queries. It all depends on your database structure, its size, and what you're trying to achieve. Whenever you have performance issues with your SQL queries, EXPLAIN is you only friend. Any ORM can take it only this far.
Don't get me wrong, I wish Laravel could do all that automagically but it's complex.

@KristofMorva it's very hard to generate an efficient query, especially you have to support diffierent rdbms. for example, there are lots of sql queries which are considered to slow down the performance for mysql, like sub-query, exists. but for oracle, they are just piece of cake.
IMHO, I'll keep as simple as possilbe when I use ORM. and if I need do something complicated with sql, just use the raw query (with the help of functions like selectRaw, whereRaw, orderByRaw or something similar).

maybe the slowness is because you are using 2 whereHas functions?. Better to judge it by making a separate test-case with a single whereHas and evaluating.

it's very hard to generate an efficient query, especially you have to support diffierent rdbms

Yes, I'm sure it'd be complex to solve.

IMHO, I'll keep as simple as possilbe when I use ORM. and if I need do something complicated with sql, just use the raw query.

From now on, same here; but it'd great if there was a warning about it on the Eloquent doc page :)

no you don't need to use "fully raw queries". You can achieve things in QB / Eloquent by using selectRaw, whereRaw etc

no you don't need to use "fully raw queries". You can achieve things in QB / Eloquent by using selectRaw, whereRaw etc

I think that's what @jhdxr meant. At least I'm combining Eloquent with QueryBuilder, so it's somewhere between efficiency and code readability.

@KristofMorva nope. I have seen a lot of people end up with using "raw queries" instead of raw-supported-functions

@Dylan-DPC fine. I'll edit my comments to prove @KristofMorva is right orz

Eloquent has some edge cases, when you hit these just switch to using the query builder.

Closing since there's no activity on the issue and it doesn't really indicate a bug.

whereHas producing an AND EXISTS statement is causing tons of slowness (12+seconds) to my queries.. if I run a raw query using left joins it works in under a second.

@joshuaGlass808 as explained above - it depends. For some queries EXISTS is faster for others JOIN is. Just use join if you find it better in your use case.

Also if you want to query only rows that have some relationship you probably want a full join rather than left join.

For those who are interested in a specific case and to see how much improvement a simple change can make. I do not want to start a discussion about how this is bad programming by the developer. I want to bring focus on the bigger issue.

On one side a framework exists to help developers ship more features more frequently, but on the other side a framework can be a pain in terms of performance. This is due to every abstraction layer adding overhead and reducing control over the underlying code-design.

Don't get me wrong, I love Laravel. But I think it is important to address this issue in the laravel docs and add a disclaimer, that these kind of queries might cause a lot of performance-loss in the long term.

Here is my result by changing 5 lines of code:

DB (with indexes)
profiles table with 285.714 entries
devices table with 350.428 entries

with 2,610 calls per minute from various places

Code
code change

QueryTime
querytime drop

CPU
cpu drop

But due to my lack of understanding, can somebody please explain to me why this is such a big improvement?

@vogtdominikgit

Laravel is creating a temporary column on literally every row in the database, in which it fills with true or false. If any indexes can be used, they are probably used after this. Any query that uses whereHas is probably going to be a full table scan.

Here is a similar query, except it does not induce full table scans.

Profile::whereIn('profiles.id', function($query) use ($mac_address) {
    $query->from('devices')
        ->select('devices.profile_id')
        ->where('devices.mac_address', $mac_address);
})

Further reading:

https://github.com/laravel/framework/issues/18415
https://github.com/laravel/framework/issues/3543
https://stackoverflow.com/questions/46785552/poor-wherehas-performance-in-laravel

I am also wondering why Laravel uses exists clause for whereHas() while the same could have been done with in clause with any overhead.

I had to avoid using whereHas and use raw whereIn to increase the performance from 20 s to 100 ms

Would like to know the reason why to stick with exists clause.

After spend hours trying to solve this, i decided to make the query like the old way, it is ugly i know! But is blazing fast, the time to run the query drop from 28s to 300ms.
No need cache anymore.

I met a similar issue today...

Collection has a many to many relationship with Image.

This is very slow:

$collections = Collection::orderBy('created_at', 'desc')
    ->whereHas('images')
    ->paginate(36);

So I fix the issue with this:

$collections = Collection::orderBy('created_at', 'desc')
    ->whereIn('collections.id', function($query) {
        $query->select('image_collection.collection_id')
            ->from('image_collection')
            ->join('images', 'images.id', 'image_collection.image_id')
            ->whereNull('images.deleted_at');
    })
    ->paginate(36);

Hope this helps.

@howtomakeaturn

With https://github.com/laravel/framework/issues/18415#issuecomment-650828294, also you may be able to solve like this:

$collections = Collection::orderBy('created_at', 'desc')
    ->hasByNondependentSubquery('images')
    ->paginate(36);
Was this page helpful?
0 / 5 - 0 ratings