Framework: Performance issue with WhereIn (and with eager loading)

Created on 10 Oct 2018  ·  33Comments  ·  Source: laravel/framework

  • Laravel Version: 5.7.9
  • PHP Version: 7.1.21
  • Database Driver & Version: MySQL 5.5.5 -10.2.15-MariaDB

Description:

Doing a whereIn with prepared query take much more time than doing a raw query

Steps To Reproduce:

Create 10000 users

do :

$user_ids = \App\User::pluck('id');
 \App\User::whereIn('id', $user_ids->sort())->get();

this request take 908ms to execute ,

But when doing :

 \DB::table('users')->select('*')->whereRaw("id in ('1', '2', '3', '4', [...],'10000')")->get();

This on is taking only 68ms to execute.

I have made a demo project here : https://github.com/Arkhas/laravel-perf-wherein

(just run php artisan migrate:refresh --seed and visit the homepage (PHP Debug bar is enable))

it seems that it's an issue with prepared queries of php PDO :

https://laracasts.com/discuss/channels/general-discussion/eloquent-is-so-slow?page=2
https://stackoverflow.com/questions/4350718/why-are-certain-types-of-prepared-queries-using-pdo-in-php-with-mysql-slow

when doing eager loading with eloquent, it seems that it use the first synthax and take a lot more time to process

the question here is : Is it required to do prepared queries for eloquent eager loading ? Or is it possible to eager load using the second synthax?

Most helpful comment

I'll submit a PR.

All 33 comments

The two queries doesn't produce the same result, one returns Eloquent models, one doesn't.

You can use something like Blackfire to figure out where the time is spent. Is it really the actual database call, and not anything related like escaping values or hydrating models?

you can replace \DB::table('users') by \App\User and is still results in lower performances,

and it's just the db query which takes time :

capture d ecran 2018-10-11 a 10 30 25
capture d ecran 2018-10-11 a 10 31 08

the only difference between the 2 queries is that one is using binding while the other is not

Are you benching this in one request? Doesn't database do some caching? Try changing the order of execution.

No it's on 2 different reload (commenting the one I don't need), and same kind of result append when I do this :

$user_ids = \App\User::pluck('id');
$test_1  = \App\User::whereIn('id', $user_ids->sort())->get(); // 1.37s



md5-8a4d4a0292e54ff20bc7e3d069e0c102



$test_2 = \App\User::whereRaw("id in ('" . implode("','", $user_ids->toArray()) . "')")->get(); // 111.2ms



md5-4a30afc684d8d0922c78a386421558a3



public function user()
    {
        return $this->belongsTo(User::class, 'id', 'id');
    }



md5-14220c9cef93dfdc7c19ae6db2556ec8



$test_3 = \App\User::with('user')->get();

the "relation' query is exectuted in 1.44s
capture d ecran 2018-10-12 a 11 50 03

Is it something that can be fix with a better configuration of the mySQL server ?

I have a pretty standard valet installation on my local env, but the same thing in happening on the production server.

PS: I have reflect these tests in this file :

https://github.com/Arkhas/laravel-perf-wherein/blob/master/routes/web.php

Maybe the bottleneck is not in the whereIn, but in the beforehand pluck and/or sort?

Same thing when the array is manually passed in the array

$test_4 = \App\User::whereIn('id', ['1', '2', [...], '10000'])->get()

capture d ecran 2018-10-12 a 15 13 03

And even if it would take more time to process, it should not have any influence on the query execution time

What happens if you run that query directly in Sequel Pro or something (with query cache disabled)?

I have noticed with some of my own queries that Debugbar sometimes says a query has taken much much longer than it ever does when run directly. I always assumed there there was some overhead of building the Collection that was being included in the timings. Perhaps I'm wrong?

I have similar performance considerations with whereIn in projects of mine, currently I chunk requests (let's say chunks of 500 ids) and merge the results.

I was assuming it was some kind of saturation on the DB side. Didn't suspect using whereRaw would be faster (if this is confirmed).

@JayBizzle the difference in Sequel Pro is that you are not using a prepared query if you just copy and paste that query in Sequel PRO, it seems that it is not related to building the collection, but just the prepared query (you can try to see if there is any difference using this synthax in your own code :

$test_2 = \App\User::whereRaw("id in ('" . implode("','", $user_ids->toArray()) . "')")->get(); // 111.2ms

@vlakoff , we are using the same workaround for some big queries, as it seems that managing 10000 relation in 2 queries is less efficient that managing 20 queries of 1000

I have exactly the same problem !!
It seems to be in the prepared query, when you have to handle thousands of bindings.

@JayBizzle It seems that debugbar may take a while to load (render in the browser) but its actually pretty accurate:

Tinker

$ids = \App\User::pluck('id');
$m = microtime(true); \App\User::whereIn('id', $ids)->get(); microtime(true)-$m;
>>> 0.89213609695435

And I get same results form the debugbar ±0.05.


Need to re-trace everything here.

I have experienced this issue and came to the same conclusion than @Arkhas: it seems that Eloquent is slowing down queries with large whereIn condition by easily a factor of x10 vs. doing them raw =/

Just to clarify, is the slowdown due to the SQL prepared statement, or to some processing by Laravel? (… or both?)

@vlakoff The slowdown is caused by the prepared statement. You can reproduce the performance gap with a standalone PDO connection.

The implementation is simple, just not very elegant. Of course, we can only use it for integer keys.
An example for HasMany relationships: https://github.com/staudenmeir/framework/commit/3be00a20c11476ecfa6affd42f438619632f40b5

Ineed, I came to the same conclusion : the slowdown is caused by prepared statments.
So, for eager loading we should be able to use rawqueries to improve performances.
There is IMHO no security concerns that justifies using prepared statements when its about eager loading.

@FabienLucini It's only secure when the keys are integers.

But I think the final observation of @FabienLucini is very valuable. My arguments:

  • the majority of use cases uses integers
  • eager loading of relations is void of user input

Seems like a candidate for performance optimization.

I confirm that using the method of @staudenmeir , the performance are dramatically improved :

For this relation :

\App\User::with('user')->get();

Original :

capture d ecran 2018-10-16 a 16 24 43

using :

/**
     * Set the constraints for an eager load of the relation.
     *
     * @param  array  $models
     * @return void
     */
    public function addEagerConstraints(array $models)
    {
        // We'll grab the primary key name of the related models since it could be set to
        // a non-standard name and not "id". We will then construct the constraint for
        // our eagerly loading query so it returns the proper models from execution.
        $key = $this->related->getTable().'.'.$this->ownerKey;

        $whereIn = in_array($this->parent->getKeyType(), ['int', 'integer']) ? 'whereInRaw' : 'whereIn';
        $this->query->$whereIn($key, $this->getEagerModelKeys($models));
}

In the BelongsTo class :

capture d ecran 2018-10-16 a 16 31 44

@staudenmeir said it's not very elegant, but what would be a more elegant way of doing that ?

Does anyone know who to ping to know if this solution is viable, or if not using prepared request is a deal breaker for a possible PR ?

Taylor doesn't normally respond to pings on GitHub (from what I have seen). You can try to ask him on Discord.

@Arkhas did you ever hear anything?

Nope, I will try to ping @themsaid here :p

Just adding a bit more info, this PHP bug is related I think https://bugs.php.net/bug.php?id=53458

The performance impact here is incredible. Given the high (common) use of integers as foreign keys, this would be a pretty big benefit for most codebase. I know the solution isn't marvellous, but I think it would be worth pursuing.

Is it at all feasible to put any of the above fixes into a package in the interim?

@JayBizzle It's possible, but IMO a change like this would be much more useful in the core.

Since most Laravel sites run on MySQL and most sites use eager loading with integer keys (my assumptions), this would improve the performance on a lot of sites. Even if they aren't eager loading thousands of records at once.

The individual query may only gain a few microseconds/milliseconds, but over all sites combined, this adds up.

@staudenmeir Yes, I agree this being in core should be the goal, was just hoping for a quick fix as we have lots of instances in our app where this problem is very apparent.

Perhaps we should create a PR to start getting feedback and discussing ideas. Might get more attention than in here.

I'll submit a PR.

Awesome, I will keep an eye out for it 👍

Surprised @taylorotwell or any of the core dev team haven't commented on this yet. Looking forward to seeing the PR.

26434

It hadn't occurred to me that this is a Mysql only problem until I read the PR. It made me wonder if there's anything we can do at the query compilation time. What if we just stop using prepared statement for int values when compiling the bindings at the MySQL driver?

@deleugpn The only way I see is adjusting Builder::cleanBindings(). But that wouldn't affect all queries and isn't exactly elegant.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

sebastianbergmann picture sebastianbergmann  ·  93Comments

Demers94 picture Demers94  ·  88Comments

rafaelrenanpacheco picture rafaelrenanpacheco  ·  64Comments

mrahmadt picture mrahmadt  ·  61Comments

mianshargeel picture mianshargeel  ·  59Comments