Doing a whereIn with prepared query take much more time than doing a raw query
Create 10000 users
do :
$user_ids = \App\User::pluck('id');
\App\User::whereIn('id', $user_ids->sort())->get();
this request take 908ms to execute ,
But when doing :
\DB::table('users')->select('*')->whereRaw("id in ('1', '2', '3', '4', [...],'10000')")->get();
This on is taking only 68ms to execute.
I have made a demo project here : https://github.com/Arkhas/laravel-perf-wherein
(just run php artisan migrate:refresh --seed
and visit the homepage (PHP Debug bar is enable))
it seems that it's an issue with prepared queries of php PDO :
https://laracasts.com/discuss/channels/general-discussion/eloquent-is-so-slow?page=2
https://stackoverflow.com/questions/4350718/why-are-certain-types-of-prepared-queries-using-pdo-in-php-with-mysql-slow
when doing eager loading with eloquent, it seems that it use the first synthax and take a lot more time to process
the question here is : Is it required to do prepared queries for eloquent eager loading ? Or is it possible to eager load using the second synthax?
The two queries doesn't produce the same result, one returns Eloquent models, one doesn't.
You can use something like Blackfire to figure out where the time is spent. Is it really the actual database call, and not anything related like escaping values or hydrating models?
you can replace \DB::table('users')
by \App\User
and is still results in lower performances,
and it's just the db query which takes time :
the only difference between the 2 queries is that one is using binding while the other is not
Are you benching this in one request? Doesn't database do some caching? Try changing the order of execution.
No it's on 2 different reload (commenting the one I don't need), and same kind of result append when I do this :
$user_ids = \App\User::pluck('id');
$test_1 = \App\User::whereIn('id', $user_ids->sort())->get(); // 1.37s
md5-8a4d4a0292e54ff20bc7e3d069e0c102
$test_2 = \App\User::whereRaw("id in ('" . implode("','", $user_ids->toArray()) . "')")->get(); // 111.2ms
md5-4a30afc684d8d0922c78a386421558a3
public function user()
{
return $this->belongsTo(User::class, 'id', 'id');
}
md5-14220c9cef93dfdc7c19ae6db2556ec8
$test_3 = \App\User::with('user')->get();
the "relation' query is exectuted in 1.44s
Is it something that can be fix with a better configuration of the mySQL server ?
I have a pretty standard valet installation on my local env, but the same thing in happening on the production server.
PS: I have reflect these tests in this file :
https://github.com/Arkhas/laravel-perf-wherein/blob/master/routes/web.php
Maybe the bottleneck is not in the whereIn
, but in the beforehand pluck
and/or sort
?
Same thing when the array is manually passed in the array
$test_4 = \App\User::whereIn('id', ['1', '2', [...], '10000'])->get()
And even if it would take more time to process, it should not have any influence on the query execution time
What happens if you run that query directly in Sequel Pro or something (with query cache disabled)?
I have noticed with some of my own queries that Debugbar sometimes says a query has taken much much longer than it ever does when run directly. I always assumed there there was some overhead of building the Collection that was being included in the timings. Perhaps I'm wrong?
I have similar performance considerations with whereIn
in projects of mine, currently I chunk requests (let's say chunks of 500 ids) and merge the results.
I was assuming it was some kind of saturation on the DB side. Didn't suspect using whereRaw
would be faster (if this is confirmed).
@JayBizzle the difference in Sequel Pro is that you are not using a prepared query if you just copy and paste that query in Sequel PRO, it seems that it is not related to building the collection, but just the prepared query (you can try to see if there is any difference using this synthax in your own code :
$test_2 = \App\User::whereRaw("id in ('" . implode("','", $user_ids->toArray()) . "')")->get(); // 111.2ms
@vlakoff , we are using the same workaround for some big queries, as it seems that managing 10000 relation in 2 queries is less efficient that managing 20 queries of 1000
I have exactly the same problem !!
It seems to be in the prepared query, when you have to handle thousands of bindings.
@JayBizzle It seems that debugbar may take a while to load (render in the browser) but its actually pretty accurate:
Tinker
$ids = \App\User::pluck('id');
$m = microtime(true); \App\User::whereIn('id', $ids)->get(); microtime(true)-$m;
>>> 0.89213609695435
And I get same results form the debugbar ±0.05.
Need to re-trace everything here.
I have experienced this issue and came to the same conclusion than @Arkhas: it seems that Eloquent is slowing down queries with large whereIn condition by easily a factor of x10 vs. doing them raw =/
Just to clarify, is the slowdown due to the SQL prepared statement, or to some processing by Laravel? (… or both?)
@vlakoff The slowdown is caused by the prepared statement. You can reproduce the performance gap with a standalone PDO
connection.
The implementation is simple, just not very elegant. Of course, we can only use it for integer keys.
An example for HasMany
relationships: https://github.com/staudenmeir/framework/commit/3be00a20c11476ecfa6affd42f438619632f40b5
Ineed, I came to the same conclusion : the slowdown is caused by prepared statments.
So, for eager loading we should be able to use rawqueries to improve performances.
There is IMHO no security concerns that justifies using prepared statements when its about eager loading.
@FabienLucini It's only secure when the keys are integers.
But I think the final observation of @FabienLucini is very valuable. My arguments:
Seems like a candidate for performance optimization.
I confirm that using the method of @staudenmeir , the performance are dramatically improved :
For this relation :
\App\User::with('user')->get();
Original :
using :
/**
* Set the constraints for an eager load of the relation.
*
* @param array $models
* @return void
*/
public function addEagerConstraints(array $models)
{
// We'll grab the primary key name of the related models since it could be set to
// a non-standard name and not "id". We will then construct the constraint for
// our eagerly loading query so it returns the proper models from execution.
$key = $this->related->getTable().'.'.$this->ownerKey;
$whereIn = in_array($this->parent->getKeyType(), ['int', 'integer']) ? 'whereInRaw' : 'whereIn';
$this->query->$whereIn($key, $this->getEagerModelKeys($models));
}
In the BelongsTo
class :
@staudenmeir said it's not very elegant, but what would be a more elegant way of doing that ?
Does anyone know who to ping to know if this solution is viable, or if not using prepared request is a deal breaker for a possible PR ?
Taylor doesn't normally respond to pings on GitHub (from what I have seen). You can try to ask him on Discord.
@Arkhas did you ever hear anything?
Nope, I will try to ping @themsaid here :p
Just adding a bit more info, this PHP bug is related I think https://bugs.php.net/bug.php?id=53458
The performance impact here is incredible. Given the high (common) use of integers as foreign keys, this would be a pretty big benefit for most codebase. I know the solution isn't marvellous, but I think it would be worth pursuing.
Is it at all feasible to put any of the above fixes into a package in the interim?
@JayBizzle It's possible, but IMO a change like this would be much more useful in the core.
Since most Laravel sites run on MySQL and most sites use eager loading with integer keys (my assumptions), this would improve the performance on a lot of sites. Even if they aren't eager loading thousands of records at once.
The individual query may only gain a few microseconds/milliseconds, but over all sites combined, this adds up.
@staudenmeir Yes, I agree this being in core should be the goal, was just hoping for a quick fix as we have lots of instances in our app where this problem is very apparent.
Perhaps we should create a PR to start getting feedback and discussing ideas. Might get more attention than in here.
I'll submit a PR.
Awesome, I will keep an eye out for it 👍
Surprised @taylorotwell or any of the core dev team haven't commented on this yet. Looking forward to seeing the PR.
It hadn't occurred to me that this is a Mysql only problem until I read the PR. It made me wonder if there's anything we can do at the query compilation time. What if we just stop using prepared statement for int values when compiling the bindings at the MySQL driver?
@deleugpn The only way I see is adjusting Builder::cleanBindings()
. But that wouldn't affect all queries and isn't exactly elegant.
Most helpful comment
I'll submit a PR.