Cms: Better query batching

Created on 29 Dec 2020  Â·  8Comments  Â·  Source: craftcms/cms

It’s possible to create batch queries using yii\db\Query::batch() or each(), however by default MySQL will still fetch all of the data at once. As noted, it’s possible to change this behavior, but with caveats:

  • No other queries can be made by the same DB connection until the full dataset has been retrieved.
  • MyISAM tables will be locked for the duration, so even other DB connections can’t query the table (not a problem for Craft as only the searchindex table is MyISAM by default).

Craft should have an easy way to more reliably execute batched queries, since the vast majority of Craft installs use MySQL.

Couple options:

  1. We could override batch() and each(), and if a $db argument wasn’t passed (and using MySQL), create a new DB connection with PDO::MYSQL_ATTR_USE_BUFFERED_QUERY set to false and pass that to parent::batch() / parent::each()
  2. Add a paginate() method to craft\db\Query which acts as a pagination-based alternative to batch(), which could either use craft\db\Paginator or yii\data\ActiveDataProvider under the hood.

Problem with option 2 is that pagination only works reliably if the query is ordered by a column with only unique values, and if no rows are added or deleted during pagination.

If we go with option 1, the following code could be used to create a new DB connection based on the primary one:

$db = Craft::$app->getComponents()['db'];
if (!is_object($db) || $db instanceof Closure) {
    $db = Craft::createObject($db);
}
$db->open();
$db->pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

Perhaps we could make that connection available via a new unbufferedDb application component, which could be manually configured from config/app.php if needed.

enhancement

Most helpful comment

Discussed this change internally and decided it’s better to have this be an opt-in thing rather than alter the default behavior, as it will have unexpected consequences. For example if you have an active transaction going, the batch query results will no longer reflect prior changes made in the transaction, since that was made over the primary connection.

So I just reverted the changes to craft\db\Query, and created new craft\helpers\Db::batch() and each() methods instead (9067c28f9fb078e688acbca59f3c1b77ece54e9a), which will create an unbuffered DB connection, if using MySQL:

// Old 
foreach ($query->each() as $row) {
    // ...
}

// New
foreach (\craft\helpers\Db::each($query) as $row) {
    // ...
}

See 6b2e6d65f0a79a9c750ccb42c1524460e8509006 for core examples.

This also frees us up to release 3.6.0 without waiting on Yii 2.0.41. We’ll just pull this into whatever the next 3.6.x release is at that point.

All 8 comments

After some relatively in depth research and discussion with @khalwat over the last couple of days I settled on option 1 as a preference.

Additional memory allocation due to the 2nd DB connection should be relatively minor and would only ever come into play if batch() or each() were actually used.

Also with this method the only way I can foresee the false buffering negatively impacting a script is if a nested each() or batch() were to be used, which is pretty much as edge-case as edge-cases come.

I'm not too fussed about the additional unbufferedDb component as I'd always assume the connection would be otherwise identical to the regular DB connection and for Craft to assume that that is the case (famous last words).

+1 for option 1

Another advantage of storing the unbufferedDb component option is it can be reused later on in the same request if there’s another batched query.

Alright, done for the next 3.6 release. Went with option 1. batch() and each() now default to the unbuffered DB connection:

https://github.com/craftcms/cms/blob/9af2acc5dd279003e63106a0306007d5f8d73cf8/src/db/Query.php#L151-L158

https://github.com/craftcms/cms/blob/9af2acc5dd279003e63106a0306007d5f8d73cf8/src/db/Query.php#L164-L171

If using Postgres, the unbufferedDB component will just act as an alias for the db component, so there won’t be any difference.

https://github.com/craftcms/cms/blob/9af2acc5dd279003e63106a0306007d5f8d73cf8/src/config/app.php#L268-L273

This looks great!! Just so I can understand the implications of setting PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false, does this mean that the entire query result will be stored in memory in the MySQL database until the connection is closed? Are there any potential downsides to having to call out to the database server on each batch? My guess is that preventing PHP's memory limit from being exceeded is more important than the potential latency, but just curious as to how this might affect behaviour.

Realized that change will cause MySQL errors if two batched queries are being executed simultaneously. So reverted the commit, and submitted a PR to Yii that will enable us to create a new unbuffered MySQL connection for each batched query, which will only be open for the duration needed by the batching (https://github.com/yiisoft/yii2/pull/18457 + https://github.com/craftcms/cms/pull/7340).

This looks great!! Just so I can understand the implications of setting PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false, does this mean that the entire query result will be stored in memory in the MySQL database until the connection is closed?

Only until the cursor reaches the end (or the connection is closed – whichever comes first). Which is how it already works for Postgres.

Are there any potential downsides to having to call out to the database server on each batch? My guess is that preventing PHP's memory limit from being exceeded is more important than the potential latency, but just curious as to how this behaviour might be affected.

Nah I’d say this will cause the batching to work as developers are already expecting they would work.

https://github.com/yiisoft/yii2/pull/18457 has been approved and added to the 2.0.41 milestone 🎉

Given this is a semi impactful change, I’m thinking we will hold off on releasing 3.6 until that Yii update is out, so we can get this in.

Discussed this change internally and decided it’s better to have this be an opt-in thing rather than alter the default behavior, as it will have unexpected consequences. For example if you have an active transaction going, the batch query results will no longer reflect prior changes made in the transaction, since that was made over the primary connection.

So I just reverted the changes to craft\db\Query, and created new craft\helpers\Db::batch() and each() methods instead (9067c28f9fb078e688acbca59f3c1b77ece54e9a), which will create an unbuffered DB connection, if using MySQL:

// Old 
foreach ($query->each() as $row) {
    // ...
}

// New
foreach (\craft\helpers\Db::each($query) as $row) {
    // ...
}

See 6b2e6d65f0a79a9c750ccb42c1524460e8509006 for core examples.

This also frees us up to release 3.6.0 without waiting on Yii 2.0.41. We’ll just pull this into whatever the next 3.6.x release is at that point.

Yii 2.0.41 was released yesterday, but we decided to hold off on this change until 3.7 (now that that’s going to be a thing). So this is resolved it the 3.7 branch now 🎉

Was this page helpful?
0 / 5 - 0 ratings