Framework: Using Cursor on large number of results causing memory issues

Created on 21 Aug 2016  ·  9 Comments  ·  Source: laravel/framework

When using cursors to loop through database results, PHP runs out of memory on very large result sets. This is due to the PDO option MYSQL_ATTR_USE_BUFFERED_QUERY defaulting to true.

This is a MySQL-specific attribute available within PDO: http://php.net/manual/en/ref.pdo-mysql.php#pdo.constants.mysql-attr-use-buffered-query.

To prevent this I am currently setting this attribute manually via the code below, but it would be good to have this as the default.

DB::connection()->getPdo()->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
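For context, a minimal sketch of how this workaround is used before streaming a large table (the table name is illustrative, not from the issue):

````php
use Illuminate\Support\Facades\DB;

// Turn off client-side result buffering on the current connection, then
// stream rows lazily with cursor(); 'big_table' is a made-up name.
DB::connection()->getPdo()->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

foreach (DB::table('big_table')->cursor() as $row) {
    // Rows arrive from the server one at a time, so peak memory stays
    // roughly constant instead of growing with the result set.
}
````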

Thanks


All 9 comments

I don't think this can be considered a framework bug; querying over large data sets might result in memory problems, and dealing with this differs from one situation to another. It's also highly related to the database driver being used.

I'm going to close this issue for now, but please feel free to ping me if you ever find a specific bug in the framework that causes such a memory issue.

Thank you :)

@jamesryanbell Where do you see that the default is true? Is Laravel making it true by default? What did you end up doing? Is there any downside to making the default false?

@jamesryanbell setting that attribute to false fixed my issue as well. Thanks!

@jamesryanbell setting this fixed my issue as well.

@themsaid Any time I've tried to use the cursor (only a few hundred results, 415 to be exact), I always get a 502 from nginx. Using chunking, it takes too long and I get a 504 timeout. Although I cannot prove that this is indeed a framework bug, it appears as though something holds up the cursor between iterations when MYSQL_ATTR_USE_BUFFERED_QUERY is true (probably the buffer).

Either way, I do think it would be good to note in the documentation why this happens, because the data sets I am querying are not very large. Without setting the MySQL attribute, cursor is useless for me in all applications.

The documentation is slightly misleading: the chunking section says "If you need to process thousands of Eloquent records,..." while the cursor section states "When processing large amounts of data, the cursor method may be used to greatly reduce your memory usage". Are there any metrics for this so we can reliably set the attribute? I can interpret "thousands of records" as a large count of rows, but what designates a large data set?
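As a rough illustration of the difference being discussed (assuming a stock User model; the chunk size is arbitrary):

````php
// chunk(): repeated LIMIT/OFFSET queries; memory holds one chunk of
// hydrated models (here 500) at a time, at the cost of extra round trips.
User::chunk(500, function ($users) {
    foreach ($users as $user) {
        // process one member of the current chunk
    }
});

// cursor(): a single query whose rows are hydrated lazily via a generator;
// combined with an unbuffered connection, memory holds roughly one model
// at a time.
foreach (User::cursor() as $user) {
    // process one model
}
````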

I've attached my code below for when I had to replace chunking with the cursor. On the first try without the mysql attribute, this endpoint would send back a 502 and there was quite a bit of lag in between cursor iterations. Upon changing the attribute as specified, the iterations went down to microseconds.

It would be good to know as we are utilizing laravel to build micro-services that need to scale reliably and some type of information would be better than none.

````php
function getReport(Request $request, $idFkey)
{
    // Had to add this for the cursor
    DB::connection()->getPdo()->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

    $entries = [];

    // Get all order forms for a season
    $models = MyModel::select('id')
        ->where('idFkey', $idFkey)
        ->where('idFkey2', $request->get('Fkey2'))
        ->get();

    foreach ($models as $model) {
        // Get all orders for those order forms
        foreach (MyModelItems::select('items', 'idFkey')
            ->where('relatedItems', 'LIKE', '%"idItem":'.$model->id.'%')
            ->where('status', '!=', 'transferred')
            ->cursor() as $modelItem) {
            // Get all discount program entries for those orders
            $items = json_decode($modelItem->items);

            foreach ($items as $item) {
                if (property_exists($item, 'programs')) {
                    foreach ($item->programs as $program) {
                        $entries[] = [
                            'idModel'     => $item->idModel,
                            'idProgram'   => $program->id,
                            'idCustomer'  => $modelItem->idCustomer,
                            'programName' => $program->name,
                            'discount'    => $program->discount,
                            'quantity'    => $item->quantity,
                            'price'       => $item->price
                        ];
                    }
                } elseif (property_exists($item, 'idProgram') && !empty($item->idProgram)) {
                    $program = DiscountProgram::find($item->idProgram);

                    $entries[] = [
                        'idSolution'  => $item->idSolution,
                        'idDiscount'  => $program->id,
                        'idCustomer'  => $modelItem->idCustomer,
                        'programName' => $program->name,
                        'discount'    => $program->discount,
                        'quantity'    => $item->quantity,
                        'price'       => $item->price
                    ];
                }
            }
        }
    }

    return response(['data' => $entries]);
}
````

Thanks @jamesryanbell, using that option does indeed do the trick; otherwise PHP runs out of memory. Considering this a "bug" would imply the behaviour was not intended; since Laravel simply uses PDO's default setting, it's probably not a bug but rather a suboptimal default. As cursors exist to scroll through large result sets, the framework's default should do as much as possible to enable that behaviour, which in this case it doesn't. @themsaid

@themsaid I believe this issue should be re-opened because ->cursor() is broken with large data sets at the moment, which in a way goes completely against why you would even use it in the first place.

If you can load all results into memory with buffering enabled, then you don't really need cursor at all. The only useful case for cursor at the moment is consuming results from the database sooner (due to the use of generators).

Laravel should fix this by using a new separate connection with buffering turned off, otherwise ->cursor() will always give memory problems, even on relatively small datasets. See here for more info: http://php.net/manual/en/mysqlinfo.concepts.buffering.php

From the manual: "Following these characteristics buffered queries should be used in cases where you expect only a limited result set or need to know the amount of returned rows before reading all rows. Unbuffered mode should be used when you expect larger results."
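To make that trade-off concrete, here is a minimal sketch in plain PDO (hostname, credentials, and table name are placeholders; a reachable MySQL server is assumed):

````php
<?php
// Placeholder DSN, credentials, and table; requires a live MySQL server.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'secret');

// Buffered mode (the PDO default): the driver transfers the ENTIRE result
// set into PHP memory before the first fetch() returns, so peak memory
// grows with the number of rows.
$stmt = $pdo->query('SELECT * FROM big_table');

// Unbuffered mode: rows stay on the server and are streamed on demand,
// so peak memory is roughly one row regardless of result-set size.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$stmt = $pdo->query('SELECT * FROM big_table');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Process $row. The statement must be fully read (or closeCursor()
    // called) before this connection can execute another query.
}
````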

Laravel should fix this by using a new separate connection with buffering turned off

But this is very MySQL-specific, and real cursors in other DBs work differently (e.g. in Postgres they require a transaction), so it's not that easy.

@mfn Ah, true. Would you happen to know how, in Laravel, you can create a new connection instance and use setAttribute to turn buffering off (just for MySQL), and then use that new connection with the query builder?

I've been digging through the core trying to work out how to do it, but it doesn't appear possible.

For anyone looking at this issue, I've come up with some steps to solve this:

Fix for ->cursor() when using MySQL and hitting out-of-memory errors

All you need to do is create a dedicated MySQL connection configuration that turns buffering off:

config/database.php

'mysql-unbuffered' => [
    'driver'    => 'mysql',
    'host'      => env('DB_HOST'),
    'database'  => env('DB_DATABASE'),
    'username'  => env('DB_USERNAME'),
    'password'  => env('DB_PASSWORD'),
    'charset'   => 'utf8mb4',
    'collation' => 'utf8mb4_unicode_ci',
    'prefix'    => '',
    'strict'    => false,
    'options'   => [
        PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false
    ],
],

Then in your code whenever you want to use ->cursor() over large data sets:

foreach (DB::connection('mysql-unbuffered')->table('users')->cursor() as $user) {
    // ....
}

You can also change the connection when using Eloquent models, using ->on():

foreach (User::on('mysql-unbuffered')->cursor() as $user) {
    // ....
}

The reason you need a separate connection is that in MySQL you cannot mix buffered and unbuffered queries on the same connection, so you may get an error:

SQLSTATE[HY000]: General error: 2014 Cannot execute queries while other unbuffered queries are active.

So it's important to use a separate connection when using unbuffered queries.
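A sketch of how that error comes about, and the safe pattern (table names are illustrative):

````php
foreach (DB::connection('mysql-unbuffered')->table('users')->cursor() as $user) {
    // Running ANOTHER query on the SAME unbuffered connection here would
    // fail with "General error: 2014 Cannot execute queries while other
    // unbuffered queries are active", because the connection is still
    // streaming the cursor's rows.

    // Safe: issue inner queries on the default (buffered) connection.
    DB::table('audit_log')->insert(['user_id' => $user->id]);
}
````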

Hopefully that helps.
