Orm: Batch Processing ,out of memory

Created on 25 Jun 2017  Â·  23Comments  Â·  Source: doctrine/orm

http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/batch-processing.html#iterating-large-results-for-data-processing

$q = $this->_em->createQuery('select u from MyProject\Model\User u');
$iterableResult = $q->iterate();
foreach ($iterableResult as $row) {
    //nothing
    $this->_em->detach($row[0]);
}
PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to all
ocate 1425408 bytes) in C:\mmood\vendor\doctrine\dbal\lib\Doctrine\DBAL\Driver\P
DOConnection.php on line 104


  [Symfony\Component\Debug\Exception\OutOfMemoryException]
  Error: Allowed memory size of 134217728 bytes exhausted (tried to allocate
  1425408 bytes)

Total: 4000,000 rows
I tried the code and found that was out of memory

Bug Missing Tests

Most helpful comment

As a workaround, I ended up iterating over 10K rows at a time:

$lastId = 0;

while (true) {
    $iterable = $this->entityManager
        ->getRepository(UploadView::class)
        ->createQueryBuilder('v')
        ->where('v.id > :id')
        ->orderBy('v.id', 'asc')
        ->setMaxResults(10000)
        ->getQuery()
        ->iterate([
            'id' => $lastId,
        ]);

    $iterated = false;
    foreach ($iterable as $row) {
        $iterated = true;
        /** @var UploadView $view */
        $view = $row[0];
        $lastId = $view->getId();
        // do actual processing
        $this->entityManager->detach($view);
    }

    $this->entityManager->clear();

    if (!$iterated) {
        break;
    }
}

All 23 comments

@imsheng this needs more details, such as:

  • which associations may be involved
  • whether this is reproducible on master (see https://github.com/doctrine/doctrine2/pull/1515)
  • a stack trace

in symfony console.
doctrine/orm version 2.5
my code is just that:

    protected function execute(InputInterface $input, OutputInterface $output)
    {
        set_time_limit(0);
        $this->em = $this->getContainer()->get('doctrine.orm.default_entity_manager');
        $xxxxDql = $this->em->createQuery('select w from AppBundle:xxxx w');
        $xxxxResult = $xxxxDql ->iterate();
        foreach ($xxxxResult as $row) {
            // do stuff with the data in the row, $row[0] is always the object

            // detach from Doctrine, so that it can be Garbage-Collected immediately
            $this->_em->detach($row[0]);
        }
    }
PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to all
ocate 1425408 bytes) in C:\mmood\vendor\doctrine\dbal\lib\Doctrine\DBAL\Driver\P
DOConnection.php on line 104

nothing else

Run it with -vvv

C:\mmood>php bin/console xxxxxxx -e=prod -vvv
PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to all
ocate 1429504 bytes) in C:\mmood\vendor\doctrine\dbal\lib\Doctrine\DBAL\Driver\P
DOConnection.php on line 104


  [Symfony\Component\Debug\Exception\OutOfMemoryException]
  Error: Allowed memory size of 134217728 bytes exhausted (tried to allocate
  1429504 bytes)


Exception trace:
 () at C:\mmood\vendor\doctrine\dbal\lib\Doctrine\DBAL\Driver\PDOConnection.php:
104

That still shows no trace, but the fact that it is crashing within the connection may indicate that the resultset is too big.

what should I do? I need to deal with tens of millions of data

Sadly, PDO does not have cursor-alike operations: mysqli does, for example.

Just a hint though: if you are off doing million records operations, then the ORM is likely the wrong tool for the job...

I tried this method, but still out of memory:

$count = $this->em->getRepository('AppBundle:xxxxxx')->createQueryBuilder('xxxxxxx')
            ->select('count(xxxxxxx) c')
            ->getQuery()
            ->getSingleScalarResult();
//count:4000,000
        for ($i = 0; $i <= $count - 1; $i++) {
            $xxx= $this->em->getRepository('AppBundle:xxx')->createQueryBuilder('xx')
                ->setFirstResult($i)
                ->setMaxResults(1)
                ->getQuery()
                ->getSingleResult();
           //nothing
        }

Are you clearing in between results? Also, that method is unsafe if data is
added or deleted during the batch operation

On 25 Jun 2017 12:55 PM, "imsheng" notifications@github.com wrote:

I tried this method, but still out of memory:

$count = $this->em->getRepository('AppBundle:xxxxxx')->createQueryBuilder('xxxxxxx')
->select('count(xxxxxxx) c')
->getQuery()
->getSingleScalarResult();
//count:4000,000
for ($i = 0; $i <= $count - 1; $i++) {
$xxx=
$this->em->getRepository('AppBundle:xxx')->createQueryBuilder('xx')
->setFirstResult($i)
->setMaxResults(1)
->getQuery()
->getSingleResult();
//nothing
}

—
You are receiving this because you commented.

Reply to this email directly, view it on GitHub
https://github.com/doctrine/doctrine2/issues/6525#issuecomment-310895945,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAJakJzJ6PgYCote9yrQupLz2vOAdyrkks5sHjy6gaJpZM4OEjCI
.

without update or insert.
The same is true with add this code.

$this->em->detach();
$this->em->clear();
unset();

what I do is a crawler
the php needs to run for a long time.
Thanks for your help :)

Then there's a memleak somewhere else - this needs debugging either in your codebase, or be reproduced in an isolated test case within this library: I can't help further without any of those.

Setting the variable to NULL is recommended in place of using unset. It tends to work better with GC (refCount issues). You should also look at if you have any relations set to EAGER. If you do, it would probably be better to use LAZY or EXTRA_LAZY to keep from having too many objects loaded into memory.

It tends to work better with GC (refCount issues).

It doesn't: refcount works in the exact same way. If it doesn't, report a php-src bug.

Running into this problem, too. I can't even get to iteration itself. It fails with OOM right here:

$this->entityManager
    ->getRepository(UploadView::class)
    ->createQueryBuilder('v')
    ->orderBy('v.id', 'asc')
    ->getQuery()
    ->iterate();

Need to process 12M rows.

I expected it to work the same way DBAL does because I don't have a problem iterating over a huge result set with DBAL. This time I need model objects tho.

As a workaround, I ended up iterating over 10K rows at a time:

$lastId = 0;

while (true) {
    $iterable = $this->entityManager
        ->getRepository(UploadView::class)
        ->createQueryBuilder('v')
        ->where('v.id > :id')
        ->orderBy('v.id', 'asc')
        ->setMaxResults(10000)
        ->getQuery()
        ->iterate([
            'id' => $lastId,
        ]);

    $iterated = false;
    foreach ($iterable as $row) {
        $iterated = true;
        /** @var UploadView $view */
        $view = $row[0];
        $lastId = $view->getId();
        // do actual processing
        $this->entityManager->detach($view);
    }

    $this->entityManager->clear();

    if (!$iterated) {
        break;
    }
}

Ah yes, fairly sure that the ORM doesn't lazily iterate over DBAL resultsets: that is a big issue that needs work.

Ah yes, fairly sure that the ORM doesn't lazily iterate over DBAL resultsets: that is a big issue that needs work.

@Ocramius it would possibly be done for ORM v3.x, right?

Yeah, not 2.x

On 29 Oct 2017 19:44, "Luís Cobucci" notifications@github.com wrote:

Ah yes, fairly sure that the ORM doesn't lazily iterate over DBAL
resultsets: that is a big issue that needs work.

@Ocramius https://github.com/ocramius it would possibly be done for ORM
v3.x, right?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/doctrine/doctrine2/issues/6525#issuecomment-340284458,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAJakADmwebqnVp7XH0c5w0C1JbRQiyQks5sxMd3gaJpZM4OEjCI
.

$em->getConnection()->getConfiguration()->setSQLLogger(null);

worked for me

Try this, works for me

$conn = $emd->getConnection();
$conn->getWrappedConnection()
->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$conn->getConfiguration()->setSQLLogger(null);

If you wish persist an entity, then you have this error
SQLSTATE[HY000]: General error: 2014 Cannot execute queries while other unbuffered queries are active. Consider using PDOStatement::fetchAll(). Alternatively, if your code is only ever going to run against mysql, you may enable query buffering by setting the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute.

To avoid this, create another Entity Manager with PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to true (by default) and use merge.

$emd2->merge($entity);

$em->getConnection()->getConfiguration()->setSQLLogger(null);

worked for me

It works perfectly

Was this page helpful?
0 / 5 - 0 ratings