Orm: Suggestion: EntityRepository::findChunksBy()

Created on 6 Mar 2018  Â·  11Comments  Â·  Source: doctrine/orm

I understand that we have AbstractQuery::iterate(), however it is retrieving rows one by one, my suggestion is to have a chunk size, (i.e. 10000 by default) and $iterator->next() will trigger to load next 10000 rows from result.

Thanks.

New Feature Won't Fix

Most helpful comment

I don't really see any benefit in such thing, what is your use case? Optimization?

You can simply implement your own user-land chunk iterator like this:

<?php

declare(strict_types=1);

use Doctrine\ORM\EntityManagerInterface;

$data = new ArrayIterator(range(1, 100));

class ChunkIterator implements IteratorAggregate
{
    /** @var Iterator */
    private $inner;

    /** @var int */
    private $size;

    /** EntityManagerInterface */
    private $entityManager;

    public function __construct(Iterator $inner, int $size, EntityManagerInterface $entityManager)
    {
        $this->inner = $inner;
        $this->size = $size;
        $this->entityManager = $entityManager;
    }

    public function getIterator() : iterable
    {
        $counter = 0;
        $buffer = [];

        while ($this->inner->valid()) {
            $buffer[$this->inner->key()] = $this->inner->current();

            if (++$counter % $this->size === 0) {
                yield $buffer;
                $buffer = [];
                $this->entityManager->clear();
            }

            $this->inner->next();
        }

       if (count($buffer) === 0) {
           return;
       }

        yield $buffer;
        $this->entityManager->clear();
    }
}

foreach (new ChunkIterator($data, 5, $entityManager) as $chunk) {
    var_dump($chunk);
}

All 11 comments

I don't really see any benefit in such thing, what is your use case? Optimization?

You can simply implement your own user-land chunk iterator like this:

<?php

declare(strict_types=1);

use Doctrine\ORM\EntityManagerInterface;

$data = new ArrayIterator(range(1, 100));

class ChunkIterator implements IteratorAggregate
{
    /** @var Iterator */
    private $inner;

    /** @var int */
    private $size;

    /** EntityManagerInterface */
    private $entityManager;

    public function __construct(Iterator $inner, int $size, EntityManagerInterface $entityManager)
    {
        $this->inner = $inner;
        $this->size = $size;
        $this->entityManager = $entityManager;
    }

    public function getIterator() : iterable
    {
        $counter = 0;
        $buffer = [];

        while ($this->inner->valid()) {
            $buffer[$this->inner->key()] = $this->inner->current();

            if (++$counter % $this->size === 0) {
                yield $buffer;
                $buffer = [];
                $this->entityManager->clear();
            }

            $this->inner->next();
        }

       if (count($buffer) === 0) {
           return;
       }

        yield $buffer;
        $this->entityManager->clear();
    }
}

foreach (new ChunkIterator($data, 5, $entityManager) as $chunk) {
    var_dump($chunk);
}

@Majkl578 I already implemented it in my use case :) wanted to discuss here, if it's smth usable and I can, write test cases and create PR for it.

I think we discussed this on twitter DM first: I also don't see a real
use-case for this, but you should look at
https://github.com/Ocramius/DoctrineBatchUtils

On 6 Mar 2018 19:08, "Vahe Shadunts" notifications@github.com wrote:

@Majkl578 https://github.com/majkl578 I already implemented it in my
use case :) wanted to discuss here, if it's smth usable and I can, write
test cases and create PR for it.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/doctrine/doctrine2/issues/7118#issuecomment-370873983,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAJakKg_IdmOjmx5fInrzzVCPa4QYDW1ks5tbtCWgaJpZM4SeoSv
.

My use case

loop over 5M+ rows.

what I want to achieve:

An BufferedIterableResult that triggers a SELECT only every 1000 rows (ie).

@Ocramius is it possible with your batch utils ?

Closing here: this shouldn't go to ORM core.

@quazardous yes, but looping over 5M+ rows will take a while with the ORM: are you sure it is a task for the ORM at all?

@ocramius good question : what could be the overhead of orm ? Object hydration: not much ? I'm aware looping over 5M rows is slow but having chunk of 1000 rows will reduce Select queries from 5M to 5K. At this point readability and code sanity is much more important for me 😊

Object hydration: not much ?

A lot.

If you are fine with a task running for a few hours, no big deal, but the ORM really isn't the best fit here.

Even with orm cache on db schema etc ? And single table select ?
So ORM maps select results against entity and keeps track of modified fields. Something else ?

@quazardous the most efficient way to deal with huge amount of data is to not transmit it over the network in first place: an optimised SQL query wins over any kind of PHP abstraction here.

I'm not saying "don't try", I'm saying that the performance you will experience will be dreadful :-)

i'm using @Majkl578 snippet from https://github.com/doctrine/doctrine2/issues/7118#issuecomment-370816775

does the thing. pretty happy with it. previously loaded whole table to memory (2GiB memory usage) now just 1024 chunk having at (64MiB). also the $em->flush(null); is faster this way.

EDIT: to clarify, i'm using that ChunkIterator to gather ids and do separate query in the loop

    /**
     * @param array $ids
     * @param int $chunkSize
     * @return \Generator|Entity\Article[]
     */
    public function getIterator(array $ids, $chunkSize = 1024)
    {
        $qb = $this->createQueryBuilder('a');
        $qb->orderBy('a.id', 'DESC');

        foreach (new ChunkIterator(new ArrayIterator($ids), $chunkSize, $this->_em) as $chunk) {
            $result = (clone $qb)
                ->where('a.id in (:ids)')
                ->setParameter('ids', $chunk)
                ->getQuery()
                ->getResult();

            foreach ($result as $row) {
                yield $row;
            }
        }
    }

@glensc ChunkIterator is precisely what ocramius/doctrine-batch-utils does, btw :-)

Was this page helpful?
0 / 5 - 0 ratings