Currently the service classes often return arrays from getMany-type calls, which is convenient but requires all results to be loaded in-memory at once. This can be too resource-intensive, e.g. for long lists of submissions.
Instead, introduce a pattern for using an Iterator implementation. This will allow for foreach-based traversal with one-at-a-time instantiation.
PRs:
@NateWr, mind taking a look? Hopefully this substantiates what I was raving about in Slack.
Tweaked your suggestions -- just missing your blessing for the greater approach :)
Ok, I've slept on it and here are my thoughts.
Iterator. Or a Generator if you think it will be more clear about the rewind limitation. I would say lets stick with what seems to be the most conventional approach in modern PHP.EntityReadInterface from getMany to getIterator or getGenerator. I think that will make it clearer that we're not getting a collection of objects.iterator_to_array()? If not, let's write a method like Generator::toArray() which we can use when needed.Does that sound alright? Thanks for talking me through all of this.
Yes, let's go with an Iterator. Or a Generator if you think it will be more clear about the rewind limitation. I would say lets stick with what seems to be the most conventional approach in modern PHP.
I think I'll stick with Iterator. The Generator interface extends Iterator anyway, so clearly the PHP dev team -- for what it's worth -- thinks non-rewindable Iterators are OK.
Let's change the name in the EntityReadInterface from getMany to getIterator or getGenerator. I think that will make it clearer that we're not getting a collection of objects.
I'm not so sure about this one; if we standardize to always returning iterators, and since iterators work with e.g. the foreach primitive, then I'm not sure it's worth disruptively removing getMany in favour of a new method name. It's true that we'd be changing the return value to a more restrictive type, but e.g. attempting to re-iterate [so to speak] will throw an Exception, so I think it's a lesser evil.
If we use a generator instead of an iterator, can we still use iterator_to_array()? If not, let's write a method like Generator::toArray() which we can use when needed.
Moot, since I'm OK with using an Iterator. But yes, Generators should still support iterator_to_array.
Once this is done, let's check each Hook...
Yes, can do.
Ok, I give in. Let's stick with getMany(). :smile:
Thanks, @NateWr! All merged.
@ajnyga, this issue in a nutshell (for the sake of PPS): Going forward, the Service class getMany functions should return an Iterator (specifically a DAOResultIterator) instead of an Array. You can do this by returning DAOResultFactory::toIterator() from your getMany implementation (in place of DAOResultFactory::toArray()).
The benefit is lower memory usage, since toArray involved instantiating all objects at once in an array. This approach only instantiates one at a time.
The downside is that you can only foreach through each return value once. (This is almost never required in the code, though.) To work around it, you can use:
$resultsAsArray = iterator_to_array($resultsIterator);
Attempting to foreach through a DAOResultIterator a second time will result in an exception.
@asmecher Do we need an OMP PR as well?
Also, sorry for the late code follow-up. I notice that the return statements all say Iterator:
@return Iterator
But should this say DAOResultIterator to be more precise? (My code editor will do code hinting/completion based on the return object type from a method.)
Do we need an OMP PR as well?
No -- OMP is fine as is. (Tests pass and some initial testing goes OK, though I wouldn't be surprised if we turned up one or two regressions during testing.)
But should this say
DAOResultIteratorto be more precise?
I changed Iterator to \Iterator; not sure if that helps already? DAOResultIterator doesn't expose any functionality beyond what Iterator does, so I would expect them to look the same in the hinting/completion tool.
Ok, good to know. I though DAOResultIterator might help us document that it doesn't rewind like Iterators do, but code completion is the main thing.
I found a few spots where !empty($result) is called on the DAOResuIterator or array_map was used without converting it to an array. I also renamed the variables to $result rather than, eg, $submissions, to help clarify the usage.
@asmecher can you take a look and let me know if this is alright?
PRs:
https://github.com/pkp/pkp-lib/pull/5206
https://github.com/pkp/ojs/pull/2512
https://github.com/pkp/omp/pull/717
@NateWr, good catches, thanks. What do you think about using the Countable interface and count($xyz) != 0 to replace the broken empty checks?
That can work too, and probably reads clearer than $result->valid(). We should be able to do just if (count($xyz)) right?
Yes, that'll do it. In the Countable implementation, you can use $this->_resultFactory->getCount().
All merged -- thanks, @NateWr! Personally I prefer e.g. $submissions over $results (more descriptive) but no big deal either way.
Personally I prefer e.g. $submissions over $results (more descriptive) but no big deal either way.
Yes, I'm torn on this as it seems easier to read with $submissions. But I also ran into exactly that confusion during the pair programming session because I made the assumption that I could call empty($submissions).
I'd be happy to also call it $submissionsIterator or something too, but I think, for myself at least, calling it $submissions will be a recurrent source of bugs from lack of attention.
Agreed, with a slight preference for $submissionsIterator (though it's long). I fear this stems from one of those legitimate criticisms of PHP.
However, if we're running a complex SELECT in order to just COUNT the results (or check if there were any), that's a wasteful pattern -- maybe there's some underlying work there.
However, if we're running a complex SELECT in order to just COUNT the results (or check if there were any), that's a wasteful pattern -- maybe there's some underlying work there.
Usually we're not just counting the results, but checking if there were 0 results before doing something else. (For example, when getting the list of issues, we check if there were any future issues before adding the "---Future Issues----" option to the select field.)
Ok, updated to $submissionsIterator.
PRs:
https://github.com/pkp/pkp-lib/pull/5221
https://github.com/pkp/ojs/pull/2516
https://github.com/pkp/omp/pull/719
Yup, thinking about this further and seeing the change rolled out through the applications, this was definitely a worthwhile clarification. Thanks, @NateWr.