Crystal: Array iteration methods

Created on 28 Oct 2016  路  7Comments  路  Source: crystal-lang/crystal

The Array API defines pairs of iteration methods such as repeated_combinations and each_repeated_combination. The former, and shorter of the two, returns an Array, while the later returns an Iterator and can take a block. This is not optimal. And really, there shouldn't be a need for both.

All that is needed is a method in Iterator to "manifest" all the results. In Ruby that method is #to_a. That doesn't quite fit for Crystal. So maybe #all?

array.repeated_combinations => Iterator
array.repeated_combinations.all => Array

I get the motivation for using each_, it makes it more obvious which methods return an iterator and can take a block. But using the iterator is so much better in most cases that having the longer method name for it is not good b/c it acts as a deterrent.

Aside, why not?

array.combinations(repeated: true)

Most helpful comment

It is possible to transform a method that yields with a block into a method that returns a stateful iterator which implements the same logic using only AST transformations inside the compiler. In a similar manner to what http://facebook.github.io/regenerator/ does for ES6 (the code becomes a state machine). The performance impact would probably be very minimal (as efficient as a hand-written iterator, I suppose), but this is far from trivial to implement.

All 7 comments

In many cases returning an array directly instead of going through an iterator is much faster, this is why there are two versions. You can try and benchmark it to see this.

Ideally we'd have something like Ruby's Enumerable and Enumerator, so you could get an Iterator out of a method implemented with just yield very easily, but this turns out to be much slower (like 1000x slower)

Or we could go with C# approach and transform that nice yielding code in the iterator version when invoked without block.

But that would not remove the returning array versions. Just the block/iterators.

You have an example? I ran some benchmarks and have yet to find an example where the speed is much different. (And that's avoiding the use of things like #first which would easily give the iterator the advantage.)

Also, the performance isn't just about speed. Memory usage is also significant. In my benchmarks, though the speed varied little, in many cases the non-iterator hit a memory wall.

a = (1..10).to_a

a.permutations.each{ |x| x }
# 0m4.246s

a.each_permutation{ |x| x }
# 0m3.784s

a.permutations.each{ |x| x }
# 0m5.076s

But...

a = (1..12).to_a
a.permutations.each{ |x| x }

produces

Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
Aborted (core dumped)

Also if the speed difference is so bad (1000x, wat?) then something is wrong with the implementation. That's just not reasonable.

It's true, there doesn't seem to be much different. Maybe we can make permutation and combination be the same as Ruby then. But I don't know, I always felt those two methods were out of place, every other method starts with each: each_slice, each_cons, each_with_index, each_with_object`. But I wouldn't mind keeping the same syntax and semantics as in Ruby.

About the 1000x difference, I meant to have just a single permutation implementation that would yield values, and then a generic way to turn that into an Iterator. The way Ruby does this is with fibers: it pauses and resumes the current fiber on yield. We could do the same by also using fibers and channels, but it turns out it's 1000x slower than implementation the Iterator manually, like it's done now.

I always felt those two methods were out of place, every other method starts with each: each_slice, each_cons, each_with_index, each_with_object.

That's an interesting point, in fact I recall asking that Ruby's each_with_object be renamed to just with_object a long time ago. And really the same is more or less true for all of these methods -- each is a bit obvious and thus a tad redundant if the method name uses the plural (e.g. #slices). But I am guessing you won't want to move that far away from Ruby's naming?

That being the case and you want to stick with each_ for consistency, okay. But maybe since these two don't correspond with Ruby anyway, things can be improved a little just by using more concise names, #each_combo, #combos, #each_perm, and #perms.

It is possible to transform a method that yields with a block into a method that returns a stateful iterator which implements the same logic using only AST transformations inside the compiler. In a similar manner to what http://facebook.github.io/regenerator/ does for ES6 (the code becomes a state machine). The performance impact would probably be very minimal (as efficient as a hand-written iterator, I suppose), but this is far from trivial to implement.

I don't think there's anything to do here other than a feature request for very efficient generator functions, which are rewritten to Iterators at compile time. This is a feature i'd love to have, however it seems quite difficult to implement correctly. I've created an issue for that here: https://github.com/crystal-lang/crystal/issues/4438.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

will picture will  路  3Comments

oprypin picture oprypin  路  3Comments

Papierkorb picture Papierkorb  路  3Comments

asterite picture asterite  路  3Comments

cjgajard picture cjgajard  路  3Comments