Julia: Default `iteratorsize` to `SizeUnknown()`

Created on 7 Jun 2017 · 10 Comments · Source: JuliaLang/julia

Since the most conservative estimate of an iterator's size is unknown, I think it would make sense for iteratorsize to default to SizeUnknown() for new iterators and let iterator authors optionally override it as an optimization.

As it stands, collect throws an error on custom iterator types that haven't defined iteratorsize.
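A minimal sketch of the failure described above, assuming Julia 1.x, where the trait is spelled `Base.IteratorSize` (it was `iteratorsize` when this issue was opened). `Countdown` is a hypothetical type invented for illustration only.

```julia
# Hypothetical iterator that defines `iterate` and nothing else.
struct Countdown
    start::Int
end

# Yields start, start - 1, ..., 1.
Base.iterate(c::Countdown, state=c.start) = state < 1 ? nothing : (state, state - 1)

# The fallback trait claims that the iterator has a length...
default_trait = Base.IteratorSize(Countdown)

# ...so `collect` asks for `length`, which was never defined.
collect_error = try
    collect(Countdown(3))
    nothing
catch err
    err
end

println(default_trait)
println(typeof(collect_error))
```

A plain `for x in Countdown(3)` loop still works here; only code that consults the size trait, such as `collect`, fails.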

collections

All 10 comments

Most iterators have known size so wouldn't this be a performance trap since it is easy to just forget to define the method?

Is it better to have things work by default but be slow, or not work by default?

Probably not work by default. You see the error message, you fix it. You don't see the error message, code is slow forever.

The only backwards-compatible solution would be to deprecate the default iteratorsize method entirely.

Yes, I think that would be reasonable.

A deprecation would have been good to put in when these were first introduced, but that was several versions ago, and I think most of the breakage has already been done.

Also, if you remove the default iteratorsize, you just get a different MethodError (for iteratorsize rather than length), so I'm not sure what the benefit is.

I've seen people respond to the errors here by implementing an O(n) length method, which isn't the right thing to do at all. The default behavior is definitely misleading.
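As a sketch of the alternative to writing an O(n) `length`, assuming Julia 1.x naming: declare the size as unknown and let `collect` grow the result as it goes. `Collatz` is a hypothetical iterator chosen because its length cannot be known without running it.

```julia
# Hypothetical iterator over the Collatz sequence starting at n (assumes n >= 1).
struct Collatz
    n::Int
end

# State 0 is used as an "already emitted the final 1" sentinel.
function Base.iterate(c::Collatz, state=c.n)
    state == 0 && return nothing
    next = state == 1 ? 0 : (iseven(state) ? state ÷ 2 : 3state + 1)
    return (state, next)
end

# Say honestly that the size is unknown instead of defining an O(n) `length`.
Base.IteratorSize(::Type{Collatz}) = Base.SizeUnknown()
Base.eltype(::Type{Collatz}) = Int

sequence = collect(Collatz(6))
println(sequence)
```

With the trait declared, `collect` no longer calls `length` and builds the vector by pushing elements one at a time.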

> Also, if you remove the default iteratorsize, you just get a different MethodError (for iteratorsize rather than length), so I'm not sure what the benefit is.

Wouldn't it be clearer to get an error about iteratorsize (which you need to implement) than about length (which you don't want to/cannot implement)?

Hey, I recently stumbled across the same question of what an appropriate default is, so I'm curious:

> Is it better to have things work by default but be slow, or not work by default?

I thought the whole point of abstraction and generalization was to work by default and be slow when there is no specialization, while letting you gain a lot of performance by specializing and providing all the information you have. If "not work by default" were preferred over "work but be slow by default", Julia wouldn't have a general matrix multiply that works for any combination of AbstractMatrix types.
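To illustrate the "generic fallback plus specialization" idea in terms of the size trait, here is a small sketch assuming Julia 1.x. `count_items` is a hypothetical helper, not a Base function; it dispatches on `Base.IteratorSize` so that the slow path always works and the fast path is taken when the iterator advertises a length.

```julia
# Hypothetical helper: entry point forwards to a method chosen by the size trait.
count_items(itr) = count_items(Base.IteratorSize(itr), itr)

# Fast path: the iterator advertises a length or a shape, so `length` is cheap.
count_items(::Union{Base.HasLength, Base.HasShape}, itr) = length(itr)

# Slow but always-correct path: walk the iterator and count.
function count_items(::Base.SizeUnknown, itr)
    n = 0
    for _ in itr
        n += 1
    end
    return n
end

# No method is defined for Base.IsInfinite, so infinite iterators raise a MethodError.
range_count = count_items(1:5)
filter_count = count_items(Iterators.filter(iseven, 1:10))
println((range_count, filter_count))
```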

Regarding the slow-by-default "problem": suppose it were changed so that the workflow looks like this:

You want any kind of size hint? -> Define size/length and set IteratorSize properly.
Default: No size hint and no performance optimization.

Then it would feel much more like a performance opt-in where possible, wanted, or needed, rather than coding by trial and error. Otherwise one could ask: there are cases where we implement the size method, so why not make HasShape the default?
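The opt-in workflow described above might look like the following sketch, assuming Julia 1.x naming. `Evens` is a hypothetical type; the point is that the author states the trait explicitly alongside `length` instead of relying on whatever the default happens to be.

```julia
# Hypothetical iterator over the first n even numbers.
struct Evens
    n::Int
end

Base.iterate(e::Evens, i=1) = i > e.n ? nothing : (2i, i + 1)

# Opt in to the size hint: provide `length` and say so through the trait.
Base.length(e::Evens) = max(e.n, 0)
Base.IteratorSize(::Type{Evens}) = Base.HasLength()
Base.eltype(::Type{Evens}) = Int

evens = collect(Evens(4))
println(evens)
```

Because the length is known up front, `collect` can allocate the result once rather than growing it element by element.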

From a newcomer's perspective:
I code some custom iterator ci (only the iterate function for now, as I didn't want to read the full documentation and wanted to try it as soon as it was written).
collect(ci) errors because there is no length function supplied. But I never said it would have a length function. I might not even be able to tell beforehand. So how should I define the length function?
At the very least, there should be a hint pointing to the IteratorSize function.

But the only correct way, if we're too concerned about the default performance, would be to remove the default case entirely and thus enforce proper definitions. Then we would at least get the helpful error that we didn't define IteratorSize.
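To make the "no default at all" proposal concrete, here is a purely hypothetical sketch. None of these names exist in Base: `sizetrait`, `KnownLength`, `UnknownLength`, `mycollect`, and `Ticks` are all invented for illustration, standing in for a trait function that ships without a fallback method.

```julia
# Hypothetical trait values and a trait function with NO fallback method.
struct KnownLength end
struct UnknownLength end
function sizetrait end

# Hypothetical iterator yielding 1, 2, ..., n.
struct Ticks
    n::Int
end
Base.iterate(t::Ticks, i=1) = i > t.n ? nothing : (i, i + 1)

# Hypothetical stand-in for `collect` that consults the trait first.
function mycollect(itr)
    trait = sizetrait(typeof(itr))   # fails here if the author never declared it
    out = Any[]
    trait isa KnownLength && sizehint!(out, length(itr))
    for x in itr
        push!(out, x)
    end
    return out
end

# Without a declaration, the error names the trait function, not `length`.
missing_trait_error = try
    mycollect(Ticks(3))
    nothing
catch err
    err
end

# Once the author declares the trait, the same call succeeds.
sizetrait(::Type{Ticks}) = UnknownLength()
result = mycollect(Ticks(3))
println(result)
```

In this sketch the MethodError points at the function the author actually has to implement, which is the behaviour being argued for.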

Thus, with regard to

> Also, if you remove the default iteratorsize, you just get a different MethodError (for iteratorsize rather than length), so I'm not sure what the benefit is.

You would get a better error message that actually points you in the right direction, rather than making you assume that you need to implement a length function no matter what. Better error messages are always appreciated, especially by new adopters.

P.S. This wasn't meant to read that harsh 🙈

EDIT: The current default somewhat encourages omitting the IteratorSize implementation if you implement a length function anyway. In my opinion that is ugly, because you define the method when the default doesn't fit and rely on the default when the two happen to coincide. For iterators, IteratorSize seems to be a core property that varies from type to type. As such, it should be stated explicitly for every iterator, and there is no logically useful default: on the one hand, at least HasShape and HasLength are equally likely to be useful, and on the other hand, default behaviour shouldn't contradict an explicit implementation.
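For reference, a short sketch (assuming Julia 1.x) of how the trait already varies across iterators that ship with Base, which is the sense in which it behaves like a per-type property rather than something with one natural default.

```julia
# A two-dimensional generator, a filtered range, and an endless repetition.
gen2d = (i * j for i in 1:2, j in 1:3)
filtered = Iterators.filter(isodd, 1:10)
endless = Iterators.repeated(0)

println(Base.IteratorSize(gen2d))
println(Base.IteratorSize(filtered))
println(Base.IteratorSize(endless))
println(Base.IteratorSize("abc"))

# The trait drives what `collect` builds: a matrix for the shaped generator,
# a vector grown element by element for the filtered range.
matrix = collect(gen2d)
odds = collect(filtered)
println(size(matrix))
println(odds)
```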
