I was confused as to why my custom iterator, was failing to collect
Consider the MWE:
type Foo
values::Vector{}
end
function Base.start(f::Foo)
return 1
end
function Base.next(f::Foo, state)
f.values[state], state+1
end
function Base.done(f::Foo, state)
state>length(f.values)
end
Running with a for loop works fine:
for f in Foo([1,2,3])
print(f)
end
> 123
Collecting Breaks:
collect(Foo([1,2,3]))
>LoadError: MethodError: no method matching length(::Foo)
while loading In[26], in expression starting on line 1
This is true -- there is not method to find the length of a ::Foo
So why is it being ask?
Because
Base.iteratorsize(Foo([1,2,3]))
>Base.HasLength()
It thinks it has a length.
I think it is coming from:
https://github.com/JuliaLang/julia/blob/c603a7941089cbb11bda70ef67743b468fce21e5/base/generator.jl#L35
The workaround is to add to my definition
Base.iteratorsize(::Foo) = Base.SizeUnknown()
But I feel like this should be being inferred, by the face that neither size(::Foo) or Length(::Foo) are defined?
Is your Foo actually a custom iterator for which length cannot be computed ahead of iteration or cannot be efficiently computed?
@mschauer In this example, well no.
But that is because this is a toy MWE.
My real code is doing tokenisation on long (multigigabyte) files.
Which can not be calculated efficiently ahead of iteration.
(it actually a bit worse than that as has a random chance to just discard some tokens)
Ok, you are correct, Base.iteratorsize(::Foo) = Base.SizeUnknown() should be added to your code. Defining Base.iteratorsize(::Foo) = Base.SizeUnknown() (so called traits) rather then using a test like
haslength(x) = applicable(length,x) in the collect function has technical and performance reasons . These design choices where discussed in #15146.
Well, good to know it was on purpose. :+1:
I personally prefer when breaking changes are made to core language parts
that they are documented and set up to trigger deprecation warnings.
Before being merged.
:stuck_out_tongue:
But I do actually love that Julia is not at all afraid to break things to allow for the language to become better. I'ld rather undocumented breaking changes, than stasis.
:grinning:
The new iterator traits definitely need mentioning in the interfaces section of the documentation. We should be more careful that new features need documentation, not just implementation.
I wrote a short recap of the changes from what I understand: https://gist.github.com/mschauer/c8d8b7eb5b455cb12ddc9bda15695203
@JeffBezanson, do you agree with the recommendations in the last paragraph in the gist?
ping @JeffBezanson iterator traits are still very poorly documented, can you please respond with some feedback on @mschauer's proposal to fix that?
@mschauer probably easier to review if you open that as a PR
I noticed this as a potential concern while implementing #17006 and found this. @mschauer, your writeup looks correct (though there are a couple of layout and wording issues, folks here can help smooth those).
It's also possible that HasLength() might not be the correct default. The counter argument is that a MethodError is pretty easily resolvable, whereas someone might not notice that they're getting 3x worse performance than they could be. The argument is even stronger for the eltype, that the "dangerous" (might fail) default is, somewhat paradoxically, perhaps the better choice.
@timholy Wait a second,
The counter argument is that a MethodError is pretty easily resolvable, whereas someone might not notice that they're getting 3x worse performance than they could be.
reads like an argument for the default HasLength (which will cause a MethodError in case of a collection attempt without defined length). But I too thought that HasLength might not the right default in the long run - I was thinking of this when writing "in case changes to the fallback definitions are made".
Yes, I meant it as an argument in favor of HasLength.
It's even more important for eltype: you might get Any containers back otherwise. Compared to debugging that problem, debugging a missing eltype method is a walk in the park.
Ping again, @JeffBezanson are iterator traits really still undocumented?
Btw., I am just waiting for the outcome of #17053 before making changes
I think the outcome is that we won't be making any further API changes or trait additions before 0.5 if we can absolutely avoid it. We do however desperately need docs for iterator traits as they currently are, and I consider that RC-blocking. Jeff is evidently on vacation, but @mschauer would you be willing to submit a PR?
Given that collect only works for iterators that define length, should length be added to the "required" list for the iterator? Maybe with a note that says that it's not required if you define Base.IteratorSize(::Type{MyIterator}) = Base.SizeUnknown()?
Another way to phrase it might be to say that you should define at least one of length or IteratorSize.
Another way to phrase it might be to say that you should define at least one of
lengthorIteratorSize.
Yes, that's a good description; saying length is required is not as accurate.
Most helpful comment
I wrote a short recap of the changes from what I understand: https://gist.github.com/mschauer/c8d8b7eb5b455cb12ddc9bda15695203