Julia: find* functions and pairs()/keys()

Created on 11 Feb 2018  路  7Comments  路  Source: JuliaLang/julia

find* functions have been made more general in https://github.com/JuliaLang/julia/pull/24673, and now use pairs (and therefore keys) to get indices for AbstractArray, AbstractDict, Tuple, NamedTuple and AbstractString types. For HasShape iterators, they use cartesian indices, and for other iterators they use linear indices.

https://github.com/JuliaLang/julia/blob/11c08ad3363561b18938d7cd96755d3afa4f1bc8/base/array.jl#L1500-L1505

However, this scheme suffers from a significant limitation: even if a custom indexable type other than the ones listed above implements pairs/keys, these functions are ignored and the generic iterator fallback will be used instead. This is because there is currently no trait to identify indexable types, so the internal _pairs function has to be defined on a hardcoded set of types so that we don't try to call keys on iterators which may not implement it.

Changing this after 1.0 would be technically breaking pairs/keys would suddenly be used for types which implement these functions, and they could return different types from the fallback.

Possible solutions to fix this before 1.0:

  • Use method_exists(pairs, itr). This would work but AFAIK the compiler does not handle this efficiently at this point.
  • Add an Indexable trait, and call pairs/keys for types which implement it. This trait could also be useful in other cases (e.g. broadcast, see https://github.com/JuliaLang/julia/issues/18618).
  • Rename the internal _pairs function to something else, and recommend that types override it. This doesn't sound like a good approach since it would be mostly redundant with pairs/keys.
collections search & find

Most helpful comment

Note that while on 0.6 the methods accepted any iterable, in practice they relied on getindex being defined. So we probably don't need any deprecation, people will just have to implement keys/pairs for their custom collections if they don't already, and the error will tell them exactly that.

All 7 comments

Very good points. I think functions like find* that refer to indices should require their argument to implement pairs or keys/indices. It seems dubious to me to use linear indices 1:n for arbitrary iterators. To make that work, we'd need something like with_indices(idxs, itr) that creates an association between some indices and an iterator. You could pass with_indices(countfrom(1), itr) to get the current behavior. Of course we'd need to decide what that's called and what it returns.

Well, it sounds useful to be able to pass non-indexable iterators and get the position of the match, even if that's not really an index which can be used with that iterator. For example, you could use zip to combine vectors of the same length and test for a condition on several elements. Or you could use eachline to find the first line matching a condition. Telling people to use a with_indices wrapper doesn't sound very user-friendly.

Overall it seems to me that we lack collection traits, which would make things much clearer and would allow providing reasonably useful fallbacks for non-indexable iterators.

I don't agree that traits make things much clearer. With traits, in order to make X work, you need to define Y, which is difficult to discover. Traits also allow behaviors to change behind your back as packages evolve, e.g. if something becomes iterable or indexable that wasn't before.

I agree with Jeff here that it doesn't really make sense to implicitly use 1:n "indices" for things that don't have indices. If you return something as an index, then it should actually be an index. If you have something iterable you want to use 1:n as its indices, then we can provide a wrapper that gives it that behavior.

We can deprecate the current behavior of arbitrary iterators to something like

first(i for (i, x) in zip(countfrom(1), itr) if f(x))

Or to:

for (i, x) in enumerate(itr)
    f(x) && return i
end

if we want to preserve the not-found behavior of returning nothing

Note that while on 0.6 the methods accepted any iterable, in practice they relied on getindex being defined. So we probably don't need any deprecation, people will just have to implement keys/pairs for their custom collections if they don't already, and the error will tell them exactly that.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

tkoolen picture tkoolen  路  3Comments

TotalVerb picture TotalVerb  路  3Comments

sbromberger picture sbromberger  路  3Comments

yurivish picture yurivish  路  3Comments

omus picture omus  路  3Comments