In #35944 we noted that repeat(::OffsetArray, ...)
is a little funny; it errors for inner
repeats and happens to return 1-based arrays for outer
repeats. I don't think either of these behaviors were ever explicitly thought through; it'd be good to decide how we want them to behave in the future.
repeat
, like cat
, push!
, etc, seems to belong to the "arrays as lists" worldview rather than "arrays as associative containers" worldview. xref https://github.com/JuliaArrays/CatIndices.jl, which has never yet gotten around to deciding on a very complete solution to this question but has taken a few tentative stabs at it.
One option I can't remember noticing in the past (but surely @andyferris and others have): should we use values(A)
to indicate that we wish to strip the indices, returning a 1-based indexing view? At one point I decided to call this collect
, but that's eager rather than lazy and the docstring for collect
is not as clear on this point as I seem to remember it.
Currently it's a bit of a wreck:
julia> a = Base.IdentityUnitRange(-3:3)
Base.IdentityUnitRange(-3:3)
julia> collect(a)
7-element Array{Int64,1}:
-3
-2
-1
0
1
2
3
julia> Array(a)
7-element Array{Int64,1}:
-3
-2
-1
0
1
2
3
julia> b = OffsetArray(-3:3, -3:3)
-3:3 with indices -3:3
julia> collect(b)
7-element Array{Int64,1}:
-3
-2
-1
0
1
2
3
julia> Array(b)
ERROR: BoundsError: attempt to access 7-element Array{Int64,1} at index [-3:3]
Stacktrace:
[1] throw_boundserror(::Array{Int64,1}, ::Tuple{OffsetArrays.IdOffsetRange{Int64,Base.OneTo{Int64}}}) at ./abstractarray.jl:537
[2] checkbounds at ./abstractarray.jl:502 [inlined]
[3] copyto!(::Array{Int64,1}, ::OffsetArray{Int64,1,UnitRange{Int64}}) at ./multidimensional.jl:959
[4] Array at ./array.jl:541 [inlined]
[5] Array(::OffsetArray{Int64,1,UnitRange{Int64}}) at ./boot.jl:429
[6] top-level scope at REPL[9]:1
julia> values(b)
-3:3 with indices -3:3
julia> values(a)
Base.IdentityUnitRange(-3:3)
It's a good question. I have to say, I'm even sometimes a little confused about what collect
is supposed to do (well, it's eager?) - it seems like such a nice generic english verb one might like to extend, but it's tied to the Array
literal syntax.
we wish to strip the indices, returning a 1-based indexing view
If I think beyond arrays, I can use enumerate(iter)
to get "indices" for each element in an iterable which might not even have been iterable to begin with. Not all iterables will support fast lookup based on their iteration position, but for those that do I can imagine instead of doing something like:
for (i, x) in enumerate(iter)
...
end
I could do something like
for (i, x) in pairs(enumeration(iter))
...
end
and enumeration(iter)
would be an indexable such that enumeration(iter)[1] === first(iter)
, etc.
values
is currently an AbstractDict
thing also. E.g. it would seem natural to me if values(::Dict)
would still be indexable - and the indices don't change. Whatever we come up with it would be nice to have similar semantics across dictionaries and arrays (and their keys).
My original intention for collect
was pretty simple: it should produce the vector of values that you would get if you iterated an entire iterable and put the results in a vector. But that concept was not adhered to terribly well and now it has a hodgepodge of behaviors 🤷♂️
Thanks Stefan. Out of curiosity, what was wrong with using vector
for that - like we have tuple
and string
as constructors for other built-in collections?
Related to this, I suggested more "lawful" approach to values
etc. in #36175.
I think it's a good idea to make values
and keys
optionally indexible by "tokens". The tokens can be 1-origin dense indices for arrays.
what was wrong with using
vector
for that
I think the problem is that it doesn't scale. I don't think it's ideal for each custom container to have lowercase factory function that may or may not do what you expect (although it might be OK for Vector
and Array
). See #36288
Most helpful comment
My original intention for
collect
was pretty simple: it should produce the vector of values that you would get if you iterated an entire iterable and put the results in a vector. But that concept was not adhered to terribly well and now it has a hodgepodge of behaviors 🤷♂️