This would be a very nice syntax for taking head and rest. Likewise, `a..., b = [1,2,3]` might be good for slurping the initial elements into `a` and the tail element into `b`.
Should we add `tail(itr, state)` to the iteration protocol, giving a collection with elements starting at the given state? In some cases that would be easy, but it won't always be. Having a `Rest{T,S}(itr::T, state::S)` type that wraps an iterator with a state and allows you to iterate the rest of it might do the trick.
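A minimal sketch of what that wrapper could look like (the `Rest` name and fields here are just this proposal spelled out, not an existing API at the time; `Base.Iterators.Rest` later took essentially this shape):

```julia
# Sketch of the proposed wrapper: an iterator plus a saved state.
struct Rest{T,S}
    itr::T
    state::S
end

# Iterating a Rest simply resumes the wrapped iterator from the stored state.
Base.iterate(r::Rest, state=r.state) = iterate(r.itr, state)
Base.IteratorSize(::Type{<:Rest}) = Base.SizeUnknown()

# Usage: resume after consuming the first element.
x, st = iterate(1:5)     # x == 1
collect(Rest(1:5, st))   # == [2, 3, 4, 5]
```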
Ah, of course, the `Drop` iterator already does something very similar to this. `Rest` might be a better name for that iterator.
Well, they are actually a bit different. `Drop` takes a count of items to skip. `Rest` would start from a given state, making it basically trivial to implement.

Ah, that's true, but the `Rest` type can serve both purposes; it's just a matter of how you get there – by taking an explicit state or a number of values to skip over.
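For reference, `Base.Iterators` provides both entry points:

```julia
# `Iterators.rest` resumes from an iteration state; `Iterators.drop` skips a
# count of leading elements. Both yield the same tail here.
x, st = iterate(1:5)               # x == 1
collect(Iterators.rest(1:5, st))   # [2, 3, 4, 5] – resume from a state
collect(Iterators.drop(1:5, 1))    # [2, 3, 4, 5] – skip a number of items
```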
Wouldn't it be possible to simulate Matlab-style `varargout` with this? I guess even though possible, it should be avoided, since it would play havoc with the type system.
No, this does not tell the function how many outputs are requested. And actually, in `a, b = f()`, as long as `f` returns 2 or more values, this will work and just drop the rest.
But it does result in telling the iterator how many arguments are necessary, so it seems you could write the function as a continuation iterator to weakly simulate Matlab's `varargout` (the sane version where you just do lazy computation).
We could generally support Matlab's `varargout` if we did it lazily and changed the protocol for destructuring a little bit. The idea came up in the discussion of factorization objects. I.e. if `a, b = x` caused a single destructuring call to `x` to occur, giving some kind of boolean mask of which values should be produced. Then again, I'm not sure we really want fully general varargout, since it's a bit weird that the outputs can change completely depending on how many of them are asked for.
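A toy sketch of that hypothetical mask-based protocol (every name here is invented for illustration; nothing like this exists in Base):

```julia
# `a, b = x` would lower to a single call telling `x` which outputs are
# actually wanted, so unrequested outputs are never computed.
struct LazyPair
    data::Vector{Int}
end

function destructure(x::LazyPair, wanted::NTuple{2,Bool})
    s = wanted[1] ? sum(x.data)  : nothing  # only computed if requested
    p = wanted[2] ? prod(x.data) : nothing
    return (s, p)
end

a, b = destructure(LazyPair([1, 2, 3]), (true, true))   # computes both
a, _ = destructure(LazyPair([1, 2, 3]), (true, false))  # skips the product
```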
Before implementing this, maybe it would be good to check whether a more general approach to destructuring assignments is welcomed. Basically, every function `f` with a fixed bijective inverse function `inv(f)` can be used to write

`f(a, b) = c` -> `a, b = inv(f)(c)`

Examples for `f` are tuple composition and list composition from head and tail,

`f(a, b) = (a, b)`

but also, say, `(sign(x), abs(x))` and `((a,c), (b,c))` have such inverses, which can be found by applying the rule `inv(f*g) = inv(g)*inv(f)` or be provided explicitly.
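For the head/tail case, a toy version of this idea (with hypothetical names; Base's `inv` applies to matrices, not arbitrary functions) might look like:

```julia
# `f` composes a list from head and tail; `inv_f` is a hand-written inverse,
# so the "assignment" f(a, b) = c corresponds to a, b = inv_f(c).
f(head, tail) = vcat(head, tail)
inv_f(c) = (c[1], c[2:end])

a, b = inv_f([1, 2, 3])   # a == 1, b == [2, 3]
```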
Yes, you can do that, but it's not a new capability added by `a, b... = x` syntax. You can do it already. Just have the function return an iterator that computes values as `next` is called. Then `a, b = f()` will compute 2 values, `a, b, c = f()` will compute 3, etc., with no `...` needed.
The real problem is the case of a single result, `a = f(x)`, which just assigns the whole thing, and no destructuring happens. If that breaks, composing functions starts to get difficult.
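A quick sketch of both points, using a generator (`expensive` is a stand-in for real per-output work):

```julia
# The function returns a lazy iterator, so destructuring only does as much
# work as there are assignment targets.
expensive(i) = i^2            # stand-in for an actual costly computation
f() = (expensive(i) for i in Iterators.countfrom(1))

a, b = f()      # computes exactly 2 values: a == 1, b == 4
a, b, c = f()   # computes 3: c == 9
x = f()         # the single-result pitfall: x is the generator itself, not 1
```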
I found myself wanting this syntax yet again today. I think we should consider this.
(Match.jl has this)
There are some design issues here, primarily: what should the type of `b` (or `a`) be – array, tuple or iterator? Since we haven't addressed this, it seems best to bump this to 1.0.
Here's an interesting example that begins to address the issue.
```julia
julia> a, b, c = countfrom()
Base.Count{Int64}(1,1)

julia> a, b, c
(1,2,3)
```
This works as expected, but what if "splatting" was used on one of the variables?
In the case that `c` was "splatted", it would make sense to return an iterator with `start` returning the "current" iterator value and `next`/`done` matching that of the existing iterator.

However, what if `b` was "splatted"? Would the code run indefinitely? An iterator could be useful if the last variable is "splatted", but it could become more confusing otherwise.
In the case that an iterator is not the right choice, though, the question of mutability (tuple vs. array) definitely seems to be worth debating.
Yes, it does seem that returning a `Rest` iterator might be necessary in the general case.
This is a feature I have missed a few times. FWIW, Python seems to do the equivalent of `collect` on the splatted unpacking and returns a list. I don't like this; I think it should return a type matching what is being unpacked.
Could the return type be delegated to the type of the iterable? I.e., translate

```julia
a, b, c..., d, e = f
```

->

```julia
a = first(f); b = second(f);
e = last(f); d = secondlast(f);
c = slurp(f, 3, end-2)
```

If the iterable is infinite, then calling `last(f)` should raise an error.
I think the issue is for iterable things that are not infinite but have indefinite length and/or can only be iterated in order – you can't get the last item until you've collected the ones before it.
Is this not a 1.x possibility? This is currently invalid syntax.
The Python approach might be worth looking at, replacing `*` with `...`. (I see dalum is skeptical of the specifics – but isn't it good to be somewhat consistent with the varargs approach, where `args...` becomes a sequence of values? I might be misunderstanding.)

Anyway, Python also permits `a, *b, c = range(5)` (i.e., `a, b..., c = 0:4`). We may not want to copy that approach, though it might be worth having a look at the (reasonably short) spec, given that this is a meaning of this kind of unpacking that is already widely in use. (And, yes, this kind of unpacking is quite useful and readable, IMO.)
Even reading that PEP I don’t really get what they do. The main question we have is whether this should be eager or lazy. The Python approach appears to be eager?
Maybe it makes sense to have two distinct classes of iterables here? If we define one class for which it makes (enough) sense to be eager (certainly tuples, maybe vectors, possibly some other things, but certainly not potentially infinite sequences), we could permit `a, b..., c` – otherwise that could be an error (the default).

(Otherwise, to be lazy and still permit it, I guess we could use something like a `Future`, but that seems sort of excessive, and probably not a very useful use case anyway?)
For the sake of being able to write generic code, it's generally not a good idea for meaning to change based on type.
But, yeah, the Python definition does constrain the length of the rhs iterable (i.e., the rhs must have at least as many elements as the lhs), and though I've only looked cursorily at it, the implementation does seem eager.
@StefanKarpinski I guess that depends on what you mean by meaning. But, yeah, I think it makes sense to make this eager, myself – and have it behave “just like” varargs (which is a very close relative, after all).
That does seem like the obvious version. Note, however, that varargs produces tuples, which is somewhat questionable for this kind of usage. We may want to collect a vector instead.

Agreed. That's exactly the story in Python as well: in varargs, tuples are used, while in assignments, lists are used.
This would also let us write function signatures like:
```julia
julia> foo((a, bs...), c, ds...) = bs
ERROR: syntax: invalid assignment location "bs..."
```

which would be useful for e.g. https://github.com/JuliaDiff/ChainRulesCore.jl/issues/128#issuecomment-586716291.
I think that leads us to an answer for how `a, bs... = <expr>` must behave (which is what had already been concluded, anyway):

- `foo(a, bs...)` already makes `bs` a tuple
- `foo((a, bs...))` should behave the same way
- `(a, bs...) = <expr>` should behave the same way

I think this would also be interesting:
```julia
a, b, c[3:4] = [1, 2, 3, 4, 5];
# a = 1
# b = 2
# c = [3, 4]
```
I would have a strong expectation for that syntax to assign to a slice.
@cindRoberta See my comment in your issue. That syntax won't work since it already has a meaning.
This seems possible, however:

```julia
a, b, c[3:4]... = [1, 2, 3, 4, 5];
```
Might just be my mental model that's janky, but that seems very confusing to me. I.e., it seems like you're assigning to `c[3]` and `c[4]`, not that you're assigning `rhs[3:4]` to `c`…?
I agree with you, @mlhetland, that's what I would expect it to do. Using an index on the left to indicate what value on the right to take is very unintuitive. I'm just pointing out what syntax is available.
What if a symbol or token was used to indicate the assignment to the variable instead of the assignment to the slice?
Rust uses `&` to indicate the slice and assignment to the slice, which is ideal for me; in Julia it would be as follows.

To assign to the slice:

```julia
c = [1, 2, 3, 4, 5]
&c[3:4] = [6, 7]
# c = [1, 2, 6, 7, 5]
```

To assign to the variable:

```julia
c[3:4] = [1, 2, 3, 4, 5]
# c = [3, 4]
```

or using the token `¬` (or another token):

```julia
c¬[3:4] = [1, 2, 3, 4, 5]
# c = [3, 4]
```

But the closest to reality, without conflicts, is the following (this already works in Julia, I know; it's a matter of preference to use `&`):

```julia
c = [1, 2, 3, 4, 5]
c[3:4] = [6, 7]
# c = [1, 2, 6, 7, 5]
```

To assign to the variable:

```julia
c¬[3:4] = [1, 2, 3, 4, 5]
# c = [3, 4]
```

It's a little confusing, I know, but it can't be any other way – and, of course, keep the `...` to assign the rest.
Well, it does seem potentially useful to permit the lhs to pick apart the rhs. I guess one would also have `§x["foo"] = y` mean `x = y["foo"]`, for example? Perhaps even permitting attribute lookup in the same way? (And function calls?)

Then again, if you're going to be explicit about which parts you want, why not put that information on the rhs? The unpacking syntax is convenient when the parts are largely implicit, but I'm not sure I think, say, `§a[1,3], §b[2,4] = c` is any better than `a, b = c[1,3], c[2,4]`. Quite the opposite, really. Not only is it more verbose, it is, at least to me, quite confusing and contrary to how indexing works in general.
Then again, I don’t know what the desired use cases are – I’m clearly just speaking for myself.
(Edit: By “more verbose”, I meant it uses more characters, but of course it doesn’t. #sleepybrain)
@mlhetland just in my ideal, the `&` references a pre-declared variable:

```julia
a = [1, 2, 3, 4, 5]
print(&a[3:4]) # [3, 4]
```

and the `a[]` creates a sub-array of another array:

```julia
a[3:4] = [1, 2, 3, 4, 5]
# or
a¬[3:4] = [1, 2, 3, 4, 5]
print(a) # [3, 4]
```

Only that; maybe there is already a function in Julia that does this, but I don't know. Maybe it can be done that way:

```julia
a = [1, 2, 3, 4, 5]
&a[3:4]¬[1:2] = [6, 7, 8, 9]
# or
¬(&a[3:4])[1:2] = [6, 7, 8, 9]
# a = [1, 2, 6, 7, 5]
```

In Julia:

```julia
a = [1, 2, 3, 4, 5]
a[3:4]¬[1:2] = [6, 7, 8, 9]
# or
¬(a[3:4])[1:2] = [6, 7, 8, 9]
# a = [1, 2, 6, 7, 5]
```

The lhs is useful for destructuring only, because on the rhs I can already do:

```julia
a = [1, 2, 3, 4, 5][3:4]
```

but in destructuring I can't.

Creating a sub-array:

```julia
a, b, c[3:4] = [1, 2, 3, 4, 5]
# or
a, b, ¬c[3:4] = [1, 2, 3, 4, 5]
# or
a, b, c¬[3:4] = [1, 2, 3, 4, 5] # I prefer this
# a = 1, b = 2
# c = [3, 4]
```

Slice:

```julia
c = [6, 7, 8, 9]
a, b, &c[1:2]¬[3:5] = [1, 2, 3, 4, 5]
# a = 1, b = 2
# c = [3, 4, 5, 8, 9]
```

or, adapting to Julia slices:

```julia
c = [6, 7, 8, 9]
a, b, c[1:2]¬[3:5] = [1, 2, 3, 4, 5]
# a = 1, b = 2
# c = [3, 4, 5, 8, 9]
```
Sorry if I missed something, but I don't understand what the proposals with `§` have to do with this issue. In any case, I think we have perfectly good constructs for slicing up arrays in complicated ways that fit into the semantics of the language. Introducing a DSL that does this on the LHS seems insufficiently motivated.

In contrast, the `a, b... =` syntax would serve a common use case and mirror how function arguments work.
The splatting syntax also makes sense wrt. current syntax/semantics (like, e.g., in Python): Conceptually, you’re splatting in an as-yet-non-existent sequence/tuple in the lhs, whose elements are then defined by the assignment. There is, in a sense, only one way of interpreting what goes where.
I would understand the `...` in `a, b... =` as slurping, not splatting (just like the original issue words it, and the same way it works for functions).
Sure. My point was that slurping is, conceptually, the splatting in of a set of target positions into the lhs. No need for a separate conceptual framework to understand it. Just “pretend” that the target sequence exists, in a sense; it is then splatted in as a sequence of targets for assignment. My main point was just that there is really no wiggle-room in what it means and how it behaves, which I think is a good thing.
This symmetry between slurping and splatting (in a sense, just removing a set of parentheses from a tuple, in either case), has at least always been my mental model (also for the equivalent stuff in Python). I almost find it more confusing to treat them as separate ideas; to my mind, it’s just a matter of lhs vs rhs (also in function arguments).
> Sorry if I missed something, but I don't understand what the proposals with `§` have to do with this issue. In any case, I think we have perfectly good constructs for slicing up arrays in complicated ways that fit into the semantics of the language. Introducing a DSL that does this on the LHS seems insufficiently motivated.
>
> In contrast, the `a, b... =` syntax would serve a common use case and mirror how function arguments work.
My proposal is simple to understand, but unconventional and limited to destructuring (therefore related to this issue); just:

```julia
a¬[3:4] = [1, 2, 3, 4, 5] # I think this is better
# or
¬a[3:4] = [1, 2, 3, 4, 5]
print(a) # [3, 4]
```

```julia
a = [1, 2, 3, 4, 5]
a[3:4]¬[1:2] = [6, 7, 8, 9]
# or
¬(a[3:4])[1:2] = [6, 7, 8, 9]
# a = [1, 2, 6, 7, 5]
```

```julia
a, b, c¬[3:4] = [1, 2, 3, 4, 5] # I think this is better
# or
a, b, ¬c[3:4] = [1, 2, 3, 4, 5]
# a = 1, b = 2
# c = [3, 4]
```

```julia
c = [6, 7, 8, 9]
a, b, c[1:2]¬[3:5] = [1, 2, 3, 4, 5]
# a = 1, b = 2
# c = [3, 4, 5, 8, 9]
```
Should of course be taken with a large grain of salt and I'm not saying we should just follow the majority opinion here, but I was interested in what people naturally expected this to do and did a quick survey on Slack:
I was especially surprised that so many people considered returning a tuple for arrays the most useful of all the options, since I would have imagined that returning a vector would be generally preferred. As discussed on the triage call, throwing an error if the rhs isn't a tuple until we have made up our minds about all the other cases might also be a very viable option.
Regarding other languages, I found Rust has something a bit like this, but as part of their more general match syntax. They only support slurping for arrays (no tuples, at least for now), with `[a, b @ ..] => ...`. `b` is then a "slice", which is their type for views, but slices are immutable by default, so you need to explicitly specify `mut` if `b` should be mutated afterwards. But since pattern matching is quite different from destructuring in Julia, I don't know whether that's really comparable.
Since we currently disallow vector expressions on the lhs of assignments, a more speculative proposal would be to support that syntax for destructuring as well, with the difference that `[a, b...] = itr` always collects the rest of `itr` into a vector, whereas for `(a, b...) = itr`, `b` is always a tuple. That still doesn't work for infinite iterators, but to me it seems that they are quite rare in real code, and I think it's reasonable to have to explicitly ask for the rest with `Iterators.rest` or `Iterators.drop` in those situations. @JeffBezanson, would be interested to hear your thoughts on that.
Interesting. I can see the case for collecting everything to tuples because that makes it as similar as possible to varargs. But I think that option is horribly NON-useful. It's giving special syntax to the operation "take the tail of this data structure and convert it to a tuple". Why would you have syntax for that? It's very slow and type-unstable for basically every case except tuples. The comparison to varargs is not as reasonable as it seems at first, because we always need to splat out function arguments into a virtual tuple first to inspect all of their types for dispatch. And indeed, splatting large collections is slow. It's a somewhat common performance trap. So trying to be like varargs here would be intentionally copying this negative aspect of the language.
> But since pattern matching is quite different from destructuring in Julia, I don't know whether that's really comparable.

I think it's nearly the same thing. Of course, Rust has different concerns about mutability that make it hard to copy directly, though.
Yes, given that, I think probably the best way forward here would be to go with E, i.e. throw an error for anything that's not a tuple, for 1.6, since returning a tuple here should be pretty uncontroversial and probably also the most common case people want to use this syntax for. That would enable us to revisit the other cases later on, once people have already used this syntax a bit, so perhaps we can make a better informed decision then.
The only question that would then remain would be what to lower this syntax to. We could add a method to `Base.tail` that also accepts an index to consume from, but perhaps a separate function that potentially also accepts an iteration state would be more future-proof and extensible and allow for clearer error messages. `Base.rest` may be too confusing, since we already have `Iterators.rest` and this would probably have a different API. Right now I called it `Base.slurp_rest`, but I am open to suggestions for better names/APIs.
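Roughly, the lowering could look like this sketch, with `slurp_rest` kept as a placeholder name (not a final API):

```julia
# What `a, b... = itr` might lower to: take the first element, then hand the
# iterator and its saved state to a rest-gathering function.
function lowered(itr)
    a, st = iterate(itr)
    b = slurp_rest(itr, st)
    return a, b
end

# A tuple-only method; a tuple's iteration state is the next index to read.
slurp_rest(t::Tuple, st::Int) = t[st:end]

lowered((1, 2, 3))   # (1, (2, 3))
```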
In case they're of any interest, here are some emails discussing these questions on a Python developer mailing list in 2007. Not sure how useful the Python perspective is, but thought they were interesting.
In order to pin down the semantics, it would be interesting to see what concrete semantics people want this to replace.
Eg I could imagine

```julia
a, b... = c
```

replacing

```julia
a, b = first(c), c[(begin+1:end)]
```

but also variations with `view`, dropping/keeping generalized indexing for `b` (eg if `c::OffsetVector`), etc.

It is not clear that any of these is preferable to the others. Because of this, I think that just using an explicit construct on the RHS is a reasonable alternative.
Yes, I agree that finding a semantic that works well for arbitrary array types and iterators is hard, but I think it would be a real shame to give up on this nice syntax altogether. https://github.com/JuliaDiff/ChainRulesCore.jl/issues/128#issuecomment-586716291 is just one example where this would be really useful if it worked at least for tuples. If we only allowed this syntax for tuples for now, I don't see how this would be problematic semantically.
Restricting to tuples would be somewhat confusing, as the `a, b, c = rhs` syntax works for all iterables.

If the user really wants tuples, why not just

```julia
f(t) = first(t), Base.tail(t) # please someone invent a snappy name for f
a, b = f(t)
```
> Restricting to tuples would be somewhat confusing, as the `a, b, c = rhs` syntax works for all iterables.

It wouldn't be a syntax error; it would just error because the analog of `Base.tail` is not defined for arbitrary iterables, which seems reasonable to me, since the latter also only works for tuples.
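That mirrors how `Base.tail` itself behaves today:

```julia
julia> Base.tail((1, 2, 3))
(2, 3)

julia> Base.tail([1, 2, 3])
ERROR: MethodError: no method matching tail(::Vector{Int64})
```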
> If the user really wants tuples, why not just
>
> ```julia
> f(t) = first(t), Base.tail(t) # please someone invent a snappy name for f
> a, b = f(t)
> ```

Sure, but you could make exactly the same argument against pretty much any syntax feature. I think what `a, b... = t` does should be immediately obvious to anyone familiar with how splatting and slurping work for function calls. Especially in function signatures, like in @oxinabox's example, I just find it easier to figure out what the function is doing using the slurping syntax than using `Base.tail`. In that example, this change would really make writing `frule`s using ChainRulesCore.jl more intuitive and more consistent with `rrule` for people wanting to write new rules.
> it will just error because the analog of `Base.tail` is not defined for arbitrary iterables, which seems reasonable to me, since the latter also only works for tuples

I understand that you have a specific use case in mind, but from the discussion it seems that others have a different one (ie it should work for `AbstractVector`), and clarifying what the user expectations are would be useful.
One great feature of the current destructuring is that it just works for anything iterable, loosely coupling syntax and types via the iteration interface.
Introducing `a, b... = c` requires taking a stand on how `c` maps to `b`. Eg

- Saying that only `c::Tuple` is allowed and `b = Base.tail(c)` is one option; it plays well with types but happens to be restrictive, especially with the original proposal in mind.
- Making `b` equivalent to `collect(c)[2:end]` is another option, but it isn't nice for users who want to destructure tuples.
- Asking that the invariant `(a, b...) == (c...,)` (or similar) is maintained and allowing `b` to be any iterable for which this holds is also an option, which could accommodate tuples and anything iterable. Perhaps this could be done by lowering this syntax to a function that users can define methods for (basically `f` above), as sketched below.
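A sketch of that last option, with the hypothetical name `headrest` standing in for the overloadable function:

```julia
# Each method maintains (a, b...) == (c...,) while letting the container
# choose the type of b.
headrest(c::Tuple) = first(c), Base.tail(c)               # b stays a tuple
headrest(c::AbstractVector) = first(c), c[begin+1:end]    # b stays a vector
headrest(c) = Iterators.peel(c)                           # generic lazy fallback

a, b = headrest((1, 2, 3))   # b == (2, 3)
a, b = headrest([1, 2, 3])   # b == [2, 3]
```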
> Perhaps this could be done with lowering this syntax to a function that users can define methods for (basically f above).

A good candidate might be `peel`, an overloadable/non-lazy equivalent of `Iterators.peel`.
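For reference, the lazy version already exists in `Base.Iterators`:

```julia
julia> a, rest = Iterators.peel([1, 2, 3]);

julia> a
1

julia> collect(rest)
2-element Vector{Int64}:
 2
 3
```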
> A good candidate might be peel, an overloadable/non-lazy equivalent of Iterators.peel.

`peel` is probably not the best API here, since we don't always want to take just one element from the front. I think to be most friendly to constant propagation, this function should probably accept an iteration state as well as the number of elements already consumed from the front, similar to how `Base.indexed_iterate` works. I basically implemented this in #37410 as `slurp_rest`, just with the exception that it only ever produces tuples.
Would it make sense to introduce this functionality as a `@slurp` macro in the next version, and wait and see how it is received before adding the new syntax? This could generate a lot of useful feedback from users regarding the most sensible semantics before making it officially part of the syntax, which would be much more difficult to change/deprecate later.
Yes, and in a package.