Thanks for creating a separate issue for this.

To be honest, I'm not sure we need explicit syntax for this. It's more of an (important) implementation detail than a user-facing feature, no? But if one does want to make the syntax explicit, then I suggest putting something like impl enum Trait in the function signature.

alexreg on 23 Apr 2018

👍13 👎2

To be honest, I'm not sure we need explicit syntax for this. It's more of an (important) implementation detail than a user-facing feature, no? But if one does want to make the syntax explicit, then I suggest putting something like impl enum Trait in the function signature.

@alexreg one reason to make it explicit is that it does have performance implications: each time you call a method, there has to be a branch to call into the current type (hmm, unless these were implemented as some form of stack-box with a vtable instead; either way, that still changes the performance). My first thought was also to make it a modifier on the impl Trait syntax to avoid having to repeat it on each return (impl sum Trait, as an insider pun on the some Trait keyword proposed as a replacement for impl). But, as you mention, this is an implementation detail, so it should not be exposed in the signature (could just have rustdoc hide it, I suppose).

Nemo157 on 23 Apr 2018

👍7

I might be wrong, but wouldn't the |...| cause parsing ambiguities, since it's already used for closures?

Pauan on 23 Apr 2018

@Pauan Oh indeed, I was thinking EXPR { } was not valid syntax, but that's not the case in e.g. if. Then again, the |...| syntax should not be allowed in if conditions anyway, but forbidding it would complicate the parser for no reason.

@Nemo157, @alexreg The issue with putting this as a modifier in the type signature is the fact it wouldn't work well inside a function:

fn bar() -> Option<LinkedList<char>> { /* ... */ }
// This is allowed
fn foo() -> impl enum Iterator<Item = char> {
    match bar() {
        Some(x) => x.iter(),
        None => "".iter(),
    }
}
// Either this is not allowed, or the loss in performance is not explicit
fn foo() -> impl enum Iterator<Item = char> {
    let mut tmp = match bar() {
        Some(x) => x.iter(),
        None => "".iter(),
    };
    let n = tmp.next();
    match n {
        Some(_) => tmp,
        None => "foo bar".iter(),
    }
}
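
For comparison, here is roughly what the first `foo` has to look like when written by hand in today's Rust. This is only a sketch under some assumptions: the enum name `ListOrStr` is made up, `bar` gets a placeholder body and a `give_list` parameter so that both branches can be exercised, and `into_iter()`/`chars()` stand in for the `.iter()` calls above so that both branches really yield `char`.

```rust
use std::collections::{linked_list, LinkedList};
use std::str::Chars;

// Hypothetical stand-in for the anonymous sum type the proposal would generate.
enum ListOrStr {
    List(linked_list::IntoIter<char>),
    Str(Chars<'static>),
}

impl Iterator for ListOrStr {
    type Item = char;

    // Every call branches on the active variant before forwarding.
    fn next(&mut self) -> Option<char> {
        match self {
            ListOrStr::List(it) => it.next(),
            ListOrStr::Str(it) => it.next(),
        }
    }
}

// Placeholder body (assumption): the thread leaves `bar` unspecified.
fn bar(give_list: bool) -> Option<LinkedList<char>> {
    if give_list {
        Some("ab".chars().collect())
    } else {
        None
    }
}

fn foo(give_list: bool) -> impl Iterator<Item = char> {
    match bar(give_list) {
        Some(x) => ListOrStr::List(x.into_iter()),
        None => ListOrStr::Str("".chars()),
    }
}
```

The enum, the `Iterator` impl and the wrapping at each branch are the boilerplate that an `impl enum Iterator` marker would take over.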

Ekleog on 23 Apr 2018

Haven't you just invented dynamic dispatch??

est31 on 23 Apr 2018

Yeah, fair point about the performance hit. It’s a small one, but it wouldn’t be in the spirit of Rust to hide it from the user syntactically.

alexreg on 23 Apr 2018

@Ekleog you can use the exact same syntax for inside a function, somewhere you need to mention the trait that you're generating a sum type for anyway:

fn foo() -> impl enum Iterator<Item = char> {
    let mut tmp: impl enum Iterator<Item = char> = match bar() {
        Some(x) => x.iter(),
        None => "".iter(),
    };
    let n = tmp.next();
    match n {
        Some(_) => tmp,
        None => "foo bar".iter(),
    }
}

@est31 a constrained form of dynamic dispatch that could potentially get statically optimized if the compiler can prove only one or the other case is hit. Or, as I briefly mentioned above it could be possible for this to be done via a union for storage + a vtable for implementation, giving the benefits of dynamic dispatch without having to use the heap. (Although, if you have wildly different sizes for the different variants then you pay the cost in always using the size of the largest.)

One thing that I think might be important is to benchmark this versus just boxing and potentially have a lint recommending switching to a box if you have a large number of variants (I'm almost certain that a 200 variant switch would be a lot slower than dynamically dispatching to one of 200 implementors of a trait, but I couldn't begin to guess at what point the two crossover in call overhead, and there's the overhead of allocating the box in the first place).
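
To make the two candidates for such a benchmark concrete, here is a minimal sketch of the same function written once with a hand-rolled enum and once with a box. The names `Either`, `enum_version` and `boxed_version` are illustrative only, and the sketch says nothing about which version is faster; it only shows the difference in storage.

```rust
use std::iter::{Empty, Once};
use std::mem::size_of;

// Hypothetical two-variant sum type over two iterator types.
enum Either {
    A(Once<u32>),
    B(Empty<u32>),
}

impl Iterator for Either {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        match self {
            Either::A(it) => it.next(),
            Either::B(it) => it.next(),
        }
    }
}

// Enum version: no allocation, dispatch is a branch on the discriminant.
fn enum_version(flag: bool) -> impl Iterator<Item = u32> {
    if flag {
        Either::A(std::iter::once(7))
    } else {
        Either::B(std::iter::empty())
    }
}

// Boxed version: one heap allocation, dispatch goes through a vtable.
fn boxed_version(flag: bool) -> Box<dyn Iterator<Item = u32>> {
    if flag {
        Box::new(std::iter::once(7))
    } else {
        Box::new(std::iter::empty())
    }
}

// The enum always reserves room for its largest variant.
fn enum_is_at_least_largest_variant() -> bool {
    size_of::<Either>() >= size_of::<Once<u32>>().max(size_of::<Empty<u32>>())
}

// The box is always a fat pointer: data pointer plus vtable pointer.
fn box_is_two_words() -> bool {
    size_of::<Box<dyn Iterator<Item = u32>>>() == 2 * size_of::<usize>()
}
```

A real comparison would time `next()` on both versions with a benchmark harness and a varying number of variants.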

Nemo157 on 23 Apr 2018

👍4

@Nemo157 Thanks for explaining things better than I could!

I'd just have a small remark about your statement: I don't think a 200-variant switch would be a lot slower than dynamic dispatch: the switch should be codegen'd as a jump table, which would give something like (it has been a while since I last wrote assembly, so I'm not sure about the exact syntax) mov rax, 0xBASE_JUMP_TABLE(ENUM_DISCRIMINANT,3); jmp rax, while the dynamic dispatch would look like mov rax, METHOD_INDEX(VTABLE); jmp rax. As BASE_JUMP_TABLE and METHOD_INDEX are constants, they're hardcoded in the assembly, and so both cases end up being 1/ a memory load (either in JUMP_TABLE or VTABLE, so there could be cache impact here depending on whether you always call different methods on the same object or whether you always call the same method on different objects), and 2/ a jump (with a dependency between these instructions).

So the mere number of implementors shouldn't matter much in evaluating the performance of this dispatch vs. a box. The way of using them does have an impact, but this will likely be hard to evaluate from the compiler's perspective.

However, what may raise an issue about performance is nesting of such sum types: if you have a sum type of a sum type of etc., then you're going to lose quite a bit of time going through all these jump tables. But the compiler may detect that one member of the sum type is another sum type and just flatten the result, so I guess that's more a matter of implementation than specification? :)

Ekleog on 23 Apr 2018

👍8

a constrained form of dynamic dispatch that could potentially get statically optimized if the compiler can prove only one or the other case is hit.

LLVM is capable of doing devirtualisation.

as I briefly mentioned above it could be possible for this to be done via a union for storage + a vtable for implementation, giving the benefits of dynamic dispatch without having to use the heap

That's a point admittedly. Dynamically sized stack objects are a possibility but they have certain performance disadvantages.

est31 on 23 Apr 2018

Is there anything wrong with the bike shed color -> enum Trait? There are still folk who want impl Trait to be replaced by some Trait and any Trait so maybe just enum Trait catches the "make this for me" better.

I think procedural macros could generate this right now. If coercions develop further, then doing so might even become quite simple. Right now, there are Either types that do this for specific traits like Iterator and Future. I'm unsure why they do not even implement From.

burdges on 24 Apr 2018

👍1

So if we are going to start painting the bike shed (there seems to be little opposition right now, even though it has only been like a day since initial posting), I think there is a first question to answer:

Should the indication of the fact the return is an anonymous sum type lie in the return type or at the return sites?

i.e.

fn foo(x: T) -> MARKER Trait {
    match x {
        Bar => bar(),
        Baz => baz(),
        Quux => quux(),
    }
}
// vs.
fn foo(x: T) -> impl Trait {
    match x {
        Bar => MARKER bar(),
        Baz => MARKER baz(),
        Quux => MARKER quux(),
    }
}

Once this question has been answered, we'll be able to think about the specifics of what MARKER should be.

Ekleog on 24 Apr 2018

❤3

So, now, my opinion: I think the fact the return is an anonymous sum type should lie at the return site, for two reasons:

- mostly because the caller has no need of knowing that the return type is an anonymous sum type, only that it's some type that implements Trait
- but also because it will have a nicer syntax when used inside a function with something that is not actually a return, but eg. let x = match or similar (depending on how effective type inference of the result will be, I think taking the intersection of the traits matched by all the return branches should do it, but…)

On the other hand, the only argument I could think of in favor of putting the marker in the return type is that it makes for less boilerplate, but I'm not really convinced, so I'm maybe not pushing forward the best arguments.

Ekleog on 24 Apr 2018

👍5

@Ekleog Yeah, I'm with you on return site actually, even though I proposed the return type syntax above. As you say, it reflects the fact that it's more of an implementation detail that consumers of the function don't need to (shouldn't) care about. Also, I think the analogy to box syntax is a good one, and it would be used in a similar way superficially.

alexreg on 24 Apr 2018

I think procedural macros could generate this right now.

For a closed set of traits, sure, but to allow this to be used for any trait requires compiler support for getting the methods of the traits. Delegation plus some of its extensions might enable this to be fully implemented as a procedural macro.

I'm tempted to try and write a library or procedural macro version of this, I am currently manually doing this for io::{Read, Write} so even if it only supports those traits it would be useful for me. If it works that could be a good way to get some experience with using it in real code to inform the RFC.
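
As an illustration of the manual version for `io::Read`, here is a minimal sketch. The names `EitherReader`, `open` and `read_all` are hypothetical, and only the required `read` method is forwarded; a real library would probably forward the provided methods as well.

```rust
use std::io::{self, Read};

// Hypothetical hand-written sum type over two readers.
enum EitherReader<A, B> {
    A(A),
    B(B),
}

impl<A: Read, B: Read> Read for EitherReader<A, B> {
    // Only the required method is forwarded; the provided methods
    // (read_to_string, read_exact, ...) fall back to their defaults.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            EitherReader::A(r) => r.read(buf),
            EitherReader::B(r) => r.read(buf),
        }
    }
}

// Picks a reader at run time without boxing.
fn open(use_bytes: bool) -> impl Read {
    if use_bytes {
        EitherReader::A(&b"hello"[..])
    } else {
        EitherReader::B(io::empty())
    }
}

// Small helper that drains whichever reader `open` picked.
fn read_all(use_bytes: bool) -> io::Result<String> {
    let mut out = String::new();
    open(use_bytes).read_to_string(&mut out)?;
    Ok(out)
}
```

A macro for a closed set of traits would essentially stamp out impls of this shape, one per supported trait.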

Nemo157 on 24 Apr 2018

👍1

Various forms of enum impl Trait sugar for autogenerated anonymous enums have been discussed a few times in the past. For some reason I can't find a centralized discussion of them now, but I think the points that came up before but haven't been fully raised yet are:

1. The original motivation in past discussions was to make it possible to introduce a new error type to a function without it "virally infecting"/requiring code changes up the entire call stack. I still believe that's a far more compelling benefit than anything to do with performance (after all, this is just sugar over defining your own enum).
2. It's probably better to put the marker for this feature on the signature, i.e. fn foo(x: T) -> MARKER Trait
   - As far as I know, the only options here are one marker in the signature, and one marker at every return point. A marker for every return point is just too noisy (especially when the return point is just a ?), and seems to go against the motivation of making it trivial to introduce a new error type.
   - The main argument against having it in the signature is that it shouldn't and doesn't affect the function's API/contract, so it's not something clients should know about. While a good rule of thumb, it's been pointed out in the past that this is already not an ironclad rule (I believe mut on arguments was the counterexample that's stable today), so rustdoc having to hide the marker isn't a new layer of complexity.
3. The syntax probably should include impl Trait because "returning an impl Trait" is the function's API/contract as far as clients are concerned. Hence, I slightly prefer enum impl Trait over enum Trait or impl enum Trait.
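
The error-type motivation can be sketched with what one writes by hand today. The names `ParseError` and `parse` are made up for illustration, and the enum is named rather than anonymous; the proposal would, roughly speaking, generate the enum and the `From` impls.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;
use std::str::Utf8Error;

// Hypothetical hand-written version of the anonymous error enum.
#[derive(Debug)]
enum ParseError {
    Utf8(Utf8Error),
    Int(ParseIntError),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::Utf8(e) => write!(f, "{}", e),
            ParseError::Int(e) => write!(f, "{}", e),
        }
    }
}

impl Error for ParseError {}

// `?` converts errors through `From`, so each variant needs one impl.
impl From<Utf8Error> for ParseError {
    fn from(e: Utf8Error) -> Self {
        ParseError::Utf8(e)
    }
}

impl From<ParseIntError> for ParseError {
    fn from(e: ParseIntError) -> Self {
        ParseError::Int(e)
    }
}

// Adding a third fallible step here means a new variant and a new `From`
// impl, but no change for callers that only rely on `Error`.
fn parse(bytes: &[u8]) -> Result<i32, ParseError> {
    let text = std::str::from_utf8(bytes)?;
    let n = text.trim().parse::<i32>()?;
    Ok(n)
}
```

With a marker at every return point, each `?` above would need annotating, which is the noise the second point objects to.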

I believe the main thing preventing these discussions from going anywhere is that making it even easier to use impl Trait further accentuates the only serious problem with impl Trait: that it encourages making types unnameable. But now that the abstract type proposal exists, I don't personally think that's a big concern anymore.

@Nemo157 Hmm, what delegation extensions do you think we'd need? The "desugaring" I've always imagined is just a delegate *, so we'd need enum delegation (technically an extension iirc), and then whatever it takes for a delegate * to "just work" on most traits, which might mean delegation of associated types/constants, and I think that's it.

Though I think we should probably not block this on delegation, since it could just be compiler magic.

Ixrec on 24 Apr 2018

👍4

From the current delegation RFC the extensions required are "delegating for an enum where every variant's data type implements the same trait" (or "Getter Methods" + "Delegate block") and "Delegating 'multiple Self arguments' for traits like PartialOrd" (although, this could be implemented without it and this feature would be in the same state as normal delegation until it's supported).

One thing I just realised is that delegation won't help with unbound associated types, required to support use cases like:

#[enumified]
fn foo() -> impl IntoIterator<Item = u32> {
    if true {
        vec![1, 2]
    } else {
        static values: &[u32] = &[3, 4];
        values.iter().cloned()
    }
}

would need to generate something like

```rust
use std::{iter, slice, vec};

enum Enumified_foo_IntoIterator {
    A(Vec<u32>),
    B(iter::Cloned<slice::Iter<'static, u32>>),
}

enum Enumified_foo_IntoIterator_IntoIter_Iterator {
    A(vec::IntoIter<u32>),
    B(iter::Cloned<slice::Iter<'static, u32>>),
}

impl IntoIterator for Enumified_foo_IntoIterator {
    type Item = u32;
    type IntoIter = Enumified_foo_IntoIterator_IntoIter_Iterator;

    fn into_iter(self) -> Self::IntoIter {
        match self {
            Enumified_foo_IntoIterator::A(a)
                => Enumified_foo_IntoIterator_IntoIter_Iterator::A(a.into_iter()),
            Enumified_foo_IntoIterator::B(b)
                => Enumified_foo_IntoIterator_IntoIter_Iterator::B(b.into_iter()),
        }
    }
}

impl Iterator for Enumified_foo_IntoIterator_IntoIter_Iterator {
    // ...
}
```

Nemo157 on 24 Apr 2018

@Ixrec Oh indeed, I didn't think of the Result<Foo, impl ErrorTrait> use case (my use case being really futures and wanting not to allocate all the time), that would make a syntax with the marker at return site much less convenient, with ?.

However, this could be “fixed” by piping ?'s definition through sed 's/return/return MARKER/'.

This wouldn't change the behaviour for existing code (as adding MARKER when there is a single variant would be a no-op). However, an argument could be raised that it reduces performance without an explicit marker; but for ? that ship has sailed, since it already uses From magically.

So I still think that the ease of using MARKER through type-inferred branches (like let x = match { ... }), and not only at typed boundaries (like function-level return or let x: MARKER Trait = match { ... }), is a net enough win in favor of MARKER at the return site, with an implicit MARKER for ?.

However, as a downside of MARKER at return site, I must add that having it at return site will likely make implementation harder: what is the type of { MARKER x }? I'd say it could be handled by having a “TypeInProgress” type that would slowly be intersected with all the other types, and MARKER basically lifts Type to TypeInProgress(Type), and then typing rules follow, eg.

fn bar() -> bool {…}
struct Baz {} fn baz() -> Baz {…}
struct Quux {} fn quux() -> Quux {…}
struct More {} fn more() -> More {…}

trait Trait1 {} impl Trait1 for Baz {} impl Trait1 for Quux {} impl Trait1 for More {}
trait Trait2 {} impl Trait2 for Baz {} impl Trait2 for Quux {}

fn foo() -> impl Trait1 {
    let x = match bar() {
        true => MARKER baz(), //: TypeInProgress(Baz)
        false => MARKER quux(), //: TypeInProgress(Quux)
    }; //: TypeInProgress(Baz, Baz) & TypeInProgress(Quux, Quux)
    // = TypeInProgress(Trait1 + Trait2, enum { Baz, Quux })
    // (all the traits implemented by both)
    if bar() {
        MARKER x //: TypeInProgress(Trait1 + Trait2, enum { Baz, Quux })
    } else {
        MARKER more() //: TypeInProgress(More, More)
    } // TypeInProgress(Trait1 + Trait2, enum { Baz, Quux }) & TypeInProgress(More, More)
    // = TypeInProgress(Trait1, enum { Baz, Quux, More })
    // (all the traits implemented by both)
}

And, once this forward-running phase has been performed, the actual type can be observed (ie. here enum { Baz, Quux, More }), generated, and backwards-filled into all the TypeInProgress placeholders.
Obviously, this requires that at the time of MARKER, the type is already known.

On the other hand, for a return-type-level MARKER setup (marker in the signature or binding), the scheme I can think of is the following:

// (skipping the same boilerplate)

fn foo() -> MARKER Trait1 {
    let x: MARKER Trait1 = match bar() {
        true => baz(),
        false => quux(),
    }; // Here we need to infer MARKER Trait1.
    // Observing all the values that come in, we see it must be enum { Baz, Quux }
    if bar() {
        x
    } else {
        more()
    } // Here we need to infer MARKER Trait1.
    // Observing all the values that come in, it must be enum { enum { Baz, Quux }, More }
}

I personally find the syntax of the second example less convenient (it forces writing down exactly which trait(s) we want to have, not letting type inference do its job) and the end-result less clean (two nested enums will be harder to optimize). It also will likely not detect common subtrees, eg. if the more() of the else branch was replaced by if bar() { more() } else { baz() }, the first type inference algorithm would infer enum { Baz, Quux, More }, because TypeInProgress(Trait1, enum { Baz, Quux, More }) is the intersection of TypeInProgress(Trait1 + Trait2, enum { Baz, Quux }) and TypeInProgress(Trait1, enum { More, Baz }) ; while the second type inference algorithm would be forced to complete the typing at the let x: MARKER Trait1, and would thus have to infer enum { enum { Baz, Quux }, enum { More, Baz } } (or maybe enum { enum { Baz, Quux }, More, Baz }, and in either case hitting exactly the performance issue of nesting sum types discussed before).

What do you think about the idea of having ? return MARKER x.unwrap_err()? (I agree with you that without it it's most likely best to have MARKER at the return type)

Ekleog on 25 Apr 2018

I personally find the syntax of the second example less convenient (it forces writing down exactly which trait(s) we want to have, not letting type inference do its job)

For me, the primary use case is returning an enum impl Trait from a function, in which case you already need to write down the traits you want in the signature. So I never saw this as a disadvantage. In fact, it didn't even occur to me to apply enum impl Trait on a regular let binding until today.

I'm not sure I understand the sentiment behind "not letting type inference do its job". Both variations of this idea involve us writing an explicit MARKER to say we want an autogenerated anonymous enum type. In both cases, type inference is only gathering up all the variants for us, and never inferring the need for an anon enum type in the first place. In both cases, the variable x needs to have a type of some kind, at least for type-checking purposes, and in both cases that type may very well disappear during compilation/optimization. So, what's the job that type inference doesn't do in the MARKER-on-signature case?

and the end-result less clean (two nested enums will be harder to optimize). It also will likely not detect common subtrees

I'm not sure I buy either of these claims. To the extent that "detecting common subtrees" is important, I would expect the existing enum layout optimizations to effectively take care of that for free. We probably need an actual compiler dev to comment here, but my expectation would be that the actual optimization-inhibiting difficulties would come from having "all traits" implemented by the anon enums, instead of just the traits you need.

And to me, the autogenerated anonymous enum type implementing more traits than I need/want it to is "less clean". I guess that's one of those loaded terms that's not super helpful.

I'm not seeing the significance of the TypeInProgress stuff; that seems like machinery you could use in implementing either variation of the syntax, and I don't see what it buys you other than perhaps guaranteeing the type of x goes away. This is probably another thing we need compiler people to comment on, but trying to make some variables not have a type sounds to me like it would be a non-starter, its motivation better addressed by optimizations after type-checking, and entirely orthogonal to the question of what the surface syntax should be anyway.

What do you think about the idea of having ? return MARKER x.unwrap_err()? (I agree with you that without it it's most likely best to have MARKER at the return type)

I think "the idea of having ? return MARKER x.unwrap_err()" is also strictly an implementation detail that's not really relevant to the surface syntax debate, especially since ? is already more than just sugar over a macro.

To clarify, I believe the real, interesting issue here is whether we want these anonymous enum types to implement only the traits we explicitly ask for, or all the traits they possibly could implement. Now that this question has been raised, I believe it's the only outstanding issue that really needs to get debated to make a decision on whether MARKER goes at every return site or only once in the signature/binding.

My preference is of course for the traits to be listed explicitly, since I believe the primary use case to be function signatures where you have to list them explicitly anyway, and I also suspect that auto-implementing every possible trait could lead to unexpected type inference nuisances, or runtime behavior, though I haven't thought about that much.

Let's make the type inference nuisance thing concrete. Say Trait1 and Trait2 both have a foo method, and types A and B both implement both traits. Then you want to write a function that, as in your last two examples, returns enum impl Trait1 and has a let binding on a match with two branches. If we go with your variation, the let binding infers the equivalent of enum impl Trait1+Trait2 and a foo() call later in the function becomes ambiguous, while in my variation you have to explicitly write enum impl Trait1 so a call to foo() just works. That's a real disadvantage of auto-implementing all possible traits, right?
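
The ambiguity can be reproduced in today's Rust without any new syntax. In this sketch the trait names match the ones above, while the struct `A` and the helper functions are made up for illustration.

```rust
trait Trait1 {
    fn foo(&self) -> &'static str;
}

trait Trait2 {
    fn foo(&self) -> &'static str;
}

struct A;

impl Trait1 for A {
    fn foo(&self) -> &'static str {
        "Trait1"
    }
}

impl Trait2 for A {
    fn foo(&self) -> &'static str {
        "Trait2"
    }
}

// With the concrete type visible, both traits apply, so a plain `a.foo()`
// is rejected as ambiguous (error E0034) and the call must be qualified.
fn concrete(a: &A) -> (&'static str, &'static str) {
    (Trait1::foo(a), Trait2::foo(a))
}

// Behind `impl Trait1`, only Trait1's methods are visible.
fn only_trait1() -> impl Trait1 {
    A
}

// Plain method syntax works here because the opaque type hides Trait2.
fn opaque() -> &'static str {
    only_trait1().foo()
}
```

A binding whose inferred type implements both traits behaves like `concrete`; an explicit `enum impl Trait1` annotation would behave like `opaque`.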

Ixrec on 25 Apr 2018

I think "the idea of having ? return MARKER x.unwrap_err()" is also strictly an implementation detail that's not really relevant to the surface syntax debate, especially since ? is already more than just sugar over a macro.

Well, I added it to answer your concern that it would be painful to have to add MARKER at return sites like ? :)

Let's make the type inference nuisance thing concrete. Say Trait1 and Trait2 both have a foo method, and types A and B both implement both traits. Then you want to write a function that, as in your last two examples, returns enum impl Trait1 and has a let binding on a match with two branches. If we go with your variation, the let binding infers the equivalent of enum impl Trait1+Trait2 and a foo() call later in the function becomes ambiguous, while in my variation you have to explicitly write enum impl Trait1 so a call to foo() just works. That's a real disadvantage of auto-implementing all possible traits, right?

That's true. However, the same could be said with regular types: if I return a single value (so no MARKER anywhere), that implements Trait1 + Trait2 and put it in an un-typed variable, then calls to foo() later will be ambiguous. So that's consistent with what we currently have, and I don't think that's a real disadvantage: it's still possible to explicitly type with return-site marker, if you want to implement only a single trait and/or type inference fails: let x: impl Trait1 = if foo() { MARKER bar() } else { MARKER baz() } (the marking of a specific type would “close” the TypeInProgress type and realize it)

I'm not seeing the significance of the TypeInProgress stuff; that seems like machinery you could use in implementing either variation of the syntax, and I don't see what it buys you other than perhaps guaranteeing the type of x goes away.

Well, apart from the end-result being cleaner (and I don't think enum layout optimizations could optimize enum { enum { Foo, Bar }, Bar, Quux } into enum { Foo, Bar, Quux }, at least with named enums, as the tag could have significance), I don't know about rustc specifically, but typing is usually done on the AST. And on an AST, I think it'd be easier to go forward and slowly complete the type of a variable, than to try to go backwards from the return point to the return sites, and from there check all the possible types that could be returned.

Actually, I'd guess that's how rustc currently does type inference:

fn foo() -> Vec<u8> {
    let res = Vec::new(); //: TypeInProgress(Vec<_>)
    bar();
    res // Here we know it must be Vec<u8>, so the _ from above is turned into u8
}

This is probably another thing we need compiler people to comment on, […]

Completely agree with you on this point :)

Ekleog on 25 Apr 2018

Would it be practical to use a procedural macro to derive a specialized iterator for each word? (It seems possible, but a little verbose)
~~~ rust
#[derive(IntoLetterIter)]
#[IntoLetterIterString = "foo"]
struct Foo;

#[derive(IntoLetterIter)]
#[IntoLetterIterString = "hello"]
struct Hello;

fn foo(x: bool) -> impl IntoIterator {
    if x {
        Foo
    } else {
        Hello
    }
}
~~~

nielsle on 2 May 2018

I'm concerned with the degree to which this seems to combine the implementation details of this specific optimization with the code wanting to use that optimization. It seems like, despite impl Trait itself being a relatively new feature, we're talking about extending it to include a form of reified vtables as an optimization, and exposing that particular choice of optimization with new syntax. And we're doing that without any performance numbers to evaluate that optimization.

I also wonder to what degree we could detect the cases where this makes sense (e.g. cases where we can know statically which impl gets returned) and handle those without needing the hint. If the compiler is already considering inlining a function, and it can see that the call to the function will always result in the same type implementing the Trait, then what prevents it from devirtualizing already?

I'd suggest, if we want to go this route, that we need 1) an implementation of this that doesn't require compiler changes, such as via a macro, 2) benchmarks, and 3) some clear indication that we can't already do this with automatic optimization. And even if we do end up deciding to do this, I'd expect it to look less like a marker on the return type or on the return expressions, and more like an #[optimization_hint] of some kind, similar to #[inline]

joshtriplett on 2 May 2018

Just to add my thoughts to this without clutter, here is my version of the optimization: https://internals.rust-lang.org/t/allowing-multiple-disparate-return-types-in-impl-trait-using-unions/7439
Automatically generating an enum is one way to devirtualize, but without inlining a lot of redundant match statements would be generated.
I'm interested in seeing what performance gains can be gleaned from this, if any.

I think that automatic sum type generation should be left to procedural macros

maplant on 2 May 2018

@joshtriplett I don’t believe the only reason to want this is as an optimisation. One of the major reasons I want this is to support returning different implementations of an interface based on runtime decisions without requiring heap allocation, for use on embedded devices. I have been able to avoid _needing_ this by sticking to compile time decisions (via generics) and having a few manually implemented delegating enums, but if this were supported via the language/a macro somehow that would really expand the possible design space.

I do agree that experimenting with a macro (limited to a supported set of traits, since it’s impossible for the macro to get the trait method list) would be the way to start. I’ve been meaning to try and throw something together myself, but haven’t found the time yet.
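
As a rough idea of what a first experiment could look like, here is a declarative-macro sketch limited to a single supported trait, `Iterator`. The macro name `iterator_enum` and its invocation syntax are invented for this example; a procedural macro could offer a nicer surface, but would face the same limitation of not being able to see a trait's method list.

```rust
// Hypothetical declarative macro: generates a sum type over the listed
// iterator types and forwards `Iterator::next` to the active variant.
macro_rules! iterator_enum {
    ($name:ident<$item:ty> { $($variant:ident($inner:ty)),+ $(,)? }) => {
        enum $name {
            $($variant($inner)),+
        }

        impl Iterator for $name {
            type Item = $item;

            fn next(&mut self) -> Option<$item> {
                match self {
                    $($name::$variant(it) => it.next()),+
                }
            }
        }
    };
}

iterator_enum!(Numbers<u32> {
    Range(std::ops::Range<u32>),
    Owned(std::vec::IntoIter<u32>),
});

// The caller only sees `impl Iterator`; the variant is chosen at run time.
fn numbers(small: bool) -> impl Iterator<Item = u32> {
    if small {
        Numbers::Range(0..3)
    } else {
        Numbers::Owned(vec![10, 20].into_iter())
    }
}
```

The variant types still have to be spelled out by hand here, which is exactly what falls apart for unnameable types such as closures or other `impl Trait` values.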

Nemo157 on 2 May 2018

@joshtriplett to address part of your comment, i.e. benchmarks, I created a repository that uses my method and benchmarks it against Box. Although I only have one test case and it is somewhat naive, it seems that my method is about twice as fast as Box. Repo here: https://github.com/DataAnalysisCosby/impl-trait-opt

maplant on 2 May 2018

@Nemo157 I don't think you need heap allocation to use -> impl Trait, with or without this optimization.

But in any case, I would hope that if it's available as an optimization hint, it would have an always version just like inline does.

joshtriplett on 3 May 2018

@joshtriplett Let's look at this example (here showing what we want to do):

trait Trait {}
struct Foo {} impl Trait for Foo {}
struct Bar {} impl Trait for Bar {}

fn foo(x: bool) -> impl Trait {
    if x {
        Foo {}
    } else {
        Bar {}
    }
}

(playground)

This doesn't build. In order to make it build, I have a choice: either make it a heap-allocated object:

fn foo(x: bool) -> Box<Trait> {
    if x {
        Box::new(Foo {})
    } else {
        Box::new(Bar {})
    }
}

(playground)

Or I do it with an enum:

enum FooBar { F(Foo), B(Bar) }
impl Trait for FooBar {}
fn foo(x: bool) -> impl Trait {
    if x {
        FooBar::F(Foo {})
    } else {
        FooBar::B(Bar {})
    }
}

(playground)

The aim of this idea is to make the enum solution actually usable without a lot of boilerplate.

Is there another way to do this without heap allocation that I'd have missed?

As for the idea of making it an optimization, do you mean “just return a Box and have the compiler optimize-box-away(always)”? If so, how would it handle no_std systems, that don't (IIRC, my last use of such a system was ~a year ago) actually have Box::new?

Ekleog on 3 May 2018

👍2

@Ekleog Ah, thank you for the clarification; I see what you're getting at now.

joshtriplett on 3 May 2018

Regarding the third playground example, you can use derive_more to derive Foo.into(), or alternatively you can use derive-new to derive a constructor for FooBar. These libraries do not solve the complete problem in the RFC, but they may help a little.
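
For reference, this is roughly the kind of `From` impl such a derive would save you from writing, done by hand here with no external crate. The helper `pick` is made up so that the chosen variant can be inspected; the rest mirrors the third playground example.

```rust
trait Trait {}

struct Foo {}
impl Trait for Foo {}

struct Bar {}
impl Trait for Bar {}

enum FooBar {
    F(Foo),
    B(Bar),
}
impl Trait for FooBar {}

// Hand-written conversions, one per variant.
impl From<Foo> for FooBar {
    fn from(f: Foo) -> Self {
        FooBar::F(f)
    }
}

impl From<Bar> for FooBar {
    fn from(b: Bar) -> Self {
        FooBar::B(b)
    }
}

// Hypothetical helper returning the named enum.
fn pick(x: bool) -> FooBar {
    if x {
        FooBar::from(Foo {})
    } else {
        FooBar::from(Bar {})
    }
}

// Same signature as in the playground example.
fn foo(x: bool) -> impl Trait {
    pick(x)
}
```

This removes the need to name the variants at each return site, but the enum and its `Trait` impl still have to be written by hand.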

AFAICS a procedural macro of the following form could potentially solve the complete problem:
~~~ rust
#[derive(IntoLetterIter)]
enum FooBar {
    #[format = "foo"]
    Foo,
    #[format = "hello"]
    Hello,
}
~~~

nielsle on 3 May 2018

Quick question: What does this proposal look like at the call site?

fn foo(x: bool) -> impl Iterator<Item = u8> { ... } // Uses what is proposed here

fn main() {
    foo(true).next(); // Usage like this?
}

And an idea. What about:

fn foo(x: bool) -> Box<dyn Trait>; // Rust 2018 version of `Box<Trait>`
fn foo(x: bool) -> dyn Trait; // Possible syntax for this proposal
fn foo(x: bool) -> dyn impl Trait; // Both keywords.
                                   // impl suggests that the actual type is unnamed
                                   // dyn suggests that there is dynamic dispatch

dyn would make sense to me because there is dynamic dispatch involved unless the compiler can infer that it is not required in a particular scenario. (Maybe this is nonsense. I'm just suggesting it in case it's not 😄)

MajorBreakfast on 3 May 2018

👍1

@MajorBreakfast The caller doesn't know (or care) whether the function is using auto-generated enums or not: everything works normally. So your example will work.

As for the syntax, my understanding is that dyn Trait is already used for trait objects, e.g. impl dyn Trait { ... }

And the performance characteristics (and behavior) of auto-generated enums is different from trait objects, so I'm not sure if it's a good idea to try and associate them together.

Pauan on 3 May 2018

As for the syntax, my understanding is that dyn Trait is already used for trait objects, e.g. impl dyn Trait { ... }

Isn't this effectively a trait object on the stack instead of the heap? If not, where is the difference?

Edit: The difference is the size of course, duh o_O Wasn't thinking right when I wrote this. The question is: Is it close enough to call it dyn?

MajorBreakfast on 3 May 2018

👍1

I agree with @Ixrec that the marker should be on the signature for convenience, but dropped in rustdoc because it's irrelevant for API compat (comment by @Ixrec)
I don't quite like enum as additional keyword
- It's only half the story (data layout). The other half is dynamic dispatch.
- The value does not behave like an enum. It's all hidden

MajorBreakfast on 3 May 2018

👍1

@MajorBreakfast Aside from the performance, there's also the fact that trait objects have type erasure: a Box<dyn Trait> can be anything that implements that trait. Whereas an auto-generated enum has a very specific and known set of types.

As for the syntax, my point is that the dyn Trait syntax is already being used, so it might not be feasible or desirable to use it for auto-generated enums.

It's only half the story (data layout). The other half is dynamic dispatch.

The "dynamic dispatch" is simply a match, which is the normal way of using enum. There's nothing special about it.

The value does not behave like an enum. It's all hidden

But it does behave exactly like an enum. The fact that it is an unnameable type (just like closures) doesn't change its behavior or how the programmer thinks about it.

Just like how programmers can reason about closures, even though their exact layout is unspecified (and they are unnameable), the same is true with auto-generated enums.
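To make the "it is simply a match" point concrete, here is a minimal sketch of what a programmer writes by hand today, and roughly what an auto-generated enum might expand to. The `Shape` trait, the two structs and the `ShapeSum` name are hypothetical, picked only for illustration; nothing here is taken from an actual implementation of the proposal.

```rust
// Hypothetical trait and types, for illustration only.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// The enum a programmer writes by hand today. Under the proposal the
// compiler would generate something like it and keep it unnameable.
enum ShapeSum {
    Square(Square),
    Rect(Rect),
}

// The "dynamic dispatch" is an ordinary match over a closed set of variants.
impl Shape for ShapeSum {
    fn area(&self) -> f64 {
        match self {
            ShapeSum::Square(s) => s.area(),
            ShapeSum::Rect(r) => r.area(),
        }
    }
}

// Callers only see `impl Shape`: they can call trait methods,
// but they cannot match on the variants.
fn make_shape(square: bool) -> impl Shape {
    if square {
        ShapeSum::Square(Square(2.0))
    } else {
        ShapeSum::Rect(Rect(2.0, 3.0))
    }
}
```

Calling `make_shape(true).area()` goes through the `match` in `ShapeSum::area`, with no vtable and no heap allocation involved.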

Pauan on 3 May 2018

👍1

Aside from the performance, there's also the fact that trait objects have type erasure: a Box<dyn Trait> can be anything that implements that trait. Whereas an auto-generated enum has a very specific and known set of types.

From the user's perspective this is also type erasure. The types are only known to the compiler.

The "dynamic dispatch" is simply a match, which is the normal way of using enum.

The match that @Nemo157 mentions here only exists in generated code. I think the example he gives is more for illustration and it actually simulates how a trait object would redirect the call to the correct implementation.

But it does behave exactly like an enum.

No, you can't match on it.

MajorBreakfast on 3 May 2018

From the user's perspective this is also type erasure. The types are only known to the compiler.

Sure, it is a form of type erasure, but it still feels qualitatively different from Box<dyn Trait>. I can't quite articulate why it feels different for me.

The match that @Nemo157 mentions here only exists in generated code. [...] No, you can't match on it.

Of course that's a natural consequence of it being unnameable, but the performance and behavior should still be the same as an enum.

Pauan on 3 May 2018

@Pauan

but it still feels qualitatively different from Box<dyn Trait>. I can't quite articulate why it feels different for me.

Differences:

- As you said, dyn Trait can be a lot of types. This one can only be one of a few types mentioned inside the function.
- A dyn Trait is unsized. At runtime it has a size and it's as big as it needs to be. This one is an enum, so its size is known at compile time and it's as big as the largest of its variants (plus a discriminant, where one is needed).
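The size difference can be observed with `std::mem::size_of`. The sketch below uses a hand-written enum named `Sum` (a hypothetical stand-in for an auto-generated one) and compares it with a boxed trait object handle. The exact enum layout is left to the compiler, so the comparison is only a lower bound.

```rust
use std::fmt::Display;
use std::mem::size_of;

// Hand-written stand-in for an auto-generated enum over two types.
#[allow(dead_code)]
enum Sum {
    Small(u8),
    Large([u64; 4]),
}

// Size of the sum type: fixed at compile time and at least as large as
// its largest variant (the exact layout is up to the compiler).
fn sum_size() -> usize {
    size_of::<Sum>()
}

// Size of a boxed trait object handle: a data pointer plus a vtable
// pointer, regardless of how large the value behind it is.
fn boxed_size() -> usize {
    size_of::<Box<dyn Display>>()
}
```

So the enum pays for its largest variant on every value, while the boxed trait object keeps a two-word handle and puts the payload on the heap.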

Although I think the two are quite similar, I also think you're right for not wanting to call it a dyn.

but the performance and behavior should still be the same as an enum.

Performance, yes. But none of the enum-ish behaviour is visible to the user. That's why I suggest not calling it an enum. If we can come up with something better, that is ^^' (Making sum a keyword is a bad idea, because it'll break a lot of code for certain)

BTW the Unsized Rvalue RFC introduces unsized types on the stack. It doesn't allow functions to return an unsized value, but this might one day be possible in Rust. Consequently a solution other than an enum might be possible in the future. I still like the solution proposed here, because AFAIK async functions won't be able to support unsized types on the stack because they compile to a state machine.

MajorBreakfast on 3 May 2018

Yes, it does indeed feel very different from Box<dyn Trait>, because at the end of the day the type is statically known. This should be reason enough.

alexreg on 3 May 2018

I took the evening to throw together an experimental proc-macro based implementation: https://github.com/Nemo157/impl_sum

There's some big limitations documented in the readme, with probably other stuff I forgot/didn't notice, but if anyone else wants to experiment with this there's something to work with there now. (If you have any implementation comments/issues feel free to open issues on that repo to avoid cluttering this issue).

Nemo157 on 3 May 2018

👍4

Re: syntax, what about an attribute in the type signature (not actually sure if attributes are allowed here but w/e)

fn do_something() -> Vec<#[auto_enum] impl Trait> {
    ...
}

Attributes are not typically considered part of the type signature anyway, so there's no problem with it being in the return type position.

Diggsey on 5 May 2018

An attribute in the type sig? That’s some super-ugly syntax. Plus there’s no precedent for it. The enum keyword makes more sense to me.

alexreg on 5 May 2018

👍2

Out of all the proposals here something like #[marker] on the function itself makes most sense to me. In particular there are too many macros that just return so that a marker on the return position makes no sense.

mitsuhiko on 10 May 2018

@mitsuhiko The thing is, this functionality can't be properly replicated by a (procedural) macro. So making it look like it is a macro is just deceptive at best.

alexreg on 10 May 2018

@mitsuhiko what macros are you thinking of? The only one I can think of is try!/? but wanting the error type to be an auto-generated sum type seems unlikely to me.

One extra difficulty might be supporting closure transforms, would it be possible to support a function like this where the sum type for impl Display happens inside an inner closure:

fn load(input: Option<&str>, number: bool) -> Option<impl Display> {
    input.map(|v| {
        if number {
            v.parse::<i32>().unwrap()
        } else {
            v.to_owned()
        }
    })
}

This example could also be extended to have 2 of the branches inside the closure, and an additional branch or 2 outside it.
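For comparison, this is roughly what has to be written by hand today to get that function to compile. It is only a sketch: the `NumberOrText` enum and its variant names are made up for illustration, and the wrapping happens explicitly inside the closure.

```rust
use std::fmt::{self, Display};

// Hand-written sum of the two types produced inside the closure.
enum NumberOrText {
    Number(i32),
    Text(String),
}

// Delegate `Display` to whichever variant is active.
impl Display for NumberOrText {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            NumberOrText::Number(n) => Display::fmt(n, f),
            NumberOrText::Text(s) => Display::fmt(s, f),
        }
    }
}

fn load(input: Option<&str>, number: bool) -> Option<impl Display> {
    input.map(|v| {
        if number {
            // Panics on non-numeric input, as in the example above.
            NumberOrText::Number(v.parse::<i32>().unwrap())
        } else {
            NumberOrText::Text(v.to_owned())
        }
    })
}
```

The open question is whether the compiler could insert the two `NumberOrText::...` wrappers itself when the unification failure happens inside the closure rather than at the function's own return sites.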

Nemo157 on 11 May 2018

@Nemo157 I can't judge how likely it is that errors might not be sum types here as we cannot predict what will happen in the future. I also think that modifiers on return are significantly harder to understand for users (and make the language more complex) than an attribute. Let alone that there are implied returns.

About which macros it affects: a lot of Rust projects have their own try macros. Almost all of mine have some custom try!/unwrap! type macros. The failure crate has custom "error throwing" macros etc.

@alexreg why can a procedural macro not replicate it? But regardless there are lots of compiler internals that are implemented as special macros or attributes so this would not be completely new to the language.

mitsuhiko on 11 May 2018

@mitsuhiko With a proposal like #[marker] on the function itself (as opposed to return type), how would you type things like this? (here using marker on return type for clarity)

let foo: impl Display = if bar { "foo".to_owned() } else { 5 };
println!("{}", foo);

I can understand the idea of having a marker on return type (and then the #[marker] syntax looks ugly to me, having -> Option<#[marker] impl Display> for @Nemo157's example, and I think another syntax would be better), but I don't really get the idea of having a marker on the function itself.

In my mind this is more a debate of how we want to say to Rust “Please wrap this value in an anonymous enum for me” and/or “Please make an anonymous enum out of these values”.

I prefer the first option (in part because I don't see a clear way for the user to understand from which values exactly the compiler will infer the type). And so I think the most intuitive is marker-on-return-site, but marker-on-return-type might make sense too.

Actually, to understand my reason given in parenthesis above, here is an example of why I feel uneasy about the return-type marker option:

fn foo() -> marker impl Trait {
    let bar = if test() { something() } else { somethingelse() };
    if othertest() { bar } else { stillotherstuff() }
}

Assuming something, somethingelse and stillotherstuff all return different types implementing Trait, not knowing how the compiler is implemented I can't really guess whether this will build or not. Is the type forced at the let bar boundary? Is it left “in progress”?

The advantage of the return-site marker option is that it makes things “explicitly implicit”: when encountering the marker, the value is wrapped in an anonymous enum ready to be extended, and when the being-built anonymous enum hits a type, it is realized. While with the return-type marker, the question is “which are the paths considered by the compiler as leading to the return-type marker?”, which I think can't be answered without a clear understanding of the internals.

About the issue of macros that return, they could just add the marker on each return site: if a single type is ever encountered by an anonymous enum, it will be an anonymous enum with a single member, which could (should?) actually be returned as the said member -- thus being a noop when there is no need for anonymous enums, and automatically adding anonymous enum capability when asked for.

Ekleog on 11 May 2018

@Ekleog

@mitsuhiko With a proposal like #[marker] on the function itself (as opposed to return type), how would you type things like this? (here using marker on return type for clarity)

let foo: impl Display = #[marker] {
    if bar { "foo".to_owned() } else { 5 }
};
println!("{}", foo);

Also I do wonder if the marker could not just go entirely. If the impact of that generated type is not too big then it might just be that this could be an acceptable solution to begin with. Hard to tell though.

mitsuhiko on 11 May 2018

@mitsuhiko So between

fn foo() -> marker impl Trait {
    let foo: marker impl Trait = if bar() { baz() } else { quux() };
    if x() { foo } else { y() }
}

and

#[marker]
fn foo() -> impl Trait {
    let foo: impl Trait = #[marker] {
        if bar() { baz() } else { quux() }
    };
    if x() { foo } else { y() }
}

you'd rather have the second one? (comparing to return-site as that's the closest to your proposal, with the smallest non-trivial example I could manage)

If so I think we can only agree to disagree :)

Ekleog on 11 May 2018

I just don't see a reason why this should become syntax in the first place. If it's such a great feature and the performance impact is a massive deal then it can still migrate from an attribute to real syntax later.

mitsuhiko on 11 May 2018

@mitsuhiko

Also I do wonder if the marker could not just go entirely.

I think there should be a marker:

Rust likes to make things explicit. If you've got two types with different sizes they get combined into an enum with the size of the bigger type plus discriminant. Should this really be hidden?
An enum is not the only way to solve this. Currently Rust does not support dynamically sized rvalues. It is however likely that this is going to change in the future.

MajorBreakfast on 11 May 2018

Also to further add to my stance on attributes: even async/await started out with not introducing new syntax. This is a fringe feature in comparison.

mitsuhiko on 11 May 2018

I'm personally fine with using an attribute-like syntax for this, but I will note that it is 100% impossible to implement as a proc-macro (even looking at other proposed extensions to the type system like delegation, I'm certain that this will still not be possible anytime in the near future).

If there were a marker at return sites then it may be possible to implement this as some sort of syntax extension, or a limited proc-macro that only supports a pre-registered set of enums. Having a marker is not unprecedented as this is similar to a non-allocating, constrained version of boxing, which uses Box::new to wrap the return values:

#[marker]
fn foo() -> impl Trait {
    let foo: impl Trait = {
        if bar() { marker!(baz()) } else { marker!(quux()) }
    };
    if x() { marker!(foo) } else { marker!(y()) }
}

The versions that use either just a marker on the function, or a marker on the return type, are probably not implementable even as a syntax extension. These would need to tie in to type inference in order to detect where in the function the returned values do not unify and inject the necessary wrapping code to make it work.
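For reference, the boxing precedent looks like this in today's Rust. This is a small sketch with a hypothetical `describe` function; each return site wraps its value explicitly, which is the role a `marker!` at return sites would play, minus the allocation.

```rust
use std::fmt::Display;

// Today's closest analogue: every return site wraps its value explicitly,
// here with an allocating `Box::new` instead of a hypothetical `marker!`.
fn describe(number: bool) -> Box<dyn Display> {
    if number {
        Box::new(42)
    } else {
        Box::new("forty-two")
    }
}
```

Both branches coerce to `Box<dyn Display>`, so the function type-checks even though the wrapped values have different types.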

Nemo157 on 11 May 2018

👍1

@Nemo157 Would the following be possible with a compiler built-in?

fn foo() {
    let x: impl Trait = {
        if bar() { marker!(baz()) } else { marker!(quux()) }
    };
}

MajorBreakfast on 11 May 2018

I believe the intention is to eventually allow impl Trait in more places, eg.

type X = impl Debug;

fn foo() -> X {
    "Hi!"
}

So you could use a syntax where the "automatic enum" is defined separately:

enum X = impl Debug;

fn foo(a: bool) -> X {
    if a { "Hi!".into() } else { 42.into() }
}
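A rough approximation of what this could desugar to, written by hand in today's Rust, is sketched below. The variant names are invented for illustration, and the `From` impls are an assumption about how `.into()` would be made to work; the proposal itself does not spell this out.

```rust
use std::fmt;

// Hand-written stand-in for the hypothetical `enum X = impl Debug;`.
enum X {
    Text(&'static str),
    Number(i32),
}

// One `From` impl per variant is what lets `.into()` work at each return site.
impl From<&'static str> for X {
    fn from(s: &'static str) -> Self {
        X::Text(s)
    }
}

impl From<i32> for X {
    fn from(n: i32) -> Self {
        X::Number(n)
    }
}

// Delegate `Debug` to whichever variant is active.
impl fmt::Debug for X {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            X::Text(s) => fmt::Debug::fmt(s, f),
            X::Number(n) => fmt::Debug::fmt(n, f),
        }
    }
}

fn foo(a: bool) -> X {
    if a { "Hi!".into() } else { 42_i32.into() }
}
```

The integer literal is written as `42_i32` here only to keep the sketch free of any reliance on literal type inference.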

Diggsey on 11 May 2018

@MajorBreakfast yes, I believe so.

Nemo157 on 11 May 2018

@mitsuhiko

@alexreg why can a procedural macro not replicate it? But regardless there are lots of compiler internals that are implemented as special macros or attributes so this would not be completely new to the language.

Ask @Nemo157, since he prototyped the implementation, but I believe it would be very difficult at best, if not downright impossible under the current proc_macro2 implementation, due to having to mess with the actual AST at a fine-grained level. I could be wrong, but I'll let him answer that.

Anyway, not sure what you mean by "compiler internals that are implemented as special macros or attributes", but actually the macros defined by Rust itself are not special-cased... they could be implemented by declarative or proc macros in a separate crate, if you wanted to.

alexreg on 11 May 2018

@alexreg as an example the await! macro is not a macro but a compiler builtin.

mitsuhiko on 11 May 2018

@mitsuhiko Sure, but it might as well be implemented as a macro. enum is a different sort of beast.

alexreg on 11 May 2018

@alexreg I really don't want to derail this topic any further but the current proposal for await! cannot be implemented as a plugin as far as I understand the RFC. In any case it's not exactly relevant to the point I was making.

mitsuhiko on 11 May 2018

OK I would like to enter this discussion. As @Ekleog showed, this feature can already be easily implemented manually by the programmer by creating a new enum type to hold all the different return types. So this feature doesn't add any new capabilities to the language. That being said, I think this feature is pretty cool. It makes the language more accessible for two main reasons: it makes this use case of impl Trait more ergonomic, and it cuts a lot of boilerplate.

So if the goal of this feature is cutting boilerplate and making the language more ergonomic, it would make sense to only have to use the marker once in the function declaration instead of at each return site. Note that the marker also has to be added in let expressions, and again, in the spirit of cutting boilerplate and making things more ergonomic, it makes more sense to use the marker only once in the type instead of multiple times in the return statements.

Using the logic stated above, this leaves us with two options, since both of these options use the marker only once.

fn foo() -> marker impl Trait {
    let foo: marker impl Trait = if bar() { baz() } else { quux() };
    if x() { foo } else { y() }
}

and

#[marker]
fn foo() -> impl Trait {
    let foo: impl Trait = #[marker] {
        if bar() { baz() } else { quux() }
    };
    if x() { foo } else { y() }
}

Both of these syntaxes use the marker only once (per let or per function) and therefore are in the spirit of this RFC. If you use the marker at each return site, then there is less of an incentive for this RFC to exist. After all, the only code that the feature would save you is declaring the enum by hand. Introducing a new syntax just to avoid declaring an enum seems a little excessive. I mean, it could still be done, but we would have less of a win in our hands.

If you are still not convinced about the debate where to put the marker, I have one more argument to try and convince you. This other argument not only says that we should use the marker only once, it also says what that marker should be and why it should be that way. In the following I will make a case for this particular syntax:

fn foo() -> enum impl Trait {
    let foo: enum impl Trait = if bar() { baz() } else { quux() };
    if x() { foo } else { y() }
}

My argument is about teaching and learning Rust. Rust is a fairly complicated language. New programmers are constantly fighting with the compiler. In order to mitigate this fighting the compiler often suggests changes to your code. These suggestions make the learning experience much less frustrating. Add a keyword somewhere and suddenly your code not only compiles but also works as expected (assuming the logic is correct). This experience is sort of magic and very satisfying when it works. The syntax proposed above can have this property. The compiler can show you the error of the type mismatch, but can also suggest that you add a single enum in the appropriate place to solve the problem. Once the new rustacean inserts the suggested enum keyword in his function declaration or let statement, his code will magically work. He might not understand exactly why it works, but it will work. Once his code compiles he might try to find some documentation and find out what is happening. So he will do a search for something like "rust enum impl". He will then find a blog post, or reddit post or the Book or whatever that contains the appropriate explanation. He will then learn that enum impl Trait means exactly what it says on the tin. That is, the compiler is creating an anonymous enum of the return types of your function or let statement, and all members of that enum implement Trait. Basically the compiler is creating an enum in which all members implement a particular Trait. Hence enum impl Trait.

Paluth on 11 May 2018

I just want to add that I think an annotation above the function is unintuitive. Such an annotation makes sense if it affects the whole function, e.g. like the #[test] annotation. In contrast to that, this marker just affects the return type and therefore should be near it or at the return sites.

After all, the only code that the feature would save you is declaring the enum by hand.

@Paluth Not really correct. As discussed above the enum is just the data structure. It doesn't act like an enum: You can't match on it. Instead you can call all the methods of the Trait(s). The code @Ekleog shows here requires the user to match on the enum. The code that @Nemo157 shows here is impractical to write by hand.

I agree with the things that you say about teachability.

MajorBreakfast on 11 May 2018

@Paluth Just, for the impracticability of writing enums by hand, here is a real-life example of where it is a pain to maintain, especially every time I add a return site to the function I must come back to this file and change everything.

About teachability, I mostly agree with you, but I think the compiler could suggest adding markers at return site too? That said it'd likely be a mess to see, as the compiler would have to point to the two places where the markers would have to be added, and ascii art can only do so much.

Ekleog on 11 May 2018

👍1

@Ekleog Damn. This is some real spaghetti code! Descriptive file name, though 😄

MajorBreakfast on 11 May 2018

To add to what @MajorBreakfast just said about annotating the function, it also doesn't make sense for all use cases. Given a function signature like

fn foo() -> Result<impl Read, impl Error + Debug>

you may want to return multiple possible readers, but have a specific error type in mind that you just don't want to publicly name yet.

This sort of usecase is pushing me towards the marker on the return type syntax, either an attribute like @Diggsey suggested above or a keyword, that would allow writing this signature like:

fn foo() -> Result<#[marker] impl io::Read, impl Error + Debug>

and get the auto-generated sum type for only one of the existential types.
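To illustrate the mixed case with today's tools, here is a hand-written sketch in which only the `Ok` side is a sum of several readers, while the `Err` side is a single concrete type hidden behind `impl Error + Debug`. The `AnyReader` enum and the `open` function are hypothetical names; the marker on the return type would generate the enum and its `Read` impl for us.

```rust
use std::error::Error;
use std::fmt::Debug;
use std::io::{self, Read};

// Hand-written sum of the possible readers; the error side stays a single type.
enum AnyReader {
    Cursor(io::Cursor<Vec<u8>>),
    Empty(io::Empty),
}

// Delegate `Read::read` to whichever reader is active.
impl Read for AnyReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            AnyReader::Cursor(r) => r.read(buf),
            AnyReader::Empty(r) => r.read(buf),
        }
    }
}

// Only the `Ok` type needs a sum type. The error is always `io::Error`,
// which is simply not named in the signature.
fn open(data: Option<&str>) -> Result<impl Read, impl Error + Debug> {
    match data {
        Some("") => Err(io::Error::new(io::ErrorKind::InvalidInput, "empty input")),
        Some(text) => Ok(AnyReader::Cursor(io::Cursor::new(text.as_bytes().to_vec()))),
        None => Ok(AnyReader::Empty(io::empty())),
    }
}
```

A function-level marker could not express this distinction, since it would have no way to say which of the two `impl` types should become a sum type.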

It also seems easier to extend to named existential types, the same marker could be used when declaring the type:

existential type Foo: #[marker] io::Read;

fn foo() -> Result<Foo, impl Error + Debug>;

The other form I am currently considering as being a relatively strong contender is having just a marker on each return value, in contrast to what @Paluth says above I believe the overhead of writing the boilerplate to do the delegation (here's what it looks like for an enum over io::Read + io::Write for a single variant) vs the overhead of adding a single annotation at each return site (which you would have to do when boxing anyway) makes any kind of sugar for this worth it.

One downside of this form is that it is relatively easy to do on a case by case basis as a purely library implementation, re-using an example from earlier you could imagine taking the existing either crate and adding delegating trait implementations to it:

fn foo() -> impl Trait {
    match x() {
        Some(foo) => Either::A(foo),
        None => Either::B(y()),
    }
}

I still believe that providing builtin support is better than this for a couple of reasons:

- This suffers from the same issue a proc-macro based implementation does: it requires someone to pre-declare all traits that it works with, which requires either a lot of boilerplate¹, a rather heinous proc-macro to generate the boilerplate or a more powerful delegation than has been proposed as an RFC yet.
- Changing this method suddenly adds a lot more churn. Say x() changed to return a ternary value: now you would have to switch from Either to some other Either3 form:
```
fn foo() -> impl Trait {
    match x() {
        First(foo) => Either3::A(foo),
        Second(bar) => Either3::B(bar),
        None => Either3::C(y()),
    }
}
```
(pre-post edit: and @Ekleog links to a representation of just such this churn 😄)

¹ This is only for a single number of variants; you would need to repeat this for all 1..n enums to support up to n variants.
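As a sketch of the library-only approach described above, here is a minimal two-variant `Either` with one delegating trait implementation, for `Iterator`. It is defined locally rather than taken from the either crate, and the `letters` function is a hypothetical example. Each further trait would need its own delegating impl of this kind, and each further variant count its own enum.

```rust
// A minimal two-variant sum type, defined locally for this sketch.
enum Either<A, B> {
    A(A),
    B(B),
}

// Delegating impl: written once per trait, reusable for any pair of iterators
// that yield the same item type.
impl<A, B> Iterator for Either<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Either::A(a) => a.next(),
            Either::B(b) => b.next(),
        }
    }
}

// Each return site picks a variant by hand.
fn letters(input: Option<&str>) -> impl Iterator<Item = char> + '_ {
    match input {
        Some(text) => Either::A(text.chars()),
        None => Either::B(std::iter::empty::<char>()),
    }
}
```

Adding a third branch to `letters` would mean introducing an `Either3` with its own `Iterator` impl, or nesting `Either` inside `Either`.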

Nemo157 on 11 May 2018

👍2

@mitsuhiko I'm pretty sure it can be... but I'll let others more knowledgeable confirm or deny.

alexreg on 11 May 2018

@MajorBreakfast you are right about not being able to match on the return value of a fn foo() -> enum impl Trait, and therefore you could argue that the return type of foo doesn't really represent an enum since it doesn't behave like one. But it would hardly make sense to try to match against an anonymous enum. Since the enum is anonymous you don't know what it looks like and therefore you can't provide a pattern that would make sense, unless the pattern was generic like match x { a => ... } or match x { _ => ... }, but that type of match doesn't do anything. So one could argue that by definition, an anonymous enum is unmatchable. But all this is kind of off-topic, and even a bit pedantic. What really matters to the user of Rust is that the return type of foo automatically implements Trait, and that it auto-generates the match expression needed to delegate the calls of the Trait methods to the return values of foo.

As @Ekleog showed and @Nemo157 reinforced, the auto-generated match statement to delegate the method calls of Trait can save a lot of boilerplate code and therefore would easily justify a new syntax even if it meant you have to add it to every return site. I underestimated the amount of code that the auto match saves the user.

That being said, I still fail to see any reason why adding an annotation to each return site is better than adding a single annotation on the function return type, or the let type. If the user is going to have to write more code to get the same result, then we need a good reason to make it that way. Could you guys elaborate what those reasons are? By that I mean, what feature does annotating at each return site provide over annotating once on the type?

Paluth on 11 May 2018

I think I have a new desideratum to add to the pile: consistency with possible syntax for anonymous enums (not autogenerated enums).

fn foo() -> #[marker] impl Debug {
    if(...) { A::new() } else { B::new() }
}

fn bar() -> #[marker] (A | B) {
    if(...) { A::new() } else { B::new() }
}

So pretend for a moment that we want (A | B) to be the syntax for an enum type with no name and variants of types A and B. Despite being nameless, this type is not hidden by impl Trait so bar()'s callers could match on it. Presumably, if we ever added this, we'd also like bar() to compile more or less the way I wrote it, rather than requiring something like (A|B)::A(A::new()) to explicitly create a value of that anonymous enum type (we'd probably need that syntax somewhere, but imo we shouldn't need it for this).

If we'd want some kind of marker on anonymous enum return types to opt-in to this implicit wrapping behavior, I assume we'd want it to be the same marker that we use for autogenerated enum return types that also do this sort of implicit wrapping (albeit with a hidden autogenerated type you couldn't explicitly refer to anyway). This gives us an argument against using enum as the marker: enum (A|B) looks pretty redundant when (A|B) is already an enum type. Of course, it's also conceivable that we'd want no marker at all in the anonymous enum case, or no implicit wrapping for anonymous enums, or no anonymous enums at all (I have no strong opinions here yet). Thoughts?

Ixrec on 11 May 2018

@Paluth My reasoning is mostly the one put forward at https://github.com/rust-lang/rfcs/issues/2414#issuecomment-383755348 plus what I completely failed to explain clearly at https://github.com/rust-lang/rfcs/issues/2414#issuecomment-384144887.

I'll try to explain it another way: I think the advantage of marker-at-return-site is demonstrated by the following code:

fn foo() -> impl Trait {
    let a = if b() { marker c() } else { marker d() };
    if e() { a } else { marker f() }
}

That is, being able to have variables that are still-lifted-in-not-completed-enum-type-yet.

On the other hand, with marker-at-return-type, the code would have to look like (in order not to be dependent on compiler internals, optimization level and the like):

fn foo() -> marker impl Trait {
    let a: marker impl Trait = if b() { c() } else { d() };
    if e() { a } else { f() }
}

I prefer the first syntax, because marker would mean “Please use this value to build whatever enum I'll want later on,” and because I feel it'd make for easier refactoring (as the marker can be basically anywhere and is just a noop when not actually used by a merge point anywhere). OTOH, the second syntax requires explicit type annotation in the let binding (which I try to minimize in the code I write), and even requires annotating at every point where a type conflict could appear.

There is also a question about the marker-at-return-type option: what about this?

let a: marker impl Trait = match foo() {
    Foo1 => if bar() { baz() } else { quux() },
    Foo2 => iwantmorenames(),
}

Should the compiler be able to infer that the baz and quux calls must be lifted in an anonymous enum? Should it just lift in the anonymous enum the match?

Actually, writing this I think I understand better why I prefer the marker-at-return-site option:

Marker-at-return-type means “take all the places that point to here and merge them into an anonymous enum.” Which is, IMHO, problematic as what the compiler will consider as “places that point to here” will likely be highly implementation-dependent, or even optimization-dependent (see eg. https://github.com/rust-lang/rust/issues/42974 for a case where the syntax changes depending on optimizations -- I don't think we'd want that on stable)
Marker-at-return-site means “from here until the next point where the type is forced, this variable is an anonymous enum into which any other anonymous enum can be merged.” Which has the nice property of having a beginning and an end, and so of being easily understandable, and there is no second-guessing what the compiler would consider a “path that points to here.” :)

@Ixrec I think the question of “what should marker be” is a bit early, like painting drawings on the bikeshed when the background color is not picked yet :)

That said, the question of anonymous enums you raised is interesting indeed. And I'd argue it's a (not very strong at all) argument in favor of marker-at-return-site: with marker-at-return-site, the marker is decorrelated from the return type, thus it makes for a consistent syntax and straightforward path to supporting anonymous enums. It'd just require allowing forcing the type to an anonymous enum from a marker'd enum.

Basically, your example would look like:

fn bar() -> (A | B) {
    if(...) { marker A::new() } else { marker B::new() }
}

Where marker would implicitly be (A|B)::A().

Ekleog on 12 May 2018

👍1

@Ixrec There's been no discussion of untagged enums outside of FFI, that I know of. I'm not against the proposal of anonymous enums or structs though, in general.

I'd imagine untagged enums as return types would work something like this:

fn bar() -> enum { A(i32), B } {
    if(...) { A(123) } else { B }
}

So while we don't have to use the enum keyword for auto-generated sum types, I see nothing precluding it.

alexreg on 12 May 2018

@Ekleog

fn foo() -> impl Trait {
    let a = if b() { marker c() } else { marker d() };
    if e() { a } else { marker f() }
}

What type is a?

Why doesn't this require let a: impl Trait?

Does this create two different enums (one for a and one for the return type of foo)?

If so, why doesn't this require if e() { marker a } else { marker f() }?

Pauan on 12 May 2018

While I'm not necessarily advocating this (I would prefer to avoid going down this road), as an alternative to generating a type solely for the return value, has anyone considered the idea of OCaml-style sum types, of the kind that use constructors starting with a backquote?

Personally, though, I'd prefer to avoid putting this in the language, and instead provide a mechanism to simplify the creation of sum types on the fly.

joshtriplett on 12 May 2018

❤1

@Ekleog thanks for the more detailed explanation. I might have misunderstood what you are trying to say, and if that is the case, I apologize. However if I understood what you were trying to say correctly, then that means you might be a little confused about how this feature will work. I will try to explain it more clearly.

Lets say that we have some code like this:

let a: impl Trait = expression

What can we infer from the expression? Well, a mathematician or type theorist might come up with all sorts of conclusions, but I'm neither of those, so my conclusions will be limited. I can infer two things from expression, according to how Rust currently works.

1 - The type of every value that expression can return will have to implement Trait
2 - The type of every value that expression can return will have to be the same

Unless there is a bug in the compiler, the compiler can already guarantee points 1 and 2. How does the compiler know that? Well, for every valid expression the compiler can determine all the return sites with 100% precision. Not only that, the compiler can also determine the type of the values that are returned at all of the return sites. These abilities that the compiler has are not implementation-dependent. They are deterministic and every implementation of Rust, regardless of who makes it or how it's made, will have to achieve that. If they don't achieve points 1 and 2, it either means they are incompatible with rustc or that they have a bug.

OK, so what does all of that have to do with the current discussion? What the proposed feature is trying to do is eliminate point 2. So lets look at some code examples to see how all of this applies.

let a: marker impl Trait = match foo() {
    Foo1 => if bar() { baz() } else { quux() },
    Foo2 => iwantmorenames(),
}

Should the compiler be able to infer that the baz and quux calls must be lifted in an anonymous enum?
Should it just lift in the anonymous enum the match?

Absolutely! Not only should the compiler be able to infer that in the future (should this feature ever make it into Rust), but it kind of already does. If you were to write such code today (without the marker of course), the current compiler (rustc 1.26) will be able to determine all three return points with 100% deterministic precision. It will also be able to determine the types of the values returned by the baz, quux and iwantmorenames functions. Then it will proceed to check points 1 and 2. The only difference this RFC would introduce is the following: if point 2 fails, but point 1 still stands, then it will "lift" those values into the anonymous enum. Notice that in this particular example, it would not "lift" the function calls, it will "lift" whatever the return values of those calls are.

Paluth on 12 May 2018

@joshtriplett as was said earlier in the thread, the problem at hand is not so much about generating an 'enum'. The main problem is having to manually implement the match expression on said 'enum'. As the 'enum' grows (because you have more return types), the matching gets worse. This feature proposal would eliminate the need for manually writing the match.
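
To make the boilerplate concrete, here is a minimal sketch of what has to be written by hand today for just two return types. The names CharsOrEmpty and letters are made up for illustration; they are not from the thread or from any library.

```rust
use std::iter::{empty, Empty};
use std::str::Chars;

// Hand-written stand-in for the enum the proposal would generate.
enum CharsOrEmpty<'a> {
    Chars(Chars<'a>),
    Nothing(Empty<char>),
}

// The part that grows with every new variant: one forwarding arm per
// variant, for every trait method that needs forwarding.
impl<'a> Iterator for CharsOrEmpty<'a> {
    type Item = char;

    fn next(&mut self) -> Option<char> {
        match self {
            CharsOrEmpty::Chars(it) => it.next(),
            CharsOrEmpty::Nothing(it) => it.next(),
        }
    }
}

// Two different concrete iterator types leave through one `impl Trait`.
fn letters(s: Option<&str>) -> impl Iterator<Item = char> + '_ {
    match s {
        Some(s) => CharsOrEmpty::Chars(s.chars()),
        None => CharsOrEmpty::Nothing(empty()),
    }
}
```

Each extra return type adds a variant plus one match arm per forwarded method, which is the cost the comment above is pointing at.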

Paluth on 12 May 2018

@joshtriplett That's requiring an awful lot of boilerplate though, as @Paluth is pointing out. I don't know anything about OCaml sum types. How do they work?

alexreg on 12 May 2018

@Ekleog

Has already been pointed out by @Paluth:

fn foo() -> impl Trait {
    let a = if b() { marker c() } else { marker d() };
    if e() { a } else { marker f() } // <-- `a` needs marker in front
}

If we'd want some kind of marker on anonymous enum return types to opt-in to this implicit wrapping behavior, I assume we'd want it to be the same marker that we use for autogenerated enum return types

@Ixrec I highly discourage this. The anonymous enum feature you're mentioning produces an actual enum that can be used in match expressions. This feature OTOH does not. That's why I don't even recommend using the enum keyword. We should not strive for similarity between these two features.

About the discussion whether to put the marker in the type or at return sites:

fn foo() -> impl Trait {
    let a: Result<marker impl Trait, String> = if cond1() { Ok(f1()) } else { f2() };
    let b: marker impl Trait = if cond2() { a.unwrap() } else { f3() };
    b
}

I'd prefer to add it to the type:

- This is where the action is: this is where all the variants are combined into the single sum type. Markers at the return sites, OTOH, are only for one variant. The type, however, is determined by all variants.
- Convenience: a single place means less typing.

MajorBreakfast on 12 May 2018

👍2

Maybe add the marker at the trait declaration site? It cannot be made to work for all traits anyway; there are certain requirements for the trait, which are similar, but weaker than, object-safety; meanwhile, explicit annotations for the latter have been proposed on internals, so there is some precedent here.

My suggestion:

Allow the user to mark trait Trait {} blocks as #[additive]. If this attribute is present, the compiler checks that:
- All the trait's methods have Self in the argument (i.e. contravariant) position;
- All its associated types have additive traits as bounds and appear only in the return (i.e. covariant) position of its methods;
- The trait has no associated constants.
If Trait is additive, methods with a declared impl Trait type are allowed to pass values of different types as return values at different exit points. The actual underlying return type will be an anonymous coproduct type as proposed here.
If Rust ever gets proper anonymous enum types, the same marker would also mean that (T0|T1|T2|...) implements the trait whenever each of T0, T1, etc. implement it, by the same mechanism.
By the same token, marking Trait as additive ensures that an impl Trait for ! is available. (Bringing some resolution to another thorny issue.)

Oh wait, the associated types part won't be so easy. After all, there's impl Iterator<Item=T> where T should be treated like a concrete type. But something along the above lines.
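
A rough sketch of what "additive" could mean in practice, written by hand in today's Rust. Everything here is an assumption for illustration: the Area trait, the Square and Rect types and the Either enum are hypothetical, and the #[additive] attribute itself does not exist.

```rust
// Hypothetical trait that would satisfy the rules above: `Self` appears
// only in argument position and there are no associated constants.
trait Area {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Area for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Area for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Hand-written binary coproduct standing in for the anonymous one.
enum Either<A, B> {
    Left(A),
    Right(B),
}

// "(T0|T1) implements the trait whenever each of T0, T1 implements it."
impl<A: Area, B: Area> Area for Either<A, B> {
    fn area(&self) -> f64 {
        match self {
            Either::Left(a) => a.area(),
            Either::Right(b) => b.area(),
        }
    }
}

// Different concrete types at different exit points.
fn shape(square: bool) -> impl Area {
    if square {
        Either::Left(Square(2.0))
    } else {
        Either::Right(Rect(2.0, 3.0))
    }
}
```

A method without a receiver, such as a hypothetical fn new() -> Self, would give the generated impl no variant to match on, which is presumably why the first rule asks for Self in argument position.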

fstirlitz on 12 May 2018

@Ixrec I wouldn't want automatic injection into (A|B) types any more than I'd want automatic projection from (A, B). (Or, for that matter, than automatic injection into Result<A, B> or Option<A>.)

Off-topic

A previously proposed construction/deconstruction syntax was (some_a|!) resp. (!|some_b), which is "shaped-based" like tuples are; another possibility would be taking inspiration from tuples' numeric field access, and doing something like 0(some_a) and 1(some_b), although that's a bit weird (and I'm not sure if it's syntactically unambiguous). Anyway, I think this has been discussed in the RFC PRs and issues about it.

glaebhoerl on 12 May 2018

I think that From/Into handle injection fine, but if those prove ambiguous then the macro might generate .into_enum()/.into_sum() methods. Also, injecting non-explicitly might simply happen for other reasons, assuming folks do not rush into this.

I could imagine eventually optimizing trait objects into enums, or at least not being DSTs, when their size can be determined at compile time. If so, then roughly this works:

fn foo() -> impl Trait {  // dyn Trait : Trait
    // Some auto_enum!{Trait} macro generates the following replacing $t:
    trait Summand$t { }
    impl Summand$t for Foo {}
    impl Summand$t for Bar {}
    type AutoSum$t = dyn Trait+Summand$t;  // Not a DST because Summand$t is not exported
    // regular code:
    ...  return x;  ...  // Conversion from T: Trait to dyn Trait is automatic.
}

burdges on 12 May 2018

@burdges Trait objects can be a lot more efficient than enum types when the number of variants is large, so they probably won't be going anywhere.

alexreg on 12 May 2018

@Pauan (from here)

The reason why this doesn't require : impl Trait on the let binding is what I've tried to explain in https://github.com/rust-lang/rfcs/issues/2414#issuecomment-384144887, with the TypeInProgress type (esp. the first code example -- actually I just noticed I added an unnecessary marker in the if branch there, as x already had TypeInProgress(…) type). I'm not completely sure it could be implemented in the compiler, but can't see a reason why it couldn't.

Basically, the idea is that so long as no type is enforced by a type annotation or passing to a function call, the compiler builds an ever-growing enum, and when a type is enforced (eg. with : impl Trait or : (A|B) if we have anonymous enums with this syntax, or foo(x) if foo imposes type restrictions on the value, or even unification with another if/match branch), then the type of the variable “retroactively” becomes this type (if it matches). Don't be scared by the “retroactively” word, it's just like the current behaviour of {integer}, except with custom traits instead of just integers.
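
For readers who have not looked closely at the {integer} behaviour used in the comparison, this small sketch shows it in today's Rust. The function names are made up for the example.

```rust
fn takes_u8(x: u8) -> u8 {
    x
}

// A later use forces the type of an earlier, unannotated binding.
fn forced_size() -> usize {
    // At this point `x` only has the placeholder type `{integer}`.
    let x = 200;
    // Passing `x` to `takes_u8` fixes its type to `u8`, and that holds
    // for the binding as a whole, not just from this line onwards.
    let _y = takes_u8(x);
    std::mem::size_of_val(&x)
}

// With nothing forcing the type, `{integer}` falls back to `i32`.
fn default_size() -> usize {
    let x = 200;
    std::mem::size_of_val(&x)
}
```

Here forced_size() reports 1 byte and default_size() reports 4. The TypeInProgress idea would apply the same "decide later, apply retroactively" rule to a growing set of variant types instead of to integer widths.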

I'm not a functional developer, so am not sure this is the right way to put it, but in my mind marker would “lift” the value into a TypeInProgress “monad”, and forcing the type of such a variable would execute the “monad” and recover its result.

Now, things become harder when considering types like Result<u8, impl Fail>, and that's why I'm not completely sure it is possible to implement (even though it appears to work not-so-bad with {integer} currently): something like this should compile:

fn foo() -> Result<u8, impl Fail> {
    if a() { Ok(0) } //: Result<{integer}, _>
    else if b() { Err(marker c()) } //: Result<{integer}, TypeInProgress(C)>
    else { Err(marker d()) } //: Result<{integer}, TypeInProgress(Fail, enum { C, D })>
}

And here, hoisting TypeInProgress into other template structures may hide pitfalls for properly implementing this (or maybe not, at least that's what I'm hoping for).

@joshtriplett (from here)

This sounds like the return-site-marker approach, adding in that the markers are named. But then I'm curious, do you think this should compile?

fn foo() -> u8 {
    if bar() { `a 0 } else { `b 1 }
}

If it should compile, then it's going pretty far away from OCaml-style anonymous enums. If it shouldn't, then it makes try!-like macros hard to write, while with the return-site-marker approach the try!-like macros can just add marker everywhere and it becomes a noop if it's unused.

@Paluth (from here)

Unless there is a bug in the compiler, the compiler can already guarantee points 1 and 2. How does the compiler know that? Well, for every valid expression the compiler can determine all the return sites with 100% precision. Not only that, the compiler can also determine the type of the values that are returned at all of the return sites. These abilities are not implementation-dependent. They are deterministic, and every implementation of Rust, regardless of who makes it or how, will have to achieve that. If they don't achieve points 1 and 2, it either means they are incompatible with rustc or that they have a bug.

Not necessarily. Taking back the match-if example (let's assume for now all these functions return Foo):

let a: Foo = match foo() {
    Foo1 => if bar() { baz() } else { quux() },
    Foo2 => iwantmorenames(),
}

I don't know the Rust compiler specifically, but most compilers are built like this:

Parse the file into an AST like this (simplified here):

let-binding "a": Foo
 → match (foo())
    → case Foo1: if (bar())
       → true: baz()
       → false: quux()
    → case Foo2: iwantmorenames()

Typecheck the result:

let-binding "a": Foo
 → match (foo()): Foo
    → case Foo1: if (bar()): Foo
       → true: baz(): Foo
       → false: quux(): Foo
    → case Foo2: iwantmorenames(): Foo

The important thing here is that typing occurs on all AST nodes. Which means that the value of the if/else must have a type.
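
As a toy illustration of "every node gets a type", here is a minimal bottom-up checker. It is only a sketch under heavy simplification: Ty, Expr, type_of and example are invented names, and this is not how rustc is implemented.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Foo,
    Bar,
}

// A tiny expression tree: calls with a known return type, and a
// branching node that stands for both `if/else` and `match`.
enum Expr {
    Call(Ty),
    Branch(Vec<Expr>),
}

// Bottom-up typing: every node gets exactly one type, or checking
// fails at the innermost node whose arms disagree.
fn type_of(e: &Expr) -> Result<Ty, String> {
    match e {
        Expr::Call(ty) => Ok(ty.clone()),
        Expr::Branch(arms) => {
            let mut tys = Vec::new();
            for arm in arms {
                tys.push(type_of(arm)?);
            }
            let first = tys
                .first()
                .cloned()
                .ok_or_else(|| "branch with no arms".to_string())?;
            for ty in &tys[1..] {
                if *ty != first {
                    return Err(format!("arms differ: {:?} vs {:?}", first, ty));
                }
            }
            Ok(first)
        }
    }
}

// The match-if example: match { if { baz() } else { quux() }, iwantmorenames() }
fn example(baz: Ty, quux: Ty, more: Ty) -> Expr {
    Expr::Branch(vec![
        Expr::Branch(vec![Expr::Call(baz), Expr::Call(quux)]),
        Expr::Call(more),
    ])
}
```

With this scheme the inner if/else is rejected as soon as baz and quux disagree, before the enclosing match or the let annotation is ever consulted.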

With the return-type-marker approach, this type is undefined, because baz and quux have different return types, and the if is not constrained by a marker impl Trait type boundary. (that said, you're right in that this shouldn't be dependent on optimizations, I was considering typing occurring later, which would be surprising indeed, even if technically possible)

So actually I'm worried that the return-site-marker approach would

either not work across the match-if example and require an explicit marker, like:

let a: marker impl Trait = match foo() {
    Foo1 => {
        let x: marker impl Trait = if bar() { baz() } else { quux() };
        x
    },
    Foo2 => iwantmorenames(),
}

which would be a big drawback for both usability and learnability.

or work across the match-if example, but then every unannotated merge point between two expressions becomes an anonymous enum, and error reporting for the whole rust compiler becomes a mess when an actual typing error occurs
or, last solution, come from the : marker impl Trait and implicitly expand the marker down in the AST. Which would work for let bindings, but would be harder to make work for function returns (because there can be multiple return points) and would have unexpected effects on refactoring:

trait Trait { fn with_set_something(self, b: bool) -> Self; }
fn f2() -> impl Trait {…}
fn f3() -> impl Trait {…}
// typechecks
fn foo() -> marker impl Trait {
    if bar() {
        (if f1() { f2() } else { f3() }).with_set_something(true)
    } else {
        (if f1() { f2() } else { f3() }).with_set_something(false)
    }
}
// no longer typechecks, requires adding `: marker impl Trait`
fn foo() -> marker impl Trait {
    let x = if f1() { f2() } else { f3() };
    if bar() { x.with_set_something(true) }
    else { x.with_set_something(false) }
}

@MajorBreakfast (from here)

I'd have written your example like:

fn foo() -> impl Trait {
    let a = if cond1() { Ok(marker f1()) } else { f2().map(|x| marker x) };
    if cond2() { a.unwrap() } else { f3().map(|x| marker x) }
}

with the return-site approach. (see my reply to @Pauan above as to why it should be possible to make this work)

It's true that the .map(|x| marker x) is a bit painful to write, but I'm not seeing it as being worse than : Result<marker impl Trait, String> :) (esp. with the drawbacks of the return-site-marker approach raised in my reply to @Paluth above)

@fstirlitz (from here)

I'd think your point on which traits should be auto-derived is mostly orthogonal to the discussion on where to place the marker? I don't think having explicitly-additive enums could avoid markers: we would very likely want Copy to be additive, but then code like:

let x = if foo() { 0u32 } else { 0f32 };
function_expecting_u32(x);

would have an error message like “x is of type AnonymousEnum(Copy, enum { u32, f32 }), expected u32” at the call of function_expecting_u32, which would be very unexpected as the error would be at the 0f32 place.

So to sum up my opinion (and I'm more and more feeling like I'm alone in this opinion, so won't bother you much longer with it :)):

I think a marker, either at return-site or at return-type, is necessary
I think the return-type-marker approach has the drawback of not explicitly saying where types start being enum-ified, and as such will likely generate surprising behaviour that'll implicitly leak the internals of the compiler (eg. “typing occurs at the AST phase”), thus hindering learnability
I think the return-site-marker approach has the advantage of being explicitly “from the marker up to the next type enforcement, the type is enum-ifiable with other enum-ifiable types”
The return-type-marker approach is roughly as verbose as the return-site-marker approach as soon as we start considering non-trivial examples where everything is not just coming from a single match (I'd like to say “more verbose” but would likely be contradicted with an example I didn't think of)

And to be fair, the advantage of the return-type approach is that it's closer to the place where type merging is actually done.

Also, just as a last point: I'm sorry for the confusion the “return-site-marker” expression may have spread, but I can't think of any better term, and “marker-lifting-values-to-enum-ifiable-monad” is way too scary to be usable 😁

Ekleog on 12 May 2018

Unrelatedly, @Nadrieril pointed me to a potential pitfall of this proposal:

fn foo<T: Iterator<Item = u8>>(iter: T, n: usize) -> impl Iterator<Item=u8> {
    let mut iter = marker iter;
    for _ in 0..n {
        iter = marker iter.enumerate();
    }
    iter
}
// or
fn foo<T: Iterator<Item = u8>>(iter: T, n: usize) -> impl Iterator<Item = u8> {
    let mut iter: marker impl Iterator<Item = u8> = iter;
    for _ in 0..n {
        iter = iter.enumerate();
    }
    iter
}

So I think the easiest solution for now in order to reject this program that'd require realization of a 2⁶⁴-elements enum is to outright forbid enum-ification in assignment to mut variables for now, and maybe in a follow-up RFC relax this requirement to allow some cases of assignment to mut, just like what is happening with const fn.

Ekleog on 12 May 2018

A proposal:

fn foo_agst(a: bool) -> impl ::std::fmt::Debug
{
    let b: dyn ::std::fmt::Debug = if a {
        7
    } else {
        "Foo"
    };

    b
}

Rationale:

Auto-generated sum types (AGSTs) are pure syntax sugar

It is always possible to manually write the sum type. It's extremely verbose, and if some of your variants are impl Traited themselves it would require writing new type wrappers for those variants as well, but it is possible. So if you really do need to avoid boxing everything for performance reasons it can be done today.

This suggests that any proposal for AGSTs should be judged heavily on the syntax and ergonomic benefits provided.

`impl Trait` in return-type position makes no guarantees to the caller beyond what it says

A function with impl Trait in return-type position today can return boxed trait objects, things that use dynamic dispatch, etc. Adding marking to the return type only in cases where said type happens to be an AGST provides no additional value to the caller.

A monomorphizing let binding is already required to return boxed trait objects of differing concrete types through an impl Trait

In other words, this compiles:

fn foo(a: bool) -> Box<::std::fmt::Debug>
{
    if a {
        Box::new(7)
    } else {
        Box::new("Foo")
    }
}

But this doesn't:

fn foo(a: bool) -> impl ::std::fmt::Debug
{
    if a {
        Box::new(7)
    } else {
        Box::new("Foo")
    }
}

But this does:

fn foo(a: bool) -> impl ::std::fmt::Debug
{
    let b: Box<::std::fmt::Debug> = if a {
        Box::new(7)
    } else {
        Box::new("Foo")
    };

    b
}

Since impl Trait as a return type already requires an explicit monomorphizing binding to return different boxed trait objects it makes sense to reuse that spot for an AGST marking.

AGSTs should be seen as a foil to boxed trait objects, not to a monomorphized `impl Trait`

As shown above, boxed trait objects can already be returned through an impl Trait in return-type position (if there exists an impl<T: Trait> Trait for Box<T>, which there usually does). An AGST is not a special case of impl Trait, it's another thing you can shove through an impl Trait-sized hole. But there's no reason they should be solely limited to impl Trait.

Put another way, I should also be able to write

struct DisplayForDebugWrapper<T>(T);

impl<T: ::std::fmt::Debug> ::std::fmt::Display for DisplayForDebugWrapper<T> {
  fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
    write!(f, "{:#?}", self.0)
  }
}

fn foo(a: bool) -> impl ::std::fmt::Display {
  let b: dyn ::std::fmt::Debug = if a {
    7
  } else {
    "Foo"
  };

  DisplayForDebugWrapper(b)
}

This will be very important for futures, where authors will want to have parts of a processing pipeline that are conditional and other parts that are not. i.e. my database backend might vary but my code to display values from the database is shared.

The salient performance tradeoff in choosing a sum type vs a boxed trait object is potentially over-allocating space vs having a separate but minimal heap allocation

Both sum types and boxed trait objects use dynamic dispatch, and there's no a priori reason that a sufficiently smart compiler cannot transform a sum type's matches into a vtable-style dispatch if that makes sense. A sum type does require reserving space for its largest variant up front though, while a boxed trait object allows a minimal allocation.
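
The space tradeoff can be checked directly in today's Rust. This is only a sketch: SmallOrHuge is a made-up, hand-written sum type standing in for an auto-generated one.

```rust
use std::fmt::Debug;
use std::mem::size_of;

// Hand-written sum of a small and a large variant.
enum SmallOrHuge {
    Small(u8),
    Huge([u8; 1024]),
}

// Returns (inline size of the sum type, inline size of a boxed trait object).
fn sizes() -> (usize, usize) {
    (size_of::<SmallOrHuge>(), size_of::<Box<dyn Debug>>())
}
```

The sum type always occupies more than 1024 bytes inline, even when it holds the Small variant. The Box<dyn Debug> is two pointers wide inline (data pointer plus vtable pointer), with the payload allocated on the heap at exactly the size of the concrete value.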

If Rust did grow AGSTs I expect the primary performance pitfall would be programmers auto-summing a "frequent and small" variant with an "infrequent but enormous" variant.

This further suggests that the difference in syntax vis-a-vis boxed trait objects should highlight the boxing.

The "should I box it or just pass it around unboxed" question already exists throughout Rust

This is not new cognitive overhead for a Rust programmer concerned about performance.

The reason people don't like boxed trait objects is not usually performance, it's that boxing is annoying.

The syntax overhead of boxing things gets rather annoying. Most programs won't notice the overhead of dynamic dispatch or boxing, but their authors will notice the overhead of typing Box::new(x) all over the place.

Boxed trait objects are already becoming `Box<dyn Trait>`, so AGSTs as `dyn Trait` are a clear unboxed analog

i.e. we'll eventually have

fn foo_box(a: bool) -> impl ::std::fmt::Debug
{
    let b: Box<dyn ::std::fmt::Debug> = if a {
        Box::new(7)
    } else {
        Box::new("Foo")
    };

    b
}

fn foo_agst(a: bool) -> impl ::std::fmt::Debug
{
    let b: dyn ::std::fmt::Debug = if a {
        7
    } else {
        "Foo"
    };

    b
}

Both use dynamic dispatch, and the salient difference on the heap allocation is precisely reflected in the syntax differences.

khuey on 12 May 2018

👍4

This looks interesting, even though this would add some sized and usable-without-fat-pointers dyn Trait, which may be unexpected. That said, I have a question.

With boxing, we can do things like:

fn foo(a: bool) -> impl Debug
{
    if a {
        Box::new(7) as Box<Debug>
    } else {
        Box::new("Foo")
    }
}

(ie. without a separate let).

This is the equivalent of return-site marker, and the separate-let solution you raised is the equivalent of return-type marker.

Do you think

fn foo(a: bool) -> impl Debug
{
    if a {
        7 as dyn Debug
    } else {
        "Foo"
    }
}

should be made to compile?

If so, then we get both return-type-marker and return-site-marker (although here it is actually return-site-marker and not marker-lifting-values-to-enum-ifiable-monad), as one prefers.

Also, then there is the question of how to put it in the return-type position for use with ?, as we would likely not want every use of ? to generate an enum. (that, or saying that one-type autogenerated enums can be converted to the said one type)

That said, @Nadrieril pointed out on IRC a potential unexpected behaviour of this proposal:

fn foo(a: bool) -> Box<dyn ::std::fmt::Debug> {
    let b: dyn ::std::fmt::Debug = if a {
        7
    } else {
        "Foo"
    };

    Box::new(b)
}

Here, the user would expect a single dynamic dispatch, but (apart from potential compiler optimizations) two would occur: one at the Box level, and then another one for the enum. Then, that's likely an optimization question. :)
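
Written out by hand with a made-up IntOrStr enum in place of the auto-generated type, the two levels look roughly like this (a sketch, not what the compiler would necessarily emit):

```rust
use std::fmt::{self, Debug};

// Stand-in for the auto-generated sum type.
enum IntOrStr {
    Int(i32),
    Str(&'static str),
}

impl Debug for IntOrStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Second dispatch: branch on the variant.
        match self {
            IntOrStr::Int(i) => Debug::fmt(i, f),
            IntOrStr::Str(s) => Debug::fmt(s, f),
        }
    }
}

fn foo(a: bool) -> Box<dyn Debug> {
    let b = if a {
        IntOrStr::Int(7)
    } else {
        IntOrStr::Str("Foo")
    };
    // First dispatch: callers reach `IntOrStr::fmt` through the vtable.
    Box::new(b)
}
```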

Ekleog on 12 May 2018

@Ekleog

So I think the easiest solution for now in order to reject this program that'd require realization of a 2⁶⁴-elements enum is to outright forbid enum-ification in assignment to mut variables for now

I don't think the issue there is that it's assigning to a mut variable. The issue is that it's generating recursive types. When initially assigning let mut iter = marker iter; you start the sum type off with typeof(iter) = { T | ... } (to pick some arbitrary syntax for unfinished AGST I hope is relatively easy to understand). Then when assigning iter = marker iter.enumerate() you extend this to typeof(iter) = { T | Enumerate<typeof(iter)> | ... }. You now have a recursive type, which cannot be supported by Rust.

This probably results in a very similar limitation in practice, but I believe is a more correct way to look at it.
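
For comparison, the same loop is expressible today once each step is type-erased behind a pointer, since the indirection is what breaks the recursion. This sketch uses Box purely as an illustration, so it is not an option for the alloc-less case. The name skip_n_boxed is made up, and skip(1) replaces enumerate() because enumerate() would also change the Item type.

```rust
// Wraps the iterator in `n` adapters, one per loop iteration.
fn skip_n_boxed<'a>(
    iter: impl Iterator<Item = u8> + 'a,
    n: usize,
) -> Box<dyn Iterator<Item = u8> + 'a> {
    // One erased type for every iteration, so the variable's type no
    // longer has to mention itself.
    let mut iter: Box<dyn Iterator<Item = u8> + 'a> = Box::new(iter);
    for _ in 0..n {
        iter = Box::new(iter.skip(1));
    }
    iter
}
```

Without the erasure, the type after k iterations would be k nested Skip<...> layers, and n is only known at run time.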

@khuey

So if you really do need to avoid boxing everything for performance reasons it can be done today.

This is _not_ only necessary for performance reasons. The main reason I want a feature like this is for alloc-less embedded development.

I do agree with a lot of the rest of your points. There's a reason this isn't auto-generated enums, as I mentioned a long time ago, and as @DataAnalysisCosby linked to, this can be alternatively implemented via a union + vtable.

I don't think dyn Trait fits as a syntax though. dyn Trait already has a meaning, it's an unsized dynamic trait object. Just because they cannot today be used without hiding it behind something that can store the associated size data like Box or &mut doesn't mean they will never exist as a directly usable type.

Nemo157 on 12 May 2018

@Ekleog

I actually wasn't aware of the "as at the return site" syntax. Maybe we would want that to work. But, unlike Box<(dyn) Debug>, dyn Debug isn't actually a named type. The difference there is worth thinking about at least; there's not currently any as <not-a-type-name> that I'm aware of.

I'm not sure I understand the question about interaction with ?.

It would be sort of confusing for boxing a dyn Trait to not be identical to Box<dyn Trait>. It seems straightforward for an AGST to roll out into a boxed trait object by pulling out its individual variants though. i.e. auto-implement Into<Box<dyn Trait>> for the AGST and then warn if it's explicitly boxed. Or it could even be rolled out implicitly if people are comfortable with it.

@Nemo157

This is not only necessary for performance reasons. The main reason I want a feature like this is for alloc-less embedded development.

Noted, but I don't think this actually changes anything I said. You can still avoid the boxing today. And if you don't have an allocator it's even more essential!

I don't think dyn Trait fits as a syntax though. dyn Trait already has a meaning, it's an unsized dynamic trait object. Just because they cannot today be used without hiding it behind something that can store the associated size data like Box or &mut doesn't mean they will never exist as a directly usable type.

Maybe. It's not clear to me what you'd ever use dyn Trait for as an actual type. Has anyone proposed anything that would make sense?

khuey on 12 May 2018

@khuey I like the 7 as dyn Debug syntax a lot. I've already proposed usage of the dyn keyword above but had to conclude that dyn Trait is something different than the sum types that are discussed here. It is however very likely that Rust will one day support the syntax you're proposing, not with sum types, but with real unboxed, dynamically sized dyn Trait values on the stack. It will be without heap allocation. It will require less or equal memory compared to sum types. But, it won't work inside async functions across yield points because all types of values that are stored in the state machine need to be statically sized.

MajorBreakfast on 12 May 2018

@khuey

Maybe. It's not clear to me what you'd ever use dyn Trait for as an actual type. Has anyone proposed anything that would make sense?

https://github.com/rust-lang/rust/issues/48055 has been in the works for a while.

But even without that, I would consider it far too confusing to have dyn Trait mean both "trait object" and "autogenerated sum type". There are proposals for using the dyn keyword in other ways, such as [x; dyn y] for an array allocated on the stack with a fixed but not-known-until-runtime size y, but that doesn't involve putting a trait name after dyn so it's clearly something different.

Regarding the rest of https://github.com/rust-lang/rfcs/issues/2414#issuecomment-388571136, @khuey most of that looks identical to what I thought was the dominant proposal already; the only difference I can see is the use of dyn as the marker. Were there meant to be any other new/different suggestions in there, or just a summary of what we've come up with so far?

@Ekleog

So to sum up my opinion (and I'm more and more feeling like I'm alone in this opinion, so won't bother you much longer with it :)):

I think a marker, either at return-site or at return-type, is necessary

I _think_ everyone agrees with this... or at least I really hope everyone does.

I think the return-type-marker approach has the drawback of not explicitly saying where types start being enum-ified, and as such will likely generate surprising behaviour that'll implicitly leak the internals of the compiler (eg. “typing occurs at the AST phase”), thus hindering learnability

Now this part truly baffles me, because this is exactly why I've been advocating the opposite: putting the marker on the types makes it pretty obvious where the enum-ified types are. They're where the markers are. Putting them anywhere else immediately makes it less than obvious. I'm... not sure what else could be said about this.

I think the return-site-marker approach has the advantage of being explicitly “from the marker up to the next type enforcement, the type is enum-ifiable with other enum-ifiable types”

I still don't see what the problem is with the surface syntax implying that these autogenerated enums are nested rather than flattened. Nested enums aren't evil. I don't think anything that's been proposed so far would prevent the compiler from flattening some autogenerated enums as an optimization (assuming that even makes a difference; I don't recall seeing any evidence that it would), and if you're using this feature in the first place you shouldn't care that much about the precise layout of the enums getting generated. When the precise layout is a big deal, just write the type by hand.

The return-type-marker approach is roughly as verbose as the return-site-marker approach as soon as we start considering non-trivial examples where everything is not just coming from a single match (I'd like to say “more verbose” but would likely be contradicted with an example I didn't think of)

I assume what you're referring to is the subset of your comment where you argue that a match arm with an if else expression would need to be transformed into a block with a let statement just so the marker could be applied to it. I do agree that this would be an ergonomic showstopper, but that seems only slightly worse to me than your proposal that we mark the return sites rather than the types, and I'm not seeing what's wrong with simply making the feature "work across the match-if example". I don't buy that "but then ... error reporting for the whole rust compiler becomes a mess when an actual typing error occurs" because error reporting with unnamed types is going to be a challenge no matter what rules we choose for the syntax, and I really don't think that challenge is intractable unless you have several layers of autogenerated enums within a single function, in which case your function is probably far too long anyway.

Ixrec on 12 May 2018

👍2

rust-lang/rust#48055 has been in the works for a while.

I don't see dyn in the unsized values rfc at all but maybe I'm missing something ...

Anyways, if Rust does eventually support unboxed, dynamically sized trait objects on the stack, isn't an auto-generated sum type just a Sized reification of that?

khuey on 12 May 2018

Anyways, if Rust does eventually support unboxed, dynamically sized trait objects on the stack, isn't an auto-generated sum type just a Sized reification of that?

Probably not. The deep, fundamental difference between trait objects and enums is that the set of concrete types that a trait object might be wrapping is open, and not necessarily known to the compiler, while enums must have all variants known to the compiler. In theory, the surface syntax of trait objects could be compiled down to enums as an optimization if the compiler happened to know all the concrete types, though I have no idea if that would be practically useful. Is that optimization what you're trying to propose?
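
A small sketch of the open-versus-closed distinction in today's Rust. All names here (Describe, Known, Late, describe_all) are invented for the example.

```rust
trait Describe {
    fn describe(&self) -> String;
}

// Closed: every variant is listed here and known to the compiler.
enum Known {
    Int(i32),
    Text(&'static str),
}

impl Describe for Known {
    fn describe(&self) -> String {
        match self {
            Known::Int(i) => format!("int {}", i),
            Known::Text(t) => format!("text {}", t),
        }
    }
}

// Open: accepts any implementor, including ones written later or in
// another crate, without this function changing.
fn describe_all(items: &[Box<dyn Describe>]) -> Vec<String> {
    items.iter().map(|item| item.describe()).collect()
}

// A type added afterwards. `describe_all` is untouched, whereas adding
// it to `Known` would mean editing the enum and its match.
struct Late;

impl Describe for Late {
    fn describe(&self) -> String {
        "late".to_string()
    }
}
```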

I don't see dyn in the unsized values rfc at all but maybe I'm missing something ...

I think the only reason dyn doesn't show up in that RFC is because it predates the use of dyn syntax for trait objects (judging by the "Alternatives" section, it also predates the suggestion of [x; dyn y] syntax for alloca'd arrays). But trait objects are unsized, so it would apply to them, unless I'm deeply misunderstanding something.

Ixrec on 12 May 2018

👍1

@Ixrec

I think the return-site-marker approach has the advantage of being explicitly “from the marker up to the next type enforcement, the type is enum-ifiable with other enum-ifiable types”

I still don't see what the problem is with the surface syntax implying that these autogenerated enums are nested rather than flattened. Nested enums aren't evil. I don't think anything that's been proposed so far would prevent the compiler from flattening some autogenerated enums as an optimization (assuming that even makes a difference; I don't recall seeing any evidence that it would), and if you're using this feature in the first place you shouldn't care that much about the precise layout of the enums getting generated. When the precise layout is a big deal, just write the type by hand.

I'm not seeing issues with nested enums (well, I am, but that's not the reason why I put it here because I know they can be fixed in other ways). The reason why I'm saying this here is to balance the previous point about learnability of the marker. (and I'm replying to your reply about it just below)

Now this part truly baffles me, because this is exactly why I've been advocating the opposite: putting the marker on the types makes it pretty obvious where the enum-ified types are. They're where the markers are. Putting them anywhere else immediately makes it less than obvious. I'm... not sure what else could be said about this.

I must be misunderstanding something in your learnability argument.

The main argument I'm trying to make is this:

let a: marker impl Trait = match … { … }

This variable a is a variable that will be an auto-generated enum. But from what? I can't know until I look down into the match.

And now, if I look into the match, and see this:

let a: marker impl Trait = match f() {
    A => if g() { foo() } else { bar() },
    B => quux(),
}

Then I need to know whether foo() and bar() are being enum-ified or not.

And the answer to this question leads to the issues about having to know the compiler internals I was trying to put forward towards the end of my reply to @Paluth:

- If they are not being enum-ified, then it's a pain to write (see my example introducing another let binding just to fix that), and it would make more sense to put the marker on the match than on the return type anyway (which is indeed another possibility we forgot to consider)
- If they are being enum-ified, there are two possibilities:
  - Either the rust compiler enum-ifies at all match/if/etc., which sounds stupid, as eg.

        let foo = if f() { bar() } else { baz() };
        function_expecting_bar(foo);

    would fail with an error at the function_expecting_bar call while it should have failed at the if
  - Or the rust compiler enum-ifies only the match/if/etc. that are downwards in the AST from the : marker impl Trait, and:
    - Some things start to make no sense:

          // Does not compile
          fn foo() -> marker impl Trait {
              let a = if foo() { bar() } else { baz() };
              a
          }
          // Compiles
          fn foo() -> marker impl Trait {
              if foo() { bar() } else { baz() }
          }

    - Some error messages become weird:

          fn foo() -> marker impl Trait {
              match f() {
                  Foo => {
                      let bar = if x() { y() } else { z() };
                      function_expecting_Y(bar)
                  }
                  Bar => bar(),
              }
          }

      encounters the same issue as the function_expecting_bar above
    - Code refactoring becomes weird, as things no longer have the same meaning in the AST “below a return point” and everywhere else (see the last code example in my reply to @Paluth from my last comment for an example)

Anyway, this requires some additional knowledge of how the compiler works to know the answer to “from what point until the marker are types enum-ified?”

(BTW, at least in my Future-based use case for this, I'm returning from nested matches and if/then/else, so I can say that it's a real-world use case)

On the other hand, the return-site approach has an easy answer to that: the types are enum-ified from the marker until the next point where a type is forced.

Let's discuss verbosity once this discussion about the exact semantics of the marker-at-return-type (and the drawbacks of it) is closed :)

Ekleog on 12 May 2018

@Ekleog One big problem I see with the "TypeInProgress" you're proposing is that its finalization is not explicitly marked. What happens if it should be passed as an argument to a function call instead of being returned?

fn foo() -> impl Trait {
    let a = if cond1() { marker f1() } else { marker f2() };
    f3(&a); // Used in function call, finalize type here?
}

To me it feels very different than the constraint based type inference that the compiler does today.

MajorBreakfast on 12 May 2018

@MajorBreakfast Yes, I think it should finalize if passed to a function call, in most cases. Exactly like with integers, actually:

fn foo(_: usize) {}
fn bar<T>(_: T) {}
// ...
let a = 3;
foo(a);
// a has type `usize`
let b = 3;
bar(b);
// b still has type `{integer}`

See https://play.rust-lang.org/?gist=97ad7a96d24e85f8f097cd513dcda3c5&version=stable&mode=debug

In order to simplify the thing, I think for a first draft any constraint imposed should finalize the type, eg. the bar function above would not finalize, but fn baz<T: Debug>(_: T) would. This is not exactly the same as with integers (where the type would stay {integer}), but would likely be much simpler to implement.

That said, it would likely be possible to not finalize the enum but just add a constraint on it (ie. that it must be Debug). Actually, that's similar to what rustc does for integers, and it seems to even partially succeed at it:

trait MyTrait {}
impl MyTrait for usize {}

fn bar<T: MyTrait>(_: T) {}

fn main() {
    let a = 3;
    bar(a);
    // Here a has type `usize`
}

See https://play.rust-lang.org/?gist=2226c34c8b98ee8ac936b7190f779961&version=stable&mode=debug

But not perfectly:

trait MyTrait {}
impl MyTrait for usize {}
impl MyTrait for isize {}

trait MyTrait2 {}
impl MyTrait2 for usize {}
impl MyTrait2 for u32 {}

fn bar<T: MyTrait + MyTrait2>(_: T) {}

fn main() {
    let a = 3;
    bar(a);
    // Here `a` is still {integer}
}

See https://play.rust-lang.org/?gist=194af36fb8e18a53b3e8a1874975ec24&version=stable&mode=debug

(well, actually I guess rustc internally has enough information to enforce that a has type usize, but it's not displayed)

Basically, the marker at return-site would be approximately like a generalized {integer} from a compiler point of view, or so I think. :)

Ekleog on 12 May 2018

@alexreg I'm observing that dyn Trait consists of a vtable pointer and a T: Trait, so it need not be a DST if we do not export Trait and thus only have a few fixed T: Trait.

At that point, we likely have an implicit conversion from a T: Trait to a dyn Trait anyways, so almost by necessity the syntax takes the form mentioned by @Ekleog

fn foo(a: bool) -> impl Debug {
    if a {
        7 as dyn Debug
    } else {
        "Foo"
    }
}

Avoiding this syntax would require tweaking trait objects, ala dyn 7 and dyn "Foo", which sounds unrealistic. We do have markers here in the as dyn Debug, but only at the type level, so not at each site.

There is one important caveat however: Is foo::Output : Sized? In principle yes. But what about dyn MySummand in

trait MySummand { }
impl MySummand for X {}
impl MySummand for Y {}
fn foo(a: bool) -> impl Debug+MySummand { ... }
fn foo(a: bool) -> impl Debug+MySummand { ... }

Again presumably yes but stuff could get weird.

burdges on 12 May 2018

@Ekleog The following code is from your comment here:

fn foo() -> impl Trait {
    let a = if cond1() { Ok(marker f1()) } else { f2().map(|x| marker x) };
    if cond2() { a.unwrap() } else { marker f3() }
}

(Edit: @Ekleog mentions in his next comment that he inadvertently introduced a little mistake in this code. I've now changed f3().map(|x| marker x) to just marker f3() to fix it.)

Why does neither map() nor the closure finalize marker x?

MajorBreakfast on 12 May 2018

Anyways, if Rust does eventually support unboxed, dynamically sized trait objects on the stack, isn't an auto-generated sum type just a Sized reification of that?

Probably not. The deep, fundamental difference between trait objects and enums is that the set of concrete types that a trait object might be wrapping is open, and not necessarily known to the compiler, while enums must have all variants known to the compiler. In theory, the surface syntax of trait objects could be compiled down to enums as an optimization if the compiler happened to know all the concrete types. Is that what you're trying to propose?
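
To make the open/closed distinction concrete, here is a minimal sketch in today's Rust. The names `Shape`, `Triangle`, `Square` and `ClosedShape` are made up for illustration and do not come from the proposal; the enum is written by hand, which is roughly what an auto-generated sum type would be expected to produce.

```rust
trait Shape {
    fn sides(&self) -> u32;
}

struct Triangle;
struct Square;

impl Shape for Triangle {
    fn sides(&self) -> u32 { 3 }
}
impl Shape for Square {
    fn sides(&self) -> u32 { 4 }
}

// Open set: any type implementing `Shape`, including types defined later
// in other crates, can sit behind this trait object.
fn open(triangle: bool) -> Box<dyn Shape> {
    if triangle { Box::new(Triangle) } else { Box::new(Square) }
}

// Closed set: every variant is listed, so the compiler knows the size of
// the value and dispatch is a `match` instead of a vtable call.
enum ClosedShape {
    Triangle(Triangle),
    Square(Square),
}

impl Shape for ClosedShape {
    fn sides(&self) -> u32 {
        match self {
            ClosedShape::Triangle(t) => t.sides(),
            ClosedShape::Square(s) => s.sides(),
        }
    }
}

fn closed(triangle: bool) -> impl Shape {
    if triangle {
        ClosedShape::Triangle(Triangle)
    } else {
        ClosedShape::Square(Square)
    }
}
```

Both functions behave the same from the caller's side; only `closed` returns a `Sized` value without boxing.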

I'm trying to salvage my proposal :)

I read and I think I understand the RFC now. AIUI, the intended use case for dyn Trait is roughly:

fn passed_by_value_without_boxing_or_monomorphization(f: dyn FnOnce()) {
  f()
}

let x = || {
  // Stuff that makes this FnOnce.
};

passed_by_value_without_boxing_or_monomorphization(x);

Ok, fair enough. Let's think out loud for a bit. So I can also do this:

fn foo(a: bool) {
  let x = || {
    // Stuff that makes this FnOnce.
  };

  let y = || {
    // Different stuff that makes this FnOnce.
  };

  if a {
    passed_by_value_without_boxing_or_monomorphization(x);
  } else {
    passed_by_value_without_boxing_or_monomorphization(y);
  }
}

But if I can do that, I should probably be able to do this too.

fn foo(a: bool) {
  let x = if a {
    || {
      // Stuff that makes this FnOnce.
    }
  } else {
    || {
      // Different stuff that makes this FnOnce.
    }
  };

  passed_by_value_without_boxing_or_monomorphization(x);
}

Presumably x here will need the same monomorphizing (type erasing? not entirely sure what to call this) annotation either on the let binding or the as blah syntax that is necessary for everything else. So that's really:

fn foo(a: bool) {
  let x: dyn FnOnce() = if a {
    || {
      // Stuff that makes this FnOnce.
    }
  } else {
    || {
      // Different stuff that makes this FnOnce.
    }
  };

  passed_by_value_without_boxing_or_monomorphization(x);
}

Notably, x is not Sized. It doesn't have a constant size known at compile time. But the compiler does know all possible types and sizes it can have. For the moment, let's call that set of knowledge AllVariantsKnown. Contrast that with f inside passed_by_value_without_boxing_or_monomorphization. That function can be called with any dyn FnOnce() trait object, so we don't know all variants there.

Constructing an auto-generated sum type now is equivalent to having a compiler builtin

fn build_auto_generated_sum_type<X: Trait + AllVariantsKnown>(x: X) -> impl Trait + Sized

So perhaps at the end my original example looks like

fn foo_agst(a: bool) -> impl ::std::fmt::Debug
{
    let b: dyn ::std::fmt::Debug = if a {
        7
    } else {
        "Foo"
    };

    build_auto_generated_sum_type(b)
}

(Obviously we can bikeshed what build_auto_generated_sum_type looks like :) )

Also note that we can solve things like @Ekleog's loop example by being conservative in what we mark as AllVariantsKnown.

khuey on 13 May 2018

@MajorBreakfast

fn foo() -> impl Trait {
    let a = if cond1() { Ok(marker f1()) } else { f2().map(|x| marker x) };
    if cond2() { a.unwrap() } else { f3().map(|x| marker x) }
}

Why does neither map() nor the closure finalize marker x?

Well, for the map and unwrap, these come under the case of the fn bar<T>(_: T) function from my previous message: they impose no bound on the type, thus do not finalize the enum.

Here, if I type-annotate, we'd have something like (hope it's understandable):

trait Trait {}
fn f1() -> T1 {…} impl Trait for T1 {}
fn f2() -> Result<T2, E> {…} impl Trait for T2 {}
fn f3() -> T3 {…} impl Trait for T3 {}

fn foo() -> impl Trait {
    let a = if cond1() {
        Ok((marker (f1(): T1)): TypeInProgress(T1)): Result<TypeInProgress(T1), _>
    } else {
        (f2(): Result<T2, E>).map(
            (|x: T2| {
                (marker (x: T2)): TypeInProgress(T2)
            }): Fn(T2) -> TypeInProgress(T2)
        ): Result<TypeInProgress(T2), E>
        // Here, map() takes a Fn(T2) -> Whatever and returns Result<Whatever, E>,
        // without any bounds on Whatever, so we're saved by the
        // no-bound-means-no-finalization rule
    }: Result<TypeInProgress(T1&T2), E>;
    if cond2() {
        (a: Result<TypeInProgress(T1&T2), E>).unwrap(): TypeInProgress(T1&T2)
        // Same here, unwrap() imposes no bound whatsoever on the type, so TypeInProgress stays
    } else {
        (marker (f3(): T3)): TypeInProgress(T3)
    }: TypeInProgress(T1&T2&T3)
}

(Actually I've changed the code around f3 a bit, as it made no sense and I had misread your example)

For the closure, it's a harder question. I think that, as the closure has no return type, then it should be typed following the no-bound-means-no-finalization rule. Actually, that's also what rustc already does with {integer}:

fn main() {
    let c = || 3; // Fn() -> {integer} ; even if I can't get rust to display this type
    let () = c(); // expected type `{integer}`, found type `()`
}

See https://play.rust-lang.org/?gist=e55518d52ce9154962259e970f266754&version=stable&mode=debug

So I don't think these should be big issues 🙂

Ekleog on 13 May 2018

@Ekleog Another question: You've inserted the marker into the Result by calling the map method. What if such a method does not exist?

@Ekleog You've previously mentioned that the marker-at-return-type syntax has a problem with nested if or match expressions:

let a: marker impl Trait = match foo() {
    Foo1 => if bar() { baz() } else { quux() },
    Foo2 => iwantmorenames(),
}

It's good that you mention this case. I, however, do not agree with your conclusion. Even today the compiler ensures that all three are of the same type. This means the compiler is able to handle nested expressions. It follows that it can be made smart enough to create an enum with the appropriate number of variants. No additional markers needed.

MajorBreakfast on 13 May 2018

@MajorBreakfast

First, the answer about if-nested-in-match, as it's shorter: See here for a better explanation of why the marker-at-return-type syntax is a problem for syntactic reasons, and my reply to @Paluth from here for why I use the AST as a basic building block :) (this blog post also appears to confirm that typing is done at HIR ~= AST level currently)

Now, the “what if there is no map method?” question. Actually, I'd argue it's better not to be able to auto-enumify in this situation, and marker-at-return-site handles this situation better.

For instance, let's consider Vec (that turns out to have a map equivalent by .iter().map().collect()).

With marker-at-return-type, I'd write things like:

fn f1() -> Vec<A> {…}
fn f2() -> Vec<B> {…}

fn foo(x: bool) -> Vec<marker impl Trait> {
    if x { f1() } else { f2() }
}

I… do not want this to compile. Because rustc has no way to know how to properly map the auto-generated enum into the Vec: it requires re-allocation because the size will change, etc. Same thing for HashMap, where especially marker-ing the key would in addition require re-keying as the hash would change. For Ref, it's just not possible to wrap the inner type in an enum without also owning the RefCell.
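
A small sketch of why the conversion cannot be implicit, using a hand-written enum as a stand-in for the generated one. The types `A`, `B` and `AorB` and their field types are assumptions chosen only to make the size difference visible.

```rust
use std::mem::size_of;

struct A(u8);
struct B(u64);

// Hand-written stand-in for the enum the compiler would generate.
enum AorB {
    A(A),
    B(B),
}

fn f1() -> Vec<A> {
    vec![A(1), A(2)]
}

fn f2() -> Vec<B> {
    vec![B(3)]
}

// Each element changes size when it is wrapped, so the vector has to be
// rebuilt element by element; a `Vec<A>` cannot simply be reinterpreted
// as a `Vec<AorB>`.
fn foo(x: bool) -> Vec<AorB> {
    if x {
        f1().into_iter().map(AorB::A).collect()
    } else {
        f2().into_iter().map(AorB::B).collect()
    }
}
```

Here `size_of::<AorB>()` is larger than `size_of::<A>()`, which is the size change mentioned above.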

So this argument would actually almost rule out marker-at-return-type (well, it could be specified to not handle cases where the type is hidden in an *, but I'm not sure that'd solve all the issues, and then mapping the marker into it becomes even more painful than with marker-at-return-site as you have to type-annotate your mapping function).

On the other hand, marker-at-return-site would handle it pretty well, using the .iter().map().collect() idiom, which is the lowest-possible-overhead way to tell the Rust compiler how to map the enum over the Vec.

Also, if there is no .map()-like method on the type, then it means that the type is not meant to be mapped on. And so allowing to implicitly map an enum into such a type I got from a function sounds like a Bad Idea™ to me, and if it's not a type I got from a function, I could just do like the Ok(marker foo()), ie. add the marker at object creation time :) (that said I may be missing some cases for which implicit enum-ization without a map()-like function would make sense)

Ekleog on 13 May 2018

On the other hand, marker-at-return-site would handle it pretty well, using the .iter().map().collect() idiom

Can you give an example?

About learnability and readability: We disagree there. You say that it's hard to see what types are enumified because there are no markers at the return points. I, however, don't agree with that assessment because in normal if and match expressions without enumification returning values looks exactly the same. Learnability of marker-at-return-type is IMO better because we can clearly see where the enumification happens.

MajorBreakfast on 13 May 2018

@MajorBreakfast

Sure:

fn f1() -> Vec<A> {…}
fn f2() -> Vec<B> {…}

fn foo() -> Vec<impl Trait> {
    if bar() {
        f1().into_iter().map(|x| marker x).collect()
        // works because neither map() nor collect() impose any bound on the TypeInProgress type
    } else {
        f2().into_iter().map(|x| marker x).collect()
    }
}

This is incidentally exactly the same syntax when going from Vec<Box<T>> to Vec<Box<Trait>>.
BTW, @Nadrieril proposed on IRC that marker x be written x as enum impl Trait or x as enum _ so that this parallel becomes obvious. (let's try not to discuss that right now, it was just an on-the-fly comment)
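
For comparison, the existing boxed conversion this parallels can be written on stable Rust today. This is only a sketch: `f1`, `f2` and the choice of `Debug` as the trait are placeholders, not part of the proposal.

```rust
use std::fmt::Debug;

fn f1() -> Vec<Box<u8>> {
    vec![Box::new(1), Box::new(2)]
}

fn f2() -> Vec<Box<&'static str>> {
    vec![Box::new("a")]
}

fn foo(x: bool) -> Vec<Box<dyn Debug>> {
    if x {
        // The cast plays the role that `marker x` would play for enums.
        f1().into_iter().map(|b| b as Box<dyn Debug>).collect()
    } else {
        f2().into_iter().map(|b| b as Box<dyn Debug>).collect()
    }
}
```

The shape is the same as in the example above: the per-element conversion is spelled out inside `map`, and `collect` builds the new vector.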

How would you handle such an enum-ification with marker-at-return-type?

About learnability and readability, can I just know which of the three options I gave in https://github.com/rust-lang/rfcs/issues/2414#issuecomment-388582513 for marker-at-return-type you consider easy-to-learn? Or maybe another one I haven't thought of?

Ekleog on 13 May 2018

How would you handle such an enum-ification with marker-at-return-type?

Not sure if it can be done. Your notation uses .map(|x| marker x) to convert from the vector item type into the sum type. Seems like a clean way to do this. I can't think of a way to integrate this step into the marker-at-return-type system.

I'm beginning to like this solution. I'd be really interested if someone of the compiler team could comment on whether the "TypeInProgress" system is technically possible.

BTW the name "TypeInProgress" kinda suggests that the type could dynamically change. But this is not the case, right? It's a static type known at compile time. Maybe there's a better name for it.

MajorBreakfast on 13 May 2018

Marker suggestion for marker-at-return-site style: ~~fuse~~ unify
(Edit: fuse is already a method on Iterator. Making it a keyword is impossible. I've changed the keyword in the code example below from fuse to unify)

fn foo1() -> impl Trait {
    if cond() { unify f1() } else { unify f2() }
}

fn foo2() {
    let a: impl Trait = if cond() { unify f1() } else { unify f2() };
}

fn foo3() -> Option<impl Trait> {
    if cond() { Some(unify f1()) } else { Some(unify f2()) }
}

- It's short
- It's descriptive: A sum type is essentially the unification of many types with one or more shared traits
- Reserving it as a keyword is possible because it's probably not used often (unlike sum for instance)
- It's not the enum keyword which is good because sum types don't work like enums (no matching, implement the trait(s) instead)

MajorBreakfast on 13 May 2018

👍1

Hmm, for the TypeInProgress name, indeed that's a treacherous name. Maybe it'd be better named with brackets like closures (the other autogenerated type), [sum T] for marker (t: T) and [sum T|U] for the fusion of [sum T] and [sum U]? (I first thought of {T} for similarity with {integer}, but the user could declare an integer type, and {integer} would then be ambiguous)

Then I'm not sure it matters a lot: it's a type that can never appear in code anyway, so it'd only be used in compiler error message, just like for closure types :)

~~As for the choice of the marker, I must say I don't really care, and fuse looks nice to me :) (and reserving it as a keyword will be made easier by epoch 2018 coming soon)~~

Edit: Actually not: fuse is a method on Iterator, so we unfortunately can't make it a keyword.

Ekleog on 13 May 2018

Edit: Actually not: fuse is a method on Iterator, so we unfortunately can't make it a keyword.

Damn ^^'

Alternatives: http://www.thesaurus.com/browse/fuse?s=t

MajorBreakfast on 13 May 2018

Various keywords were analysed in RFC2388 (try-expr). unify is one of them

MajorBreakfast on 13 May 2018

unify looks like a nice keyword, and has also been shown to not break anything. :+1: I also liked coalesce in this list, but it's used by itertools.

Ekleog on 13 May 2018

I've updated the code example above to use unify. Looks good IMO.

MajorBreakfast on 14 May 2018

👍2

I think a procedural macro on the function whose return types you want automatically summed is the best way to go about doing this. We can do this by creating a type local to the function and, somewhat ironically, impl trait.

This code works in stable:

trait Test {
    fn method(&self) -> u32;
}

struct A {
    x: u32,
}

impl Test for A {
    fn method(&self) -> u32 {
        self.x
    }
}

struct B {
    x: u32,
}

impl Test for B {
    fn method(&self) -> u32 {
        self.x + 1
    }
}

fn test(b: bool) -> impl Test {
    enum Inner {
        A(A),
        B(B)
    }
    impl Test for Inner {
        fn method(&self) -> u32 {
            match self {
                Inner::A(a) => a.method(),
                Inner::B(b) => b.method()
            }
        }
    }
    if b { 
        Inner::A(A { x: 10 })
    } else {
        Inner::B(B { x: 10 })
    }
}

Because this is a strict transformation of the function, and the numerosity and type of the AST nodes has not changed by the transformation, this can be implemented with a proc macro attribute like the following:

#[auto_sum]
fn test(b: bool) -> impl Test {
    if b { 
       A { x: 10 }
    } else {
       B { x: 10 }
    }
}

maplant on 15 May 2018

❤1

It "works on stable" if you manually write everything out. But there's no way to write a procedural macro for doing this with arbitrary traits. Even if there were, this auto_sum proc macro attribute would require introducing a function boundary at any point you want to construct one of these types.

khuey on 15 May 2018

👍2

It can only be implemented as a procedural macro for a fixed number of traits, as if I understood @Nemo157's argument correctly there is no way to get the list of functions for a trait from a procedural macro.

The end-result you mention is however approximately what we're looking for, modulo the lengthy discussion about the preferred syntax, that you can read to forge your opinion and that I currently consider as in favor of marker-at-return-site unless the concerns of https://github.com/rust-lang/rfcs/issues/2414#issuecomment-388622263 and https://github.com/rust-lang/rfcs/issues/2414#issuecomment-388582513 can be addressed by marker-at-return-type proponents :)

Ekleog on 15 May 2018

👍2

The crate named enum_derive used to have a feature named #[derive(EnumInnerAsTrait)] which can simplify the latter half of DataAnalysis's example a little. IIRC the crate is presently defunct, so I haven't tested the following, but I guess it can still serve as an inspiration.

~~~ rust
#[derive(EnumInnerAsTrait(as_test -> &Test))]
enum Inner {
    A(A),
    B(B)
}

impl Test for Inner {
    fn method(&self) -> u32 {
        self.as_test().method()
    }
}

fn test(b: bool) -> Inner {
    if b {
        Inner::A(A { x: 10 })
    } else {
        Inner::B(B { x: 10 })
    }
}
~~~

nielsle on 15 May 2018

❤1 👍1

Hmm… I don't see how that can be used to implement this feature as a proc macro? The issue mentioned just above, that is “you can't access the list of methods of the traits,” is still there.

I mean, if you have a proc macro wrapping a function, then it can access this function and little more. In particular, it cannot query the Rust compiler to know which method it should put in the impl Test for Inner, and it doesn't have access to the definition of Test.

The proc macro EnumInnerAsTrait can perfectly be implemented without any access to the trait method list: it just needs to access the enum variants (which are in the enum being wrapped, so accessible), and the name of the trait (so that it can do the as &Trait cast).

So, even though it'd be really nice to have this feature implemented as a proc macro, I still don't think it's possible, as lines 7-11 of your code sample (the impl Test for Inner) can't be generated by a proc macro. (oh, and your example also assumes the method takes &self, a fact that can't be known by the proc macro either)
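
To illustrate the limitation, here is a sketch with a declarative macro instead of a proc macro. `sum_type!` is a hypothetical helper written for this example only. It can generate the enum and the delegating impl, but only because the caller repeats the method signature by hand; the macro has no way to look it up from the name `Test`.

```rust
trait Test {
    fn method(&self) -> u32;
}

struct A { x: u32 }
struct B { x: u32 }

impl Test for A {
    fn method(&self) -> u32 { self.x }
}
impl Test for B {
    fn method(&self) -> u32 { self.x + 1 }
}

// Hypothetical helper: builds an enum and delegates one `&self` method.
// The signature is an input of the macro, not something it discovers.
macro_rules! sum_type {
    (
        enum $name:ident : $tr:ident { $($variant:ident($ty:ty)),+ $(,)? }
        fn $method:ident(&self) -> $ret:ty;
    ) => {
        enum $name { $($variant($ty)),+ }

        impl $tr for $name {
            fn $method(&self) -> $ret {
                match self {
                    $($name::$variant(inner) => inner.$method()),+
                }
            }
        }
    };
}

sum_type! {
    enum Inner: Test { A(A), B(B) }
    fn method(&self) -> u32;
}

fn test(b: bool) -> impl Test {
    if b { Inner::A(A { x: 10 }) } else { Inner::B(B { x: 10 }) }
}
```

Every additional trait method, and every different receiver kind, would need to be spelled out at the call site in the same way.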

Ekleog on 15 May 2018

I was thinking you could do something like this:

fn test(b: bool) -> impl DerefMut<Target = Test> {
    enum Inner {
        A(A),
        B(B)
    }
    impl Deref for Inner {
        type Target = Test;

        fn deref(&self) -> &Test {
            match self {
                Inner::A(a) => &a,
                Inner::B(b) => &b
            }
        }
    }
    impl DerefMut for Inner {
        fn deref_mut(&mut self) -> &mut Test {
            match self {
                Inner::A(a) => &mut a,
                Inner::B(b) => &mut b
            }
        }
    }
    if b { 
        Inner::A(A { x: 10 })
    } else {
        Inner::B(B { x: 10 })
    }
}

and get most of the way there, but it seems that this code requires a lifetime for Test, so I'm wrong.
Maybe you could do AsRef.
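
For what it's worth, a variant of this idea that spells out the object lifetime should compile on current stable Rust. This is a sketch under that assumption, with only the `Deref` half shown, and it still has the by-reference limitation.

```rust
use std::ops::Deref;

trait Test {
    fn method(&self) -> u32;
}

struct A { x: u32 }
struct B { x: u32 }

impl Test for A {
    fn method(&self) -> u32 { self.x }
}
impl Test for B {
    fn method(&self) -> u32 { self.x + 1 }
}

fn test(b: bool) -> impl Deref<Target = dyn Test + 'static> {
    enum Inner {
        A(A),
        B(B),
    }

    impl Deref for Inner {
        // The lifetime bound on the trait object is written explicitly.
        type Target = dyn Test + 'static;

        fn deref(&self) -> &Self::Target {
            // Each arm is coerced from a concrete reference to `&dyn Test`.
            match self {
                Inner::A(a) => a,
                Inner::B(b) => b,
            }
        }
    }

    if b {
        Inner::A(A { x: 10 })
    } else {
        Inner::B(B { x: 10 })
    }
}
```

Callers can then write `test(true).method()` and rely on auto-deref, at the cost of a dynamic call through the trait object.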

maplant on 16 May 2018

Also, it only supports functions that use only a reference, eg. not Future's multiple combinators that take self by value (which is unfortunately my primary use case, although I guess other use cases exist that could be helped by such a proc macro)

Ekleog on 16 May 2018

@joshtriplett As you already reacted to this issue, would you happen to know whether https://github.com/rust-lang/rfcs/issues/2414#issuecomment-388588661 is possible to implement, or to know who could answer this question? I'll be able to start writing the RFC soon, and would just like to check whether this looks sane or not to someone familiar with rustc's type inference :)

If there is any need for details I'm available on Mozilla's IRC as ekleog, both on #rust and #rust-internals. :)

Ekleog on 21 May 2018

Not necessarily a show stopper, but I think it should be mentioned. Doing this with unsafe traits would be unsafe. E.g. it would cause problems if used with FixedSizeArray.
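
A sketch of how this could go wrong. `ConstLen` is a hypothetical unsafe trait invented for this example; it does not reproduce the actual contract of FixedSizeArray.

```rust
/// Hypothetical unsafe trait. Contract assumed for this example: all values
/// of an implementing type report the same length, and unsafe code is
/// allowed to rely on that.
unsafe trait ConstLen {
    fn fixed_len(&self) -> usize;
}

unsafe impl ConstLen for [u8; 2] {
    fn fixed_len(&self) -> usize { 2 }
}
unsafe impl ConstLen for [u8; 4] {
    fn fixed_len(&self) -> usize { 4 }
}

enum Sum {
    Short([u8; 2]),
    Long([u8; 4]),
}

// A mechanically generated delegation like this one compiles, but it
// silently breaks the contract: two values of the same type `Sum` now
// report different lengths.
unsafe impl ConstLen for Sum {
    fn fixed_len(&self) -> usize {
        match self {
            Sum::Short(a) => a.fixed_len(),
            Sum::Long(a) => a.fixed_len(),
        }
    }
}
```

Each individual impl upholds the invariant, yet the delegating impl does not, which is why an automatic `unsafe impl` would be a problem.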

gmorenz on 21 May 2018

👍2

@Ekleog Possible? Perhaps; you'd need a compiler expert to say how hard that'd be, and I'm definitely not an expert on the guts of the compiler.

That said...my initial reaction from a Rust language perspective is that I find myself really not wanting to see something like this built into the compiler. Into a macro, perhaps, but not into the language itself. Every example I see of this feels like something that'd work more cleanly with an explicitly derived trait.

Right now, it's a pain to take an arbitrary trait and create a sum type that also implements that trait. As an alternative to this approach, suppose that something like the following existed (handwaving on syntax):

#[enum_derive_delegate(Trait)]
enum YourSumType {
    T1(T1),
    T2(T2),
    T3(T3),
}

This syntax would automatically generate an implementation of Trait for YourSumType that delegates to the implementations for T1, T2, and T3, generating an error if any of those types didn't implement Trait.

If you had that, and could write it that simply without requiring an explicit implementation, would that suffice? Would that make it simple enough to handle the cases that you have in mind for this feature?

joshtriplett on 21 May 2018

👍4

@joshtriplett Hmm, I don't think this alone would solve the maintenance burden that eg. this is. (one still has to change numbers and adapt everything every time a return point is added/removed)

However, with this, it may become possible to write a proc macro that would handle this issue, by a process like:

#[coalesce_returns]
fn foo() -> impl Trait {
    if bar() { coalesce baz() }
    else { coalesce quux() }
}

That would turn into

fn foo() -> impl Trait {
    #[enum_derive_delegate(Trait)]
    enum AnonymousSumType<T1: Trait, T2: Trait> {
        T1(T1),
        T2(T2),
    }

    if bar() { AnonymousSumType::T1(baz()) }
    else { AnonymousSumType::T2(quux()) }
}

Thank you for this idea!

However, the issue with this solution is that there is still something it couldn't do: it could only handle coalescing types together at points where the expected traits are explicitly mentioned (so no possibility of using it transparently as a function call argument, unless rustc at some point allows proc macros to query the type of symbols -- which afaik isn't even being thought of for the time being).

Also, and more importantly, I'm not sure how it could handle eg. the try! macro (and try!-like macros), that couldn't just be changed to embed the marker at the Err return point. @Nemo157, you appear to know proc macros much better than I do, do you think it'd be possible to handle this case cleanly with this added helper?

Ekleog on 22 May 2018

👍2

@joshtriplett as @Ekleog said this doesn't solve the issue of churn when changing the number of return types. In terms of being used as part of another macro like #[coalesce_returns] I don't think it really gains you anything over the raw delegation feature that it would need to use under the hood (unless you're proposing this as a compiler provided attribute to avoid the dependency on the vastly extended delegation feature it would require?).

Basically, from what I remember of the discussions of the extended delegation support, this would expand to something like

impl Trait for YourSumType {
    delegate * to {
        |self| match self { T1(t1) => t1, T2(t2) => t2, T3(t3) => t3 };
        |&self| match *self { T1(ref t1) => t1, T2(ref t2) => t2, T3(ref t3) => t3 };
        |&mut self| match *self { T1(ref mut t1) => t1, T2(ref mut t2) => t2, T3(ref mut t3) => t3 };
    }
}

which is pretty trivial to generate inside #[coalesce_returns] itself.

EDIT: Although that misses out on arbitrary self type support, which would block use on a PinMut taking Future trait, not sure how that could be supported in this form, but that's a question for the unposted delegation RFC rather than this one.
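
Written out by hand in today's Rust, the three receiver kinds look roughly like this. `Counter`, `ByOne`, `ByTen` and `run` are made-up names for this sketch; the `delegate` syntax above is not available, so each method is forwarded explicitly.

```rust
trait Counter {
    fn get(&self) -> u32;
    fn bump(&mut self);
    fn finish(self) -> u32;
}

struct ByOne(u32);
struct ByTen(u32);

impl Counter for ByOne {
    fn get(&self) -> u32 { self.0 }
    fn bump(&mut self) { self.0 += 1; }
    fn finish(self) -> u32 { self.0 }
}

impl Counter for ByTen {
    fn get(&self) -> u32 { self.0 }
    fn bump(&mut self) { self.0 += 10; }
    fn finish(self) -> u32 { self.0 }
}

enum YourSumType {
    T1(ByOne),
    T2(ByTen),
}

impl Counter for YourSumType {
    // `&self`: match on a shared reference.
    fn get(&self) -> u32 {
        match self {
            YourSumType::T1(t) => t.get(),
            YourSumType::T2(t) => t.get(),
        }
    }
    // `&mut self`: match on a mutable reference.
    fn bump(&mut self) {
        match self {
            YourSumType::T1(t) => t.bump(),
            YourSumType::T2(t) => t.bump(),
        }
    }
    // `self`: match by value and move the inner value out.
    fn finish(self) -> u32 {
        match self {
            YourSumType::T1(t) => t.finish(),
            YourSumType::T2(t) => t.finish(),
        }
    }
}

// Small driver that exercises the `&mut self` and `self` paths.
fn run(mut c: impl Counter) -> u32 {
    c.bump();
    c.finish()
}
```

Receivers other than these three would each need their own forwarding pattern, which is the gap the EDIT points at.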

Nemo157 on 22 May 2018

Also, and more importantly, I'm not sure how it could handle eg. the try! macro (and try!-like macros), that couldn't just be changed to embed the marker at the Err return point. @Nemo157, you appear to know proc macros much better than I do, do you think it'd be possible to handle this case cleanly with this added helper?

I'm not sure if there is, unless there's a way to use Into::into as the marker in those cases. I have had thoughts about using a type-level-cons-list based sum type rather than an enum that might be capable of implementing From, frunk may have some techniques for implementing something like this that I want to investigate at some point, but not sure when I may find time.
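
A sketch of the Into::into idea with a hand-written enum: the `?` operator already calls `From::from` on the error value, so `From` impls on the sum type act as the marker at each early-return point. All names here (`ParseError`, `IoError`, `Coalesced`, `parse`, `read`) are placeholders, and the sum type has to be named in the signature, which is exactly what the proposal wants to avoid.

```rust
#[derive(Debug)]
struct ParseError;
#[derive(Debug)]
struct IoError;

// Hand-written stand-in for the coalesced error type.
#[derive(Debug)]
enum Coalesced {
    Parse(ParseError),
    Io(IoError),
}

impl From<ParseError> for Coalesced {
    fn from(e: ParseError) -> Self { Coalesced::Parse(e) }
}

impl From<IoError> for Coalesced {
    fn from(e: IoError) -> Self { Coalesced::Io(e) }
}

fn parse(ok: bool) -> Result<u32, ParseError> {
    if ok { Ok(1) } else { Err(ParseError) }
}

fn read(ok: bool) -> Result<u32, IoError> {
    if ok { Ok(2) } else { Err(IoError) }
}

// No explicit wrapping at the return points: `?` converts each error
// type into `Coalesced` through the `From` impls above.
fn foo(p: bool, r: bool) -> Result<u32, Coalesced> {
    Ok(parse(p)? + read(r)?)
}
```

Whether the same trick can work when the target type is anonymous is the open question raised above.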

Nemo157 on 22 May 2018

One thought I had in terms of an in-compiler implementation was using a generated function signature to replace the marker during type checking, post type checking of the function the body of this generated signature could then be created based off the results of type checking (I don't know how the compiler works so not sure if this is implementable in this order though).

Given a function like

fn foo() -> impl Trait {
    if bar() { coalesce baz() }
    else { coalesce quux() }
}

The first step would be to insert a body-less function signature to do the coalescing:

fn foo() -> impl Trait {
    existential type __Coalesced: impl Trait;
    fn __coalesce(x: impl Trait) -> __Coalesced;

    if bar() { __coalesce(baz()) }
    else { __coalesce(quux()) }
}

(the explicit existential type is required to avoid the output type depending on the input type parameter)

This _should_ allow fully type-checking the body of foo, and would at the same time acquire a list of all the type parameters that __coalesce would need to monomorphise over. This list could then be used to generate the enum and body of the function (the body of this function is not implementable in Rust, but maybe the post-monomorphisation variants of the function could be directly generated instead).

fn foo() -> impl Trait {
    existential type __Coalesced: impl Trait;
    enum __ConcreteCoalesced { _0(Baz), _1(Quux) };
    impl Trait for __ConcreteCoalesced { delegate * to ... }
    fn __coalesce(x: Baz) -> __Coalesced { __ConcreteCoalesced::_0(x) }
    fn __coalesce(x: Quux) -> __Coalesced { __ConcreteCoalesced::_1(x) }

    if bar() { __coalesce(baz()) }
    else { __coalesce(quux()) }
}

_{OT: Sorry about the chain-posting, but I thought it better to separate these different ideas into separate comments.}_

Nemo157 on 22 May 2018

❤2

Clearly it must not be implicit; that would violate the principle of an _existential_.

I don’t actually know whether I like the idea. I had the problem lately and came to the realization that it showed me two things:

- I wasn’t using the same types (the rustc error was really, really misleading :D)
- A possible design improvement.

My implementation was showing me that I really wanted to return things with different types. That disjunction was important and needed to leak into the public interface.

fn whatever(a: f32) -> impl ::std::fmt::Debug {
    if a < 10. {
        "ohai!"
    } else {
        42u32
    }
}

This gives the error message. Now, two possibilities:

// (1)
#[derive(Debug)]
enum Either<A, B> {
    Left(A),
    Right(B)
}

fn whatever(a: f32) -> impl ::std::fmt::Debug {
    if a < 10. {
        Either::Left("ohai!")
    } else {
        Either::Right(42u32)
    }
}

Or:

// (2)
#[derive(Debug)]
enum Either<A, B> {
    Left(A),
    Right(B)
}

fn whatever(a: f32) -> Either<impl ::std::fmt::Debug, impl ::std::fmt::Debug> {
    if a < 10. {
        Either::Left("ohai!")
    } else {
        Either::Right(42u32)
    }
}

I don’t know yet whether generating sum types on the fly would be a good idea, because you might miss the design opportunity by just solving the problem quickly.

phaazon on 25 May 2018

@phaazon

Clearly it must not be implicit; that would violate the principle of an existential.

As far as I can tell everyone that chimed in so far agrees on this because the generation of a sum type isn't something trivial and, as you say, quite different from how impl Trait in return position usually works.

because you might miss the design opportunity by just solving the problem quickly.

The definition of the sum type is usually only tedious, but not interesting. Here are two code examples that were posted so far (sorry if I missed a good one).

transport.rs (posted by @Nemo157 in this comment)
stupidfut.rs (posted by @Ekleog in this comment).

Sum types would make these code passages a lot shorter and they would eliminate copy/paste errors.

See the comment by @Nemo157 directly above yours for a sample of the currently desired syntax for this feature.

(Tip: You can start code blocks with ```rust to enable syntax highlighting)

MajorBreakfast on 26 May 2018

👍1

@alexreg I think the use of this feature should be an implementation detail of the body of the function and should not "leak" into the function signature! But after reading the proposal/first post I also immediately thought that impl enum would be a suitable syntax to express this. Not in the function signature but in the function body where the values are to be unified (into an enum that the trait should be implicitly impl'd for), like this:

fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x {
        impl enum b"foo".iter().cloned()
    } else {
        impl enum b"hello".iter().map(|c| c + 1)
    }
}

The disadvantage is having to write impl enum N times (as many times as there are enum cases) vs once, but I think it should not leak into the signature, because it should be treated like it's syntactic sugar for a function-local enum that impls this trait.
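For comparison, a minimal sketch of what this would presumably desugar to if written by hand today. FooIter and its variant names are made up for illustration; the compiler-generated type would be anonymous.

```rust
// Hypothetical hand-written stand-in for the enum the compiler would generate.
enum FooIter<A, B> {
    First(A),
    Second(B),
}

// The trait impl that `impl enum` would provide implicitly.
impl<A, B> Iterator for FooIter<A, B>
where
    A: Iterator<Item = u8>,
    B: Iterator<Item = u8>,
{
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // One branch per call to pick the active variant.
        match self {
            FooIter::First(it) => it.next(),
            FooIter::Second(it) => it.next(),
        }
    }
}

fn foo(x: bool) -> impl Iterator<Item = u8> {
    if x {
        FooIter::First(b"foo".iter().cloned())
    } else {
        FooIter::Second(b"hello".iter().map(|c| c + 1))
    }
}
```

The signature is unchanged from the version above; only the body knows about the enum.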

@Ekleog: Btw, what do you think?

Boscop on 14 Jun 2018

👍1

@Boscop Yep, I was won over to this approach, more or less, some time ago. (I think my change of opinion was lost in all the comments.) I prefer the slightly shorter enum rather than impl enum in the body, but perhaps that creates parsing ambiguities?

alexreg on 14 Jun 2018

@alexreg Yes, with declaring an item:

fn foo() -> impl Debug {
    enum foo {} // an item declaration, not an enum return of a struct literal
}

scottmcm on 14 Jun 2018

@scottmcm Right, thought so... I guess look-ahead parsing is out of the question eh? :-P Well, impl enum is okay with me in that case, I guess.

alexreg on 14 Jun 2018

The reason impl enum should be in the signature is that it has potential performance costs that would otherwise be hidden.

I don't like impl enum before every expression: impl enum is not a verb in any sense, yet it is being used as an action or calculation.

warlord500 on 14 Jun 2018

@warlord500 The implementer of the function needs to be aware of possible performance costs, not the consumer. That's the whole point.

Yes, it's maybe a bit verbose to write impl enum everywhere (perhaps an impl enum block could help with this?), but the whole idea is to make it explicit at the point of creation of the enum variant, which is where the real concern lies.

alexreg on 14 Jun 2018

👍2

My reasoning against impl enum is that those two words together are basically meaningless for any English reader. Nor would they be helpful when searching for errors in relation to this feature.

unify or coalesce sound much better for this purpose because they are verbs and would be much easier to search for. I don't disagree that marking for unification would be a bad idea.

warlord500 on 14 Jun 2018

@warlord500

My reasoning against impl enum is that those two words together are basically meaningless for any English reader.

But they would be meaningful for someone familiar with Rust.
Like I wrote above, my immediate thought after reading the first post/proposal (before reading any comment) was that impl enum would be a good fit to express the semantics of this:
Because it hints at the 2 concepts that are used here:

an (auto-generated) enum
impl Trait for this enum

One could also memorize this by thinking of the impl in impl enum as a double meaning: it's also an "implicit" enum (implicitly defined) :)

Nor would they be helpful when searching for errors in relation to this feature.

impl enum would be a Rust-specific term like impl Trait, which people can also google for.

I agree that writing impl enum before each occurrence is a bit verbose, and I'm interested in searching for a more concise solution (without it affecting the function signature), but I think if possible we should re-use keywords and they should be semantically related to these 2 concepts.

We could also completely "hide" the fact that the resulting implicit type will be an enum, because neither the author of the function body, nor the author of any caller of this function really has to deal with the fact that it's an enum.
Then we could also use one of the following keyword combinations:

impl trait <expr> / impl trait { <expr> }
do trait <expr>
impl do <expr> / do impl <expr>

Actually, is there any good reason why we must make the author of the function body code conscious about the fact that the return type will be an enum?

If not, I'd prefer to leave the enum keyword out of this, and use impl trait <expr> if there's no parsing ambiguity.

Boscop on 14 Jun 2018

do trait <expr>

Can you explain why you would use the do keyword here? To me it makes no sense (I’m a Haskeller, do maps to monadic code to me).

I’d rather be for a new keyword like wrap or, if you don’t want to add a new keyword, enum has the closest meaning to what’s going on under the hood.

With more hindsight, I’m not really a fan of this RFC, because it means adding more magic to Rust.

phaazon on 14 Jun 2018

Yeah, I know that do is used for monadic do-notation in Haskell/PureScript but in Rust it's currently only used for do-catch blocks. Honestly I don't feel strongly about using do here but it was the only other keyword that seemed somewhat useable for this situation..

But if there's no good reason to expose the fact that the return type will actually be an enum, I'd prefer the syntax impl trait <expr> if there's no parsing ambiguity.

Boscop on 14 Jun 2018

@Boscop I'm currently leaning towards @joshtriplett's enum_derive_delegate, that would make it possible to implement this as a proc_macro, even though with a few limitations :)

Ekleog on 14 Jun 2018

@Boscop I think it's important the user realises it's implemented as an enum, because of the pattern matching involved... when there's a large number of variants, this can become very inefficient. At some point it can become more efficient to use dynamic dispatch.
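For comparison, this is what the dynamic-dispatch alternative looks like today. It is a sketch reusing @Boscop's example from above, and foo_dyn is just an illustrative name. It costs a heap allocation and an indirect call per next(), but that cost does not grow with the number of branches.

```rust
// Dynamic dispatch: each branch is boxed and erased to a trait object,
// so no enum (and no match over variants) is involved.
fn foo_dyn(x: bool) -> Box<dyn Iterator<Item = u8>> {
    if x {
        Box::new(b"foo".iter().cloned())
    } else {
        Box::new(b"hello".iter().map(|c| c + 1))
    }
}
```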

alexreg on 14 Jun 2018

I think it's important the user realises it's implemented as an enum, because of the pattern matching involved... when there's a large number of variants, this can become very inefficient. At some point it can become more efficient to use dynamic dispatch.

@alexreg that's what I initially believed as well, but @Ekleog had a good argument against it: https://github.com/rust-lang/rfcs/issues/2414#issuecomment-383630811 (at least in terms of method call overhead, it also affects the memory efficiency patterns quite a bit).

Nemo157 on 14 Jun 2018

👍1

@Ekleog it's already pretty possible to implement this as a proc_macro, with a few limitations.

Nemo157 on 14 Jun 2018

🎉3

@alexreg also, an enum is not the only implementation strategy, it could also be a union + vtable, in which case it is literally dynamic dispatch, just with the data inline instead of being on the heap.

Nemo157 on 14 Jun 2018

@Nemo157 Right, although I'm not sure it's an argument directly against the impl enum syntax. Also, I'm not exactly sure how "union" + vtable would work, but I don't think it's equivalent. Anyway, the important thing is that the user knows they're doing something essentially static vs. dynamic, hence I think impl enum is basically good syntax.

alexreg on 14 Jun 2018

@alexreg: @DataAnalysisCosby linked a post about the union + vtable approach earlier: https://internals.rust-lang.org/t/allowing-multiple-disparate-return-types-in-impl-trait-using-unions/7439/6.

Nemo157 on 14 Jun 2018

@alexreg The biggest argument against impl enum is that you can't match on it. Doesn't really matter how it's implemented, if it can't be matched on then marketing it as an enum is in my opinion misleading. So, to avoid false associations it's best to call it something else. My current favorite syntax is the one in @Ekleog's post and @Nemo157's post.

MajorBreakfast on 14 Jun 2018

👍1

@MajorBreakfast I don't think that's misleading at all. It's just an anonymous enum.

alexreg on 14 Jun 2018

👍2

But the enum is an implementation detail of the codegen, the user code never deals with the fact that it's an enum..
So it's not really misleading, just contains more than the necessary info that people have to care about, and could maybe be confusing / overwhelming.

@Nemo157 That proc-macro is nice but I think we should have support for this in the language so that it would also work for things like -> (impl Foo, Result<impl Bar, impl Baz>) etc.

Boscop on 14 Jun 2018

That proc-macro is nice but I think we should have support for this in the language

@Boscop so do I. The reason I bring it up is that I don't believe enum_derive_delegate meaningfully improves the situation for providing a proc-macro based solution.

Nemo157 on 14 Jun 2018

👍2

@alexreg Let me explain this more: Abstraction is generally agreed upon as a very fundamental programming principle. A good abstraction conveys the information that is of interest and hides implementation details that are irrelevant to the users.

On the outside sum types only implement a specified set of traits. The fact that they're implemented using an enum is an implementation detail. This is very similar to the anonymous type returned by an async function which is also essentially an enum. However, this is just "nice to know". Knowing that implementation detail isn't required to use the type. We only care that async functions return a future.

With sum types it's the same: The enum-ness is completely lost as an implementation detail. Users do not have to care about it. Instead it would just be misleading/confusing because users expect to be able to match on enums. I think that sum types should be marketed similar to async functions without mentioning the word "enum" anywhere but in the internal documentation for the code that implements them.

MajorBreakfast on 14 Jun 2018

👍1

@MajorBreakfast You are of course correct about abstraction, but in Rust performance is also extremely important, and knowing implementation details is important for writing performant code.

Knowing that it is guaranteed to be implemented internally as an enum is useful information to have, even if it doesn't affect the semantic behavior.

Also, impl enum is an internal and local detail of the function which is using it, so there's no expectation of being able to match on it.

Of course there's no need to literally call it enum, a different word is fine, I just don't want us to veer too far into the "implementation doesn't matter" direction.

I think that sum types should be marketed similar to async functions without mentioning the word "enum" anywhere but in the internal documentation for the code that implements them.

Since it's an internal detail, I would expect impl enum to not show up at all in the docs. Only the function which is using impl enum should know or care about it.

Pauan on 14 Jun 2018

@Pauan Zero-cost abstractions form one of Rust's core principles. This principle however does not require us to reveal irrelevant implementation details. It also does not require us to put a confusing keyword into our code which suggests behavior different to what is actually possible. Users should be able to just trust in zero-cost abstractions without distraction or being misled.

MajorBreakfast on 14 Jun 2018

👍1

On the outside sum types only implement a specified set of traits. The fact that they're implemented using an enum is an implementation detail. This is very similar to the anonymous type returned by an async function which is also essentially an enum. However, this is just "nice to know". Knowing that implementation detail isn't required to use the type. We only care that async functions return a future.

I agree, also the fact that closures are structs is an implementation detail that would be unnecessary to make users of closures spell out in their code.

As long as mechanisms are implemented with zero overhead / in the most efficient way, spelling out that it's an enum adds no usefulness to the user..

Boscop on 14 Jun 2018

👍2

@Boscop And yet, this is an abstraction specifically for sum types. The very purpose of this feature is to save writing boilerplate that would otherwise be implemented manually by creating an enum. You could say the declaration of ordinary named enums is itself abstracting away some sort of implementation detail, and it is. This is no different.

@MajorBreakfast I think you're missing the point: I agree the consumer shouldn't care about the details of the implementation, that's the whole point of it being an anonymous existential type. That's literally all I'm proposing the type signature for the function contains, as indeed it's all the user (consumer) would need to know about it. The implementor of the function however should have to explicitly use some keyword to make explicit the generation of an anonymous sum type. This is in the spirit of Rust letting you know what's zero-cost and what's not.

alexreg on 16 Jun 2018

As a further point on balancing the verbosity of the impl enum (or whatever other) keyword used within the function body, the explicitness of the language and cost of abstractions, and the addition of "language bloat", I'm honestly starting to think it might be best to leave this as a proc macro (both an attribute and an ordinary one), such that one could do the following (adapting @Ekleog's example):

#[derive(auto_enum)]
fn foo() -> impl Iterator<Item = char> {
    let mut tmp = match bar() {
        Some(x) => x.iter(),
        None => "".iter(),
    };
    let n = tmp.next();
    match n {
        Some(_) => tmp,
        None => "foo bar".iter(),
    }
}

or equivalently

fn foo() -> impl Iterator<Item = char> {
    let res = auto_enum! {
        let mut tmp = match bar() {
            Some(x) => x.iter(),
            None => "".iter(),
        };
        let n = tmp.next();
        match n {
            Some(_) => tmp,
            None => "foo bar".iter(),
        }
    };
    res
}

The latter approach (callable proc macro) allows for doing more flexible things of course, since one doesn't need the return type of the function to necessarily be an existential type.

An alternative to the above would be getting rid of the proc macro attribute and (optionally) converting the auto_enum! macro into a keyword, but there's probably no point to that?

alexreg on 16 Jun 2018

@alexreg Proc macros are obviously the best (and least committal) solution, but they currently aren't powerful enough to implement this feature.

Improving the power of proc macros to enable this feature would be useful, but it would be a separate RFC.

Pauan on 16 Jun 2018

@alexreg I believe it’s impossible for macros to find the parts of the expressions that would need to be wrapped without a marker, e.g. in this example one of the variants is constructed inside a closure.

Nemo157 on 16 Jun 2018

@alexreg My previous posts should make it clear that I do want an explicit notation. I just don't want to use the word "enum" to describe this feature because I find it misleading for reasons I already explained. E.g. change #[derive(auto_enum)] into something like #[derive(auto_sum_type)] and I'm happy.

MajorBreakfast on 16 Jun 2018

@MajorBreakfast Then we might as well replace the existing enum syntax with sum_type. I don't see the conceptual difference. :-)

alexreg on 16 Jun 2018

@Pauan @Nemo157 You're right, and I was initially under this impression too, but something led me to change my view (I forget what exactly). Anyway, in this case, I would adapt my above proposal to get rid of the attribute and just have blocks like auto enum { ... } or something similar.

alexreg on 16 Jun 2018

As a response to the "revealing irrelevant details" argument, consider that we are already revealing mutability of by-val arguments in the signature of a function, so this exists inside the language already.

fn foo(mut v: String) {
    v += "hi";
    println!("{}", v);
}

rustdoc works around this by iirc hiding that mutability. We could do similar stuff for fn foo() -> impl enum Trait: hiding it in rustdoc.

est31 on 20 Jun 2018

👍1

@est31 That's more of a convenience point. It's not actually making the argument mutable (i.e. pass by reference), it's just the equivalent to doing let mut v = v; at the top of the function.
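A minimal sketch of that equivalence (the function names are illustrative only):

```rust
// `mut` on a by-value parameter only affects the local binding...
fn with_mut_param(mut v: String) -> String {
    v += "hi";
    v
}

// ...so it behaves the same as rebinding at the top of the body.
fn with_rebinding(v: String) -> String {
    let mut v = v;
    v += "hi";
    v
}
```

In both cases the caller hands over ownership of the String and cannot observe whether the callee mutates it in place.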

alexreg on 20 Jun 2018

👍2

@est31: This impl enum unification shouldn't only work for returning different values from a function, it should also work inside a function:
e.g. f(if cond { impl enum expr1 } else { impl enum expr2 }) where fn f<T: SomeTrait>(x: T).
Not that this is the most efficient way to use it in this particular case, but it should be allowed!

In some situations, e.g. with diesel's query builder, one could want to write:

let mut query = impl enum query1;
if cond {
    query = impl enum query.more_stuff();
}
query.execute(..)

where query.more_stuff() returns a different type that still impls the same trait that represents a query.

Boscop on 20 Jun 2018

As the discussion appears to have circled back to marker-at-return-type vs. marker-at-return-site, I'd just like to push up this bit of code:

fn bar() -> bool { /* ... */ }
fn baz() -> Vec<Baz> { /* ... */ }
fn quux() -> Vec<Quux> { /* ... */ }
// Baz and Quux implement Trait
fn foo() -> Vec<impl Trait> {
    if bar() {
        baz().map(|x| marker x).collect()
    } else {
        quux().map(|x| marker x).collect()
    }
}

Which simply has no equivalent with marker-at-return-type.

If anyone disagrees with this and thinks marker-at-return-site still makes more sense, I'd love to see an argument that hasn't already been brought up :)

Ekleog on 21 Jun 2018

👍2

Summary

So just to sum up where we are, in my mind, in this discussion (feel free to add to this if I missed anything):

Marker-at-return-site is strictly more expressive than marker-at-return-type, and should thus be preferred (see the previous discussion, a lot of time has been spent on this debate, and I don't think there is much more to say about it :) there were also arguments about intuitiveness in favor of marker-at-return-site, and of fewer characters to type in favor of marker-at-return-type)
It is currently possible to implement the proposal as a procedural macro, for a fixed set of traits that must be explicitly defined in the proc macro.
Adding a compiler builtin like @joshtriplett's enum_derive_delegate would make it possible to implement the proposal as a proc macro in a way generic over the trait to be implemented. (Or any other more powerful mechanism of delegation, if any such mechanism were to appear.)
However, such a proc macro couldn't be nicely integrated with ?, and in particular couldn't change the definition of ? to return marker err instead of err when erroring. For such an improvement, there would need to be support from at least the stdlib.
Also, it can't handle automatic dereference from a single marked type to the inner type -- ie. let t: Foo = marker foo; isn't valid -- which is a considerable drawback when considering adding marker inside various macros for easier integration. (actually, eg. the try! macro would need such a mechanism, were it to include a marker at the early-return site)
Finally, the proc macro could likely not handle cases where there are two different types to enum-ify, like when returning a Result<impl Trait1, impl Trait2> where we would want the two traits to be enum-ified. The question of whether a compiler extension would be able to handle that cleanly hasn't been solved for sure, but it appears likely that it could.

So the remaining open questions are:

Do we want an enum_derive_delegate helper for writing a proc macro that does generate auto-generated sum types?
Do we want a more powerful in-compiler mechanism for auto-generating sum types?

Ekleog on 21 Jun 2018

👍4 ❤2

@Ekleog Pretty good summary, but you accidentally wrote the same thing twice in the first point. :-)

Marker-at-return-site is strictly more expressive than marker-at-return-site

I think you mean return-type in the second case.

alexreg on 21 Jun 2018

👍1

I wouldn't call it "Marker-at-return-site" because it should also work for unifying expressions that aren't returned, as in @Ekleog's or my example above (in Ekleog's example, the unification happens way before the return).

Boscop on 21 Jun 2018

@Boscop I think you quoted me!

est on 21 Jun 2018

@alexreg Thanks! fixed :)
@Boscop Indeed, that's the initial name that stuck, I can't think of a better name :/ hopefully it is still understandable in order to compare the two ideas :)

Ekleog on 21 Jun 2018

@Ekleog Maybe the two ideas could be called "marker at unification" and "marker at expression"?

"marker at unification" requires writing the keyword only once, so it'd be more concise, but "marker at unification" should not be mistaken to mean "marker at return type".
I think "marker at unification" makes the most sense, but not at return type.
E.g. it could look like this:

fn bar() -> Option<LinkedList<char>> { /* ... */ }

fn foo() -> impl Iterator<Item = char> {
    impl enum match bar() {
        Some(x) => x.iter(),
        None => "".iter(),
    }
}

fn foo() -> impl Iterator<Item = char> {
    let mut tmp = impl enum match bar() { // first unification, of the match arms
        Some(x) => x.iter(),
        None => "".iter(),
    };
    let n = tmp.next();
    impl enum match n { // second unification
        Some(_) => tmp,
        None => "foo bar".iter(),
    } // this should not generate a 2-level nested enum as return type because it can be flattened
}

With "marker at expression" it would be much more verbose, requiring the impl enum keyword at every match arm (often you have many match arms).

("marker at return type" would only allow unification for return expressions but we want to allow it for expressions inside function bodies, too.)

Boscop on 4 Sep 2018

@Boscop I like your names :) however, I still am not convinced by marker-at-unification: how would you have it handle this example (from somewhere above in the thread)? There is no way to unify after the Vec has been built here without a silent reallocation (which would be the case with marker-at-unification), while with marker-at-expression the compiler does the job of unifying at the right place (ie. inside map's argument lambda).

fn bar() -> bool { /* ... */ }
fn baz() -> Vec<Baz> { /* ... */ }
fn quux() -> Vec<Quux> { /* ... */ }
// Baz and Quux implement Trait
fn foo() -> Vec<impl Trait> {
    if bar() {
        baz().map(|x| marker x).collect()
    } else {
        quux().map(|x| marker x).collect()
    }
}

Ekleog on 5 Sep 2018

@Ekleog The location of the marker does not influence where the compiler actually unifies!
Since the return type is Vec<impl Trait>, it knows it has to do the unification before the collect, since it can't do a unifying conversion for Vec<A> and Vec<B> (or any Foo<A> and Foo<B> when A and B both impl a trait).
So it will do the unification at the same location, no matter if we use "marker at expression" or "marker at unification".

But this shows that maybe the name "marker at unification" should be changed to "marker before branches" (aka "marker at sub-tree root"), because the actual unifying conversion always happens somewhere in the branches (in this case before the .collect(), in other cases at the end of each branch).
(But with the name "marker at unification" I meant that the logical "unification" point (that creates the need for unification) is the root of the sub-tree. The location of the inserted unifying conversion then is the "physical" act of unification.)

So the syntax is independent of the unification location, we just need to express which sub-tree should be unified. "marker before branches" (aka "marker at sub-tree root") would only require this keyword once (at the root of the sub-tree, before the (syntactical) branches), whereas "marker at expressions" (aka "marker inside branches" / "marker at sub-tree leaves") requires the keyword in every branch/sub-tree leaf..

With both syntaxes, the compiler would choose the same point for the actual unification: The last point where single values of type T: Trait occur in each branch (where you placed the marker above).

So to make the naming clearer, maybe we should just use "marker at tree root" vs "marker at leaves"..

Boscop on 6 Sep 2018

@Boscop The problem is that if I expect this to work with marker-at-tree-root: (just taking the previous example and moving the marker keyword, and adding .into_iter()s I had forgotten)

fn foo() -> Vec<marker impl Trait> {
    if bar() {
        baz().into_iter().map(|x| x).collect()
    } else {
        quux().into_iter().map(|x| x).collect()
    }
}

then I would also naturally expect this to work, because .map(|x| x) is a no-op:

fn foo() -> Vec<marker impl Trait> {
    if bar() {
        baz().into_iter().collect()
    } else {
        quux().into_iter().collect()
    }
}

and then the .into_iter().collect() looks really like a no-op, so I would expect this to work:

fn foo() -> Vec<marker impl Trait> {
    if bar() {
        baz()
    } else {
        quux()
    }
}

at which point, if the compiler actually implements this in a way so that it works, there is an implicit reallocation.

Hence my not really liking this option :)

Ekleog on 9 Sep 2018

and then the .into_iter().collect() looks really like a no-op, so I would expect this to work:

Intuitively, I do not expect baz().into_iter().collect() to be equivalent to baz(), because the former is transforming the value into a new Vec based upon the expected return type, whereas the latter is not.

Consider this simple example:

let x: Vec<char> = baz();
let x: String = baz().into_iter().collect();

As you can see, baz().into_iter().collect() is absolutely not the same as baz()! So it is not a no-op at all: in this specific situation your intuition is wrong. You can think of it as being similar to how foo and |x| foo(x) are not equivalent.

My understanding is that the auto-coercion of a value into the anonymous enum only applies to the base type.

So a T can be auto-coerced into a marker impl Trait, but a Vec<T> cannot be auto-coerced into a Vec<marker impl Trait> (on the other hand, a Vec<T> can be auto-coerced into a marker impl Trait).

So I would expect that your final example will give a compile error, because it will refuse to unify Vec<T> and Vec<marker impl Trait>. I would expect it to give the same compile error regardless of whether it's using "marker at expression" or "marker at return type".

In other words, this should also be a compile error:

fn foo() -> Vec<impl Trait> {
    if bar() {
        marker baz()
    } else {
        marker quux()
    }
}

Pauan on 9 Sep 2018

What you say about .into_iter().collect() not being a no-op is true indeed, however it is only changing the type of the container, not of the contents, currently as far as I know. Hence my saying it really looks like a no-op (to me currently) when going from Vec<_> to Vec<_> :)

Also, even baz().into_iter().collect() in my previous example cannot be unified by the compiler under the rules you raise, and the .map(|x| x) is actually necessary: baz().into_iter() is an impl Iterator<Item = Baz>, and the expected return type of the .collect() is Vec<marker impl Trait>, which cannot be made from an impl Iterator<Item = Baz> using method .collect().

Also, the .map(|x| x) would be required to give the compiler a chance to coerce inside the impl Iterator<Item = …>, as impl Iterator<Item = …> cannot be cast into a marker-ed one more than a Vec<…> could. (I'm not at all sure whether I'm putting my thoughts clearly here, please tell me if you find me hard to understand!). And I'm almost sure that currently considering .map(|x| x) is a no-op is actually correct, and not having it be a no-op would be very confusing :)

As for your last example about marker-at-leafs, we completely agree that this should not work, and it's why I had written baz().into_iter().map(|x| marker x).collect(), in the same way I would have written baz().into_iter().map(|x| MyHandRolledEnum::Variant1(x)).collect() :)
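Spelled out, the hand-rolled version I have in mind looks roughly like this. It is only a sketch: the id() method on Trait, the tuple-struct fields, the bodies of baz and quux, and the cond parameter (standing in for bar()) are invented for the example.

```rust
// Hypothetical trait and types for illustration.
trait Trait {
    fn id(&self) -> u32;
}

struct Baz(u32);
struct Quux(u32);

impl Trait for Baz {
    fn id(&self) -> u32 { self.0 }
}

impl Trait for Quux {
    fn id(&self) -> u32 { self.0 + 100 }
}

enum MyHandRolledEnum {
    Variant1(Baz),
    Variant2(Quux),
}

impl Trait for MyHandRolledEnum {
    fn id(&self) -> u32 {
        match self {
            MyHandRolledEnum::Variant1(x) => x.id(),
            MyHandRolledEnum::Variant2(x) => x.id(),
        }
    }
}

fn baz() -> Vec<Baz> { vec![Baz(1), Baz(2)] }
fn quux() -> Vec<Quux> { vec![Quux(1)] }

fn foo(cond: bool) -> Vec<impl Trait> {
    // Each element is wrapped while iterating, before the new Vec is built,
    // so the conversion is visible at the exact place where it happens.
    let wrapped: Vec<MyHandRolledEnum> = if cond {
        baz().into_iter().map(|x| MyHandRolledEnum::Variant1(x)).collect()
    } else {
        quux().into_iter().map(|x| MyHandRolledEnum::Variant2(x)).collect()
    };
    wrapped
}
```

The explicit annotation on wrapped is only there to keep type inference obvious in the sketch.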

Ekleog on 10 Sep 2018

Also, even baz().into_iter().collect() in my previous example cannot be unified by the compiler under the rules you raise, and the .map(|x| x) is actually necessary

That's probably true, yeah (at least without major compiler trickery), in which case regardless of whether it uses "marker at expression" or "marker at type", the code will be identical, the only difference is where the marker is placed.

So the power of the two approaches should be identical, it's purely personal preference whether you prefer to explicitly mark where the coercion happens, or whether you prefer the convenience of only needing to specify the marker once.

I don't have a strong preference either way, but perhaps I'm leaning very slightly toward "marker at expression", simply because it makes it more obvious that .map(|x| marker x) is not a no-op.

And I'm almost sure that currently considering .map(|x| x) is a no-op is actually correct, and not having it be a no-op would be very confusing :)

That's not necessarily true, depending on the implementation of map.

I agree that intuitively it feels like it should be a no-op (just like how foo and |x| foo(x) intuitively feel like they should be identical), but that isn't necessarily true in reality.

As for your last example about marker-at-leafs, we completely agree that this should not work

Great, I'm glad we're in agreement about that.

Pauan on 13 Sep 2018

👍2

@Ekleog @Pauan I think we already agreed that "marker at return type" is not what we want, because we also want to be able to unify sub-expression trees within a function body.
So we have to decide between "marker at expr-tree root" and "marker at expr-tree leafs".

Also, even baz().into_iter().collect() in my previous example cannot be unified by the compiler under the rules you raise, and the .map(|x| x) is actually necessary

At least an implicit .map(|x| T::from(x)) would have to be inserted by the compiler but I'm not sure if we should make the compiler aware of iterators just so that it can insert this "magic". If we special-cased it for iterators, other wrapper types would be second-class citizens..

I don't have a strong preference either way, but perhaps I'm leaning very slightly toward "marker at expression", simply because it makes it more obvious that .map(|x| marker x) is not a no-op.

Yes, if we agree that we don't want the compiler to insert a magic .map(|x| T::from(x)) for iterators, we have to go with "marker at expression".

Btw, with "marker at expression", when we get impl Trait for let, your example could be written like this:

fn foo() -> Vec<impl Trait> {
    let iter: impl Iterator<Item = impl Trait> = if bar() {
        baz().into_iter().map(|x| marker x)
    } else {
        quux().into_iter().map(|x| marker x)
    };
    iter.collect() // collect() after the unified expression tree
}

And when we get impl Trait for expressions even if not bound by let, it could be written like this:

fn foo() -> Vec<impl Trait> {
    if bar() {
        baz().into_iter().map(|x| marker x)
    } else {
        quux().into_iter().map(|x| marker x)
    }.collect() // collect() after the unified expression tree
}

so you wouldn't even have to write .collect() twice :)
(But you still need to write .map(|x| marker x) in each leaf expr because the iterator Item type has to be the same.)
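For comparison, here is roughly what has to be written by hand today for the same shape of function. This is only a sketch: Shape, Square, Rect and AnyShape are invented for the example, and AnyShape stands in for the anonymous type that marker would generate. Since the two mapped iterators have different types on stable Rust, .collect() has to be called in each branch here.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

// Hand-written stand-in for the sum type that `marker` would generate.
enum AnyShape {
    Square(Square),
    Rect(Rect),
}

// The delegating impl that the sugar would also have to produce.
impl Shape for AnyShape {
    fn area(&self) -> f64 {
        match self {
            AnyShape::Square(s) => s.area(),
            AnyShape::Rect(r) => r.area(),
        }
    }
}

fn baz() -> Vec<Square> {
    vec![Square(2.0)]
}

fn quux() -> Vec<Rect> {
    vec![Rect(2.0, 3.0)]
}

fn foo(use_baz: bool) -> Vec<impl Shape> {
    // The wrapping happens at each leaf, where `marker x` would go.
    let shapes: Vec<AnyShape> = if use_baz {
        baz().into_iter().map(AnyShape::Square).collect()
    } else {
        quux().into_iter().map(AnyShape::Rect).collect()
    };
    shapes
}
```

The .map(AnyShape::Square) and .map(AnyShape::Rect) calls are the manual counterpart of .map(|x| marker x).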

Boscop on 16 Sep 2018

I think we already agreed that "marker at return type" is not what we want, because we also want to be able to unify sub-expression trees within a function body.

I don't see how "marker at type" prevents that. You can do stuff like this:

fn foo() {
    let x: marker impl Trait = ...;
}

This will unify the various types within ... into the marker impl Trait type.

At least an implicit .map(|x| T::from(x)) would have to be inserted by the compiler but I'm not sure if we should make the compiler aware of iterators just so that it can insert this "magic". If we special-cased it for iterators, other wrapper types would be second-class citizens..

I don't think I ever implied that. My proposal was that whenever converting an expression of type T to marker impl Trait it would do the automatic conversion. So it will work in any situation (not just with Iterators).

It just so happens that with Iterators the simplest way to create a T -> marker impl Trait expression is with .map(|x| x) (specifically, the x expression within the closure).

My comment about "compiler trickery" was with regards to the compiler doing the automatic conversion without .map(|x| x) (since in that case there wouldn't be any T -> marker impl Trait expressions).

Pauan on 16 Sep 2018

I don't see how "marker at type" prevents that. You can do stuff like this:

fn foo() {
    let x: marker impl Trait = ...;
}

@Pauan But we shouldn't require that the expression that we want to unify has to be bound with a let binding and given an explicit "type" (impl Trait)!

I don't think I ever implied that. My proposal was that whenever converting an expression of type T to marker impl Trait it would do the automatic conversion. So it will work in any situation (not just with Iterators).

The compiler can't do it automatically for wrapper types without knowing how to unwrap and re-wrap them though! It would need special-cased "magic" which would make custom wrapper types second-class citizens.

It just so happens that with Iterators the simplest way to create a T -> marker impl Trait expression is with .map(|x| x) (specifically, the x expression within the closure).

But if we don't use "marker at expression" and then require people to write .map(|x| x) (in the above example) just so that the compiler can insert the conversion there, it looks like a No-Op to humans, which we should avoid. Someone will remove it, thinking that it's a No-Op, and will get weird compiler errors. That's why we should use "marker at expression" (.map(|x| marker x)) so that it will be clear that this is not a No-Op, and this location is where the conversion happens!

Can we agree that "marker at expression" would satisfy all our requirements? It works for sub-expressions in function bodies (not just return expressions), it doesn't require a (typed) let binding, and it makes it clear where the conversion happens!

Boscop on 17 Sep 2018

But we shouldn't require that the expression that we want to unify has to be bound with a let binding and given an explicit "type" (impl Trait)!

But that's already a requirement for "marker at expression":

fn foo() {
    let x: impl Trait = ... marker ...;
}

The type is needed, because without the type annotation the Rust compiler doesn't know what trait you're trying to convert it into.

The two systems are fundamentally equivalent, the only difference is the syntax (whether marker is at the type or expression).

The compiler can't do it automatically for wrapper types without knowing how to unwrap and re-wrap them though!

Yes it can, very easily. Why do you say it can't?

It would need special-cased "magic" which would make custom wrapper types second-class citizens.

Yes, which is the same as how closures are handled, but that's not a problem.

To be clear, "marker at expression" also has that problem, you act like it's a problem only with "marker at type", but it's not.

Regardless of which system is used, the compiler will generate an anonymous type, wrap the values into the type, and implement the trait for the type.

They are exactly the same system, the only difference is the syntax for where the marker goes. There is no fundamental difference between them.

But if we don't use "marker at expression" and then require people to write .map(|x| x) (in the above example) just so that the compiler can insert the conversion there, it looks like a No-Op to humans, which we should avoid. Someone will remove it, thinking that it's a No-Op, and will get weird compiler errors.

I agree, which is why I said in an earlier post that I'm very slightly in favor of "marker at expression".

But even then, the differences between them are minor. Both systems have benefits and drawbacks.

Can we agree that "marker at expression" would satisfy all our requirements?

Sure, but "marker at type" also satisfies the requirements.

It works for sub-expressions in function bodies (not just return expressions)

This is irrelevant, because both systems work fine with sub-expressions.

it doesn't require a (typed) let binding

Except that it does.

and it makes it clear where the conversion happens!

Yes, that is the primary benefit of it (which I already admitted to in an earlier post, I'm not sure why you keep trying to convince me of this).

The primary downside is needing to specify marker multiple times, and also the fact that when looking at the type it's not clear that there is an extra performance cost (compared to a regular impl Trait).

It just occurred to me that there is a third option: have the marker at both the type and expressions. This is even more verbose, but it makes everything extremely explicit.

Pauan on 18 Sep 2018

@Pauan

But that's already a requirement for "marker at expression":
fn foo() {
    let x: impl Trait = ... marker ...;
}
The type is needed, because without the type annotation the Rust compiler doesn't know what trait you're trying to convert it into.

Well, this is potentially not true: the compiler could infer impl Trait from the uses of x, in the same way it currently infers u16 for x if I do fn foo(_: u16) {} let x = 42; foo(x);. See the machinery proposed at https://github.com/rust-lang/rfcs/issues/2414#issuecomment-384144887 for how this could be implemented (basically, similar to the current handling of {integer}, I think).
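The existing {integer} behaviour referred to here can be seen in a few lines. This only demonstrates today's integer inference, not the proposed trait inference; takes_u16 and inferred_size are names made up for the sketch.

```rust
use std::mem::size_of_val;

fn takes_u16(v: u16) -> u16 {
    v
}

/// Returns the size in bytes of a literal whose type is fixed only by a later use.
fn inferred_size() -> usize {
    let x = 42; // starts out as `{integer}`
    let size = size_of_val(&x);
    let _ = takes_u16(x); // this call pins `x` to `u16`
    size
}
```

Without the takes_u16 call, x would fall back to i32 and the function would report 4 bytes instead of 2.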

It would need special-cased "magic" which would make custom wrapper types second-class citizens.

Yes, which is the same as how closures are handled, but that's not a problem.

I think @Boscop was speaking of making Iterator special-cased for automatically inserting the said .map(|x| x), which would sound like a bad idea to me too, because then eg. Streams become awkward to use (the same code that works for Iterator doesn't work for Stream)

The primary downside is needing to specify marker multiple times, and also the fact that when looking at the type it's not clear that there is an extra performance cost (compared to a regular impl Trait).

Well, a regular impl Trait can already be a manually-generated sum type or a trait object, so I'm not sure the extra performance cost should actually be shown here, as it's not necessarily slower than a non-markered type :) and so for this reason (as well as the fact that variable types could be inferred by the compiler) I don't think the explicitness at type location is actually required.
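As a small illustration of that point, here is a function whose signature says only impl Iterator while the body goes through a boxed trait object. evens_or_all is an invented name, and this is just one way a plain impl Trait can already hide dynamic dispatch.

```rust
/// The signature promises only `impl Iterator`; callers cannot tell that
/// every `next()` call goes through a vtable.
fn evens_or_all(only_evens: bool) -> impl Iterator<Item = u32> {
    let iter: Box<dyn Iterator<Item = u32>> = if only_evens {
        Box::new((0..6u32).filter(|n| n % 2 == 0))
    } else {
        Box::new(0..6u32)
    };
    iter
}
```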

Ekleog on 19 Sep 2018

These examples involving containers or iterators of autogenerated enum types are starting to feel subtle enough that I'm not entirely sure it's worth going out of our way to support them. After all, whatever we end up using for marker, the whole point of marker impl Trait is to be a sugar for simply writing your own enum (and updating it every time your function gains/loses a return path). The other day I tried to catch up on this discussion and it took me several minutes just to make sense out of all this talk of why map(|x| marker x) was or was not necessary under various counterproposals. When understanding the intended semantics of a sugar feature becomes a challenge in and of itself, that feels like we've put the cart before the horse and it no longer qualifies as a "sugar".

Ixrec on 19 Sep 2018

@Ekleog the compiler could infer impl Trait from the uses of x

I'm not sure if that's true. It seems to me that inferring trait usage is quite a lot different (and more complicated) than inferring types like {integer}. But I haven't worked on the rustc compiler, so perhaps I'm wrong.

I think @Boscop was speaking of making Iterator special-cased for automatically inserting the said .map(|x| x), which would sound like a bad idea to me too

Oh, you're right, it does sound like they were saying that, my mistake. Though I had only mentioned compiler trickery in one small off-hand remark (and I wasn't in favor of it), and all of the discussions since then have been about requiring the .map(|x| x).

When I said this:

My proposal was that whenever converting an expression of type T to marker impl Trait it would do the automatic conversion.

I was not referring to some magic Iterator system, just an extremely simple type unification: when the compiler sees .map(|x| x) it knows that the x expression has the type T, and it tries to unify it with marker impl Trait, so it then inserts the wrapper around the x expression.

This works with any expression, anywhere (as long as it is being unified with marker impl Trait). It has nothing at all to do with Iterator. It's no different from ordinary type unification, or Into / From. It's not complicated at all, there's no compiler magic.

And it certainly would not allow for things like baz().into_iter().collect() or baz() (because there aren't any expressions of type T -> marker impl Trait).

Well, a regular impl Trait can already be a manually-generated sum type or a trait object, so I'm not sure the extra performance cost should actually be shown here, as it's not necessarily slower than a non-markered type :)

That is a good point, yeah.

@Ixrec These examples involving containers or iterators of autogenerated enum types are starting to feel subtle enough that I'm not entirely sure it's worth going out of our way to support them.

I'm not sure why the conversation has veered so far into Iterator. I don't think anybody is in favor of Iterator special-casing, and none of the proposals are adding in special support for Iterator (or any other sort of special-casing).

The two primary proposals are:

Have the compiler insert the wrapper when unifying an <EXPR> expression of type T with type marker impl Trait.
Have the compiler insert the wrapper when using marker <EXPR> (where <EXPR> is an expression of type T unifying with type impl Trait).

The differences are purely in the syntax (the type unification is the same, the anonymous type is the same, etc.)

In either case Iterators do not get any sort of special treatment, they're the same as anything else. So you will need to use foo.iter().map(|x| x) (with the first proposal) or foo.iter().map(|x| marker x) (with the second proposal).

And the same applies to Stream, Future, Option, Result, and anything else. No special-casing, no magic.

When understanding the intended semantics of a sugar feature becomes a challenge in and of itself, that feels like we've put the cart before the horse and it no longer qualifies as a "sugar".

This is the exploration / design phase, it's normal for things to get complicated and confusing, because many people are exploring many different possible options. The goal right now isn't to "decide" on some pristine final solution, the goal is to explore, and try out different ideas.

In addition, people are trying to explain their ideas (and also clarify any misconceptions), and people also change their mind, or new proposals are added, etc. So, some messiness is unavoidable.

However, that doesn't mean that the final sugar will be complicated or confusing. The two proposals right now are both quite simple (to understand, and also to implement).

I expect that after the exploration phase, a proper RFC will be written, in a clear and understandable way, without any of the current confusion or complications.

Pauan on 19 Sep 2018

I'm not sure why the conversation has veered so far into Iterator. I don't think anybody is in favor of Iterator special-casing, and none of the proposals are adding in special support for Iterator (or any other sort of special-casing).

Sorry, I didn't mean to imply I thought anyone was proposing Iterator-specific magic, or that the on-paper description of the syntax had ever stopped being trivial.

What I meant was that people are making arguments for what the syntax ought to be based on what would be less weird for TypeConstructor<marker impl Trait>, but in these nested use cases every proposed variation of this syntax seems like it'd be more of a "clever" trick than a helpful self-evident sugar (to a future reader of the code that's not actively thinking about this specific feature like we are), and I'd rather just write out the enum myself at that point, so I don't think we should be encouraging or designing for these use cases (unless we somehow find a way to make even these cases similarly self-evident).

However, that doesn't mean that the final sugar will be complicated or confusing. The two proposals right now are both quite simple (to understand, and also to implement).

It's the need for map(|x| maybe-marker x) in these Iterator/Vec/etc cases that I found confusing. Yes, the proposals are still trivial to describe on paper, but that doesn't make their application to these use cases trivial to understand, even to a reader like me that's familiar with and actively thinking about this sugar.

Ixrec on 19 Sep 2018

@Ixrec The case for map(|x| marker x) came up when I noticed that, for marker-at-root-expression, it just wouldn't be possible to handle unification at this point with a reasonable syntax (I consider that having map(|x| x) not be a no-op would easily end up in a “WTF Rust‽” talk, and that we should avoid it). Having a syntax that works only for a subset of cases would be sad: even though the unsupported cases are rare, there would inevitably come a point where someone wants one of them and is annoyed by the limitation. An example off the top of my head: impl Stream<Item = impl Future> where the impl Future is going to be an AGST.

That said, my main argument in favor of marker-at-leaf-expressions still remains (well, after the original flip-flop) that it's how things would be done with a manually-written enum: just replace Either::A(stuff) with marker stuff and everything else gets done automatically.

Now, I must also say that my original use case for this was for huge futures. With async/await moving forward, this use case will likely become a lot less relevant, so I'm no longer sure this should actually get syntactic sugar. Just to say that I won't be pushing this forward until we gain some experience with async/await, to know whether there is still a real need for this RFC (for me) after async/await. :)

Ekleog on 19 Sep 2018

Now, I must also say that my original use case for this was for huge futures.

For (historical?) context, way way back when I first got into this topic and I suggested we call this enum impl Trait and we hadn't all collectively decided to always say marker for fear of bikeshed tangents, the motivating use case was autogenerating error enums. The problem statement back then was that whenever you change a function's implementation in a way that produces a new early return/error codepath, that new error type was typically "viral" in the sense that the function signature and all of its callers and their callers had to be updated to use a new slightly different set of error enums. With just regular impl Trait stabilized, the "viral" aspect is now fixed, but there's still the nuisance of bouncing between the "real" function and its dedicated error enum.

So that's part of the reason the recent discussion of nested things like Iterator<marker impl Trait> seemed undermotivated to me. Returning a collection or iterator of errors is very unusual. But you're right that looking at this from a Futures/Streams/etc angle completely changes that.

Ixrec on 19 Sep 2018

👍1

@Ixrec (aside, this is the first time I've noticed that's a capital I not a lowercase l)

error types have some of the exact same issues with wanting to have the conversion happen inside closures though, for example you might want to write something like

use std::error::Error;

fn foo() -> Result<u32, Error1> where Error1: Error { ... }
fn bar(baz: u32) -> Result<u64, Error2> where Error2: Error { ... }

fn qux() -> Result<u64, marker impl Error> {
    let baz = foo().map_err(|e| marker e)?;
    let result = bar(baz).map_err(|e| marker e)?;
    Ok(result)
}

to make the short way to write it work, somehow From::from would need to be supported as an alternative to the marker

fn qux() -> Result<u64, marker impl Error> {
    Ok(bar(foo()?)?)
}
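For reference, the short form already works today if the error enum and its From impls are written out by hand, which is exactly the boilerplate the sugar would have to generate. In this sketch QuxError and the bodies of foo and bar are invented, and the return type names the concrete enum rather than marker impl Error.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug)]
struct Error1;
#[derive(Debug)]
struct Error2;

impl fmt::Display for Error1 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error 1")
    }
}

impl fmt::Display for Error2 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error 2")
    }
}

impl Error for Error1 {}
impl Error for Error2 {}

// Hand-written stand-in for the generated error enum.
#[derive(Debug)]
enum QuxError {
    First(Error1),
    Second(Error2),
}

impl fmt::Display for QuxError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QuxError::First(e) => write!(f, "{}", e),
            QuxError::Second(e) => write!(f, "{}", e),
        }
    }
}

impl Error for QuxError {}

// These impls are what lets `?` do the wrapping.
impl From<Error1> for QuxError {
    fn from(e: Error1) -> Self {
        QuxError::First(e)
    }
}

impl From<Error2> for QuxError {
    fn from(e: Error2) -> Self {
        QuxError::Second(e)
    }
}

fn foo(fail: bool) -> Result<u32, Error1> {
    if fail { Err(Error1) } else { Ok(20) }
}

fn bar(baz: u32) -> Result<u64, Error2> {
    if baz > 100 { Err(Error2) } else { Ok(u64::from(baz) + 1) }
}

fn qux(fail: bool) -> Result<u64, QuxError> {
    Ok(bar(foo(fail)?)?)
}
```

Each ? calls From::from on the error, so one From impl per source error type is the price of the short form.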

Nemo157 on 19 Sep 2018

Yeah, in older discussions I think it was always assumed that if this feature happened ? would be integrated well enough with it to enable the "short way".

Ixrec on 19 Sep 2018

@Ixrec Once we have impl enum baked into the language we can make ? work with it, too, because it's also baked in..

@Ekleog @Pauan Yes, there was some confusion. I'm not in favor of special-casing unwrapping/rewrapping for Iterators or any other type, that's why I prefer "marker at expression" so that we'd write .map(|x| marker x) etc.
Btw, it should be possible to infer the trait that should be used (more often than not), thus "marker at expression" wouldn't always require a let binding, e.g.:

foo(if cond { marker a } else { marker b });

fn foo<T: Bar>(x: T) { ... } // or written as: fn foo(x: impl Bar) { ... }

it allows the compiler to infer that the trait to use for unification is Bar.

Even in cases where it can't be inferred, it should work with a : impl Trait type ascription after the expression, still not requiring a let binding :)

Another argument for keeping the marker away from the impl Trait is that the trait itself is not affected by the unification. It's just a "coincidence" that impl Trait is written at the expression tree root in some cases (let bindings and return values). impl enum Trait would make it weird because then the marker would occur in a syntactic location that should only be used for specifying trait names.

But the strongest argument for "marker at expression" IMO is .map(|x| marker x). So I'm really in favor of "marker at expression"..

Boscop on 20 Sep 2018

I roughly like the enum_derive_delegate! macro idea, meaning we do this via procedural macros. We'll want enums that delegate traits like that anyway.

Also, there are seemingly many parallels to trait objects here, which we could leverage via improvements in DST handling:

fn foo(x: bool) -> impl Trait {
    trait MyTrait = Trait<AssocType = SomeType, const C = some_value>;
    if x { return enum_from_trait_object!(bar() : dyn MyTrait) }
    return enum_from_trait_object!(baz() : dyn MyTrait)
}

In this, enum_from_trait_object! is a procedural macro that creates and returns an enum type by delegation from a trait object value realized by only a fixed number of concrete types, all with exactly the same associated types or constants for the trait.

We might need procedural macros to interact with type inference for that exact syntax to work, as the different invocations of enum_from_trait_object! must be linked together. We could avoid this with slightly more limited syntax:

fn foo(x: bool) -> impl Trait {
    trait MyTrait = Trait<AssocType = SomeType, const C = some_value>;
    enum_from_trait_object!(if x { 
        bar() : dyn MyTrait 
    } else { 
        baz() : dyn MyTrait 
    })
}

You could still simplify this syntax with a convenience macro though:

#[anonymous_delegated_enum]
fn foo(x: bool) -> impl Trait {
    if x { return bar() : dyn Trait }
    return baz() : dyn Trait
}

burdges on 20 Sep 2018

@Boscop (disclaimer: I'm in favor of marker-at-expression too)

IMO, there is still an argument in favor of marker-at-type, and it is the question of ?: should it automatically add in marker to the error value, or not?

The answer to this question is highly non-obvious to me.

@burdges The enum_from_trait_object! can be written iff. the enum_derive_delegate proc macro is added to the language (as that would require special compiler support to generate the correct list of functions).

Also, the issues with using only the enum_derive_delegate to actually implement this idea have been listed here. In particular, it cannot be integrated with ?, nor with custom try!-like macros, because it can't say that if a single type is found then there should be no wrapper. This absence of integration makes it really unlikely to be used for errors IMO, which is likely becoming the primary use case for it, with async/await coming into existence.

Ekleog on 20 Sep 2018

@Ekleog Btw, ? also works in do/try-catch blocks (which are expressions that don't necessarily have a type ascription) not just for return values. But ? already calls .into(), so it could also insert the marker but then we'd need a marker at the root of the expression tree (not necessarily at the type of this expression tree because there might not be a type ascription).
So to satisfy both constraints (1. .map(|x| marker x) and 2. ? uses marker when necessary) we'd either need a marker at both the expr and the tree root, or require at least one of the two (but not necessarily both).
Cases like (1) would require the marker at expr but cases like (2) would require it at the tree root / type.

Boscop on 21 Sep 2018

In the first case you do not want ? to wrap errors, because all the types have identical error types, while in the second case you do want to wrap errors into a second enum when they differ, right?

We cannot have syntactic ambiguity between the first and second cases here of course since they create different associated types. I only suggested requiring identical associated types, which supports the first case but completely forbids this second case.

We might call this second form "associated type erasure safe", especially in the trait object context. We have _many_ object safe traits like AddAssign<&'a T> which do not support erasure of the associated type T, either by enum or reference. Any associated type being used in argument, constant, or constraint position seemingly creates this problem.

Anyways..

An anonymous enum type is seemingly a special case of a trait object, so the syntax enum Trait could be implemented by creating a trait object, except that (a) rustc identifies all variants used, (b) assigns size from the largest variant, and (c) places the vtable reference inside the type as a discriminant. Implementing with actual enums sounds like an optimization.
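The layout difference between the two representations can be checked on stable Rust today. This is only an illustrative sketch: Speak, Small, Large and Either are invented names, and the hand-written Either stands in for the anonymous enum.

```rust
use std::mem::size_of;

trait Speak {
    fn speak(&self) -> String;
}

struct Small(u8);
struct Large([u64; 4]);

impl Speak for Small {
    fn speak(&self) -> String {
        format!("small {}", self.0)
    }
}

impl Speak for Large {
    fn speak(&self) -> String {
        format!("large {}", self.0[0])
    }
}

// Stored inline: the enum must be able to hold its largest variant.
enum Either {
    Small(Small),
    Large(Large),
}

impl Speak for Either {
    fn speak(&self) -> String {
        match self {
            Either::Small(s) => s.speak(),
            Either::Large(l) => l.speak(),
        }
    }
}

/// True when the enum is at least as large as its largest variant.
fn enum_holds_largest_variant() -> bool {
    size_of::<Either>() >= size_of::<Large>()
}

/// True when a boxed trait object is a data pointer plus a vtable pointer.
fn boxed_is_fat_pointer() -> bool {
    size_of::<Box<dyn Speak>>() == 2 * size_of::<usize>()
}
```

The enum pays for the largest variant on every value and dispatches with a match, while the boxed trait object stays two pointers wide and dispatches through the vtable.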

We should thus explore type erasure for trait objects before worrying about the enum special cases being discussed here. Is there any serious proposal even for generic tooling to say supersede the erased-serde crate? Associated type erasure safety, and merging error types, goes way beyond that, but that's your starting point.

burdges on 21 Sep 2018

_See also the discussion in #2261, which has been closed in favor of this issue._

mqudsi on 9 Oct 2018

I've written a pre-RFC which includes functionality requested in this issue.

newpavlov on 7 Nov 2018

👍3

I implemented this feature as a procedural macro: auto_enums.
It analyzes obvious branches (match, if, return, etc.) and assigns variants.
In many cases, no extra macros or methods are needed to identify the branches.

#[auto_enum(Iterator)]
fn foo(x: i32) -> impl Iterator<Item = i32> {
    match x {
        0 => 1..10,
        _ => vec![5, 10].into_iter(),
    }
}


Several other points:

Traits which are not supported beforehand can also be implemented via proc_macro_derive.
Multiple auto_enum attributes can be nested.
~~Since '?' operator is not yet supported, it is not practical to use it for error handling.~~; implemented
It is necessary to specify the trait to implement. Although there is a feature to analyze impl Trait, it is still experimental.
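To make concrete what the attribute saves you from writing, here is a hand-written equivalent of the foo example above. This is a sketch of the general pattern only, not the code auto_enums actually generates; the Either type and its variant names are made up.

```rust
// Hand-written two-variant enum standing in for the generated one.
enum Either<A, B> {
    Left(A),
    Right(B),
}

// Delegate `Iterator` to whichever variant is active.
impl<A, B> Iterator for Either<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<Self::Item> {
        match self {
            Either::Left(a) => a.next(),
            Either::Right(b) => b.next(),
        }
    }
}

fn foo(x: i32) -> impl Iterator<Item = i32> {
    match x {
        0 => Either::Left(1..10),
        _ => Either::Right(vec![5, 10].into_iter()),
    }
}
```

Every extra branch type means another variant and another match arm, which is the bookkeeping the macro takes over.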

taiki-e on 24 Jan 2019

👍16 ❤4

@taiki-e Good stuff! This looks like a well-engineered approach.

alexreg on 24 Jan 2019

For the record, my personal colour of the bikeshed would be:

fn foo(do_thing: bool) -> impl Iterator<Item = u32> {
    let a: enum = if do_thing {
        iter::once(1)
    } else {
        iter::empty()
    };
    a
}

This syntax can only be used on let bindings (+ closure args/returns etc) and can't be used on function returns. This makes it clear where the type is generated (as opposed to enum expr-style syntax) while also not leaking the implementation detail that the function returns an implicit sum type to the caller. Essentially, anywhere where you could have _ to infer a single type, you can use enum to take all the possible inferred types and create an implicit opaque enum that acts like an impl A + B + C + D for the intersection of traits that are implemented by all inner types. Just like an impl A + B can implicitly convert to an impl A, so can an opaque type created with enum implicitly convert to impl A for any A that all variants implement.

Upsides:

Relatively self-explanatory once you know it exists
Makes it obvious which value has a generated type
Doesn't leak implementation details outside a function
Similarity to _ makes it easy to explain
No need to explicitly name all the traits when it's not used outside of the function
Extremely lightweight syntax without sacrificing explicitness
Pairs very well with https://github.com/rust-lang/rfcs/pull/2522 (especially the ability to do let a: impl Trait = match { ... }: enum;)

Downsides:

Not very discoverable (although that's what error messages are for)
Can lead to boilerplate where you need to wrap your whole function body in a let binding for the common case that you only need the generated enum to return from a function (although a trivial macro could hide that boilerplate)
Not particularly clear what to do when the inner types are ambiguous. In the above example, the 1 is of ambiguous type. It would probably be pretty easy to just force resolution of each branch to a specific type
Might make method resolution confusing compared to explicitly stating the traits you want to implement, although AFAICT no more confusing than when one uses let a: _ and you can always add let a: impl Trait = a directly underneath

EDIT: I just realised that this would also be a good testbed for "compact" enums, since that optimisation could trivially be applied to these generated types without affecting stability or compatibility (since the types are only ever touched by generated code) https://github.com/rust-lang/rfcs/issues/311

Vurich on 9 Sep 2019

@Vurich In that case, I believe it would also be appropriate for the syntax .map(|n| -> enum {n}) to also work, if I am not mistaken. It doesn't put the marker where others have proposed, but it does make the unification explicit at the leaf.

Also, I would be excited to see this syntax move forward. Is now a good time to make an RFC? I would like others to confirm that the let a: enum = ... and .map(|n| -> enum {n}) syntaxes are valid and work here in the Rust compiler. I haven't made an RFC before, but I volunteer to make one if it's ready. We have already had a lot of discussion so far and little more is happening.

vadixidav on 16 Nov 2019

Can we consider whether @taiki-e's crate is now sufficiently good for this issue to be closed? It's worth the debate at least.

alexreg on 17 Nov 2019

Can we consider whether @taiki-e's crate is now sufficiently good for this issue to be closed? It's worth the debate at least.

I think we could do that if it supported arbitrary amounts (more than one, if we even support that now), positions (inside a vec or by itself), and traits (any arbitrary trait) of impl Trait in addition to the ? syntax that started this whole discussion to get clean and easy error enums. If it's possible to do all of this with a macro, then I think that is probably a better way to go about it instead of adding a feature to the language. One counterargument I see would potentially be compilation time. Does anyone else have any objections to this being in a proc macro crate? It seems like it could be possible, but I'm not an expert.

@taiki-e Do you think that, if someone took the time, it would be possible to implement all the above features? The main ones are things like Vec where the impl Trait exists in some arbitrary place and some spot must exist where the leaf is converted to the underlying enum or an error is raised. It seems like that might require doing some of the work the compiler is already doing to unify the types. Additionally, is the ? syntax with errors doable with the proc macros?

Edit: Upon further thought, I don't think it's possible for a macro to do it for an arbitrary trait, because it can't visit all the things it needs to implement in the trait, since it isn't processing the trait itself.

vadixidav on 17 Nov 2019

arbitrary amounts

number of variants supported: 2..usize::max_value()

positions

see https://docs.rs/auto_enums/0.7.1/auto_enums/#supported-syntax and https://docs.rs/auto_enums/0.7.1/auto_enums/#positions-where-auto_enum-can-be-used

proc-macro cannot understand the type, so there are cases where additional annotations are needed:

nested: #[nested] (see also https://github.com/taiki-e/auto_enums/pull/67#issuecomment-539171781)
positions not included above: marker! (or directly use #[enum_derive]).

? is also supported, but I think it is difficult to support it efficiently with proc-macro (see https://github.com/taiki-e/auto_enums/issues/14#issuecomment-483672675).

any arbitrary trait

This is impossible, but unsupported traits can be implemented via proc_macro_derive. (auto_enums passes unsupported traits to #[derive])
Also, can easily add support for any crate by using derive_utils.

taiki-e on 17 Nov 2019

❤2

Seems like it is all possible. It sounds like the more productive course of action is to extend proc macros in some way (if necessary) to assist in avoiding duplicates for the ? operator. I would like to see this issue moved to completion. Do we have potential solutions for inspecting the error types and avoiding generating the different variants for ones we have already seen? If so, issues should probably be opened for those things before this is closed. It does seem to me like this issue will be solved then, since the original motivation was for easier error handling. This ticket has had very little activity recently, but I encourage anyone with any objections to add their part.

vadixidav on 17 Nov 2019

@vadixidav Yes, extending the capabilities of proc macros definitely sounds like a better way to go right now.

alexreg on 17 Nov 2019

Are there similar crates with similar issues for doing delegation with proc macros? If so, maybe organizing all the related issues from both sounds like the first step?

burdges on 17 Nov 2019

Yes, the ambassador crate via https://github.com/rust-lang/rfcs/pull/2393#issuecomment-555388493 does delegation, so there may be some common desires for proc macro features there.

I'll also note https://github.com/rust-lang/rfcs/pull/2587#issuecomment-457382927 was closed as postponed only one year ago.

burdges on 19 Nov 2019

I think overloading the term "enum" might make sense to people who are familiar with why it's named that way, but it'd be confusing conceptually.

Currently -> impl Trait makes sense as it can be read as "return an implementation of Trait".

-> enum impl Trait ("return an enum of an implementation of Trait") would leave the user scratching their head (my reaction would be, "what enum?").

teohhanhui on 22 Jun 2020

@teohhanhui It would be read as "return an enum that impls the trait".
I think it's not confusing. All language features require reading docs before understanding. I don't think it's an issue: Once you've seen it & looked it up, you know it.
And in this case the meaning could also be inferred if the programmer already knows what -> impl Trait means and that it's a pattern to sometimes use enums to get static dispatch.

In fact, I think -> enum impl Trait is the syntax that makes the most sense because of this. It's just an auto-generation of a pattern that would otherwise be handwritten.
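For reference, a minimal sketch of the handwritten pattern in question. The names `LetterIter` and `letters` are made up for illustration; the proposed syntax would generate something equivalent to the enum and its `Iterator` impl.

```rust
// Hypothetical hand-written sum type with one variant per concrete return type.
enum LetterIter<A, B> {
    A(A),
    B(B),
}

impl<A, B> Iterator for LetterIter<A, B>
where
    A: Iterator,
    B: Iterator<Item = A::Item>,
{
    type Item = A::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // One branch per call selects the active variant (static dispatch).
        match self {
            LetterIter::A(it) => it.next(),
            LetterIter::B(it) => it.next(),
        }
    }
}

// The two arms have different concrete iterator types, so each is wrapped
// in its own variant; the caller only sees `impl Iterator<Item = char>`.
fn letters(upper: bool) -> impl Iterator<Item = char> {
    if upper {
        LetterIter::A("abc".chars().map(|c| c.to_ascii_uppercase()))
    } else {
        LetterIter::B("abc".chars())
    }
}
```

Only `next` is forwarded here; a real implementation would probably also forward `size_hint` and similar methods.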

Boscop on 22 Jun 2020

The fact that it uses an enum is an implementation detail. That's why I said it's confusing conceptually. Why should the compiler always be tied to returning an enum?

teohhanhui on 22 Jun 2020

The fact that it uses an enum is an implementation detail. That's why I said it's confusing conceptually. Why should the compiler always be tied to returning an enum?

Actually, after optimization the resulting type might not be an enum at all, thanks to the unspecified ABI. The enum keyword is used here to avoid a breaking change: changing the behavior of impl Trait in return position might not be backward compatible, though this point requires further investigation. From a syntactic point of view, enum impl Trait is part of the function's contract, meaning "one of a fixed set of types implementing the Trait (regardless of whether they are existential or not)".
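As a rough illustration of the "implementation detail" point, the sketch below (function name `letters_boxed` is hypothetical) backs the same kind of signature with a boxed trait object instead of an enum. Today this has to be written by hand, but the caller cannot observe which representation was chosen.

```rust
// The signature promises only "some type implementing Iterator<Item = char>".
fn letters_boxed(upper: bool) -> impl Iterator<Item = char> {
    // Both arms are erased to the same boxed trait object type,
    // trading the enum's branch for a heap allocation and a vtable call.
    let it: Box<dyn Iterator<Item = char>> = if upper {
        Box::new("abc".chars().map(|c| c.to_ascii_uppercase()))
    } else {
        Box::new("abc".chars())
    };
    it
}
```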

tema3210 on 8 Jul 2020

Rfcs: Auto-generated sum types

Most helpful comment

All 204 comments

Should the indication of the fact the return is an anonymous sum type lie in the return type or at the return sites?

[derive(IntoLetterIter)]

[IntoLetterIterString="foo"]

[derive(IntoLetterIter)]

[IntoLetterIterString="hello"]

[derive(IntoLetterIter)]

Auto-generated sum types (AGSTs) are pure syntax sugar

`impl Trait` in return-type position makes no guarantees to the caller beyond what it says

A monomorphizing let binding is already required to return boxed trait objects of differing concrete types through an impl Trait

AGSTs should be seen as a foil to boxed trait objects, not to a monomorphized `impl Trait`

The salient performance tradeoff in choosing a sum type vs a boxed trait object is potentially over-allocating space vs having a separate but minimal heap allocation

The "should I box it or just pass it around unboxed" question already exists throughout Rust

The reason people don't like boxed trait objects is not usually performance, it's that boxing is annoying.

Boxed trait objects are already becoming `Box<dyn Trait>`, so AGSTs as `dyn Trait` are a clear unboxed analog

[derive(EnumInnerAsTrait(as_test -> &Test)],

Summary

Related issues
