Chapel: Split-initing arrays using `proc =` overload doesn't compile

Created on 11 Jun 2020  路  16Comments  路  Source: chapel-lang/chapel

Summary of Problem

This behavior starts with https://github.com/chapel-lang/chapel/pull/15239.
Avoiding split initialization is a potential workaround.

Steps to Reproduce

Source Code:

record Point {
  var x: real;
  var y: real;
}

proc =(ref arr: [?d], ref r: Point) {
  arr[d.first] = r.x;
  arr[d.first+1] = r.y;
}

var arr: [0..1] real;
var p = new Point(10,20);

// arr[0] = 0;  // uncomment this and it works

arr = p;

writeln(arr);

Associated Future Test(s):
https://github.com/chapel-lang/chapel/pull/15809 will add a future

Compiler Bug user issue

All 16 comments

@mppf FYI

I naively tried to add an overload of chpl__coerceMove but couldn't get it to work. I don't know much about the subtleties of that, though. Maybe it shouldn't be added in this case in the first place?

For the curious, the behavior of the test is:

$CHPL_HOME/modules/internal/ChapelIteratorSupport.chpl:330: error: cannot iterate over values of type Point
$CHPL_HOME/modules/internal/ChapelIteratorSupport.chpl:330: note: unresolved call had id 1375289

It may be obvious, but I _think_ what's happening is that the compiler is trying to apply split initialization to the arr=p; statement, doesn't find a matching copy initializer, so tries to implement it as forall (a,i) in zip(arr,p) do a init= i; and then can't iterate over p. Presumably rather than generating the error, it should decide it can't split-initialize the array, default initialize it, and then assign it on the arr=p; line.

For me, this issue also opens an interesting question about whether and how a programmer could provide a copy initializer overload for an array (without relying on internal features).

If we want to allow users to override = for internal types then yes it appears that we also need to allow them to override init=. #15717 has basically the same problem - except that the current compiler does switch to default-init/assign in that case.

I am not so sure we want to switch to default-init/assign in a case like this long-term. The reason for that is that I am concerned that users will find the performance differences between default-init/assign and init= confusing ("Why does my program get slower when I add this = overload?").

I think that an init= for arrays with proper runtime type support is not so hard, because the compiler will arrange to put the runtime type part into this.type before the init= is called at all. So it seems to be quite plausible to me that we could expect the user in this case to provide both = and init=. (Of course there is plenty of compiler work to do in order to get there).

I think that an init= for arrays with proper runtime type support is not so hard, because the compiler will arrange to put the runtime type part into this.type before the init= is called at all. So it seems to be quite plausible to me that we could expect the user in this case to provide both = and init=. (Of course there is plenty of compiler work to do in order to get there).

Here, are you suggesting that the user should be able to write a custom copy initializer for arrays, right? Similar to what @bradcray brought up?

I am not so sure we want to switch to default-init/assign in a case like this long-term. The reason for that is that I am concerned that users will find the performance differences between default-init/assign and init= confusing ("Why does my program get slower when I add this = overload?").

I may be rehashing and already-ended discussion here. But my 2 cents is that that confusion is less important than them not being able to compile without adding a custom copy initializer for arrays. I may be thinking this way because I view split initialization as sort of an "optimization", which is not a very accurate view admittedly. But what I am trying to say is that user should not be thinking of arr=p line as something that initializes the array. The fact that the compiler writes it that way should be more opaque to the user.

Here, are you suggesting that the user should be able to write a custom copy initializer for arrays, right? Similar to what @bradcray brought up?

Yes

I may be rehashing and already-ended discussion here.

I don't think any decisions have been reached yet.

But my 2 cents is that that confusion is less important than them not being able to compile without adding a custom copy initializer for arrays.

I agree that Engin's program should compile without changes and am not concerned about the potential for performance surprises (the same performance surprise argument holds for any record that defines = without a corresponding init=, and the record may hold an array, so I think we already have this form of surprise). In asking about the copy initializer I was only thinking about how one could get to a more efficient implementation, not proposing it as the solution for this compiler error.

But what I am trying to say is that user should not be thinking of arr=p line as something that initializes the array. The fact that the compiler writes it that way should be more opaque to the user.

I think a savvy user definitely should understand the distinction between split initialization vs. default initialization + assignment and that they should understand that this is a case where the compiler would attempt to apply split initialization before falling back to default initialization + assignment. But I also think that in cases where a type has compatible = and init= defined for it (or doesn't have an init= at all), it's reasonable for a naive/lazy user to not particularly distinguish between default initialization + assignment vs. split initialization until they are feeling performance-minded or are writing their own performance-minded types. So I'm OK with Point's author defining the = above without init= so long as they're not worried about performance.

Or we could take a different stance and have the compiler create init= overloads for internal types whenever = overloads are provided for them, though that seems a little scary to me for some reason.

OK, so I am thinking there are 3 strategies that we have might consider, based upon the discussion:

  1. The user needs to provide init= for the above pattern to compile.
  2. Because arr = p can only function with = and not init= in the example, the compiler default-initializes upon the var arr: [0..1] real; and then uses the assignment overload for arr = p.
  3. Upon the arr = p line, the compiler should implement the split-initialization of arr with default-init and then assign, because a compatible init= was not available. (It looks for init= and falls back on looking for default-init and =).
  4. The compiler generates init= overloads to match = overloads when the init= are not already provided.

It sounds like there is not much support for 1.

I think 2 would not be a good strategy because part of the split-init design is that it is type-independent. Doing 2 would make the decision - to split-init or not - no longer type-independent because the compiler would have to consider whether the types had this relationship (= exists but not init=).

I think 3 is a good idea and fits well within the implementation as well as existing concepts.

I think 4 could be considered an extension of 3 but I'm not sure I can think of a case where the difference would actually matter, short of things like canResolveMethod(arr, "init=", mypoint) -- which wouldn't work now anyway for the things arrays can initialize from. (And even if we didn't generate init=, we could consider such queries to be equivalent to asking "Is copy initializable from?").

Option 2 is what I was expecting based on my intuition and some quick experimentation with other cases. For example, given a case like this:

record R {
  var x: int;
}

proc =(ref lhs: int, rhs: R) {
  writeln("In my assignment");
  lhs = rhs.x;
}

var x: int;
x = new R(42);
writeln(x);

since it compiled and ran, I assumed that it was doing default initialization of x and then assigning to it.

To me, options 2-4 seem roughly equivalent as long as they're applied consistently. The main difference that I imagine a user might see is the order in which deinitialization occurs, does that sound right? For example for this variation on the above code:

var x: int;
var y = 42;
x = new R(42);

I think case 2 would deinitialize y then x whereas cases 3 and 4 would deinitialize x then y?

Option 2 still seems most natural to me given what we've got today (in that I think we always either call the default initializer at the declaration point or the copy initializer at the assignment point, so options 3 and 4 would essentially add a third "we call the default initializer at the assignment point" case). But if I'm understanding your point about it being type-independent, I think you're saying that we make these decisions before we resolve types? If that's the case, it makes me think I misdiagnosed what was happening in my exploratory example above. What does the compiler do in that case?

But if I'm understanding your point about it being type-independent, I think you're saying that we make these decisions before we resolve types?

Yes, exactly, but I also think that the type-independent property is an important language design point.

If that's the case, it makes me think I misdiagnosed what was happening in my exploratory example above. What does the compiler do in that case?

In your example, the compiler is split-initializing x and in order to copy-initialize an int from an R it is default-initializing and then assigning. It used to do this for arrays (before the array init PR) and it still does it for tuples (and apparently integers). The decision to split-initialize happens before it figures out that init= is not available.

The main difference that I imagine a user might see is the order in which deinitialization occurs, does that sound right?

Yes, but more than that, the point at which the default initialization runs.

Consider this program that doesn't compile today on master:

record R {
  var x: int;
  proc init() {
    writeln("in default init");
  }
  proc init=(rhs: R) {
    writeln("in init=");
    this.x = rhs.x;
  }
  proc deinit() {
    writeln("in deinit");
  }
}

proc =(ref lhs: R, rhs: int) {
  writeln("in = overload");
  lhs.x = rhs;
}

proc main() {
  writeln("before declaration");
  var x: R;
  writeln("after declaration, before assignment");
  x = 1;
  writeln("after assignment");
}

If we do 2, then it would print out

before declaration
in default init
after declaration, before assignment
in = overload
after assignment 

If we do 3 or 4, it would print out

before declaration
after declaration, before assignment
in default init
in = overload
after assignment 

And yes, it's also possible that the deinitialization order would be different.

In your example, the compiler is split-initializing x and in order to copy-initialize an int from an R it is default-initializing and then assigning.

Oh, OK, I wasn't aware of that. Is that documented in the spec? (a quick check makes it look like it's not; and the existing text seems misleading in this regard, e.g. "These assignment statements and calls to functions with out intent are called applicable assignment statements. They perform initialization, not assignment, of that variable." seems like it requires qualification).

In that case, I think it would be obvious to do the same for arrays, which I think is your option 3?

Yes, but more than that, the point at which the default initialization runs.

Sure. I was imagining that most initializers don't have order-dependent side effects. But I guess I could've / should've assumed that most of them don't have order-dependent de-initializers either...

OK, I wasn't aware of that. Is that documented in the spec?

no; this issue seems to me to be raising an open design issue we havn't previously addressed

In that case, I think it would be obvious to do the same for arrays, which I think is your option 3?

Yes, exactly

The lazy part of me wants to say I can get behind option 3, but the more rigorous part of me feels nervous about it. In part, nervous that we've arrived at it in a roundabout way and have something implemented that isn't specified. But mostly because if the implementation didn't already take this approach for non-arrays and we were having the design discussion from scratch, I don't think I'd be a big fan of it.

Specifically, it seems a little weird to me to sometimes do the default initialization at the variable declaration point, to sometimes do it at the first assignment point, and to sometimes use copy initialization; whereas if we only either (a) default initialized at the declaration point or (b) copy-initialized at the assignment point, there are only two simpler (and I'd say, pretty natural) cases to think about.

I sort of buy the argument that it's nice to have the implementation be independent of the types, but don't find it way compelling. It seems to me that it's far more important to treat the types consistently than in a capability-blind manner, and that most users wouldn't be all that impacted by whether the compiler was inspecting the available interface of a type in making this decision. Or rather, it seems natural for the compiler to inspect the available interfaces in making this decision ("I want to call the copy initializer here, but there isn't one, so I need to do something else"). And if it resulted in simpler rules for the programmer, that seems more valuable than type-independent transformations.

To that end, option 4 seems slightly more attractive than 3 to me in that it essentially says we'll always use the copy initializer and that the compiler will provide one if the user hasn't. But can it really work? What if the assignment overload does something to free up something in a LHS record (say) before overwriting it from the RHS? The copy initializer version of that should presumably not do the freeing step. That seems hard for a compiler to get right in the general case.

Or is the counterargument that if the LHS is a record, the compiler will have already complained about there being a = overload without an init= overload, so we can only get into this case for non-record types which can't have those "need to free" concerns?

To that end, option 4 seems slightly more attractive than 3 to me in that it essentially says we'll always use the copy initializer and that the compiler will provide one if the user hasn't. But can it really work? What if the assignment overload does something to free up something in a LHS record (say) before overwriting it from the RHS? The copy initializer version of that should presumably not do the freeing step. That seems hard for a compiler to get right in the general case.

In Option 4, I think the compiler would just generate an init= like this:

Given

proc = (ref lhs: MyType, rhs: OtherType) { ... }

the compiler would generate this init= :

proc MyType.init=(rhs: OtherType) {
  this.init(); // i.e. default initialize me please

  this = rhs; // call the assignment overload
}

What if the assignment overload does something to free up something in a LHS record (say) before overwriting it from the RHS?

I think it would be fine - it would be just freeing whatever you get by default initializing.

Of course, the compiler would not be able to generate such an init= from an = for a type that is not default initializable. That would result in some sort of compilation error.

Or is the counterargument that if the LHS is a record, the compiler will have already complained about there being a = overload without an init= overload, so we can only get into this case for non-record types which can't have those "need to free" concerns?

AFAIK we are not generating errors for init= missing for = when the types are mixed. That checking is limited to the same-type case. IIRC we debated generating init= from = when only one was provided but decided on an error for now. The different-type case seems to make generating it more compelling and seems to me to be one where we'd have less of a good argument that it should always be an error to not provide both. But of course we could alternatively make it an error (...but that was pretty much Option 1).

Either way though for the near term (at least until arrays actually use init= instead of some other stuff that behaves similarly) it would be Option 3 for arrays (but we could choose to do Option 4 for user records now).

OK, that makes option 4 seem like a different implementation approach for option 3 than a distinct option to me (though, as you point out, it would permit canResolve on init= to pass) in that the synthesized init= would be less initializer-like and more init+assign oriented (though I understand we have no real alternative).

This conversation definitely makes me think that on master we should probably be more consistent in erroring for any =(lhs: RecordType, ...) overload that doesn't have a corresponding init=() case. Or go to the "compiler creates init= overloads using default-init and assignment when one isn't present" route for all cases. It essentially seems to come down to a tradeoff between "surprise that I had to write this obvious thing" (in cases that are) vs. "surprise that my performance is bad until I write an explicit init=" (in cases that aren't). I have a feeling that we should fork that to its own issue and get more opinions on it.

OK, that makes option 4 seem like a different implementation approach for option 3

I agree that these strategies are pretty closely related in terms of program behavior.

This conversation definitely makes me think that on master we should probably be more consistent in erroring... Or go to the "compiler creates init= overloads using default-init and assignment when one isn't present" route for all cases.

I have a feeling that we should fork that to its own issue and get more opinions on it.

I've created #15838 for that question specifically.

This comment https://github.com/chapel-lang/chapel/issues/16732#issuecomment-729721848 discusses how we might support this pattern efficiently by extending split-init to consider calling a user-defined cast instead of array.init=(Point).

Was this page helpful?
0 / 5 - 0 ratings