Chapel: lifetime checking: how to write different kinds of class pointers

Created on 3 Feb 2018 · 43 Comments · Source: chapel-lang/chapel

In a borrow checked / lifetime checked world, there are two key pointer types:

  • "owned" class pointer (and reference counted or not?)
  • "borrowed" class pointer

Additionally it might be necessary or helpful to create a pointer type that always has infinite lifetime:

  • "unchecked"/"raw" class pointer

It's important for usability of the lifetime checker that 'borrow' vs 'owned' be fairly clear. Additionally, common patterns, such as creating a function that returns a new class instance, will require that the return type be an 'owned' class instance rather than a borrow; otherwise the borrow checker will infer the lifetime to be based upon the arguments.

For example, this function in MasonUpdate.chpl would need to be updated to return an 'owned' Toml class instance:

proc createDepTrees(depTree: Toml, deps: [?d] Toml, name: string) : Toml {
}

since otherwise, the lifetime checker will assume that the lifetime of the result is the lifetime of depTree or deps since depTree or deps[i] could be returned.

Chosen proposal:

  • A borrow is written just with the normal class type e.g. :MyClass
  • owned MyClass (like atomic), shared MyClass, unmanaged MyClass, borrowed MyClass

Pros and cons of the proposal:

  • +: Seems to be more fundamentally included in the language
  • -: How would we handle Shared or other siblings? Probably don't want keywords for each.

Main alternative:

  • Owned(MyClass)

    • +: We have this now

    • +: supports other things, Shared, OwnedNullable, etc...

Language Design

All 43 comments

Arguably Owned and Shared already allow us to annotate ownership, but there might be some reason why we need another strategy.

Previous lifetime checking systems suggest that borrows are much more common than 'owned' - and so probably the class type name should refer to a borrow.

E.g. the example might be updated to:

proc createDepTrees(depTree: Toml, deps: [?d] Toml, name: string) : Owned(Toml) {
}
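The difference between returning a borrow and returning an owned instance can be sketched in Rust, another lifetime-checked language. This is only a rough analogy: `Box<Toml>` stands in for `Owned(Toml)`, and the `Toml` struct and both functions are hypothetical stand-ins, not Mason's actual code.

```rust
// Hypothetical stand-in for the Toml class; not Mason's real type.
#[derive(Debug, Clone, PartialEq)]
struct Toml {
    name: String,
}

// Returns a borrow: the result may be `dep_tree` or an element of `deps`,
// so its lifetime is tied to both arguments.
fn pick_existing<'a>(dep_tree: &'a Toml, deps: &'a [Toml], name: &str) -> &'a Toml {
    deps.iter().find(|d| d.name == name).unwrap_or(dep_tree)
}

// Returns a new instance: the caller owns the result, and it does not
// depend on how long the arguments live.
fn create_dep_tree(dep_tree: &Toml, deps: &[Toml], name: &str) -> Box<Toml> {
    let joined: Vec<&str> = deps.iter().map(|d| d.name.as_str()).collect();
    Box::new(Toml {
        name: format!("{}:{}[{}]", name, dep_tree.name, joined.join(",")),
    })
}

fn main() {
    let owned_result;
    {
        let root = Toml { name: "root".to_string() };
        let deps = vec![
            Toml { name: "a".to_string() },
            Toml { name: "b".to_string() },
        ];
        let borrowed = pick_existing(&root, &deps, "b");
        println!("borrowed: {}", borrowed.name);
        owned_result = create_dep_tree(&root, &deps, "tree");
    } // `root` and `deps` are dropped here; only the owned result survives.
    println!("owned: {}", owned_result.name);
}
```

If `create_dep_tree` returned a reference instead, the result could not outlive the inner block, which is the situation the lifetime checker would otherwise assume for `createDepTrees`.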

I'm not sure to what extent we need an "unchecked" or "raw" pointer type. One direction we could go is to express such things as part of the low-level C interoperability portion of the language. E.g. c_instancePointer or something along those lines. But, that assumes that things like c_ptr will not be lifetime-checked, which isn't necessarily the right direction.

Rust's support for raw pointers is cordoned off pretty heavily. Any meaningful interaction with them needs to be wrapped in an unsafe block. However since we aren't going to be doing a ton of low-level pointer magic in Chapel, I'm not sure we really need raw in the first place.
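For reference, a minimal sketch of what that cordoning looks like in Rust. The `Node` type and the three helper functions are hypothetical names chosen for illustration; `Box::into_raw` and `Box::from_raw` are the standard library calls that hand an owned value to a raw pointer and take it back.

```rust
struct Node {
    value: i32,
}

// Hands ownership over to a raw pointer: from here on nothing frees the
// Node automatically and the borrow checker no longer tracks it.
fn into_unmanaged(value: i32) -> *mut Node {
    Box::into_raw(Box::new(Node { value }))
}

// Reads through the raw pointer. Dereferencing is only allowed inside
// `unsafe`; the caller must guarantee that `ptr` is still valid.
fn read_value(ptr: *mut Node) -> i32 {
    unsafe { (*ptr).value }
}

// Rebuilds the Box so the Node is freed exactly once (the manual "delete").
fn release(ptr: *mut Node) -> i32 {
    let boxed = unsafe { Box::from_raw(ptr) };
    boxed.value // `boxed` is dropped when this function returns
}

fn main() {
    let raw = into_unmanaged(42);
    println!("value = {}", read_value(raw));
    println!("released = {}", release(raw));
    // Using `raw` after this point would be a use-after-free that the
    // compiler cannot catch.
}
```

Creating and passing the raw pointer needs no `unsafe`; only dereferencing it and reclaiming it do.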

We could just see how far we can get with Owned and Shared, then determine if users really need raw from there.

I am pretty certain that the default mode for a class pointer, i.e. when there are no annotations, should be "borrowed". Ditto for a reference to a record.

How to annotate the others?

We can follow the lead of atomic and sync keywords. For example:

var r: raw C = new C();
proc new_C(): managed C { return new C(); }
record R { var borrowedC : C; var ownedC : owned C; }

Then the question is what keywords we want for the memory policy.

  • borrow ? or do not provide the keyword, relying on the default instead.
  • managed + managed(shared) ?
  • shared and owned ?
  • unchecked

Alternatively we can use "parameterized type" syntax like raw(C). Although given that the compiler needs to know it, atomic-like syntax makes more sense to me. We welcome more keywords in Chapel, right? ;)

@mppf's branch seems to be working for those parameterized types though, like Owned() and Shared().

In my mental model I think of ref for borrows, Owned() for ownership, and Shared() for distributed access. (This might be wrong though, so please correct me if that's not the case.)

The only keyword I see us needing is raw or unsafe to mark pointers as such, and that's only if we really think it's needed -- I might lean more toward making it atomic-like in that case.

In my mental model I think of ref for borrows

This does not make sense for class variables/formals/fields. It adds indirection in the common case of passing around a class value.

This is probably adequate for records.

My expectation is this:

  • MyClass indicates a borrow
  • ref x indicates a checked "borrowed" reference
  • Owned(MyClass) and Shared(MyClass) indicate a "managed" MyClass
  • I don't have a particular strategy in mind for indicating a "raw" pointer

(Note that the rules for "borrowing" a class instance and a reference are different, but the lifetime checker does apply "by default" in both cases).

We certainly could change Owned(MyClass) into owned MyClass but I don't see why we need to do so.

Inside the implementation of Owned and Shared, we need a way to indicate an "owned" pointer. I think a pragma is sufficient for that case in the near term.

var ownedC : owned C;

@vasslitvinov - can you say more about why this might be superior to the already-existing

var ownedC : Owned(C);

?


The two forms var ownedC : owned C; and var ownedC : Owned(C); can be seen as different syntax for the same thing. To me, however, owned C hints at "this is a first-class construct, regulated by the language". Owned(C) is more like "here is a library data structure, feel free to use it."

Now that Michael articulated it ("My expectation" above), we can leave the syntax that way for borrow vs. Owned/Shared, until we have a good argument to change to something else. If so, Unsafe(MyClass) could be the syntax for a "raw"/unchecked pointer.

ref x indicates a checked "borrowed" reference

This is different from in x when x has a record type, right?
Cf. when x has a class type, ref x and in x are both borrows?

Or maybe ref x means "I am borrowing the L-value being referred to", whether the type of x is a record, an integer or a class? For example, if x has a class type, in x and ref x borrow different entities?

What does it mean when a function returns a record by ref ?

ref x indicates a checked "borrowed" reference

This is different from in x when x has a record type, right?
Cf. when x has a class type, ref x and in x are both borrows?

yes

Or maybe ref x means "I am borrowing the L-value being referred to", whether the type of x is a record, an integer or a class? For example, if x has a class type, in x and ref x borrow different entities?

yes. I prefer to reserve the term "borrow" for the class situation, and just call the business with ref "lifetime checking". (I brought this up in another comment).

What does it mean when a function returns a record by ref ?

Don't know what you're asking. It'd be an error if the 'ref' came from another local variable though.

Just to drop in briefly, my intuition matches what @vasslitvinov said above: That to the extent that "owned" types are increasingly considered a part of the language / something that the compiler will need to reason about, I'd be inclined to support the owned C syntax due to its orthogonality with other "built-in" type variants like atomic, sync, arrays, etc. Offhand, I can't think of a similarly core generic type in Chapel that uses Modifier(coreType) as its syntactic pattern (am I forgetting one?) and agree that it scans more like an arbitrary user-level / library type than something "special / built-in". Maybe put another way, I think I'd want my syntax highlighter to do something with owned types, but think it would be weird to have it color Owned in Owned(C).

All that said, while we're still in prototyping mode, I'd be fine with dragging our feet on this and using the Owned() syntax for the time being to avoid thrashing the language while such issues are getting settled out (i.e., I expect that at some point we'll need to have a CHIP proposing the changes and get buy-in/blessings from the broader developer and user communities. So putting off implementing the keyword approach, if we agree on it, until after getting that broader buy-in seems smart to me).

I don't have strong opinions on other aspects of this conversation that seem to be converging (though also don't feel I have an expert's level of depth of the issue yet either) and am happy to go with the consensus. The ref-related conversation points are the main ones that I'm not sure I'm getting.

@bradcray - I'm curious what your answer to this question about owned C vs Owned(C) is:

How would we handle Shared or other siblings? Probably don't want keywords for each.

I.e. are we going to have owned C and shared C? Or owned C and Shared(C)?

What if we add others? For example we've had ideas floating around for OwnedNullable and OwnedUnsafe and there might even be some annotator for a "raw" (unchecked) pointer. Would choosing owned C set us up for needing to use keywords for the others?

I just found a case that made me wonder if it's going to be confusing:

https://github.com/chapel-lang/chapel/blob/master/test/release/examples/primers/locales.chpl#L236

var head = new Node(0);
var current = head;

If new Node returns an Owned(Node), then the normal expectation is that var current = head is a copy-initialization, which for Owned will transfer ownership. We could consider making it a borrow instead (but that might be worse).

In terms of the type system, is Owned more like ref, or sync, where it applies to a particular variable but not a copy of it, or is it more like tuple-ness, which stays with copies of it?

Anyway, under the current rules I'm working towards, the fix is simple:

var head = new Node(0);
var current = head.borrow(); // could be written many ways
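The same choice between transferring ownership and borrowing shows up in Rust, sketched here as a loose analogy with `Box<Node>` standing in for `Owned(Node)`. The `Node` struct and helper functions are hypothetical. One difference to keep in mind: Rust rejects any later use of a moved-from variable at compile time, which is not necessarily how Owned behaves after a transfer.

```rust
#[derive(Debug)]
struct Node {
    id: i32,
}

// Copy-initialization from an owning pointer moves ownership:
// the source variable is no longer usable afterwards.
fn transfer(head: Box<Node>) -> Box<Node> {
    let current = head; // `head` is moved from here
    current
}

// Borrowing leaves ownership with the caller's variable.
fn borrow_id(head: &Box<Node>) -> i32 {
    let current: &Node = head; // borrow of the instance (deref coercion)
    current.id
}

fn main() {
    let head = Box::new(Node { id: 0 });

    // Borrow first: `head` stays usable.
    println!("borrowed id = {}", borrow_id(&head));
    println!("head still owns id = {}", head.id);

    // Then transfer: `head` can no longer be used.
    let current = transfer(head);
    // println!("{}", head.id); // would not compile: `head` was moved
    println!("current owns id = {}", current.id);
}
```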

I think a key question here is how a generic function, say:

proc f(arg) {
  ... use arg ...
}

is instantiated when called with an Owned(SomeClass): with arg type Owned(SomeClass), or with arg type SomeClass?

I think it needs to instantiate as Owned(SomeClass) but I'm not sure I have a good explanation as to why.

Regarding this example that Michael referenced:

https://github.com/chapel-lang/chapel/blob/master/test/release/examples/primers/locales.chpl#L236

This is an interesting example because we also need to specify who owns the additional Nodes. I suspect that Node.next has to be an Owned, because otherwise each Node created in the loop over locales/numLocales will die within its iteration, leaving behind a dangling pointer. If Node.next is an Owned, we need to remove the last loop that deletes Nodes.

Re the above example of instantiating a generic with Owned(SomeClass):

I think the general rule is that when instantiating a function's generic formal argument, the instantiating type is that of the actual argument. For example, when the above f(arg) is invoked on a sync int or an atomic int, the instantiating type is a sync or atomic, not an int.

I suspect we have exceptions from this rule, though. In our case, does it make sense for the formal argument to be an Owned? What would be the default intent of the formal and would it lead to an ownership transfer?

I suspect we have exceptions from this rule, though. In our case, does it make sense for the formal argument to be an Owned? What would be the default intent of the formal and would it lead to an ownership transfer?

Unless we do something to change it, Owned is a record, so default intent is const ref which does not cause ownership transfer on its own and moreover disallows ownership transfer within the body. I think this is a reasonable default.

... which makes me raise this question: the types Owned(MyClass), Shared(MyClass) etc - should they be class types or record types, in particular as far as argument/forall intents are concerned?

If we make those be first-class types i.e. defined by the language, I would prefer them to be class types with some kind of tag. On the grounds that their being records is an implementation strategy. It's like saying that sync int or string is a record.

should they be class types or record types, in particular as far as argument/forall intents are concerned?

I'm not sure what this means. Could you describe an example that behaves differently depending on what we choose?

But, if you're saying that the default intent for Owned(MyClass) should be in, I don't understand how to implement that without having the default also imply ownership transfer. (Or, put another way, if in no longer implies ownership transfer, what would?) It's true that we call the default intent for classes in but I think that's reflecting a degenerate case and not really that meaningful.

Good point. I feel that we want the default argument and forall intent to mean "borrow", whether the formal's type is specified explicitly or is due to generic instantiation as in the above example of f(arg).

const ref gives us that, so sounds like it is what we want.

We can still consider Owned/... to be "class types". All we need is to define their default intent to mean const ref. For a Borrow class type, the default intent would mean const in.

By contrast, if we present Owned/Shared as special wrapper records, we do not need to change any of intent specification.

If we decide to use owned MyClass instead of Owned(MyClass), we also have to decide how to write converting a pointer from a raw pointer into an "Owned". Currently that happens with e.g. var x = new Owned(someRawPointer) but that syntax is no longer "obvious" if we're generally trying to move away from Owned.

I'm pretty sure we want the default for an Owned variable passed to a generic (say) with default intent to be "borrow" in some form. const ref with an Owned record does that. But doesn't it add indirection that wasn't there before, e.g. adding communication in some on statement cases? Can we optimize away the indirection, so that after resolution / call destructors a const ref Owned(MyClass) argument is replaced by an in MyClass argument of the borrow? How can we communicate to users that such a transformation is legal?

I think that if our lifetime checker is sufficiently advanced, it could catch modification of what an Owned is pointing to while it is "borrowed" in this manner, but it's not an error we can catch in all cases now. Thus, the transformation described above might change program behaviors, so might need to be communicated to users if we did it.
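The indirection in question can be pictured with a small Rust sketch, offered only as an analogy: `&Box<Point>` plays the role of a const ref Owned(MyClass) argument and `&Point` the role of a borrow of the instance. The `Point` struct and both functions are hypothetical.

```rust
struct Point {
    x: f64,
    y: f64,
}

// Takes a reference to the owner: reaching the fields goes through the
// reference and then through the Box (two hops).
fn norm1_via_owner(p: &Box<Point>) -> f64 {
    p.x.abs() + p.y.abs()
}

// Takes a borrow of the instance itself: one hop.
fn norm1_via_borrow(p: &Point) -> f64 {
    p.x.abs() + p.y.abs()
}

fn main() {
    let bp = Box::new(Point { x: 3.0, y: -4.0 });

    let via_owner = norm1_via_owner(&bp);
    // Deref coercion turns `&Box<Point>` into `&Point` at the call site,
    // so the callee never sees the owner at all.
    let via_borrow = norm1_via_borrow(&bp);

    println!("{} {}", via_owner, via_borrow);
}
```

In Rust the caller picks between the two forms through the callee's signature. The question raised above is whether the Chapel compiler could make the second form the result of the first automatically.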

@psahabu - could you help us with a Rust question? In Rust, if a generic function is instantiated, if it's passed a Box, a borrow, or a raw pointer; is the type of the argument inside the function different in each case? Or, is the Box/borrow/raw treated more like an argument intent?

For our purposes, I've been thinking about whether a "raw" pointer needs to be a different type from a borrow or if it just needs to be marked for the lifetime checker.

@mppf - I don't know the answer for sure, but based on this section about generic functions, my suspicion is that the argument type is different in each case. However the runtime may do coercions that I am not aware of, so I would have to do more investigating to be sure.

Does Rust treat Box more like an intent or more like a type?

It treats Box as a type. Additionally, it treats references as a separate type (which Chapel does not).

Thus, arguably Rust has the same issue with double indirection for generic arguments of Owned type passed in the usual way. (My limited experience looking at Rust code leads me to characterize & at the call site (i.e. ref), rather than copy-in, as the usual Rust style, but I might be wrong about that. Whether or not one is more common than the other, I'm pretty sure & at the call site is pretty common.)

trait TypeInfo {
    fn type_of(&self) -> &'static str;
}

impl TypeInfo for i32 {
    fn type_of(&self) -> &'static str {
        "i32"
    }
}

struct Point {
  x: f64,
  y: f64,
}

impl TypeInfo for Point {
    fn type_of(&self) -> &'static str {
        "Point"
    }
}

impl<'a> TypeInfo for &'a Point {
    fn type_of(&self) -> &'static str {
        "&Point"
    }
}


impl TypeInfo for Box<Point> {
    fn type_of(&self) -> &'static str {
        "Box<Point>"
    }
}

impl<'a> TypeInfo for &'a Box<Point> {
    fn type_of(&self) -> &'static str {
        "&Box<Point>"
    }
}


fn generic_default<T:TypeInfo>(x: T) -> T {
  println!("generic_default {}", x.type_of());
  return x;
}
fn generic_ref<T:TypeInfo>(x: &T) -> &T {
  println!("generic_ref & {}", x.type_of());
  return x;
}


fn main() {
  println!("Point argument");
  let p=Point { x: 0.0, y: 0.0 };
  generic_default(p);
  // output: generic_default Point

  println!("&Point argument");
  let pr=Point { x: 0.0, y: 0.0 };
  generic_default(&pr);
  // output: generic_default &Point

  // Note that Rust requires arguments passed
  // by ref to be marked at the call site:
  /*
  println!("Point argument");
  let p=Point { x: 0.0, y: 0.0 };
  generic_ref(p); // compilation error
  */

  println!("&Point argument");
  let prr=Point { x: 0.0, y: 0.0 };
  generic_ref(&prr);
  // output: generic_ref & Point


  println!("Box<Point> argument");
  let bp=Box::new(Point { x: 0.0, y: 0.0 });
  generic_default(bp);
  // output: generic_default Box<Point>

  println!("&Box<Point> argument");
  let bpr=Box::new(Point { x: 0.0, y: 0.0 });
  generic_default(&bpr);
  // output: generic_default &Box<Point>

  println!("&Box<Point> argument");
  let bprr=Box::new(Point { x: 0.0, y: 0.0 });
  generic_ref(&bprr);
  // output: generic_ref & Box<Point>

}

I'm pretty sure we want the default for an Owned variable passed to a generic (say) with default intent to be "borrow" in some form. const ref with an Owned record does that. But doesn't it add indirection that wasn't there before, e.g. adding communication in some on statement cases? Can we optimize away the indirection, so that after resolution / call destructors a const ref Owned(MyClass) argument is replaced by an in MyClass argument of the borrow? How can we communicate to users that such a transformation is legal?

I've created a new issue to discuss (part of) this idea: #8618

I missed this question @mppf asked of me some time ago, sorry:

How would we handle Shared or other siblings? Probably don't want keywords for each.

I.e. are we going to have owned C and shared C? Or owned C and Shared(C)?

If we think that shared is equally as first-class as owned, then yeah, I'd add a keyword for it as well.

What if we add others? For example we've had ideas floating around for OwnedNullable and OwnedUnsafe and there might even be some annotator for a "raw" (unchecked) pointer. Would choosing owned C set us up for needing to use keywords for the others?

I was thinking about this in last week's meeting and was wondering if these could be optional arguments to / modifiers on owned itself. e.g., owned(nullable=true, safe=false)

Can we talk more about "unchecked"/"raw" class pointer ?

In the context of having things like owned C, what would be the type modifier for a raw pointer?

I've seen these names tossed around:

  • unsafe
  • raw
  • unmanaged
  • unchecked

Additionally we might consider:

  • lowlevel
  • manual
  • explicit
  • basic
  • weak
  • demanding
  • troublesome (haha, ok)

Are any of these clear favorites among those following this issue? Any other proposals?

I like raw and unsafe. weak is another one I'd consider.

In order of my preference:

  • unchecked - this is my favorite, feels like an adequate description of the situation, although such a pointer is also "unmanaged"
  • unmanaged - opposite of the above (the pointer is also "unchecked"); also feels a bit of a mouthful, maybe just because I am not used to it
  • unsafe - I like it less because the pointer in question is not necessarily unsafe (it may be unsafe, the compiler does not help determine one way or the other)
  • raw, weak - these are not intuitive, the user would need to know a priori what they mean

unmanaged and unchecked are too much of a mouthful to me. I prefer unsafe the most of those options.

Of the additional list, only explicit strikes me as more appealing, but I feel we already have too many ex* words like extern and export

On the topic of "what to name raw pointers?":

  • I feel like unsafe is misleading: These aren't inherently unsafe, they just require more care.
  • I'm attracted to raw for its conciseness and accuracy; I think the main downside is that it seems more likely to be used as a user identifier (e.g., a bool field in a class or record) than other choices. I wonder if searching a large standard library like STL, Boost, or Python would find any identifiers named raw
  • I find unmanaged attractive in its accuracy. I originally found it a little too long / too much of a mouthful, but we could also consider this a plus: The hope is that, in the common case future, people won't be using these much, and there should be a disincentive for doing so. So maybe we force you to type a lot for that reason?
  • Other brainstorms: deletable, deleteme
  • Most of the others didn't appeal to me all that much...

I wonder if searching a large standard library like STL, Boost, or Python would find any identifiers named raw

I did this (obviously imperfect, but quick) experiment with Boost 1.66.0:

grep -Re "[^_[:alnum:]]raw[^_[:alnum:]]" . | grep hpp | grep -v '//' | grep -v '<tr>' | grep -v "This value" | grep -v searchindex | grep -v html | grep -v "This" | grep -v "as a" | grep -v "raw socket" | grep -v "to send" | grep -v "Return a" | grep -v @ | grep -v "Examples of" | grep -v "Right now" | grep -v "raw status" | grep -v "raw deflate" | grep -v "raw integ" | grep -v "raw rep"| grep -v "raw infl" | grep -v "validated form" | grep -v "raw memory" | grep -v "raw value" | grep -v "raw data" | grep -v "raw-oriented" | wc

Result: 64 lines of .hpp source code mention 'raw', out of 338775 lines in .hpp files. So the answer is "yes, it occurs", but it's not particularly common. Many of the use cases are referring to the same thing we'd use it for (raw vs unique_ptr).

FWIW I think unmanaged is my current favorite, but I'm undecided how annoying the length would be...

Annoying is good, right? I agree with Brad above about that.

unmanaged is not used in Boost code (only in comments referring to "unmanaged code")

I think I like unmanaged best too. If I were to come up with a category for owned and shared it would be "managed classes", so unmanaged seems like the right term in opposition to that.
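What "managed" means for the two kinds can be sketched in Rust as a loose analogy, with `Box` in the role of owned and `Rc` in the role of shared. The `Tracked` type and the two demo functions are hypothetical; the drop counter exists only to make the automatic frees observable.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Increments a shared counter when dropped, so frees can be observed.
struct Tracked {
    drops: Rc<Cell<u32>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Single owner: freed when the one owning variable goes out of scope.
fn owned_demo() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let _owner = Box::new(Tracked { drops: Rc::clone(&drops) });
    }
    drops.get()
}

// Reference counted: freed only when the last of several owners goes away.
fn shared_demo() -> (u32, u32) {
    let drops = Rc::new(Cell::new(0));
    let first = Rc::new(Tracked { drops: Rc::clone(&drops) });
    let second = Rc::clone(&first);
    drop(first);
    let after_first = drops.get(); // still alive: `second` owns it too
    drop(second);
    (after_first, drops.get())
}

fn main() {
    println!("owned: {} free(s)", owned_demo());
    println!("shared: {:?}", shared_demo());
}
```

In both cases no explicit delete appears anywhere, which is the property an unmanaged pointer gives up.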

Originally I liked the brevity of raw, but Brad's point made me remember that I'm all for making "bad style" more inconvenient. Hence I'm on board with unmanaged.

No objection here!

OK, I'll adjust the prototype to use unmanaged in that case.

Would it be better to use the keywords own, share, and borrow rather than owned, shared, and borrowed?

var x: own MyClass = new own MyClass();
var y: share MyClass = new share MyClass();
var z: borrow MyClass = y.borrow();

Or maybe with an s after each; owns and shares and borrows:

var x: owns MyClass = new owns MyClass();
var y: shares MyClass = new shares MyClass();
var z: borrows MyClass = y.borrow();

new shares MyClass reads really weirdly.

If so, would we keep unmanaged? (I think so, because unmanage / unmanages really doesn't work).

I tend to like the -ed forms best. They seem to read more like English.

Based on discussion here and over near https://github.com/chapel-lang/chapel/issues/8938#issuecomment-382380513 I plan to keep the -ed forms, but I'm glad we thought about it.
