Chapel: Should modules be able to define symbols that share their name?

Created on 30 Aug 2019 · 11 Comments · Source: chapel-lang/chapel

This issue asks whether it should be illegal to declare a module-level symbol with the same identifier as the module itself, such as:

module M {
  var M...;  // or proc M() ... or any other declaration of M
  var N...;
}

Doing so introduces interesting implementation challenges as I work on the change proposed in issue #13523 to not permit references to top-level modules unless they have been used. These challenges may all be surmountable with effort... but that effort is definitely slowing me down and making me wonder whether this pattern is sufficiently valuable to be worth the effort (whereas creating an error for it would be much simpler and let me rewrite code that relies on this pattern rather than trying to keep it working).

The pattern is also a bit weird in that, for example, if the only way for me to start referring to top-level module M as proposed in #13523 is via use M; then we have to decide whether an unqualified reference to M refers to the module or the variable (or procedure or whatever). My guess there is that it should be the module. That said, it also seems a little asymmetrical that I can refer to variable N directly but not its sibling variable M without saying M.M.

All that said, I'm completely willing to keep this behavior working if people think it's valuable (or simply weird / arbitrary to disallow it).

Language Design

bradcray

All 11 comments

@mppf, @lydia-duncan, @ben-albrecht, @BryantLam: Any thoughts here? (singling you out based on your participation on issue #13523).

bradcray on 30 Aug 2019

Here's another interesting case (abstracted from UnitTest.chpl):

module M {
  use Sub;

  ...Sub...  // Check this out

  module Sub {
    class Sub {
    }
  }
}

Should the line with the comment refer to module Sub (which was used in that scope, so is available) or to class Sub (which was made available via the use)? Or should it simply be illegal, on the grounds that it's too confusing to be worth the effort of supporting?

(Note that this is also a potential motivating case for supporting a public use of a private module as discussed on issue #13528... I.e., the author probably didn't want the user to be able to refer to module Sub?).

bradcray on 30 Aug 2019

The pattern is also a bit weird in that, for example, if the only way for me to start referring to top-level module M as proposed in #13523 is via use M; then we have to decide whether an unqualified reference to M refers to the module or the variable (or procedure or whatever). My guess there is that it should be the module. That said, it also seems a little asymmetrical that I can refer to variable N directly but not its sibling variable M without saying M.M.

IIRC, there were some other pitfalls I encountered using this pattern, which has led me to stop using it. I'll try to reproduce those and mention them here when time permits.

All that said, I'm completely willing to keep this behavior working if people think it's valuable (or simply weird / arbitrary to disallow it).

I have personally found this behavior valuable in other languages, but tend to avoid using it in Chapel due to the pitfalls it introduces. If we don't expect the pitfalls around using shared module/symbol names to improve, then it would make sense to forbid it altogether.

Additionally, it may make sense to be more restrictive now, given the aim for Chapel 2.0 release candidate, and maybe relax this rule later if we can make it safer to use.

Relatedly, I haven't found a great naming scheme to deal with this. The usual suggestion is to make the module name plural, but that isn't always applicable. For example, it breaks down when the name has an odd plural form (e.g. Octopus -> Octopi, or DNA -> DNAs), or when the module contains only a single symbol (like a class definition), so that a plural name doesn't make sense.

ben-albrecht on 30 Aug 2019

Should the line with the comment refer to module Sub (which was used in that scope, so available) or class Sub (which was made available via the use).

Not sure if this is the right interpretation, but if I think about use Sub as from Sub import *, then I'd expect Sub to refer to the class Sub. If use Sub only; had been used instead, then I would expect Sub to refer to the module Sub and Sub.Sub to refer to the class.

ben-albrecht on 30 Aug 2019

Issue #11262 was an issue I meant to link to above that relates to this question.

bradcray on 30 Aug 2019

Thanks for the notes @ben-albrecht: Knowing that you use and like this pattern in other languages is good incentive to push through the challenges. If others have contrary views, I'm still open to those, but this is good incentive to not throw in the towel on these cases tonight.

Your description of what you'd expect to happen in the various use Sub cases is also helpful (and I assume based on Python). Just to make sure I'm not missing something, this means that if one had the Sub-within-Sub pattern in Python and used from Sub import *, they would not have a way to express a fully qualified Sub.Sub, is that right?

bradcray on 30 Aug 2019

Just to make sure I'm not missing something, this means that if one had the Sub-within-Sub pattern in Python and used from Sub import *, they would not have a way to express a fully qualified Sub.Sub, is that right?

That's correct.
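
For anyone who wants to check the Python side of this, here is a minimal sketch. It builds a hypothetical module named Sub in memory (rather than from a file on disk) containing a class that is also named Sub, then compares the two import forms. The names sub_module, star_ns and plain_ns are purely illustrative, and exec is used only so that the star import can run in its own namespace.

```python
import sys
import types

# Build a hypothetical in-memory module "Sub" that defines a class "Sub".
sub_module = types.ModuleType("Sub")
exec("class Sub:\n    pass\n", sub_module.__dict__)
sys.modules["Sub"] = sub_module  # make it importable by name

# Case 1: star import. Only the module's public contents get bound.
star_ns = {}
exec("from Sub import *", star_ns)
star_binds_class = isinstance(star_ns["Sub"], type)  # Sub is the class
star_has_qualified = hasattr(star_ns["Sub"], "Sub")  # is Sub.Sub reachable?

# Case 2: plain import. The module itself gets bound.
plain_ns = {}
exec("import Sub", plain_ns)
plain_binds_module = isinstance(plain_ns["Sub"], types.ModuleType)
qualified_is_class = plain_ns["Sub"].Sub is star_ns["Sub"]

print(star_binds_class, star_has_qualified, plain_binds_module, qualified_is_class)
```

After the star import, the name Sub is the class and the module is not bound under any name in that namespace, so there is nothing to qualify through. With the plain import, Sub is the module and Sub.Sub is the class.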

ben-albrecht on 30 Aug 2019

The pattern is also a bit weird in that, for example, if the only way for me to start referring to top-level module M as proposed in #13523 is via use M; then we have to decide whether an unqualified reference to M refers to the module or the variable (or procedure or whatever). My guess there is that it should be the module. That said, it also seems a little asymmetrical that I can refer to variable N directly but not its sibling variable M without saying M.M.

The symmetry argument makes the most sense to me because it is difficult to use the M module symbol in any meaningful context without the dot operator as a disambiguation, ... though now that I say that, I can see an issue with nesting since the dot operator is overloaded for both path resolution and referring to fields. As a user, I would expect a naked Sub in all contexts except use or import statements to be class Sub.

Here's an example from Rust. It does the "right thing" because it's pretty difficult to use the module symbol in an ambiguous way; field access is via dot operator but module scope traversal is done with the :: operator.

pub mod M {
    pub mod Sub {
        pub static Sub: i32 = 42;
    }
}

fn main() {
    use M::Sub::*; // all items in `mod Sub`, but not `mod Sub` itself
    use M::Sub;    // only `mod Sub`, but none of its items

    let x = Sub;
    let y = Sub::Sub;
    println!("{} {}", x, y); // compiles and prints "42 42"
}

In most expressions, I think Rust and Chapel are similar enough that the symbol can be disambiguated as not-a-module.

IMO, I hope to never have to use the use statement in the future, because getting both unqualified and qualified accesses from the same statement makes code difficult to read and/or trace for reasons like this.

BryantLam on 30 Aug 2019

I've now got this working on my branch (at least as well as it was before), so there's no need to disallow this pattern, assuming people agree with Ben's interpretations.

bradcray on 30 Aug 2019

module M {
  var M...;  // or proc M() ... or any other declaration of M
  var N...;
}

We talked about the contents of module M having an implicit import M to make M available as a regular symbol inside the module. I think we can just imagine that this is in an outer scope compared to the body of M:

module M {
  import M; // notional, not a real thing a user would type
  {
    var M: int;
  }
}

This interpretation would allow the pattern above and also has clear meaning.

To generalize this rule, we might expect that local symbols can override module symbols always (i.e. use and import are always considered weaker bindings of the name). I suspect something like this is what is/was actually implemented.