Chapel: multiple-defined module symbol has shadowing instead of error

Created on 6 Sep 2019  路  10Comments  路  Source: chapel-lang/chapel

Forked from https://github.com/chapel-lang/chapel/issues/11262#issuecomment-528612310

This code prints "42" regardless of ordering for the two use statements in block scope. I think it should be an error since N is multiple-defined.

// 4.chpl
module M {
  module N {
    var x = 100;
  }
}

module N {
  var x = 42;
}

{
  use N only; // It doesn't matter if first or second.
  use M;
  writeln(N.x); // Always prints 42.
}

chpl version 1.20.0 pre-release (8cb22586)

Note that in chpl version 1.19.0, this code always prints "100" instead. [TIO]

Edit: From Brad's explanation in the next post, this code is supposed to always print "100".

Language Design user issue

Most helpful comment

This example continues to haunt me... thanks for contributing it, Bryant. I continue to think it's reasonable to say "imports will handle this better", but it points to some problems with use statements that could potentially be improved upon as well (and I think, should).

First, it haunts me because, though you didn't say it, it points to a potential example of hijacking through submodules. Let's say that my code had started out as a program that used two library modules:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
}

module N {
  proc foo() {
    writeln("Computing foo()");
  } 
}

module Main {
  proc main() {
    use M;
    use N;
    foo();
    bar();
  }
}

but then later, the implementer of M adds a new sub-module N:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc foo() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc foo() {
    writeln("Computing foo()");
  } 
}

module Main {
  proc main() {
    use M;
    use N;
    foo();  // this now calls the hijacked version
    bar();
  }
}

While I've rationalized above why use N chooses to prefer the submodule over the top-level one and why this is consistent with use being designed as a programming-in-the-small "bring everything in by default" concept, that doesn't make it any less surprising if your code changes out from under you.

This makes me wonder whether, more as Bryant's original intuition was saying, if a use could be resolved to multiple modules in the parent lexical scopes or through symbols made available by use, the programmer should get an "ambiguous modules, but I picked this one" warning or simply an "ambiguous modules" error, forcing them to resolve the conflict by filtering or renaming. This could still arguably be annoying (a change in a library could require a change in your code), but at least wouldn't lead to surprises. It has the downside of continuing to make module symbols more special than other types (which seemed to be a negative to Bryant in the original comment that spawned this issue?), but would treat them somewhat more like function resolution where multiple candidates would be considered if they were available. That said, I might say that resolving a use to an obviously local-to-scope symbol would not result in an ambiguity under assumptions that the module author would be aware of their own module's contents (or should be) and that it isn't a case that could result in hijacking.

Also interesting to me is that there isn't any way to say use rootScope.N to qualify a use as being intended as top-level, say if you wanted to disambiguate the use N to refer to the top-level module rather than the sub-module. The only way to ensure you are referring to the top-level module would be use M only [bar]; to avoid bringing M.N in, or to rename M.N via something like use M only N as MN;. That makes me wonder whether there ought to be some symbolic way to refer to the root / program scope in code.

I'm going to reopen this issue because of the unease it's causing me.

All 10 comments

The reason for this is due to the current interpretation of use which effectively puts the symbols made available by the use into a scope just outside of the current lexical scope and the module symbols themselves into a scope just outside of that.

Bringing over a comment from #11262:

The use M and use N only; statements drop their contents into the scope just outside of the use (putting M.N and [nothing] into the scope respectively); and they drop their own names into the scope just outside of that (putting top-level M and N there). Sketching that out:

use M;
use N only;
...N.x...
...N.fn()...

results in:

{ // scope making module symbols themselves available
  ...module M...
  ...module N...
  { // scope making module contents available
    ...module [M.]N...
    ...<nothing from N>...
    {  // scope in which the `use` statements appeared
      ...N.x...
      ...N.fn()...
    }
  }
}

So then when the compiler resolves N.whatever, it sees module M.N as being closer to scope than top-level N.

Interestingly, I tried dumping both sets of symbols into the same scope as part of #13930, but this resulted in problems when a module and its symbols shared the same name, which is what made me ask how important it was in #13925. It seemed that this was pretty important/standard practice in other languages which is why I reverted to the traditional interpretation above.

I think this code demonstrates non-intuitive behavior with how module symbols and module contents can confusingly shadow, but I attribute the behavior to use statements and I agree that import statements will intuitively work in almost all cases.

I haven't thought of a way to create a conflict with import+use statements using re-exports https://github.com/chapel-lang/chapel/issues/13979 yet..

It's interesting to me to note (if I'm not making any mistakes) that if (a) one were to always utilize use statements in the defensive / "import nothing" mode (i.e., use M only;) and (b) we required sub-modules to be used before they could be referenced like top level modules (i.e., issue #13536) then this program would give the behaviors you want (which arguably is similar to say that import statements will be more intuitive, but also maybe suggests an obvious answer to #13536... and arguably is similar to the one that @mppf was suggesting when both modules were named M.

Specifically:

  use M only;
  use N only;  // can only refer to top-level N due to `use M only;`
  writeln(N.x);  // can only refer to top-level N.x or 42
  use M only;
  use M.N only;  // obviously refers to M.N
  writeln(N.x);  // can only refer to M.N.x or 100 because we haven't `use-d` submodule `N`
  use M only;
  use N only;
  use M.N only;  // error: symbol `N` is multiply defined                                                             
  writeln(N.x);

That takes me more off the fence on the "must submodules be used" issue than I had been before (in favor of requiring them to be used). Now I'm wondering how hard it would be to extend #13930 to apply to submodules as well (I'm guessing "not very", but it's late in the week...).

This example continues to haunt me... thanks for contributing it, Bryant. I continue to think it's reasonable to say "imports will handle this better", but it points to some problems with use statements that could potentially be improved upon as well (and I think, should).

First, it haunts me because, though you didn't say it, it points to a potential example of hijacking through submodules. Let's say that my code had started out as a program that used two library modules:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
}

module N {
  proc foo() {
    writeln("Computing foo()");
  } 
}

module Main {
  proc main() {
    use M;
    use N;
    foo();
    bar();
  }
}

but then later, the implementer of M adds a new sub-module N:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc foo() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc foo() {
    writeln("Computing foo()");
  } 
}

module Main {
  proc main() {
    use M;
    use N;
    foo();  // this now calls the hijacked version
    bar();
  }
}

While I've rationalized above why use N chooses to prefer the submodule over the top-level one and why this is consistent with use being designed as a programming-in-the-small "bring everything in by default" concept, that doesn't make it any less surprising if your code changes out from under you.

This makes me wonder whether, more as Bryant's original intuition was saying, if a use could be resolved to multiple modules in the parent lexical scopes or through symbols made available by use, the programmer should get an "ambiguous modules, but I picked this one" warning or simply an "ambiguous modules" error, forcing them to resolve the conflict by filtering or renaming. This could still arguably be annoying (a change in a library could require a change in your code), but at least wouldn't lead to surprises. It has the downside of continuing to make module symbols more special than other types (which seemed to be a negative to Bryant in the original comment that spawned this issue?), but would treat them somewhat more like function resolution where multiple candidates would be considered if they were available. That said, I might say that resolving a use to an obviously local-to-scope symbol would not result in an ambiguity under assumptions that the module author would be aware of their own module's contents (or should be) and that it isn't a case that could result in hijacking.

Also interesting to me is that there isn't any way to say use rootScope.N to qualify a use as being intended as top-level, say if you wanted to disambiguate the use N to refer to the top-level module rather than the sub-module. The only way to ensure you are referring to the top-level module would be use M only [bar]; to avoid bringing M.N in, or to rename M.N via something like use M only N as MN;. That makes me wonder whether there ought to be some symbolic way to refer to the root / program scope in code.

I'm going to reopen this issue because of the unease it's causing me.

if a use could be resolved to multiple modules in the parent lexical scopes or through symbols made available by use, the programmer should get an "ambiguous modules, but I picked this one" warning or simply an "ambiguous modules" error,

馃憤

Also interesting to me is that there isn't any way to say use rootScope.N to qualify a use as being intended as top-level,

I think use /N or use ::N or use ^N or use _.N are all brain-stormy ways of writing that but that probably deserves its own issue.

(Edited for outdated comment)

My initial thought was to use the file name as the module prefix to reference the top level N, given that we insert that as a wrapper when there is other top level code. This doesn't work when only modules are defined at the top level of the file, but does suffice when there is a non-module symbol at the top level. E.g. this version of "foo.chpl" compiles:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc baz() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc baz() {
    writeln("Computing baz()");
  } 
}

module Main {
  proc main() {
    use M;
    use foo.N; // prefix with file name
    baz();  // this still calls the original
    bar();
    whee();
  }
}

proc whee() {
  writeln("in whee");
}

But this does not:

module M {
  proc bar() {
    writeln("Computing bar()");
  }
  module N {
    proc baz() {
      writeln("Hijacked!");
    }
  }
}

module N {
  proc baz() {
    writeln("Computing baz()");
  } 
}

module Main {
  proc main() {
    use M;
    use foo.N; // prefix with filename
    baz();
    bar();
    whee();
  }
}

Maybe we should consider always allowing reference to the file name for disambiguation in this way?

I agree with the ambiguous module warning idea

simply an "ambiguous modules" error, forcing them to resolve the conflict by filtering or renaming.

:+1:

This could still arguably be annoying (a change in a library could require a change in your code), but at least wouldn't lead to surprises.

I'm okay with this. It means that a dependency has changed in a meaningful way that would potentially break your code.[1] (I assume it's meaningful since the author made or left the inner module public.)

It has the downside of continuing to make module symbols more special than other types (which seemed to be a negative to Bryant in the original comment that spawned this issue?), but would treat them somewhat more like function resolution where multiple candidates would be considered if they were available.

My initial comments were intended as counterargument to the claim that symbols in Chapel are treated as equally as possible, since I don't think that's reasonable given that each symbol kind has a different design goal (functions can overload, variables can scope-shadow, modules shouldn't scope-shadow[1]).

That said, I might say that resolving a use to an obviously local-to-scope symbol would not result in an ambiguity under assumptions that the module author would be aware of their own module's contents (or should be) and that it isn't a case that could result in hijacking.

Probably true, though at least to me, it's not apparent at all why "100" is printed even after learning the scoping rules. I'd be curious what most programmers would blind guess.

Also interesting to me is that there isn't any way to say use rootScope.N to qualify a use as being intended as top-level,

I think use /N or use ::N or use ^N or use _.N are all brain-stormy ways of writing that but that probably deserves its own issue.

:+1: :+1: :+1: https://github.com/chapel-lang/chapel/issues/13915

[1] Once there's a way to specify a precise module you want to use, these conflicts can then only occur with the generic use M/import M form. it might even make sense to get rid of this generic form altogether (and reuse the syntax for one of the specific forms) if it becomes easy enough and idiomatic to precisely specify e.g., use /N for top-level.

Also interesting to me is that there isn't any way to say use rootScope.N to qualify a use as being intended as top-level,

I think use /N or use ::N or use ^N or use _.N are all brain-stormy ways of writing that but that probably deserves its own issue.

Another idea that @mppf mentioned on the phone last week (and credited to @BryantLam, though I don't think it's on this issue, and I may not have seen it (yet) in its original context) would be to always require fully-qualified module references in use statements. Thus, given:

module M {
  module N {
  }
}

module N {
}

The code:

  use M;
  use N;

would never get us to module M.N, but only to top-level modules M and N. To get to module M.N, I would have to do:

  use M [only [N]];
  use M.N;

or perhaps just:

  use M.N;

This would obviate the need for a "root scope" qualifier for top-level modules at the cost of always needing to type more to use a submodule.

I find the benefit here attractive (not that I'm opposed to introducing a root scope qualifier, but am not immediately drawn to any of the current proposals), but don't know whether the downsides would be considered attractive or unattractive to others (nor whether they're familiar / burdensome for those coming from other languages).

Another idea that @mppf mentioned on the phone last week (and credited to @BryantLam, though I don't think it's on this issue, and I may not have seen it (yet) in its original context)

It likely came from one of the many Rust links between @mppf and myself starting from https://github.com/chapel-lang/chapel/issues/12923#issuecomment-494225760.

... would be to always require fully-qualified module references in use statements.

This is effectively how Rust 1.0 behaves. They've since introduced an explicit way to choose a module from some point in the hierarchy, which #13915 attempts to address.

This is fine, but likely not what we should do. I would caution that absolute paths would break a lot of code in submodules, as submodules would no longer be able to directly use Child anymore and instead would have to start at the top-level with use Project.Parent.Child.

As a consequence, this behavior also makes refactoring efforts via moving directories around also more painful, as not only will all use targets break that point to the moved modules (this is normal for a refactoring effort) but the modules within that directory will subsequently break as per the note above. All use Project.Parent.Child inside the Parent module trying to get at their Child would also change to use Project.SomeNewMiddleTier.Parent.Child despite the Parent and Child not moving relative to each other.

... Unless you meant only top-level use statements would be absolute. In which case, there is precedence for how confusing this can be in Rust. These articles talk about it:

Was this page helpful?
0 / 5 - 0 ratings