Chapel: use Module only; behaves differently for default-included modules

Created on 6 Sep 2017  路  20Comments  路  Source: chapel-lang/chapel

Summary of Problem

When adding a module usage of the format use Module only; to a default-included module, such as Math.chpl or anything in modules/internal, the use-statement behaves like a normal use Module;.

The expected behavior for a use Module only; is to make the module available without exposing any symbols (similar to Python's import Module).

Steps to Reproduce

Source Code:

Below is a test that was compiled while adding use Reflection only; to different modules.

None of the examples below should compile, since canResolveMethod() should not be visible. Here is a list of what compiled and what didn't:

  • modules/internal/ChapelArray.chpl

    • compiles

  • modules/standard/Math.chpl

    • compiles

  • modules/packages/Random.chpl

    • does not compile

  • modules/packages/Sort.chpl

    • does not compile

// `use Reflection only;` added elsewhere
use Sort;
use Random;

writeln(canResolveMethod(1, 'foo'));

Associated Future Test(s):

This is difficult to capture as a test... Ideas welcome

Configuration Information

  • Output of chpl --version: chpl Version 1.16.0 pre-release (2e2dbd2)
Compiler Libraries / Modules Bug

Most helpful comment

I believe I have discovered the source of the problem, and as a consequence I think this only impacts routines in the default included modules. Vass pointed out that this would happen if we made every use of RangeChunk private, and I also noticed in PR #14145.

The problem is due to computeStandardModuleSet - this is a functionResolution optimization wherein modules that are used by ChapelStandard or any of its dependencies have their contents treated as though they are defined at the rootModule scope. When I first noted it in 2016, it was problematic as it caused modules like List and Sort to not make their symbols available through qualified access. Over time, it has become even more problematic - this optimization was not updated when use M only; or use M except *; was added, and was not updated for private use either. In fact, it was never set up to handle these strategies at all - these strategies depend on the scope hierarchy being maintained but moving all scopes from default included modules into a single scope destroys that hierarchy. The scope-dependent uses are left behind when the symbols are moved to the root module, and bringing them as well would make their symbols available more widely than is appropriate.

As a result, I think this "optimization" no longer has a place in the current compiler. Though it may cause a compiler performance slowdown (due to having to travel into scopes to find symbols that would have been "siblings" with the optimization), it is negatively impacting correctness and always has enabled symbols to be visible that should not have been. We won't be able to truly utilize private uses in the internal and default included standard modules with it.

All 20 comments

Note to self: check on this after we've finished with Cray/chapel-private#345, in case that was the source of the error

That does seem likely...

Might also be related to this

Or even possibly this - when I realized that was potentially an issue, I had thought that it wasn't causing problems, since dead code elimination should always occur after scope and function resolution, but it could very well be that I am wrong

I believe I have discovered the source of the problem, and as a consequence I think this only impacts routines in the default included modules. Vass pointed out that this would happen if we made every use of RangeChunk private, and I also noticed in PR #14145.

The problem is due to computeStandardModuleSet - this is a functionResolution optimization wherein modules that are used by ChapelStandard or any of its dependencies have their contents treated as though they are defined at the rootModule scope. When I first noted it in 2016, it was problematic as it caused modules like List and Sort to not make their symbols available through qualified access. Over time, it has become even more problematic - this optimization was not updated when use M only; or use M except *; was added, and was not updated for private use either. In fact, it was never set up to handle these strategies at all - these strategies depend on the scope hierarchy being maintained but moving all scopes from default included modules into a single scope destroys that hierarchy. The scope-dependent uses are left behind when the symbols are moved to the root module, and bringing them as well would make their symbols available more widely than is appropriate.

As a result, I think this "optimization" no longer has a place in the current compiler. Though it may cause a compiler performance slowdown (due to having to travel into scopes to find symbols that would have been "siblings" with the optimization), it is negatively impacting correctness and always has enabled symbols to be visible that should not have been. We won't be able to truly utilize private uses in the internal and default included standard modules with it.

Note that I did try removing it when I originally found it, but that it caused a 10% compilation slowdown. I think this will not be quite as bad with private uses (especially of ChapelStandard) and the resulting presence of more explicit uses when dependencies did occur. But I'm not certain

I agree that we need to focus on correctness over compiler performance (though both is obviously the best) and can consider a hit for the latter to get the former, particularly on the early side of a release cycle. When was the optimization introduced? (I'm having a hard time tracing the history in GitHub for some reason... maybe just because the file was so big in that timeframe?) Why does treating things as though they're in the root module scope make anything faster?

It looks like it was added in 2008

Why does treating things as though they're in the root module scope make anything faster?

I think it has to do with how symbols were found - without explicit uses present, we would have had to find some things by going down some deep chains. As an example, the LocaleModel has a dependency chain like this:

   LocaleModel
     LocaleModelHelpFlat
      LocaleModelHelpSetup
       ChapelLocale
       DefaultRectangular
        DSIUtil
        ChapelArray
         Sort                              <- first standard module used
          List
           ChapelLocks
           IO
            SysError
            FormattedIO
             Regexp

We'd do a lot of deep diving to get to certain symbols, and would traverse our uses before going up in scope. Which means to find something defined in ChapelTuple, say, we'd go through Sort's use of ChapelStandard instead of going up to our defining scope and finding sibling modules, making it take longer to reach something that would be found quickly if we explicitly used it (or if it was considered "in our scope" in the first place).

That suggests to me that our increased use of private use in standard and internal modules could mitigate these overheads by keeping use traversal shallower and more limited to things we use more directly? Is that right?

I think so! It might be offset by having more uses to go through at the earlier depths, but since it means we would find things at the first depth, maybe it will be an overall win?

It might be offset by having more uses to go through at the earlier depths

That seems to assume that all of the deep uses were necessary to the module searching for its functions, but I don't think that's typically the case. E.g., in your example above, if my program were to use LocaleModel and most of the uses within it were changed to private, I don't think it suggests my program would necessarily want or need to add a use for each of the other modules in the list (I agree I may have gotten a few things I wanted or needed for free by accident, but I suspect the common case in the old world was that each use resulted in a deep chain as you've shown here, whereas I think the common case now will be that most modules will use a smaller number of modules).

The following program:

for i in chunks(1..4, 2) do
  writeln(i);

compiles without any use statements, invoking chunks from RangeChunk.chpl:62

Indeed, the following program compiles...

I'm not sure what the "Indeed" is in respect to, but I'm fairly certain that that program compiles because of the non-private use of RangeChunk in DefaultSparse.chpl.

My above program compiles even when switching use RangeChunk to private in DefaultSparse.chpl and LayoutCS.chpl.

It might be offset by having more uses to go through at the earlier depths

That seems to assume that all of the deep uses were necessary to the module searching for its functions, but I don't think that's typically the case. E.g., in your example above, if my program were to use LocaleModel and most of the uses within it were changed to private, I don't think it suggests my program would necessarily want or need to add a use for each of the other modules in the list (I agree I may have gotten a few things I wanted or needed for free by accident, but I suspect the common case in the old world was that each use resulted in a deep chain as you've shown here, whereas I think the common case now will be that most modules will use a smaller number of modules).

I was thinking more broadly. For each use we add at the current scope, that will lengthen the amount of time to find other symbols that we previously found by looking through the first level of use statements, since we need to check every module being used at a particular level in case of conflicts. So instead of having to go through a bunch of scopes for one symbol, we're having to go through an extra scope for every symbol that required a use to find.

Here's an example, looking at the impact of finding all referenced symbols in Main1 vs Main2 (not just the symbols that we defined at the end of a long use chain).

module A {
  var w: int;
}
module B {
  use A;
  // potentially other uses
  var x: int;
}
module C {
  use B;
  // potentially other uses
  var y: int;
}
module D {
  use C;
  // potentially other uses
  var z: int;
}
module Main1 {
   use D;
   writeln(w); // Goes into 4+ modules
   writeln(x); // Goes into 3+ modules
   writeln(y); // Goes into 2+ modules
   writeln(z); // Goes into 1 module
}
module Main2 {
   use A, B, C, D;
   writeln(w); // Goes into 4 modules (better than or no worse than before)
   writeln(x); // Goes into 4 modules (potentially worse than before)
   writeln(y); // Goes into 4 modules (potentially worse than before)
   writeln(z); // Goes into 4 modules (worse than before)
}

Assuming I'm understanding it, I think your example assumes that we need symbols from every module that was present in the chain of uses, though. Whereas my hypothesis is that the current scope cares very little about most of the modules in the deep use chain, and probably only cares about those in the first module or so (D, in your example). So rather than converting a 1x4 use chain into a 4x1 use chain, I'm imagining that it'll be converted into a 1x1 use chain. (Or, more generally, a 1xn use chain will be converted into a small number of uses... like maybe 2-4).

That's not quite what I'm getting at, but I don't think it's super important to understand it either - I'm happy to try and explain better but will otherwise just let it lie :)

It's important to me to understand. Maybe you can explain it to me sometime?

Sure, go ahead and poke me in the next couple of days (or I'll poke you if it seems like you're feeling like you have the time but have forgotten)

This should be resolved by #14270 (I was seeing evidence that it was - e.g. I had to add some qualified access to symbols defined by modules used with this strategy)

Was this page helpful?
0 / 5 - 0 ratings