Chapel: Handle file name collisions across projects

Created on 21 Nov 2017 · 53 Comments · Source: chapel-lang/chapel

From my Stack Overflow question, if I have two files named core.chpl in projects A and B, when I use B in project A I get a conflict.

```{bash}
projA/alpha.chpl
/classes.chpl // classes for project A, includes AThang

projB/beta.chpl
/classes.chpl // classes for project B, includes BThang
```

```{bash}
chpl alpha.chpl -M../projB/
```

it will use the classes.chpl file under A and error on compiling when beta can't find BThang. Also, in this case the error should read "Ain't nuthin but a B Thang, baby".

Language · Won't fix / Ain't broke · Design · user issue

All 53 comments

Another example that demonstrates this:

> chpl AB.chpl A/A.chpl B/B.chpl
warning: Ambiguous module source file -- using A/C.chpl over B/C.chpl

file hierarchy:
.
├── AB.chpl
├── A
│   ├── A.chpl
│   └── C.chpl
└── B
    ├── B.chpl
    └── C.chpl
// AB.chpl
use A;
use B;

proc main() {
  writeln(c);
}
// A/A.chpl
use C;
writeln(c);
// B/B.chpl
use C;
writeln(c);
// A/C.chpl
var c = 'A class';
// B/C.chpl
var c = 'B class';

I think it is proper to have name collisions when two names are at the same scope and both are available at compilation time. When we implement separate compilation, this will be a more reasonable request as the linking stage shouldn't cause issues.

How is this handled in C / C++?

C/C++ have separated the concept of files and namespaces, while our insertion of implicit modules for files that don't have an explicit top level module removes that distinction. This means that with the file name case, in C/C++ you could distinguish the two with a more explicit path. Chapel doesn't care about the path used to find the file's module once it has been found; after that point it treats it more like a namespace, meaning that after parse time, your program has been turned into:

...
module classes {...} // from projA/classes.chpl
module classes {...} // from projB/classes.chpl
module other {
  use classes; // Which one should I use?
  ...
}
...

which is roughly similar to:

var a = 10;
var a = 12;
var b = a + 1; // which a do I use?

Similar to this Stack Overflow post, you could resolve your issue by either:

A) renaming one of the files
B) inserting an explicit top level module in one of the files
C) inserting an explicit top level module with a different name, and then a sub module around the remainder of the contents with "classes" as its name.

But in all of these cases you will have to change the use statement in the code that relies on the file you have modified. In cases A and B, this will be a simple change of use classes; to use diffName;. In case C, it will look more like use diffName.classes;

Should this issue be closed? It seems to be less about "I want to use the same filename twice" and more about "I want to have two modules with the same name in a single Chapel program." But that isn't possible / legal in Chapel as it's defined today (and I don't have a vision as to what it would mean to attempt to support it).

Not the same Chapel program. In the example, B is an external module. This seems to imply that I can't use the same file name as any file in modules I include.

I'm confused then, perhaps because some detail is lacking in your original description. Specifically, I'm tripping over the clause "and error on compiling when beta can't find BThang." How did beta.chpl get pulled into the compilation?

When you use -M/path/to/module/src it pulls the files in path/to/module/src into compilation.

That's not actually true. It _considers_ pulling them in to help resolve use statements, but it doesn't auto-compile every file in that directory.

For example, given:

hello.chpl:

writeln("hello, world!");

testdir/die.chpl

compilerError("You can't use this module!");

If I run:

chpl hello.chpl -M testdir

die.chpl is not read in and so the compiler error isn't triggered. However, if I add a use die; to hello.chpl then it is.
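For anyone who wants to try this locally, the experiment can be recreated with a short script. This is only a sketch, assuming a Unix-like shell with mktemp; the scratch directory and the `work` variable are my own additions, and the compile step runs only if `chpl` happens to be on the PATH.

```shell
#!/bin/sh
# Recreates hello.chpl and testdir/die.chpl in a scratch directory.
# The scratch directory and variable names are illustrative, not from the thread.
set -e
work=$(mktemp -d)
mkdir -p "$work/testdir"

cat > "$work/hello.chpl" <<'EOF'
writeln("hello, world!");
EOF

cat > "$work/testdir/die.chpl" <<'EOF'
compilerError("You can't use this module!");
EOF

if command -v chpl >/dev/null 2>&1; then
  # die.chpl sits on the search path but is never named in a use statement
  (cd "$work" && chpl hello.chpl -M testdir)
else
  echo "chpl not found; files written to $work"
fi
```

Adding `use die;` as the first line of hello.chpl and recompiling is the variation described above.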

I'm sure I'm wrong on the internals! However, project A has a set of models or classes it needs to use, and I put them in a cleverly disguised file called classes.chpl which A clearly has to use. Now, B, being a capricious devil, also has a set of classes to use. Hence, two files named classes are being used. Ultimately, I only want to use the actual class in B, not the file.

I'm still not following. Which files have code that you are hoping to bundle into a single executable? It sounds like one of these combinations, but I'm not clear on which one:

  • alpha.chpl + B/classes.chpl
  • alpha.chpl + A/classes.chpl + B/classes.chpl
  • alpha.chpl + A/classes.chpl + beta.chpl + B/classes.chpl

I will try to put together two github repos to illustrate the point over the weekend.

I'm not sure I fully understand the issue here, but I think it's a request for Chapel programs to consider directory when deciding which module to use - and/or a request for modules "names" to include directories (the way in Java, it'd be org.myorg.myproject.MyThingy, say).

I think it's a request for Chapel programs to consider directory when deciding which module to use - and/or a request for modules "names" to include directories (the way in Java, it'd be org.myorg.myproject.MyThingy, say).

If so, I'm personally not a fan of that approach... We discussed doing more of a jar-like approach early in Chapel's design and agreed at that time that interpretation of source code presented to the compiler should be largely independent of the location of that code in the file system. Like any decision, we can revisit it, but I'm not keen on it (at least, without a motivating example I can understand, sympathize with, but not come up with a reasonable solution to).

I think that within the context of mason packages it'd be nice if we didn't have to worry about making every module file name globally unique, and if instead we could rely on the mason package name being globally unique.

I will try to put together two github repos to illustrate the point over the weekend.

it is here: https://github.com/buddha314/chapel-go-boom

it'd be nice if we didn't have to worry about making every module file name globally unique

Is there something that requires filenames to be globally unique? If so, that doesn't sound familiar to me, nor like our intention (i.e., sounds like a bug).

I just found the Makefile in the chapel-go-boom bug, so will see if that clarifies things for me.

OK, so for those who don't want to recreate this at home, the chapel-go-boom situation is a lot like what was described at the outset of the issue, but with different names:

projA/
  Core.chpl
  ProjA.chpl
projB/
  Core.chpl
  ProjB.chpl

Importantly: ProjA.chpl contains the statement:

use ProjB, Core;

Then, _from within the projA/ directory_, the command: chpl -o projA -M../projB ProjA.chpl is executed, resulting in the output:

warning: Ambiguous module source file -- using ./Core.chpl over ../projB/Core.chpl

What's happening here is that the use of the modules ProjB and Core cause the compiler to check to see whether it already knows about such modules, and if not, to go looking for them in its search path. The module search path always includes . by default (the projA directory in this example) and in this case, ../projB was also explicitly added to it via the -M flag.

In this use statement, the compiler doesn't already know about ProjB, so goes looking for files that might define it (i.e., ones named ProjB.chpl (*)), finds ProjB.chpl in the projB directory of its search path and parses it. The compiler also doesn't know about Core (because neither ProjA.chpl nor ProjB.chpl defines a module with that name) so it goes looking for a file named Core.chpl along its search path (*).

The compiler finds two files that seem to be likely candidates for defining a module Core (correctly, because they both do): projA/Core.chpl and projB/Core.chpl. It warns about the ambiguity and reports which copy of Core.chpl it's going to use (projA/Core.chpl because . came earlier in the search path).
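The lookup just described can be mimicked with a toy shell function. To be clear, this is a model of the behavior as explained in this comment, not the compiler's actual code; `resolve_module` and the scratch layout are hypothetical names made up for illustration.

```shell
#!/bin/sh
# Toy model of the search-path lookup described above; NOT the compiler's code.
set -e
work=$(mktemp -d)
mkdir -p "$work/projA" "$work/projB"
: > "$work/projA/Core.chpl"
: > "$work/projA/ProjA.chpl"
: > "$work/projB/Core.chpl"
: > "$work/projB/ProjB.chpl"

# resolve_module NAME DIR...
# Prints the first DIR/NAME.chpl that exists and warns on stderr about
# every later match, imitating the format of the compiler's message.
resolve_module() {
  name=$1; shift
  chosen=""
  for dir in "$@"; do
    candidate="$dir/$name.chpl"
    [ -f "$candidate" ] || continue
    if [ -z "$chosen" ]; then
      chosen=$candidate
    else
      echo "warning: Ambiguous module source file -- using $chosen over $candidate" >&2
    fi
  done
  if [ -n "$chosen" ]; then echo "$chosen"; fi
}

# Search path as in the example: "." first, then the -M directory.
cd "$work/projA"
core=$(resolve_module Core . ../projB)
projb=$(resolve_module ProjB . ../projB)
echo "Core  -> $core"
echo "ProjB -> $projb"
```

In this model Core resolves to ./Core.chpl only because . is listed first; swapping the two directories in the call would pick ../projB/Core.chpl instead.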

If the goal was to have one Core.chpl file be added to the compilation, the way to get around this would be to name the Core.chpl file that you want to use on the command line. I.e., the following command lines get rid of the ambiguous module source file warning:

chpl -o proja -M../projB ProjA.chpl ../projA/Core.chpl   # use projA's Core.chpl
chpl -o proja -M../projB ProjA.chpl ../projB/Core.chpl  # use projB's Core.chpl

With either of these, the next complaint is that there are multiple potential main modules, so to disambiguate, we could add the flag --main-module=ProjA. But then the problem is that both of projA and projB want to use their own Core.chpl. In terms of the command-line, this isn't a problem, as we can tell the compiler to parse both of them:

chpl -o proja -M../projB ProjA.chpl ../projB/Core.chpl ../projA/Core.chpl --main-module ProjA

but then the compiler (correctly) complains that there are two modules named Core. And since all top-level modules are stored in a single namespace, this is correct. This is the reason that back in this comment I said that this issue seemed to be asking to have two modules with the same name in a single program: it is.

I'm not quite sure what to suggest to move past this because I'm not sure what Brian's ultimate goal/desire is. One approach would be to rename one or both of the modules named Core to something else (e.g., CoreA and CoreB). Another approach would be to push both Core modules into the module scopes of ProjA and ProjB such that each contained its own local / private sub-module named Core (module names don't conflict if they're not at the top level).

(*) = This is a part of the current implementation that makes me wince and which has long been intended to be improved. That said, I don't think it's a part of the issue at hand here, so this is a sidebar. More specifically, our long-term intention has been that all .chpl files in the module search path would be scanned to understand what modules they contain / potentially define such that the compiler wouldn't need to assume that only files named Foo.chpl could define a module Foo and so that a user wouldn't be required to follow the convention of naming their files after their modules precisely (i.e., they could stick in additional information like version numbers, labels, or just name the file something arbitrary).

Just got back from a meeting, wasn't ignoring. Thought the Makefile would be more obvious.

The ultimate goal, of course, is to have the Ninjas fight the Fighters. That is, have both classes available to projA for use without having to break patterns within each project, e.g. using a file called Core.chpl or Models.chpl or something similar to hold models. It's very common in Python Flask applications, for instance, to create models.py, services.py, and views.py in every project, so that people new to the task can find things quickly.

This habit, by the way, is reinforced in many Convention Over Configuration frameworks like Flask, Groovy on Grails, etc.

If that is simply not possible in Chapel due to design considerations, perhaps we find a way to conveniently name space the models, like new projB.Ninja() for external classes. I'm open to suggestions. Once it's been figured out, perhaps a section on the docs would be helpful.

Can you splainy splainy this little bit?

Another approach would be to push both Core modules into the module scopes of ProjA and ProjB such that each contained its own local / private sub-module named Core (module names don't conflict if they're not at the top level).

The idea would be to structure the code like:

module ProjA {
  // put ProjA stuff here
  module Core {
    // put ProjA's core stuff here
  }
  // put more ProjA stuff here
}

module ProjB {
  // put ProjB stuff here
  module Core {
    // put ProjB's core stuff here
  }
  // put more ProjB stuff here
}

This would create modules ProjA and ProjB in the top-level namespace, each of which would have its own sub-module named Core. But since they're scoped within ProjA and ProjB, there's no conflict there.

Is there a way to move the "inner" modules to their own files? Like an overload use Core(submodule=true) or something?

I almost answered that before you asked it, but didn't want to get ahead of myself. Today, unfortunately, there is not. For a long time, we've considered adding some sort of equivalent to C's #include directive (yet within the language, not relying on a pre-processor) that would say "insert the contents of the named file into this place in the source code". I don't think there have been any significant objections to this proposal, but it hasn't had a strong enough proponent to get it done (and someone will need to propose syntax / semantics).

So imagine, for instance, that you wanted to keep your code in four separate files as you currently do, but wanted to get the scoping advantages of my response above. With a feature like this, you ought to be able to do so using:

ProjA.chpl:

module ProjA {
  // Optionally, ProjA stuff here

  include "Core.chpl";  // or `inject` or `insert` or `...`?

  // optionally, more ProjA stuff here
}

and:

ProjB.chpl:

module ProjB {
  // Optionally, ProjB stuff here

  include "Core.chpl";  // or `inject` or `insert` or `...`

  // Optionally, more ProjB stuff here
}

I think such a feature is important in order to break the current tyranny of "each entire module must be defined in a single monolithic file."

I'd have to double-check, but I don't think a feature request issue exists for this yet (the last discussion predated our use of GitHub issues), but it would be very fair game for such an issue in my opinion.

I'd have to double-check, but I don't think a feature request issue exists for this yet (the last discussion predated our use of GitHub issues), but it would be very fair game for such an issue in my opinion.

Is this really the hint you want to send? Don't I own like 50% of the really annoying issues already? And you know what else I can do that's annoying? Thumbs-up my own comments. Yeah, check it.

Adding issues that I would also like to see addressed never bothers me. :)

Back on _this_ issue, have we addressed the original concern now (explained what's happening
and why and why that seems like the right thing to do), or is there more to do beyond
the possibility of adding inclusion of source files to resolve it? Specifically, should I / we spend
more time on your comment about Flasks before moving on?

The comment about Flask was to emphasize that filename re-use is a real thing, and common in the application world. If this issue has been deemed the "discussion" issue, then I think the Chapel team needs to decide if they have the right design. I don't feel qualified to participate further than "this is a problem I ran into". So close this at your leisure ...OR PERIL!!!

I continue to contend that we permit filenames to be re-used, just not top-level module names.
Are there others on this issue who are more expert in this topic and who could comment on
whether they think there's more we could/should do here or not? (e.g., @mppf, @ben-albrecht?).

Without having a source file's path determine language semantics (which, as stated above, is
not a proposal that I'm going to champion), I'm not sure there's more we can do beyond
support for an include statement. (I.e., I think this could be closed, but I'm willing to have others
sketch out a counterproposal. I just don't want us to keep this open without a champion or
owner pushing for a specific design).

@bradcray - I don't think we have a tenable position right now, because I think the implication is that every top-level module needs to have a globally unique name.

I've created an example with Mason that demonstrates the problem.

https://github.com/mppf/ProjA
https://github.com/mppf/ProjB

I didn't actually publish projB to the Mason registry - ProjA README has instructions for how to pretend. Just like @buddha314's example, ProjA depends on ProjB, and both ProjA and ProjB have a module named Core. You should be able to reproduce the issue by pasting the commands from ProjA's README.

One direction we could go is to adjust the rules for finding modules to do something clever in the context of a mason build.

To me, this seems more like something to address in Mason / the context of Mason rather than a need to change the language (I realize you're not suggesting a language-based fix/change).

IIRC, there are other ways in which a Mason build can get into trouble, such as requiring two distinct versions of a single library due to hard dependences from different modules. At times, I believe we've discussed having Mason contain some sort of mechanism to rename one of the conflicting module versions to avoid the conflict and make both available. Perhaps we could do the same here? (i.e., when Mason becomes aware that there are two modules of the same name, it strives to resolve the conflict). A variation on that might be to have Mason rename all modules in sources that it deals with to munge their names with their path to guarantee / improve uniqueness.

Back on the language side of things, I'm not sure what more we can/should do. Wanting multiple top-level modules to have the same name seems as ridiculous as wanting a way to create two files with the same name in the same directory to me. Having all module names encode their paths in some way might fix the issue but feels inherently unattractive to me, and rigid in terms of how source is organized and interpreted. We could have some sort of language-level module renaming along the lines of use Core as ProjAsCore rather than doing it in mason, but I don't think that really solves the problem because it requires modifying sources, which wouldn't work so well if the sources of two external packages each need a module named Core, but neither of them was under your control to edit.

You can tell I'm not eager to pursue a language-based solution for this problem, but I don't even have a very clear sense of what we would do in the language to help.

You can tell I'm not eager to pursue a language-based solution for this problem, but I don't even have a very clear sense of what we would do in the language to help.

Why don't we figure out what a Mason-specific solution would look like, and then consider whether or not it generalizes to non-Mason use cases?

A variation on that might be to have Mason rename all modules in sources that it deals with to munge their names with their path to guarantee / improve uniqueness.

Right, I think it'd be reasonable for Mason to convince the compiler to include the package name and version in the "module" the compiler actually works with. But another interesting idea would be to allow one to treat a directory as a module, so that all files inside of that directory would be considered nested modules. But we still have a problem in that the module searching has to respect a particular order that will make sense to Mason - in particular, to look in the package containing the current module first, and then the dependencies (possibly in a particular order).

E.g. in the ProjA / ProjB example, what if we could "use" a directory?
It might look something like this:

  ProjA/Core.chpl
    ...
  ProjA/Main.chpl
    use Core, ProjB; // Note 2
    ...
  ProjB/Core.chpl
    ...
  ProjB/Main.chpl
    use Core; // Note 1
    ...

Note 1: Inside of ProjB/Main.chpl, the compiler needs to know to look first for Core (or any other module) in ProjB/.
Note 2: Inside of ProjA/Main.chpl, the 'use' of ProjB is actually 'use'ing the directory ProjB/. The compiler would have some rule, like always load up 'Main.chpl' in the directory if present; all modules if not.

Having all module names encode their paths in some way might fix the issue but feels inherently unattractive to me, and rigid in terms of how source is organized and interpreted.

Do you still feel this way about the above proposal? It seems to me to add flexibility (you can now do something - use a directory - you couldn't before) while leaving the old functionality alone. How does that make it more rigid?

I definitely disagree with the "Won't fix / Ain't broke" label on this issue.

ME TOO! It's just that Lydia is finally realizing she does hate me. She laughed when I warned her, too...

In the context of mason, one alternative strategy would be to require that Mason packages consist of only a single source file, a module with the same name as the package. They could use nested modules inside of that if needed.

I think that's unreasonably restrictive, but it would "solve" the problem in the mason context.

Here is a "story" for why this issue prevents Mason packages from being reasonable "libraries":

Suppose we have two libraries, in the form of mason packages, called SuperLib and NanoLib. Further suppose that these are developed by different groups that are unaware of each other.

Suppose there is an application using both of these.

... use SuperLib, NanoLib ...

Further, suppose that each of these packages consists of a main module
and a helper module:

  package SuperLib
     module SuperLib
     module Helper
  package NanoLib
     module NanoLib
     module Impl

OK, application can use both of these modules, everything can work.

Now suppose that SuperLib developer adds a module called "Impl" to add
some new component. Then, the application developer upgrades. Now we have

  package SuperLib
     module SuperLib
     module Helper
     module Impl
  package NanoLib
     module NanoLib
     module Impl

Now the application code does not compile, because there are two Impl modules. But the SuperLib developer can't possibly know that some other library that might be used alongside SuperLib already uses that module name.
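The collision in this story can be spotted mechanically before compiling. The script below is only a sketch of such a pre-flight check, assuming each .chpl file defines an implicit top-level module named after the file (as discussed earlier in the thread); the flat package directories and the `duplicates` variable are invented for illustration and are not how Mason lays things out.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report module file names that appear in
# more than one package directory. The layout mirrors the story above.
set -e
work=$(mktemp -d)
mkdir -p "$work/SuperLib" "$work/NanoLib"
for m in SuperLib Helper Impl; do : > "$work/SuperLib/$m.chpl"; done
for m in NanoLib Impl; do : > "$work/NanoLib/$m.chpl"; done

# Strip the directory and the .chpl suffix, then keep names seen more than once.
duplicates=$(find "$work" -name '*.chpl' \
  | sed -e 's|.*/||' -e 's|\.chpl$||' \
  | sort | uniq -d)

echo "colliding top-level module names: $duplicates"
```

A check like this only sees file names; it would miss explicit module declarations whose names differ from their file names.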

I definitely disagree with the "Won't fix / Ain't broke" label on this issue.

I supported its being labeled as such at the time Lydia added the label because the original issue talks in terms of not supporting two files with the same name in a Chapel compilation which I would argue is actually supported, even though that support could be improved. Now that the issue has been clarified and evolved into more of a design question of how mason (if not the language) should support independent packages that happen to have top-level modules of the same name, I'd argue that we should fork it off into a new issue (or else rename it, though that has the downside of requiring having people read through all of the churn to understand what's actually being requested).

Well, if that means Lydia doesn't hate me, I'm all for it. However, she will eventually...

There is no correlation between my labeling of this issue and whether I hate you :p

OK, I've created #8470 to be specifically about the Mason issue. I still think we should fix the original issue here, but let's start with the Mason part.

I wasn't clear what you meant by "fix the original issue" just above, but going back, I see that you proposed something during this month of being behind on things and asked a question I didn't reply to:

Do you still feel this way about the above proposal? It seems to me to add flexibility (you can now do something - use a directory - you couldn't before) while leaving the old functionality alone. How does that make it more rigid?

To summarize, I think you're saying that we should permit multiple top-level modules to have the same name, but to have "close-ness" rules for disambiguating between them, similar to how in the code one might prefer a "nearby" overload of a function over one that's "further away." A module in the same directory as another that used it would be preferred to one in another directory. In the limit, one might also define close-ness in terms of whether the two modules are in the same subdirectory hierarchy, how far apart they are in terms of the directory tree, what common parent directory they share, etc.

To my tastes, this adds complexity to the language and implementation which shouldn't be necessary for a well-defined package module.

I say "complexity" because (a) it requires doing something in the compiler and possibly language that we don't do now and because (b) anytime someone asks me what the resolution rules are for two very similar-looking functions in distinct modules or sub-modules or scopes today, I want to crawl away and hide because I can't internalize the close-ness rules well enough to know the answer and am often wrong. So adding additional "is this [module] closer to that one?" rules isn't attractive to me since it seems like adding more tie-breaking complexities to learn and remember rather than keeping things simpler (if that's indeed sufficient). I prefer the simplicity of having a flat top-level module namespace rather than introducing an additional level of hierarchy (again, if it's sufficient).

I say "shouldn't be necessary" because I think when a user uses a well-engineered package module 'M', that module shouldn't inject its own private helper modules into the global module namespace -- they should be sub-modules (and we should support a file include capability to break the current requirement that submodules be in the same file as their parent module). Use-ing a well-engineered module could well result in saying "this will result in use-ing other well-defined standard/package modules like BLAS or MPI" but I don't think it should result in injecting MyUtilitiesForM (or Impl in your example) into the global module namespace if that module has no purpose other than supporting M. Essentially, MyUtilitiesForM or Impl should be private to M, not part of the public top-level namespace.

I call directory/file-based schemes rigid because they require arranging your code "just so" in order to keep something working. I.e., I couldn't hand you a printout of some program and have you type it in to get it to work, or send you a bunch of source files as attachments; I'd effectively have to give you a tarball that also maintained directory structure.

We could argue that mason would only use very simple rules and patterns and therefore we wouldn't have to define an entire near-ness hierarchy across multiple directory levels, but that seems like it's stepping onto a slippery slope (once we have two levels, it seems likely other users will want more), and if we're only really worried about the mason-specific case, it seems like we could address it with a mason-specific solution. Particularly given that discussions of mason have already anticipated needing some means of renaming modules in order to deal with two packages potentially requiring incompatible versions of the same module (I need FFTW 2.3 and you need FFTW 3.4).

All that said, if you want to put forward a proposal "the Chapel language and compiler should support overloaded top-level module names disambiguated by directories" (or maybe equivalently "should not have a top-level module scope but should sort all modules into a hierarchy dictated by the directory hierarchy"), I think that should be kicked off as a new language/compiler-design issue for the same reasons as my previous comment: I think it's really deep in this issue whose title describes something else, whose conversation would take awhile to digest, and where the current description of the proposal arguably isn't concrete enough for people to weigh in on. I'm not enough of a fan to spend the time forking off that issue myself.

To summarize, I think you're saying that we should permit multiple top-level modules to have the same name, but to have "close-ness" rules for disambiguating between them, similar to how in the code one might prefer a "nearby" overload of a function over one that's "further away."

Nope, that's not what my straw-man idea proposes. Instead, it just enables the compiler to automatically gather together some files in a particular directory and treat the directory name as the top-level module name and treat the files/ modules within as submodules.

In this way I view it as "automatically creating nested modules from a directory of modules" rather than "choosing modules based upon how nearby they are".

I reproduced this proposal in #8470, since it is a potential solution to the mason form of the issue.

Besides, all of this assumes that nested modules actually solve the namespacing issue. I haven't tested that myself yet. (I.e. can two top-level modules with different names both have a submodule called Impl?)

Instead, it just enables the compiler to automatically gather together some files in a particular directory and treat the directory name as the top-level module name and treat the files/ modules within as submodules.

In this way I view it as "automatically creating nested modules from a directory of modules" rather than "choosing modules based upon how nearby they are".

OK, thanks for clarifying. My concerns remain the same: I don't like languages where the location
of the file affects the semantics of the code, but that doesn't mean you shouldn't champion it.

I reproduced this proposal in #8470, since it is a potential solution to the mason form of the issue.

When you said "I've created #8470 to be specifically about the Mason issue. I still think we should fix the original issue here" it suggested to me (and still does) that there was something not captured in #8470 which motivates keeping this issue open. It's probably obvious by now, but I'd really prefer to close this issue (due to its sprawl) and split key takeaways from it into issues of their own. If #8470 already does that, can we close this one? If not, what's missing?

Besides, all of this assumes that nested modules actually solve the namespacing issue. I haven't tested that myself yet. (I.e. can two top-level modules with different names both have a submodule called Impl?)

Barring bugs that I'm unaware of, yes:

module M1 {
  module Impl {
    proc writeMessage() {
      writeln("In M1.Impl");
    }
  }

  proc run() {
    use Impl;

    writeln("In M1's run");
    writeMessage();
  }

  proc main() {
    run();
    M2.run();
  }
}

module M2 {
  module Impl {
    proc writeMessage() {
      writeln("In M2.Impl");
    }
  }

  proc run() {
    use Impl;

    writeln("In M2's init");
    writeMessage();
  }
}

generates:

In M1's run
In M1.Impl
In M2's init
In M2.Impl

[edited to fix back-to-back cut-and-paste errors]

When you said "I've created #8470 to be specifically about the Mason issue. I still think we should fix the original issue here" it suggested to me (and still does) that there was something not captured in #8470 which motivates keeping this issue open. It's probably obvious by now, but I'd really prefer to close this issue (due to its sprawl) and split key takeaways from it into issues of their own. If #8470 already does that, can we close this one? If not, what's missing?

I think we need to decide what solution we want for the problem in the Mason context. I'd like that solution to also be applicable to Chapel programs not using Mason, which is arguably what @buddha314 was originally requesting. But we may or may not decide to do that.

It doesn't make much difference to me personally whether or not this particular issue remains open.

@mppf: Thinking about your proposal more, I have some questions:

  • What would the implication of your proposal be for the $CHPL_HOME/modules hierarchy? Would it be exceptional in some way (because it's known to the compiler? because it's searched using a -M option rather than by listing a relative path on the command line?) or treated as modules within modules? What would this imply for the use statements in a user's code?

  • Related but coming at it from a different angle: Where would the directory-based namespace for modules be rooted? (e.g., in the cwd where the compiler was invoked?) What happens to modules found using the -M flag? What about modules specified via ../../dir/to/module.chpl?

@buddha314: Would you check and see whether you find issue #8470 satisfying or whether you'd like a second issue branched off to replace this one describing Michael's proposed strategy?

Well, in an effort to stay out of the way, I'd go with what @mppf suggests first. That is, close #8470. I've habitualized a few ways around this but would prefer a general solution at some point. Also, Mason does not yet appear to be useful for testing and building, so I've stopped using it. I'd love to be wrong here, if someone has a complicated Mason project they could share.

What would the implication of your proposal be for the $CHPL_HOME/modules hierarchy? Would it be exceptional in some way (because it's known to the compiler? because it's searched using a -M option rather than by listing a relative path on the command line?) or treated as modules within modules? What would this imply for the use statements in a user's code?

It wouldn't be exceptional, because we don't normally have -M paths that contain other -M paths (do we?). But, if you had -M $CHPL_HOME/modules, you could (unless we intentionally prevent it) 'use dists', say, and then the compiler would try to jam all the distributions into one big module, each in their own nested submodule. Here, I'd expect we'd try to avoid such misuses of modules/, either explicitly or by somehow marking those directories as not eligible to be package/directory modules.

Related but coming at it from a different angle: Where would the directory-based namespace for modules be rooted? (e.g., in the cwd where the compiler was invoked?) What happens to modules found using the -M flag? What about modules specified via ../../dir/to/module.chpl?

When the compiler is looking for a module named Foo, say in the process of handling use Foo;, it would look for a directory called Foo in addition to looking for Foo.chpl (and possibly looking for a top-level module Foo in .chpl files, which we've long wanted but haven't implemented). It might additionally check that such a directory contains a file with a particular name.

Thus the "implicit module name" comes from the directory in which the modules were used.

I agree it could get confusing if you had a -M directory-containing-foo/ and -M directory-containing-foo/Foo/ and then somebody did use Foo (assuming Foo/ contained Foo.chpl). One way to address this confusion would be to avoid having a "package" directory Foo/ containing a Foo.chpl. In particular, in Python, the "packages" feature that I'm drawing inspiration from only considers a directory a "package" if it has an __init__.py. We could use a similar strategy, with a different required file name, obviously. But I'm trying to say there's some advantage to avoiding the package/directory name as the .chpl file name in it.
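For readers unfamiliar with the Python feature being referenced, here is a minimal sketch of how a directory with an `__init__.py` behaves as a regular package, and how that file can re-export names from a sibling file. The names `Foo`, `Impl`, and `foo` are hypothetical and chosen only to echo this discussion; nothing here reflects how a Chapel implementation would work. Note also that Python 3 additionally accepts directories without `__init__.py` as "namespace packages", so this only illustrates the regular-package case.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_package(root: Path) -> None:
    """Create Foo/ containing __init__.py and Impl.py (hypothetical names)."""
    pkg = root / "Foo"
    pkg.mkdir()
    # The implementation lives in a file nested inside the package directory.
    (pkg / "Impl.py").write_text("def foo():\n    return 'In foo'\n")
    # __init__.py marks Foo/ as a regular package and re-exports foo,
    # so clients only need to import Foo itself.
    (pkg / "__init__.py").write_text("from .Impl import foo\n")


def load_and_call() -> str:
    """Build the package in a temporary directory, import it, and call foo."""
    with tempfile.TemporaryDirectory() as tmp:
        build_package(Path(tmp))
        sys.path.insert(0, tmp)  # roughly analogous to adding a search path
        try:
            importlib.invalidate_caches()
            foo_pkg = importlib.import_module("Foo")
            return foo_pkg.foo()
        finally:
            # Undo the search-path and module-cache changes.
            sys.path.remove(tmp)
            sys.modules.pop("Foo", None)
            sys.modules.pop("Foo.Impl", None)


result = load_and_call()
print(result)
```

The point of the sketch is that the client refers only to the directory name, while the special file inside it decides which nested names become visible.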

Modules specified as ../../dir/to/module.chpl would continue to work as now, except for the potential for confusion with referring to a module from both use-a-directory-and-now-its-a-nested-module and also directly on the command line. This is similar to the above problem.

I'd expect you'd be able to include one of these package/directory modules on the command line, e.g. as some/path/to/Foo/.

I'm not convinced I'm communicating clearly, so ask more questions if it didn't make any sense :)

OK, I'm back to being confused then. Let me jump to a much simpler example to start out. Say we had the following:

prog.chpl
lib/
  M1.chpl

where prog.chpl wanted to use module M1 as a library, so contained use M1; Today, to compile it, I'd have to say chpl -M lib prog.chpl or chpl prog.chpl lib/M1.chpl. I think what you're saying is that, under your proposal, I could also edit the source code to say:

use lib;
use M1;

or:

use lib.M1;

Yet editing the source code seems contrary to the thrust of this proposal. I.e., my program presumably just wanted to use M1 not lib (or, as a library author I could rename my lib/ directory to M1/ and then use it, but then I'd still need to do an additional use to access the file / module contained within that directory).

What am I missing?

[It almost feels like your proposal is imagining that we'd _automatically_ peek into modules to look for submodules that would help resolve use statements, but that's pretty different than what we do today. I.e., given:

module M {
  module Sub {
  }
}

We wouldn't resolve use Sub; today without either having a use M; statement already in scope or changing it into a fully-qualified use M.Sub; But I'm sure you know this which is why I suspect I'm missing something much more basic.]

@bradcray - I think you're very close to understanding (whew, this one has been harder than I expected to communicate!).

Yet editing the source code seems contrary to the thrust of this proposal. I.e., my program presumably just wanted to use M1 not lib (or, as a library author I could rename my lib/ directory to M1/ and then use it, but then I'd still need to do an additional use to access the file / module contained within that directory).

What am I missing?

Right, I'm expecting the library author would name the directory M1.
Then, the compiler would have some canonical .chpl file in a "package directory" that it treats as code to put into the top-level module. I haven't sketched this out in detail yet, but below I extend your example to show how it might work:

Let's start out with this directory structure:

prog.chpl
M1/
  init.chpl

Here, init.chpl is the special, canonical .chpl file that both makes M1 a "package directory" and also serves as the place for the author of M1 to put anything they want in that top-level module.

Now, if use M1 is intended to make available a proc foo(), then init.chpl would contain it:

// this is M1/init.chpl
// Note that the compiler considers this code to be in
// the top-level module called M1.
proc foo() {
  writeln("In foo");
}

Then prog.chpl can just use M1 and get foo:

// this is prog.chpl
use M1;
foo();

Alternatively, suppose that the author of M1 wanted to use a style like @buddha314 was originally wanting in this issue (and BTW I interpret this issue as a request for enabling that style rather than that the exact example provided should compile). Anyway, in that style, the actual implementation is in Impl.chpl or Core.chpl. They can do that, and it would look like this:

prog.chpl
M1/
  init.chpl
  Core.chpl

// this is M1/init.chpl
// Note that the compiler considers this code to be in
// the top-level module called M1.
// Also note that the compiler implicitly inserts nested
// modules in M1 for each of the other modules in the
// package directory. For that reason, Core is available
// here, as a nested module inside M1.
use Core; // makes foo available to `use M1`

// this is M1/Core.chpl
// I didn't write 'module Core { }' but I'd expect that to work too
proc foo() {
  writeln("In foo");
}

Now prog.chpl can use exactly the same pattern and get foo from M1:

// this is prog.chpl
use M1;
foo();

OK, this latest description helps me understand your proposal much better. And again, if this is something you'd like to champion, I think it's time to fork it off into its own feature request issue and away from this one which has bogged down.

From my perspective, if you gave me the choice between this and a way to include Chapel files within others, I'd still prefer the latter. Why? Because in either case the resulting module structure is the same: both solutions create one outer client-visible module that is implemented using multiple inner modules, and in either case I get the ability to break my single conceptual module into multiple source files. However, in the "support include" proposal, I can also do file inclusions at granularities other than complete modules (e.g., I could potentially include a file that defined a procedure or a single statement), and it's also an approach that doesn't rely as much on files being arranged in a specific way w.r.t. special filenames and directory structure (which, as I've said, I'm not a big fan of) in order to work.

But that's not to say you shouldn't / couldn't propose it for others to weigh in on.

I feel inclined to close this issue at this point. It seems to boil down to a few things:

  • concern about multiple mason packages sharing the same top-level module names, which has been forked off to issue #8470.
  • desire for a way to split modules across multiple files, which has been forked off to #10909 and #10796.
  • a potential desire to have directory structure imply something about module structure, as Michael describes in his comment above. This is a feature I'm personally not a fan of, and I encouraged splitting it off into its own issue if others were (because I don't think it's clearly stated in this issue). That hasn't happened, which I interpret as not having broad support.

Any objections?

No objection, your Honor.

I created #10946 for the only other point that wasn't covered.
