Compiling a source file whose name exactly matches a built-in type name
gives an internal compiler error (ICE). Specifically, files named
"real.chpl", "int.chpl", "bool.chpl", and "list.chpl" give an ICE,
while "imag.chpl", "complex.chpl", "string.chpl", and "range.chpl"
give similar (but non-internal) errors.
If I use a file name like _real_.chpl and wrap the contents
of the file in a module (i.e., module real { ... }), I get
the same error messages as with real.chpl (without an explicit module
definition). So it seems that the error originates from the use of
a user-defined module named real.
Though this issue is easy to avoid once you know that a file
name implicitly defines a module name (when an explicit one is not
defined), I guess a new user might stumble on it in the initial
learning step (which was exactly the case for me 1-2 years ago with real.chpl,
and again last month with list.chpl). So I would appreciate it if the compiler
printed an error message recommending that the user avoid such a file/module name.
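A minimal sketch of the workaround mentioned above, assuming that a file whose only top-level statement is a module declaration takes its module name from that declaration rather than from the file name. The module name HelloReal is made up for illustration, and I have not checked this against 1.15.0:

```chapel
// Hypothetical contents of a file that is still named real.chpl.
// The explicit module name HelloReal is an assumption for this sketch;
// the only point is that it differs from the built-in type name 'real',
// so no implicit module called 'real' is created from the file name.
module HelloReal {
  proc main() {
    writeln("hello");
  }
}
```

Renaming the file itself (for example to Real.chpl, which is among the names reported as working later in this thread) would be the other way to avoid the collision.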
The source files contain only a single line (writeln("hello");).
In the case of "real.chpl", we compile it as
$ chpl real.chpl
which gives
$CHPL_HOME/modules/internal/CString.chpl:228: In function 'c_string_to_real32':
$CHPL_HOME/modules/internal/CString.chpl:228: error: illegal use of module 'real'
... (snip)
$CHPL_HOME/modules/internal/Atomics.chpl:242: In function 'atomic_init__real64':
$CHPL_HOME/modules/internal/Atomics.chpl:242: error: illegal use of module 'real'
... (snip)
internal error: EXP1157 chpl Version 1.15.0
Other files such as "int.chpl" give similar errors.
If we compile "imag.chpl", we get
$CHPL_HOME/modules/internal/CString.chpl:232: In function 'c_string_to_imag32':
$CHPL_HOME/modules/internal/CString.chpl:232: error: illegal use of module 'imag'
... (snip)
$CHPL_HOME/modules/internal/StringCasts.chpl:134: In function '_cast':
$CHPL_HOME/modules/internal/StringCasts.chpl:134: error: illegal use of module 'imag'
... (snip)
Source Code:
Each file contains only one line:
writeln( "hello" );
Compile command:
$ chpl real.chpl
chpl --version:
chpl Version 1.15.0
Copyright (c) 2004-2017, Cray Inc. (See LICENSE file for more details)
$CHPL_HOME/util/printchplenv --anonymize:
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: gnu
CHPL_TARGET_ARCH: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_MAKE: gmake
CHPL_ATOMICS: intrinsics
CHPL_GMP: gmp
CHPL_HWLOC: hwloc
CHPL_REGEXP: re2
CHPL_WIDE_POINTERS: struct
CHPL_AUX_FILESYS: none
gcc --version:
gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
Tested on both Linux x86_64 (CentOS-6) and Mac OSX 10.11.6 (El Capitan).
In either case, the Chapel compiler was built from source.
In the case of Mac OSX, a case-sensitive hard disk partition
was used so that upper- and lowercase file names are distinguished.
[Tagging @noakesmichael on this, as it seems loosely related to scopeResolve issues he's been wrestling with lately]
It appears to me that the source of the problem here is that built-in types like int, real, etc. are injected by the compiler into the "root module" scope which is further away from user code than a user module with the same name. If (for example) they were instead defined in a module that was 'use'd within each module like ChapelStandard.chpl, I'd guess that tests like this would work as expected (though I'm not sure that's the right solution). Or, if both the built-in type and the user module were inserted into the same scope, we could generate an error indicating that it's not legal to have two symbols with the same name at the same scope. This might suggest that it's a bad idea to have separate "root" and "theProgram" scopes/modules as we do today (an organization that was introduced early in compilation while we were wrestling with bigger issues and is probably ripe for reconsideration).
I'm less clear what's going wrong in the "user program named list.chpl" case since ChapelDistribution.chpl uses 'List' (which shouldn't conflict with a lower-case 'list'.chpl) and I'd guess that this should cause the standard 'list' type to be closer than a user file named 'list'. This is the case I'm hoping Mike will have some insight into.
Fortunately, as far as I have seen, there seem to be no other "bad" file names.
For example, the following file names work just fine:
Imag.chpl Real.chpl expr.chpl size.chpl use.chpl
Int.chpl class.chpl init.chpl this.chpl var.chpl const.chpl ref.chpl
Bool.chpl List.chpl deinit.chpl proc.chpl tuple.chpl module.chpl
Complex.chpl Range.chpl domain.chpl record.chpl type.chpl
So the problem seems to occur only for a very limited set of keywords.
> This might suggest that it's a bad idea to have separate "root" and "theProgram" scopes/modules as we do today
Though I know little about how compilers work, it seems very reasonable
to make the namespaces for "internal" and "user" symbols orthogonal
(via "name mangling" etc.?), though a unified namespace
might have different merits. But with the latter,
as the compiler grows, the need to avoid internal symbols also increases,
so the former seems preferable to me (sorry for a naive opinion...).
I will add more info if I hit similar cases.
It's not surprising to me that keywords wouldn't conflict (like var.chpl, ref.chpl, const.chpl) as those are handled by the parser and not by symbol lookup and scoping. I think the only reason you're not seeing a problem with other type names (e.g., Imag, Real, etc.) is that you capitalized them. I'm guessing that making any of those lowercase would cause a problem again. The case of 'list' was particularly worrisome because it's not among the "built-in" types, but is defined in one of the standard modules, leading to questions about how many other standard module symbols could cause conflicts if used as module names.
Happily, since I last wrote, @noakesmichael has made good progress on a nice fix and code cleanup that should take care of the list.chpl issue. And we've also come up with a proposed plan for dealing with the int.chpl, real.chpl, etc. cases which should be tractable and clean some things up as well. We'll update this issue as those changes make it onto master.
I'm not sure the orthogonal namespace idea you propose makes sense for Chapel as it's currently defined, due to the fact that it's intentional in the language that 'int' _not_ be a built-in keyword, but rather be a general identifier that a user could redefine (though it's discouraged). So a user's reference to 'int' would need to resolve to either a user declaration or the built-in one, depending on which was more visible by our symbol resolution rules... Thus mangling the identifier is not possible due to the need for them to be the same.
As @bradcray suggested, I have recently merged a small set of changes that serve to dramatically
reduce the risk of unintentional name collisions e.g. the risk that a user module with the name "list"
will collide with uses of the type "list" that is defined in the module List. This error mode was
particularly frustrating because there were so many names to collide with.
However, the implementation for a handful of primitive types such as bool and int remains prone
to a version of this unintentional collision. This remains a source of confusion but it is somewhat
satisfying to recognize the set of names at risk is relatively small and reasonably well defined.
It is hoped that this hole can be closed in the current release.
For archival purposes, I believe the changes @noakesmichael was referring to are #6248.