Currently, when you do using M
and the code for M.jl
is automatically loaded, the file is evaluated in Main
and expected to define a module M
. It can, however, evaluate arbitrary code in Main
before or after the definition of M
and can fail to define M
entirely:
# M.jl
@show @__MODULE__ # => Main
module M # can not do this bit
@show @__MODULE__ # => Main.M
end # module M
@show @__MODULE__ # => Main
With https://github.com/JuliaLang/julia/pull/23579 and the plan outlined in https://github.com/JuliaLang/julia/issues/4600#issuecomment-327573232, this becomes increasingly awkward. Modules will no longer live inside of Main
so what module is code before and after the definition of M
evaluated in?
The simplest solution would be for the code in the file M.jl
to be evaluated inside of M
instead of defining M
. In other words, the file M.jl
would be reduced to this:
# M.jl
@show @__MODULE__ # => Main.M
When using M
or import M
ends up loading this file, it would implicitly wrap it in the equivalent of module M; include("M.jl"); end
. This way the module M
cannot help but be defined and the boilerplate module M
and corresponding end # module
at the beginning and end of most files is eliminated.
One fact makes this transition surprisingly straightforward: modules automatically have a constant, internal self-binding. Thus, module M; module M; end; end
causes a warning to be printed: WARNING: replacing module M
. Because of this, we can safely implement the above proposal and simply change the behavior of module M; module M; end; end
â the inner module M; end
would have no effect, simply continuing with the definition of the outer M
. To encourage people to change their code and eliminate the boilerplate, we can emit a warning that the module M
is no longer necessary and should be deleted.
The most common and official case of code evaluated in Main
before module code is __precompile__
calls in packages, controlling/indicating whether a package is precompilable or not. This would no longer work as expected, so we need an alternative. Since precompilability is a package-level property, it would make sense for this bit to be recorded in the new Pkg3-style as a precompile = [true|false]
entry in the Project.toml
file. This metadata would automatically be propagated to the Pkg3 registry metadata for various versions of each package.
Note that the Oh, I see, you addressed that at the end. But what about loading modules that are not in Pkg3 metadata? Do we disallow precompiling such modules?__precompile__(true)
statement is currently evaluated outside the module
scope, so that it can avoid parsing of the entire module when it is not needed.
Packages don't have to be registered to have a Project.toml
file, so I don't think that's a limitation.
To keep the module definition explicit, which I personally prefer, we could also disallow having any code outside of the module definition. The package file would have to start with module M
and end with end
.
I'm curious why you prefer keeping the module name explicit â what's the benefit?
The benefit of keeping it explicit is that I think users may be confused if the "same" code creates a module in a file vs. global definitions if pasted into the REPL. On the other hand, it doesn't seem to confuse Python people too much?
I thought it created an asymmetry/inconsistency with submodules, for which you still would need to write out the keyword. I also liked that using
a module is equivalent to just include
ing the code. It clears up the magic associated with loading modules (compared to other languages where you might not know what happens behind the scene).
But what about loading modules that are not in Pkg3 metadata?
I think we could make __precompile__(true)
the default now (and make it a no-op). Opting out with __precompile__(false)
at the top would still work.
Julia is a safe language by default, and __precompile__(true)
is not safe for arbitrary code.
I don't think there should be any safety issues with it that are specific to it, just additional warnings / errors.
I guess you could take the position that any code using ccall
is intrinsically unsafe and people should be expected to know that they may need __precompile__(false)
, but it still seems a bit unfriendly.
There is also the fact that new users creating modules probably don't want precompiling, because it slows things down if you are continually changing the module. Precompiling is mainly for experienced developers of larger packages.
probably don't want precompiling, because it slows things down if you are continually changing the module
That's a good point.
There is also the fact that new users creating modules probably don't want precompiling, because it slows things down if you are continually changing the module. Precompiling is mainly for experienced developers of larger packages.
I think this was solved by Revise.jl. I don't think there's any particular need to distinguish experienced and novice developers via some arbitrary keyword anymore.
any code using ccall
I'm not aware of any __precompile__
-specific issues with ccall. I know there were some initial changes to cfunction, but I think that's mostly all sorted out now. (I also generally assume that users needing a CFFI are pretty quickly headed towards being advanced users anyways.)
And making __precompile__
the default will greatly simplify the code logic and substantially improve the error messages.
@vtjnash, once you start using ccall
, it is pretty easy to get data structures back from C that have pointers hidden in them, and you have to know not to stick these into a global constant for precompiling.
Revise doesn't completely solve the problem because it can't handle e.g. changes in type definitions. In general, I wouldn't recommend using precompilation initially when starting new packages.
Once this and #4600 are implemented, how will it be possible to specify the absolute path to a module file that is outside the current working directory? Please find an example discussed here. Would the include
statement still be needed for absolute paths?
@stevengj: AFAICT https://github.com/JuliaLang/julia/pull/22721 should resolve the issue with changed type definitions (https://github.com/timholy/Revise.jl/issues/18 for Revise.jl
)
@lucatrv: I'm not seeing the issue. Calling include("/path/to/file.jl")
on a file containing an explicit module M ... end
would continue to work just as it does now. You cannot use using
to load code from arbitrary absolute paths currently â you have to use include
for that already.
@StefanKarpinski: ok, but in this case how would one decide whether to put an explicit module M ... end
statement in his M.jl
file when he first writes it? Because if he puts it, people evaluating using .M
will get the "warning that the module M
is no longer necessary and should be deleted". While if he does not put it, people evaluating include("/path/to/M.jl")
may potentially get weird side effects because all the code in script M.jl
will be evaluated in the current variable workspace.
IMHO an optional string passed to using
together with the module name would solve this issue. When passed, it would establish the reference root path which the relative path starts from.
To make some examples:
using M
looks for module M
in file M.jl
within the LOAD_PATH
using .M
looks for module M
in file current_working_directory/M.jl
using .M.M
looks for module M
in file current_working_directory/M/M.jl
using "/home/foo/srcdir".M
looks for module M
in file /home/foo/srcdir/M.jl
using A, .B, "path_to_C".C
would be a combination of above casesI'm not sure what the use case for being able to either load code via using
or include
is. These are fairly different mechanisms. include
is for loading pieces of some larger body of code from separate files, which is why it interprets relative paths relative to the current source file (not the current working directory). using
is for loading top-level external packages â the current source file is largely irrelevant.
Your proposal looks both more complicated and less backwards compatible than what's discussed in #4600. Relative using should not look in the current working directory, nor is the fact that the module path is relative the most salient point. Did you read through the latest proposal in #4600?
If one wants to simulate the behavior of using
with include
one can just do module M; include("/path/to/M.jl"); end
â that doesn't seem all that hard; as I mentioned before, one can have a helper macro as well to make it less verbose, although it's already pretty short.
Sorry I mistakenly wrote "current working directory" while I meant "current source file directory", so I am going to report here the examples above corrected (with a few additions):
using M
looks for module M
in file M.jl
within the LOAD_PATH
using .M
looks for module M
in file current_source_file_directory/M.jl
using A.M
looks for module M
in file module_A_source_file_directory/M.jl
using .M.M
looks for module M
in file current_source_file_directory/M/M.jl
using .path.to.M.M
looks for module M
in file current_source_file_directory/path/to/M/M.jl
using "/home/foo/srcdir".M
looks for module M
in file /home/foo/srcdir/M.jl
using A, .B, "path_to_C".C
is a combination of above casesAfter lots of reading and thinking on how Julia code should be organized into modules to be reusable, in my mind I got the idea that include
should be used as a sort of "reusable" source code cut and paste tool, such as in:
module Foo
include("file1.jl")
include("file2.jl")
end
while using
in my opinion is the command of choice to load functions implemented in modules (reusable code). In other words in my opinion to make source code reusable the best practice should be to put functions into modules, and load them where needed with using
(not with include
). I discussed a particular use case here.
A module source file could either:
LOAD_PATH
)JuliaConfig.jl
module discussed in the above reference, but in general this would happen any time the module source file is not easily reachable with path relative to the current source file directory)When I read the latest proposal in #4600, I though that it covers cases 1 and 2, but not 3. So my intent was actually to extend your proposal, with the following two differences:
M
in file current_source_file_directory/M/M.jl
would need to be referenced explicitly with using .M.M
(this change is optional, the benefit in my opinion would be to keep the "dotted" notation more aligned with the system's path notation, but maybe there are other reasons to prefer implicit using .M
=> src/M/M.jl
path notation)IMHO this proposal is still in line with #4600, and with similar degree of backwards compatibility (just more features). In case you do not like it, there can be of course other options, but in general I think that a way to provide absolute paths to using
is needed once #4600 and #23619 are implemented, so that include
is not required anymore to load modules.
Would this proposal prevent baremodule
from being used for top level modules?
It seems like several independent issues are getting mixed together here: (1) whether module Foo ... end
is implicit in module files, (2) whether precompilation is the default, and (3) how to specify different paths in the using
statement. Can we just focus on (1)?
Personally, I tend to agree with @jebej: module loading is much less magical if the module code looks the same no matter where you load it from. Code before and after the module
is executed in an undocumented module.
@stevengj: I think that points (1) and (3) are interdependent. If module Foo ... end
is implicit in module files, then include
alone cannot be used to load modules, so using
and import
should provide a way to load module files from specific paths (or otherwise module M; include("/path/to/M.jl"); end
has to be used). The choice on one point affects the other one, so IMHO it is better to ponder them together.
Regarding @jebej comment:
I also liked that
using
a module is equivalent to justinclude
ing the code"
on the other hand I subscribe the 13th aphorism of the Zen of Python:
There should be one-- and preferably only one --obvious way to do it.
which has been a main general goal of Python 3.
I think that having a clear distinction on what include
and using
do, with no doubts on when one or the other one should be used, is healthy. IMHO include
should be used as a sort of reusable source code "copy and paste" tool, while using
should be the standard way of loading modules. Then of course everything that using
does (which would be just a few things beyond what include
does) should be documented in detail to avoid any magic associated with loading modules.
then
include
alone cannot be used to load modules
This is not the case â include
can load files that have module M ... end
in them just fine. This would only require module M; include("/path/to/M.jl"); end
to use include
to load files that are intended to be loaded via using
or import
â which is a questionable thing to do in the first place.
IMHO
include
should be used as a sort of reusable source code "copy and paste" tool, whileusing
should be the standard way of loading modules.
Yet your above argument is that implicit module wrapping is bad because it prevents you from using include
and using
interchangeably? This seems like an incoherent position. If you don't like implicit module wrapping for imported code on aesthetic reasons, that's fine, but this whole argument seems to be self-contradictory.
@StefanKarpinski sorry if I was not able to express myself clearly. I actually like very much the implicit module wrapping proposal, for the reasons outlined in your first post of this thread. My position however is that together with the implicit module wrapping proposal, the capability to load modules from absolute paths (case 3 of my Oct 31st post above) should be added to the using/import
commands, unless you think that the need to load modules from absolute paths should never happen in normal conditions (see end of this post). One possibility could be for instance to specify absolute paths as proposed in my posts above (using "/home/foo/srcdir".M
), or otherwise another possibility could be to directly provide the full path of the module file (using "/home/foo/srcdir/M.jlâ
), or otherwise maybe there could be other better options to specify absolute paths that someone else can propose. If this capability is not added to the using/import
commands (unless loading from absolute paths should never happen, see the end of this post) then I would rather prefer the status quo, but I would like to make clear that the status quo is not my best preference.
I will try to better detail my position in the following. Please consider that all this is based
on my personal preference, I do not want to open a diatribe on this, my aim is just to give suggestions, we can make things work anyway whatever is decided about this.
So here are the possible cases in order of my own preference:
using/import
commands to load module files from absolute paths (unless this should never happen).using/import
commands are always âthe standard way of loading modulesâ in all possible cases, regardless if the module is (case 1) available in the LOAD_PATH or (case 2) reachable through relative path (once #4600 is implemented) or (case 3) reachable through absolute pathusing/import
commands (case 1, and case 2 once #4600 is implemented), while other times only the include
command can be used (case 3).using/import
commands to load module files from absolute paths.using/import
commands (case 1, and case 2 once #4600 is implemented), while other times only the include
command can be used (case 3). This is the same as the status quo but:include
command must be used (absolute path), it should be explicitly wrapped with module M; include("/path/to/M.jl"); end
. Apart from the additional boilerplate, in doing this one should take care not to write for instance module N; include("/path/to/M.jl"); end
.Now that I should have fully explained my opinion, if you do not agree with it, for my understanding I would like to ask why instead in your opinion it is better that the using/import
commands are able to load modules only from LOAD_PATH or relative paths, and not from absolute paths, in other words why in your opinion for case 3 of my Oct 31st post (absolute path) it is better to use module M; include("/path/to/M.jl"); end
rather than still relying on the using/import
commands with something like using "/home/foo/srcdir".M
or otherwise using "/home/foo/srcdir/M.jlâ
. Is it because normally for production code all modules should always be reachable either from LOAD_PATH or through relative paths, and never through absolute paths, and so you think that loading from absolute paths should be a very particular and temporary condition (not for production code), which justifies the need for a sort of "include
hackâ? If this is the case, then I would actually agree with your position not to add absolute paths to the using/import
commands.
Sorry for the long post but I hope it clarifies.
the capability to load modules from absolute paths should be added to the using/import commands, unless you think that the need to load modules from absolute paths should never happen in normal conditions
otherwise maybe there could be other better options to specify absolute paths that someone else can propose
I don't think it's the business of the code that loads a module via import Foo
to include the path to Foo
. The whole point of a statement like that is to say "give me the known resource Foo
wherever it may be". If you bake it into the usage location there are various problems:
Foo
should be loaded from different places; they can disagree on the location of the same Foo
â where should it be loaded from?Foo
from, if it moves they all have to change.The new package manager provides the ability to put absolute (or relative) paths in a project manifest file, telling Julia to load Foo
from that location. Any place in the code that needs that Foo
then knows where to look without coupling the code with a specific location, without repeating the location through the code, and without the ability for code to be written such that different places loading the same Foo
can disagree about where it comes from.
Even with that mechanism, I don't really think using absolute paths for this sort of thing is a great idea. Relative paths are ok since you may want to refer to a resource defined within the same project which is updated in sync with the code that uses it. Otherwise, you want to specify the identity of a package (i.e. package UUID) that you'd like to use and the identity of a particular state of that package (i.e. version tree hash) that you'd like to load. As long as it gets the right version of the right package, code shouldn't care where it came from or lives on disk.
@StefanKarpinski: this clarifies all my questions, now that I see the full picture I perfectly agree with you, thanks!
I think this will be out of scope for 1.0.
This could be added in a non-breaking way by just making the inner module in module M; module M; end; end
a no-op. Then we would just have to convince people to change their code in a manner other than deprecation warnings, for which we have various options.
Most helpful comment
I don't think it's the business of the code that loads a module via
import Foo
to include the path toFoo
. The whole point of a statement like that is to say "give me the known resourceFoo
wherever it may be". If you bake it into the usage location there are various problems:Foo
should be loaded from different places; they can disagree on the location of the sameFoo
â where should it be loaded from?Foo
from, if it moves they all have to change.The new package manager provides the ability to put absolute (or relative) paths in a project manifest file, telling Julia to load
Foo
from that location. Any place in the code that needs thatFoo
then knows where to look without coupling the code with a specific location, without repeating the location through the code, and without the ability for code to be written such that different places loading the sameFoo
can disagree about where it comes from.Even with that mechanism, I don't really think using absolute paths for this sort of thing is a great idea. Relative paths are ok since you may want to refer to a resource defined within the same project which is updated in sync with the code that uses it. Otherwise, you want to specify the identity of a package (i.e. package UUID) that you'd like to use and the identity of a particular state of that package (i.e. version tree hash) that you'd like to load. As long as it gets the right version of the right package, code shouldn't care where it came from or lives on disk.