Julia: `using` loads modules on workers but does not put exported bindings in Main

Created on 3 Dec 2014  ·  19 Comments  ·  Source: JuliaLang/julia

This may be intended, but it seems a bit awkward to me:

$ julia -p 1
[...]
  | | |_| | | | (_| |  |  Version 0.4.0-dev+1922 (2014-12-02 23:10 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit e4e1688* (0 days old master)
|__/                   |  x86_64-apple-darwin14.0.0

julia> using StatsBase

julia> @everywhere assert(isa(StatsBase.sample, Function))

julia> @everywhere assert(isa(sample, Function))
exception on 2: ERROR: sample not defined
 in eval at /usr/local/julia/base/sysimg.jl:7
 in anonymous at multi.jl:1395
 in anonymous at multi.jl:820
 in run_work_thunk at multi.jl:593
 in run_work_thunk at multi.jl:602
 in anonymous at task.jl:6

Obviously the workaround is @everywhere using StatsBase, but I think that using X should probably be equivalent to @everywhere using X, or if not, it should act exclusively on the main process. Instead it seems we get using X on the main process and require("X") on the workers.
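
For readers on a newer Julia, the workaround can be sketched roughly as follows. This is a minimal sketch under assumptions, not the session above: it assumes Julia 0.7 or later (where the parallel functions live in the `Distributed` standard library), and it uses the `Statistics` standard library and its exported `mean` as a stand-in for StatsBase and `sample`, so that no package needs to be installed.

```julia
using Distributed   # standard library in Julia 0.7 and later (assumption: not the 0.4 session above)
addprocs(1)         # roughly what starting with `julia -p 1` does

# Load Statistics and bring its exported names into Main on every process.
# Statistics stands in for StatsBase here.
@everywhere using Statistics

# Evaluate the check in Main on the worker; this would throw there
# if `mean` were not in scope.
w = first(workers())
mean_in_scope = remotecall_fetch(Core.eval, w, Main, :(isa(mean, Function)))
```

Using `Core.eval` in `Main` on the worker mirrors what the failing `@everywhere assert(isa(sample, Function))` line was testing: whether the unqualified name resolves on the worker, not just whether the module is loaded.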

bug parallel

All 19 comments

I'm going to label this as a bug since it seems clear that this couldn't have been the intended semantics.

I've been bitten by this loaded-but-still-out-of-scope bug several times, and no one I've pointed it out to has any idea why it works this way.

I don't think it's intentional, I just think that no one has yet decided that they care enough to fix it. You could be the one!

@timholy Any idea where the definition of using is located? It's kind of hard to grep for.

IIUC require(::Symbol)

I think we need to fix this in the 0.5 timeframe itself. @everywhere using Mod or @everywhere require are problematic in themselves and responsible for https://github.com/JuliaLang/julia/issues/12381 and probably https://github.com/JuliaLang/julia/issues/16788

Which Julia/C function is called with using?

Base.require(:JSON) is the equivalent of import JSON on all nodes.

It would be a bit annoying to go another release without fixing it. I once (long ago) spent a couple tens of minutes on this, but didn't get far enough to figure out how to do it or even to fully trace how using actually works.

@vtjnash Where is using implemented? It does a bit more than Base.require(Mod), just unable to trace it.

I think I can fix this pretty easily, but there seems to be some debate going on over the appropriate fix to make this consistent. It seems clear that, wherever using/import causes modules to be required, they should also alter the bindings. The question is whether using/import should require the module only on the worker they're being called on (and thus @everywhere using X would always be necessary to load modules if they will be used on workers), or whether using/import should both require the module and import the bindings on all the workers. Opinions?

+1 for requiring the module and importing the bindings on all the workers, for the sake of consistency.

However, a separate debate is whether using / import should load on all workers or only on the calling process, specifically w.r.t. plotting / visualization packages, which are irrelevant on the workers. In that case we should require an explicit @everywhere using whenever we want to load a package everywhere.

My issue with the current behavior is two-fold:

  1. Requiring using ModuleName prior to @everywhere using ModuleName seems weird and unintuitive (see https://github.com/JuliaLang/julia/issues/16189)
  2. @everywhere using ModuleName wouldn't be a problem if it threw a specific error pointing the user to the proper solution. Something like: "ModuleName is not loaded on worker X. Consider running @everywhere using ModuleName to load it on all workers."
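
The kind of targeted message suggested in point 2 could be approximated in user code along these lines. This is only a sketch under assumptions: `workers_missing` and `scope_hint` are hypothetical helpers (not part of Julia), it assumes Julia 0.7 or later with the `Distributed` standard library, and it checks whether a name resolves in `Main` on each worker rather than hooking into `using` itself.

```julia
using Distributed

# Hypothetical helper: ids of the workers on which `name` does not resolve in Main.
function workers_missing(name::Symbol)
    check = :(isdefined(Main, $(QuoteNode(name))))
    return filter(w -> !remotecall_fetch(Core.eval, w, Main, check), workers())
end

# Hypothetical helper: build a hint in the spirit of the message proposed above,
# or return `nothing` when the name resolves on every worker.
function scope_hint(modname::Symbol)
    missing_on = workers_missing(modname)
    isempty(missing_on) && return nothing
    return "$modname is not in scope on worker(s) $(join(missing_on, ", ")). " *
           "Consider running @everywhere using $modname to load it on all workers."
end

addprocs(1)
hint_before = scope_hint(:Statistics)   # no process has run `using Statistics` yet
```

Calling `scope_hint` again after `@everywhere using Statistics` should return `nothing`, since the module name is then bound in `Main` on every worker.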

Specifically w.r.t. plotting / visualization packages which are irrelevant on the workers.

Gadfly and company are really heavy and would be slow to load on all workers. I agree that an explicit @everywhere using call would be better due to the performance implications.

We'd have to see how the two versions work in practice with actual parallel usage, but I'd lean towards making using and import local-by-default unless annotated with @everywhere.

It looks like local-by-default is most people's preferred option. My main concern is https://github.com/JuliaLang/julia/issues/3680, which would mean that, in many situations, all your workers would die if you forget the @everywhere. Not a great experience, especially for people trying to do stuff in parallel for the first time.

#3680 can be partially worked around by:

  • Having errors during deserialization send back a specific error, say DeserializationError, before closing the connection.
  • Having the other end print the error as a warning and simply reconnect.
  • This loses any messages that were already serialized, but that is not an issue when someone has simply forgotten the @everywhere and then retries.
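
The first bullet can be illustrated in isolation with the `Serialization` standard library. This is a sketch under assumptions, not the change that was proposed for Julia itself: `DeserializationError` and `try_deserialize` are hypothetical names defined here, the sketch assumes Julia 0.7 or later, and a truncated (empty) stream stands in for the real failure, which in the scenario discussed would be a worker receiving a type or function from a module it has not loaded.

```julia
using Serialization

# Hypothetical wrapper type: records what went wrong while decoding a message.
struct DeserializationError <: Exception
    cause::Any
end

# Hypothetical helper: decode one message, turning any failure into a value
# that could be reported to the sender instead of silently dropping the connection.
function try_deserialize(io::IO)
    try
        return deserialize(io)
    catch err
        return DeserializationError(err)
    end
end

# A well-formed message round-trips unchanged.
buf = IOBuffer()
serialize(buf, [1, 2, 3])
seekstart(buf)
ok = try_deserialize(buf)

# An empty stream fails to decode and comes back wrapped.
bad = try_deserialize(IOBuffer())
```

The receiving side could then log `bad.cause` as a warning and reconnect, as the second bullet describes.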

The other existing issue is a probable race condition when precompilation happens in parallel with @everywhere using.

I'll submit a PR for the #3680 workaround.

Bump. #3680 is closed. Implement local-by-default loading?

Yes let's try it.

I'm curious - what's the status on this?

This bug tripped me in 0.7.0-beta and "using X; @everywhere using X" solves it, where I didn't need to use that trick on 0.6.3 (and all the previous 0.6's). Weird..
