Julia: Global variable not updated on remote workers in some cases

Created on 5 Mar 2019 · 5 Comments · Source: JuliaLang/julia

It looks like, in some cases, if I change a global variable and then change it back, the value captured on remote workers is updated only after the first change. Here is an MWE (Julia 1.1.0):

julia> using Distributed

julia> addprocs(1)
1-element Array{Int64,1}:
 2

julia> x = :a
:a

julia> @fetch x
:a

julia> x = :b
:b

julia> @fetch x
:b

julia> x = :a
:a

julia> @fetch x
:b # <--- should be :a here

This doesn't always happen though. E.g. if x is instead an Int, it works as I would have expected from the docs:

julia> using Distributed

julia> addprocs(1)
1-element Array{Int64,1}:
 2

julia> x = 1
1

julia> @fetch x
1

julia> x = 2
2

julia> @fetch x
2

julia> x = 1
1

julia> @fetch x
1

Not sure what triggers the bug, or whether it even has anything to do with the type of x, although it seems reproducible every time. I've found that e.g. x::Int and x::Vector work fine, but x::Symbol and x::String do not.

Labels: bug, parallel

All 5 comments

The logic here is wrong (you can redefine the function with @show s.glbs_sent to see what is going on):

https://github.com/JuliaLang/julia/blob/a1e41b96e71d36ec568440a8ad3430d862f363a4/stdlib/Distributed/src/clusterserialize.jl#L135

In the steps you showed above (for a non-bits type):

x = :a   # objectid(x) = hash(:a)
s.glbs_sent = Dict(hash(:a)=>hash(x,hash(:a)))   # new oid, send x
x = :b   # objectid(x) = hash(:b)
s.glbs_sent = Dict(hash(:a)=>hash(x,hash(:a)), hash(:b)=>hash(x,hash(:b)))   # new oid, send x
x = :a
s.glbs_sent = Dict(hash(:a)=>hash(x,hash(:a)), hash(:b)=>hash(x,hash(:b)))   # key and value match from the first run, don't send x

This is a bug and will happen for many types (as long as you go back to a previous value for the same variable), and I am not sure this PR helps either :(
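The steps above can be reproduced without any workers. The following is a minimal sketch, assuming a hypothetical helper `needs_send!` that imitates the bookkeeping described in this comment (a `Dict` keyed by `objectid`, storing a hash of the symbol and value). It is a stand-in for illustration, not the actual code in clusterserialize.jl.

```julia
# Hypothetical stand-in for the "has this global been sent?" check.
# Returns true when the value would be (re)sent to the worker.
function needs_send!(glbs_sent::Dict{UInt64,UInt64}, sym::Symbol, v)
    isbits(v) && return true              # bits values (e.g. Int) are always resent
    oid = objectid(v)                     # for a Symbol this is identical for equal values
    h = hash(sym, hash(v))
    if haskey(glbs_sent, oid) && glbs_sent[oid] == h
        return false                      # looks "already sent", even if x changed in between
    end
    glbs_sent[oid] = h                    # old entries are never removed
    return true
end

glbs_sent = Dict{UInt64,UInt64}()
sym_results = [needs_send!(glbs_sent, :x, v) for v in (:a, :b, :a)]
# sym_results == [true, true, false]: the switch back to :a is never sent

int_sent = Dict{UInt64,UInt64}()
int_results = [needs_send!(int_sent, :x, v) for v in (1, 2, 1)]
# int_results == [true, true, true]: bits values bypass the cache entirely
```

The stale entry for `:a` survives the assignment `x = :b`, so the third check finds a matching key and value and skips the send, which matches the `@fetch x` result reported in the issue.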

The issue is here -

https://github.com/JuliaLang/julia/blob/a1e41b96e71d36ec568440a8ad3430d862f363a4/stdlib/Distributed/src/clusterserialize.jl#L18

I think glbs_sent::Dict{UInt64, UInt64} should be glbs_sent::Dict{Symbol, Tuple{UInt64, UInt64}}, i.e. (key, value) -> (global_symbol, (objectid, hash_value)).

I'll submit a PR in a couple of days.
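The proposed layout can be sketched in the same self-contained style. This is only an illustration of the idea of keying by the global's symbol, using a hypothetical helper `needs_send_by_symbol!`; it is not the code from the PR mentioned in this thread.

```julia
# Hypothetical sketch: key the cache by the global's name and store
# (objectid, hash) as the value, so each new assignment overwrites the old record.
function needs_send_by_symbol!(glbs_sent::Dict{Symbol,Tuple{UInt64,UInt64}},
                               sym::Symbol, v)
    isbits(v) && return true              # bits values are always resent
    record = (UInt64(objectid(v)), UInt64(hash(sym, hash(v))))
    # Skip the send only if the most recent record for this symbol is identical.
    get(glbs_sent, sym, nothing) == record && return false
    glbs_sent[sym] = record               # replaces the previous entry for sym
    return true
end

fixed_sent = Dict{Symbol,Tuple{UInt64,UInt64}}()
fixed_results = [needs_send_by_symbol!(fixed_sent, :x, v) for v in (:a, :b, :a, :a)]
# fixed_results == [true, true, true, false]
```

Because there is at most one record per symbol, switching back to `:a` no longer matches a stale entry, and only a repeated fetch of an unchanged value is skipped.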

Yes, I figured that glbs_sent should be symbol => (oid, value), so the changing value can be overwritten. I was too scared to submit any PR since this is quite low level and I don't know what else relies on the ClusterSerializer struct.
Thanks for taking care of this!

I was too scared to submit any PR since this is quite low level and I don't know what else relies on the ClusterSerializer struct.

Please don't be! The more contributors the better. Most of us have been new to the Julia codebase at one time.

The linked PR that was merged claims to have fixed this.
