Julia: WeakKeyDict should not convert keys as otherwise it can drop them

Created on 6 Dec 2017 · 16Comments · Source: JuliaLang/julia

Consider:

julia> wkh = WeakKeyDict{Vector{Int}, Any}()
WeakKeyDict{Array{Int64,1},Any} with 0 entries

julia> v = Float64[1:10^5;];

julia> wkh[v] = 1
1

julia> gc()

julia> keys(wkh)
Base.KeyIterator for a WeakKeyDict{Array{Int64,1},Any} with 0 entries

I would argue that this is not the intended behavior and instead an error should be thrown on wkh[v] = 1. Although, maybe there is a use for this which escapes me, e.g. hash consing, see #3002.

Open questions:

is whether this strict behavior should also be enforced for getindex, i.e. no key conversion on getters either.
one inconsistency with non-conversion is that k1==k2 does not imply wkh[k1]==wkh[k2]. Not sure whether this is bad or not (or worse than above example).

I'm not sure whether this issue should be classified as breaking (I guess that depends whether the behavior is a bug or a feature).

I have a branch in which I changed this here, which I could turn into a PR. Alternatively, if this is a feature and not a bug, a test should be added. Could above example be used or is that too fragile due to gc()?

Source

mauro3

Most helpful comment

Thanks for the investigation, @mauro3! The conclusion from discussion on the triage call is that we should just make the minimal change here and make WeakKeyDict not do autoconversion anymore.

StefanKarpinski on 19 Jul 2018

👍2

All 16 comments

This should get the collections label and probably the bug label.

mauro3 on 11 Jul 2018

WeakKeyDict should really be WeakKeyIdDict, anything else is pretty broken.

StefanKarpinski on 11 Jul 2018

👍1

I realize it's very late in the game, but is it fundamentally broken for WeakKeyDict not to do egal-based key comparison? @JeffBezanson

StefanKarpinski on 17 Jul 2018

No, I don't think it's fundamentally broken. The weakness controls when keys get deleted, and the comparison controls which keys can be found. For example if [1] is a key, other [1] arrays will match it but the key will only be removed when the original [1] is freed. That might not be the behavior you want, but it doesn't seem totally broken to me. But on the whole I agree WeakKeyIdDict makes a bit more sense.

JeffBezanson on 17 Jul 2018

I totally agree that the ID-version makes more sense.

With the non-ID version, I cannot imagine a situation when conversion of the key upon insertion is the desired outcome. Do you have an example when that could be useful and not a bug? (I'm ok with comparison doing a "conversion")

mauro3 on 17 Jul 2018

Ok, the current version isn't "fundamentally broken" in the sense that it crashes or gives incorrect behavior, but I cannot imagine a situation where someone is using a WeakKeyDict and a WeakKeyIdDict (or WeakIdDict?) would not actually be more correct. Which I think is roughly the same as what @mauro3 is saying.

StefanKarpinski on 17 Jul 2018

A recent data point in favor of that here.

mlhetland on 17 Jul 2018

I implemented the using-object-id change in PR #28161, maybe this can help the triage-call tomorrow.

mauro3 on 18 Jul 2018

❤2

See #3002 --- Distributed uses the ==-based hashing behavior of WeakKeyDict. For now we should maybe just remove the conversion, and add WeakKeyIdDict later.

JeffBezanson on 18 Jul 2018

👍1

I found three occurences of WeakKeyDict in stdlib (and none in base/):

As per Jeff's comment, the first two need ==-based hashing. The third looks suspiciously similar. So, if julia itself only needs the == version of WeakKeyDict, then the Id version should probably go to the DataStructures.jl package. However, procrastinating I coded a WeakKeyDict+WeakKeyIdDict over in PR #28182, so you can see how a possible implementation would look like.

It would probably be good though to add tests which test the ==-needing features of above three WeakKeyDict occurrences (as my PR #28161 shows, changing to ===-hashing does not lead to any test failures in their tests).

mauro3 on 19 Jul 2018

Thanks for the investigation, @mauro3! The conclusion from discussion on the triage call is that we should just make the minimal change here and make WeakKeyDict not do autoconversion anymore.

StefanKarpinski on 19 Jul 2018

👍2

Cool, I can prepare a PR tomorrow. Did you discuss whether the no-conversion applies to getters as well?

mauro3 on 19 Jul 2018

We said getters should continue to convert – it's just setters that should type-assert.

vtjnash on 19 Jul 2018

get! is a bit odd then:

a = [1]
wkd = WeakKeyDict(a=>1)
get!(wkd, [1.0], 1) # works
get!(wkd, [2.0], 1) # errors

but I guess that is ok. Or maybe more consistent to throw on the first too? (The latter would be easier to implement ;-)

mauro3 on 19 Jul 2018

I would favor erroring on the first one too.

StefanKarpinski on 19 Jul 2018

For those following along at home and still wondering when the == behavior is needed. It's only used here where it stores RemoteRefs which are compared/hashed with the methods here. The other two WeakKeyDicts (here and here) take objects which fall back to === comparison/hashing. Sadly I couldn't come up with a test case for the RemoteRefs case which tests the behavior needing ==.