Julia: WeakKeyDict should not convert keys as otherwise it can drop them

Created on 6 Dec 2017  Â·  16Comments  Â·  Source: JuliaLang/julia

Consider:

julia> wkh = WeakKeyDict{Vector{Int}, Any}()
WeakKeyDict{Array{Int64,1},Any} with 0 entries

julia> v = Float64[1:10^5;];

julia> wkh[v] = 1
1

julia> gc()

julia> keys(wkh)
Base.KeyIterator for a WeakKeyDict{Array{Int64,1},Any} with 0 entries

I would argue that this is not the intended behavior and instead an error should be thrown on wkh[v] = 1. Although, maybe there is a use for this which escapes me, e.g. hash consing, see #3002.

Open questions:

  • is whether this strict behavior should also be enforced for getindex, i.e. no key conversion on getters either.
  • one inconsistency with non-conversion is that k1==k2 does not imply wkh[k1]==wkh[k2]. Not sure whether this is bad or not (or worse than above example).

I'm not sure whether this issue should be classified as breaking (I guess that depends whether the behavior is a bug or a feature).

I have a branch in which I changed this here, which I could turn into a PR. Alternatively, if this is a feature and not a bug, a test should be added. Could above example be used or is that too fragile due to gc()?

Most helpful comment

Thanks for the investigation, @mauro3! The conclusion from discussion on the triage call is that we should just make the minimal change here and make WeakKeyDict not do autoconversion anymore.

All 16 comments

This should get the collections label and probably the bug label.

WeakKeyDict should really be WeakKeyIdDict, anything else is pretty broken.

I realize it's very late in the game, but is it fundamentally broken for WeakKeyDict not to do egal-based key comparison? @JeffBezanson

No, I don't think it's fundamentally broken. The weakness controls when keys get deleted, and the comparison controls which keys can be found. For example if [1] is a key, other [1] arrays will match it but the key will only be removed when the original [1] is freed. That might not be the behavior you want, but it doesn't seem totally broken to me. But on the whole I agree WeakKeyIdDict makes a bit more sense.

I totally agree that the ID-version makes more sense.

With the non-ID version, I cannot imagine a situation when conversion of the key upon insertion is the desired outcome. Do you have an example when that could be useful and not a bug? (I'm ok with comparison doing a "conversion")

Ok, the current version isn't "fundamentally broken" in the sense that it crashes or gives incorrect behavior, but I cannot imagine a situation where someone is using a WeakKeyDict and a WeakKeyIdDict (or WeakIdDict?) would not actually be more correct. Which I think is roughly the same as what @mauro3 is saying.

A recent data point in favor of that here.

I implemented the using-object-id change in PR #28161, maybe this can help the triage-call tomorrow.

See #3002 --- Distributed uses the ==-based hashing behavior of WeakKeyDict. For now we should maybe just remove the conversion, and add WeakKeyIdDict later.

I found three occurences of WeakKeyDict in stdlib (and none in base/):

As per Jeff's comment, the first two need ==-based hashing. The third looks suspiciously similar. So, if julia itself only needs the == version of WeakKeyDict, then the Id version should probably go to the DataStructures.jl package. However, procrastinating I coded a WeakKeyDict+WeakKeyIdDict over in PR #28182, so you can see how a possible implementation would look like.

It would probably be good though to add tests which test the ==-needing features of above three WeakKeyDict occurrences (as my PR #28161 shows, changing to ===-hashing does not lead to any test failures in their tests).

Thanks for the investigation, @mauro3! The conclusion from discussion on the triage call is that we should just make the minimal change here and make WeakKeyDict not do autoconversion anymore.

Cool, I can prepare a PR tomorrow. Did you discuss whether the no-conversion applies to getters as well?

We said getters should continue to convert – it's just setters that should type-assert.

get! is a bit odd then:

a = [1]
wkd = WeakKeyDict(a=>1)
get!(wkd, [1.0], 1) # works
get!(wkd, [2.0], 1) # errors

but I guess that is ok. Or maybe more consistent to throw on the first too? (The latter would be easier to implement ;-)

I would favor erroring on the first one too.

For those following along at home and still wondering when the == behavior is needed. It's only used here where it stores RemoteRefs which are compared/hashed with the methods here. The other two WeakKeyDicts (here and here) take objects which fall back to === comparison/hashing. Sadly I couldn't come up with a test case for the RemoteRefs case which tests the behavior needing ==.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Keno picture Keno  Â·  3Comments

iamed2 picture iamed2  Â·  3Comments

yurivish picture yurivish  Â·  3Comments

musm picture musm  Â·  3Comments

wilburtownsend picture wilburtownsend  Â·  3Comments