Hash#put which sets the value, returns either the previous value (if exists) or the block result (invoked with the key).This behaviour seems counter-intuitive as expressed in https://github.com/crystal-lang/crystal/pull/8116#issuecomment-529382480 f.
I'm proposing:
#put to #replace keeping the same semantics.#replace without a block which returns nil if the value doesn't exit.#put method which sets the value only if it does not exits.Maybe there could also be an overload to #put which accepts a block. The block would be called when the value exists and receive the value. This would be an efficient method for the use case when a value needs to be either added or the existing value updated. For example when a hash collects multiple values per key in an array:
# currently:
hash[key] ||= [] of Foo
hash[key] << value
# more efficient: (thanks @bcardiff)
array = (hash[key] ||= [] of Foo)
array << value
# with #put:
hash.put(key, [value]) { |array| array << value }
@straight-shoota Mind if I take a stab at this (if the feature is finalized) and is there any ETA on this feature.
I'd wait for some more approval, maybe there'll be suggestions to change the proposed API.
I'm fine with this change.
I am not convinced
- Rename #put to #replace keeping the same semantics.
But the current semantics of #put allow us to set the value, even if the key wasn't set. Having a #replace with the current semantic is confusing to me
- Add a new #put method which sets the value only if it does not exits.
That sounds to me more like an explicit replacement (if there is something to replace, do it).
Finally, the use case in the OC seems more like a vivification (Ref: https://github.com/crystal-lang/crystal/issues/4376#issuecomment-299273442)
Or it can be coded in a way that there is no double mention of the key per initialize & append:
hash = {} of Int32 => Array(String)
array = (hash[0] ||= [] of String)
array << "A"
array2 = (hash[0] ||= [] of String)
array2 << "B"
hash # => {0 => ["A", "B"]}
Yeah, maybe the names are not ideal. But the current semantics for #put are not great either. The block behaviour seems unexpected.
Agreed, the example for what currently works could also be written more efficiently, I've added your suggestion to the OC.
But this is still fundamentally different from the behaviour my propsal provides where the block is only executed when a value exists. There's no way to distinguish this only with conditional assignment.
When adding an item to an array there is not much difference between initialize & append vs. direct initialization. Other use cases might have a more relevant performance impact when update is more costly than initialization.
I'm not sure anymore whether it's worth optimizing for that, though. When the overall cost for these operations is high, it might just be preferable to just do separate hash lookups. As a bonus this also allows skipping the potentially unnecessary initialization.
I don't understand this snippet in the original comment:
# with #put:
hash.put(key, value) { |array| array << value }
That doesn't compile in my head, value is an element or is it an array? It can't be both in those two locations.
I think put is fine. The docs are clear. I don't understand what's the problem. And what you can't do right now with Hash... maybe a method is missing, but put does not need to change because of that.
@asterite Ooops, sry I forgot the brackets, it's supposed to be hash.put(key, [value]) { |array| array << value }. Fixed that in the OP.
The documentation for #put is clear, that's true. But I think it is not intuitive because it combines two different mechanics: While the main purpose is to put a value in the hash, the block argument actually has nothing to do with that. Instead it behaves exactly as the block argument to #fetch.
That seems very confusing to me. Judging by its behaviour, the purpose of this method is more shifted towards fetching a value with a default provided by the block (exactly as #fetch) and in the process also setting a new value.
To illustrate: This is a non-optimized implementation of #put:
def put(key : K, value : V)
return_value = fetch(key) { yield }
self[key] = value
return_value
end
I think my main issue with the method is the mixture of concerns does not fit with the name.
I'd like something like the below, but obviously only looking up the entry once, to be the building block for Hash lookups, deletions, and inserts.
def lookup(key : K, & : V?, Bool -> V | Hash::Delete)
if exists?(key)
new_value = yield self[key], true
else
new_value = yield nil, false
end
if new_value.is_a? Delete
delete(key)
else
self[key] = new_value
end
end
This method is basically a sanitized, public, interface to performing an arbitrary option on the Hash::Entry. The second parameter is only needed to distinguish between nil and not present, and could likely be ignored in many external invocations. Hash::Delete is a sigil, like Iterator::Stop.
Everything else can be composed efficiently on top of this primitive:
def [](key)
lookup(key) do |value|
return value
end
end
def []=(key, value)
lookup(key) { value }
end
def exists?(key)
lookup(key) do |val, present|
return present
end
end
def delete(key)
lookup(key) { Hash.delete }
end
def put(key, value)
lookup(key) do |old_value, present|
if present
return_value = old_value
else
return_value = yield
end
value
end
return_value
end
This would allow Set to have efficient semantics without put. I actually consider put semantics to be a bit convoluted for the reasons @straight-shoota put above.
I can agree to put's semantics being a bit odd. But either it has to go completly or keep its alias for []= semantic in all cases, that is always setting the given value to the given key. This is what a method named put on a map does in every single language I've seen that features it. Doing anything else is handing a foot gun to our users.