Crystal: Hash#put and Hash#replace

Created on 7 Jan 2020 · 9Comments · Source: crystal-lang/crystal

8116 added `Hash#put` which sets the value, returns either the previous value (if exists) or the block result (invoked with the key).

This behaviour seems counter-intuitive as expressed in https://github.com/crystal-lang/crystal/pull/8116#issuecomment-529382480 f.

I'm proposing:

Rename #put to #replace keeping the same semantics.
Add an overload to #replace without a block which returns nil if the value doesn't exit.
Add a new #put method which sets the value only if it does not exits.

Maybe there could also be an overload to #put which accepts a block. The block would be called when the value exists and receive the value. This would be an efficient method for the use case when a value needs to be either added or the existing value updated. For example when a hash collects multiple values per key in an array:

# currently:
hash[key] ||= [] of Foo
hash[key] << value

# more efficient: (thanks @bcardiff)
array = (hash[key] ||= [] of Foo)
array << value

# with #put:
hash.put(key, [value]) { |array| array << value }

feature discussion topiccollection

Source

straight-shoota

👍2 👎1

All 9 comments

@straight-shoota Mind if I take a stab at this (if the feature is finalized) and is there any ETA on this feature.

KarthikMAM on 9 Jan 2020

I'd wait for some more approval, maybe there'll be suggestions to change the proposed API.

straight-shoota on 9 Jan 2020

👍1

I'm fine with this change.

RX14 on 9 Jan 2020

I am not convinced

Rename #put to #replace keeping the same semantics.

But the current semantics of #put allow us to set the value, even if the key wasn't set. Having a #replace with the current semantic is confusing to me

Add a new #put method which sets the value only if it does not exits.

That sounds to me more like an explicit replacement (if there is something to replace, do it).

Finally, the use case in the OC seems more like a vivification (Ref: https://github.com/crystal-lang/crystal/issues/4376#issuecomment-299273442)

Or it can be coded in a way that there is no double mention of the key per initialize & append:

hash = {} of Int32 => Array(String)

array = (hash[0] ||= [] of String)
array << "A"

array2 = (hash[0] ||= [] of String)
array2 << "B"

hash # => {0 => ["A", "B"]}

bcardiff on 9 Jan 2020

👍1

Yeah, maybe the names are not ideal. But the current semantics for #put are not great either. The block behaviour seems unexpected.

Agreed, the example for what currently works could also be written more efficiently, I've added your suggestion to the OC.
But this is still fundamentally different from the behaviour my propsal provides where the block is only executed when a value exists. There's no way to distinguish this only with conditional assignment.
When adding an item to an array there is not much difference between initialize & append vs. direct initialization. Other use cases might have a more relevant performance impact when update is more costly than initialization.
I'm not sure anymore whether it's worth optimizing for that, though. When the overall cost for these operations is high, it might just be preferable to just do separate hash lookups. As a bonus this also allows skipping the potentially unnecessary initialization.

straight-shoota on 9 Jan 2020

I don't understand this snippet in the original comment:

# with #put:
hash.put(key, value) { |array| array << value }

That doesn't compile in my head, value is an element or is it an array? It can't be both in those two locations.

I think put is fine. The docs are clear. I don't understand what's the problem. And what you can't do right now with Hash... maybe a method is missing, but put does not need to change because of that.

asterite on 17 Jan 2020

👍1

@asterite Ooops, sry I forgot the brackets, it's supposed to be hash.put(key, [value]) { |array| array << value }. Fixed that in the OP.

The documentation for #put is clear, that's true. But I think it is not intuitive because it combines two different mechanics: While the main purpose is to put a value in the hash, the block argument actually has nothing to do with that. Instead it behaves exactly as the block argument to #fetch.
That seems very confusing to me. Judging by its behaviour, the purpose of this method is more shifted towards fetching a value with a default provided by the block (exactly as #fetch) and in the process also setting a new value.
To illustrate: This is a non-optimized implementation of #put:

def put(key : K, value : V)
  return_value = fetch(key) { yield }
  self[key] = value
  return_value
end

I think my main issue with the method is the mixture of concerns does not fit with the name.

straight-shoota on 17 Jan 2020

👍1

I'd like something like the below, but obviously only looking up the entry once, to be the building block for Hash lookups, deletions, and inserts.

def lookup(key : K, & : V?, Bool -> V | Hash::Delete)
  if exists?(key)
    new_value = yield self[key], true
  else
    new_value = yield nil, false
  end

  if new_value.is_a? Delete
    delete(key)
  else
    self[key] = new_value
  end
end

This method is basically a sanitized, public, interface to performing an arbitrary option on the Hash::Entry. The second parameter is only needed to distinguish between nil and not present, and could likely be ignored in many external invocations. Hash::Delete is a sigil, like Iterator::Stop.

Everything else can be composed efficiently on top of this primitive:

def [](key)
  lookup(key) do |value|
    return value
  end
end

def []=(key, value)
  lookup(key) { value }
end

def exists?(key)
  lookup(key) do |val, present|
    return present
  end
end

def delete(key)
  lookup(key) { Hash.delete }
end

def put(key, value)
  lookup(key) do |old_value, present|
    if present
      return_value = old_value
    else
      return_value = yield
    end

    value
  end
  return_value
end

This would allow Set to have efficient semantics without put. I actually consider put semantics to be a bit convoluted for the reasons @straight-shoota put above.

RX14 on 17 Jan 2020

👍1

I can agree to put's semantics being a bit odd. But either it has to go completly or keep its alias for []= semantic in all cases, that is always setting the given value to the given key. This is what a method named put on a map does in every single language I've seen that features it. Doing anything else is handing a foot gun to our users.