There is no sorting method in the Hash class. How should we handle this? Or will one be included in the future?
While Hash is insertion ordered, depending on that is usually a smell IMO. Are you sure you don't want an Array({A, B}) actually?
I would sort a hash by its values like hash.sort_by { |_, v| v.size } but I could not do as you said.
I didn't say that you can sort a Hash, I said that wanting to usually hints that you're using the wrong data structure.
I understood. And, it would be better to use the structure you mentioned above. Thanks for help.
Hash#{sort, sort_by} are convenience methods for .to_a.sort and might be nice (they exist in ruby-land anyway, but I suppose forcing the call to .to_a might clarify things hmm)
For future reference,
to get a sorted array of pairs:
hash.to_a.sort_by! { |key, value| key }
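As a minimal sketch of the line above (the `counts` hash and its contents are made up for illustration), sorting the pairs by key or by value looks the same; only the block changes:

```crystal
# Hypothetical sample data, not from the thread.
counts = {"ruby" => 3, "crystal" => 7, "go" => 5}

# Hash#to_a gives an Array of {key, value} tuples;
# sort_by! then sorts that array in place.
by_key = counts.to_a.sort_by! { |key, _value| key }
by_value = counts.to_a.sort_by! { |_key, value| value }

p by_key   # => [{"crystal", 7}, {"go", 5}, {"ruby", 3}]
p by_value # => [{"ruby", 3}, {"go", 5}, {"crystal", 7}]
```

Either way the result is an Array of tuples, not a Hash.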
I don't want to revive an old corpse, but are there technical reasons why Hash#sort_by is not in the standard lib ?
@jhass : are there any reading materials I could work with to find alternate data structures to bypass this need ? Because I find myself using @oprypin one liners so many times...
I don't think there's much reading material because it's largely based on opinionated philosophy. So it's definitely not a technical reason.
Crystal often favors a little more explicitness than Ruby, with the goal of reducing surprises. Here, I think it's easy to argue that having some method on a Hash return an Array, rather than change the Hash's internal ordering, can be a bit surprising offhand. In Ruby this happens because Hash is Enumerable and sort_by is an Enumerable method. The key thing to understand is that, in terms of Enumerable, a Hash is a list of pairs, and that's what sort_by operates on. But internalizing this connection is quite some cognitive overhead to ask for.
So in Crystal we have so far preferred to make this conversion a little more explicit, and the code a little more intention-revealing. It literally says there: I want an ordered list of pairs, ordered by the following criteria.
Reiterating my previous point, personally I had very few cases where I ended up wanting the features of a map structure, that is constant time lookup of a value by some unique key, and the features of a list structure, that is a defined ordering between some values, at the same time. In fact this feature combination is so rarely needed that many standard libraries don't offer any possibility for it, or at least do not provide it in the standard map type but rather in a special one. Anyways, the few cases where I personally encountered this desire turned out to be quite indicative of me not having found the right data structure for my problem yet. Often I simply realized I didn't actually need one of the two features I talked about above, and using the other structure made the resulting program easier to reason about. This is what I meant when I said this is a code smell to me: it's an indication that I should be revisiting something.
Personally I wouldn't even use the term "one-liner" for the expression above, which I tend to use for common transformations on some data structure that usually have a much higher complexity than a call to to_a :)
So, why does that little to_a bother you so much? :) Maybe your data should never be a Hash in the first place?
Thanks for this extremely elaborate answer :-)
Personally I wouldn't even use the term "one-liner" for the expression above
Haha ! Yes, you're right. Replacing one line of code with another line is not a one-liner :-p
Regarding the sort_by issue, IMO it's just syntactic sugar, mental load reduction and a gift to ruby newcomers.
It's not a big issue if this is not in the standard lib, but the cost of adding it is minimal.
I fell in love with Crystal because of its extreme readability. Thanks to borrowing from Ruby syntax and "encouraging" type declarations, it makes reading code almost like reading natural language.
I said that wanting to usually hints that you're using the wrong data structure.
So, why does that little to_a bother you so much? :) Maybe your data should never be a Hash in the first place?
Ok, you have my full attention. I'm a junior developer, and using the key => value structure of Hash seems so obvious and useful to me for many tasks.
For example, attributing a score value to an Array or Set of values and then picking the 3 highest-scoring ones: what would be the cleanest/most efficient way to do this without using Hash and a sort_by-like method ?
This is an old but recurrent example of using Hash for this problem
I think my concern here is that .to_a.sort_by introduces a conversion to an Array of Tuples first, then calls sort_by on that intermediate Array. A more direct sort_by method wouldn't need that intermediate Array and would instead work with .each directly. I'd have to experiment to see if that is an _actual_ performance problem, but it does feel like, at a minimum, you'd have an extra copy of the collection in memory at once.
I think the confusion arises because sort_by in Crystal is only defined on Array (and Slice?), but in Ruby it's defined on Enumerable. Was there a specific reason that sort_by was not defined on Enumerable in Crystal?
@Fryguy I just gave what I think the reason is above. Also I don't see the extra copy: sort algorithms need an array to operate on, so when you chain sort_by! onto to_a, there's no extra copy. Hash#sort_by's implementation would very likely literally end up as to_a.sort_by!. Note that's what Ruby's Enumerable#sort_by is doing: https://github.com/ruby/ruby/blob/5445e0435260b449decf2ac16f9d09bae3cafe72/enum.c#L1259-L1260
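A small sketch of that point, with made-up data: sort_by! reorders the array that to_a just built and returns that same array, while the non-bang sort_by hands back a new one. Whatever temporary storage the sort itself uses internally is an implementation detail this does not try to show.

```crystal
# Hypothetical sample data for illustration.
hash = {"b" => 2, "a" => 1, "c" => 3}

pairs = hash.to_a                               # the one array built from the hash
sorted = pairs.sort_by! { |key, _value| key }   # reorders that same array
copied = pairs.sort_by { |_key, value| -value } # non-bang variant returns a new array

p sorted.same?(pairs) # => true
p copied.same?(pairs) # => false
```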
@rmarronnier
For example attributing a score value to an Array or Set of values then picking the 3 most scored ones, what would be the most clean/efficient way to do this without using Hash and a sort_by like method ?
It's generally hard to answer this in the abstract because it just depends so much on what you're doing. So take the following as responses to the example rather than me saying it's like this in general and there's no single exception :)
So, in the example
Here the return value could stay as Array({String, Int32}), the only usage of this is passing it to get_distances which just iterates it:
get_distances too has no reason to convert it back into a hash; its result is solely passed to
which does not make any use of any hash features (min = !distances.values[0..].empty? ? distances.values[0] : 1 -> min = distances.dig?(0, 1) || 1). The pattern continues, the only usage of the result of normalize is
which could just be .dig?(0, 0) or [0][0].
In summary there are a lot of unneeded hashes in this example; getting rid of them would in my opinion make the program simpler (No question of "Why's this a hash?", no doubt of "does a hash have an order?") and objectively faster due to fewer allocations.
If you want to regain some readability after, it probably makes sense to turn the pairs into a small struct like
record Score, language : String, distance : Int32
That should have the same performance as a Tuple and suddenly you know what all these "keys" and "values" actually represent!
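To tie this back to the "pick the 3 highest-scoring ones" question, here is a rough sketch using such a record and no Hash at all. The sample values are invented, and "highest distance wins" is only an assumption for the demo; flip the sort key if lower is better in your case.

```crystal
# Same shape as the record suggested above.
record Score, language : String, distance : Int32

# Hypothetical data for illustration.
scores = [
  Score.new("crystal", 2),
  Score.new("ruby", 5),
  Score.new("go", 9),
  Score.new("rust", 7),
]

# Sort descending by distance, then keep the first three.
top_three = scores.sort_by { |score| -score.distance }.first(3)

top_three.each { |score| puts "#{score.language}: #{score.distance}" }
# prints:
# go: 9
# rust: 7
# ruby: 5
```

The named fields (score.language, score.distance) replace the anonymous "key" and "value" of a Hash entry.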
Thanks @jhass for this fantastic and extensive answer !
I'll use #dig with Array(Tuple) now :-)
Also thanks for pointing out the record macro : it's better than aliases and less overkill than a Struct.
No question of "Why's this a hash?"
Now I realize the reason I used hashes so much in my different Crystal projects was that a Hash works (and looks, as in some_hash[key]) the same as a JS object, and the json output would be easier to work with on the frontend side of my web work.
json output would be easier to work with on the frontend side of my web work.
That's a totally legit reason to convert it into a hash, I would just delay it as long as there's no benefit, in other words separate the presentation logic from the business logic :) Just keep in mind the JSON spec does not explicitly say that the order of keys in a JSON object is of semantic value, so that happening in any JSON encoder or decoder would be an implementation detail and something I wouldn't rely on when designing a JSON API :)
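One way that delay could look, as a sketch with made-up data: keep the sorted pairs as an Array throughout the business logic and only build the Hash right before serializing.

```crystal
require "json"

# Hypothetical pairs, already sorted by the business logic.
scores = [{"go", 9}, {"rust", 7}, {"ruby", 5}]

# Convert to a Hash only at the presentation boundary.
json = scores.to_h.to_json
puts json
```

If the client needs the ordering, emitting the array itself rather than an object is the safer shape, for the reason given above.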