I stumbled on this by chance of having a large list as part of a record and something in WPF calling GetHashCode.
Run this code, with or without tail calls enabled
[0..5000000].GetHashCode()
I would expect a result, perhaps even if only based on the first n items in the list.
Execution terminates with StackOverflowException
VS 2015 Update 3
FSharp.Core 4.4.0.0
.NET Framework 4.6.1
I would expect a result, perhaps even if only based on the first n items in the list.
Yes we should compute a result - but it needs all items.
@dsyme any hints to where I can find the impl?
does it fall back to IEnumerable.GetHashCode()?
@forki maybe we need all the items for backwards compatibility, but without a backwards compatibility requirement there might be a case for limiting the items included in the calculation
e.g. (Seq.initInfinite id).GetHashCode() returns in a finite amount of time :)
@forki I think this will be hard to fix. Check the implementation of the GetHashCode override for the list type, generated by src\fsharp\AugmentWithHashCompare.fs. It should ideally be a tail-calling loop (turned into a real loop by IlxGen.fs)
tbh I can't fin the list case. I assume it uses mkUnionHashWithComparer!?
tbh I can't fin the list case. I assume it uses mkUnionHashWithComparer!?
Yes
Can we confirm this is not a regression please? I'm pretty sure it isn't, indeed it is really a feature request?
It is a feature request that something doesn't blow up in stack overflow? ;-)
To be more serious: I think if we just return 0 instead of blowing up (somehow try catch the things) it would probably already solve many issues.
@forki The bahviour has to be backwards-compat, without changing the hash codes.
IIRC it's hard to make this tail-recursive but I can't remember the details.
Can we confirm this is not a regression please? I'm pretty sure it isn't, indeed it is really a feature request?
For the record, I can confirm this blows up in both debug and release builds using F# 4.0, VS2015, Win7, x86 and x64 builds (if that'd make a difference).
I don't know why this needs to be hard to be tail-recursive. In fact, I think we owe it to the users to be tail-recursive, even if it is hard. Didn't Jon Harrop once claim that _every_ recursive function can be turned into a recursive one? I'm no expert, but if that's true..., we should, right?
The current algorithm (as I see it in Reflector on FSharpList<T>) is first calculating hash of tail (the recursion), then combines that with head and some bit shifting. Unless the requirement is that it must behave exactly the same (except for the uncatchable SO exception, sorry @forki, you cannot catch this one and then return 0), it shouldn't be too hard to reverse the order, or add a continuation.
I don't know why this needs to be hard to be tail-recursive.
One easy solution is to hand-write the GetHashCoe implementations for F# lists, rather than rely on the generated augmentation code.
The code should be written in such a way that the hash codes don't change from today
The code should be written in such a way that the hash codes don't change from today
@dsyme, Not sure I follow. You mean with "from today" that historical hash codes should be equal, that is (hash [1;2] in vX must be integer-equal to hash [1;2] in vNext) or that BCL hash code must be integer-equal to whatever we come up with for F# lists?
Yes otherwise we would break existing programs really badly.
Am 05.12.2016 2:15 nachm. schrieb "Abel Braaksma" <[email protected]
:
The code should be written in such a way that the hash codes don't change
from todayNot sure I follow. You mean with "from today" that historical hash codes
should be equal, that is (hash [1;2] in vX must be integer-equal to hash
[1;2] in vNext) or that BCL hash code must be integer-equal to whatever
we come up with for F# lists?—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/Microsoft/visualfsharp/issues/1838#issuecomment-264850880,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AADgNMJVGbunnLnatUoJfyZUi4e538_mks5rFA5WgaJpZM4K8i_a
.
@dsyme "The behaviour has to be backwards-compat": may I ask the reason for this? The MSDN docs at https://docs.microsoft.com/en-us/dotnet/api/system.object.gethashcode?view=netframework-4.7.2 are explicit that backwards-compatibility is not required of GetHashCode: "The GetHashCode() method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's System.Object.Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again."
Yes I've seen @dsyme say this also on a issue regarding hash collisions. I'm very curious who it is we're making these back-compat for, not to sound conspiratorial. I just mean like how common is this actually a real need because I thought it was common knowledge not to depend on the hashes being the same. If you're going to store the hashes you should really be using an industry standard hash function anyway.
I now agree we don't need to be backwards compatible, and I've been quite surprised that the lack of determinism/stability for .NET hash codes (e.g. of strings) has not seemed to have been a problem in practice.
Still, I'm not particularly keen on changing things unless we really, really have to and we address a number of known issues simultaneously. An RFC would be needed I think... And several more people than just myself approve it....
My gut feeling is that any variation on the current solution will have either perf or correctness issues on massive recursive data structures, and in these situations it's better to implement your own hashing according to your needs.
@dsyme The existing implementation also causes hash collisions with larger sets of data which leads to performance impacts. So anyone using massive recursive data structures are probably already using their own hash, however good defaults are a good idea. No reason to keep a bad default if nobody needs back-compat. I do agree though that this is something that is going to take some community involvement from major players.
https://github.com/fsharp/fsharp/issues/343
related... https://github.com/fsharp/fslang-suggestions/issues/366
on massive recursive data structures
@dsyme, But lists aren't recursive (unless you mean the internal datastructure) and they blow up as well! And on the current (15.9.2) version of VS + F# they blow up much sooner than the OP mentioned:

That is a list of only 60k items. That can hardly be considered a large list, let alone huge.
I agree that we cannot anticipate each and every (recursive) data structure, but the F# Core library should have decent, stable and performant implementations for its own datatypes that don't blow up. I'm thinking of list, array, option, map, seq etc. I think only list is a very surprising one that blows up very ugly. And you cannot even override it unless you would create your own type. A little much for just letting list play nice with BCL functions that require GetHashCode.
My gut feeling is that any variation on the current solution will have either perf or correctness issues
Perhaps, but can we special-case list? After all: _Nobody expects the Spanish Inquisition_ ;).
@abelbraaksma I don't think we should just solve list if we're going to do this. If we're worried about correctness perhaps we can collaborate with the F* team. We can give a build to people who use F# in production at scale and they can help us see if it's slow, and if it is we'll adjust the approach.
I don't think telling people to roll their own hash as we have done in the past is a real solution, especially now that we're starting to doubt that backwards compatibility is strictly required.
This seems to be a related issue: https://github.com/fsharp/fsharp/issues/884
And it blows up with StackOverflowException between the call to a very large (generated) function update and the first line in it (printfn).
In addition, the firs call to it (for a smaller amount of code) takes a very long time, like 10-15 seconds. However, all subsequent calls are much faster. If this is indeed hash related, then in that particular case I can't even rewrite it!
I now agree we should fix this.
Re-labeling this now that we won't commit to generating an identical hash code as before
Any progress on this? We were hit by this in production, and it was a bit of a detective's work to find the root cause.
Yeah, I've all kinds of hacks to prevent these to blow up, and still sometimes miss a case (you must wrap lists into a new type that implements IEquatable/IComparable whenever you use it with any of the default functions that require comparison or equation). It's really easy to forget doing that.
I've looked at this before and a fix doesn't seem to be too hard for the trivial case of lists. But I think the discussion in this thread suggests it should be fixed for the general case of any recursive standard data type (DU or F# record, I assume). That's probably a bit harder (if at all possible, perhaps instead we should raise a warning when GetHashCode is not implemented for a recursive type).
Hey, I ran into that issue a few days ago. Any progress on that one? The problem is that it's really hard to circumvent when hashing composed values that contain lists.
Basically one would need a whole code-generator for custom hashes.
@krauthaufen, looking at the linked #9070, which was started to fix this, a fix lead to unexpected changes in code generation. I've had the intention to give it another look or use a different approach. I'll see what I can do.
Workarounds involve things like wrapping List in a new DU where you override the GetHashCode method, or doing so in every type that includes list types. It's unf. easy to forget and easy to get wrong.
Yeah, unfortunately this one will be tricky without risking a source breaking or binary breaking change on one of the most-used data types in FSharp.Core: https://github.com/dotnet/fsharp/pull/9070#issuecomment-640319517
The alternative fix is to "flatten" the structural hash function generated for lists. Since a list is defined as recursive DU, this change would also affect all other recursive DUs unless it were specialized only to list. Either way it's tricky business.
Hey, personally I think that changing HashCode values for types is not really a breaking change but if you're commited to keep the hashcodes identical I think there will not really be a way to solve this with tail-recursion/loops.
Generally speaking shouldn't all DU hashes be implemented accumulator-style s.t. cases like lists 'accidentially' are tail-recursive?
Note that my concerns have nothing to do with the actual hash code values. It's that the emitted API shape is different than if you explicitly implement those interfaces in List.
Most helpful comment
@dsyme "The behaviour has to be backwards-compat": may I ask the reason for this? The MSDN docs at https://docs.microsoft.com/en-us/dotnet/api/system.object.gethashcode?view=netframework-4.7.2 are explicit that backwards-compatibility is not required of
GetHashCode: "The GetHashCode() method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's System.Object.Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again."