Runtime: System.HashCode creates a different hash code every time an application is run

Created on 14 Nov 2018 · 6 Comments · Source: dotnet/runtime

The new HashCode struct currently gives a different result for hash codes every time an application is run. This is caused by the static random seed for xxHash32: https://github.com/dotnet/corefx/blob/dcc1befcbccf7459bfd612a0465948a2301b961b/src/Common/src/CoreLib/System/HashCode.cs#L55
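As a minimal sketch of the behavior described above (the class and method names are illustrative, not from the issue), the following combines two integers with `HashCode.Combine`. Integers are used on purpose so that randomized string hashing does not influence the result. Within a single process the two calls are expected to agree; the printed value is expected to differ between runs because of the per-process seed.

```csharp
using System;

public static class HashCodeDemo
{
    // Combines the same two integers twice. Within one process the
    // per-process seed is fixed, so both calls return the same value.
    public static bool IsStableWithinProcess()
    {
        int first = HashCode.Combine(42, 7);
        int second = HashCode.Combine(42, 7);
        return first == second;
    }

    // Returns the combined hash; this is the value that is expected to
    // change from one application run to the next.
    public static int CombinedHash() => HashCode.Combine(42, 7);
}
```

Calling `Console.WriteLine(HashCodeDemo.CombinedHash())` in two separate runs of the same program should illustrate the difference the issue is about.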

Even though it has always been stated that one should not rely on GetHashCode being stable across framework versions and platforms, to my understanding it actually has been quite stable (at least across framework versions on the same platform).

The reason I would like to discuss this is related to Service Fabric's requirement of stable hash codes for keys in reliable collections. As Service Fabric documentation states:

While you can modify the schema of a key, you must ensure that your key's hash code and equals algorithms are stable. If you change how either of these algorithms operate, you will not be able to look up the key within the reliable dictionary ever again.

This basically makes the new HashCode functionality useless when creating custom keys for Service Fabric, unless one copies the code from .NET Core and initializes the random seed with a constant value.

I would like to propose one of the following changes:

  1. Use a constant seed.
  2. Add documentation to HashCode that explicitly states that its hash codes are different every time an application runs.

I would prefer option 1, since it actually makes HashCode useful for situations that require a stable hash code, but also because HashCode is not really meant as a source of semi-random values, so the added randomness seems pointless. It might also produce faster code, since it would no longer have to access the static seed.

area-System.Runtime question

All 6 comments

This isn't new behaviour. It has been like this for some years: after hash code portability was used in an ASP.NET-related attack, the randomisation was introduced in desktop .NET and presumably simply brought over to Core as-is. The GetHashCode documentation does explicitly cover the scenario you have:

鈥ou should never persist or use a hash code outside the application domain in which it was created, because the same object may hash across application domains, processes, and platforms.

I think the word differently is missing in there but it's still fairly clear.
If you want stable hashcodes and are on desktop and not in an attackable position then you could use the configuration element to turn this off, UseRandomizedStringHashAlgorithm. I don't know if core supports this but you can check the source if you like. The other option is to implement hashcodes for all your objects to ensure that they are stable between domains.

Maybe Service Fabric needs to change to not rely on hash codes. 🙃

I don't know if core supports this

You cannot disable randomized string hashing in .NET Core.

@Wraith2 You seem to be talking about the behavior of string.GetHashCode, but this issue is about System.HashCode. While both produce randomized hash codes (and the same security concerns can apply to both), they're otherwise mostly unrelated. In particular, UseRandomizedStringHashAlgorithm has no effect on System.HashCode.

@LeroyK

Add documentation to HashCode that explicitly states that its hash codes are different every time an application runs.

It's already documented:

HashCode uses a statically initialized random seed to enforce this best practice, meaning that the hash codes are only deterministic within the scope of an operating system process.

As for Service Fabric, since the requirement for stable hash codes seems to be specific to it, maybe it could offer its own helper type for calculating hash codes, that's guaranteed to stay stable?

HashCode uses a statically initialized random seed to enforce this best practice, meaning that the hash codes are only deterministic within the scope of an operating system process.

As for Service Fabric, maybe it could offer its own helper type for calculating hash codes, that's guaranteed to stay stable?

There are two types of hashing in Service Fabric

  1. Hashing to choose a partition, where you either map to a predefined set ("Named partitioning") or to n slices of an Int64 hash (where n is the partition count)

Select a hash algorithm

An important part of hashing is selecting your hash algorithm. A consideration is whether the goal is to group similar keys near each other (locality sensitive hashing)--or if activity should be distributed broadly across all partitions (distribution hashing), which is more common.

The characteristics of a good distribution hashing algorithm are that it is easy to compute, it has few collisions, and it distributes the keys evenly. A good example of an efficient hash algorithm is the FNV-1 hash algorithm.

A good resource for general hash code algorithm choices is the Wikipedia page on hash functions.

This part resolves which partition of a microservice holds the data, and is used to determine which instance to communicate with. This needs to be a stable hash; however, it is also domain specific, as it determines how data is balanced across the system.

  2. Hashing within a partition for an IReliableDictionary<TKey,TValue>

Which has the warning highlighted

While you can modify the schema of a key, you must ensure that your key's hash code and equals algorithms are stable. If you change how either of these algorithms operate, you will not be able to look up the key within the reliable dictionary ever again.

ReliableDictionaries are persisted to disk, can move between processes and hosts, and maintain their state across restarts, which is why the hash needs to remain stable. The hash determines how the index, for what is essentially a NoSQL database, is constructed.
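For the partition-selection hashing described above, a stable hash can be sketched with the FNV-1 algorithm that the quoted documentation mentions. This is only an illustration: `StablePartitionHash` and its method names are hypothetical, the key is assumed to be hashed over its UTF-8 bytes, and the modulo reduction is a simplification rather than Service Fabric's own range-based partition resolution.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of .NET or Service Fabric.
public static class StablePartitionHash
{
    private const ulong OffsetBasis = 0xcbf29ce484222325UL; // 64-bit FNV offset basis
    private const ulong Prime = 0x100000001b3UL;            // 64-bit FNV prime

    // 64-bit FNV-1 (multiply, then XOR) over the UTF-8 bytes of the key.
    // No per-process seed is involved, so the result is the same on every run.
    public static long Fnv1(string key)
    {
        ulong hash = OffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            unchecked
            {
                hash *= Prime;
                hash ^= b;
            }
        }
        return unchecked((long)hash);
    }

    // Simplified mapping of a 64-bit hash to one of partitionCount buckets.
    public static int ToPartitionIndex(long hash, int partitionCount)
    {
        if (partitionCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(partitionCount));
        return (int)(unchecked((ulong)hash) % (ulong)partitionCount);
    }
}
```

A caller might compute `StablePartitionHash.ToPartitionIndex(StablePartitionHash.Fnv1("customer-42"), 8)` and would get the same bucket after a restart, which is the property the partitioning step needs.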

Outside of the numeric types, which are their own hash codes, .NET's standard hash codes are more ephemeral and only valid for the lifetime of the process (e.g. object.GetHashCode is not based on the value of the object), so hash code equality generally doesn't exist across restarts for almost any type (except, as noted, where the hash code is the same as the value).
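A small sketch of that distinction, with illustrative type names: for `Int32` the hash code is the value itself, while a class that does not override `GetHashCode` inherits `object.GetHashCode`, which ignores the field values.

```csharp
using System;

public sealed class Point
{
    public int X;
    public int Y;
    // No GetHashCode override: the inherited object.GetHashCode
    // does not look at X or Y.
}

public static class DefaultHashDemo
{
    // For Int32 the hash code is the value itself, so it is stable across runs.
    public static bool IntHashIsValue(int value) => value.GetHashCode() == value;

    // Two Point instances with identical field values; their hash codes
    // are unrelated to the fields and will typically differ.
    public static (int First, int Second) PointHashes()
    {
        var a = new Point { X = 1, Y = 2 };
        var b = new Point { X = 1, Y = 2 };
        return (a.GetHashCode(), b.GetHashCode());
    }
}
```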

Saying all that... Maybe an InsecureHashCode or UnsafeHashCode type that was stable would be useful
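One way such a stable helper could look is sketched below. `StableHashCode` and `CustomerKey` are hypothetical names, not existing .NET or Service Fabric types, and the sketch assumes 32-bit FNV-1a with a fixed offset basis in place of the randomized seed. It is not hardened against hash-flooding attacks, which is the trade-off the names "Insecure" or "Unsafe" hint at.

```csharp
using System;

// Hypothetical helper: a deterministic 32-bit FNV-1a accumulator.
public struct StableHashCode
{
    private const uint OffsetBasis = 0x811C9DC5; // 32-bit FNV offset basis
    private const uint Prime = 0x01000193;       // 32-bit FNV prime
    private uint _hash;
    private bool _started;

    private void MixByte(byte b)
    {
        if (!_started) { _hash = OffsetBasis; _started = true; }
        unchecked { _hash = (_hash ^ b) * Prime; }
    }

    // Feeds the four bytes of the value in a fixed (little-endian) order.
    public void Add(int value)
    {
        for (int shift = 0; shift < 32; shift += 8)
            MixByte(unchecked((byte)(value >> shift)));
    }

    // Feeds each UTF-16 code unit as two bytes; null is treated as empty.
    public void Add(string value)
    {
        foreach (char c in value ?? string.Empty)
        {
            MixByte(unchecked((byte)c));
            MixByte(unchecked((byte)(c >> 8)));
        }
    }

    public int ToHashCode() => unchecked((int)(_started ? _hash : OffsetBasis));
}

// Hypothetical custom key whose hash does not depend on a per-process seed.
public readonly struct CustomerKey : IEquatable<CustomerKey>
{
    public CustomerKey(int tenantId, string customerCode)
    {
        TenantId = tenantId;
        CustomerCode = customerCode;
    }

    public int TenantId { get; }
    public string CustomerCode { get; }

    public bool Equals(CustomerKey other) =>
        TenantId == other.TenantId &&
        string.Equals(CustomerCode, other.CustomerCode, StringComparison.Ordinal);

    public override bool Equals(object obj) => obj is CustomerKey other && Equals(other);

    public override int GetHashCode()
    {
        var hash = new StableHashCode();
        hash.Add(TenantId);
        hash.Add(CustomerCode);
        return hash.ToHashCode();
    }
}
```

Because nothing here depends on process state, `new CustomerKey(1, "A-100").GetHashCode()` should return the same value on every run, as long as the algorithm itself is never changed.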

This basically makes the new HashCode functionality useless when creating custom keys for Service Fabric

Great! These hash codes should not be relied upon. If they were mostly stable across restarts and framework versions then people would start to incorrectly (!) take a dependency on those hash code values. This would make it hard to upgrade the hash code in the future.

It has been a .NET version 1.0 mistake to not have randomized hash codes. They should have been random from the beginning. Since this was not the case people have indeed taken a dependency on concrete hash code values which then indeed has made upgrades much harder.

If you need a stable hash code use a library for that which provides stable hash codes. City hash and Murmur hash are popular names that come to mind. They might have been superseded by now.

Closing this issue as the question has been answered above by several people. The randomized ephemeral hashcode is indeed by design and not cross-version tolerant.
