Roslyn: Tuples and Dictionaries - Mutable fields change the GetHashCode result

Created on 8 Feb 2017  路  26Comments  路  Source: dotnet/roslyn

Multable value types are evil (https://blogs.msdn.microsoft.com/ericlippert/2008/05/14/mutating-readonly-structs/)

It breaks Dictionaries keys. There is any guidance in this case? Would be nice to have a compiler warning.

        var x = GetLatLng("");

        var dic = new Dictionary<(double lat, double fff), string>();
        dic[x] = "123";

        Console.WriteLine(x.GetHashCode());
        Console.WriteLine(dic[x]);

        x.lat = 3;
        Console.WriteLine(x.GetHashCode());
        Console.WriteLine(dic[x]); //KeyNotFoundException

Thanks

Area-Language Design

Most helpful comment

You can write an analyzer if you'd like to catch this sort of thing. Note that the above seems to be working properly. You're looking up with a different key, so you don't find the item. That's a good thing :)

All 26 comments

You can write an analyzer if you'd like to catch this sort of thing. Note that the above seems to be working properly. You're looking up with a different key, so you don't find the item. That's a good thing :)

You are right, mutable value types are evil and making the value tuples was a dreadful decision by the language team.

Sadly, having let the genie out of the bottle on this one, there is little the rest of us can do about it to ensure it is never a problem. You could try just being disciplined and not mutate them, but a mutation by a naive member of your team could be missed even with code reviews. You could, as @CyrusNajmabadi says, create an analyzer to try and detect such mutations, but it would be complex to write. And even the best analyser is unlikely to be able to detect eg such keys being passed around using ref parameters or ref returns and then mutated, so the risk of bugs being introduced in your code will remain.

Isn't the example kind of like

        var x = 1;

        var dic = new Dictionary<(int val), string>();
        dic[x] = "123";

        Console.WriteLine(x.GetHashCode());
        Console.WriteLine(dic[x]);

        x = 3;
        Console.WriteLine(x.GetHashCode());
        Console.WriteLine(dic[x]); //KeyNotFoundException

For it to work you'd want to use the same value

        var x = GetLatLng("");

        var dic = new Dictionary<(double lat, double fff), string>();
        dic[x] = "123";

        Console.WriteLine(x.GetHashCode());
        Console.WriteLine(dic[x]);
        var y = x; // copy x's value before changing it
        x.lat = 3;
        Console.WriteLine(y.GetHashCode());
        Console.WriteLine(dic[y]); // OK

@benaadams,

No that isn't an equivalent example. int is an immutable value type, so the tuple equivalent of your code would be:

````cs
var x = (1, 1);

var dic = new Dictionary<(int, int), string>();
dic[x] = "123";

Console.WriteLine(x.GetHashCode());
Console.WriteLine(dic[x]);

x = (3, 3);
Console.WriteLine(x.GetHashCode());
Console.WriteLine(dic[x]); //KeyNotFoundException
````

In this case (as with your int example), a whole new value is being used and of course the key isn't found. Because value tuples are mutable though (going against the language's own advice on structs), a more insidious problem exists, whereby the value gets copied to the dictionary, so modifying the original copy doesn't modify the dictionary's key. This problem doesn't occur with reference types, nor does it occur with immutable values. It only occurs with the toxic mix of structs that are mutable that use those mutable sub-values to determine their own equality.

structs that are mutable that use those mutable sub-values to determine their own equality.

There is no other way to identify a struct, a struct is its sub-values; it is nothing else.

Reference types are generatlly comparing an IntPtr value not their contents.

Strings are a bit of a weird one, being a reference types; as they generally aren't reference equal, but content equal.

Hmm, I phrased that badly. Let me try again:

The problem only occurs with the toxic mix of structs and mutability. As you say, structs have to use their sub-values to determine their own value. Change that sub value, and the value of the value changes. The fact that "the value of the value changes" sounds nonsensical neatly summarises why allowing values to change themselves is a bad thing to have happen.

And that is why structs should be immutable and why allowing value tuples to be mutable was a bad decision: they are values that can change their value; ergo they are nonsensical.

Making them immutable doesn't really change much. You could do the following which is the same meaning; just more inefficent to write and to execute

var x = GetLatLng("");

var dic = new Dictionary<(int lat, int long), string>();
dic[x] = "123";

x = (3, x.long);
Console.WriteLine(dic[x]); //KeyNotFoundException

@benaadams,

That example now works the same as your int example. It's the way value tuples should work.

Ints aren't a good comparison though. Strings are better. The way value tuples can mutate their value is equivalent to be able to write something like:

var x = "123";
x.Append("4");
// x is now "1234"

Strings are effectively value types, they are just classes for purposes of efficiency in passing them around the system. Despite being references, they contain a value and it makes no sense for that value to change its value, thus they are immutable.

I'm aware of the problem, in fact it isn't a bug that I found, I forced it just to exemplify.

I'm only concerned with most developers, who do not know this and the problem may occur in another part of the application, not related to the bug itself.

Mutable structs do not cause a problem with dictionary keys because you are never modifying the key in the dictionary after you put it in the dictionary because you only put a copy of the struct in the dictionary. On the other hand, mutable reference types do have the exact problem you are pointing out, because you are modifying the same instance the dictionary is storing. And by doing so you are changing its computed hash code so that no matter if you continue to use the same instance to look into the dictionary you will no longer find the item using that instance as a key because the dictionary uses the hash codes first when looking up the key to find the value.

Do you mean in case you override GetHashCode in a reference type?

If you have a reference type who's GetHashCode is computed based on its current member values and those values are mutable ... that will get you into the dictionary key problem. Structs don't cause this problem.

Yeah, I just made a test.

ValueTuple<int, int> x = new ValueTuple<int, int>(10, 20);

var dic = new Dictionary<ValueTuple<int, int>, string> ();
dic[x] = "123";

Console.WriteLine($"X HashCode: {x.GetHashCode()}");
Console.WriteLine($"X Value from Dictionary: {dic[x]}");
Console.WriteLine();

Console.WriteLine("Changing X");
x.Item1 = 3;
Console.WriteLine($"X HashCode: {x.GetHashCode()}");

try
{
    Console.WriteLine($"X Value from Dictionary: {dic[x]}"); //KeyNotFoundException
}
catch(KeyNotFoundException ex)
{
    Console.WriteLine(ex.Message);
}
Console.WriteLine();

var y = new ValueTuple<int, int>(10, 20);

Console.WriteLine($"Y HashCode: {y.GetHashCode()}");
Console.WriteLine($"Y Value from Dictionary: {dic[y]}");//It works

Console.ReadLine();

So the problem isn't GetHashCode (or just it). Dictionary just can't find the key, because it changed (and consequently the Hash Code).

Yes, the key on the outside of the dictionary changed. So it can't be used to find the value in the dictionary. The problem you were originally describing only happens when the hash code of the key held inside the dictionary changes, and it no longer matches the bucket that the hashing algorithm put it in.

Mutable tuples are generally safe from unwanted tampering until you start passing them around by ref.

@mattwar Is correct here. Mutable tuples are generally considered bad because when you copy them around, your mutations to one won't affect the other. However, that lack of external mutability is precisely what you want from a tuple.

i.e. i can make a tuple for myself, do whatever i want with it, and then pass it out to something else (like a dictionary). If i then make any changes to the tuple, or the other side makes any changes to the tuple, we are both insulated from each other.

However, each of us is free to do whatever we want with our copy of the tuple. Tuples combine the safety of copying, with the ease of use of being able to just easily (and cheaply) operate on the tuple you currently have.

Alternate options would have been:

  1. Immutable struct tuples. This would have been 100% safe, but would have been very non-ergonomic in usage, and would likely have a perf cost if each time you wanted to locally mutate that a copy would be necessary.
  2. Mutable reference tuples. These would have been terrible for the reason outlined in the OP. You'd have multiple people pointing at the same reference, but anyone could arbitrarily mutate data out from underneath you.
  3. Immutable reference tuples. These would have been 100% safe, but would have been non-ergonomic to use, and would be quite costly due to the memory allocations.

Of the four possible options (mutable crossed with struct/reference type), mutable-structs were the best. They had the following virtues:

  1. You are safe from your data being changed by another party. i.e. only you can mutate your tuples.
  2. They're ergonomically quite nice to use (i.e. no need to continually make new instances, which you need to continually copy data into).
  3. They're extremely lightweight. Commonly you should see no overhead to them at all.
  4. They don't incur any heap/GC costs.

@CyrusNajmabadi I didn't realize ergonomics was a consideration for programmers. Most just slouch in the chairs anyway.

@mattwar that's why the ergonomics need to be very good; to compensate 馃槈

Mutable tuples might be "ergonomic", ie easier to write. However, as we all know, code tends to only get written once and then read dozens, even hundreds, of times thereafter.

Code isn't just read though. A developer will (try) to execute the code in their head to understand what it is doing. I recently did a very informal experiment with a small group of 12-13 year olds, that were only just being exposed to programming. I showed them two pieces of C# code: one written in a declarative style; one in an imperative way. I then talked them through what each piece did and then asked them questions about their understanding of it. Having experience of algebra, they were initially confused by mutability. But they quickly grasped the concept, but really struggled to keep track of mutating values through the code. They had no such problems with the declarative code: they found it more intuitive and easier to reason.

It's no coincidence that those that using mutation a lot have to reach for the debugger, or a piece of paper, to help them understand what a piece of code is doing. Keeping track of changing values in your head is hard; code is much easier to read when the value of a variable doesn't change.

So yes, mutable tuples do offer a small ergonomic benefit; but that benefit is totally eclipsed by the complexity cost it incurs. It isn't a price we should have had to pay. Tuples should be immutable.

It isn't a price we should have had to pay. Tuples should be immutable.

If tuples are immutable, you also pay the other costs that i mentioned.

You are welcome to only use tuples in an immutable fashion in your code. You're also welcome to provide an analyzer for others like you that want to keep things totally immutable.

We cannot make mutable tuples out of immutable tuples. So if we took your approach, then everyone would be SOL. On the other hand, by being mutable, people who benefit from mutability can have it. And those that want immutability can get it too.

remember that these are structs. So being mutable + the analyzer never harms you. You can pass your tuples to whoever you want, and they only get copies. So your tuples are always assured to be 100% immutable if you write this analyzer.

but that benefit is totally eclipsed by the complexity cost it incurs

I disagree (as did the rest of the LDM). Hobbling tuples from the start with a perf problem is a surefire way to make things more difficult for them down the line.

Again, we had to consider:

  1. ergonomics
  2. safety
  3. performance

You don't get to ignore some parts of this in your assessment.

Keeping track of changing values in your head is hard; code is much easier to read when the value of a variable doesn't change.

Making tuples immutable would not help with that. For example:

```c#
var v = (x, y);
dict[v] = a;

v = (y, x);
var c = dict[v]; // fails unless x == y

Here is a place where i never mutated a tuple, but where the exact same problem occurs.  If you wan to solve this by making a new value.  i.e.:

```c#
var v1 = (x, y);
dict[v1] = a;

var v2 = (y, x);
var c = dict[v2]; // fails unless x == y
```

Then you can still do that today. Nothing about tuples forces you to use their mutability. but their mutability is there when you need it so that you're not hurting performance in many cases.

We cannot make mutable tuples out of immutable tuples. So if we took your approach, then everyone would be SOL

Incorrect, for the same reason that #16869 isn't a simple change: "the definition of ValueTuple can be controlled by the user". If tuples being immutable were genuinely a problem for someone, then they would simply link to a nuget package that offered mutable versions of ValueTuple, rather than the "official" one.

Of course the saving grace for those of us that take the "mutable value types are evil" stance is to create a nuget package that contains versions of ValueTuple that are immutable. Job done. Just as shame that the "official" version takes the premature optimisation approach and that an alternative is needed to improve readability, rather than improve performance.

Non-ref ValueTypes are kinda immutable anyway; in the "immutable" collections definition. You get passed one and the only it will change is if you change it, it won't effect the caller's version or the version of anything you passed it to.

They aren't just readonly, where you'd get a compile error if you tried to change it.

Just as shame that the "official" version takes the premature optimisation approach

A framework is where you want the most optimizations. It it the upper bound on performance.

Every additional bit of application code you add will generally make it slower. A high performing framework is the best of both worlds; its both fast, and you can make it slower without changing it. A low performing framework, you can still make slower, but you can only make it faster by changing it (or throwing it out).

In a "cloud"/hyper-scale world where you can size your deployments to the right size on demand and pay for only what you need; the performance of the framework is even more important. If your hot-path is 2x as slow; you need twice the resources to run it, which is now directly double the financial cost.

Or put another way, for the same spend a lower performing framework means you can do less.

simply link to a nuget package that offered mutable versions of ValueTuple, rather than the "official" one.

That will not work, while achieving all those goals i mentioned. You will not be able to mix and match tuples in this manner. If i get a tuple back and it is immutable, then linking to a mutable value-tuple API won't suddenly make those tuples mutable. I'll still have to take the hit of an additional copy to put it in mutable form.

Just as shame that the "official" version takes the premature optimisation approach

It's not a premature optimization. it's a very real perf hit. We decided on this precisely because of the perf overhead we saw.

We are the framework. Any perf hit we impose is hit by everyone. The framework needs to not have a poor perf story. If people want poorer perf, they can opt into it themselves.

Of course the saving grace for those of us that take the "mutable value types are evil"

Perhaps that's the issue. Mutable value types aren't evil. They actually work quite well for numerous tasks. For example, when you foreach over most framework types, a mutable struct is involved. This happens when foreach'ing over a List<T> for example. The mutable struct provides all those benefits i outlined above. You can easily enumerate these types, while getting practically no perf overhead. An immutable struct would not have allowed for this. Neither would a mutable-reference type.

Mutable-structs serve a very real, and very useful need. And it happens that Tuples precisely fit into the category that mutable-structs solve.

--

Also, to be clear: mutable structs are effectively immutable in the way that most people want. Namely that they can pass these values out to other domains, and be assured that their tuples will not change.

You of course aren't the framework; you are the language. And value tuples aren't part of the framework either, as they exist in a separate nuget package. Of course it would be very naive of me to think this won't suddenly change, without warning of course, as per what happened to .NET Core and project.json. But until that happens, people can create their own implementations of ValueTuple (and it's clear other members of the team expect this to happen) and as you say this could lead to nasty surprises when two projects, using different versions of ValueTuple are linked together.

Both @CyrusNajmabadi and @benaadams argue that performance is king, but then both argue that mutable structs are "kinda immutable anyway", in a squinting eyes, waving hands in front of face, kind of way: because they are copied, not passed by reference. However, that copying is exactly the thing that both are arguing shouldn't have to happen to mutate them. Bit of an oxymoron going on there, In reality, that contradiction won't occur: those concerned with performance above all else will of course use ref parameters and returns to pass around a reference to a single instance of this mutable struct. As Ben says, not doing so will increase the financial cost.

Value tuples create an fascinating situation. Currently, as it's nuget based, those of us lucky enough to be able to prioritise clarity over performance can use our own immutable versions, but run the risk of linking to APIs that rely on mutability. If the package is suddenly made part of the framework, code reliant on non-MS versions will likely break, but it will mean optimisations, such as #16869 become completely safe to implement.

If the package is suddenly made part of the framework, code reliant on non-MS versions will likely break

I think if you are using a strong named implementation, it shouldn't break? (And don't use same strong name as the MS nuget ValueTuple one)

Was this page helpful?
0 / 5 - 0 ratings

Related issues

NikChao picture NikChao  路  3Comments

nlwolf picture nlwolf  路  3Comments

MadsTorgersen picture MadsTorgersen  路  3Comments

vbcodec picture vbcodec  路  3Comments

AdamSpeight2008 picture AdamSpeight2008  路  3Comments