Docs: Customer questions on reference semantics with value types

Created on 11 Dec 2017 · 19Comments · Source: dotnet/docs

From a live fyre comment on this topic

This sounds cool but I'm confused about a bunch of things.

Suppose a method takes a readonly struct as an argument. Should that also be an "in" parameter? That seems redundant and unnecessary.

Does this change the recommendations on when to define something as a Struct versus a Class? The recommendations on Struct say it should be a small thing to avoid a lot of copying, but if you define it as readonly then won't it always get passed by reference in which case the size doesn't matter?

If my general practice is to define structs as ReadOnly, then can't the compiler be smart about ALWAYS passing by reference without having to use "in" or "ref readonly".
Overall I'm using a lot of "readonly struct" since I like immutability. Do I really need to do anything else to make things efficient?

/cc @VSadov @jaredpar @jcouv @stephentoub

Source

BillWagner

👍2

Most helpful comment

Whoever is interested in in modifier and readonly structs might find this blog post from Sergey Teplyakov interesting as well.

Quote conclusions from there:

The readonly structs are very useful from the design and the performance points of view.

If the size of a readonly struct is bigger than IntPtr.Size you should pass it as an in-parameter for performance reasons.

You may consider using the in-parameters for reference types to express your intent more clearly.

You should never use a non-readonly struct as the in parameters because it may negatively affect performance and could lead to an obscure behavior if the struct is mutable.

pkulikov on 12 Mar 2018

👍3

All 19 comments

Here are my thoughts:

Suppose a method takes a readonly struct as an argument. Should that also be an "in" parameter? That seems redundant and unnecessary.

Without any modifiers on the method declaration, all arguments are passed by value. You need in, to specify pass-by-reference.

Does this change the recommendations on when to define something as a Struct versus a Class? The recommendations on Struct say it should be a small thing to avoid a lot of copying, but if you define it as readonly then won't it always get passed by reference in which case the size doesn't matter?

I would think that it could. I'd like clarification from the product folks before updating any docs to include this.

If my general practice is to define structs as ReadOnly, then can't the compiler be smart about ALWAYS passing by reference without having to use "in" or "ref readonly".

That is an interesting idea, but I don't think that's what happens now. Open question: should this be a feature request for a future version?

Overall I'm using a lot of "readonly struct" since I like immutability. Do I really need to do anything else to make things efficient?

All arguments and return values in C# are by value. Furthermore, I don't think it is a universal truth that pass by reference is faster than pass by value. You should always measure performance for that kind of work. I would experiment and measure. The fact that the in modifier is only needed at the method declaration should make that reasonably easy to do.

BillWagner on 11 Dec 2017

Suppose a method takes a readonly struct as an argument. Should that also be an "in" parameter? That seems redundant and unnecessary.

Without any modifiers on the method declaration, all arguments are passed by value. You need in, to specify pass-by-reference.

Couple other reasons why this isn't done by default:

Making a struct readonly is designed to be a completely safe, non-breaking change. If it also changed argument passing from value to ref then that would be breaking source + binary change.
Moving from value to in is not a slam dunk win. It's generally better for readonly structs but it can have negative side effects in uncommon cases. Hence a decision that needs to be made explicitly.

Does this change the recommendations on when to define something as a Struct versus a Class? The recommendations on Struct say it should be a small thing to avoid a lot of copying, but if you define it as readonly then won't it always get passed by reference in which case the size doesn't matter?

I'd like input from @terrajobst here as well.

For me the answer is no. This is about adding an extra layer of correctness to an existing decision + allowing the compiler to avoid a lot of unneeded copies. It wouldn't influence my struct / class decision but rather help me enforce behaviors I already want on my struct.

jaredpar on 18 Dec 2017

To me the easiest reason to choose between class and struct is a matter of instance identity.

If you have something called “Customer” and it has “Name” or “Id”, it better be a class, especually if the entity is mutable. Struct would be incinvenient due to byval copying - you can easily end up with two customers named “Bob”, one copy of another but with different orders or something.

If you have some measure-like thing like “Complex” or “Color” where all instances are _the same_ as long as they have the same content, then instance identity that cimes with class is useless - having two distinct instances of “Red” with exactly same values is confusing.

Of course there are some “gray area” cases and immutable/sungleton patterns complicate the choice.

It is still most often a question whether having _distinct instances_ of the type is a desirable thing or not.

VSadov on 19 Dec 2017

We had more relevant feedback on the topic:

It seems that specifying "in" at the call site is optional--or rather, its allowed. However doing so when the parameter is a temporary (like new Point3d() in the second example) generates compiler error CS8156: "An expression cannot be used in this context because it may not be passed or returned by reference."

var fromOrigin1 = CalculateDistance(in pt1, in pt2); // OK
var fromOrigin2 = CalculateDistance(pt1, in new Point3d()); //NOT OK CS8156

Similarly, assigning to a local and passing the result of the assignment suffers the same.

Point3d pt3 = default;
var fromOrigin3 = CalculateDistance(pt1, (pt3 = new Point3d())); // Ok
var fromOrigin4 = CalculateDistance(pt1, in (pt3 = new Point3d())); // Not OK CS8156

Specifying default also exhibits the same behavior

var fromOrigin5 = CalculateDistance(pt1, default); // OK
var fromOrigin6 = CalculateDistance(pt1, in default); // Also Not OK, CS8156

Also, it seems that specifying any literal/constant value along with the in modifier generates the error:

static void TestMethod(in int A) { /* ... */ } 

TestMethod(3); // OK
TestMethod(in 3); // Not Ok

int x = 3;
TestMethod(x); // OK
TestMethod(in x); // OK

Not that I would necessarily do any of the above, but I was just playing around and noticed that there seems to be an inconsistency.

Also somewhat frustrating because R# insists that you specifying the "in" modifier otherwise it generates a warning squiggle and an obscure error message (though probably a bug there and some setting I just need to find)

BillWagner on 2 Jan 2018

👍1

Is there any reason why the compiler could not automatically add "in" if it can be determined that the reference would not be modified in the method? This is assuming that passing a value type by reference would be faster than copying it (and hence a valid optimisation).

long DoStuff(long x, long y)
{
    return x + y;
}

The compiler could determine that the following would be equivalent:

long DoStuff(in long x, in long y)
{
    return x + y;
}

This makes the assumption that the latter would be more performant. My rough test shows it is faster.

rolandh on 10 Jan 2018

👍2

This is assuming that passing a value type by reference would be faster than copying it (and hence a valid optimisation).

How would the compiler deal with such an assumption?

This makes the assumption that the latter would be more performant. My rough test shows it is faster.

Your tests are either incorrect or you ran them on x86. It's practically impossible for long DoStuff(in long x, in long y) to be faster on 64 bit platforms, on the contrary, it will very likely be slower.

The compiler could determine that the following would be equivalent:

Until you add a call inside DoStuff and then the compiler fails to determine that the in version would be equivalent. And all of the sudden you get a breaking API change due to a change that in theory affects only the body of the method.

mikedn on 10 Jan 2018

I have another question on the topic, which I'll rephrase from what I've already asked at the docs page.
Consider the following struct:

    readonly struct MyStruct
    {
        private readonly int value;

        public MyStruct(int value)
        {
            this.value = value;
        }

        public override string ToString()
        {
            // Defensive copy!
            return value.ToString();
        }
    }

Here I use primitive type for a field, but still get defensive copies when calling methods on that primitive type. That obviously happens because Int32 structure is not marked as readonly. Thus, readonly structs have a limitation which may not be apparent at the first look: to avoid defensive copies fields must be readonly structs OR, in case of primitive value types, you must use only primitive type operations (which end up with corresponding IL instructions and not method calls).

Having that, my question is: why aren't primitive value type structs (such as Int32) marked as readonly? In case of Int32, even its m_value field is not readonly though its not modified in the code.

andriysavin on 19 Feb 2018

👍2

bump and +1 to the notion/ask of optimizing the use of in for readonly structs. Seems like there should be a way to check this and add it automatically. Yeah I hear that it's complicated to do, but WE'RE DEVELOPERS MAN THIS IS WHAT WE DO!! 😄

Especially when you consider how oft-used ValueTuple is (or will be used), it seems like this should be a problem to solve to transparently add better performance across the board for those who wish to opt into it.

Of course, showing how much of a newb I am to this topic, I just now looked it up and ValueTuple is using writable fields for its "properties." My impression was that it was/is immutable, boh. It would seem that with the rise/popularity of immutability, along with the new staple of value-based tuples, there would be a further addition here to help enlist into the performance magic.

So yes, I think that's a feature request? 😆

Mike-E-angelo on 12 Mar 2018

Whoever is interested in in modifier and readonly structs might find this blog post from Sergey Teplyakov interesting as well.

Quote conclusions from there:

The readonly structs are very useful from the design and the performance points of view.

If the size of a readonly struct is bigger than IntPtr.Size you should pass it as an in-parameter for performance reasons.

You may consider using the in-parameters for reference types to express your intent more clearly.

You should never use a non-readonly struct as the in parameters because it may negatively affect performance and could lead to an obscure behavior if the struct is mutable.

pkulikov on 12 Mar 2018

👍3

Overall I think that's a pretty reasonable set of recommendations.

jaredpar on 13 Mar 2018

I still have my question unanswered:

why aren't primitive value type structs (such as Int32) marked as readonly? In case of Int32, even its m_value field is not readonly though its not modified in the code.

Please, can somebody comment on this?

andriysavin on 24 Oct 2018

@andriysavin Thanks for pointing this out.

I'll make another PR to address that. In the meantime, here's the text I need to add:

While the unmanaged types are not explicitly marked as a readonly struct the compiler treats them as such and are considered readonly from the perspective of the language. The compiler knows them to be immutable which is even a stronger guarantee than readonly can provide.

BillWagner on 24 Oct 2018

@BillWagner I'm not sure we're talking about the same thing. I mean primitive types like Int32.

andriysavin on 24 Oct 2018

We are talking about the same concept. (I need to make an issue to clearly surface the term "unmanaged type" better). The term is defined in the Pointer types section of the C# language spec. The pertinent paragraph is this:

An unmanaged_type is any type that isn't a reference_type or constructed type, and doesn't contain reference_type or constructed type fields at any level of nesting. In other words, an unmanaged_type is one of the following:

sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal, or bool.

Any enum_type.

Any pointer_type.

Any user-defined struct_type that is not a constructed type and contains fields of unmanaged_types only.

BillWagner on 24 Oct 2018

why aren't primitive value type structs (such as Int32) marked as readonly? In case of Int32, even its m_value field is not readonly though its not modified in the code.

They are in current .NET Core builds: https://github.com/dotnet/coreclr/blob/master/src/System.Private.CoreLib/shared/System/Int32.cs

mikedn on 24 Oct 2018

Ping @jaredpar @agocke

See the above two comments. How do we want to resolve this for the docs and the spec?

BillWagner on 24 Oct 2018

@mikedn Yes, saw that too, and that's strange this wasn't done at the same time as readonly feature was introduced.
@BillWagner thanks for explaining. Why I was concerned with this in the first place is because I was observing defensive copies with primitive types at IL level when calling something like Int32.ToString(). Maybe compiler (JIT Compiler?) removes those defensive copies because, as you said, it handles them specially, but this doesn't look obvious.

andriysavin on 24 Oct 2018

Why I was concerned with this in the first place is because I was observing defensive copies with primitive types at IL level when calling something like Int32.ToString().

AFAIK that's the reason why there made readonly in the end. Though the C# compiler should now that they are readonly, it seems that it didn't always take advantage of that and it probably made more sense to make these types readonly than to add more special casing to the compiler.

Maybe compiler (JIT Compiler?) removes those defensive copies because, as you said, it handles them specially, but this doesn't look obvious.

The JIT can't do much about this. It does not know about readonly and it doesn't know that ToString does not modify the value unless ToString inlines. And even then, it may have problems figuring out that copies aren't needed. Once the address of variable is taken things can get complicated.

mikedn on 24 Oct 2018

👍1

In theory C# compiler is even further than JIT from seeing the actual implementation of int.ToString() or double.GetHashCode(), so it cannot assume the methods are not mutating.

Even if mutations would not make sense, there are no specs either C# or CLR that demand or promise that.

We actually went on the route of treating Nulable<T> as readonly. Nullable<T> instances can be large (if T is large), so readonliness would be desirable. And then we had to dial that back since it was demonstrated that it is in fact possible to write code that (ab)uses writeability of the T field.

As I remember it was about at that point that we changed our mind about specialcasing primitives as well.
Basically we realized that there is _cost_ :

additional burden of specing/justifying the special casing in the language spec.
if CLR/FX would annotate primitives as readonly, the above hack is unnecessary but has to stay in the language spec forever.
there is a risk of our assumptions being wrong in some corner case or on some runtime.
After seeing a good deal of alternative/experimental CLR runtimes, mutable System.Int32 or System.Double would amuse me, but would not be very shocking to see :-)

At the same time the _benefits_ from avoiding copying tiny register-size values is diminishingly small.

Annotating primitive types as readonly (if implementation indeed never writes the fields) is the right thing to do IMO.

VSadov on 24 Oct 2018

👍2

Was this page helpful?

0 / 5 - 0 ratings