Runtime: Proposal: Array.Clone for creating a copy of an array.

Created on 5 Nov 2016 · 33Comments · Source: dotnet/runtime

Sometimes you need to create a copy of an array. For example, one reason is arrays are mutable and sometimes you need to create a defensive copy, or you may want to modify some of the indices in it but keep the original array intact. There are a couple of ways of doing this right now:

Approach 1

var array = new T[original.Length];
Array.Copy(original, 0, array, 0, original.Length);
return array;

The downside of this approach is that it's verbose (compared to the second approach below), and can't be used in things like expression-bodied methods where you want a one-liner to clone an array. It is, however, fast since we go straight to allocating an array and calling Copy.

Approach 2

return (T[])original.Clone();

While this can be used in expression-bodied methods, the programmer has to specify T[] again to cast, which is prone to error since Clone returns an object. In addition, Clone is slightly slower than Copy and we have to make that extra typecast.

Approach 3 (Linq)

return original.ToArray();

This can also be used in expression-bodied methods and is type-safe, however it is slow. Several casts are made by ToArray, then it makes virtual method calls to Count/CopyTo which do the same thing as approach 1 under the hood.

Proposal

Add a static Array.Clone method:

namespace System
{
    public abstract class Array : ...
    {
        public static T[] Clone<T>(T[] array);
    }
}

Usage:

class Customer
{
    Order[] _orders;

    public Order[] GetOrders() => Array.Clone(_orders);
}

The implementation would be trivial and do the same thing as example 1:

public static T[] Clone<T>(T[] array)
{
    var array = new T[original.Length];
    Array.Copy(original, 0, array, 0, original.Length);
    return array;
}

api-needs-work area-System.Runtime up-for-grabs

Source

jamesqo

👍6

Most helpful comment

@mellinoe

To me, the higher order bit to consider is whether a given enforcement of non-null is helping or hurting. The case where missing validation is hurting is when it hides bugs. For example, operations that have side effects should generally validate arguments instead of silently no-oping.

Operations like Array.Clone that return null values are hardly hiding bugs. If subsequent code can't deal with null, the user gets a clean null-reference exception.

On the other hand, accepting and returning null avoids having to special case situations that would be perfectly well defined anyways.

terrajobst on 16 Nov 2016

👍2

All 33 comments

Sounds reasonable to me. Not sure how many people would use it though ...
Let's get API review feedback on the proposal.

karelz on 5 Nov 2016

Makes sense.

terrajobst on 15 Nov 2016

What are the semantics when the array is null? Should it return null or raise an exception? If only non-null values are permitted, then rather than static, this could be an instance method. That way the null check as well as potential exception could be avoided.

bendono on 16 Nov 2016

I know accepting null as the this to an extension isn't the common pattern, but I think it would be more useful.

JonHanna on 16 Nov 2016

To be clear, Array.Clone would be a static method and not an extension method. It can't be an extension method because an instance method with the same signature exists and thus will be preferred by the compiler.

I think accepting and returning null would be useful.

terrajobst on 16 Nov 2016

@JonHanna, @terrajobst Why do you think returning null for null would be useful? If this is meant to supplant usages of (T[])array.Clone(), and if that throws a (nullref) exception today if array is null, then wouldn't it make sense for Array.Clone(array) to also throw if array is null?

jamesqo on 16 Nov 2016

I was thinking of cases where I've implemented Clone() for compound objects. It's much easier if you can simply call the Array.Clone(array) method without having to worry whether the array is null or not. Cloning a null array is well-defined by returning null.

terrajobst on 16 Nov 2016

👍2

Yes, generally how I'm going to treat null falls into one of the following:

I've my own special case rule — the approaches are equal, because I have to test anyway.
I want null for null — just passing null and getting it back is easier than having to test to avoid the ANE.
A null shouldn't be there and will eventually error. ANE sooner than later is slightly better, but not a lot; it's still going to error either way.
The null won't be there. The ANE check is a waste. A tiny waste, but a waste all the same.

More philosophically, I prefer things to work with null if it makes any sort of conceptual sense. To my mind it's perfectly sensible that a clone of null is null.

JonHanna on 16 Nov 2016

@terrajobst @JonHanna Makes sense. I'll update the PR to return null, then.

jamesqo on 16 Nov 2016

👍1

I'll throw in a dissenting opinion here and say that null shouldn't be accepted as input. It's a small distinction and obviously we could define the semantics/document it however we want, but it seems semantically and conceptually cleaner to require the parameter to be a real, non-null array.

To my mind it's perfectly sensible that a clone of null is null.

This is wandering into personal preference territory, I guess, but I feel sort of the opposite. null isn't actually an array, so passing it to a method taking an array doesn't make conceptual sense to me.

mellinoe on 16 Nov 2016

👍1

@jkotas @stephentoub Do you have an opinion on this?

jamesqo on 16 Nov 2016

@mellinoe

Operations like Array.Clone that return null values are hardly hiding bugs. If subsequent code can't deal with null, the user gets a clean null-reference exception.

On the other hand, accepting and returning null avoids having to special case situations that would be perfectly well defined anyways.

terrajobst on 16 Nov 2016

👍2

Sure, we can define the semantics however we want. And like I said, it's a pretty small distinction anyways. But to me, Clone()ing a null object doesn't make conceptual sense. If I saw some code relying on that behavior, I might be slightly confused for a bit, and have to re-evaluate my assumptions. I'm just throwing my opinion in here, if others feel differently then it's no big deal.

If subsequent code can't deal with null, the user gets a clean null-reference exception.

I dunno; maybe. That's rather simplifying things. Perhaps your code can handle a large variety of circumstances involving null, empty, various-sized arrays, and you make the assumption that Array.Clone doesn't give you back a null value. Maybe you're in a weird state now that you didn't anticipate. I think it's more nuanced than either "everything works fine" or "you get a clean, easy-to-track-down NRE right where you expect it".

Anyways, I'm happy to agree with whatever others feel is the right call; just wanted to get my thoughts out there.

mellinoe on 16 Nov 2016

Can we take a step back? Why are we adding this method? At this point, its implementation is just going to call the instance Clone and do a cast, so the only value the method provides is doing the cast for you. But it also has to do a null check, it incurs generic overheads, and as a new method it's not portable. Doesn't seem like enough value to me.

All that said, if everyone else still wants to add it, I'd vote for null being an error.

stephentoub on 17 Nov 2016

Yes, the primary value this method offers is syntactic sugar by preserving the type of the array instead of returning object.

terrajobst on 17 Nov 2016

syntactic sugar

In this particular case, it is a syntactic sugar mixed with salt because of it is not extension method (for source compat) that it should really be for good flow. I agree with @stephentoub that it does not have enough value.

jkotas on 17 Nov 2016

👍1

Why do you say that? Especially on array instance methods aren't common. Most of the functionality is provided as static methods on Array.

terrajobst on 17 Nov 2016

@jkotas, if we were to add an extension method, does array.Duplicate sound like a good name?

jamesqo on 17 Nov 2016

Why do you say that? Especially on array instance methods aren't common

Clone is existing instance method on array and many other types. If you got int[] a and start typing a.Clone(), the intelisense will give you the existing one that returns object. The static Clone would be pretty hard to discover.

The existing Clone() with cast does not look that bad, considering all alternatives. There are number of instance Clone() methods in the framework that return less derived type, and it is not uncommon to see them together with cast.

The array surface was designed in v1 before framework designed guidelines, generics, extension methods, and other goodies. I believe it would have been quite different if it is designed today.

does array.Duplicate sound like a good name

It has the same discoverability problem.

jkotas on 17 Nov 2016

@jkotas

If you got int[] a and start typing a.Clone(), the intelisense will give you the existing one that returns object.

I get that. But what I'm trying to say is that we've trained developers to look for static methods on Array. Of course, an instance method would probably be more discoverable, but at the same time I think not having to type the extra parentheses and the type names is still worth it -- you just have to learn which method you need to call. Less ideal, but given how important arrays are not outrageously bad either.

Does array.Duplicate sound like a good name?

We should most certainly not introduce a method using a different name. Being consistent with terminology is key. We've learned that the hard way with things like Close vs. Dispose.

terrajobst on 17 Nov 2016

In well written code, a developer should know if a value could be null.
It's a nice goal to aspire to, but real world code is not always so clean as it should be.
If this static method does not accept nulls, I imagine that it will generally need to be used as follows:
var copy = array != null ? Array.Clone(array) : null;
It's not too bad--and there are shorter ways to write it--, but I see it as advantageous to avoid this this boiler-plate code and simply write the following, knowing full well that null is possible and handle it later if/when needed:
var copy = Array.Clone(array);
Essentially, I want a clone of whatever I currently have. If it is null, then so be it. I'll figure it out later.

bendono on 17 Nov 2016

👍1

@stephentoub and @jkotas

Any push back on the API? I know you guys said you don't think it adds enough value, but it' clear folks disagree. So unless you strongly think we shouldn't add this API I'm inclined to accept this API as proposed.

terrajobst on 17 Nov 2016

👍1

Any push back on the API?

I still think it's silly to add. Any new API we add increases the differences between the various .NET implementations. It's one thing to do so for things that really move the needle; this doesn't. There's also the extra costs involved in it being a generic method. There's potential confusion around instance vs static Clone. Etc. Maybe these are minor concerns, but the value of the API is also so incredibly minor, just doing a cast for you.

stephentoub on 17 Nov 2016

I do not think we should add this API.

jkotas on 17 Nov 2016

👍1

I'm really interested if we can get a real boost (which if possible would likely need to be at the level of the array instance method C++ code). Otherwise I'd rather have the syntactic vinegar of an explicit cast to the sugar of a hidden cast.

JonHanna on 17 Nov 2016

@jkotas As @JonHanna mentions, it doesn't have to be a wrapper around the existing Clone. I don't know as much about the current Clone as you do, but presumably there are things like checks in the C++ code to handle multi-dim arrays, non-zero based arrays, etc. If we added this API, we could remove those checks since we know the input is an SZ array. Also, since this would be implemented in C++ we would have direct access to the Array.Copy implementation for example, and be able to shave off a few method calls.

jamesqo on 18 Nov 2016

If the implementation can be made meaningfully faster (and the instance method can't), then I'd understand the method being added when those improvements come, not before.

stephentoub on 18 Nov 2016

Agree.

IMHO, it would pretty hard to demonstrate that the generic API has significant performance advantage over the non-generic one, everything else being equal.

If we added this API, we could remove those checks since we know the input is an SZ array

The existing non-generic implementation does not have these checks already. Also, the existing non-generic implementation can be optimized quite a bit.

this would be implemented in C++

C++ does not help you much here. And we want to move to a place where this is implemented in C# for the most part anyway.

jkotas on 18 Nov 2016

BTW: This shares similar set of problems as the generic version of Array.Copy. I believe it is hard to demonstrate significant performance advantage for the generic Array.Copy for similar reasons.

jkotas on 18 Nov 2016

@jkotas

I believe it is hard to demonstrate significant performance advantage for the generic Array.Copy for similar reasons.

Is it? GetLowerBound(0) removed?

jamesqo on 18 Nov 2016

You can restructure the implementation a bit to make the GetLowerBound(0) call go away, without changing the public surface.

jkotas on 18 Nov 2016

Better performance is really the only good reason to add this API, but there is no evidence that it really does that. We can reconsider if the generic implementation is proved to provide significant performance advantage over the existing non-generic one (and equivalent optimization cannot be reasonably applied to the existing non-generic implementation).

jkotas on 18 Nov 2016

It sounds like while the API looks fine, the implementation would not provide any performance improvements over the current solution.

I am going to close this for now but feel free to reopen if you come up with a way to make the new method faster than the existing solution.