Runtime: Codegen for multi-dimensional arrays is poor

Created on 22 Apr 2020  Â·  9Comments  Â·  Source: dotnet/runtime

The overall codegen for multi-dimensional arrays is poor and has worse performance than a jagged array, even though the former is linear in memory.
This has resulted in multiple StackOverflow posts on the topic and even Unity recommending that multi-dimensional arrays not be used: https://docs.unity3d.com/2019.3/Documentation/Manual/BestPracticeUnderstandingPerformanceInUnity8.html

It looks like there are a few obvious issues that should hopefully be trivially handled:

  1. The Array.GetLength() method currently generates fairly unoptimal code. It actually winds up as a call rather than just a lookup to the underlying array data. Ideally these would be straight lookups when the dimension being looked up is constant and definitely in bounds of the rank.
  2. The JIT doesn't appear to be eliding bounds checks against Array.GetLength(), whether cached or otherwise
  3. The JIT doesn't recognize that for (y = 0; y < yLength; y++) { for (x = 0; x < xLength; x++) { data[x, y]; } } results in linear access of the underlying data

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACMhgYQYG8aHunGBLAHYYGAZWz4ADgBsY5ABSCMAbTQBdBgBNsGbAEoOXHkb4AzBnK06GfXAwEBXKVP3EA7AwAMAbkNHuvv0UGWFxHYQBeTx9qPx4AoxNocyCATwZI7wY0gB5NbWwAOgBxGAwAGRgBAHMMAAs5cl0vLIBqFt14nk4Y2L9EqGShBgR0qO4R3MtCkvLKmvqPJuG2jp7ers717hCwhhbIqaUENCzVaK2eAF9Nhmu1vxu3YJhQqQxzq4CA+mshsUkZKQFEIVOopvpuutTOYptZbA4nC53N4bjcgjs3qMUfcjDcAG7YAYpCrVOqjKbFUok+YNJr4wnDalkg75SmzUkLOk4uLc7j9QbCNIZZo5LJM2oilY3SEXBj84HCEbC8YMXIIcXNBBS3l+GWyngY4T7PI6I4nFJnG5GO7rG29R7uQ0fbh2/z3H5Bf7SGB0BWgk16Aw66EWfJwuyOZxMZHOh7BoaGrGx3E6gkDdVzZkBtniuSLZM8NNizO1cmsma5xoFt3reVBJVjYaqxklzXa9Z6rZ1oZCsai4mt1rtK0bHXrRPGw7HU7VvyuueonVPJ0BefffhDCsljx+tTDCEO4Y5kt5rnWr7ujfCLcc+SKf0IA9L9wIY8c2mxu6XIA

category:cq
theme:md-arrays
skill-level:expert
cost:large

area-CodeGen-coreclr

Most helpful comment

@jkotas It's my understanding that neither C# nor VB allow non-zero lower bounds. Could we consider a breaking change to .NET Core to eliminate the non-zero lower bounds for multi-dimensional arrays?

All 9 comments

The JIT doesn't appear to be eliding bounds checks against Array.GetLength()

Multidimensional arrays have lower-bounds. It makes eliding bounds checks against Array.GetLength() impossible (without loop cloning for arrays with zero lower bounds). It would need to be against Array.GetLowerBound/GetUpperBound.

It looks like the same issues exist with GetLowerBound and GetUpperBound. They are calls when they could be direct reads (like is done for the bounds checks) and there is no eliding happening.

@jkotas It's my understanding that neither C# nor VB allow non-zero lower bounds. Could we consider a breaking change to .NET Core to eliminate the non-zero lower bounds for multi-dimensional arrays?

I think we can entertain that option. We would need data about the impact and benefits. https://apisof.net/catalog/System.Array.CreateInstance(Type,Int32(),Int32()) shows these arrays may be used by 3.2% apps.

If completely eliminating non-zero-based multidimensional arrays is being considered, a softer breaking change could be just to prevent casting them to zero-based multidimensional arrays. This way, such arrays could still be used as Array, but would no longer be valid C# multidimensional arrays, so bounds check elimination when GetLength() is used would become possible for those.

E.g. consider this code:

```c#
var a = (int[,])Array.CreateInstance(typeof(int), new[] { 1, 1 }, new[] { 1, 1 });
_ = a[0,0];

The relevant parts of [the IL](https://sharplab.io/#v2:C4LglgNgPgAgTARgLAChXwAQGFUG9UaEYwAsGAsgBQCUBR+KRTGAbgIYBOGbGAvBpTAA7YAG0ANAF1qAQQ4c2ATwB0WDgFM2wdQEkhAZ2BshAY3WVgigA7qA9gDNBI6uIxD1Ad1GSMuDAlcEDABfV3cvHz8A/xDqAG46ZgwAfT5uUQAGcQzJBMYiYNRgoA==) are:

```il
…
call class [System.Private.CoreLib]System.Array [System.Private.CoreLib]System.Array::CreateInstance(class [System.Private.CoreLib]System.Type, int32[], int32[])
castclass int32[0..., 0...]
…
call instance int32 int32[0..., 0...]::Get(int32, int32)

Today, this throws at the call to the indexer/Get: notice that even though castclass specifies a zero-based array, this is ignored and the cast succeeds.

If non-zero-based arrays were eliminated, this would presumably already throw at the call to CreateInstance.

Under my suggestion, this would throw at the cast to int[,].

The multidimensional arrays with and without lower bounds use the same type. The lower bounds in the signatures have no meaning at runtime, they are for compile-time only (e.g. like nullable annotations). Preventing casting would require introducing a distinct type for these that seems hard to reason about.

If we were to consider introducing new types, we may also look at multidimensional spans (e.g. Span2, Span3, Span4, 5+ dimensions are probably too rare to bother with) that would also address the problem with accessing unmanaged multidimensional memory efficiently.

See also #5481. We don't recognize and optimize invariant parts of md array address computations early enough; doing so would at least bring perf back in line with what one could get with jit64. Impact of that is probably more significant than eliding bounds checks.

I'm going to mark this as future. We may have some opportunity to look at improvements here for 5.0, but not sure yet if this will make the cut.

This need to happen in .NET 6.0

Was this page helpful?
0 / 5 - 0 ratings

Related issues

omajid picture omajid  Â·  3Comments

noahfalk picture noahfalk  Â·  3Comments

aggieben picture aggieben  Â·  3Comments

iCodeWebApps picture iCodeWebApps  Â·  3Comments

bencz picture bencz  Â·  3Comments