The overall codegen for multi-dimensional arrays is poor and has worse performance than a jagged array, even though the former is linear in memory.
This has resulted in multiple StackOverflow posts on the topic and even Unity recommending that multi-dimensional arrays not be used: https://docs.unity3d.com/2019.3/Documentation/Manual/BestPracticeUnderstandingPerformanceInUnity8.html
It looks like there are a few obvious issues that should hopefully be trivially handled:
Array.GetLength() method currently generates fairly unoptimal code. It actually winds up as a call rather than just a lookup to the underlying array data. Ideally these would be straight lookups when the dimension being looked up is constant and definitely in bounds of the rank.Array.GetLength(), whether cached or otherwisefor (y = 0; y < yLength; y++) { for (x = 0; x < xLength; x++) { data[x, y]; } } results in linear access of the underlying datahttps://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACMhgYQYG8aHunGBLAHYYGAZWz4ADgBsY5ABSCMAbTQBdBgBNsGbAEoOXHkb4AzBnK06GfXAwEBXKVP3EA7AwAMAbkNHuvv0UGWFxHYQBeTx9qPx4AoxNocyCATwZI7wY0gB5NbWwAOgBxGAwAGRgBAHMMAAs5cl0vLIBqFt14nk4Y2L9EqGShBgR0qO4R3MtCkvLKmvqPJuG2jp7ers717hCwhhbIqaUENCzVaK2eAF9Nhmu1vxu3YJhQqQxzq4CA+mshsUkZKQFEIVOopvpuutTOYptZbA4nC53N4bjcgjs3qMUfcjDcAG7YAYpCrVOqjKbFUok+YNJr4wnDalkg75SmzUkLOk4uLc7j9QbCNIZZo5LJM2oilY3SEXBj84HCEbC8YMXIIcXNBBS3l+GWyngY4T7PI6I4nFJnG5GO7rG29R7uQ0fbh2/z3H5Bf7SGB0BWgk16Aw66EWfJwuyOZxMZHOh7BoaGrGx3E6gkDdVzZkBtniuSLZM8NNizO1cmsma5xoFt3reVBJVjYaqxklzXa9Z6rZ1oZCsai4mt1rtK0bHXrRPGw7HU7VvyuueonVPJ0BefffhDCsljx+tTDCEO4Y5kt5rnWr7ujfCLcc+SKf0IA9L9wIY8c2mxu6XIA
The JIT doesn't appear to be eliding bounds checks against Array.GetLength()
Multidimensional arrays have lower-bounds. It makes eliding bounds checks against Array.GetLength() impossible (without loop cloning for arrays with zero lower bounds). It would need to be against Array.GetLowerBound/GetUpperBound.
It looks like the same issues exist with GetLowerBound and GetUpperBound. They are calls when they could be direct reads (like is done for the bounds checks) and there is no eliding happening.
@jkotas It's my understanding that neither C# nor VB allow non-zero lower bounds. Could we consider a breaking change to .NET Core to eliminate the non-zero lower bounds for multi-dimensional arrays?
I think we can entertain that option. We would need data about the impact and benefits. https://apisof.net/catalog/System.Array.CreateInstance(Type,Int32(),Int32()) shows these arrays may be used by 3.2% apps.
If completely eliminating non-zero-based multidimensional arrays is being considered, a softer breaking change could be just to prevent casting them to zero-based multidimensional arrays. This way, such arrays could still be used as Array, but would no longer be valid C# multidimensional arrays, so bounds check elimination when GetLength() is used would become possible for those.
E.g. consider this code:
```c#
var a = (int[,])Array.CreateInstance(typeof(int), new[] { 1, 1 }, new[] { 1, 1 });
_ = a[0,0];
The relevant parts of [the IL](https://sharplab.io/#v2:C4LglgNgPgAgTARgLAChXwAQGFUG9UaEYwAsGAsgBQCUBR+KRTGAbgIYBOGbGAvBpTAA7YAG0ANAF1qAQQ4c2ATwB0WDgFM2wdQEkhAZ2BshAY3WVgigA7qA9gDNBI6uIxD1Ad1GSMuDAlcEDABfV3cvHz8A/xDqAG46ZgwAfT5uUQAGcQzJBMYiYNRgoA==) are:
```il
…
call class [System.Private.CoreLib]System.Array [System.Private.CoreLib]System.Array::CreateInstance(class [System.Private.CoreLib]System.Type, int32[], int32[])
castclass int32[0..., 0...]
…
call instance int32 int32[0..., 0...]::Get(int32, int32)
Today, this throws at the call to the indexer/Get: notice that even though castclass specifies a zero-based array, this is ignored and the cast succeeds.
If non-zero-based arrays were eliminated, this would presumably already throw at the call to CreateInstance.
Under my suggestion, this would throw at the cast to int[,].
The multidimensional arrays with and without lower bounds use the same type. The lower bounds in the signatures have no meaning at runtime, they are for compile-time only (e.g. like nullable annotations). Preventing casting would require introducing a distinct type for these that seems hard to reason about.
If we were to consider introducing new types, we may also look at multidimensional spans (e.g. Span2, Span3, Span4, 5+ dimensions are probably too rare to bother with) that would also address the problem with accessing unmanaged multidimensional memory efficiently.
See also #5481. We don't recognize and optimize invariant parts of md array address computations early enough; doing so would at least bring perf back in line with what one could get with jit64. Impact of that is probably more significant than eliding bounds checks.
I'm going to mark this as future. We may have some opportunity to look at improvements here for 5.0, but not sure yet if this will make the cut.
This need to happen in .NET 6.0
Most helpful comment
@jkotas It's my understanding that neither C# nor VB allow non-zero lower bounds. Could we consider a breaking change to .NET Core to eliminate the non-zero lower bounds for multi-dimensional arrays?