Runtime: Add Unsafe.ByteOffset<T>(object obj, ref T target)

Created on 7 Nov 2016 · 30 comments · Source: dotnet/runtime

In order to support

Span.DangerousCreate(object obj, ref T rawPointer, int length);

for general objects (not just arrays), we'll need a pair of Unsafe APIs: one to compute the byte offset of rawPointer from the start of obj, and one to turn such an offset back into a reference.

I expect they would look like this:

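// Returns the byte offset of "target" (presumably a field or array element within "origin")
// relative to the actual start of the object, including the runtime-dependent object header
// (i.e. we start counting from the address of the vtable pointer, not the "first field").
// The method bodies below are IL.
IntPtr ByteOffset<T>(object origin, ref T target)
{
    ldarg.1   // target
    ldarg.0   // origin
    sub       // target - origin
    ret
}

// Returns a ref to the location "byteOffset" bytes past the start of "source"
// (the inverse of ByteOffset).
ref T AddByteOffset<T>(object source, IntPtr byteOffset)
{
    ldarg.0   // source
    ldarg.1   // byteOffset
    add       // source + byteOffset
    ret
}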

In addition to supporting DangerousCreate(), this would be needed to support non-unified empty spans, because:

- I can't use the expression "ref a[a.Length]". The ECMA standard says it's legal to talk about such a reference, but ldelema still throws an IndexOutOfRangeException.
- I can't use my existing workaround of setting the base to null, since we decided we want the base to reflect the actual, distinct location even for empty spans.
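A small sketch of the first point, runnable on current .NET: in C#, "ref a[i]" compiles to ldelema, whose bounds check rejects i == a.Length before any memory is touched. PastEndProbe and RefPastEndThrows are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class PastEndProbe
{
    // Returns true if taking a reference one element past the end of "a" throws.
    // "ref a[i]" is emitted as ldelema, which bounds-checks i against a.Length,
    // so even a reference that is never read or written is rejected.
    public static bool RefPastEndThrows(int[] a)
    {
        try
        {
            ref int end = ref a[a.Length];
            return false; // not expected to be reached
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }
}
```

The same check is why "ref a[0]" cannot serve as the base reference for an empty array.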

api-needs-work area-System.Runtime.CompilerServices


ghost

All 30 comments

I do not think that the IL that you have suggested to implement this is valid. There is no implicit conversion from object to pointer in ECMA spec. Our checked JIT will assert on it, and retail JIT will likely crash in some cases. The ECMA spec says that object references and managed pointers are not interchangeable, e.g.: "In particular, references shall only be used on operations that indicate that they operate on reference type.".

I think this API would need to go through extra hops to get around it (pin it, etc.). But once you do that it may lose its nice performance characteristics and become useless for Span.

jkotas on 7 Nov 2016

I was playing with this last night, and it seems like we'd have to do something special for different types of objects, but I'm not 100% sure any of this is portable, since it depends on the layout of each kind of object.

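``` C#
// Assumes: using System; using System.Runtime.CompilerServices;
// Returns a ref to the first "data" slot of obj by hard-coding the (runtime-specific)
// layouts of arrays and strings; any other object is treated as starting with its first field.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static ref T GetDataRef<T>(object obj)
{
    if (obj is Array)
    {
        // Skip the pointer-sized length slot to reach element 0.
        return ref Unsafe.As<byte, T>(ref Unsafe.As<ArrayData>(obj).Data);
    }

    if (obj is string)
    {
        // Skip the int length to reach the first char.
        return ref Unsafe.As<char, T>(ref Unsafe.As<StringData>(obj).Data);
    }

    // General object: the first field directly after the object header.
    return ref Unsafe.As<byte, T>(ref Unsafe.As<RawData>(obj).Data);
}

private class RawData
{
    public byte Data;
}

private class ArrayData
{
    public IntPtr Length;
    public byte Data;
}

private class StringData
{
    public int Length;
    public char Data;
}
```

Then:

``` C#
public static Span<T> DangerousCreate(object obj, ref T pointer, int length)
{
    // Offset of "pointer" relative to the object's data start (not its header).
    var offset = Unsafe.ByteOffset(ref GetDataRef<T>(obj), ref pointer);
    return new Span<T>(obj, offset, length);
}
```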

davidfowl on 7 Nov 2016

@jkotas Hmm... do you have any suggestions on how we might support DangerousCreate() then? The current code in corefxlab seems to rely on this principle (https://github.com/dotnet/corefxlab/blob/master/src/System.Slices/System/UnsafeUtilities.cs#L78), albeit with some major "I can't believe I'm doing this" comments. If this is incorrect (and prone to crashes in the JIT), which I could believe, what's the best alternative?

ghost on 7 Nov 2016


@davidfowl

I'm already doing something similar for arrays, except that rather than hard-coding the shape of the array header, SpanHelpers does a one-time measurement using a pinned byte array as a sample.

This still involves a non-portable assumption (that array elements can be reached by skipping a fixed-size header from the "root" of the object, and that the header size is the same for every T[], regardless of T). But that's slightly safer than hard-coding the structure of the header itself.
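A minimal sketch of such a one-time measurement, written against today's System.Runtime.CompilerServices.Unsafe and assuming, as this thread does, that reinterpreting an array as a class with a single byte field (RawData) is tolerated by the runtime. ArrayLayout, ElementOffset and FirstElement are hypothetical names. Unlike the description above, the sample array is not pinned here, since both operands of ByteOffset are managed references into the same object.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper; names are illustrative only.
static class ArrayLayout
{
    // Any object reinterpreted as this class exposes its first field slot as Data.
    private sealed class RawData
    {
        public byte Data;
    }

    // Bytes from the first field slot of an array to element 0, measured once with a
    // byte[] sample. Assumes (non-portably) that the gap is identical for every T[].
    public static readonly IntPtr ElementOffset = Measure();

    private static IntPtr Measure()
    {
        byte[] sample = new byte[1];
        return Unsafe.ByteOffset(ref Unsafe.As<RawData>(sample).Data, ref sample[0]);
    }

    // Rebuilds a ref to element 0 of any T[] from the measured offset.
    // For an empty array the result points past the end and must not be dereferenced.
    public static ref T FirstElement<T>(T[] array) =>
        ref Unsafe.As<byte, T>(ref Unsafe.AddByteOffset(
            ref Unsafe.As<RawData>(array).Data, ElementOffset));
}
```

The measurement absorbs per-runtime differences in header size, provided the gap really is the same for every T[] as assumed above.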

ghost on 7 Nov 2016


The most reasonable compromise I can think of is:

private class RawData
{
    public byte Data;
}

public static Span<T> DangerousCreate(object obj, ref T pointer, int length)
{
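    // RawData.Data is a byte, so "pointer" is reinterpreted as ref byte to match.
    var offset = Unsafe.ByteOffset(ref Unsafe.As<RawData>(obj).Data, ref Unsafe.As<T, byte>(ref pointer));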
    return new Span<T>(obj, offset, length);
}
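To illustrate what this compromise gives for a general object, here is a sketch of the (object, offset) round trip using today's Unsafe APIs. FieldOffsets and Point3 are hypothetical names. Note that the offset is measured from the first field slot (RawData.Data) rather than from the object header as the original proposal intended, so it is only meaningful when converted back through the same RawData base.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helpers; names are illustrative only.
static class FieldOffsets
{
    // Any object reinterpreted as this class exposes its first field slot as Data.
    private sealed class RawData
    {
        public byte Data;
    }

    // Byte offset of "target" (a location inside "obj") from obj's first field slot.
    public static IntPtr Of<T>(object obj, ref T target) =>
        Unsafe.ByteOffset(ref Unsafe.As<RawData>(obj).Data, ref Unsafe.As<T, byte>(ref target));

    // Converts a stored (obj, offset) pair back into a ref T.
    public static ref T At<T>(object obj, IntPtr offset) =>
        ref Unsafe.As<byte, T>(ref Unsafe.AddByteOffset(ref Unsafe.As<RawData>(obj).Data, offset));
}

// Hypothetical sample type.
sealed class Point3
{
    public int X, Y, Z;
}
```

Because only the (object, offset) pair is stored, the GC remains free to move the object between the two calls; the ref is recomputed from the object's current address each time.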

jkotas on 7 Nov 2016

As a litmus test for what works and what does not, we may want to test on Mono: if something works on Mono, CoreCLR and the full framework, it should be good enough.

jkotas on 7 Nov 2016

This won't help in the short term, but if we standardize on public ref T DangerousGetPinnableReference(), maybe this gets easier? We can abstract away the layout differences and have Span<T> work on anything that implements this contract. Maybe we need an interface:

public interface IPinnable<T>
{
    ref T DangerousGetPinnableReference();
}

We could implement this on the objects we care about.

Thoughts?

davidfowl on 7 Nov 2016


Yes, we will need a standardized name like DangerousGetPinnableReference to get more compact pinning C# syntax for Spans (and other types). It should not be wrapped in an interface, though.

jkotas on 7 Nov 2016

This wouldn't be for pinning; it would be a way to avoid types like RawData.

public static Span<T> DangerousCreate<T>(object obj, ref T pointer, int length)
{
    IPinnable<T> p = obj as IPinnable<T>; // TODO: Error handling or fallback to RawData
    var offset = Unsafe.ByteOffset(ref p.DangerousGetPinnableReference(), ref pointer);
    return new Span<T>(obj, offset, length);
}

davidfowl on 7 Nov 2016


Seems cleaner than RawData...

davidfowl on 7 Nov 2016

@AtsushiKan I assume you didn't mean 1.1, which is now in deep escrow. Are you currently working on this issue? (If not, please unassign it from yourself.)

karelz on 7 Nov 2016

Looks like I'll go with a combination of RawData (to avoid the need to write the apparently unwritable ByteOffset<> overload) and sampling for the other data types (arrays, strings) to get their offset relative to RawData.Data.

ghost on 7 Nov 2016

Looks like I'll go with a combination of RawData (to avoid the need to write the apparently unwritable ByteOffset<> overload) and sampling for the other data types (arrays, strings) to get their offset relative to RawData.Data.

Umph. On further thought, that approach sounds just as invalid as "computing a ref" to the root of the object. In the array case, we'd be storing a managed pointer to the hidden "Length" field inside the array rather than to any element (on the MS CLR, that is; on third-party runtimes, who knows what's there). It worked in my quick test, but we already know this is one of those nondeterministic bugs. @jkotas, can you weigh in on this?

If this isn't usable, and we don't want to hard-code the layout of array and string headers, I'll have to go with DangerousGetPinnedReference doing "if (_obj as T[])" checks, then using "ref a[0]" as the "base" of the offset for arrays. But this causes a problem for zero-length arrays, where "ref a[0]" throws. We could just "re-base" those to a dummy object, but then the Span won't maintain its identity for equality purposes. That might be an acceptable edge case (how often do people care about the "identity" of an empty array?).

Thoughts, anyone?

ghost on 7 Nov 2016

What is the physical array and string layout in Mono? Is it different from .NET Core/.NET Framework? We may end up with split runtime specific implementations if there is no way to unify it.

jkotas on 7 Nov 2016

Just checked and it seems the array layout might be different (or I have buggy code):

https://gist.github.com/davidfowl/fe60827399423eb6b62a302a94052d35


davidfowl on 7 Nov 2016

OK, much simpler:

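``` C#
using System;
using System.Runtime.CompilerServices;

class Program
{
    static void Main(string[] args)
    {
        byte[] b = new byte[10];

        for (int i = 1; i <= 10; i++)
        {
            b[i - 1] = (byte)i;
        }

        // Dump 20 bytes starting at the first field slot after the object header:
        // the array header fields, then the elements; the last few reads may run
        // past the end of the array.
        for (int i = 0; i < 20; i++)
        {
            ref byte data = ref Unsafe.AddByteOffset(ref Unsafe.As<RawData>(b).Data, new IntPtr(i));
            Console.WriteLine(data);
        }
    }

    private class RawData
    {
        public byte Data;
    }
}
```

CLR:

```
10
0
0
0
1
2
3
4
5
6
7
8
9
10
0
0
0
0
0
0
```

Mono:

```
0
0
0
0
10
0
0
0
1
2
3
4
5
6
7
8
9
10
0
0
```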

davidfowl on 7 Nov 2016

I believe the extra 4 bytes on Mono are the pointer to the bounds array:

https://github.com/mono/mono/blob/master/mono/metadata/object-internals.h#L113

Looks like Mono uses a unified representation for SZ (single-dimensional, zero-based) and MD (multidimensional) arrays.

ghost on 7 Nov 2016

Ah cool. @AtsushiKan, what direction did you decide on for this one? Are you going to have a specific check for Mono and a different layout there? I spoke to @migueldeicaza today and he wants to implement this in Mono itself, so we may not need to do anything special here.

davidfowl on 8 Nov 2016


I think we'll have to go with runtime-specific layouts for arrays and strings. I don't see a better alternative. So this isn't really the "portable" Span but the "legacy MS CLR" Span. If all the newer platforms (and Mono) are rolling their own fast versions anyway, portability isn't as critical.

ghost on 8 Nov 2016

Maybe the same people from the previous thread should also be cc'ed? cc @nietras @benaadams @omariom @GSPP @mikedn @adamsitnik

jamesqo on 8 Nov 2016


@jkotas, @davidfowl's idea of having an IPinnable<T> interface would seem reasonable if the pinnable were passed as an extra generic parameter constrained to that interface, so that there is no runtime overhead. For example:

public static Span<T> DangerousCreate<TPinnable, T>(TPinnable p, ref T pointer, int length)
    where TPinnable : IPinnable<T>
{
    var offset = Unsafe.ByteOffset(ref p.DangerousGetPinnableReference(), ref pointer);
    return new Span<T>(p, offset, length);
}
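A self-contained sketch of the constrained-generic variant. Since Span<T> has no public (object, offset, length) constructor, it only computes and re-applies the offset; IPinnable<T>, PinnableOffsets and Cells are hypothetical names. It additionally constrains TPinnable to reference types, so DangerousGetPinnableReference() always returns a ref into a heap object rather than into a copy of a struct.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical contract: returns a ref to a fixed "base" location inside the object.
public interface IPinnable<T>
{
    ref T DangerousGetPinnableReference();
}

// Hypothetical helpers taking the pinnable as a constrained generic parameter.
public static class PinnableOffsets
{
    // Byte offset of "target" measured from p's pinnable reference.
    public static IntPtr ByteOffsetOf<TPinnable, T>(TPinnable p, ref T target)
        where TPinnable : class, IPinnable<T>
        => Unsafe.ByteOffset(ref p.DangerousGetPinnableReference(), ref target);

    // Re-applies a stored offset to p's pinnable reference.
    public static ref T At<TPinnable, T>(TPinnable p, IntPtr byteOffset)
        where TPinnable : class, IPinnable<T>
        => ref Unsafe.AddByteOffset(ref p.DangerousGetPinnableReference(), byteOffset);
}

// Hypothetical sample pinnable with two inline int fields.
public sealed class Cells : IPinnable<int>
{
    public int Head, Tail;
    public ref int DangerousGetPinnableReference() => ref Head;
}
```

The offset may be negative if the runtime lays Tail out before Head; ByteOffset and AddByteOffset both work with signed offsets, so the round trip holds either way.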

jamesqo on 8 Nov 2016

So how bad would pinning actually be?

Without these primitives, the perf-critical DangerousPinnedReference method has to switch on whether _object is a T[], a plain old object (thanks to DangerousCreate()), or a string (on ReadOnlySpan), plus whatever else we decide to build Spans around. And then there's the issue of hard-coding object header layouts and possibly having to accommodate Mono.

With them, DangerousPinnedReference is a one-liner that doesn't even have to special case for the _object == null case.

Is pinning really worse than that?

.method public hidebysig static native int ByteOffset<T>(object origin, !!T& target) cil managed aggressiveinlining
{
    .custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = ( 01 00 00 00 )
    .locals init (object pinned p)
    .maxstack 2
    ldarg.0
    stloc.0   // pin "origin" for the duration of the method
    ldarg.1
    ldloc.0
    sub       // target - origin
    ret
} // end of method Unsafe::ByteOffset

.method public hidebysig static !!T& AddByteOffset<T>(object source, native int byteOffset) cil managed aggressiveinlining
{
    .custom instance void System.Runtime.Versioning.NonVersionableAttribute::.ctor() = ( 01 00 00 00 )
    .locals init (object pinned p)
    .maxstack 2
    ldarg.0
    stloc.0   // pin "source" for the duration of the method
    ldloc.0
    ldarg.1
    add       // source + byteOffset
    ret
} // end of method Unsafe::AddByteOffset

ghost on 10 Nov 2016

So how bad would pinning actually be?

You would need to measure ... the full framework JITs won't inline anything with pinning.

ByteOffset, AddByteOffset

What about calling these GetFieldOffset and GetFieldAtOffset?

jkotas on 10 Nov 2016

What about calling these GetFieldOffset and GetFieldAtOffset?

We'd be using these for array elements and string characters as well as "fields."

ghost on 10 Nov 2016

We'd be using these for array elements and string characters as well as "fields."

Yes, and perhaps other scenarios too, so I think the naming is fine.

However, I do not understand the need for int ByteOffset<T>(object origin, !!T& target). As far as I can tell this can be solved with the existing Unsafe API surface (via As and ByteOffset(ref,ref)), which also appears to be what is done in the current implementation of "portable" Span.

@jkotas did show you could do something like:

public struct Span<T>
{
    [StructLayout(LayoutKind.Sequential)]
    private sealed class Pinnable
    {
        public T Data;
    }

    private readonly Pinnable _pinnable;
    private readonly IntPtr _byteOffset;

    // Managed case: _byteOffset is relative to _pinnable.Data.
    // Native case (_pinnable == null): _byteOffset holds the raw address.
    public unsafe ref T DangerousGetPinnableReference
    {
        get
        {
            if (_pinnable != null)
                return ref Unsafe.AddByteOffset(ref _pinnable.Data, _byteOffset);
            return ref Unsafe.AsRef<T>((void*)_byteOffset);
        }
    }
}

If byte offsets for managed types are always stored relative to _pinnable.Data, then what do we need int ByteOffset<T>(object origin, !!T& target) for?
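A runnable sketch of the representation described above, limited to the managed (array) case: a hypothetical PortableSlice<T> stores a Pinnable<T> alias of the array plus the byte offset of element 0 measured from Pinnable<T>.Data, so no header size is hard-coded. Pinnable<T> and PortableSlice<T> are illustrative names, not the shipped System.Memory types, and the native-pointer branch is left out.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical alias class: any T[] reinterpreted as this exposes its first field slot as Data.
internal sealed class Pinnable<T>
{
    public T Data;
}

// Hypothetical (pinnable, byteOffset, length) slice over an array; managed case only.
internal readonly struct PortableSlice<T>
{
    private readonly Pinnable<T> _pinnable;
    private readonly IntPtr _byteOffset;
    public int Length { get; }

    public PortableSlice(T[] array)
    {
        _pinnable = Unsafe.As<Pinnable<T>>(array);
        // Offset of element 0 from Pinnable<T>.Data; left at zero for an empty array,
        // which is never dereferenced because Length is 0.
        _byteOffset = array.Length == 0
            ? IntPtr.Zero
            : Unsafe.ByteOffset(ref _pinnable.Data, ref array[0]);
        Length = array.Length;
    }

    public ref T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)Length)
                throw new IndexOutOfRangeException();
            ref T first = ref Unsafe.AddByteOffset(ref _pinnable.Data, _byteOffset);
            return ref Unsafe.Add(ref first, index);
        }
    }
}
```

Only the array reference and an offset are stored, so, as discussed above, no separate object-based ByteOffset overload is needed for this case.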

nietras on 11 Nov 2016

There is no implicit conversion from object to pointer in ECMA spec.

@jkotas just to be sure, is there really no way to convert an object o to a ref directly? Given that you know the offset/header size of the object, one can easily create a ref to the same "location" as the object itself, e.g. via SubtractByteOffset. So perhaps the implicit conversion is "illegal", but is the resulting ref valid?

Additionally, can the ref be null?

nietras on 11 Nov 2016

So perhaps the implicit conversion is "illegal" but is the ref valid?

Depends on the runtime implementation. It should be valid in the current implementations of .NET Framework / .NET Core, but it is untested behavior. There are no tests to verify that everything works well with it.

Additionally, can the ref be null?

Yes, in IL. You cannot get it via regular C#.
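On later runtimes this no longer requires hand-written IL: .NET 5 added Unsafe.NullRef<T>() and Unsafe.IsNullRef<T>(ref T) to System.Runtime.CompilerServices.Unsafe. A small sketch of a null ref used as a "not found" result (NullRefDemo and TryFind are hypothetical names):

```csharp
using System.Runtime.CompilerServices;

// Hypothetical helper; requires .NET 5 or later for Unsafe.NullRef / Unsafe.IsNullRef.
static class NullRefDemo
{
    // Returns a ref to the first element equal to "value", or a null ref if there is none.
    public static ref int TryFind(int[] items, int value)
    {
        for (int i = 0; i < items.Length; i++)
        {
            if (items[i] == value)
                return ref items[i];
        }
        return ref Unsafe.NullRef<int>();
    }
}
```

Callers must check IsNullRef before reading or writing through the returned ref; dereferencing a null ref throws a NullReferenceException.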

jkotas on 11 Nov 2016

Yes, in IL. You cannot get it via regular C#.

Hmm, so in principle we could eliminate the (_pinnable != null) branch in the indexer, if we could either convert an object to a ref directly or avoid the null check when getting a ref to .Data on _pinnable?

nietras on 11 Nov 2016

Closing old issue. (It came up because I was working on Span; Span work has gone on for months without my involvement, and it presumably has what it needs.)

ghost on 31 Aug 2017

For reference, this was added in https://github.com/dotnet/corefx/pull/12895 and https://github.com/dotnet/corefx/pull/12446.

nietras on 1 Sep 2017
