Runtime: Proposal: List<T>.AsSpan()

Created on 16 May 2017  路  30Comments  路  Source: dotnet/runtime

Proposed API Additions

public class List<T>
{
    public Span<T> AsSpan<T>();
    public ReadOnlySpan<T> AsReadOnlySpan<T>();
}

Sample Usage and Rationale

var list = new List<byte>();
foreach(var b in SomeThing)
{
    list.Add(b);
}

var span = list.AsSpan();

Arrays are fixed sizes; so creating an array over an unknown sized set involves manual resizing and copying (or upfront over allocation). List already covers this scenario, resizing array behind the scenes.

Proposal is to add a method that returns a correctly sized Span window over the List's array.

i.e. A non allocating version of .ToArray() that's also better as it can be made immutable with ReadOnlySpan.

/cc @jkotas @terrajobst @stephentoub @davidfowl

api-needs-work area-System.Memory

Most helpful comment

is wrong, isn't it? It should output 1,2,3,4,5 as the list contains 5 now.

I was highlighting how different folks could have different expectations. It would be reasonable for someone to expect that since 5 was added to the list and the 0th element was changed to 0 via the span that it would then contain 0,2,3,4,5. As you point out, the list itself would actually contain 1,2,3,4,5 even though the span would show 0,2,3,4. If you simply flipped the order of setting the 0th element and adding the 5, then the list would contain 0,2,3,4,5 and the span would contain 0,2,3,4.

A warning in docs that Adding or TrimExcess to the List may disconnect the Span from live List?

It's not just Adding and TrimExcess that would cause problems. Consider:
C# var list = new List() { 0, 1, 2, 3 }; var span = list.AsSpan(); list.RemoveAt(3); list.RemoveAt(2);
Now the list has a Count of 2 and contains 0,1, but the span still has a length of 4 and contains 0,1,0,0 (though the latter two would depend on whether the list clears elements that don't contain any refs).

I'm simply pointing out it's not a snapshot, nor is it a live representation. It's some frankenbeast.

All 30 comments

What's the expected behavior in a case like:

var list = new List<byte>() { 1, 2, 3, 4 };
var span = list.AsSpan();
list.Add(5); // will cause list's array to be replaced by a bigger one
span[0] = 0; // updates old array
foreach (byte b in list) Console.WriteLine(b); // outputs 1, 2, 3, 4, 5 rather than 0, 2, 3, 4, 5

where the span would then be wrapping the "wrong" array?

What's the expected behavior in a case like:

A warning in docs that Adding or TrimExcess to the List may disconnect the Span from live List? And you should recapture the span if you do.

Nice. Would need clear warnings about competing access (as above, but also reminding people that the length won't change if they Remove etc), but: nice idea

"snapshot" in the name would be selfdescriptive

Just a minor observation: the string equivalent of this ignores As[Read-only]Span, and jumps straight to Slice(...), IIRC (I could be misremembering, I'm not in that code).

Related idea: Slice over MemoryStream, encapsulating all the GetBuffer / TryGetBuffer (depending on framework version)

Btw, I have a story about GetBuffer and how people confuse its meaning.
There is a mistake in serializing DataTable. I filed a bug 12 or more years ago but MS refused to fix it. I can't remember the exact explanation, something like "BinaryFormatter is not the coolest kid in the block any more" :smile:

@stephentoub your last comment in

foreach (byte b in list) Console.WriteLine(b); // outputs 1, 2, 3, 4, rather than 0, 2, 3, 4, 5

is wrong, isn't it?
It should output 1,2,3,4,5 as the list contains 5 now.

For an initial 5 element list it would work as the list currently would have an array size of 8 initially

var list = new List<byte>() { 1, 2, 3, 4, 5 };
var span = list.AsSpan();
list.Add(6); // no replacing the array
span[0] = 0; // updates old array
foreach (byte b in list) Console.WriteLine(b); // 0, 2, 3, 4, 5, 6

I'd say the API would be too dangerous too put on list, simply because it requires the understanding of the implementation details of list to use fully/correctly.
Maybe only put ReadOnlySpan on ReadOnlyList ?

A warning in docs

People don't read the docs since they got intellisense :)

@Tornhoof by that logic, MemoryStream shouldn't have GetBuffer() / TryGetBuffer(); that has exactly the same problems (the byte[] you're given can become detached if someone appends more data, and the .Length snapshot is only valid at that instant), yet it is a very valuable tool.

I can see immediate use cases for this including serialization and vectorization.

As for ReadOnlyList (ReadOnlyCollection<T> ?) - that's fine in principle, but : a tiny tiny minority of APIs expose their lists as that, so ... it is pure but unhelpful. Additionally, ReadOnlyCollection<T> wraps a more general IList<T> implementation - impossible to get a span for that, as you don't know that the inner implementation is contiguous, and even if it is: you don't have access to that data, because it is abstracted.

@omariom re that DataSet bug: oof, that's a fun catch. I see people doing this all the time, sadly (I get a lot of questions about that re protobuf-net; it is right up there next to people using MemoryStream and forgetting to rewind it between usages). TryGetBuffer makes it harder to screw up (it exposes ArraySegment<byte>), but: never underestimate the resourcefulness of a determined fool.

@mgravell Yes I admit, it's similar to MemoryStream and yes ReadOnlyList does not exist and only IReadOnlyList exists (I mixed that up somewhere). Your point about contiguous memory is important and only works by relying on the actual implementation details, without some funny interfaces like IContiguousMemory (and that one defining the Span API).

May be then AsDetachedSpan(), AsDetachedReadonlySpan().

Even if the name itself doesn't click, then the name length should be a warning :-)

AsDetachedSnapshot() AsDetachedReadonlySnapshot()

GetCurrentSpan() would allude to the potentially transient nature of the result. Alternatively, GetCurrentElements().

is wrong, isn't it? It should output 1,2,3,4,5 as the list contains 5 now.

I was highlighting how different folks could have different expectations. It would be reasonable for someone to expect that since 5 was added to the list and the 0th element was changed to 0 via the span that it would then contain 0,2,3,4,5. As you point out, the list itself would actually contain 1,2,3,4,5 even though the span would show 0,2,3,4. If you simply flipped the order of setting the 0th element and adding the 5, then the list would contain 0,2,3,4,5 and the span would contain 0,2,3,4.

A warning in docs that Adding or TrimExcess to the List may disconnect the Span from live List?

It's not just Adding and TrimExcess that would cause problems. Consider:
C# var list = new List() { 0, 1, 2, 3 }; var span = list.AsSpan(); list.RemoveAt(3); list.RemoveAt(2);
Now the list has a Count of 2 and contains 0,1, but the span still has a length of 4 and contains 0,1,0,0 (though the latter two would depend on whether the list clears elements that don't contain any refs).

I'm simply pointing out it's not a snapshot, nor is it a live representation. It's some frankenbeast.

To get a real Span snapshot or detached you can use the existing api and do new Span(list.ToArray()) but that mostly defeats the purpose; both allocating and not being a live copy (for interop), also more apis current exist that take array so you probably wouldn't bother with the Span.

Though you may do new ReadOnlySpan(list.ToArray()) to pass a read only array snapshot.

Its a less dangerous api than MemoryStream GetBuffer() / TryGetBuffer() as anything passed the Span can't take a copy as its a stack only type, which they can with the array returned via GetBuffer(); which is equally a frankenbeast 馃槈

Unlike an array the Span would be more localized in used, due to its stack only nature, so overlapped use between List and Span should be more obvious and .AsSpan should be a fairly lightweight, non-allocating operation so isn't a reason not to do it at the point of use.

In the spirit of DangerousGetPinnableReference(): DangerousGetSpan() :) although GetFrankenspan() also has a curious appeal...

I totally agree with both the valid issues that @stephentoub raises, and the mediating factors that @benaadams makes (i.e. it being on the stack so the overlapped usage is far more obvious / contained). But from the selfish viewpoint of the problems it solves rather than the problems it causes, I really like this idea - and even if it doesn't make it into the official API, I'm totally going to be adding a TryGetSpan() API to the span-enabled versions of at least one of my libraries, that abuses reflection when available on the platform (and when the private field names check out, etc)

(wonders idly about creating an EvilUnsafe nuget package that exposes the innards of things like List<T> and MemoryStream without having to worry about things like "being sensible" or "considering the consequences" :p)

@mgravell

Additionally, ReadOnlyCollection<T> wraps a more general IList<T> implementation - impossible to get a span for that, as you don't know that the inner implementation is contiguous, and even if it is: you don't have access to that data, because it is abstracted.

Is that an argument to add a something like TryGetSpan as a default interface method (https://github.com/dotnet/csharplang/issues/52) to IList<T> when that becomes possible? It could look something like:

```c#
interface IList
{
bool TryGetSpan(out Span span)
{
span = default;
return false;
}

bool TryGetReadOnlySpan(out ReadOnlySpan<T> span)
{
    span = default;
    return false;
}

}

class List : IList
{
public bool TryGetSpan(out Span span)
{
span = _items.AsSpan();
return true;
}

public bool TryGetReadOnlySpan(out ReadOnlySpan<T> span)
{
    span = _items.AsReadOnlySpan();
    return true;
}

}

class ReadOnlyCollection : IList
{
public bool TryGetReadOnlySpan(out ReadOnlySpan span)
{
return list.TryGetReadOnlySpan(out span);
}
}
```

What about:

void WithSpan(Action<Span<T>> callback)

for List<T>?

@LYP951018 That doesn't really solve the issue, you can still do e.g.:

```c#
list.WithSpan(span =>
{
list.Add(42); // or whatever

// span is now "frankenspan"

});
```

OMG... That's why rust's mutation rules are important...

Also, most of the time when the span would be useful, it would be extremely inconvenient to have to force the work into a delegate, plus that has overhead of the delegate instance, the capture context, and the virtual dispatch.

If we're going to end up with frankenspan, we might as well do it efficiently :)

Would ReadOnlyFrankenSpan be better?

How about adding explicit interface like this? This could be added to StringBuilder as well.

interface IDangerousArrayAccess<T>
{
    T[] Array { get; }
    int Count { get; }
}

Plus add Obsolete attribute to MemoryStream's GetBuffer method and add this interface to MemoryStream. So many mistakes with that :)

How about adding explicit interface like this?

Adding interfaces is much harder to deploy, unless we get "interface implementation via extensions"; can't see it happening, and even if it did: would be restricted to certain targets.

This could be added to StringBuilder as well.

StringBuilder doesn't use a single array origin; it uses a chunk chain IIRC - and even that is an implementation detail. IIRC it used to use an oversized string that it mutated in-place!

Did somebody solve this by adding Add extension method on top of Memory<T>? Or just implemented own struct List?

.20 cents.
Currently List makes an "additional bounds check" to the one array does. In other words when accessing List indexer or list iterator it must make sure the index given is less than list size. Plus, iterator checks "version". That is relatively big overhead that makes accessing List much slower than array. If we could have AsSpan or at least AsReadOnlySpan this will help to significantly improve list access perf. And, even, potentically could result in complete array bounds check elimination vs current state when we have at least 2 bounds checks and 1 version check.

@benaadams, how does this compare to dotnet/corefx#31597?

Do we need both open?

Video

Let's start with no :)

Was this page helpful?
0 / 5 - 0 ratings