When not constructed with a specific byte[]
, MemoryStream
allocates byte[]s
every time it needs to grow. It would be tempting to just change the implementation to use ArrayPool<byte>.Shared.Rent
to get that array, but this is problematic for a few reasons, mostly to do with existing code:
ArrayPool<byte>
, it's important to release the currently used buffer back to the pool when the stream is disposed. And it would be expensive to make MemoryStream finalizable to deal with this.One solution would just be to introduce a new Stream type, e.g.
```C#
namespace System.IO
{
public sealed class ArrayPoolStream : Stream
{
public ArrayPoolStream() : this(ArrayPool
public ArrayPoolStream(ArrayPool
~ArrayPoolStream();
public ArraySegment<byte> DangerousGetArray();
... // override all the relevant members on Stream
}
}
```
That has its own downsides, though. In particular, there's something nice about this just being a part of MemoryStream, being easily used in existing code that uses MemoryStreams, etc. Another option would be to just plumb this into MemoryStream, but in an opt-in manner, and accept some of the resulting limitations because they're opt-in:
public MemoryStream(ArrayPool<byte> pool)
.I'm leaning towards the second option: just add the new MemoryStream(ArrayPool<byte>)
ctor, and allow TryGetBuffer to work; devs just need to know that when they create the stream with this ctor, they should dispose the stream, and they shouldn't use the array retrieved from TryGetBuffer after doing something else to the stream.
It's fairly common to not dispose MemoryStreams, as currently the Dispose is effectively a nop.
BTW, I have recently added a note to the documentation of MemoryStream
that says that it doesn't have to be disposed. So changing that now would be breaking a promise the documentation makes. (And if it's not an appropriate promise, the documentation should be changed back.)
Would it be less dangerous to return ReadOnlySpan in another overload of TryGetBuffer ?
Would it be less dangerous to return ReadOnlySpan in another overload of TryGetBuffer ?
A little maybe, but many of the same problems apply, e.g.
```C#
stream.TryGetBuffer(out ArraySegment
stream.Write(...);
Use(buffer);
```C#
stream.TryGetBuffer(out ReadOnlySpan<byte> buffer);
stream.Write(...);
Use(buffer);
In both cases, the stream may have replaced the array referenced by buffer
, in which case buffer
is now pointing to something old.
Plus, many of the cases for which TryGetBuffer really need access to the array.
We use MemoryStream
all the time, it would be great if bytes array come from ArrayPool
instead of allocating . Please plan this feature for your next milestone.
@svick If a particular class that implements IDisposable
does not _need_ to be disposed, I would always consider that an implementation detail, never to be part of the public documentation. Clients should honor the IDisposable
interface, or ignore it at their (our) own peril. The implementation must be able to change, in my opinion.
How about having a MemoryStreamPool
that provides a MemoryStream
backed by a pooled array, and having very clear Return()
or Dispose()
requirements? The MemoryStreamPool.Shared
instance could use arrays from ArrayPool<byte>.Shared
, and constructed instances could create their own pool.
@Timovzl Are you saying you Dispose()
all Task
s in your async
code?
I think that in the cases where Dispose
does not do anything, it's just unnecessary ceremony. But since it's hard to find out whether that's the case, and whether it will stay that way, I think documentation should make this clear.
Just adding another voice to this, given the widespread use of MemoryStream
in our apps the ability to reduce (generally quite large) allocs would be great. I'm comfortable with an opt-in approach if required, although the 2.1 release showed how awesome it is when 'things just get faster'.
re: Disposing memory streams: There's so much tooling, helpers, VS Add-Ins, guidance etc that all falls to a pit of 'always dispose your disposables'. We treat any undisposed (especially I/O) class as a red flag in code reviews, and feel dirty when forced to hand such things off with no concrete confirmation of cleanup ( MVC's FileResult() is a great example there)
@Timovzl Are you saying you Dispose() all Tasks in your async code?
@svick I uh... always use ValueTask
! :P Just kidding, you had me there.
Still, I maintain my position, exceptions notwithstanding. As this very issue demonstrates, allowing our class documentation to say, "hey, you may use _this_ class without dispose, just for laziness", puts us in this awkward position when the implementation can no longer support that liberty.
Even if disposing is not necessary, why share this in the documentation at all? Why let people skip dispose on _some_ Stream types? Just have them disposed as usual, even where it was not strictly necessary.
And then we can keep the exceptions minimal, such as for Task
. The fewer exceptions, the less to remember. The fewer exceptions, the fewer questions raised by code reviewers, such as when they spot undisposed Streams. And our implementations are free to change.
@markrendle , if you need MemoryStreamPool - you can just use
Microsoft.IO.RecyclableMemoryStream.
Looks like it has all that you want:
@imanushin RecyclableMemoryStream
looks interesting. Note that its GetBuffer()
method returns the actual (current) buffer. It matches the docs' current documentation and gives you the power to do things - to do things wrong, too. Of course, in part, this was already the case with MemoryStream
, but now you can retain access to a buffer that will be returned to the pool and used by something else. I'm not too fond of this potential for misuse. Granted, the alternative isn't great...
It uses chunks internally (which is nice to re-send huge files)
Maybe something that could be considered here also.
It uses chunks internally (which is nice to re-send huge files)
Maybe something that could be considered here also.
Using chunks internally seems inconsistent with supporting GetBuffer().
I do think using chunks internally is interesting to consider, but I'm not sure how to square this with the GetBuffer() API; seems like we might need a separate type for this.
It gets a new larger contiguous buffer if necessary to present a single buffer with all content:
The only problem there is that the buffer might be from the pool, and now you have a second potential owner of said buffer. Would be safer to just treat GetBuffer() like ToArray() with a fresh non-pooled array.
Would be safer to just treat GetBuffer() like ToArray() with a fresh non-pooled array.
In that case, at the very least an array from the array pool could be returned. That leaves the user with the option of returning it, making the code mostly allocation-free. As for users unaware of this... renting an array from the array pool and not returning it has no downsides over just allocating a new array, right?
Good point, as long as the MemoryStream doesn't return it when Disposed in that case. That's the tricky part of it... the buffer can't be owned by both the MemoryStream and the consumer because it's fine if neither returns it to the pool but bad if one returns it and the other continues using it.
Good point, as long as the MemoryStream doesn't return it when Disposed in that case [...] because it's fine if neither returns it to the pool but bad if one returns it and the other continues using it.
Agreed. Once you call GetBuffer()
, you become the owner of the array. Either you return it neatly with ArrayPool<byte>.Return()
(best reuse) or you let it be garbage collected (nothing goes wrong, at least).
This does require some attention to the single-block case too. Do we implement a special case to relinquish ownership of the single block? That seems hard to keep track of. We might need to switch to "large buffers" as soon as GetBuffer()
is invoked.
Any clever ideas? It would be so nice if the single-block case could stay copy-free...
Would be safer to just treat GetBuffer() like ToArray() with a fresh non-pooled array.
GetBuffer
allows you to also change the data that is in the MemoryStream
, which would not be the case if a fresh non-pooled array was returned.
Using chunks internally seems inconsistent with supporting GetBuffer().
GetBuffer
could be disallowed, similar as when MemoryStream
is constructed with publiclyVisible
to false using:public MemoryStream (byte[] buffer, int index, int count, bool writable, bool publiclyVisible);
public MemoryStream(ArrayPool<byte> pool, bool publiclyVisible)
When publiclyVisible
is set to false
, the implementation can use chunks.
GetBuffer
, the implementation could automagically switch to non-chunked mode.The more I've thought about this, the more I think it needs to be done separately from MemoryStream:
Changing the MemoryStream behavior from growing an array to maintaining a linked list of arrays has observable performance impact
Also the GetBuffer() issues mentioned above.
@tmds
GetBuffer allows you to also change the data that is in the MemoryStream, which would not be the case if a fresh non-pooled array was returned.
Actually, with the above discussion about this, the intent was that the returned array _becomes_ the stream's own buffer (at least until you write to the stream), so changing the data certainly works. Hopefully the discussion makes more sense with that in mind.
When backed by a chunks, GetBuffer could be disallowed [...]
I imagine a less ideal GetBuffer
(that sometimes needs to do copying) is preferable to one that sometimes simply does not work. The caller doesn't necessarily know when it's using small vs. large buffers. Having the method throw _sometimes_, in a way that is nontransparent to the caller, would render the method a lot less usable.
Or, when someone calls GetBuffer, the implementation could automagically switch to non-chunked mode.
That would be the easiest. It is a bit disappointing that the simple case of a single chunk would then involve copying, so I'm still holding out hopes that we can come up with something more clever. But it might be acceptable.
The more I've thought about this, the more I think it needs to be done separately from MemoryStream:
I think a separate, more suitable API would certainly be worth pursuing. I'm wondering, when we wish to supply a low-allocation implementation, in how many cases we _must_ pass a MemoryStream
. Can we practically always get away with passing any Stream
? Or do we have example use cases where MemoryStream
is a requirement?
If a MemoryStream
tends to be the requirement, we should probably tackle both issues, but separately. If any Stream
will do, then let's forget about MemoryStream
and make something ideal.
I'm not fond of the idea of honoring API misuse. If you neglect to dispose an IDisposable
object, invoke methods on a disposed object, or concurrently access an object that is not thread-safe... then there may be consequences - now, or when the implementation changes.
If a type unrelated to MemoryStream
turns out to be the way to go anyway, then it's a plus that API misuse isn't punished. But should that consideration be a priority?
Admittedly, as for not disposing, the documentation says that's allowed, so that needs to be addressed. (Strange implementation detail to mention in the documentation, but I digress.) If we can make sure that _not_ disposing merely degenerates to the equivalent of using non-pooled streams (analog to not returning arrays rented from an array pool), then this is a non-issue, right? Potential gains, with no potential losses.
If we can make sure that not disposing merely degenerates to the equivalent of using non-pooled streams (analog to not returning arrays rented from an array pool), then this is a non-issue, right?
Arrays from the pool are more "valuable" than others, as they're much more likely to have been around for longer, be in higher generations, etc. There's also been discussion about using pinned/aligned arrays in the pool. So taking arrays from the pool and not returning them is generally worse on a system than allocating new ones, degrading other unrelated components using the pool.
I'm not fond of the idea of honoring API misuse. If you neglect to dispose an IDisposable object, invoke methods on a disposed object, or concurrently access an object that is not thread-safe... then there may be consequences - now, or when the implementation changes.
Not disposing a MemoryStream is not misuse, nor is using ToArray after disposal.
Concurrent use is misuse, but at the same time, MemoryStream is used by millions of libraries and applications; there is guaranteed to be misuse. And we can't afford to introduce changes into such a type that could introduce a myriad of difficult to diagnose race conditions in entirely unrelated parts of the app, e.g. misuse of a MemoryStream over here causes sensitive data to be leaked to an http response over there. We need to be cognizant of such issues for new APIs, but it's a much bigger deal for existing ones, especially ones as core s MemoryStream.
Arrays from the pool are more "valuable" than others, as they're much more likely to have been around for longer, be in higher generations, etc. There's also been discussion about using pinned/aligned arrays in the pool. So taking arrays from the pool and not returning them is generally worse on a system than allocating new ones, degrading other unrelated components using the pool.
I did not realize! That does limit our options... and support your proposition to not inherit from MemoryStream
.
I'm still very curious if we even have examples where inheriting from MemoryStream
poses an advantage. I don't seem to recall any APIs that take one as a parameter.
I don't seem to recall any APIs that take one as a parameter.
To my knowledge there aren't any public APIs in the core libraries strongly-typed to accept a MemoryStream. No doubt there are cases outside of the core libraries.
Most helpful comment
@svick I uh... always use
ValueTask
! :P Just kidding, you had me there.Still, I maintain my position, exceptions notwithstanding. As this very issue demonstrates, allowing our class documentation to say, "hey, you may use _this_ class without dispose, just for laziness", puts us in this awkward position when the implementation can no longer support that liberty.
Even if disposing is not necessary, why share this in the documentation at all? Why let people skip dispose on _some_ Stream types? Just have them disposed as usual, even where it was not strictly necessary.
And then we can keep the exceptions minimal, such as for
Task
. The fewer exceptions, the less to remember. The fewer exceptions, the fewer questions raised by code reviewers, such as when they spot undisposed Streams. And our implementations are free to change.