https://github.com/dotnet/corefx/issues/25354#issuecomment-352854380
```c#
public partial class SendPacketsElement
{
// EXISTING APIs
public SendPacketsElement(byte[] buffer)
public SendPacketsElement(byte[] buffer, int offset, int count)
public SendPacketsElement(byte[] buffer, int offset, int count, bool endOfPacket)
public SendPacketsElement(string filepath)
public SendPacketsElement(string filepath, int offset, int count)
public SendPacketsElement(string filepath, int offset, int count, bool endOfPacket)
public byte[] Buffer { get; }
public bool EndOfPacket { get; }
public string FilePath { get; }
// These properties will throw if constructed from a file with a long offsets.
// NotSupportedException("Count/Offset is larger than int.MaxValue. Use CountLong/OffsetLong instead.");
public int Count { get; }
public int Offset { get; }
// NEW APIs
public long OffsetLong { get; }
public long CountLong { get; }
public SendPacketsElement(string filepath, long offset, long count);
public SendPacketsElement(string filepath, long offset, long count, bool endOfPacket);
// We only take the long ones as this as an advanced scenario and the int
// ones don't buy you anything really.
public SendPacketsElement(FileStream file);
public SendPacketsElement(FileStream file, long offset, long count);
public SendPacketsElement(FileStream file, long offset, long count, bool endOfPacket);
public FileStream FileStream { get; }
}
# Original proposal
`SendPacketsElement`, `SendFile` and `BeginSendFile` currently take a `string` filename to indicate the file (or part thereof) to send. Under the hood, the implementation of these methods creates and manages `FileStream`s. I suggest that you add overloads that take `FileStream`.
This makes it possible for applications to manage file handles directly - for example, to manage the lifetime of open handles, control how the file is accessed, shared and file options, buffer size. For example, turning off the file system cache on read is critical for performance for file servers. In addition, through the `FileStream` constructors it becomes possible to interop with file handles coming from unmanaged code and to use these with the socket API directly.
Concretely:
```csharp
public partial class Socket : System.IDisposable
{
// irrelevant methods omitted
// existing send file API
public IAsyncResult BeginSendFile(string fileName, AsyncCallback callback, object state) { throw null; }
public IAsyncResult BeginSendFile(string fileName, byte[] preBuffer, byte[] postBuffer, TransmitFileOptions flags, AsyncCallback callback, object state) { throw null; }
public void SendFile(string fileName) { }
public void SendFile(string fileName, byte[] preBuffer, byte[] postBuffer, TransmitFileOptions flags){ }
// suggested API additions
public IAsyncResult BeginSendFile(FileStream file, AsyncCallback callback, object state) { throw null; }
public IAsyncResult BeginSendFile(FileStream file, byte[] preBuffer, byte[] postBuffer, TransmitFileOptions flags, AsyncCallback callback, object state) { throw null; }
public void SendFile(FileStream file) { }
public void SendFile(FileStream file, byte[] preBuffer, byte[] postBuffer, TransmitFileOptions flags){ }
}
public partial class SendPacketsElement
{
// existing API - ctors
public SendPacketsElement(byte[] buffer) { }
public SendPacketsElement(byte[] buffer, int offset, int count) { }
public SendPacketsElement(byte[] buffer, int offset, int count, bool endOfPacket) { }
public SendPacketsElement(string filepath) { }
public SendPacketsElement(string filepath, int offset, int count) { }
public SendPacketsElement(string filepath, int offset, int count, bool endOfPacket) { }
// suggested API - ctors
public SendPacketsElement(FileStream file) { }
public SendPacketsElement(FileStream file, int offset, int count) { }
public SendPacketsElement(FileStream file, int offset, int count, bool endOfPacket) { }
// existing API - properties
public byte[] Buffer { get { throw null; } }
public int Count { get { throw null; } }
public bool EndOfPacket { get { throw null; } }
public string FilePath { get { throw null; } }
public int Offset { get { throw null; } }
// suggested API - property
public FileStream FileStream { get { throw null; } }
}
As for the behavior of SendPacketsElement.FilePath vs FileStream, I suggest that depending on which constructor is used only one of them returns a non-null result. This is pretty much the behavior of the other properties (i.e. if a filepath ctor is used, Buffer returns null).
Sounds good as proposed to area expert group. The APIs with FileStream are a bit more advanced - require carefull resource management, but we don't see a reason for not exposing them.
Marking for final API review
The proposal makes sense. We've combined this with dotnet/runtime#22993. Notes:
Count and Offset will have to throw when SendPacketsElement is constructed with long based values that exceed int.MaxValue.FileStream overloads that operate on both int and long so we've only added the long ones.FileStream property will return null when not constructed with a file stream (similar to Buffer and FilePath properties behavior).public partial class SendPacketsElement
{
// EXISTING APIs
public SendPacketsElement(byte[] buffer)
public SendPacketsElement(byte[] buffer, int offset, int count)
public SendPacketsElement(byte[] buffer, int offset, int count, bool endOfPacket)
public SendPacketsElement(string filepath)
public SendPacketsElement(string filepath, int offset, int count)
public SendPacketsElement(string filepath, int offset, int count, bool endOfPacket)
public byte[] Buffer { get; }
public bool EndOfPacket { get; }
public string FilePath { get; }
// These properties will throw if constructed from a file with a long offsets.
// NotSupportedException("Count/Offset is larger than int.MaxValue. Use CountLong/OffsetLong instead.");
public int Count { get; }
public int Offset { get; }
// NEW APIs
public long OffsetLong { get; }
public long CountLong { get; }
public SendPacketsElement(string filepath, long offset, long count);
public SendPacketsElement(string filepath, long offset, long count, bool endOfPacket);
// We only take the long ones as this as an advanced scenario and the int
// ones don't buy you anything really.
public SendPacketsElement(FileStream file);
public SendPacketsElement(FileStream file, long offset, long count);
public SendPacketsElement(FileStream file, long offset, long count, bool endOfPacket);
public FileStream FileStream { get; }
}
FYI: The API review discussion was recorded - see https://youtu.be/ZrT0uOsqQlI?t=393 (48 min duration)
Awesome. Thanks!
hi @kurtschelfthout , I have some questions to better understand the rationale.
This makes it possible for applications to manage file handles directly - for example, to manage the lifetime of open handles
I guess one way of using this is for files that are sent often can be opened once and then the same FileStream can be passed multiple times (even concurrently). Is that is what is meant here?
, control how the file is accessed, shared and file options, buffer size. For example, turning off the file system cache on read is critical for performance for file servers.
Can you elaborate more on this? What default settings (file access, sharing, options, buffer size) used now when opening the file can be improved?
Can you also give some more info on how disabling the file system cache improves performance (what OS)? How do you disable the cache?
@terrajobst @karelz I have some concerns around the addition of CountLong to SendPacketsElement. It seems the main drive is a vague concern about "future proofing". If I had a crystal ball...I'd bet that CountLong will be noise for future API reviewers to rant about ;)
This API is a direct wrapper around a winsock API. It is possible that that API would change to take a long count, but depending on what the Windows API engineers decide - if they even ever touch it - it might take a wholly different form altogether. After all, this API is a Microsoft-specific extension to the specification. This part of town is not exactly densely populated.
In case you were thinking about xplat issues, as far as I know there is no direct counterpart of this API in unix -backed up by e.g. this comment - but interestingly there is for sendfile which on Linux _can_ take a long offset, but the count is of type size_t which is an unsigned type. Same for splice. You scrapped the SendFile additions from the original proposal, but the argument to add a wider count type imo makes more sense there than for SendPacketsElement.
In addition it raises a few implementation problems - not insurmountable but unnecessary. Please keep in mind this is an API for high performance applications exclusively and SocketAsyncEventArgs is sufficiently hard to use that no one is going to accidentally stumble into it. As an example, the idea is that you pre-allocate a pool of SocketAsyncEventArgs objects and then rent them from the pool to do each operation to keep allocations to a minimum. There are also clever tricks you can play to reuse sockets themselves on the OS level, again reducing allocations. I'm just giving these examples because imo this API screams out "big boy license" and needs to stay very close to the metal.
Anyway, if the count is a long greater than int.MaxValue (which needs to be checked now every time the API is used) then we need to chunk it into pieces of at most int.MaxValue size, and create under the hood n transmit packets structures. We also need to check that the end offset is not larger than the file size; we don't want to go OOM creating structures when someone asks for an int64.MaxValue piece to be sent.
Furthermore now how do we interpret the endOfPacket parameter - is it set on each chunk or only on the last chunk?
As a data point, for the file server which I originally investigated this for, count being an int is just not an issue. And if it would ever become an issue, it's easy enough to chunk on the application level.
Thoughts?
@tmds
I guess one way of using this is for files that are sent often can be opened once and then the same FileStream can be passed multiple times (even concurrently). Is that is what is meant here?
Yes, that's what I meant, though in the API review discussion people implied that was nonsense because once you pass the file handle to windows, it essentially owns it. If true, I'd appreciate some elaboration/documentation on that. E.g. the management of disposal of FileStreams in the implementation - see references to this field- would indicate otherwise.
Can you elaborate more on this? What default settings (file access, sharing, options, buffer size) used now when opening the file can be improved?
The current implementation opens like this.
I'm not suggesting that the default settings as currently set in the implementation are changed, to be clear. This is just about getting control in particular scenarios.
A few examples:
Can you also give some more info on how disabling the file system cache improves performance (what OS)? How do you disable the cache?
This is not OS specific - the file system cache keeps blocks that are read in memory in all OS's I know of (ok, that's just windows and linux...). In other words, if you read a lot of data, larger than system memory, this caching is harmful - it starts thrashing - and you'll see slow effective I/O and high CPU usage. Both nix and windows have options to turn off file system cache by opening the file in a special mode. It' s called FILE_FLAG_NO_BUFFERING on windows.
because once you pass the file handle to windows, it essentially owns it. If true, I'd appreciate some elaboration/documentation on that. E.g. the management of disposal of FileStreams in the implementation - see references to this field- would indicate otherwise.
On Linux, the sendfile implementation leaves the current state of the handle. That is: the offset doesn't change, and the handle isn't closed either.
This is important to enable this re-use.
The concurrency aspects and how the operation affects FileStream state need to be defined.
For files that are read from while they are written to, and the server is responsible for keeping writers out of the way of readers, I'd like to be able to open with FileShare.ReadWrite.
Does that means you are sending parts of files which are being modified concurrently?
The documentation specifies that performance can be improved using FILE_FLAG_SEQUENTIAL_SCAN.
It seems meaningful to add this flag as the default then?
If you pass -1 as offset, the offset to send from is the current position of the file; this may be useful when you're already manipulating the file by reading/writing to it.
We need to consider cross-platform aspects here and how this works when doing concurrent sends of the same FileStream.
if you read a lot of data, larger than system memory, this caching is harmful - it starts trashing - and you'll see slow effective I/O and high CPU usage
Do you have some more info (links? Windows/Linux). It sounds counter-intuitive that a cache causes slowdowns. Also, do we know this to affect the system sendfile functionality?
Do you have some more info (links? Windows/Linux). It sounds counter-intuitive that a cache causes slowdowns.
Ha. Yes, caching seems like an easy problem :joy: Just search for file system cache thrashing.
@kurtschelfthout are you interested in this implementation?
@danmosemsft Yes, I've been planning to attempt a PR. There's some stuff I want/need to do first though, so I can't give you a timeframe.
That said, I don't want to give the impression I "claim" this issue; if someone else gets to it first I'd be happy :) (not that I think that's likely, this is pretty niche...)
Sounds good. Nobody else will take it without posting here.
@kurtschelfthout, you mentioned some options to be used with the new FileStream overload: FILE_FLAG_SEQUENTIAL_SCAN, FILE_FLAG_NO_BUFFERING. Should these be set in the existing string filepath implementation?
The state of the FileStream (i.e. offset) should not change to allow concurrent use/reuse of the file handle. This should behave the same across all platforms.
@kurtschelfthout Do you plan to handle non-Windows in your PR? If not, I can implement it.
you mentioned some options to be used with the new FileStream overload: FILE_FLAG_SEQUENTIAL_SCAN, FILE_FLAG_NO_BUFFERING. Should these be set in the existing string filepath implementation?
No, I don't think so. You only want to set those in particular use cases, esp. NO_BUFFERING.
I'm not planning to do any non-Windows implementation; the SendPackets API is pretty much a wrapper around the Windows API. From what I've seen (which may be the wrong place or whatever), to get the linux implementation even to what there is today would require pretty significant work to begin with.
No, I don't think so. You only want to set those in particular use cases, esp. NO_BUFFERING.
What about FILE_FLAG_SEQUENTIAL_SCAN? Would that make sense?
From what I've seen (which may be the wrong place or whatever), to get the linux implementation even to what there is today would require pretty significant work to begin with.
It looks like its implemented here:
It looks like its implemented here
Yes. For any new overloads to be exposed here, we'll need them implemented for both Windows and Unix. Thanks.
What about FILE_FLAG_SEQUENTIAL_SCAN? Would that make sense?
Not to me...
It looks like its implemented here:
Cool. Totally overlooked that.
What about make next properties private?
c#
internal string _filePath;
internal byte[] _buffer;
internal int _offset;
internal int _count;
internal SendPacketsElementFlags _flags;
And add public SendPacketsElementFlags Flags { get; } property?
Can I start to implement this?
And I have one question!
It seems that internal struct TransmitPacketsElement does not support to transmit packets of long length max is uint length.
So should I limit CountLong property to uint.MaxValue or may be change this property to uint type or something else?
@AlexRadch based on @kurtschelfthout he is fine with someone else (you) taking it. Assigned.
Just a warning that we are locking down API surface for 2.1 for next few weeks. Your PR may be blocked to wait for master to reopen for post-2.1 API additions.
I created PR for this issue. @karelz can you suggest me some work for API 2.1?
Here's up-for-grabs 2.1 query. I don't think we have any API work for 2.1 at this moment.
The LongCount is a mismatch with the platform APIs.
Windows and 32-bit Linux accept a 32-bit length parameter.
Perhaps we should drop the CountLong from this API? The original requests from @kurtschelfthout were to provide OffsetLong and FileStream overloads. There was no request for a CountLong.
CC @geoffkizer @AlexRadch
Perhaps we should drop the CountLong from this API? The original requests from @kurtschelfthout were to provide OffsetLong and FileStream overloads. There was no request for a CountLong.
That seems entirely reasonable to me.
The PR dotnet/corefx#27839 was abandoned. Marking the issue as up-for-grabs again. If anyone picks it up, feel free to pick up the unfinished PR.
I'm picking this up now.
@karelz Now the PR is merged, would you mind giving some color on how/when this will get released? Does it automatically go into .NET Core 3.0, for example?
Fixed in dotnet/corefx#33665, closing.
It will be part of .NET Core 3.0 - that's where master flows.
Great contribution @AlexRadch and @kurtschelfthout !
Most helpful comment
Fixed in dotnet/corefx#33665, closing.
It will be part of .NET Core 3.0 - that's where master flows.