When using application/octet-stream and a raw byte[] input to a controller, one must use an InputFormatter. Reading the incoming request body stream and passing it as a byte array to the InputFormatterResult performs very poorly. Please see my code below.
First, a simple controller method using a byte array.
```C#
[HttpPost]
[Consumes("application/octet-stream")]
public async Task
```
Second, our custom InputFormatter simply takes the incoming Request body and writes it out to a memory stream to get a byte array that we pass in the result:
```C#
var fileByteArray = new byte[request.Body.Length];
const int bufferLength = 4096;
var buffer = new byte[bufferLength];
var ms = new MemoryStream();
int len;
while ((len = await request.Body.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
    ms.Write(buffer, 0, len);
}
fileByteArray = ms.ToArray();
await ms.FlushAsync();
ms.Close();
return await InputFormatterResult.SuccessAsync(fileByteArray);
```
Now here is the bad part. When uploading a 120 MB file, the time it takes to go from `return await InputFormatterResult.SuccessAsync(fileByteArray)` to the input of the controller is 150 seconds!
To compare, I have another endpoint on this API that uses multipart/form-data and the IFormFile interface. Here is the controller signature:
```C#
[HttpPost]
[Consumes("multipart/form-data")]
public async Task<IActionResult> UploadFile(IFormFile file)
```
The same exact test uploading a 120 MB file to this endpoint processes in 7 seconds.
Our question then is this: why the significant difference? Once the code leaves InputFormatterResult.SuccessAsync, it is handled somewhere in the .NET Core plumbing. Why such poor performance?
I can add this: we have narrowed it down, and it takes about 1 second per megabyte for the code to get from InputFormatterResult.SuccessAsync to the entry of the controller.
> When using application/octet-stream and a raw byte[] input to a controller, one must use an InputFormatter

You don't have to use a formatter. You can write the code directly into your controller.

> Reading the incoming request body stream and passing it as a byte array to the InputFormatterResult performs very poorly. Please see my code below.

Right, it's recommended that you process large payloads as a Stream, not as a byte[]. A byte[] requires copying the entire payload into memory first, which puts it on the large object heap and adds other strain on the GC.
That code you wrote is extremely inefficient:
- `MemoryStream` will need to re-size as you add more data to it.
- `ToArray()` will allocate another array and copy all of the data from the internal `MemoryStream` array to the one returned. That's ignoring the original 120MB allocation (`fileByteArray`).

On top of that, allocating an object bigger than 85KB lands it on the large object heap, which is only collected with generation 2 GCs (the one that pauses everything). You can imagine that a 120MB byte[] won't scale well if a couple of these happen concurrently.
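To make the copies described above concrete, here is a minimal sketch of reading a body of known length into a single pre-sized array, so there is no `MemoryStream` growth and no extra `ToArray()` copy. `BodyReader` and `ReadToArrayAsync` are hypothetical names, and the sketch assumes the caller already knows the length (for example from Content-Length). It does not avoid the large object heap allocation; it only removes the intermediate copies.

```C#
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of ASP.NET Core.
public static class BodyReader
{
    // Reads exactly `length` bytes from `body` into one pre-sized array.
    public static async Task<byte[]> ReadToArrayAsync(Stream body, int length)
    {
        // Single allocation. Anything above roughly 85KB still lands on the large object heap.
        var result = new byte[length];
        var offset = 0;

        while (offset < length)
        {
            // Read directly into the final array instead of a temporary 4KB buffer.
            var read = await body.ReadAsync(result, offset, length - offset);
            if (read == 0)
            {
                throw new EndOfStreamException("Body ended before the expected length was read.");
            }
            offset += read;
        }

        return result;
    }
}
```

Even with the copies removed, a payload of this size is better handled as a stream, as the rest of this reply argues.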
> Now here is the bad part. When uploading a 120 MB file, the time it takes to go from `return await InputFormatterResult.SuccessAsync(fileByteArray)` to the input of the controller is 150 seconds!

That's likely several things: small reads, lots of allocations, (client speed?). But in the end you really shouldn't be using a byte[]. As convenient as it is to program against, the performance will be horribad.
> The same exact test uploading a 120 MB file to this endpoint processes in 7 seconds.

Yeah, IFormFile doesn't buffer anything into memory in a way that results in a large object heap allocation, and it does what it can to reduce copies (though it could be made more efficient still).
> I can add this: we have narrowed it down, and it takes about 1 second per megabyte for the code to get from InputFormatterResult.SuccessAsync to the entry of the controller.

I don't think this has anything to do with it. It's all the other things that compound. It's possible there's some step happening in the pipeline before it reaches your method but I'm not sure it explains anything.
Oh another question, why did you want the file upload as a byte[] in the first place?
@davidfowl Thanks for the response. Yes, I realize the code isn't optimized for efficiency. This is prototype code I have in place. The requirement to use byte[] instead of IFormFile is something out of my hands. And I am aware that it is recommended to use a Stream for file uploads. And, yes, I am aware of the GC situation.
However, even given the inefficiencies in the code, I have tried no buffer at all and simply copying the request.Body into a byte array in its entirety and then passing it to the InputFormatterResult. That _still_ takes 150 seconds to hit the controller, regardless of my method (either using buffering or not). Reading your comments, I wonder if the 150 seconds it's taking to get to the controller is simply a symptom of the strain on the GC that you mention.
You mention that, "You don't have to use a formatter. You can write the code directly into your controller." Do you have an example of that? I have tried to get a hold of the byte[] directly in the controller without using an InputFormatter with no luck.
Thanks again.
> The requirement to use byte[] instead of IFormFile is something out of my hands.

That means this project won't scale then? Anything this big is usually streamed somewhere else. Using a byte[] here will destroy the performance of this system. I'd try to figure that part out before trying to "fix" this.
> However, even given the inefficiencies in the code, I have tried no buffer at all and simply copying the request.Body into a byte array in its entirety and then passing it to the InputFormatterResult. That still takes 150 seconds to hit the controller, regardless of my method (either using buffering or not).

This sounds like it might have something to do with model binding, or validation running on your byte[] between your formatter and controller? @pranavkm does validation run on byte[]?
> You mention that, "You don't have to use a formatter. You can write the code directly into your controller." Do you have an example of that? I have tried to get a hold of the byte[] directly in the controller without using an InputFormatter with no luck.

Copy the code you wrote for the formatter into the body of your controller action. Remove the byte[] from the action's arguments and just handle it manually.
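A minimal sketch of what that could look like, assuming an MVC controller. The controller name, route, and return value are placeholders, not taken from the original project. With no byte[] parameter on the action, no input formatter is involved and the action reads `Request.Body` itself.

```C#
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[Route("api/[controller]")]                      // placeholder route
public class UploadController : ControllerBase   // hypothetical controller name
{
    [HttpPost]
    [Consumes("application/octet-stream")]
    public async Task<IActionResult> UploadFile() // no byte[] argument, so nothing is model bound
    {
        using (var ms = new MemoryStream())
        {
            // Same idea as the formatter code, moved into the action.
            await Request.Body.CopyToAsync(ms);
            var fileByteArray = ms.ToArray();

            // Placeholder: the real action would process fileByteArray here.
            return Ok(fileByteArray.Length);
        }
    }
}
```

This still buffers the whole payload into a byte[], so it only removes the formatter from the picture; the memory concerns raised earlier in the thread apply unchanged.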
@davidfowl Agree with you 100%. Thanks for the input, your direction on the controller helped me refactor and remove the reliance upon the input formatter. That has gotten me where I need to be. I would certainly be curious what @pranavkm says about the model binding because there is definitely something significant going on between the input formatter result call and entrance to the controller.
> This sounds like it might have something to do with model binding, or validation running on your byte[] between your formatter and controller? @pranavkm does validation run on byte[]?

It doesn't - not since 2.2. I'm not entirely sure what could account for the difference. Have you tried setting the log level for Microsoft.AspNetCore to Debug and including timestamps? It should tell you where time's being spent.
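One way to set that up in code, sketched under the assumption of a .NET 6 or later app using the minimal hosting model (`AddSimpleConsole` is not available on the older versions this thread was originally about, and the timestamp format string is only an example). Provider-specific log level settings in appsettings.json may still take precedence over the filter shown here.

```C#
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

var builder = WebApplication.CreateBuilder(args);

// Emit Debug-level logs for the framework categories under Microsoft.AspNetCore.
builder.Logging.AddFilter("Microsoft.AspNetCore", LogLevel.Debug);

// Prefix each console log entry with a timestamp so gaps between pipeline steps are visible.
builder.Logging.AddSimpleConsole(options => options.TimestampFormat = "HH:mm:ss.fff ");

builder.Services.AddControllers();

var app = builder.Build();
app.MapControllers();
app.Run();
```

Comparing the timestamps logged around the formatter call and the action invocation should show where the time is going.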
@davidfowl you appropriately identified:

> Anything this big is usually streamed somewhere else.

It is - we are storing the contents to blob storage - could this be a use case for pipelines?
> It is - we are storing the contents to blob storage - could this be a use case for pipelines?

You definitely shouldn't be using byte[]. What client APIs are you using?
> What client APIs are you using?

We are using UploadFromStreamAsync(stream) from either WindowsAzure.Storage or Microsoft.Azure.Storage.Blob
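Perfect, just use the Request.Body then?
```C#
public async Task<IActionResult> Upload()
{
    // Stream the request body straight to blob storage without buffering it into a byte[].
    await blobClient.UploadFromStreamAsync(Request.Body);
    return Ok();
}
```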
Assuming you're not using multipart file content (It didn't look that way from your code).
Thanks for the quick response - we were discussing essentially that simple approach today
> Assuming you're not using multipart file content (It didn't look that way from your code).

Correct, we are using content-type application/octet-stream for semantic reasons
Glad to see this working out, it'll make your application soar!
@davidfowl thanks for your help - we ended up in a good place. There was no need to work through the input formatter and foist a byte[] on the controller - the arbitrary binary payload is transparent for us (we only need to validate the content-length) and working with the stream is ideal.
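For readers landing here later, a rough sketch of the shape described above: check the Content-Length, then hand the body stream to storage. `IBlobSink`, `BlobUploadController`, and the `MaxBytes` limit are hypothetical stand-ins rather than our actual code; the interface only represents whatever wraps the storage client's `UploadFromStreamAsync(stream)` call.

```C#
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

// Hypothetical abstraction standing in for the blob storage client.
public interface IBlobSink
{
    Task UploadFromStreamAsync(Stream source);
}

[Route("api/[controller]")]                          // placeholder route
public class BlobUploadController : ControllerBase   // hypothetical controller name
{
    private const long MaxBytes = 200L * 1024 * 1024; // assumed limit, for illustration only
    private readonly IBlobSink _sink;

    public BlobUploadController(IBlobSink sink) => _sink = sink;

    [HttpPost]
    [Consumes("application/octet-stream")]
    public async Task<IActionResult> Upload()
    {
        // ContentLength is null when the client did not send a Content-Length header.
        var length = Request.ContentLength;
        if (length is null)
        {
            return StatusCode(StatusCodes.Status411LengthRequired);
        }
        if (length > MaxBytes)
        {
            return StatusCode(StatusCodes.Status413PayloadTooLarge);
        }

        // The payload is never materialized as a byte[]; it flows from the request to storage.
        await _sink.UploadFromStreamAsync(Request.Body);
        return Ok();
    }
}
```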
Thanks again
Feel free to close this issue
@davidfowl Thanks for the input and help. We're good to go. Closing this issue.