DEBUG and RELEASE mode

@pekspro
I changed my code to use your new LoadAsync overload, and was surprised to find that yours is noticeably slower.
When loading and resizing two random pictures (SaveAsync vs. SaveAsJpeg makes no difference), here is my "manual" benchmark (average over 30 executions):
Mine/Yours (ms):
300/550
180/300
I'm not well versed in async I/O, so I have no explanation, but perhaps you have one and can further optimize this beautiful library.
I ignored initial file loading from disk, so what the benchmarks show is probably loading from Windows filesystem cache.
The environment is a ASP.NET Core MVC app.
The calls:
// My implementation: return File(await image.LoadFromDisk(requestDimensions), "image/jpeg", $"{image.Name}-{requestDimensions}.jpg");
// Yours: return File(await image.LoadFromDiskAsync(requestDimensions), "image/jpeg", $"{image.Name}-{requestDimensions}.jpg");
The code:
-Integration of your overload:
```C#
var outputStream = new MemoryStream();
using (var image = await Image.LoadAsync(Configuration.Default, Url, Config.JpegDecoder))
{
    image.Mutate(i => i.Resize(new ResizeOptions
    {
        Size = new Size(requestedDimensions[0], requestedDimensions[1]),
        Mode = ResizeMode.BoxPad,
        Sampler = Config.ImageResampler,
    }));
    await image.SaveAsync(outputStream, Config.ImageJpegEncoder);
}
outputStream.Seek(0, SeekOrigin.Begin);
```
-Mine:
```C#
await using var fs = File.OpenRead(Url);
var result = new byte[fs.Length];
// NB: ReadAsync may read fewer bytes than requested; good enough for this benchmark.
await fs.ReadAsync(result, 0, (int)fs.Length);
using var image = Image.Load<Rgba32>(result, Config.JpegDecoder);
image.Mutate(i => i.Resize(new ResizeOptions { /* same as above */ }));
var outputStream = new MemoryStream();
image.SaveAsJpeg(outputStream, Config.ImageJpegEncoder);
outputStream.Seek(0, SeekOrigin.Begin);
```
I don't think this is a fair comparison...
Your version is a single read from the file into a buffer, then loading the image via that buffer.
Our version involves several incremental reads from a FileStream, which incurs a lot more OS overhead.
I'd like to see a proper benchmark doing the same operation in BenchmarkDotNet before taking this any further.
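A minimal BenchmarkDotNet sketch of the comparison being requested might look like this. The file path and resize dimensions are placeholders, and the exact `LoadAsync` overloads available in the PR era may differ from what's shown:

```csharp
using System.IO;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;

[MemoryDiagnoser]
public class LoadResizeBenchmarks
{
    // Hypothetical test file; point this at local disk or an SMB share to compare.
    private const string ImagePath = "testimage.jpg";

    [Benchmark(Baseline = true)]
    public void SyncLoad()
    {
        using var image = Image.Load<Rgba32>(ImagePath);
        image.Mutate(i => i.Resize(256, 256));
    }

    [Benchmark]
    public async Task AsyncLoad()
    {
        using var image = await Image.LoadAsync(ImagePath);
        image.Mutate(i => i.Resize(256, 256));
    }

    [Benchmark]
    public async Task SingleReadThenLoad()
    {
        // One big read into a buffer, then decode from memory.
        await using var fs = File.OpenRead(ImagePath);
        var buffer = new byte[fs.Length];
        await fs.ReadAsync(buffer, 0, buffer.Length);
        using var image = Image.Load<Rgba32>(buffer);
        image.Mutate(i => i.Resize(256, 256));
    }
}
```

Run with `dotnet run -c Release` so BDN measures optimized code; the `MemoryDiagnoser` attribute also surfaces the allocation cost of the single-read variant.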
@JimBobSquarePants
Maybe you're right; my apologies if I'm on the wrong track.
Perhaps I misunderstand what the result of Load(Async) actually is.
Feel free to close this issue then.
My use case is to read a picture, resize it, and send it to some browser.
So I thought it would be better to hand over the stream handling to ImageSharp.
But it seems that isn't the trick for my use case then.
But to clarify: my "benchmarks" include reading the file into the buffer.
Slightly better profiling shows me that the CPU time spent is nearly the same for both implementations, but the sync time doubles when using pekspro's LoadAsync. That should be the OS overhead you mentioned, right?
@JimBobSquarePants I think it's a valid user concern if our out-of-the-box async version is significantly slower than the sync variant.

> Our version will involve several incremental reads from a filestream which will involve a lot more OS overhead.

If that is the root cause of the slowdown, we may consider increasing the buffer size when copying the stream.
> I think it's a valid user concern if our out of box async version is significantly slower than the sync variant.

@antonfirsov I don't think this is what the benchmark is highlighting. Both approaches are using sync under the cover; it's just that @TedStryker's approach does a single read from the FileStream, which means we then read from a buffer-backed MemoryStream instead of a FileStream.

> If that is the root cause of the slowdown, we may consider increasing the buffer size when copying the stream.

You mean when copying to our internal buffer in the buffered reader? It's sitting at 8192 bytes just now. I wasn't sure how big I could go with it. .NET's BufferedStream uses 4096 bytes by default and Java's BufferedReader uses 8192.
Old implementation:
https://github.com/SixLabors/ImageSharp/blob/c0594f7031e76ac862ea5f077d707df76ed6560b/src/ImageSharp/IO/DoubleBufferedStreamReader.cs#L22
New implementation:
https://github.com/SixLabors/ImageSharp/pull/1269/files#diff-3c4ba2ce2938362e5332e5fe097a7b91R20
@JimBobSquarePants based on your comment, I meant this line:
https://github.com/SixLabors/ImageSharp/blob/dd4835f77ce5bcd56a57301822097577f486c1d8/src/ImageSharp/Image.FromStream.cs#L717
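To make the buffer-size tradeoff concrete, here's a small self-contained sketch (plain BCL, no ImageSharp) that counts how many reads reach the underlying stream for different buffer sizes. Over SMB, each of those reads is a network round trip, which is where the latency cost accumulates:

```csharp
using System;
using System.IO;

// Wraps MemoryStream purely to count how many reads hit the "underlying" stream,
// standing in for the round trips a FileStream over SMB would make.
class CountingStream : MemoryStream
{
    public int ReadCalls { get; private set; }

    public CountingStream(byte[] data) : base(data) { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        ReadCalls++;
        return base.Read(buffer, offset, count);
    }
}

class Program
{
    static void Main()
    {
        var data = new byte[1024 * 1024]; // a 1 MB stand-in for an image file

        foreach (int bufferSize in new[] { 4096, 8192, 65536 })
        {
            var inner = new CountingStream(data);
            using var buffered = new BufferedStream(inner, bufferSize);
            while (buffered.ReadByte() != -1) { } // byte-wise consumer, like a decoder
            Console.WriteLine($"{bufferSize,6}-byte buffer: {inner.ReadCalls} reads of the inner stream");
        }
    }
}
```

For 1 MB of data, the inner read count drops roughly 8x going from an 8 KB to a 64 KB buffer, which is why buffer size matters far more for high-latency streams than for local files sitting in the OS cache.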
I would say let's keep this open, since there is likely a solvable perf issue here, but it doesn't seem like a critical thing blocking V1.0.
@TedStryker if you could make BenchmarkDotNet benchmarks for us comparing the two variants you have plus a sync-only variant, that would help us a lot in addressing this!
@antonfirsov Yes, I'll try. Have to familiarize myself with BenchmarkDotNet first, though.
Wow, that one wasn't too easy. I've learned things... :-)
A BDN console app project was easy to develop.
That one showed no differences anymore! "Can't be!" I thought, and dug deeper.
Finding: Stopwatch is "accurate" in both the Debug and Release configs, but the two configs show noticeable relative timing differences.
I had opened this issue after probing around with my ASP.NET Core project.
So my next try was to use BDN with my ASP.NET app. Uh, kind of a hassle to get that fish swimming.
It seems it's not enough to `dotnet run -c Release`; one also has to set the environment variable "ASPNETCORE_ENVIRONMENT" to "Release". Yeah, know your tooling ;-)
Anyway, I managed to get the differences to show up in BDN.
I ran BDN with VS closed and Windows Defender disabled.
Main finding (processing the same JPEG picture):
-Your sync, your async, and my implementation are equally fast when processing from local storage (I tested only HDD, not SSD).
-Your sync, your async, and my implementation perform noticeably differently when processing the resource on my NAS.
Peculiarities (have a look at the histograms):
-Those "rhythms" when accessing resources on the NAS, e.g. the (500/800/800) ms triplet with ImageSharp's implementation (but not with mine) in the console app, the (200/300/300/300) ms quadruplet with my implementation in ASP.NET, and a similar (400/400/700) ms triplet rhythm in yours.
Hopefully you have an idea why this is the case. I don't.
ConsoleApp_BenchmarkDotNet.Artifacts.zip
AspNet_BenchmarkDotNet.Artifacts.zip
/Edit:
Why is ASP.NET faster than a console app? It can't be more aggressive caching. I previously ran the benchmarks with Task Manager open: no disk access for local resources, network access for NAS resources; it all looks the same.
Thanks @TedStryker we really appreciate you making the effort.
Here's my understanding of the results....
> Why is ASP.NET faster than a console app?
No idea at all. Could be background tasks running, anything. Microbenchmarks are incredibly susceptible to their environment.
Now to the rest of it... The rest of the results appear to support my initial understanding.
For each example the loads are passed through the sync pipeline without any internal copying, because the input stream is seekable. (All loading goes through this method.) We actually do the same thing you do and copy the entire stream to memory in the case of non-seekable streams, so theoretically that would perform the same.
In short the difference is due to synchronization overheads when reading streams.
You do one big read into memory before parsing, while we do several reads as we parse the stream to decode it. Yours is faster but more memory intensive, since you allocate that buffer (that's avoidable with pooling).
When we load an image from your local file inputs, the difference is less noticeable since little synchronization is required. When you load over the network, though, those incremental reads begin to add up and we see the major difference you pointed out.
We do buffer our stream reading, but it's a small buffer whose size I chose based on existing framework code.
https://github.com/SixLabors/ImageSharp/blob/ef5e21bb2774b82edfbb9c67286a99861f6e3b4a/src/ImageSharp/IO/BufferedReadStream.cs#L20
I do wonder whether we should actually increase that default given our use case. @antonfirsov I'd love to know what you think. I kept the current value after benchmarking the jpeg decoder (a larger 16 KB buffer meant little change), but those benchmarks are rather contrived.
Just to make 100% sure I understand the problem correctly: in the NAS case you are opening a (seekable) FileStream from an SMB share?
In that case @JimBobSquarePants's analysis is likely correct. People are hitting similar findings when comparing StreamWriter performance to a simple file copy.
I don't think there is a single "good value" for buffer sizes, since it all depends on the use case, and what's good for SMB might be bad for other, more common, use cases.
To address this, I would expose a property StreamProcessingBufferLength on Configuration with default value 8192:
```C#
public class Configuration
{
    // New member:
    private int streamProcessingBufferLength = 8192;

    // New property:
    public int StreamProcessingBufferLength
    {
        get => streamProcessingBufferLength;
        set
        {
            // validate value
            streamProcessingBufferLength = value;
        }
    }
}
```
It would then be consumed by BufferedReadStream (turn the constant into a member plus a constructor parameter) and by other stream copy calls like this. For network stream processing, the user can change the value to 64 KB, which is the maximum buffer size of SMB.
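If the property lands as proposed, consumer code for an SMB-heavy workload might look like the following. Note that `StreamProcessingBufferLength` is the proposed name, not part of the current API, and `Url` / `Config.JpegDecoder` refer to the user code quoted earlier in this thread:

```csharp
// Hypothetical usage of the proposed property (not in the current API):
var configuration = Configuration.Default;
configuration.StreamProcessingBufferLength = 64 * 1024; // match SMB's maximum buffer size

using var image = await Image.LoadAsync(configuration, Url, Config.JpegDecoder);
```

Keeping the default at 8192 and letting callers opt in to larger buffers avoids penalizing the common local-file case with extra allocation.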
The code change is pretty straightforward; what's hard for us is doing the benchmarks to prove the change actually improves perf for SMB reads. We need to create an actual network share, which means grabbing a second PC (or a cloud server), creating the share, etc. Considering other higher-priority issues, I don't think the team will get to it anytime soon. We would be very happy to accept a PR though.
@TedStryker since you have the infrastructure in place, are you interested in making the experiment and the PR?
@antonfirsov @JimBobSquarePants
Yes, the pics were accessed through a SMB share on my QNAP NAS. And it makes sense for me, that the SMB overhead could be the root cause.
"@TedStryker since you have the infrastructure in place, are you interested making the experiment and the PR?"
Yeah, sure! I'm so glad to use your library in my little hobby project, I want to give something back.
It will take some time, though. I have to meet the requirements of a PR here, at least, and get used to Git (it's TFS only here...) :-)
Perhaps I will do the test with some other device connected via SMB, because those QNAPs are busy doing stuff all the time (taking care of themselves or whatnot).
@TedStryker if it's easier, I can smash together the PR and you could test it by building the PR locally?
@JimBobSquarePants
Yeah, that would be great and would save everybody's time, because your repo is not some random repo: it has strict rules for PRs, for good reason, and teaching me my mistakes could be an avoidable effort ;-)
@TedStryker PR is in place now.