In use cases where many Image<>s are constructed and expected to be produced at low latency, an overhead can currently be observed due to the presence of Parallel.For in the pixel clearing code.

Even when task inlining occurs, an overhead of around 65% can be observed. The profiled case involves a 1x100 image being created and populated each game frame.
Two points here:

1. Parallel.For should probably not be used under a certain threshold of image pixels, as the overhead is larger than any benefit.
2. It would be useful to be able to skip the Clear in such cases (although I could understand if this is not accepted, as it could potentially be a security concern in a multi-tenant environment).

On advice, I did try converting my image to 100x1 to reduce parallelism (since the Parallel.For is done per row), but this had no effect. The overhead is already incurred by Parallel.For's task paradigm itself.
This is a low-priority proposal, as I can restore my local pooling of Image<>s rather than constructing a new one each time (probably necessary anyway to avoid object allocation overheads).
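To make the first point concrete, here is a minimal sketch of a threshold-gated clear. It is not ImageSharp code: it works on a plain array standing in for the pixel buffer, and the `ClearSketch` class, the `ShouldParallelize` helper and the `ParallelThreshold` value are all hypothetical names and numbers chosen for illustration. A real cutoff would have to be measured.

```c#
using System;
using System.Threading.Tasks;

public static class ClearSketch
{
    // Hypothetical cutoff in pixels; an assumption, not a measured value.
    public const int ParallelThreshold = 4096;

    // Only go parallel when there is more than one row and enough pixels
    // to plausibly outweigh the task scheduling cost.
    public static bool ShouldParallelize(int width, int height)
        => height > 1 && (long)width * height >= ParallelThreshold;

    // Fills a row-major buffer of width * height elements with a value.
    public static void Clear<T>(T[] pixels, int width, int height, T value)
    {
        if (!ShouldParallelize(width, height))
        {
            // Small image: a single sequential fill, no tasks involved.
            pixels.AsSpan(0, width * height).Fill(value);
            return;
        }

        // Large image: fill one row per parallel iteration.
        Parallel.For(0, height, y => pixels.AsSpan(y * width, width).Fill(value));
    }
}
```

With this shape, a 100x1 or 1x100 buffer would take the sequential branch, so no tasks are created for it at all.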
Profile CPU usage for a loop like:
```c#
for (int i = 0; i < 10000; i++)
{
    // Construct and dispose a 100x1 image on every iteration.
    using (var test = new Image<Rgba32>(100, 1))
    {
    }
}
```
.NET Core 2.1, latest ImageSharp 1.0.0-beta0005
Did a quick benchmark. If we replace the parallel code in this scenario with a simple fill, it's 10x faster.
Need to check other dimensions but since Fill is SIMD optimized, it might well be best to simply use that in all circumstances.
```c#
internal void Clear(TPixel value)
{
    Span<TPixel> pixels = this.GetPixelSpan();
    pixels.Fill(value);
}
```
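For anyone wanting to check other dimensions, a rough timing comparison could look something like the sketch below. It is an assumption-laden stand-in rather than the benchmark mentioned above: it uses an `int[]` in place of the ImageSharp pixel buffer, a plain `Stopwatch` loop instead of a proper benchmark harness, and the `ClearTiming` class and its method names are made up for illustration.

```c#
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ClearTiming
{
    // Row-per-iteration parallel clear, similar in spirit to the code being discussed.
    public static void ClearParallel(int[] pixels, int width, int height, int value)
        => Parallel.For(0, height, y => pixels.AsSpan(y * width, width).Fill(value));

    // Single sequential fill over the whole buffer.
    public static void ClearFill(int[] pixels, int value)
        => pixels.AsSpan().Fill(value);

    // Runs the action once to warm up, then times the given number of iterations.
    public static TimeSpan Time(Action action, int iterations)
    {
        action();
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Prints both timings for one image size.
    public static void Run(int width, int height, int iterations)
    {
        var pixels = new int[width * height];
        TimeSpan parallel = Time(() => ClearParallel(pixels, width, height, 0), iterations);
        TimeSpan fill = Time(() => ClearFill(pixels, 0), iterations);
        Console.WriteLine($"{width}x{height}: Parallel.For {parallel.TotalMilliseconds} ms, Fill {fill.TotalMilliseconds} ms");
    }
}
```

Calling `ClearTiming.Run(100, 1, 10000)` and then repeating with larger sizes should give a first impression of where, if anywhere, the parallel version starts to pay off. The numbers will vary by machine and runtime.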