We need cheap bulk-conversion between the following formats:
Argb32, Bgra32, Rgba32, Bgr24, Rgb24
We are especially interested in Rgb24, Bgr24 <=> Rgba32. I'm having sweet dreams about a community PR dealing with this problem. It should be easy for anyone with basic knowledge of System.Runtime.Intrinsics.
The implementation should be added as new span-based methods in PixelConverter.cs, which could then be invoked from the T4-generated PixelOperations<TPixel> implementors, like here:
https://github.com/SixLabors/ImageSharp/blob/78a584e8482b052d7a9885682299e2f37518d83d/src/ImageSharp/PixelFormats/PixelImplementations/Generated/Rgb24.PixelOperations.Generated.cs
If a PR would only add the PixelConverter helpers + tests, I'm happy to provide guidance or even finish the code for the rest of the work.
@john-h-k @Sergio0694 any chance you are interested?
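For anyone picking this up, here is a minimal sketch of the kind of span-based pad/slice helpers described above. It is not the ImageSharp implementation: the class name `ShuffleSketch` and both method names are hypothetical, spans are treated as raw bytes rather than pixel structs, and it assumes .NET Core 3.0 or later so that `System.Runtime.Intrinsics` is available. An SSSE3 byte shuffle handles four pixels per iteration and a scalar loop handles the remainder and platforms without SSSE3.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper class for illustration; not part of ImageSharp.
public static class ShuffleSketch
{
    // Expands packed R,G,B bytes to R,G,B,A bytes with A = 255.
    public static void Rgb24ToRgba32(ReadOnlySpan<byte> source, Span<byte> dest)
    {
        int count = source.Length / 3;
        if (dest.Length < count * 4) throw new ArgumentException("Destination too short.", nameof(dest));

        int i = 0;
        if (Ssse3.IsSupported)
        {
            ref byte src = ref MemoryMarshal.GetReference(source);
            ref byte dst = ref MemoryMarshal.GetReference(dest);
            // A control byte of 0x80 zeroes the lane; alpha is OR'ed in afterwards.
            Vector128<byte> mask = Vector128.Create(
                (byte)0, 1, 2, 0x80, 3, 4, 5, 0x80, 6, 7, 8, 0x80, 9, 10, 11, 0x80);
            Vector128<byte> alpha = Vector128.Create(0xFF000000u).AsByte();

            // Each iteration reads 16 bytes but consumes only 12 (4 pixels),
            // so loop only while a full 16-byte read stays in bounds.
            for (; i * 3 + 16 <= source.Length; i += 4)
            {
                Vector128<byte> rgb = Unsafe.ReadUnaligned<Vector128<byte>>(ref Unsafe.Add(ref src, i * 3));
                Vector128<byte> rgba = Sse2.Or(Ssse3.Shuffle(rgb, mask), alpha);
                Unsafe.WriteUnaligned(ref Unsafe.Add(ref dst, i * 4), rgba);
            }
        }

        // Scalar fallback and tail.
        for (; i < count; i++)
        {
            dest[i * 4] = source[i * 3];
            dest[i * 4 + 1] = source[i * 3 + 1];
            dest[i * 4 + 2] = source[i * 3 + 2];
            dest[i * 4 + 3] = 255;
        }
    }

    // Drops the alpha byte of every R,G,B,A quadruple.
    public static void Rgba32ToRgb24(ReadOnlySpan<byte> source, Span<byte> dest)
    {
        int count = source.Length / 4;
        if (dest.Length < count * 3) throw new ArgumentException("Destination too short.", nameof(dest));

        int i = 0;
        if (Ssse3.IsSupported)
        {
            ref byte src = ref MemoryMarshal.GetReference(source);
            ref byte dst = ref MemoryMarshal.GetReference(dest);
            Vector128<byte> mask = Vector128.Create(
                (byte)0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14, 0x80, 0x80, 0x80, 0x80);

            // Each iteration writes 16 bytes of which 12 are meaningful; the
            // next iteration (or the scalar tail) overwrites the 4 spare bytes.
            for (; i + 4 <= count && i * 3 + 16 <= dest.Length; i += 4)
            {
                Vector128<byte> rgba = Unsafe.ReadUnaligned<Vector128<byte>>(ref Unsafe.Add(ref src, i * 4));
                Unsafe.WriteUnaligned(ref Unsafe.Add(ref dst, i * 3), Ssse3.Shuffle(rgba, mask));
            }
        }

        for (; i < count; i++)
        {
            dest[i * 3] = source[i * 4];
            dest[i * 3 + 1] = source[i * 4 + 1];
            dest[i * 3 + 2] = source[i * 4 + 2];
        }
    }
}
```

The Bgr24 and Argb32 variants would only differ in the shuffle mask and the position of the alpha byte, which is why a small set of generic shuffle helpers could cover all five formats.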
Tasks:
- SimdUtils
- SimdUtils
- SimdUtils
- Rgba32 compatible pixel operations to utilize new shuffle methods.
- Rgb24 compatible pixel operations to utilize new shuffle methods.

I'm gonna have a look at this.
Btw, the Span<Vector4> <=> Span<TPixel> conversions are delegating the (pad/slice) shuffle work to PixelOperations<TPixel> conversion methods:
Which means that the last two steps will be done automatically, if I'm not missing anything. Can't wait to see the Vector4 <=> Rgb24 before/after comparison. (Which is the main goal of this issue because of the conversion steps in Jpeg decoder and ResizeProcessor.)
@antonfirsov need to add a specific benchmark for that but I know for certain that even my Rgba32 <==> Rgb24 fallback is a lot faster than the original.
@antonfirsov Here you go!
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.403
[Host] : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
Job-OIBEDX : .NET Framework 4.8 (4.8.4250.0), X64 RyuJIT
Job-OPAORC : .NET Core 2.1.23 (CoreCLR 4.6.29321.03, CoreFX 4.6.29321.01), X64 RyuJIT
Job-VPSIRL : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
IterationCount=3 LaunchCount=1 WarmupCount=3
| Method | Job | Runtime | Count | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |----------- |-------------- |------ |-----------:|------------:|----------:|------:|--------:|-------:|------:|------:|----------:|
| PixelOperations_Base | Job-BPVKME | .NET 4.7.2 | 64 | 278.4 ns | 6.89 ns | 0.38 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-BPVKME | .NET 4.7.2 | 64 | 310.2 ns | 87.58 ns | 4.80 ns | 1.11 | 0.02 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-FBIBGB | .NET Core 2.1 | 64 | 231.9 ns | 254.31 ns | 13.94 ns | 1.00 | 0.00 | 0.0052 | - | - | 24 B |
| PixelOperations_Specialized | Job-FBIBGB | .NET Core 2.1 | 64 | 230.2 ns | 27.06 ns | 1.48 ns | 0.99 | 0.05 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-CETXPV | .NET Core 3.1 | 64 | 215.5 ns | 52.24 ns | 2.86 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-CETXPV | .NET Core 3.1 | 64 | 236.7 ns | 553.01 ns | 30.31 ns | 1.10 | 0.13 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-BPVKME | .NET 4.7.2 | 256 | 1,022.3 ns | 3,570.70 ns | 195.72 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-BPVKME | .NET 4.7.2 | 256 | 622.5 ns | 26.76 ns | 1.47 ns | 0.62 | 0.11 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-FBIBGB | .NET Core 2.1 | 256 | 762.3 ns | 103.78 ns | 5.69 ns | 1.00 | 0.00 | 0.0048 | - | - | 24 B |
| PixelOperations_Specialized | Job-FBIBGB | .NET Core 2.1 | 256 | 498.1 ns | 70.87 ns | 3.88 ns | 0.65 | 0.00 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-CETXPV | .NET Core 3.1 | 256 | 754.0 ns | 37.92 ns | 2.08 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-CETXPV | .NET Core 3.1 | 256 | 436.8 ns | 21.88 ns | 1.20 ns | 0.58 | 0.00 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-BPVKME | .NET 4.7.2 | 2048 | 5,679.3 ns | 1,454.37 ns | 79.72 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-BPVKME | .NET 4.7.2 | 2048 | 3,460.6 ns | 273.43 ns | 14.99 ns | 0.61 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-FBIBGB | .NET Core 2.1 | 2048 | 6,033.8 ns | 8,785.67 ns | 481.57 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-FBIBGB | .NET Core 2.1 | 2048 | 3,421.3 ns | 376.64 ns | 20.64 ns | 0.57 | 0.04 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-CETXPV | .NET Core 3.1 | 2048 | 5,542.3 ns | 790.31 ns | 43.32 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-CETXPV | .NET Core 3.1 | 2048 | 2,972.2 ns | 70.72 ns | 3.88 ns | 0.54 | 0.00 | - | - | - | - |
| Method | Job | Runtime | Count | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |----------- |-------------- |------ |-----------:|------------:|----------:|------:|--------:|-------:|------:|------:|----------:|
| PixelOperations_Base | Job-OIBEDX | .NET 4.7.2 | 64 | 298.4 ns | 33.63 ns | 1.84 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-OIBEDX | .NET 4.7.2 | 64 | 355.5 ns | 908.51 ns | 49.80 ns | 1.19 | 0.17 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-OPAORC | .NET Core 2.1 | 64 | 220.1 ns | 13.77 ns | 0.75 ns | 1.00 | 0.00 | 0.0055 | - | - | 24 B |
| PixelOperations_Specialized | Job-OPAORC | .NET Core 2.1 | 64 | 228.5 ns | 41.41 ns | 2.27 ns | 1.04 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-VPSIRL | .NET Core 3.1 | 64 | 213.6 ns | 12.47 ns | 0.68 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-VPSIRL | .NET Core 3.1 | 64 | 217.0 ns | 9.95 ns | 0.55 ns | 1.02 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-OIBEDX | .NET 4.7.2 | 256 | 829.0 ns | 242.93 ns | 13.32 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-OIBEDX | .NET 4.7.2 | 256 | 448.9 ns | 4.04 ns | 0.22 ns | 0.54 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-OPAORC | .NET Core 2.1 | 256 | 863.0 ns | 1,253.26 ns | 68.70 ns | 1.00 | 0.00 | 0.0048 | - | - | 24 B |
| PixelOperations_Specialized | Job-OPAORC | .NET Core 2.1 | 256 | 309.2 ns | 66.16 ns | 3.63 ns | 0.36 | 0.03 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-VPSIRL | .NET Core 3.1 | 256 | 737.0 ns | 253.90 ns | 13.92 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-VPSIRL | .NET Core 3.1 | 256 | 212.3 ns | 1.07 ns | 0.06 ns | 0.29 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-OIBEDX | .NET 4.7.2 | 2048 | 5,625.6 ns | 404.35 ns | 22.16 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-OIBEDX | .NET 4.7.2 | 2048 | 1,974.1 ns | 229.84 ns | 12.60 ns | 0.35 | 0.00 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-OPAORC | .NET Core 2.1 | 2048 | 5,467.2 ns | 537.29 ns | 29.45 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-OPAORC | .NET Core 2.1 | 2048 | 1,985.5 ns | 4,714.23 ns | 258.40 ns | 0.36 | 0.05 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-VPSIRL | .NET Core 3.1 | 2048 | 5,888.2 ns | 1,622.23 ns | 88.92 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-VPSIRL | .NET Core 3.1 | 2048 | 1,165.0 ns | 191.71 ns | 10.51 ns | 0.20 | 0.00 | - | - | - | - |
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.572 (2004/?/20H1)
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.403
[Host] : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
Job-XYEQXL : .NET Framework 4.8 (4.8.4250.0), X64 RyuJIT
Job-HSXNJV : .NET Core 2.1.23 (CoreCLR 4.6.29321.03, CoreFX 4.6.29321.01), X64 RyuJIT
Job-YUREJO : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
IterationCount=3 LaunchCount=1 WarmupCount=3
| Method | Job | Runtime | Count | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |----------- |-------------- |------ |-----------:|-------------:|----------:|------:|--------:|-------:|------:|------:|----------:|
| PixelOperations_Base | Job-BPNZYS | .NET 4.7.2 | 64 | 317.7 ns | 125.40 ns | 6.87 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-BPNZYS | .NET 4.7.2 | 64 | 316.4 ns | 70.42 ns | 3.86 ns | 1.00 | 0.03 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-NYROHY | .NET Core 2.1 | 64 | 232.7 ns | 82.61 ns | 4.53 ns | 1.00 | 0.00 | 0.0055 | - | - | 24 B |
| PixelOperations_Specialized | Job-NYROHY | .NET Core 2.1 | 64 | 238.9 ns | 106.11 ns | 5.82 ns | 1.03 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-LSNAID | .NET Core 3.1 | 64 | 228.4 ns | 15.16 ns | 0.83 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-LSNAID | .NET Core 3.1 | 64 | 250.3 ns | 22.79 ns | 1.25 ns | 1.10 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-BPNZYS | .NET 4.7.2 | 256 | 975.5 ns | 1,646.67 ns | 90.26 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-BPNZYS | .NET 4.7.2 | 256 | 1,051.3 ns | 170.43 ns | 9.34 ns | 1.08 | 0.11 | 0.0172 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-NYROHY | .NET Core 2.1 | 256 | 793.0 ns | 69.24 ns | 3.80 ns | 1.00 | 0.00 | 0.0048 | - | - | 24 B |
| PixelOperations_Specialized | Job-NYROHY | .NET Core 2.1 | 256 | 846.8 ns | 117.07 ns | 6.42 ns | 1.07 | 0.01 | 0.0172 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-LSNAID | .NET Core 3.1 | 256 | 797.2 ns | 342.02 ns | 18.75 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-LSNAID | .NET Core 3.1 | 256 | 640.2 ns | 19.74 ns | 1.08 ns | 0.80 | 0.02 | 0.0172 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-BPNZYS | .NET 4.7.2 | 2048 | 6,178.4 ns | 1,537.81 ns | 84.29 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-BPNZYS | .NET 4.7.2 | 2048 | 4,551.2 ns | 257.65 ns | 14.12 ns | 0.74 | 0.01 | 0.0153 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-NYROHY | .NET Core 2.1 | 2048 | 6,621.8 ns | 13,533.62 ns | 741.82 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-NYROHY | .NET Core 2.1 | 2048 | 4,390.4 ns | 184.24 ns | 10.10 ns | 0.67 | 0.07 | 0.0153 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-LSNAID | .NET Core 3.1 | 2048 | 6,357.9 ns | 964.20 ns | 52.85 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-LSNAID | .NET Core 3.1 | 2048 | 2,979.4 ns | 132.81 ns | 7.28 ns | 0.47 | 0.00 | 0.0153 | - | - | 72 B |
| Method | Job | Runtime | Count | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |----------- |-------------- |------ |-----------:|------------:|----------:|------:|--------:|-------:|------:|------:|----------:|
| PixelOperations_Base | Job-XYEQXL | .NET 4.7.2 | 64 | 343.2 ns | 305.91 ns | 16.77 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-XYEQXL | .NET 4.7.2 | 64 | 320.8 ns | 19.93 ns | 1.09 ns | 0.94 | 0.05 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-HSXNJV | .NET Core 2.1 | 64 | 234.3 ns | 17.98 ns | 0.99 ns | 1.00 | 0.00 | 0.0052 | - | - | 24 B |
| PixelOperations_Specialized | Job-HSXNJV | .NET Core 2.1 | 64 | 246.0 ns | 82.34 ns | 4.51 ns | 1.05 | 0.02 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-YUREJO | .NET Core 3.1 | 64 | 222.3 ns | 39.46 ns | 2.16 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-YUREJO | .NET Core 3.1 | 64 | 243.4 ns | 33.58 ns | 1.84 ns | 1.09 | 0.01 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-XYEQXL | .NET 4.7.2 | 256 | 824.9 ns | 32.77 ns | 1.80 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-XYEQXL | .NET 4.7.2 | 256 | 967.0 ns | 39.09 ns | 2.14 ns | 1.17 | 0.01 | 0.0172 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-HSXNJV | .NET Core 2.1 | 256 | 756.9 ns | 94.43 ns | 5.18 ns | 1.00 | 0.00 | 0.0048 | - | - | 24 B |
| PixelOperations_Specialized | Job-HSXNJV | .NET Core 2.1 | 256 | 1,003.3 ns | 3,192.09 ns | 174.97 ns | 1.32 | 0.22 | 0.0172 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-YUREJO | .NET Core 3.1 | 256 | 748.6 ns | 248.03 ns | 13.60 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-YUREJO | .NET Core 3.1 | 256 | 437.0 ns | 36.48 ns | 2.00 ns | 0.58 | 0.01 | 0.0172 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-XYEQXL | .NET 4.7.2 | 2048 | 5,751.6 ns | 704.24 ns | 38.60 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-XYEQXL | .NET 4.7.2 | 2048 | 4,391.6 ns | 718.17 ns | 39.37 ns | 0.76 | 0.00 | 0.0153 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-HSXNJV | .NET Core 2.1 | 2048 | 6,202.0 ns | 1,815.18 ns | 99.50 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-HSXNJV | .NET Core 2.1 | 2048 | 4,225.6 ns | 1,004.03 ns | 55.03 ns | 0.68 | 0.01 | 0.0153 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-YUREJO | .NET Core 3.1 | 2048 | 6,157.1 ns | 2,516.98 ns | 137.96 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-YUREJO | .NET Core 3.1 | 2048 | 1,822.7 ns | 1,764.43 ns | 96.71 ns | 0.30 | 0.02 | 0.0172 | - | - | 72 B |
@antonfirsov Pushed an update that affects FromVector4_Rgb24, based on a suggestion by @Sergio0694, and it squeezed out a few more percent of performance.
| Method | Job | Runtime | Count | Mean | Error | StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---------------------------- |----------- |-------------- |------ |-----------:|------------:|----------:|------:|--------:|-------:|------:|------:|----------:|
| PixelOperations_Base | Job-RPFDIH | .NET 4.7.2 | 64 | 327.7 ns | 179.42 ns | 9.83 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-RPFDIH | .NET 4.7.2 | 64 | 320.8 ns | 21.37 ns | 1.17 ns | 0.98 | 0.03 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-QSMZGQ | .NET Core 2.1 | 64 | 254.8 ns | 337.29 ns | 18.49 ns | 1.00 | 0.00 | 0.0052 | - | - | 24 B |
| PixelOperations_Specialized | Job-QSMZGQ | .NET Core 2.1 | 64 | 245.3 ns | 37.22 ns | 2.04 ns | 0.97 | 0.07 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-YJZMFY | .NET Core 3.1 | 64 | 232.2 ns | 189.28 ns | 10.37 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-YJZMFY | .NET Core 3.1 | 64 | 255.4 ns | 52.39 ns | 2.87 ns | 1.10 | 0.04 | - | - | - | - |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-RPFDIH | .NET 4.7.2 | 256 | 910.1 ns | 293.07 ns | 16.06 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-RPFDIH | .NET 4.7.2 | 256 | 974.4 ns | 490.48 ns | 26.89 ns | 1.07 | 0.05 | 0.0172 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-QSMZGQ | .NET Core 2.1 | 256 | 849.7 ns | 1,654.73 ns | 90.70 ns | 1.00 | 0.00 | 0.0048 | - | - | 24 B |
| PixelOperations_Specialized | Job-QSMZGQ | .NET Core 2.1 | 256 | 759.9 ns | 77.06 ns | 4.22 ns | 0.90 | 0.09 | 0.0172 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-YJZMFY | .NET Core 3.1 | 256 | 816.1 ns | 56.07 ns | 3.07 ns | 1.00 | 0.00 | 0.0057 | - | - | 24 B |
| PixelOperations_Specialized | Job-YJZMFY | .NET Core 3.1 | 256 | 493.0 ns | 216.79 ns | 11.88 ns | 0.60 | 0.01 | 0.0172 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-RPFDIH | .NET 4.7.2 | 2048 | 6,394.6 ns | 2,077.05 ns | 113.85 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-RPFDIH | .NET 4.7.2 | 2048 | 4,139.6 ns | 276.41 ns | 15.15 ns | 0.65 | 0.01 | 0.0153 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-QSMZGQ | .NET Core 2.1 | 2048 | 6,249.9 ns | 799.28 ns | 43.81 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-QSMZGQ | .NET Core 2.1 | 2048 | 4,020.5 ns | 3,211.19 ns | 176.02 ns | 0.64 | 0.03 | 0.0153 | - | - | 72 B |
| | | | | | | | | | | | | |
| PixelOperations_Base | Job-YJZMFY | .NET Core 3.1 | 2048 | 6,403.5 ns | 4,626.86 ns | 253.61 ns | 1.00 | 0.00 | - | - | - | 24 B |
| PixelOperations_Specialized | Job-YJZMFY | .NET Core 3.1 | 2048 | 1,588.3 ns | 912.20 ns | 50.00 ns | 0.25 | 0.01 | 0.0172 | - | - | 72 B |
Looks great!
@JimBobSquarePants there is one other important metric we need to check to set our expectations for #1410. Can be done by defining 2 new simple benchmark classes (baseline VS SIMD with Count = 2048):
- Vector4 -> Rgba32 (baseline) to Vector4 -> Rgb24 (Jpeg decoder pipeline last step)
- Rgba32 -> Vector4 (baseline) to Rgb24 -> Vector4 (Resize pipeline first step)

The smaller the difference the bigger the happiness.
Couldn't we pack directly into Rgb24 by not scaling down in the color converter and packing the planar values as bytes?
Vector4 => Rgba32 is still 2.5x faster on .NET Core 3.1 than Vector4 => Rgb24.
We'd need to have a method that goes direct to be able to cut into that since the pipeline is Vector4 => Rgba32 => Rgb24. Not difficult with hardware intrinsics based on my existing code but likely not fun with the old stuff.
What I dream of is a combination of shuffle + convert.
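As a scalar reference for what a one-step "shuffle + convert" would have to produce, here is a minimal sketch that goes from Vector4 straight to packed RGB bytes without the intermediate Rgba32 buffer. The class and method names are hypothetical, the rounding rule (scale by 255, add 0.5, truncate) is an assumption rather than a confirmed ImageSharp detail, and inputs are assumed to be finite. A SIMD version would need to match this output.

```csharp
using System;
using System.Numerics;

// Hypothetical helper class for illustration; not part of ImageSharp.
public static class DirectPackSketch
{
    // Converts normalized (0..1) Vector4 pixels directly to packed R,G,B bytes,
    // skipping the Rgba32 intermediate. The W (alpha) component is dropped.
    public static void Vector4ToRgb24(ReadOnlySpan<Vector4> source, Span<byte> dest)
    {
        if (dest.Length < source.Length * 3)
        {
            throw new ArgumentException("Destination too short.", nameof(dest));
        }

        Vector4 max = new Vector4(255f);
        Vector4 half = new Vector4(0.5f);

        for (int i = 0; i < source.Length; i++)
        {
            // Scale to 0..255, add 0.5 so truncation rounds to nearest,
            // then clamp so out-of-range inputs saturate.
            Vector4 v = Vector4.Clamp(source[i] * max + half, Vector4.Zero, max);
            dest[i * 3] = (byte)v.X;
            dest[i * 3 + 1] = (byte)v.Y;
            dest[i * 3 + 2] = (byte)v.Z;
        }
    }
}
```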
> Vector4 => Rgba32 is still 2.5x faster on .NET Core 3.1 than Vector4 => Rgb24.
That's bad news :(
I don't think going Vector4 => Rgb24 is the best approach here:
- byte + pack), making it very hard to maintain the code
- Vector<float> -> Vector<byte> conversion shall be directly integrated into the colorspace conversion code.

I'd rather suggest to do the following:
- "Vector3" buffer (Span<float> of RGB components only, no padding for alpha)
- SimdUtils.NormalizedFloatToByteSaturate to get the Rgb24 buffer
- PixelOperations<TPixel> if TPixel != Rgb24

=> Pro: likely still very fast, a much more predictable amount of work, no regressions on old platforms
(But still a lot of work!)
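To make the middle step concrete: if the color converters emitted a flat Span<float> of RGB components, turning it into Rgb24 is a plain element-wise float-to-byte conversion with no shuffling at all. Below is a minimal sketch of such a conversion using the portable `System.Numerics.Vector<T>` API. It is not the actual `SimdUtils.NormalizedFloatToByteSaturate` code; the names are hypothetical, the rounding rule (add 0.5, truncate) is an assumption, and it assumes .NET Core 3.0 or later for the span-based `Vector<T>` constructor and `CopyTo`.

```csharp
using System;
using System.Numerics;

// Hypothetical helper class for illustration; not the ImageSharp SimdUtils code.
public static class FloatToByteSketch
{
    // Converts normalized (0..1) floats to bytes (0..255), saturating out-of-range input.
    public static void NormalizedFloatToByteSaturate(ReadOnlySpan<float> source, Span<byte> dest)
    {
        if (dest.Length < source.Length)
        {
            throw new ArgumentException("Destination too short.", nameof(dest));
        }

        int i = 0;
        if (Vector.IsHardwareAccelerated)
        {
            int f = Vector<float>.Count;
            int n = Vector<byte>.Count; // always 4 * Vector<float>.Count

            // Four float vectors are narrowed into one byte vector per iteration.
            for (; i + n <= source.Length; i += n)
            {
                Vector<uint> a = LoadScaled(source, i);
                Vector<uint> b = LoadScaled(source, i + f);
                Vector<uint> c = LoadScaled(source, i + 2 * f);
                Vector<uint> d = LoadScaled(source, i + 3 * f);

                // Narrow truncates rather than saturates, which is safe here
                // because LoadScaled already clamped every lane to 0..255.
                Vector<ushort> lo = Vector.Narrow(a, b);
                Vector<ushort> hi = Vector.Narrow(c, d);
                Vector.Narrow(lo, hi).CopyTo(dest.Slice(i));
            }
        }

        // Scalar fallback and tail, using the same rounding rule.
        for (; i < source.Length; i++)
        {
            float v = Math.Min(Math.Max(source[i] * 255f + 0.5f, 0f), 255f);
            dest[i] = (byte)v;
        }
    }

    private static Vector<uint> LoadScaled(ReadOnlySpan<float> source, int offset)
    {
        Vector<float> max = new Vector<float>(255f);
        Vector<float> v = new Vector<float>(source.Slice(offset));
        v = v * max + new Vector<float>(0.5f);
        v = Vector.Min(Vector.Max(v, Vector<float>.Zero), max);
        return Vector.AsVectorUInt32(Vector.ConvertToInt32(v));
    }
}
```

Because the work per component is fixed and independent of the pixel layout, the cost of this step should be easy to predict, which is the point of the proposal above.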
Yeah... All my shuffle code has touched conversion between pixel formats only. The Vector4 pipeline remains untouched (_other than a speedup for converting to/from Rgba32_)
What do you mean by Color converters? The jpeg ones?
> What do you mean by Color converters? The jpeg ones?
Yes I meant that.
> Couldn't we pack directly into Rgb24

I think I misunderstood you on this one. I thought you wanted the Jpeg color converters to convert directly into Rgb24; my comment at https://github.com/SixLabors/ImageSharp/issues/1354#issuecomment-721816361 lists arguments against doing that.
A one-step Vector4 => Rgb24 method can help a bit, but I wouldn't expect a big miracle from it. Choose wisely whether you want to invest your time in it.
Just realized this was still open.