I made a small speedup for DCP apply. Because the results are strange, I need some tests from other devs using Intel CPUs.
I tested with three files (Pentax K1, Nikon D800 and Sony ILCE 6000).
I measured the processing time for each file 7 times, unpatched and patched.
No other programs were open during the measurements.
Here are the results of the tests on my AMD CPU:

DCP apply|1|2|3|4|5|6|7|median|median %|min|max
----------|-|-|-|-|-|-|-|-|-|--|--
K1 unpatched|315|315|324|313|260|316|315|315| |260|324
D800 unpatched|318|312|325|333|278|314|317|317| |278|333
ILCE 6000 unpatched|258|250|244|259|236|252|252|252| |236|259
K1 patched|275|241|268|267|269|261|265|267|84.76%|241|275
D800 patched|270|277|280|277|267|262|279|277|87.38%|262|280
ILCE 6000 patched|208|160|201|203|206|201|181|201|79.76%|160|208

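
For reference, the median and median % columns can be recomputed from the seven runs. This is a minimal C++ sketch, assuming that median % means the patched median divided by the unpatched median (which matches the K1 row: 267 / 315 is about 84.76%). The helper names `median` and `medianPercent` are made up for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Median of an odd number of timings (in ms).
// The array is taken by value so the caller's data is left untouched.
template <std::size_t N>
int median(std::array<int, N> runs)
{
    static_assert(N % 2 == 1, "this sketch only handles an odd number of runs");
    std::nth_element(runs.begin(), runs.begin() + N / 2, runs.end());
    return runs[N / 2];
}

// Patched median expressed as a percentage of the unpatched median.
double medianPercent(int patchedMedian, int unpatchedMedian)
{
    assert(unpatchedMedian > 0);
    return 100.0 * patchedMedian / unpatchedMedian;
}
```

Feeding in the two K1 rows gives medians of 315 and 267, so the patched loop takes roughly 85% of the unpatched time on that file.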
Here's the patch:
```diff
diff --git a/rtengine/dcp.cc b/rtengine/dcp.cc
index f9cdd1c7..6133824f 100644
--- a/rtengine/dcp.cc
+++ b/rtengine/dcp.cc
@@ -26,7 +26,7 @@
 #include "rawimagesource.h"
 #include "improcfun.h"
 #include "rt_math.h"
-
+#include "StopWatch.h"
 using namespace rtengine;
 using namespace rtexif;
@@ -1059,6 +1059,7 @@ void DCPProfile::apply(
 }
 // Convert to ProPhoto and apply LUT
+StopWatch Stop1("dcp loop");
 #ifdef _OPENMP
 #pragma omp parallel for schedule(dynamic,16)
 #endif
@@ -1074,7 +1075,7 @@ void DCPProfile::apply(
 float s;
 float v;
- if(Color::rgb2hsvdcp(newr, newg, newb, h , s, v)) {
+ if (LIKELY(Color::rgb2hsvdcp(newr, newg, newb, h , s, v))) {
 hsdApply(delta_info, delta_base, h, s, v);
@@ -1093,6 +1094,7 @@ void DCPProfile::apply(
 img->b(y, x) = work[2][0] * newr + work[2][1] * newg + work[2][2] * newb;
 }
 }
+Stop1.stop();
 }
 }
```
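
The timings above come from the `StopWatch` instrumentation added by the patch. Its implementation is not shown in this thread, so the following is only a rough sketch of how such a scope timer could be written with `std::chrono`. The class name `ScopedTimer` and its output format are hypothetical and not RawTherapee's `StopWatch`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical scope timer: starts on construction, reports on stop() or destruction.
class ScopedTimer
{
    using Clock = std::chrono::steady_clock;

public:
    explicit ScopedTimer(std::string label)
        : label_(std::move(label)), start_(Clock::now()) {}

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

    ~ScopedTimer() { stop(); }

    // Prints the elapsed time once and returns it in milliseconds.
    // Later calls return the first measurement without printing again.
    double stop()
    {
        if (!stopped_) {
            elapsedMs_ = std::chrono::duration<double, std::milli>(Clock::now() - start_).count();
            assert(elapsedMs_ >= 0.0); // steady_clock never runs backwards
            stopped_ = true;
            std::printf("%s: %.2f ms\n", label_.c_str(), elapsedMs_);
        }
        return elapsedMs_;
    }

private:
    std::string label_;
    Clock::time_point start_;
    double elapsedMs_ = 0.0;
    bool stopped_ = false;
};
```

Used like `ScopedTimer timer("dcp loop"); /* work */ timer.stop();`, it mirrors the `Stop1` / `Stop1.stop()` pair in the patch. A monotonic clock is chosen so that system time adjustments cannot distort the measurement.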
Here you go:
DCP apply|1|2|3|4|5|6|7|median|median %|min|max
----------|-|-|-|-|-|-|-|-|-|--|--
K-1 unpatched|411|396|396|395|393|395|391|395| |393|411
K-1 patched|347|332|330|331|330|339|329|331|84%|329|347
Sony ILCE-7RM2 unpatched|489|462|473|461|461|462|473|462| |461|489
Sony ILCE-7RM2 patched|412|401|392|397|399|397|398|398|86%|392|412
D850 unpatched|523|479|480|479|475|486|490|480| |475|523
D850 patched|464|414|414|420|412|407|412|414|86%|407|464
All timings are in ms, of course. I enabled all of the DCP functions, but I don't know whether the profiles contained everything. At least they all had a base curve.
@Hombre57 Thanks for testing :)
@Hombre57 Your result clearly shows that my simple patch (a compiler hint) also works on your Windows/Intel machine! Thanks!
Fascinating, indeed. I had the impression that such efforts were fruitless with modern branch predictors (and given that the if part is normally laid out without a jump). So, can you give us some insight into the effect at the assembly level? How are those hints given to the processor? Or is it just clever reordering by the compiler?
Much appreciated,
Flössie
@Floessie I didn't look at the assembly. I guess it is clever reordering by the compiler, especially in this case where Color::rgb2hsvdcp is inlined and can be reordered too.
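
For readers unfamiliar with such hints: on GCC and Clang, macros like `LIKELY` are commonly built on `__builtin_expect`. The definitions below are an assumption modelled on that idiom, not copied from RawTherapee, and `safeSqrt` and `sumOfRoots` are made-up helpers. The hint never changes the value of the condition; it only tells the compiler which outcome to expect, which it can use when deciding how to lay out and order the code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed definitions following the usual GCC/Clang idiom; the project's real macros may differ.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (!!(x))
#define UNLIKELY(x) (!!(x))
#endif

// Hypothetical stand-in for a conversion that fails only for rare inputs
// (negative values and NaN are rejected).
inline bool safeSqrt(float x, float& out)
{
    if (!(x >= 0.f)) {
        return false;
    }
    out = std::sqrt(x);
    return true;
}

// Sums the square roots of all convertible values.
// The common case (conversion succeeds) is marked as the expected path.
float sumOfRoots(const std::vector<float>& values)
{
    float sum = 0.f;
    for (float x : values) {
        float root = 0.f;
        if (LIKELY(safeSqrt(x, root))) {
            sum += root;
        }
    }
    assert(sum >= 0.f); // every accepted root is non-negative
    return sum;
}
```

Because `safeSqrt` is small and inlinable, the compiler is free to reorder its body together with the loop, which is the same situation as the inlined `Color::rgb2hsvdcp` call in the patch. Whether this yields a measurable gain depends on the compiler, flags, and data, so it should be benchmarked rather than assumed.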
@heckflosse Yep, it's better reordering:
--- no_likely.txt 2018-01-12 17:45:58.777835343 +0100
+++ likely.txt 2018-01-12 17:46:54.225845757 +0100
...
673314: e9 17 ff ff ff jmpq 673230 <rtengine::DCPProfile::apply(rtengine::Imagefloat*, int, Glib::ustring const&, rtengine::ColorTemp const&, std::array<double, 3ul> const&, std::array<std::array<double, 3ul>, 3ul> const&, bool) const [clone ._omp_fn.1]+0x2b0>
673319: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
- 673320: 49 8b 85 f0 00 00 00 mov 0xf0(%r13),%rax
- 673327: c5 f8 28 c8 vmovaps %xmm0,%xmm1
- 67332b: c5 f8 28 d4 vmovaps %xmm4,%xmm2
+ 673320: 48 8b 85 f0 00 00 00 mov 0xf0(%rbp),%rax
+ 673327: c5 f8 28 d0 vmovaps %xmm0,%xmm2
+ 67332b: 4c 8b 5d 00 mov 0x0(%rbp),%r11
67332f: c5 f8 28 c6 vmovaps %xmm6,%xmm0
- 673333: 4d 8b 5d 00 mov 0x0(%r13),%r11
- 673337: 4e 8b 0c 20 mov (%rax,%r12,1),%r9
- 67333b: 49 8b 85 b8 00 00 00 mov 0xb8(%r13),%rax
- 673342: 4e 8b 04 20 mov (%rax,%r12,1),%r8
- 673346: 49 8b 85 80 00 00 00 mov 0x80(%r13),%rax
- 67334d: 49 8d 14 19 lea (%r9,%rbx,1),%rdx
- 673351: 4a 8b 3c 20 mov (%rax,%r12,1),%rdi
- 673355: 49 8d 0c 18 lea (%r8,%rbx,1),%rcx
- 673359: 48 01 fb add %rdi,%rbx
- 67335c: e9 cf fe ff ff jmpq 673230 <rtengine::DCPProfile::apply(rtengine::Imagefloat*, int, Glib::ustring const&, rtengine::ColorTemp const&, std::array<double, 3ul> const&, std::array<std::array<double, 3ul>, 3ul> const&, bool) const [clone ._omp_fn.1]+0x2b0>
- 673361: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
- 673368: 49 8b 85 f0 00 00 00 mov 0xf0(%r13),%rax
- 67336f: c5 f8 28 c8 vmovaps %xmm0,%xmm1
- 673373: c5 f8 28 d6 vmovaps %xmm6,%xmm2
- 673377: c5 f8 28 c3 vmovaps %xmm3,%xmm0
- 67337b: 4d 8b 5d 00 mov 0x0(%r13),%r11
- 67337f: 4e 8b 0c 20 mov (%rax,%r12,1),%r9
- 673383: 49 8b 85 b8 00 00 00 mov 0xb8(%r13),%rax
- 67338a: 4e 8b 04 20 mov (%rax,%r12,1),%r8
- 67338e: 49 8b 85 80 00 00 00 mov 0x80(%r13),%rax
- 673395: 49 8d 14 19 lea (%r9,%rbx,1),%rdx
- 673399: 4a 8b 3c 20 mov (%rax,%r12,1),%rdi
- 67339d: 49 8d 0c 18 lea (%r8,%rbx,1),%rcx
- 6733a1: 48 01 fb add %rdi,%rbx
- 6733a4: e9 87 fe ff ff jmpq 673230 <rtengine::DCPProfile::apply(rtengine::Imagefloat*, int, Glib::ustring const&, rtengine::ColorTemp const&, std::array<double, 3ul> const&, std::array<std::array<double, 3ul>, 3ul> const&, bool) const [clone ._omp_fn.1]+0x2b0>
- 6733a9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
- 6733b0: 49 8b 85 f0 00 00 00 mov 0xf0(%r13),%rax
- 6733b7: c5 f8 28 d0 vmovaps %xmm0,%xmm2
- 6733bb: 4d 8b 5d 00 mov 0x0(%r13),%r11
- 6733bf: c5 f8 28 c6 vmovaps %xmm6,%xmm0
- 6733c3: 4e 8b 0c 20 mov (%rax,%r12,1),%r9
- 6733c7: 49 8b 85 b8 00 00 00 mov 0xb8(%r13),%rax
- 6733ce: 4e 8b 04 20 mov (%rax,%r12,1),%r8
- 6733d2: 49 8b 85 80 00 00 00 mov 0x80(%r13),%rax
- 6733d9: 49 8d 14 19 lea (%r9,%rbx,1),%rdx
- 6733dd: 4a 8b 3c 20 mov (%rax,%r12,1),%rdi
- 6733e1: 49 8d 0c 18 lea (%r8,%rbx,1),%rcx
- 6733e5: 48 01 fb add %rdi,%rbx
- 6733e8: e9 43 fe ff ff jmpq 673230 <rtengine::DCPProfile::apply(rtengine::Imagefloat*, int, Glib::ustring const&, rtengine::ColorTemp const&, std::array<double, 3ul> const&, std::array<std::array<double, 3ul>, 3ul> const&, bool) const [clone ._omp_fn.1]+0x2b0>
- 6733ed: 0f 1f 00 nopl (%rax)
- 6733f0: 49 8b 85 f0 00 00 00 mov 0xf0(%r13),%rax
+ 673333: 4e 8b 0c 20 mov (%rax,%r12,1),%r9
+ 673337: 48 8b 85 b8 00 00 00 mov 0xb8(%rbp),%rax
+ 67333e: 4e 8b 04 20 mov (%rax,%r12,1),%r8
+ 673342: 48 8b 85 80 00 00 00 mov 0x80(%rbp),%rax
+ 673349: 49 8d 14 19 lea (%r9,%rbx,1),%rdx
+ 67334d: 4a 8b 3c 20 mov (%rax,%r12,1),%rdi
+ 673351: 49 8d 0c 18 lea (%r8,%rbx,1),%rcx
+ 673355: 48 01 fb add %rdi,%rbx
+ 673358: e9 d3 fe ff ff jmpq 673230 <rtengine::DCPProfile::apply(rtengine::Imagefloat*, int, Glib::ustring const&, rtengine::ColorTemp const&, std::array<double, 3ul> const&, std::array<std::array<double, 3ul>, 3ul> const&, bool) const [clone ._omp_fn.1]+0x2b0>
+ 67335d: 0f 1f 00 nopl (%rax)
+ 673360: 48 8b 85 f0 00 00 00 mov 0xf0(%rbp),%rax
+ 673367: c5 f8 28 c8 vmovaps %xmm0,%xmm1
+ 67336b: c5 f8 28 d4 vmovaps %xmm4,%xmm2
+ 67336f: c5 f8 28 c6 vmovaps %xmm6,%xmm0
+ 673373: 4c 8b 5d 00 mov 0x0(%rbp),%r11
+ 673377: 4e 8b 0c 20 mov (%rax,%r12,1),%r9
+ 67337b: 48 8b 85 b8 00 00 00 mov 0xb8(%rbp),%rax
+ 673382: 4e 8b 04 20 mov (%rax,%r12,1),%r8
+ 673386: 48 8b 85 80 00 00 00 mov 0x80(%rbp),%rax
+ 67338d: 49 8d 14 19 lea (%r9,%rbx,1),%rdx
+ 673391: 4a 8b 3c 20 mov (%rax,%r12,1),%rdi
+ 673395: 49 8d 0c 18 lea (%r8,%rbx,1),%rcx
+ 673399: 48 01 fb add %rdi,%rbx
+ 67339c: e9 8f fe ff ff jmpq 673230 <rtengine::DCPProfile::apply(rtengine::Imagefloat*, int, Glib::ustring const&, rtengine::ColorTemp const&, std::array<double, 3ul> const&, std::array<std::array<double, 3ul>, 3ul> const&, bool) const [clone ._omp_fn.1]+0x2b0>
+ 6733a1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
+ 6733a8: 48 8b 85 f0 00 00 00 mov 0xf0(%rbp),%rax
+ 6733af: c5 f8 28 c8 vmovaps %xmm0,%xmm1
+ 6733b3: c5 f8 28 d6 vmovaps %xmm6,%xmm2
+ 6733b7: c5 f8 28 c3 vmovaps %xmm3,%xmm0
+ 6733bb: 4c 8b 5d 00 mov 0x0(%rbp),%r11
+ 6733bf: 4e 8b 0c 20 mov (%rax,%r12,1),%r9
+ 6733c3: 48 8b 85 b8 00 00 00 mov 0xb8(%rbp),%rax
+ 6733ca: 4e 8b 04 20 mov (%rax,%r12,1),%r8
+ 6733ce: 48 8b 85 80 00 00 00 mov 0x80(%rbp),%rax
+ 6733d5: 49 8d 14 19 lea (%r9,%rbx,1),%rdx
+ 6733d9: 4a 8b 3c 20 mov (%rax,%r12,1),%rdi
+ 6733dd: 49 8d 0c 18 lea (%r8,%rbx,1),%rcx
+ 6733e1: 48 01 fb add %rdi,%rbx
+ 6733e4: e9 47 fe ff ff jmpq 673230 <rtengine::DCPProfile::apply(rtengine::Imagefloat*, int, Glib::ustring const&, rtengine::ColorTemp const&, std::array<double, 3ul> const&, std::array<std::array<double, 3ul>, 3ul> const&, bool) const [clone ._omp_fn.1]+0x2b0>
+ 6733e9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
+ 6733f0: 48 8b 85 f0 00 00 00 mov 0xf0(%rbp),%rax
...
Best,
Flössie
Are there any objections to pushing the change?
UNLIKELY(). 😁
None from me.