Rawtherapee: Improve decoding speed of Nikon NEF files

Created on 25 Aug 2018 · 23 comments · Source: Beep6581/RawTherapee

Being a Nikon shooter (D700), today I tried to improve the decoding speed of Nikon 14-bit lossless compressed NEF files. In a first mockup I got roughly a 20% speedup for decoding (the dcraw.cc function nikon_load_raw), measured on D800 files (though only when loading the files from fast storage such as an SSD; decoding a file loaded from a USB 2.0 connected drive will not see a speedup).

Any Nikon shooter here who likes to contribute to this issue?

Labels: file format, performance, enhancement

All 23 comments

I just created the nikon_deode branch with a ~20% speedup for decoding all Nikon NEF files when loaded from a fast storage medium.

Nikon shooter here, so I like that you're working on a speedup! What kind of contribution are you looking for?

@Thanatomanic Tests would be very welcome.

Concerning processing time:
I always measure queue processing 7 times and take the median of the 7 measurements. This gives robust results and (because of queue processing) avoids the progress bar influencing the processing time.
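The median-of-7 methodology described above can be sketched as follows. This is a minimal illustration (the `median_runtime_ms` helper and the task passed to it are hypothetical, not RawTherapee code); the median of an odd number of runs is robust against a single slow outlier such as a cold file cache.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Run a task `runs` times and return the median wall-clock time in ms.
// With an odd run count, the median is simply the middle sorted element.
template <typename F>
double median_runtime_ms(F task, int runs = 7) {
    std::vector<double> times;
    for (int i = 0; i < runs; ++i) {
        auto start = std::chrono::steady_clock::now();
        task();
        auto stop = std::chrono::steady_clock::now();
        times.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(times.begin(), times.end());
    return times[times.size() / 2]; // middle element = median for odd runs
}
```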

@Thanatomanic Of course ideas to further reduce decoding time are welcome too :)

The decoding time for a 14-bit lossless compressed D850 NEF was ~900 ms before and is now ~700 ms on my system.

With last commit decoding time for a 14-bit lossless compressed D850 NEF is ~660 ms

With last commit decoding time for a 14-bit lossless compressed D850 NEF is ~635 ms

I tested with files from a lot of different Nikon cameras using a lot of different encodings, namely:

12-bit lossy compressed
12-bit lossy compressed with split
12-bit lossless compressed
14-bit lossy compressed
14-bit lossy compressed with split
14-bit lossless compressed

All seem to work fine.

As I'm out of ideas now on how to further reduce the decoding time of NEF files, I would like to merge this branch into dev to get more tests asap.

Any objections?

Ingo

I can only take a look at this some other day this week, would that be okay? As I'm not familiar with the programmatic part of RAW decoding, I doubt that I can find another speedup, but I want to give it a go as an exercise.

@Thanatomanic Roel, take your time to look at the changes. Code review is very welcome too. In my experience, a further speedup from me may become possible again after some weeks...

FWIW, I also see a ~20% (1200 ms to 1000 ms) improvement on 14-bit lossless uncompressed NEF files.

@Lompik I didn't change anything for lossless uncompressed files, only for compressed files (lossless and lossy)

@heckflosse you're right. Actually my tests were on 14-bit lossless compressed files, according to my camera (D600) settings. There is no option for uncompressed. Same speedup gains with 14-bit lossy compressed.

PS: I also see a speedup with the -O3 compiler flag, from 400 ms (dev branch, down from 1200 ms above) to 300 ms for the nikon_load_raw function.

@Lompik Thanks for testing and feedback :+1:

Finally did some testing, and there's indeed a clear speed-up, so that's wonderful @heckflosse
On my machine I get a median timing going from 399 ms to 294 ms.

(On a side note, applying a profile on my regular HDD is still dead slow for me (calls to nikon_load_raw take up to 50000 ms), but a direct export afterwards only takes 300 ms, which is weird.)

@Lompik Are you saying that without -O3 it needs 1200 ms/1000 ms and with -O3 it needs 400 ms/300 ms?
Well, fortunately -O3 is the default in RT for release builds.

Edit: -O3 is the default for Windows builds. If it's not the default for non-Windows builds, we should make it the default.

@Thanatomanic Roel, thanks for testing :+1: I will merge now to get even more tests

@heckflosse Please do. I'm staring at the code, trying to figure out what is going on in nikbithuff, but definitely not seeing a way to optimize further.

@Thanatomanic Concerning further speedup of nikon_load_raw(): it should be possible to call a special nikbithuff function which does not check for EOL for roughly the first half of the file, when the build is made using MMAP...
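The idea above can be sketched roughly as follows. This is a hypothetical illustration, not the actual nikbithuff code from dcraw.cc: a bit reader over an in-memory (e.g. mmap'ed) buffer keeps two fetch routines, one with an end-of-buffer check and one without, and the caller uses the unchecked variant only while it can guarantee enough bytes remain.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a Huffman-style bit reader over an in-memory buffer.
// getbits_checked() guards against reading past the end (padding with zeros);
// getbits_fast() omits the check and is only safe while the caller knows
// enough input bytes remain, e.g. in the first half of an mmap'ed file.
// nbits must be in the range 1..31.
struct BitReader {
    const uint8_t* data;
    size_t size;
    size_t pos = 0;      // index of the next byte to consume
    uint64_t bitbuf = 0; // bits accumulated so far
    int bitcnt = 0;      // number of valid bits in bitbuf

    uint32_t getbits_checked(int nbits) {
        while (bitcnt < nbits) {
            uint8_t byte = (pos < size) ? data[pos++] : 0; // zero-pad at EOF
            bitbuf = (bitbuf << 8) | byte;
            bitcnt += 8;
        }
        bitcnt -= nbits;
        return uint32_t(bitbuf >> bitcnt) & ((1u << nbits) - 1);
    }

    // Same as above, but with no end-of-buffer check in the hot loop.
    uint32_t getbits_fast(int nbits) {
        while (bitcnt < nbits) {
            bitbuf = (bitbuf << 8) | data[pos++];
            bitcnt += 8;
        }
        bitcnt -= nbits;
        return uint32_t(bitbuf >> bitcnt) & ((1u << nbits) - 1);
    }
};
```

Removing the branch from the per-symbol hot path is what buys the speed; the trade-off is that the decoder must switch back to the checked variant before the read position can reach the end of the buffer.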

@heckflosse you understood right. My first attempt was not using release builds.

I just tested with -O2, which yields results similar to -O3. Fortunately, most Linux distros will build with -O2 via environment variables. My distro's RawTherapee binaries seem as fast as builds with full optimization.

@Lompik

Most linux distros will build with -O2

Then we should add at least -ftree-vectorize for the case -O3 is not used

I believe that's what is being done at line 58 in CMakeLists.txt: https://github.com/Beep6581/RawTherapee/blob/dev/CMakeLists.txt#L58

@Lompik Ah, yes, I remember vaguely.

I'm getting old... :( https://github.com/Beep6581/RawTherapee/commit/7507b74d6fe0668343732b6c468b5fd5b1ba5a6d
