Sharp: GPU acceleration

Created on 11 Dec 2018 · 7 comments · Source: lovell/sharp

I've gotten OpenCL acceleration working for imagick and can expect up to 6x faster transformations from that library using my GPU on mediocre hardware (and I don't have mediocre hardware). Seeing as this binary provides about a 6x speedup on just the CPU, I was wondering if there were similar GPU-accelerated gains to be had?

question

All 7 comments

Sorry, I didn't read the other issues.

For the future reference of others who find this, please see #1202. The tl;dr is that image (de)coding I/O is likely to be the bottleneck.
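If you want to check this on your own hardware, here's a minimal timing sketch using sharp itself (not from #1202; the file path and resize width are placeholders). It compares JPEG decode + re-encode alone against decode + resize + re-encode, so you can see how much of the wall time the codec work accounts for:

// Minimal timing sketch: codec work alone vs codec work plus a resize.
// "input.jpg" and the 1024px target width are placeholders.
import sharp from "sharp";

async function timeIt(label: string, fn: () => Promise<unknown>): Promise<void> {
  const start = process.hrtime.bigint();
  await fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms`);
}

async function main(): Promise<void> {
  // Decode the JPEG and re-encode it unchanged: pure codec time.
  await timeIt("copy", () => sharp("input.jpg").jpeg().toBuffer());
  // Decode, resize, re-encode: the difference over "copy" is the part
  // a GPU resize could, at best, remove.
  await timeIt("resize", () => sharp("input.jpg").resize(1024).jpeg().toBuffer());
}

main();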

Well, I run NVMe SSDs in RAID 0, with 4266 MHz RAM on a Ryzen 5 2600 [email protected] on all cores. IO isn't a problem. For people like me who have several 290X video cards using AMD's Heterogeneous System Architecture, can we expect any support for our hardware?

Hello, it's not the physical IO (as you say, that's very fast on a modern machine); it's the single-threaded, non-GPU image read/write library.

There's an example on the linked issue:

$ vipsheader wtc.jpg 
wtc.jpg: 9372x9372 uchar, 3 bands, srgb, jpegload
$ time vips copy wtc.jpg x.jpg
real    0m0.872s
user    0m1.496s
sys 0m0.048s

So 870ms to read, decompress, copy (zero time) and recompress a 10,000 x 10,000 pixel JPEG image.

Now try running a simple operation on the same image:

$ more avg.mat 
3 3 9 0 
1 1 1
1 1 1
1 1 1
$ time vips conv wtc.jpg x.jpg avg.mat 
real    0m1.073s
user    0m2.468s
sys 0m0.076s

Now it's 1070ms. So even if GPU processing were instant, which it is not, using a GPU could only save a maximum of about 200ms, or roughly 20%. Most of the time is being spent in image read and write, where the GPU cannot help.
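The same arithmetic as a quick sketch, using the timings above:

const copyMs = 872;   // read + decompress + recompress, no pixel processing
const convMs = 1073;  // the same, plus the 3x3 convolution
const processingMs = convMs - copyMs;      // ~200ms of actual pixel work
const bestCaseMs = convMs - processingMs;  // total if the GPU made processing free
console.log(processingMs, bestCaseMs / convMs); // ~200ms, ~0.8 -> at most ~20% saved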

Oh, I thought of one more thing: vips already runs the load, process and save steps in parallel. For example, you can read a JPG into memory, do nothing, then write it out again with imagemagick like this:

$ time convert wtc.jpg x.jpg
real    0m1.640s
user    0m1.403s
sys 0m0.236s

When vips does it, the decompress and the recompress are on separate threads and are overlapped:

$ time vips copy wtc.jpg x.jpg[optimize-coding]
real    0m0.902s
user    0m1.282s
sys 0m0.104s

(imagemagick enables optimize-coding by default)

You can see the total CPU time is similar, but libvips finishes in about half the wall clock time thanks to the way it can overlap IO.
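For what it's worth, sharp exposes this overlapped pipeline from Node via streams. A minimal sketch (paths are placeholders; the optimiseCoding option should correspond to the [optimize-coding] flag used above):

// sharp objects are Duplex streams, so the read, the decode/process work and
// the encode/write can overlap rather than run strictly one after another.
import fs from "node:fs";
import sharp from "sharp";

fs.createReadStream("wtc.jpg")
  .pipe(sharp().jpeg({ optimiseCoding: true })) // decode, then re-encode as JPEG
  .pipe(fs.createWriteStream("x.jpg"));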

@jcupitt I'm sure there is something wrong with my understanding, but you're saying that reading and writing the image file takes about 870ms, i.e. everything except the actual manipulation. This GPU JPEG codec handles that in less than 1ms, and 2ms for the copy (on a 4K image rather than a 10K one). I know hardware will vary the performance, but that's a rather big difference?

Your comment suggests that the steps besides the image manipulation need to be done off the GPU, and from what I understand the linked codec handles that part on the GPU. So the savings would be much more worthwhile if GPU equivalents of the image format codecs were available?

You're right, a GPU JPG decode / encode implementation would change the tradeoffs completely and make a GPU implementation of resize very worthwhile.

libvips usually uses libjpeg-turbo, the fastest open source decode/encode there is (I think). It does a 4k video frame in about 40ms, so perhaps 10x slower than that GPU version you linked.

There are security considerations too. libjpeg-turbo has been tested carefully and is pretty safe against malicious files. It would probably take a few years for a new GPU implementation to settle down enough for people to feel comfortable deploying it.
