Sharp: Image compositing very slow

Created on 29 Apr 2019 · 4 Comments · Source: lovell/sharp

Hey there :-),
I'm trying to render a lot of images, each composed of about 256 smaller images, using sharp's composite.
I'm collecting all the input images and their top/left coordinates into an array, and passing that to _sharp.composite_ afterwards.
The problem is that sharp seems to be _very_ slow when compositing. Again, it's about 256 input images into a 256x256 px PNG (every base image is 16x16 px).
I tried some different image libraries.
Using Jimp I got about 8 images/second, Mapbox got me about 50 images/second, but sharp only gives me 1 to 2 images/second. I'm using the same base code to collect the images for every image library.

I was wondering, is sharp's strength only in resizing images or am I doing something wrong? Is there something I can configure to improve performance?
Otherwise, sharp is more than perfect and does everything I want, so I'd like to reduce the list of dependencies and just stick to sharp...

I hope you have an amazing week,
clarkx86 :)

question

All 4 comments

Hi, what you describe here sounds more like stitching images without overlap than compositing, so the feature proposed in #1580 might better provide what you're looking for.
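A quick way to see why this looks like stitching: 256 tiles of 16x16 px fill a 256x256 px canvas exactly, with no room for overlap. The sketch below assumes the tiles are laid out row by row, which the question does not state; the names `TILE`, `CANVAS` and `offsets` are made up for illustration.

```python
TILE = 16      # tile edge in pixels, from the question
CANVAS = 256   # output edge in pixels, from the question

per_row = CANVAS // TILE  # 16 tiles fit across one row

# (left, top) offset of every tile, assuming a row-by-row layout
offsets = [
    ((i % per_row) * TILE, (i // per_row) * TILE)
    for i in range(per_row * per_row)
]

# 256 distinct positions, each a multiple of the tile size
print(len(offsets), offsets[0], offsets[-1])
```

If your coordinates match this pattern, the input is a regular grid and a join operation is a better fit than a general composite.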

Hello, the x/y offset feature was added to libvips composite in 8.7, and implemented in the simplest way you can imagine. It's been rewritten for 8.8 (due in a few weeks) and should be quicker now.

I made a tiny benchmark in Python:

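#!/usr/bin/python3

import sys
import random
import pyvips

# First argument is the background; all remaining arguments are overlays.
bg = pyvips.Image.new_from_file(sys.argv[1])
fg = [pyvips.Image.new_from_file(filename) for filename in sys.argv[2:]]
# One random top-left offset per overlay, keeping each 16x16 tile inside bg.
xes = [random.randint(0, bg.width - 16) for _ in fg]
yes = [random.randint(0, bg.height - 16) for _ in fg]

# Composite 100 times; copy_memory() forces the lazy pipeline to render.
for i in range(100):
    bg.composite(fg, "over", x=xes, y=yes).copy_memory()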

That's compositing args 2+ on top of arg1 100 times. I can run it like this:

$ vips crop ~/pics/wtc.jpg base.jpg 0 0 256 256
$ vips crop ~/pics/PNG_transparency_demonstration_1.png x.png 150 150 16 16 
$ for i in {0..256}; do cp x.png $i.png; done
$ time ../composite-many.py base.jpg *.png
real    0m9.067s
user    0m15.710s
sys     0m0.407s

So about 11 per second. With 8.7, I see:

$ time ../composite-many.py base.jpg *.png
real    0m17.563s
user    0m42.677s
sys     0m0.768s

About 6 per second.

You'll see a larger speedup with larger images -- 256x256 is too small for the libvips threading system to be very effective.
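For reference, the per-second figures above are just the 100 loop iterations divided by the `real` time. A small sketch of that arithmetic (the helper name `composites_per_second` is made up here, and the numbers include process startup and image loading, so they slightly understate the true rate):

```python
ITERATIONS = 100  # loop count in the benchmark script


def composites_per_second(real_seconds, iterations=ITERATIONS):
    """Throughput implied by a wall-clock time for the whole run."""
    return iterations / real_seconds


rate_88 = composites_per_second(9.067)    # "real" time measured with 8.8
rate_87 = composites_per_second(17.563)   # "real" time measured with 8.7
speedup = rate_88 / rate_87

print(f"8.8: {rate_88:.1f}/s  8.7: {rate_87:.1f}/s  speedup: {speedup:.2f}x")
```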

I remembered one more possibility: composite uses g++ vector arithmetic to generate SIMD code. If your gcc is too old or if your CPU does not support 4xfloat SIMD, it may not work and it'll fall back to a slower vanilla C path.

Check for this in configure output:

checking for gcc with working vector support... yes
checking for C++ vector shuffle... yes
checking for C++ vector arithmetic... yes
checking for C++ signed constants in vector templates... yes

And @lovell is correct of course, arrayjoin will be much faster if you are simply making a grid of small images.
