Moviepy: Some advice for improving CompositeVideoClip, TextClip and blit

Created on 14 Apr 2020 · 22 Comments · Source: Zulko/moviepy

Since I was stumped by the performance of my edit, I tried to figure out what slows the program down, and luckily I found a couple of causes.
The first one is CompositeVideoClip: it blits every clip for every frame, so the speed drops as the number of clips increases, down to 1 s/it when 20 clips need to be composited. One solution is PIL.Image.paste, which is much faster: create a background image object and paste the other clips onto it. With this it is really fast, reaching 30 it/s.
The second one is the blit function; I think PIL's Image.paste would help there too.
The last one is TextClip. I use PIL's ImageFont to generate text images, and it is much faster than ImageMagick, so maybe it is possible to get rid of ImageMagick.
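The core idea, replacing per-pixel NumPy blending with PIL.Image.paste, can be checked in isolation. The sketch below assumes only NumPy and Pillow; blend_numpy and blend_pil are hypothetical names, not MoviePy API, and the layer is assumed to lie fully inside the background. Both functions blend the same layer at the same spot and should agree up to rounding; paste does the per-pixel work in C, which is presumably where the reported speed-up comes from.

```python
import numpy as np
from PIL import Image

def blend_numpy(bg, img, mask, x, y):
    """Alpha-blend img onto a copy of bg at (x, y); mask is a 0-1 float array."""
    out = bg.astype(float)  # astype returns a new array, bg is untouched
    h, w = img.shape[:2]
    region = out[y:y + h, x:x + w]
    m = mask[:, :, None]  # broadcast the 2-D mask over the RGB channels
    out[y:y + h, x:x + w] = m * img + (1 - m) * region
    return np.round(out).astype('uint8')

def blend_pil(bg, img, mask, x, y):
    """Same operation with PIL.Image.paste and an 8-bit 'L' mask."""
    bg_im = Image.fromarray(bg)
    mask_im = Image.fromarray(np.round(255 * mask).astype('uint8'))
    bg_im.paste(Image.fromarray(img), (x, y), mask_im)
    return np.array(bg_im)

# Random test data: a 60x40 background and a 20x10 layer with a random mask
rng = np.random.default_rng(0)
bg = rng.integers(0, 256, (40, 60, 3), dtype=np.uint8)
img = rng.integers(0, 256, (10, 20, 3), dtype=np.uint8)
mask = rng.integers(0, 256, (10, 20)) / 255.0
```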

Labels: 3rd-party, feature-request, imagemagick, images, performance

Most helpful comment

I spent a day rewriting CompositeVideoClip, and it paid off.

In moviepy.video.compositing.CompositeVideoClip.CompositeVideoClip, change the make_frame function:

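        def make_frame(t):
            # Requires at module level: import numpy as np; from PIL import Image
            full_w, full_h = self.bg.size
            f = self.bg.get_frame(t).astype('uint8')
            bg_im = Image.fromarray(f)
            for c in self.playing_clips(t):
                img, pos, mask, ismask = c.new_blit_on(t, f)

                x, y = pos
                w, h = c.size

                # A clip is invisible if it lies fully outside the frame along either axis
                out_x = x <= -w or x >= full_w
                out_y = y <= -h or y >= full_h

                if out_x or out_y:
                    continue

                # Image.paste needs integer coordinates
                pos = (int(round(min(max(-w, x), full_w))),
                       int(round(min(max(-h, y), full_h))))

                paste_im = Image.fromarray(img.astype('uint8'))

                if mask is not None:
                    # Turn the 0-1 float mask into an 8-bit 'L' mask for paste()
                    mask_im = Image.fromarray((255 * mask).astype('uint8'))
                    bg_im.paste(paste_im, pos, mask_im)
                else:
                    bg_im.paste(paste_im, pos)

            result_frame = np.array(bg_im)

            # Use the composite's own flag: the loop variable is undefined if no clip is playing
            return result_frame.astype('uint8') if (not self.ismask) else result_frame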
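The visibility test in make_frame can be pulled out into a small pure-Python helper and tested on its own. A w x h clip at (x, y) has no visible pixels as soon as it is outside the frame along either axis, and a clip whose left edge sits exactly at -w is already invisible, hence the inclusive bounds. paste_position is a hypothetical name used only for this sketch; it is not part of MoviePy.

```python
def paste_position(x, y, w, h, full_w, full_h):
    """Return integer (x, y) for Image.paste, or None if the clip is off-frame.

    A w x h clip at (x, y) overlaps a full_w x full_h frame only when
    -w < x < full_w and -h < y < full_h; being outside along either
    axis is enough to make it invisible.
    """
    if x <= -w or x >= full_w or y <= -h or y >= full_h:
        return None  # nothing visible: skip the paste entirely
    # Image.paste expects integer coordinates
    return int(round(x)), int(round(y))
```

Inside the loop, pos = paste_position(x, y, w, h, full_w, full_h) followed by if pos is None: continue would stand in for the out_x/out_y test and the clamping.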

In moviepy.video.VideoClip.VideoClip, add a new method, new_blit_on, which returns the clip's frame, position, mask and ismask flag instead of blitting them:

    def new_blit_on(self, t, picture):
        hf, wf = framesize = picture.shape[:2]

        if self.ismask and picture.max() != 0:
            return np.minimum(1, picture + self.blit_on(np.zeros(framesize), t))

        ct = t - self.start  # clip time

        # GET IMAGE AND MASK IF ANY

        img = self.get_frame(ct)
        mask = (None if (self.mask is None) else
                self.mask.get_frame(ct))
        if mask is not None:
            if (img.shape[0] != mask.shape[0]) or (img.shape[1] != mask.shape[1]):
                img = self.fill_array(img, mask.shape)
        hi, wi = img.shape[:2]

        # SET POSITION

        pos = self.pos(ct)

        # preprocess short writings of the position
        if isinstance(pos, str):
            pos = {'center': ['center', 'center'],
                   'left': ['left', 'center'],
                   'right': ['right', 'center'],
                   'top': ['center', 'top'],
                   'bottom': ['center', 'bottom']}[pos]
        else:
            pos = list(pos)

        # is the position relative (given as a fraction of the background frame's size)?
        if self.relative_pos:
            for i, dim in enumerate([wf, hf]):
                if not isinstance(pos[i], str):
                    pos[i] = dim * pos[i]

        if isinstance(pos[0], str):
            D = {'left': 0, 'center': (wf - wi) / 2, 'right': wf - wi}
            pos[0] = D[pos[0]]

        if isinstance(pos[1], str):
            D = {'top': 0, 'center': (hf - hi) / 2, 'bottom': hf - hi}
            pos[1] = D[pos[1]]

        # pos = map(int, pos)
        return img, pos, mask, self.ismask
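The position handling in new_blit_on does not depend on the clip's pixels, so it can be factored into a pure function for testing. This is a minimal sketch that mirrors the code above, assuming the position has already been evaluated at clip time ct; resolve_position and SHORTHANDS are hypothetical names, and sizes are given as (width, height).

```python
SHORTHANDS = {'center': ['center', 'center'],
              'left': ['left', 'center'],
              'right': ['right', 'center'],
              'top': ['center', 'top'],
              'bottom': ['center', 'bottom']}

def resolve_position(pos, frame_size, clip_size, relative=False):
    """Turn a MoviePy-style position into (x, y) pixel offsets.

    pos is a shorthand string ('center', 'left', ...) or a pair whose items
    are numbers or 'left'/'center'/'right' and 'top'/'center'/'bottom'.
    """
    wf, hf = frame_size
    wi, hi = clip_size
    # Copy so the shorthand table is never mutated
    pos = list(SHORTHANDS[pos]) if isinstance(pos, str) else list(pos)

    # Relative positions are fractions of the background frame size
    if relative:
        for i, dim in enumerate([wf, hf]):
            if not isinstance(pos[i], str):
                pos[i] = dim * pos[i]

    if isinstance(pos[0], str):
        pos[0] = {'left': 0, 'center': (wf - wi) / 2, 'right': wf - wi}[pos[0]]
    if isinstance(pos[1], str):
        pos[1] = {'top': 0, 'center': (hf - hi) / 2, 'bottom': hf - hi}[pos[1]]
    return tuple(pos)
```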

I use PIL instead of the blit function, which greatly improves the performance of CompositeVideoClip (up to 25 times faster), and it performs well with large numbers of clips.

I also wrote some code to generate text with PIL:

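from PIL import Image, ImageFont, ImageDraw
import numpy as np
from moviepy.editor import ImageClip

font = 'xxx'       # path to a .ttf/.otf font file
letter = 'xxx'     # text to render
color = '#xxxxxx'  # text color
fontsize = 50      # example value; the original snippet left fontsize undefined

pilfont = ImageFont.truetype(font=font, size=fontsize)
# getsize() was removed in Pillow 10; getbbox() returns (left, top, right, bottom)
left, top, right, bottom = pilfont.getbbox(letter)
charsize = (right, bottom)
# Solid color layer; the rendered glyphs become its alpha channel
bg_img = Image.new('RGB', charsize, color)
mask_img = Image.new('L', bg_img.size, 0)
draw = ImageDraw.Draw(mask_img)
draw.text((0, 0), letter, font=pilfont, fill='white')
bg_img.putalpha(mask_img)
clip = ImageClip(np.array(bg_img))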

I'm glad that the speed is no longer an issue!

All 22 comments

Thank you for that investigation. I'd certainly be up for depending more on Pillow (and getting rid of, or reducing the need for, ImageMagick). There's also https://gist.github.com/Zulko/e072d78dd5dbd2458f34d2166265e081#file-text_clip_with_gizeh-py, which I found and which may be even faster for text.

Thanks for sharing! Actually, I made a simple grid layout manager for designing text animations one character at a time, which is why I need fast text generation. I love this and will keep working on it.

For context, off the top of my head:

  • ImageMagick:

    • Advantages: nice text formatting, has access to the system fonts. Historically, it was simply my first choice.

    • Disadvantages: difficult to install, slow.

  • Pillow:

    • Advantages: easy to install, fast.

    • Disadvantages: doesn't have access to system fonts; you need to provide a path to a TTF file (?).

  • Gizeh/Cairo:

    • Advantages: fast, access to system fonts.

    • Disadvantages: Cairo is not straightforward to install on some systems.

That sounds amazing @ODtian. I'm following this with great interest.

Very cool, @ODtian. It would be great if you could turn those 2 examples each into their own pull request. If not, I'm sure someone else would be able to, and then we can properly test/discuss/compare them.

@ODtian for what it's worth, I tried your modification but was getting an exception from Image.fromarray complaining about "Cannot handle this data type".

Sure, I will try to do that.

Yes, the problem probably occurs when the NumPy array is converted to a PIL.Image. You can try removing .astype('uint8') from f = self.bg.get_frame(t).astype('uint8').
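Another likely source of this error, stated here as an assumption rather than a diagnosis of this report: Image.fromarray rejects 3-channel float64 or int64 arrays, which is what a frame becomes after arithmetic in NumPy. The hypothetical to_pil helper below clips and casts such frames to uint8 before handing them to Pillow.

```python
import numpy as np
from PIL import Image

def to_pil(frame):
    """Convert a frame array to a PIL image, casting to uint8 when needed.

    Image.fromarray raises TypeError("Cannot handle this data type ...")
    for e.g. float64 RGB arrays, so round, clip to 0-255 and cast first.
    """
    frame = np.asarray(frame)
    if frame.dtype != np.uint8:
        frame = np.clip(np.round(frame), 0, 255).astype(np.uint8)
    return Image.fromarray(frame)

# A float frame, e.g. the result of blending or scaling a uint8 frame
float_frame = np.full((4, 6, 3), 127.6)
```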

Linking #1157

Hey, I've created a proof of concept for the new TextClip: https://github.com/tburrows13/moviepy/blob/redo-textclip/moviepy/video/NewTextClip.py

Hey @tburrows13, the new TextClip works well. However, the CompositeVideoClip speed is still the same: a 40-second video took 2 min 30 s to render.

Hi @MittalShruti, for CompositeVideoClip, try not to nest it. That way there is only one background PIL Image object, everything is done in PIL, and nothing is converted back to a frame array (which takes plenty of time) until the frames are finally rendered. By the way, the code I pasted here is an old version; the newest one, which passes all tests, is here -> #1157

I like it.

Hi @ODtian, I tried applying #1157, but the image looks different...

Before: [screenshot]

After: [screenshot]

@clotyxf can I have a look at your code? That would probably point to what the problem is.

@ODtian

txt_clip = mpy.TextClip('Which\ncomes first?', font='Montserrat-Regular.otf', color='#333', fontsize=95, size=(596,266))
txt_clip = txt_clip.set_duration(3).set_fps(30)

bg_clip = mpy.ColorClip(color=(255, 255, 255), size=(1280, 720))
bg_clip = bg_clip.set_duration(3).set_fps(30)

final_clip = mpy.CompositeVideoClip([bg_clip, txt_clip.set_position((374,221))], use_bgclip=True)
final_clip.write_videofile('test.mp4')

I deployed a new environment:
moviepy: 1.0.3
python: 3.6
pillow: 7.1.2

That really is an issue in CompositeVideoClip, probably caused by the Pillow update (7.1.0), which changed how pasting is handled. I will try to fix that.

@clotyxf sorry for the late reply (due to GitHub). I've updated the code and it returns the correct result on my PC. Please clone the newest code, run it again, and tell me if it works.

@clotyxf actually, the Pillow fix for image pasting has been merged but not released yet.
This fix will allow pasting an image with a transparent background onto another image with a transparent background, but for now that doesn't work, and all we can do is wait. If you want to work around it now, make sure the bottom layer is a non-transparent background, so that nothing is ever pasted onto a transparent image.
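A minimal sketch of that workaround in plain Pillow, assuming each layer is available as an RGBA PIL.Image: start from an opaque RGB base, so nothing is ever pasted onto a transparent image. flatten_on_opaque is a hypothetical helper; in MoviePy terms the equivalent is an opaque clip, such as the ColorClip in the example above, at the bottom of the CompositeVideoClip.

```python
from PIL import Image

def flatten_on_opaque(layers, size, bg_color=(255, 255, 255)):
    """Paste RGBA layers onto an opaque RGB base.

    layers is a list of (rgba_image, (x, y)). Passing the RGBA image as the
    mask argument makes paste() use its alpha band, so the base itself
    never needs to be transparent.
    """
    base = Image.new('RGB', size, bg_color)
    for layer, pos in layers:
        base.paste(layer, pos, layer)
    return base

# A fully transparent 4x2 layer with a single opaque red pixel at (1, 1)
text_layer = Image.new('RGBA', (4, 2), (255, 0, 0, 0))
text_layer.putpixel((1, 1), (255, 0, 0, 255))
```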

@ODtian thx

Closing as changes to CompositeVideoClip and blit have been merged in #1342 and TextClip changes are being handled in #1472.
