Moviepy: Some advice for improving CompositeVideoClip, TextClip and blit

Created on 14 Apr 2020 · 22 Comments · Source: Zulko/moviepy

I was stumped by the performance of my edit, so I tried to figure out what slows the program down, and luckily I found a couple of causes.
The first is CompositeVideoClip: it blits every clip for every frame, so the speed drops as the number of clips grows. With 20 clips to composite, it falls to about 1 s/it. One solution is PIL.Image.paste, which is much faster: create a background image object and paste the other clips onto it. With that change I reached about 30 it/s.
The second is the blit function itself; I think PIL's Image.paste would help there too.
The last is TextClip. I use PIL's ImageFont to generate the text image, which is much faster than ImageMagick, so it may be possible to drop the ImageMagick dependency.
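
A minimal sketch of the paste-based compositing idea described above, assuming frames are uint8 RGB arrays and masks are float arrays in [0, 1]. The function name `composite_with_pil` and the layer tuple layout are hypothetical and are not part of MoviePy.

```python
import numpy as np
from PIL import Image


def composite_with_pil(background, layers):
    """Paste each layer onto the background and return the result as an array.

    background: uint8 array of shape (H, W, 3)
    layers: iterable of (frame, (x, y), mask), where frame is a uint8 array
            of shape (h, w, 3) and mask is None or a float array of shape
            (h, w) with values in [0, 1]
    """
    canvas = Image.fromarray(background)
    for frame, (x, y), mask in layers:
        layer_im = Image.fromarray(frame)
        if mask is None:
            # No mask: the layer simply overwrites the pixels it covers.
            canvas.paste(layer_im, (x, y))
        else:
            # Scale the float mask to 8 bits; a 2D uint8 array becomes mode "L".
            mask_im = Image.fromarray((255 * mask).astype("uint8"))
            canvas.paste(layer_im, (x, y), mask_im)
    # Convert back to an array only once, after all layers are pasted.
    return np.array(canvas)


background = np.zeros((4, 6, 3), dtype="uint8")
solid = np.full((2, 2, 3), 255, dtype="uint8")
masked = np.full((2, 2, 3), 200, dtype="uint8")
diagonal = np.array([[1.0, 0.0], [0.0, 1.0]])

out = composite_with_pil(
    background,
    [(solid, (1, 1), None), (masked, (3, 0), diagonal)],
)
```

The canvas stays a PIL image for the whole loop, so the cost of converting between arrays and images is paid once per frame rather than once per clip.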

3rd-party feature-request imagemagick images performance

ODtian

Most helpful comment

I spent a day rewriting CompositeVideoClip, and it was fruitful.

In moviepy.video.compositing.CompositeVideoClip.CompositeVideoClip, change the make_frame function:

        def make_frame(t):
            # Assumes `from PIL import Image` and `import numpy as np`
            # at the top of the module.
            full_w, full_h = self.bg.size
            f = self.bg.get_frame(t).astype('uint8')
            bg_im = Image.fromarray(f)
            for c in self.playing_clips(t):
                img, pos, mask, ismask = c.new_blit_on(t, f)

                x, y = pos
                w, h = c.size

                # Skip clips that lie entirely outside the frame on either axis.
                out_x = x <= -w or x >= full_w
                out_y = y <= -h or y >= full_h

                if out_x or out_y:
                    continue

                pos = (int(round(min(max(-w, x), full_w))),
                       int(round(min(max(-h, y), full_h))))

                paste_im = Image.fromarray(img)

                if mask is not None:
                    # Masks are floats in [0, 1]; PIL expects an 8-bit 'L' image.
                    mask_im = Image.fromarray(255 * mask).convert('L')
                    bg_im.paste(paste_im, pos, mask_im)
                else:
                    bg_im.paste(paste_im, pos)

            result_frame = np.array(bg_im)

            # Use the composite's own flag: the loop variable `ismask` is
            # undefined when no clip is playing at time t.
            return result_frame if self.ismask else result_frame.astype('uint8')
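
The out-of-frame test in make_frame can be checked in isolation. The following is a small sketch, not MoviePy code: `visible_region` is a hypothetical helper that returns the part of the frame a clip covers, or None when the clip is entirely off-screen.

```python
def visible_region(pos, clip_size, frame_size):
    """Return (left, top, right, bottom) of the clip's visible part,
    or None when the clip lies completely outside the frame."""
    x, y = int(round(pos[0])), int(round(pos[1]))
    w, h = clip_size
    full_w, full_h = frame_size

    # Intersect the clip rectangle with the frame rectangle.
    left, top = max(x, 0), max(y, 0)
    right, bottom = min(x + w, full_w), min(y + h, full_h)

    # An empty intersection on either axis means nothing is visible.
    if left >= right or top >= bottom:
        return None
    return (left, top, right, bottom)


inside = visible_region((10, 20), (100, 50), (1280, 720))
off_left = visible_region((-150, 0), (100, 50), (1280, 720))
partly = visible_region((-30, 700), (100, 50), (1280, 720))
```

A clip is skipped as soon as it is off-screen on one axis; it does not have to be off-screen on both.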

In moviepy.video.VideoClip.VideoClip add a new method new_blit_on:

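    def new_blit_on(self, t, picture):
        """Compute what is needed to paste this clip at time t: the frame,
        its top-left position on `picture`, its mask (or None) and the
        ismask flag."""
        hf, wf = framesize = picture.shape[:2]

        # NOTE: this branch returns a blended array, not the 4-tuple returned
        # below, so a caller that unpacks four values must handle mask clips
        # separately.
        if self.ismask and picture.max() != 0:
            return np.minimum(1, picture + self.blit_on(np.zeros(framesize), t))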

        ct = t - self.start  # clip time

        # GET IMAGE AND MASK IF ANY

        img = self.get_frame(ct)
        mask = (None if (self.mask is None) else
                self.mask.get_frame(ct))
        if mask is not None:
            if (img.shape[0] != mask.shape[0]) or (img.shape[1] != mask.shape[1]):
                img = self.fill_array(img, mask.shape)
        hi, wi = img.shape[:2]

        # SET POSITION

        pos = self.pos(ct)

        # preprocess short writings of the position
        if isinstance(pos, str):
            pos = {'center': ['center', 'center'],
                   'left': ['left', 'center'],
                   'right': ['right', 'center'],
                   'top': ['center', 'top'],
                   'bottom': ['center', 'bottom']}[pos]
        else:
            pos = list(pos)

        # is the position relative (given in % of the clip's size) ?
        if self.relative_pos:
            for i, dim in enumerate([wf, hf]):
                if not isinstance(pos[i], str):
                    pos[i] = dim * pos[i]

        if isinstance(pos[0], str):
            D = {'left': 0, 'center': (wf - wi) / 2, 'right': wf - wi}
            pos[0] = D[pos[0]]

        if isinstance(pos[1], str):
            D = {'top': 0, 'center': (hf - hi) / 2, 'bottom': hf - hi}
            pos[1] = D[pos[1]]

        # pos = map(int, pos)
        return img, pos, mask, self.ismask
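
The position handling in new_blit_on can be read as a standalone function. This is a sketch that mirrors the logic above outside of MoviePy; the name `resolve_position` and its signature are assumptions made for illustration.

```python
# Shorthand strings expand to a (horizontal, vertical) pair.
SHORTHAND = {
    "center": ("center", "center"),
    "left": ("left", "center"),
    "right": ("right", "center"),
    "top": ("center", "top"),
    "bottom": ("center", "bottom"),
}


def resolve_position(pos, clip_size, frame_size, relative=False):
    """Turn a position spec into integer pixel coordinates (x, y)."""
    wi, hi = clip_size
    wf, hf = frame_size
    px, py = SHORTHAND[pos] if isinstance(pos, str) else pos

    # Relative positions are fractions of the frame size.
    if relative:
        if not isinstance(px, str):
            px = wf * px
        if not isinstance(py, str):
            py = hf * py

    # Named anchors are resolved against the free space around the clip.
    horizontal = {"left": 0, "center": (wf - wi) / 2, "right": wf - wi}
    vertical = {"top": 0, "center": (hf - hi) / 2, "bottom": hf - hi}
    x = horizontal[px] if isinstance(px, str) else px
    y = vertical[py] if isinstance(py, str) else py
    return int(round(x)), int(round(y))


centered = resolve_position("center", (596, 266), (1280, 720))
corner = resolve_position(("right", "top"), (100, 50), (1280, 720))
fraction = resolve_position((0.5, 0.25), (100, 50), (1280, 720), relative=True)
```

Rounding to integers at the end matters because Image.paste expects integer pixel coordinates.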

By using PIL instead of the blit function, CompositeVideoClip becomes up to 25 times faster and still performs well with a large number of clips.

I also wrote some code to generate text with PIL:

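from PIL import Image, ImageFont, ImageDraw
import numpy as np
from moviepy.editor import ImageClip

font = 'xxx'       # path to a .ttf/.otf font file
letter = 'xxx'     # text to render
color = '#xxxxxx'  # text color
fontsize = 50      # font size (placeholder value)

pilfont = ImageFont.truetype(font=font, size=fontsize)
# getsize() was removed in Pillow 10; newer versions offer getbbox() instead.
charsize = pilfont.getsize(letter)
# Fill an image with the text color, then use the rendered text as its alpha channel.
bg_img = Image.new('RGB', charsize, color)
mask_img = Image.new('L', bg_img.size, 0)
draw = ImageDraw.Draw(mask_img)
draw.text((0, 0), letter, font=pilfont, fill='white')
bg_img.putalpha(mask_img)
clip = ImageClip(np.array(bg_img))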

I'm glad that the speed is no longer an issue!

ODtian on 15 Apr 2020

❤2

All 22 comments

Thank you for that investigation. Certainly I'd be up for depending more on pillow (and getting rid of/reducing need for imagemagick). There's also https://gist.github.com/Zulko/e072d78dd5dbd2458f34d2166265e081#file-text_clip_with_gizeh-py that I found, that may be even faster for text.

tburrows13 on 14 Apr 2020

Thanks for sharing! Actually, I built a simple grid layout manager for designing text animations one character at a time, which is why I need fast text generation. I love this and will keep working on it.

ODtian on 14 Apr 2020

For context, off the top of my head:

ImageMagick:
- Advantages: nice text formatting, has access to the system fonts. Historically just my first choice.
- Disadvantages: difficult to install, slow.
Pillow:
- Advantages: easy to install, fast.
- Disadvantages: doesn't have access to system fonts; you need to provide a path to a TTF file (?)
Gizeh/Cairo:
- Advantages: fast, access to system fonts.
- Disadvantages: Cairo is not straightforward to install on some systems.
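
To make the Pillow trade-off concrete, here is a rough sketch of loading a font by file path and rendering text to an RGBA array. The helper names `load_font` and `render_text`, the fallback to Pillow's built-in font, and the fixed canvas size are all assumptions for illustration, not part of MoviePy.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont


def load_font(path, size):
    """Load a font from a file path, falling back to Pillow's built-in
    default font when the file cannot be opened."""
    try:
        return ImageFont.truetype(path, size)
    except (OSError, ImportError):
        # OSError: the file is missing or unreadable.
        # ImportError: Pillow was built without FreeType support.
        return ImageFont.load_default()


def render_text(text, font, color=(51, 51, 51), canvas_size=(400, 100)):
    """Render text to an RGBA uint8 array cropped to the inked area."""
    # Draw the text in white on a black 8-bit canvas; this is the alpha mask.
    mask = Image.new("L", canvas_size, 0)
    ImageDraw.Draw(mask).text((0, 0), text, font=font, fill=255)

    box = mask.getbbox()  # None when nothing was drawn
    if box is None:
        raise ValueError("text produced no visible pixels")
    mask = mask.crop(box)

    # A solid block of the text color, made visible only where the mask is.
    image = Image.new("RGB", mask.size, color)
    image.putalpha(mask)
    return np.array(image)


font = load_font("no-such-font-file.ttf", 24)
text_array = render_text("Hi", font)
```

Cropping with Image.getbbox avoids the font measurement calls, whose names have changed between Pillow versions. Text wider than the canvas would be clipped, so a real implementation would need to size the canvas from the text.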

Zulko on 14 Apr 2020

That sounds amazing @ODtian. I'm following this with great interest.

zamirkhan on 16 Apr 2020

Very cool, @ODtian. It would be great if you could turn those 2 examples each into their own pull request. If not, I'm sure someone else would be able to, and then we can properly test/discuss/compare them.

tburrows13 on 17 Apr 2020

@ODtian for what it's worth, I tried your modification but was getting an exception from Image.fromarray complaining about "Cannot handle this data type".

zamirkhan on 17 Apr 2020

> Very cool, @ODtian. It would be great if you could turn those 2 examples each into their own pull request. If not, I'm sure someone else would be able to, and then we can properly test/discuss/compare them.

Sure, I will try to do that.

ODtian on 20 Apr 2020

> @ODtian for what it's worth, I tried your modification but was getting an exception from Image.fromarray complaining about "Cannot handle this data type".

Yes, the problem probably occurs when the NumPy array is converted to a PIL.Image. You can try removing .astype('uint8') from f = self.bg.get_frame(t).astype('uint8').
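
For reference, one common way to trigger this error is passing a floating-point RGB array to Image.fromarray. The sketch below reproduces that case and shows a defensive conversion; `frame_to_image` is a hypothetical helper, and this may not be the exact cause of the exception reported above.

```python
import numpy as np
from PIL import Image


def frame_to_image(frame):
    """Convert a frame array to a PIL image, casting to uint8 first."""
    frame = np.asarray(frame)
    if frame.dtype != np.uint8:
        # Clip before casting so out-of-range values do not wrap around.
        frame = np.clip(frame, 0, 255).astype("uint8")
    return Image.fromarray(frame)


float_frame = np.full((4, 6, 3), 128.0)  # float64 RGB frame

# A three-channel float64 array has no matching PIL mode.
try:
    Image.fromarray(float_frame)
    direct_conversion_failed = False
except TypeError:
    direct_conversion_failed = True

image = frame_to_image(float_frame)
```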

ODtian on 20 Apr 2020

Linking #1157

tburrows13 on 3 May 2020

Hey, I’ve created a proof of concept for the new TextClip: https://github.com/tburrows13/moviepy/blob/redo-textclip/moviepy/video/NewTextClip.py

tburrows13 on 4 May 2020

👍1

Hey @tburrows13, the new TextClip works well. However, the CompositeVideoClip speed is still the same: a 40-second video took 2:30 minutes to render.

MittalShruti on 14 Jun 2020

Hi @MittalShruti, for CompositeVideoClip, try not to nest composites, so that there is only one background PIL Image object. Everything then stays in PIL, and nothing is converted to a frame array (which takes plenty of time) until the frames are finally rendered. By the way, the code I pasted here is an old version; the newest version, which passes all tests, is in #1157.
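
As an illustration of what avoiding nesting means, the sketch below flattens a nested layout into one flat list of layers with absolute positions, which could then be handed to a single composite. The data layout and the name `flatten_layers` are hypothetical; the sketch ignores timing and masks, which a real nested composite would also need to carry over.

```python
def flatten_layers(node, offset=(0, 0)):
    """Flatten a nested layout into a list of (name, absolute_position).

    A leaf is a string naming a clip. A group is a list of
    ((dx, dy), child) pairs whose offsets are relative to the group.
    """
    if isinstance(node, str):
        return [(node, offset)]
    flat = []
    for (dx, dy), child in node:
        # Accumulate the group's offset into every child position.
        child_offset = (offset[0] + dx, offset[1] + dy)
        flat.extend(flatten_layers(child, child_offset))
    return flat


nested = [
    ((0, 0), "background"),
    ((100, 50), [((0, 0), "title"), ((0, 40), "subtitle")]),
]
flat = flatten_layers(nested)
```

With the flat list, every clip is pasted once onto the same background image instead of being rendered to an intermediate frame by an inner composite first.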

ODtian on 21 Jun 2020

> Hey, I’ve created a proof of concept for the new TextClip: https://github.com/tburrows13/moviepy/blob/redo-textclip/moviepy/video/NewTextClip.py

I like it.

ODtian on 21 Jun 2020

> Hi @MittalShruti, for CompositeVideoClip, try not to nest composites, so that there is only one background PIL Image object. Everything then stays in PIL, and nothing is converted to a frame array (which takes plenty of time) until the frames are finally rendered. By the way, the code I pasted here is an old version; the newest version, which passes all tests, is in #1157.

Hi @ODtian, I tried applying #1157, but the rendered image looks different...

before: [image]

after: [image]

clotyxf on 29 Jun 2020

@clotyxf can I have a look at your code? That could probably point out what the problem is.

ODtian on 29 Jun 2020

> @clotyxf can I have a look at your code? That could probably point out what the problem is.

@ODtian

txt_clip = mpy.TextClip('Which\ncomes first?', font='Montserrat-Regular.otf', color='#333', fontsize=95, size=(596,266))
txt_clip = txt_clip.set_duration(3).set_fps(30)

bg_clip = mpy.ColorClip(color=(255, 255, 255), size=(1280, 720))
bg_clip = bg_clip.set_duration(3).set_fps(30)

final_clip = mpy.CompositeVideoClip([bg_clip, txt_clip.set_position((374,221))], use_bgclip=True)
final_clip.write_videofile('test.mp4')

I deployed a new environment:
moviepy: 1.0.3
python: 3.6
pillow: 7.1.2

clotyxf on 29 Jun 2020

That really is an issue in CompositeVideoClip, probably caused by a Pillow update (7.1.0) that changed how pasting is handled. I will try to fix it.

ODtian on 29 Jun 2020

@clotyxf I'm sorry for the late reply (due to GitHub). I've updated the code and it returns the correct result on my PC. Please clone the newest code, run it again, and tell me if it works.

ODtian on 29 Jun 2020

@clotyxf actually, the Pillow fix for image pasting has been merged but not yet released.
The fix will allow pasting an image with a transparent background onto another image with a transparent background. For now that doesn't work, and all we can do is wait. If you want to work around it now, make sure the bottom layer has a non-transparent background, so that nothing is pasted onto a transparent image.
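
A minimal sketch of that workaround, assuming the layers are RGBA PIL images: start from an opaque RGB canvas so that no paste ever targets a transparent image. The function name `flatten_on_opaque` is hypothetical.

```python
from PIL import Image


def flatten_on_opaque(layers, size, background=(255, 255, 255)):
    """Paste RGBA layers, in order, onto an opaque RGB background.

    layers: iterable of (image, (x, y)), where image is an RGBA PIL image.
    """
    canvas = Image.new("RGB", size, background)
    for image, pos in layers:
        # Passing the RGBA image as its own mask uses its alpha band.
        canvas.paste(image, pos, image)
    return canvas


red = Image.new("RGBA", (2, 2), (255, 0, 0, 255))   # fully opaque
clear = Image.new("RGBA", (2, 2), (0, 0, 255, 0))   # fully transparent
flattened = flatten_on_opaque([(red, (1, 1)), (clear, (0, 0))], (4, 4))
```

Because the canvas has no alpha channel, each paste is an ordinary blend of layer over background, and fully transparent layers leave the pixels underneath unchanged.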

ODtian on 29 Jun 2020

> @clotyxf actually, the Pillow fix for image pasting has been merged but not yet released.
> The fix will allow pasting an image with a transparent background onto another image with a transparent background. For now that doesn't work, and all we can do is wait. If you want to work around it now, make sure the bottom layer has a non-transparent background, so that nothing is pasted onto a transparent image.

@ODtian thx

clotyxf on 30 Jun 2020

Closing as changes to CompositeVideoClip and blit have been merged in #1342 and TextClip changes are being handled in #1472.

tburrows13 on 21 Jan 2021
