Pillow: Obtain Character-Level Bounding Boxes of Generated Text

Created on 27 Jun 2019  ·  7 Comments  ·  Source: python-pillow/Pillow

I'm placing some text on an image as follows, using Python 3.7.3 and PIL 5.4.1:

from PIL import Image, ImageDraw, ImageFont

image = ...
font_filepath = ...
font_size = ...

draw = ImageDraw.Draw(image)
font = ImageFont.truetype(font_filepath, font_size)

xy = ...   # generate random (x, y) coordinates to place the text at
text = ...

draw.text(xy, text, font=font)

I want to get the character-level bounding boxes around the text placed on the image. For example, if text = "hello", I want a list of five rectangles, each of which bounds the corresponding letter in "hello". The bounding box for "l" should be thinner and taller than the bounding box for "o", and the bounding boxes for the two "l"s should have different x-positions but the same y-position.

I have investigated using:

size = font.getsize(text)
mask = font.getmask(text)

However, I don't know how to interpret mask, because:

  • mask is an ImagingCore object instead of an Image object
  • len(mask) does not even equal the area calculated by size[0] * size[1]

What is the easiest way to obtain character-level bounding boxes for text placed on an image by PIL?

All 7 comments

from PIL import Image, ImageDraw, ImageFont

image = Image.new("RGB", (200, 100))
font_filepath = "/Library/Fonts/Arial.ttf"
font_size = 50

draw = ImageDraw.Draw(image)
font = ImageFont.truetype(font_filepath, font_size)

xy = (50, 20)
text = "hello"

draw.text(xy, text, font=font)

for char in text:
    print(font.getmask(char).size)

If it helps, here is where 'size' is defined for ImagingCore - https://github.com/python-pillow/Pillow/blob/292b4d038c1ba2b4cbf8aa02843acda656dc8a89/src/_imaging.c#L3388

Let us know if this doesn't answer your question, or if you have any further questions.

This gives me the size of each character's bounding box, but how do I retrieve the location of each character's bounding box?

from PIL import Image, ImageDraw, ImageFont

image = Image.new("RGB", (200, 100))
font_filepath = "/Library/Fonts/Arial.ttf"
font_size = 50

draw = ImageDraw.Draw(image)
font = ImageFont.truetype(font_filepath, font_size)

xy = (40, 20)
text = "hello"

draw.text(xy, text, font=font)

for i, char in enumerate(text):
    right, bottom = font.getsize(text[:i+1])
    width, height = font.getmask(char).size
    right += xy[0]
    bottom += xy[1]
    top = bottom - height
    left = right - width

    draw.rectangle((left, top, right, bottom), None, "#f00")

image.save("out.png")

[image: out]

That makes sense. It seems you can just assume that the boxes for the characters are adjacent to each other and aligned along the bottom.

That answers my question. I'm closing this issue now.
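The assumption described above (each box ends where the text up to that character ends, and all boxes share the bottom edge reported for that prefix) reduces to a little arithmetic. The sketch below restates it without Pillow so the numbers can be followed by hand. The function name boxes_from_sizes and the sample measurements are made up for illustration; in the code above the two inputs would come from font.getsize(text[:i+1]) and font.getmask(char).size.

```python
def boxes_from_sizes(xy, prefix_sizes, glyph_sizes):
    """Place one box per character under the 'adjacent, bottom-aligned' assumption.

    prefix_sizes[i] is the (width, height) of the text up to and including
    character i; glyph_sizes[i] is the (width, height) of character i alone.
    Returns a list of (left, top, right, bottom) tuples.
    """
    x0, y0 = xy
    boxes = []
    for (right, bottom), (width, height) in zip(prefix_sizes, glyph_sizes):
        # Shift the prefix extent by the drawing position.
        right += x0
        bottom += y0
        # Work backwards from the right and bottom edges by the glyph size.
        boxes.append((right - width, bottom - height, right, bottom))
    return boxes


# Hypothetical measurements for a two-character string, not real font metrics.
print(boxes_from_sizes((40, 20), [(30, 46), (55, 46)], [(26, 36), (22, 27)]))
# → [(44, 30, 70, 66), (73, 39, 95, 66)]
```

Because every box is derived from the right and bottom edges, any error in those two values (for example from characters that overhang their advance width) shifts the whole box.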

Since @radarhere's solution is incorrect for text with multiple words, like this:

[image: out]

I modified it slightly so that it works in more cases:

from PIL import Image, ImageDraw, ImageFont

image = Image.new("RGB", (500, 100))
font_filepath = "./template/arial.ttf"
font_size = 50

draw = ImageDraw.Draw(image)
font = ImageFont.truetype(font_filepath, font_size)

xy = (40, 20)
text = "Hoàng Tùng Lâm"

draw.text(xy, text, font=font)

for i, char in enumerate(text):
    # Compare the height of the character alone with the height of the text
    # up to and including it, and use the smaller one as the bottom edge.
    bottom_1 = font.getsize(char)[1]
    right, bottom_2 = font.getsize(text[:i+1])
    bottom = min(bottom_1, bottom_2)
    width, height = font.getmask(char).size
    right += xy[0]
    bottom += xy[1]
    top = bottom - height
    left = right - width

    draw.rectangle((left, top, right, bottom), None, "#f00")

image.save("out.png")

[image: out]

Hope this could help someone :P

The above is close, but not always correct for slanted fonts: see https://github.com/python-pillow/Pillow/issues/4789#issuecomment-659609574
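The snippets in this thread rely on font.getsize, which was deprecated in Pillow 9.2.0 and removed in 10.0.0. On Pillow 8.0.0 or later, a similar result can be approximated with font.getlength and font.getbbox. The following is a minimal sketch rather than the method from the linked issue: char_boxes and demo are hypothetical helper names, and the approach assumes left-to-right text drawn with the default anchor, with no ligatures or complex shaping.

```python
def char_boxes(font, text, xy):
    """Approximate per-character boxes for text drawn at xy with the default anchor.

    `font` is expected to offer getlength() and getbbox(), as
    PIL.ImageFont.FreeTypeFont does in Pillow 8.0.0 and later.
    Returns a list of (left, top, right, bottom) tuples.
    """
    x, y = xy
    boxes = []
    for i, char in enumerate(text):
        # Pen position of this character: the advance of the text up to and
        # including it, minus its own advance. This keeps any kerning between
        # it and the previous character.
        offset = font.getlength(text[: i + 1]) - font.getlength(char)
        # Ink box of the character alone, relative to its own origin.
        left, top, right, bottom = font.getbbox(char)
        boxes.append((x + offset + left, y + top, x + offset + right, y + bottom))
    return boxes


def demo(font_filepath, out_path="out.png"):
    """Draw the boxes on an image; font_filepath is a TrueType font of your choosing."""
    # Imported here so that char_boxes itself has no dependency on Pillow.
    from PIL import Image, ImageDraw, ImageFont

    image = Image.new("RGB", (500, 100))
    draw = ImageDraw.Draw(image)
    font = ImageFont.truetype(font_filepath, 50)
    xy, text = (40, 20), "Hoàng Tùng Lâm"
    draw.text(xy, text, font=font)
    for box in char_boxes(font, text, xy):
        # getlength returns a float, so round before drawing.
        draw.rectangle([round(v) for v in box], outline="#f00")
    image.save(out_path)
```

Since getbbox reports the ink extent relative to the drawing origin, glyphs that lean past their advance width (as in slanted fonts) are not clipped to the advance, although the boxes of neighbouring characters may then overlap.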
