Weasyprint: Unexpected horizontal whitespace

Created on 10 May 2018 · 30 comments · Source: Kozea/WeasyPrint

Running WeasyPrint 0.42.3 on Python 3.6.5 and Ubuntu 18.04, I occasionally see odd horizontal gaps in documents. Here's an example that's about as minimal as I can make it:

<!DOCTYPE html>

<html lang="en-US">
<head>
<meta charset="UTF-8">
<title>foo</title>
<style>

body {
    font-family: "Times New Roman";
    font-size: 12pt;
}

</style></head><body>

<p>Xyzz, B. B., <b>Smith A.</b> <i>Hello world</i>.</p>

</body>
</html>

The generated PDF looks like this in Okular 1.3.3 (200% zoom):

[Screenshot]

Why is there a lot of whitespace after "Smith A."?

OMG! Is this another Pango issue?
Can't reproduce the gap. WeasyPrint 0.42.3, Python 3.6.4, Pango 1.40.11, Windows 7 (64-bit)

Is "Times New Roman, 12pt" a prerequisite to produce the space?

The font-size doesn't seem to be necessary to reproduce the bug, but the font-family is. The gap still appears with Arial, but not with e.g. DejaVu Sans.

What happens when you change the font-family to

font-family: "Times New Roman", serif;

I'm almost sure that this is a Pango feature. Which version of Pango are you using?

The gap's still there.

How do I tell what version of Pango I'm using? Does it have a Python package?

from weasyprint.text import pango

print('pango:', pango.pango_version())

Thanks, I got pango: 14014, which I guess would be 1.40.14.
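That guess is right: Pango packs its version into one integer as major × 10000 + minor × 100 + micro (the `PANGO_VERSION_ENCODE` scheme). A small sketch of the decoding, assuming minor and micro are both below 100:

```python
def decode_pango_version(encoded: int) -> tuple:
    """Split the integer returned by pango_version() into (major, minor, micro).

    Pango encodes versions as major * 10000 + minor * 100 + micro,
    so this assumes minor and micro are each below 100.
    """
    major, rest = divmod(encoded, 10000)
    minor, micro = divmod(rest, 100)
    return major, minor, micro


print("%d.%d.%d" % decode_pango_version(14014))  # prints 1.40.14
```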

Hmm... something weird with the fonts? "Arial" and "Times New Roman" sound Windows-ish to me...

Not sure whether I'll see anything, but could you upload the buggy PDF?

Sure: test.pdf

Analyzing your PDF, I can see that after the bold Smith A., instead of just printing a space character and then the italic Hello world (as my PDF does), there is a command that advances the output position to the right. That's the gap.

When WeasyPrint paginates the HTML, it gives every TextBox an x- and y-position on the page. Those positions are calculated with the help of Pango/Cairo.
To get the horizontal position of Hello world, WeasyPrint asks Pango how wide the bold Smith A. is.
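A toy model of that reasoning (not WeasyPrint's actual layout code): each inline box starts where the previous one ends, so if one measured width is too large, every following box is pushed right by the excess. The widths are made-up numbers for illustration.

```python
from typing import List, Tuple


def place_inline_boxes(boxes: List[Tuple[str, float]],
                       start_x: float = 0.0) -> List[Tuple[str, float]]:
    """Return (text, x) pairs where each box starts at the previous box's right edge."""
    positions = []
    x = start_x
    for text, width in boxes:
        positions.append((text, x))
        x += width  # the next box starts where this one ends
    return positions


# Hypothetical widths: the only difference is the measured width of "Smith A.".
correct = [("Smith A.", 45.0), (" ", 3.0), ("Hello world", 55.0)]
inflated = [("Smith A.", 135.0), (" ", 3.0), ("Hello world", 55.0)]

print(place_inline_boxes(correct)[-1])   # ('Hello world', 48.0)
print(place_inline_boxes(inflated)[-1])  # ('Hello world', 138.0)
```

The space box itself is drawn normally; the visible gap is entirely the 90-unit error in the width of the box before it.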

Looks to me as if your Pango gets the size of text formatted with "Times New Roman, bold" font wrong.

To verify or falsify this hypothesis, please execute the following snippet:

import weasyprint.css

from weasyprint.css.properties import INITIAL_VALUES
from weasyprint.text import split_first_line

_needsStyleDict = 'StyleDict' in dir(weasyprint.css)

def calc_text_width(text, font_family, font_weight=None):
    new_style = dict(INITIAL_VALUES)
    # strip the spaces after commas, e.g. "Times New Roman, serif"
    new_style['font_family'] = [name.strip() for name in font_family.split(',')]
    new_style['font_size'] = 12
    if _needsStyleDict:
        new_style = weasyprint.css.StyleDict(new_style)
    if font_weight == 'bold':
        new_style['font_weight'] = 700
    _, _, _, width, _, _ = split_first_line(
        text, new_style,
        context=None, max_width=None,
        justification_spacing=0)
    return width

def print_text_width(text, font_family):
    width = calc_text_width(text, font_family)
    print(font_family, 'normal:', width)
    width = calc_text_width(text, font_family, 'bold')
    print(font_family, '  bold:', width)
    print('')

snippet = 'Smith A.'
print_text_width(snippet, 'Times New Roman')
print_text_width(snippet, 'serif')
# print_text_width(snippet, 'DejaVu Sans')
# print_text_width(snippet, 'Arial')
# print_text_width(snippet, 'sans-serif')

Gives me:

Times New Roman normal: 43.341796875
Times New Roman   bold: 45.33984375

serif normal: 52.283203125
serif   bold: 57.85546875

I expect your bold Times will be something around 90...

It looks like both roman and bold are large:

Times New Roman normal: 133.009765625
Times New Roman   bold: 135.0078125

serif normal: 52.283203125
serif   bold: 57.85546875
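Subtracting the working measurements from the failing ones shows that both weights are inflated by exactly the same amount, close to the "around 90" guess. The numbers are copied from the two outputs above (Times New Roman, "Smith A.", the snippet's font_size = 12):

```python
# Widths of "Smith A." in Times New Roman, copied from the two outputs above.
reference = {"normal": 43.341796875, "bold": 45.33984375}   # setup without the gap
reporter = {"normal": 133.009765625, "bold": 135.0078125}   # setup with the gap

for weight in ("normal", "bold"):
    excess = reporter[weight] - reference[weight]
    print(weight, excess)  # both lines print 89.66796875
```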

I can reproduce on Ubuntu 18.04.

Swapping the b and the i tags moves the gap after the i tag, so the problem is not in the bold font.

I have the same problem with Arial (another MS font), but not with DejaVu or Liberation fonts.

OK, there's only a problem when the string " A" (space + uppercase A) is in the text. I suppose something goes wrong between the MS fonts' kerning tables and Pango.
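To see which kerning pairs a given string triggers, one option is to look them up in the font's legacy `kern` table. The helper below is a hypothetical pure-Python sketch over plain dictionaries, with toy data standing in for a real font (the -55 value is invented):

```python
from typing import Dict, List, Tuple


def kerning_pairs_in(
    text: str,
    cmap: Dict[int, str],
    kern_table: Dict[Tuple[str, str], int],
) -> List[Tuple[str, int]]:
    """List adjacent character pairs of `text` that have a kerning value.

    cmap maps Unicode code points to glyph names; kern_table maps
    (left glyph, right glyph) to an adjustment in font units.
    """
    found = []
    for left, right in zip(text, text[1:]):
        key = (cmap.get(ord(left)), cmap.get(ord(right)))
        if key in kern_table:
            found.append((left + right, kern_table[key]))
    return found


# Toy data standing in for a real font: one negative space/A pair.
cmap = {ord(" "): "space", ord("A"): "A", ord("B"): "B"}
kern_table = {("space", "A"): -55}
print(kerning_pairs_in("Smith A.", cmap, kern_table))  # [(' A', -55)]
```

With fontTools installed, `TTFont(path).getBestCmap()` gives a real cmap and `TTFont(path)["kern"].kernTables[0].kernTable` gives the pair dictionary, assuming the font has a format 0 `kern` subtable. Pairs that only live in the GPOS table won't show up this way.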

Reminds me of the rendering issues I had with the Linux fonts on Windows when my fontconfig was not active; see item 4, "Broken fonts", in #587.

No problem on Gentoo with the same fonts and the same Pango and Fontconfig versions.

No problem on Ubuntu when generating a PNG file instead of a PDF.

Edit: I can reproduce on Gentoo.

Sounds like the culprit isn't Pango but Cairo, or rather cairo.PDFSurface.
There's no way to work around that within WeasyPrint.

BTW: On my system the output of `weasyprint.tests.test_acid2.test_acid2()` only looks perfect as PNG; when rendered as PDF, the smiley is still distorted.

If you get something like that, it's normal (it's even explained on WeasyPrint's home page):
[Screenshot from 2018-05-13 21-27-41]

Yes, that's what it looks like.

Furthermore, it looks like it's Cairo that decides whether a TJ operator is emitted in the PDF (instead of a simple Tj) and how large the horizontal adjustment should be. I couldn't find in the Cairo source code which algorithm it uses to calculate the glyphs' advances...
Let's hope Cairo asks Pango, since Pango's release frequency is higher 😬
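For reference, in a `TJ` array each number is an adjustment in thousandths of a unit of text space: it is subtracted from the horizontal position and scaled by the font size and horizontal scaling, so a negative number pushes the following glyphs to the right. A rough sketch for reading those shifts out of a content stream, assuming it has already been decompressed (for instance with `qpdf --qdf`); the parser is deliberately simplified and ignores hex strings and nested parentheses:

```python
import re
from typing import List

# Literal strings like (Smith A.), allowing backslash escapes inside them.
_LITERAL_STRING = re.compile(r"\((?:\\.|[^\\)])*\)")
# PDF numbers such as -3250, 12.5 or .5
_NUMBER = re.compile(r"-?(?:\d+(?:\.\d*)?|\.\d+)")


def tj_shifts(tj_array: str, font_size: float, h_scale: float = 1.0) -> List[float]:
    """Return the horizontal displacement in text space for each number in a TJ array.

    A number n moves the position by -n / 1000 * font_size * h_scale,
    where h_scale is the horizontal scaling (Tz / 100).
    """
    numbers_only = _LITERAL_STRING.sub(" ", tj_array)
    return [-float(n) / 1000 * font_size * h_scale
            for n in _NUMBER.findall(numbers_only)]


print(tj_shifts("[(Smith A.) -3250 (Hello world)] TJ", 12))  # [39.0]
```

The `-3250` here is an invented value; a gap like the one in this issue would show up as an unusually large negative number between the two strings.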

I'm reproducing this issue on Cairo 1.16.0.0; it does not occur in 1.15.10.

As said in #770, the problem appears when we generate PDF documents using some Windows fonts (at least Arial and Times) with text containing special kerning pairs (" A", space + A, in the original example).

The bug is probably either in the fonts, in Pango or in Cairo.

Sorry, I should have been clearer. I'm trying to provide a workaround for people who arrive here looking for one.
I believe Cairo is the issue. I have not tested with the latest version of Cairo, but 1.15.10 does not suffer from the issue. _(Changing the Pango version did not reproduce the error with special kerning pairs.)_

I've tried hard to reproduce on Gentoo using various 1.14.x, 1.15.x and 1.16.0 versions, but I can't. I've only reproduced on Ubuntu 18.04 with … cairo 1.15.10 (true story).

@MindFluid What's your OS (+distribution)?

I'm not convinced yet that the bug is in cairo.

It works on Arch too with cairo 1.16.0.
test.pdf

I am _not_ reproducing the error with Cairo 1.15.x on Ubuntu 18.04.1 LTS (bionic).
I am reproducing it with Cairo 1.16.x on Alpine Linux v3.8.

(Sorry for the edits, I realised this is running in a Docker container on a server.)

For the record: I am reproducing with cairo 1.15.10 on Ubuntu 18.04.2 LTS.

table.pdf

I use exactly the same fonts everywhere.

I'm starting to think that it comes from some font-related configuration, not from a specific version of some library.

Yeah, currently I'm working around it in the production environment by avoiding references to MS fonts; I'm actually just using a default fallback of font-family: sans-serif 😂
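That workaround can be applied mechanically to existing stylesheets. A minimal sketch, assuming you control the CSS `font-family` values; the helper name and the blocked-font list are illustrative (taken from the fonts reported in this thread), and the parser is naive: it only handles comma-separated names with optional quotes.

```python
from typing import Iterable

# Fonts reported in this thread to trigger the gap (illustrative, not exhaustive).
PROBLEM_FONTS = {"times new roman", "arial", "courier new"}
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def without_ms_fonts(font_family: str, fallback: str = "sans-serif",
                     blocked: Iterable[str] = PROBLEM_FONTS) -> str:
    """Drop blocked families from a CSS font-family value, keeping a generic fallback."""
    blocked_names = {name.lower() for name in blocked}
    kept = []
    for raw in font_family.split(","):
        name = raw.strip().strip("\"'")
        if name and name.lower() not in blocked_names:
            kept.append(name)
    # Make sure there is always a generic family at the end of the stack.
    if not any(name.lower() in GENERIC_FAMILIES for name in kept):
        kept.append(fallback)
    # Quote non-generic names so multi-word families stay a single name.
    return ", ".join(n if n.lower() in GENERIC_FAMILIES else '"%s"' % n for n in kept)


print(without_ms_fonts('"Times New Roman", serif'))  # serif
print(without_ms_fonts("Arial, Helvetica"))          # "Helvetica", sans-serif
```

For monospace headings like the Courier New case, pass `fallback="monospace"` so the generic family matches the original intent.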

I hope we can get to the bottom of this soon.

I've tried various versions, configurations and compilation options of freetype and fontconfig on my Gentoo box, but I've not been able to reproduce the bug.

I'm no longer seeing this bug. I'm now using WeasyPrint 47, Python 3.7.3, Pango 1.42.03, and Ubuntu 19.04.

Thanks for this information. I'll check that everything works fine on my Ubuntu installation, and close this issue if I'm not able to reproduce anymore.

It's still happening, at least for me. I had some monospace text in a heading and was using Courier New. If I had the EXACT same heading twice in a row, only the second one had the spacing issue.

The problem is solved with Source Code Pro. I had already moved on last week when I saw what was going on, but I can change my template back and slap another PDF together if someone needs to see.
