Weasyprint: pytest fails on Windows 7

Created on 10 Mar 2018 · 4 comments · Source: Kozea/WeasyPrint

Running the unit tests on a fresh clone reported lots of failures. Installing the Ahem font reduced the failures to 22.

They can be grouped into 4 categories:

CairoError
No @font-face on Windows
Encoding and other platform specific issues
Rendering issues

After debugging as best I could, I was able to reduce the number of failing tests to 8.

Tested with Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) on Windows 7 (64 bits) and Cairo 1.15.6.
Updating Cairo to 1.15.10 didn't help much (one more test passed).

CairoError

Exploring the reason for the 55 occurrences of

E cairocffi.CairoError: cairo returned 41: b'error occurred in the Windows Graphics Device Interface'

I finally spotted the culprit in test_api.py at line 635 in test_bookmarks():

assert_bookmarks('''
    <style>* { height: 10px; font-size: 0 }</style>
    ....

The font-size: 0 kills Pango and provokes the CairoErrors in subsequent unit tests.

Inside pytest the Pango-WARNING isn't visible. When I run the snippet outside of pytest I see

(python.exe:7200): Pango-WARNING **: scaled_font status is: error occurred in the Windows Graphics Device Interface

Why doesn't this happen on Linux machines? I don't know. On Windows it's probably related to the way DLLs are loaded. My guess is that Python loads Pango/Cairo only once, at startup or when first required, so the broken state carries over into the following tests.

In any case, replacing font-size: 0 with line-height:0 left me with 12 failures.

No `@font-face` on Windows

The 3 tests in test_fonts.py cannot pass on Windows because of:

UserWarning: @font-face is currently not supported on Windows

Out of curiosity I installed weasyprint.otf from the ./resources/ folder, and I have the sneaking suspicion that this font needs special treatment (loading via FontConfig instead of Cairo): otherwise not even its kerning and ligatures (on by default) are respected, let alone turning them on and off via font-feature-settings or font-variant.

That's how "kkliga", styled with font-family weasyprint, looks when rendered as PDF or in a M$Word 2010 document:
[Screenshot: test_font_face]
Red color applied to "liga" for clarification.

All the letters get the same shape, and neither kerning nor ligatures are applied, so it's no wonder the box widths are bigger than expected.

> assert span1.width == 1.5 * 16
E assert 32.0 == (1.5 * 16)

Out of further curiosity, I enabled @font-face by letting weasyprint/fonts.py load the libfontconfig-1.dll and libpangoft2-1.0-0.dll that ship with my GTK3 installation.

The only drawback seemed to be that FontConfiguration can't clean up the temporary files it created, because the library still holds open file handles when the object is deleted (__del__).
On Unix, os.remove() on a file that is in use works fine; on Windows it raises a PermissionError.
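One way around the PermissionError is to make the cleanup tolerant: delete what can be deleted and keep the rest for a later retry. A minimal sketch, assuming the temporary file names are available as a list; `remove_temp_files` is a hypothetical helper, not part of WeasyPrint's FontConfiguration:

```python
import os


def remove_temp_files(filenames):
    """Delete each file if possible and return the ones still in use.

    Hypothetical helper, not part of WeasyPrint: on Windows os.remove()
    raises PermissionError while another handle (e.g. one held by
    fontconfig) keeps the file open, so instead of failing inside
    __del__ the locked files are returned for a later retry.
    """
    leftovers = []
    for filename in filenames:
        try:
            os.remove(filename)
        except FileNotFoundError:
            pass  # already gone: nothing to clean up
        except PermissionError:
            leftovers.append(filename)  # still locked: retry later
    return leftovers
```

The leftovers could, for example, be retried with `atexit.register(remove_temp_files, leftovers)`; whether fontconfig has released its handles by then is an untested assumption.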

No, not the only drawback.
Although the tests in test_fonts.py now pass, 19 new failures emerged, seemingly unrelated to @font-face, so I guess there's a good reason not to use libfontconfig/libpangoft2 on Windows after all.

Encoding and other platform specific issues

1. UnicodeDecodeError

Opening a UTF-8-encoded file without specifying encoding='utf-8' fails.
In test_api.test_html_parsing:

   with open(filename) as fd:
>       string = fd.read()
...
self = <encodings.cp1252.IncrementalDecoder object at 0x000000000732F828>
...
E  UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1445: character maps to <undefined>

If encoding is not specified, the default is platform-dependent (whatever locale.getpreferredencoding() returns), which on my western Windows system is 'cp1252'. Passing encoding='utf-8' solves the problem:

with open(filename, encoding='utf-8') as fd:
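A small sketch that reproduces the failure with a hypothetical string and shows the explicit-encoding fix; `read_text` is an illustrative helper, not a WeasyPrint function:

```python
import os
import tempfile


def read_text(filename):
    """Read a text file as UTF-8, independent of the locale's default encoding."""
    with open(filename, encoding='utf-8') as fd:
        return fd.read()


# 'č' is b'\xc4\x8d' in UTF-8, and 0x8d is one of the bytes that cp1252
# leaves undefined: the same 'charmap' failure as in the traceback above.
raw = 'Unicödé č'.encode('utf-8')
try:
    raw.decode('cp1252')
except UnicodeDecodeError as exc:
    print('cp1252 failed:', exc.reason)

# Writing the bytes and reading them back with an explicit encoding works.
fd, path = tempfile.mkstemp(suffix='.html')
with os.fdopen(fd, 'wb') as out:
    out.write(raw)
try:
    print('utf-8 round-trip ok:', read_text(path) == 'Unicödé č')
finally:
    os.remove(path)
```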

2. path2url on Windows

Symptoms: base_url is not constructed as expected, and b'filename' cannot be converted to a URL.

In test_api.test_html_parsing, test_api.test_command_line_render, test_api.test_unicode_filenames, test_pdf.test_embedded_files

I had a hard time finding the offending code: it's Python's nturl2path.pathname2url(), which cuts off trailing path separators and cannot handle bytes paths like b'path'.

To the rescue, I altered path2url() in weasyprint/urls.py:

import os
import sys
from urllib.request import pathname2url


def path2url(path):
    """Return file URL of `path`

    Quoting https://docs.python.org/3/library/sys.html:

    > sys.getfilesystemencoding()
    > Return the name of the encoding used to convert between Unicode
    > filenames and bytes filenames. For best compatibility, str should
    > be used for filenames in all cases, although representing filenames
    > as bytes is also supported.
    > Functions accepting or returning filenames should support either
    > str or bytes and internally convert to the system’s preferred
    > representation.

    Fact is: the Windows-specific nturl2path.pathname2url only works with `str`.
    Workaround: decode b'path' with the filesystem's encoding to avoid
     - TypeError: can't concat str to bytes
     - TypeError: a bytes-like object is required, not 'str'
    """
    if not isinstance(path, str):
        # convert bytes to `str` using the filesystem encoding
        path = path.decode(sys.getfilesystemencoding())

    path = os.path.abspath(path)
    add_trailing_slash = os.path.isdir(path)
    if add_trailing_slash and not path.endswith(os.path.sep):
        # Make sure directory names have a trailing slash.
        # Otherwise relative URIs are resolved from the parent directory.
        # This, too, only works with `str`!
        # (Root directories such as 'C:\\' or '/' already end with one.)
        path += os.path.sep

    path = pathname2url(path)
    # On Windows, pathname2url cuts off the trailing slash: restore it
    if add_trailing_slash and not path.endswith('/'):
        path += '/'
    if path.startswith('///'):
        # On Windows pathname2url(r'C:\foo') is apparently '///C:/foo'
        # That's enough slashes already.
        return 'file:' + path
    else:
        return 'file://' + path
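As an alternative (a sketch, not what WeasyPrint uses), the standard library's pathlib.Path.as_uri() already produces the 'file:///C:/foo' form on Windows and percent-encodes non-ASCII characters, and os.fsdecode() accepts both str and bytes. `path_to_file_url` is a hypothetical name:

```python
import os
import pathlib


def path_to_file_url(path):
    """Hypothetical alternative to path2url: str or bytes path -> file:// URL.

    os.fsdecode() accepts both str and bytes. Path.as_uri() needs an
    absolute path and yields 'file:///C:/foo' on Windows, 'file:///foo'
    elsewhere, percent-encoding non-ASCII characters (as UTF-8 on
    Windows, with the filesystem encoding on POSIX).
    """
    path = os.path.abspath(os.fsdecode(path))
    url = pathlib.Path(path).as_uri()
    if os.path.isdir(path) and not url.endswith('/'):
        # Keep the trailing slash so relative URIs resolve inside the directory.
        url += '/'
    return url
```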

3. Filenames in Cairo on Windows

Cairo < 1.15.10 treats filenames as being in the current locale, aka the ANSI code page, aka 'mbcs'. Since Python 3.6, the Windows filesystem encoding defaults to UTF-8 (PEP 529).
In test_api.test_unicode_filenames:

filename = b'Unic\xc3\xb6d\xc3\xa9'

    def read_file(filename):
        """Shortcut for reading a file."""
>       with open(filename, 'rb') as fd:
E       FileNotFoundError: [Errno 2] No such file or directory: b'Unic\xc3\xb6d\xc3\xa9'

Indeed, there is no such file, no file named 'Unicödé'. Instead there is a file named 'UnicÃ¶dÃ©' (huh, looks familiar: those are the UTF-8 bytes of 'Unicödé' read as cp1252).
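The "looks familiar" part can be reproduced in a few lines, assuming a western ANSI code page (cp1252) as on my machine:

```python
# The UTF-8 bytes of the intended name, as Python >= 3.6 hands them over ...
name = 'Unicödé'
utf8_bytes = name.encode('utf-8')
print(utf8_bytes)                   # b'Unic\xc3\xb6d\xc3\xa9', as in the traceback

# ... read back by Cairo < 1.15.10 in the ANSI code page (cp1252 here)
print(utf8_bytes.decode('cp1252'))  # UnicÃ¶dÃ©
```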

The filename in question is produced by cairocffi.surfaces._encode_filename(), which turns it into a char[] suitable for Cairo.

Contemplating the mental state of C programmers and taking into account PEP 529, I finally came up with the following patch, suitable for all Cairo versions:

# cairocffi/surfaces.py
# (`sys`, `ffi` and `cairo` are already imported at the top of this module)

def _encode_filename(filename):
    """Return a byte string, encoding Unicode with the filesystem encoding.

    Experimental patch for Cairo on Windows:

    Since Python 3.6 the filesystem encoding on Windows defaults to 'utf-8'.
    Apparently Cairo < 1.15.10 treats filenames as being in the current
    locale aka ANSI aka 'mbcs'.
    See PEP 529 https://www.python.org/dev/peps/pep-0529/
    Beware: characters outside of the user's active code page!

    Update: Cairo >= 1.15.10 uses UTF-8 filenames on Windows.
    """
    if not isinstance(filename, bytes):
        # Q: os.name == 'nt' ??
        if sys.platform.startswith('win'):
            if cairo.cairo_version() >= 11510:
                # Cairo >= 1.15.10 expects UTF-8 on Windows
                filename = filename.encode('utf-8')
            else:
                # Older Cairo expects the ANSI code page ('mbcs').
                # Not sure what's the best value for errors:
                # neither "?" nor "\" is allowed in Windows filenames
                try:
                    filename = filename.encode('mbcs')
                except UnicodeEncodeError:
                    # any better idea?
                    filename = filename.encode('utf-8')
        else:
            filename = filename.encode(sys.getfilesystemencoding())
    return ffi.new('char[]', filename)

With Cairo < 1.15.10, this patch makes test_unicode_filenames pass on my machine because the special letters in 'Unicödé' exist in cp1252. With a 'Cyrillicкири́ллица' filename, the test would probably only pass on a Russian system.
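Whether the 'mbcs' branch of the patch works therefore depends on the user's code page. A quick check, using the portable cp1252 (western) and cp1251 (Cyrillic) codecs as stand-ins for 'mbcs', which only exists on Windows; `fits_code_page` and the sample names are hypothetical:

```python
def fits_code_page(filename, code_page):
    """Return True if every character of `filename` exists in `code_page`."""
    try:
        filename.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


for name in ('Unicödé', 'Кириллица'):
    print(name, {cp: fits_code_page(name, cp) for cp in ('cp1252', 'cp1251')})
```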

To fix this within WeasyPrint (instead of altering cairocffi), all code that calls cairo.ImageSurface with a str filename should pass properly encoded bytes instead.

Rendering issues

1. Superfluous space

In test_draw.test_visibility

expected:                    got:
_ _ _ _ _ _ _ _ _ _ _ _      _ _ _ _ _ _ _ _ _ _ _ _
_ r B B B _ r B B B _ _      _ r B B B _ _ r B B B _
_ B B B B _ B B B B _ _      _ B B B B _ _ B B B B _
_ B B B B _ B B B B _ _      _ B B B B _ _ B B B B _
_ B B B B _ B B B B _ _      _ B B B B _ _ B B B B _
_ _ _ _ _ _ _ _ _ _ _ _      _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _      _ _ _ _ _ _ _ _ _ _ _ _

BTW: At the first, second and third glance my eyes didn't spot the difference between the PNGs, not even at maximum magnification.

This is the HTML that produces it:

<div>
    <img src="pattern.png">
    <span><img src="pattern.png"></span>
</div>

Looking at the <div>'s LineBox reveals that the first TextBox is not empty; it contains a single space:

<InlineReplacedBox img>
<TextBox div> text = " "
<InlineBox span>
<TextBox div> text = ""

It's been a while since I last encountered such unexpected pixels in web design. I usually switched to Zen-Design or removed the triggering line breaks.
Indeed, the 1-pixel shift can be avoided by eliminating the line break after the first <img> tag, but that would probably defeat the test's purpose.
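The space comes from CSS white-space collapsing: with `white-space: normal`, the line break and indentation between the first <img> and the <span> collapse into a single space, which then takes up width in the line box. A simplified model (not WeasyPrint's actual implementation, and ignoring the CSS Text rules about spaces at the start and end of lines):

```python
import re

# Simplified model of `white-space: normal`: every run of spaces, tabs and
# line breaks becomes a single space.
_WHITESPACE_RUN = re.compile(r'[ \t\n\r\f]+')


def collapse_white_space(text):
    return _WHITESPACE_RUN.sub(' ', text)


# The source text between the first <img> and the <span> in the snippet above:
between = '\n    '
print(repr(collapse_white_space(between)))  # ' '
```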

2. Shifted InlineBox

The box.position_y is greater than expected.
In test_layout.test_linebox_positions

ref_position_y += line.height
assert ref_position_y == box.position_y
E assert 26.0 == 32.0
E +  where 32.0 = <InlineBox strong>.position_y

Further investigation revealed that it is always, and only, the last <InlineBox strong>, and always 6 pixels too far down, even though, thanks to tests_ua.css, no bold formatting is applied to <strong>.

With a blue background applied to the <strong>, the failing HTML snippet renders like this:

[Screenshot: test_linebox_positions]
The blue rectangle is the displaced InlineBox. Its child, the TextBox has the correct position_y. Confirmed in debugger.

3. Semitransparent pixels

Looks like Ahem has no chance to be rendered precisely.
In test_tables.test_table_vertical_align

[Screenshot: table_vertical_align]

The semitransparent and protruding pixels in the PNG are probably a result of Windows' ClearType or the like. Whatever the reason, assert_pixels_equal() is doomed to fail.

4. Broken fonts

Both my Liberation Sans and my DejaVu Sans seem to be broken when used with WeasyPrint. They work perfectly well in the browser or in M$Word... maybe another Cairo feature?

Liberation Sans
In test_layout.test_page_and_linebox_breaking

The snippet should produce 2 pages. Instead it produces the following ugly output:

[Screenshot: test_page_and_linebox_breaking]

After forcing a different, working font, everything was fine.

DejaVu Sans
In test_layout.test_font_stretch

A similar issue occurs with DejaVu Sans: the renderer doesn't seem to get the characters' heights right, so it's pointless to expect a proper font-stretch.
Using e.g. Lucida Console, font-stretch: condensed works, but the y-position is definitely wrong.

[Screenshot: test_font_stretch]
Applied blue background to the floating paragraphs.

5. Surprise!

Acid test fails.
In test_draw.test_acid2

[Screenshot: acid2]

Summary:

Most of the issues mentioned here can be circumvented by avoiding special characters in filenames, using only proper fonts with simple properties, and switching to Zen-Mode.

I suggest that pytest skip the tests that require @font-face when sys.platform.startswith('win').
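A minimal sketch of that suggestion with pytest's standard skipif marker; the marker variable and the placeholder test are hypothetical names:

```python
import sys

import pytest

# Hypothetical shared marker for tests that need @font-face support.
requires_font_face = pytest.mark.skipif(
    sys.platform.startswith('win'),
    reason='@font-face is currently not supported on Windows')


@requires_font_face
def test_font_face_example():
    """Placeholder standing in for one of the tests in test_fonts.py."""
    assert True
```

Since all three tests in test_fonts.py need @font-face, a module-level `pytestmark = pytest.mark.skipif(...)` in that file would skip the whole module at once.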


Tontyna

All 4 comments

Couldn't resist implementing @font-face for Windows, meaning: using FontConfig and FreeType for font rendering. And guess what?

[Screenshot: acid2]

pytest passes with no errors.
Of course, I also applied the above-mentioned fixes for the CairoError and encoding issues.

Will create a pull request in the next days...

Tontyna on 13 Mar 2018


> Couldn't resist implementing @font-face for Windows, meaning: using FontConfig and FreeType for font rendering. And guess what?

OMG

> pytest passes with no errors.

OMG

> Will create a pull request in the next days...

OMG

:heart: BEST BUG REPORT EVER :heart:

liZe on 13 Mar 2018


I think we can close this issue. With the current master branch, an up-to-date Python (>= 3.6) and an up-to-date Cairo (>= 1.15.10) with a working fontconfig, all tests pass.

Maybe in document.py another warning could be issued:

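import sys, warnings
import cairocffi as cairo
if sys.platform.startswith('win'):
    # Cairo < 1.15.10 expects ANSI filenames; Python >= 3.6 passes UTF-8
    if cairo.cairo_version() < 11510 and sys.getfilesystemencoding() == 'utf-8':
        warnings.warn('expect funny Unicode filenames')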
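To make such a check testable, the platform, Cairo version and filesystem encoding can be passed in explicitly. A sketch under the assumptions stated in this thread (Cairo < 1.15.10 reads ANSI filenames on Windows, newer versions read UTF-8; Python >= 3.6 sends UTF-8 unless the legacy 'mbcs' mode of PEP 529 is enabled); `filenames_may_be_garbled` is a hypothetical helper that flags both mismatch directions:

```python
import sys


def filenames_may_be_garbled(platform, cairo_version, fs_encoding):
    """Hypothetical check: do Python and Cairo disagree on filename encoding?

    cairo_version uses Cairo's integer scheme, e.g. 11510 for 1.15.10.
    """
    if not platform.startswith('win'):
        return False
    cairo_wants_utf8 = cairo_version >= 11510
    python_sends_utf8 = fs_encoding.lower() in ('utf-8', 'utf8')
    return cairo_wants_utf8 != python_sends_utf8


# Cairo 1.15.6, as in the original report, with this interpreter's settings:
print(filenames_may_be_garbled(sys.platform, 11506, sys.getfilesystemencoding()))
```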

Tontyna on 2 May 2018

For the record: b5c1840 fixes the encoding problem in cairocffi.

liZe on 31 Jul 2018
