Running the unit tests on a fresh clone reported lots of failures. Installing the Ahem font reduced the number of failures to 22.
They can be grouped into 4 categories:
@font-face on Windows
After debugging to the best of my knowledge, I was able to reduce the number of failing tests to 8.
Tested with Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) on Windows 7 (64 bits) and Cairo 1.15.6.
Updating Cairo to 1.15.10 didn't help a lot (passed +=1).
Exploring the reason for the 55 occurrences of
E cairocffi.CairoError: cairo returned 41: b'error occurred in the Windows Graphics Device Interface'
I finally spotted the culprit in test_api.py at line 635 in test_bookmarks():
assert_bookmarks('''
<style>* { height: 10px; font-size: 0 }</style>
....
The font-size: 0 kills Pango and provokes the CairoErrors in subsequent unit tests.
Inside pytest the Pango-WARNING isn't visible. When I run the snippet outside of pytest I see
(python.exe:7200): Pango-WARNING **: scaled_font status is: error occurred in the Windows Graphics Device Interface
Why doesn't this happen on Linux machines? I don't know. On Windows it is probably related to how DLLs are loaded: I guess Python loads Pango/Cairo only once, at startup or when first required, so a broken state would carry over into the tests that follow.
In any case, replacing font-size: 0 with line-height:0 left me with 12 failures.
@font-face on Windows
The 3 tests in test_fonts.py cannot pass on Windows because of:
UserWarning: @font-face is currently not supported on Windows
Out of curiosity I installed weasyprint.otf from the ./resources/ folder. I have a sneaking suspicion that this font needs special treatment (loading via FontConfig instead of Cairo), because otherwise not even kerning and ligatures (which are on by default) are respected, let alone turning them on and off via font-feature-settings or font-variant.
That's how "kkliga", styled with font-family weasyprint, looks when rendered as PDF or in a M$Word 2010 document:

Red color applied to "liga" for clarification.
All the letters have the same shape/kerning/ligature, no wonder that the box.widths are bigger than expected.
> assert span1.width == 1.5 * 16
E assert 32.0 == (1.5 * 16)
Out of more curiosity I enabled @font-face by letting weasyprint/fonts.py load the libfontconfig-1.dll and libpangoft2-1.0-0.dll present in my GTK3.
Only drawback seems to be:
FontConfiguration can't cleanup the temporary files it created because the library has still open file handles when the object is __del__eted.
On Unix, calling os.remove() on a file that is still in use works fine; on Windows, a PermissionError is raised.
No, not the only drawback.
Although the tests in test_fonts.py now pass, 19 new failures emerge, seemingly unrelated to @font-face, so I guess there's actually a good reason not to use libfontconfig/libpangoft2 on Windows.
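The temporary-file cleanup problem described above can be illustrated without WeasyPrint. This is only a sketch, not the project's code: `remove_if_possible` and `demo` are hypothetical names, and an open Python file object stands in for the handle that the library keeps open.

```python
import os
import tempfile


def remove_if_possible(path):
    """Try to delete `path`; return False instead of raising when the
    operating system refuses because the file is still open."""
    try:
        os.remove(path)
    except PermissionError:
        return False
    return True


def demo():
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as handle:
        handle.write(b'not a real font')
        # The file is still open here: removal works on Unix,
        # but is expected to fail on Windows.
        removed_while_open = remove_if_possible(path)
    if not removed_while_open:
        # Retry once the handle has been closed.
        remove_if_possible(path)
    return removed_while_open, os.path.exists(path)
```

Deferring the removal until every handle is closed is one possible way around the PermissionError; whether the library can be made to release its handles in time is a separate question.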
Opening a UTF-8 encoded file without specifying 'utf-8' fails.
In test_api.test_html_parsing
with open(filename) as fd:
> string = fd.read()
...
self = <encodings.cp1252.IncrementalDecoder object at 0x000000000732F828>
...
E UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1445: character maps to <undefined>
If encoding is not specified, the default is platform dependent -- which in my western Windows case is 'cp1252'. Giving 'utf-8' solves the problem:
with open(filename, encoding='utf-8') as fd:
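The failure can be reproduced on any platform by naming the encodings explicitly. The sketch below is self-contained and uses made-up file content; `read_text` and `demo` are hypothetical helpers, and 'cp1252' is passed by hand to imitate the Windows default mentioned above.

```python
import os
import tempfile


def read_text(path, encoding):
    """Read a whole text file with an explicit encoding."""
    with open(path, encoding=encoding) as fd:
        return fd.read()


def demo():
    # U+010D encodes to b'\xc4\x8d' in UTF-8; byte 0x8d has no mapping
    # in Python's cp1252 codec.
    text = 'caf\u00e9 \u010d'
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'doc.html')
        with open(path, 'w', encoding='utf-8') as fd:
            fd.write(text)
        try:
            read_text(path, 'cp1252')
            cp1252_failed = False
        except UnicodeDecodeError:
            cp1252_failed = True
        utf8_round_trip = read_text(path, 'utf-8') == text
    return cp1252_failed, utf8_round_trip
```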
b'filename' to url
In test_api.test_html_parsing, test_api.test_command_line_render, test_api.test_unicode_filenames, test_pdf.test_embedded_files
Had a hard time finding the offending code -- it's Python's nturl2path.pathname2url() cutting off trailing path separators and being unable to handle binary b'path'.
Altered weasyprint/urls.py to the rescue:
import os
import sys
from urllib.request import pathname2url


def path2url(path):
    """Return file URL of `path`

    Quoting https://docs.python.org/3/library/sys.html:
    > sys.getfilesystemencoding()
    > Return the name of the encoding used to convert between Unicode
    > filenames and bytes filenames. For best compatibility, str should
    > be used for filenames in all cases, although representing filenames
    > as bytes is also supported.
    > Functions accepting or returning filenames should support either
    > str or bytes and internally convert to the system's preferred
    > representation.

    Fact is: the Windows-specific nturl2path.pathname2url only works with `str`
    Workaround: decode b'path' using the filesystem's encoding to avoid
    - TypeError: can't concat str to bytes
    - TypeError: a bytes-like object is required, not 'str'
    """
    if not isinstance(path, str):
        # convert to `str`
        path = path.decode(sys.getfilesystemencoding())
    path = os.path.abspath(path)
    add_trailing_slash = os.path.isdir(path)
    if add_trailing_slash:
        # Make sure directory names have a trailing slash.
        # Otherwise relative URIs are resolved from the parent directory.
        # this, too, only works with `str`!
        path += os.path.sep
    path = pathname2url(path)
    # on Windows pathname2url cuts off trailing slash
    if add_trailing_slash and not path.endswith('/'):
        path += '/'
    if path.startswith('///'):
        # On Windows pathname2url(r'C:\foo') is apparently '///C:/foo'
        # That's enough slashes already.
        return 'file:' + path
    else:
        return 'file://' + path
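The first TypeError named in the docstring is easy to see in isolation. The sketch below is not part of the patch: `concat_separator_fails` and `as_str_path` are hypothetical names, and `os.fsdecode` is used here as a standard-library alternative to calling `path.decode(sys.getfilesystemencoding())` by hand.

```python
import os


def concat_separator_fails(path):
    """Return True if appending os.path.sep (a str) to `path` raises
    TypeError, which is what happens for bytes paths."""
    try:
        path + os.path.sep
    except TypeError:
        return True
    return False


def as_str_path(path):
    """Decode a bytes path with the filesystem encoding; str passes through."""
    return os.fsdecode(path)
```

With these, `concat_separator_fails(b'some_dir')` is true while `concat_separator_fails(as_str_path(b'some_dir'))` is false, which is the reason for decoding first.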
Cairo < 1.15.10 treats filenames as being in the current locale aka ANSI aka 'mbcs'. Python >= 3.6 defaults the Windows filesystem encoding to UTF-8.
In test_api.test_unicode_filenames
filename = b'Unic\xc3\xb6d\xc3\xa9'
def read_file(filename):
"""Shortcut for reading a file."""
> with open(filename, 'rb') as fd:
E FileNotFoundError: [Errno 2] No such file or directory: b'Unic\xc3\xb6d\xc3\xa9'
Indeed, there is no such file, no file named 'Unicödé'. Instead there is a file named 'UnicÃ¶dÃ©' (Huh, looks familiar to me).
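The odd name on disk is what you get when the UTF-8 bytes of the intended name are interpreted as cp1252. A minimal illustration, using the bytes literal from the test and assuming cp1252 as the active ANSI code page:

```python
raw = b'Unic\xc3\xb6d\xc3\xa9'

# What the test means: the bytes are UTF-8.
intended = raw.decode('utf-8')   # 'Unicödé'

# What an ANSI (cp1252) consumer makes of the same bytes.
misread = raw.decode('cp1252')   # 'UnicÃ¶dÃ©'
```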
The filename in question is produced by cairocffi.surfaces._encode_filename(), which builds a char[] suitable for Cairo.
Contemplating the mental state of C programmers and taking into account PEP 529, I finally came up with the following patch, suitable for all Cairo versions:
# cairocffi/surfaces.py
import sys


def _encode_filename(filename):
    """Return a byte string, encoding Unicode with the filesystem encoding.

    Experimental patch for Cairo on Windows:
    Since Python 3.6 the filesystem encoding defaults to 'utf-8'.
    Apparently Cairo treats filenames as being in the current locale
    aka ANSI aka 'mbcs'.
    See PEP 529 https://www.python.org/dev/peps/pep-0529/
    Beware: Characters outside of the user's active code page!
    Update: Cairo >= V 1.15.10 uses UTF-8 filenames on Windows.
    """
    if not isinstance(filename, bytes):
        # Q: os.name == 'nt' ??
        if sys.platform.startswith('win'):
            if cairo.cairo_version() >= 11510:
                filename = filename.encode('utf-8')
            else:
                # not sure what's the best value for errors.
                # neither "?" nor "\" allowed in Windows filenames
                try:
                    filename = filename.encode('mbcs')
                except UnicodeEncodeError:
                    # any better idea?
                    filename = filename.encode('utf-8')
        else:
            filename = filename.encode(sys.getfilesystemencoding())
    return ffi.new('char[]', filename)
When using Cairo < 1.15.10 this patch makes test_unicode_filenames pass on my machine because the strange letters in 'Unicödé' exist in cp1252. With a 'Cyrillicкири́ллица' filename the test would probably succeed only in Russia.
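Whether a given name survives the ANSI route depends only on whether every character exists in the active code page. The sketch below checks that with explicit codec names rather than 'mbcs' (which exists only on Windows); `fits_code_page` is a hypothetical helper, cp1251 is assumed as the Cyrillic code page, and the Cyrillic sample is written without the accent mark.

```python
def fits_code_page(name, code_page):
    """Return True if every character of `name` can be encoded in `code_page`."""
    try:
        name.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


western = 'Unic\u00f6d\u00e9'  # 'Unicödé'
cyrillic = '\u043a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430'  # 'кириллица'

results = {
    ('western', 'cp1252'): fits_code_page(western, 'cp1252'),
    ('cyrillic', 'cp1252'): fits_code_page(cyrillic, 'cp1252'),
    ('cyrillic', 'cp1251'): fits_code_page(cyrillic, 'cp1251'),
}
```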
To fix this within WeasyPrint (instead of altering cairocffi), all code that calls cairo.ImageSurface with a str filename should pass properly encoded bytes instead.
In test_draw.test_visibility
expected: got:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ r B B B _ r B B B _ _ _ r B B B _ _ r B B B _
_ B B B B _ B B B B _ _ _ B B B B _ _ B B B B _
_ B B B B _ B B B B _ _ _ B B B B _ _ B B B B _
_ B B B B _ B B B B _ _ _ B B B B _ _ B B B B _
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
BTW: At first, second and third glance my eyes didn't spot the difference between the PNGs, not even at maximum magnification.
This is the HTML that triggers it:
<div>
  <img src="pattern.png">
  <span><img src="pattern.png"></span>
</div>
Looking at the <div>'s LineBox reveals that the first TextBox is not empty:
<InlineReplacedBox img>
<TextBox div> text = " "
<InlineBox span>
<TextBox div> text = ""
It's been a little while since I encountered such unexpected pixels in WebDesign. I usually switched to Zen-Design or removed the triggering linebreaks.
Indeed, the 1-pixel-relocation can be avoided by eliminating the linebreak after the first <img> tag, but that would probably contradict the test's purpose.
The box.position_y is greater than expected.
In test_layout.test_linebox_positions
ref_position_y += line.height
assert ref_position_y == box.position_y
E assert 26.0 == 32.0
E + where 32.0 = <InlineBox strong>.position_y
Further investigation revealed that it's always and only the last <InlineBox strong>, even though, thanks to tests_ua.css, no strong formatting is applied. And it is always 6 pixels too far down.
With a blue background applied to the <strong>, the failing HTML snippet renders like this:

The blue rectangle is the displaced InlineBox. Its child, the TextBox has the correct position_y. Confirmed in debugger.
Looks like Ahem has no chance to be rendered precisely.
In test_tables.test_table_vertical_align

The semitransparent and protruding pixels in the PNG are probably a result of Windows' ClearTyping or suchlike. Whatever the reason, assert_pixels_equal() is doomed to fail.
Both my Liberation Sans and my DejaVu Sans seem to be broken when used with WeasyPrint. They work perfectly well when used in the browser or in M$Word... maybe another Cairo feature?
Liberation Sans
In test_layout.test_page_and_linebox_breaking
The snippet should produce 2 pages. Instead it produces the following ugly output:

After forcing another, working font, everything was fine.
DejaVu Sans
In test_layout.test_font_stretch
Similar issue with DejaVu Sans -- it seems the renderer doesn't get the characters' heights right. Pointless to expect a proper font-stretch.
Using e.g. Lucida Console, font-stretch: condensed works, but the y-position is definitely wrong.

Applied blue background to the floating paragraphs.
Acid test fails.
In test_draw.test_acid2

Summary:
Most of the issues mentioned here can be circumvented by avoiding special chars in filenames, using only proper fonts with simple properties, and switching to Zen-Mode.
I suggest, pytest should skip the tests that require @font-face, when sys.platform.startswith('win').
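A platform-conditional skip could look roughly like this. The sketch uses the standard library's `unittest.skipIf` so that it runs without extra packages; pytest's own `pytest.mark.skipif` marker takes the same kind of condition. The test function is a hypothetical placeholder, not one of the real tests in test_fonts.py.

```python
import sys
import unittest

ON_WINDOWS = sys.platform.startswith('win')

# Reusable decorator: skip on Windows, run everywhere else.
skip_font_face_on_windows = unittest.skipIf(
    ON_WINDOWS, '@font-face is currently not supported on Windows')


@skip_font_face_on_windows
def test_font_face_placeholder():
    """Hypothetical stand-in for a test that needs @font-face."""
    return 'ran'
```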
Couldn't resist implementing @font-face for Windows, meaning: implementing FontConfig and FreeType for font rendering. And guess what?

pytest passes with no errors.
Of course, I also applied the above-mentioned fixes for the CairoError and encoding issues.
Will create a pull request in the next few days...
Couldn't resist implementing @font-face for Windows, meaning: implementing FontConfig and FreeType for font rendering. And guess what?
OMG
pytest passes with no errors.
OMG
Will create a pull request in the next few days...
OMG
:heart: BEST BUG REPORT EVER :heart:
I think we can close this issue. With the current master branch, an up-to-date Python (>= V 3.6), and an up-to-date Cairo (>= V 1.15.10) with a working fontconfig, all tests pass under pytest.
Maybe in documents.py another warning could be issued:
if sys.platform.startswith('win'):
    if cairo.cairo_version() < 11510 and sys.getfilesystemencoding() != 'utf-8':
        warnings.warn('expect funny Unicode filenames')
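The magic number 11510 is Cairo's integer encoding of version 1.15.10 (major * 10000 + minor * 100 + micro). The sketch below spells that out and mirrors the condition from the snippet above as a pure function so it can be checked without Cairo installed; both function names are hypothetical.

```python
def encode_cairo_version(major, minor, micro):
    """Encode a Cairo version the way cairo_version() reports it."""
    return major * 10000 + minor * 100 + micro


def expect_funny_filenames(cairo_version, platform, fs_encoding):
    """Mirror of the proposed warning condition, with the inputs passed in
    explicitly instead of being read from sys and cairo."""
    return (platform.startswith('win')
            and cairo_version < encode_cairo_version(1, 15, 10)
            and fs_encoding != 'utf-8')
```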
For the record: b5c1840 fixes the encoding problem in cairocffi.