Hello.
The \A new line escape char in CSS raises AssesrtionError if the first char of the second line is non-latin, or in my case an Arabic letter. The following is what I get:
2019-03-18 15:33:39,460 [ERROR] Error occured: Traceback (most recent call last):
File "D:\my\project\pdf.py", line 126, in generate
pdf = html.write_pdf(stylesheets=stylesheets)
File "C:\Python37\lib\site-packages\weasyprint\__init__.py", line 198, in write_pdf
font_config=font_config).write_pdf(
File "C:\Python37\lib\site-packages\weasyprint\__init__.py", line 159, in render
font_config)
File "C:\Python37\lib\site-packages\weasyprint\document.py", line 361, in _render
[Page(p, enable_hinting) for p in page_boxes],
File "C:\Python37\lib\site-packages\weasyprint\document.py", line 361, in <listcomp>
[Page(p, enable_hinting) for p in page_boxes],
File "C:\Python37\lib\site-packages\weasyprint\layout\__init__.py", line 184, in layout_document
make_margin_boxes(context, page, state))
File "C:\Python37\lib\site-packages\weasyprint\layout\pages.py", line 435, in make_margin_boxes
yield margin_box_content_layout(context, page, box)
File "C:\Python37\lib\site-packages\weasyprint\layout\pages.py", line 444, in margin_box_content_layout
absolute_boxes=[], fixed_boxes=[])
File "C:\Python37\lib\site-packages\weasyprint\layout\blocks.py", line 363, in block_container_layout
for line, resume_at in lines_iterator:
File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 56, in iter_line_boxes
device_size, absolute_boxes, fixed_boxes, first_letter_style)
File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 105, in get_next_linebox
waiting_floats, line_children=[])
File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 745, in split_inline_box
line_placeholders, child_waiting_floats, line_children))
File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 583, in split_inline_level
context, box, max_x - position_x, skip)
File "C:\Python37\lib\site-packages\weasyprint\layout\inlines.py", line 1020, in split_text_box
'Expected nothing or a preserved line break' % (between,))
AssertionError: Got '\n ุงูุชุงุฑูุฎ: ุงูุฃุญุฏ 3 ู
ุงุฑุณ 2019' between two lines. Expected nothing or a preserved line break
I'm using:
WeasyPrint==45
cairocffi==1.0.2
CairoSVG==2.3.0
The part of CSS string causing the issue:
@top-right {{
padding: 0.5cm;
font-family: 'Dubai-Medium', sans-serif;
content: " {doc_ref}: {ref} \A {doc_date}: {weekday} {day} {month} {year} \A {doc_addressee}: {addressee} ";
display: block;
white-space: pre;
vertical-align: top;
}}
Hello!
To be honest, right-to-left scripts are poorly supported. As the library we're using to render text (called Pango) handles them perfectly, WeasyPrint may sometimes give the impression that everything just works. But it actually doesn't at all (see #106) :cry:.
I can leave this issue open and see if anyone finds a way to fix this exact use case. But until #106 is closed, you can expect a lot of bugs with Arabic text.
Hello LiZe.
I have to admit, your work is beyond amazing. The fact that I can generate pdf docs containing arabic texts is more than enough for me. The RTL caveats are being handled in special cases in my code. So, thanks a lot for your efforts.
Okay after a lot of attempts in a lot of previous cases I found that it's actauly more common and that https://github.com/Kozea/WeasyPrint/commit/51dcbe36a509311a7d549f65c6499e46cf3f3bdf had resolved this problem after I manually ported this commit to my current install.
Do you think you can get a release soon with this commit as I can't use master branch in the production, I can only use the releases (some CI/CD limitations). I would be really really grateful if you would create an out of schedule release (today?) with that commit with it in order to be able to push some major updates to the project.
I have to admit, your work is beyond amazing. The fact that I can generate pdf docs containing arabic texts is more than enough for me. The RTL caveats are being handled in special cases in my code. So, thanks a lot for your efforts.
:smile:
Do you think you can get a release soon with this commit as I can't use master branch in the production, I can only use the releases (some CI/CD limitations)
Maybe you already know that, but you can install the master branch with pip using pip install git+https://github.com/Kozea/WeasyPrint.git@master.
I would be really really grateful if you would create an out of schedule release (today?) with that commit with it in order to be able to push some major updates to the project.
A version was already scheduled this week, it can be today if I can find the time. I'd just like to fix the problem in this comment before.
Maybe you already know that, but you can install the master branch with pip using pip install git+https://github.com/Kozea/WeasyPrint.git@master.
I'm aware of this use-case, but unfortunately I have genuine reason not to use it.
A version was already scheduled this week, it can be today if I can find the time. I'd just like to fix the problem in this comment before.
I'm wishing you best of luck.
Version 46 has been released!
I bursted into productivity cycle after the release and missed replying here.
Thanks a lot. Keep up the good work.
Hello and wish you a great weekend ahead!
I'm reopening this issue due to false positive condition. The commit didn't actually fix the whole problem. I was able to reproduce exactly the same error with other templates. I'm going to spend some more time testing this area of your lib. If I achieve anything I would prepare a PR.
I was able to reproduce exactly the same error with other templates.
Don't hesitate to post your HTML+CSS samples, I'll see if I can help.
I'm going to spend some more time testing this area of your lib. If I achieve anything I would prepare a PR.
Thank you, good luck!
Could you give a shot with some basic lipsom Arabic text like:
import weasyprint
weasyprint.HTML(string=''''ููุฑูู
ุงูุจุณูู
ุฏููุงุฑ ุณูุช ุฃู
ูุช ,ูููุณููุชูุชูุฑ ุฃุฏุงูุจุง ูุณูููุฌ ุฃููุงูุช,ุณูุช ุฏู ุฃููุณู
ูุฏ ุชูู
ุจูุฑ
ุฃููุงูุฏูุฏูููุชููุช ูุงุจูุฑู ุงุช ุฏููุงุฑ ู
ุงุฌูุง ุฃูููููุง . ููุช ุงููู
ุฃุฏ ู
ูููู
ูููุงูู
,ูููุงุณ ููุณุชุฑูุฏ
ุฃูุณูุฑ ุณูุชุงุดู ูููุฃู
ูู ูุงุจูุฑุฃุณ ููุณู ูุช ุฃูููููุจ ุฃูุณ ุฃูุง ููู
ู
ูุฏู ูููุณููููุงุช . ุฏููุงุณ
ุฃููุชู ุฃุฑูุฑู ุฏููุงุฑ ุฅู ุฑูุจุฑููููุฏูุฑุฃูุช ูููููุจุชุงุชู ูููุงูุช ุฃูุณุณู ูุงูููููู
ุฏููุงุฑ ุฃูู ููุฌุงูุช
ููููุง ุจุงุฑุงูุงุชููุฑ. ุฃููุณุณูุจุชููุฑ ุณุงููุช ุฃูููุงููุงุช ูููุจุงูุฏุงุชุงุช ููู ุจุฑูุงูุฏููุช ,ุณูููุช ุงู ููููุจุง
ููู ุฃูููุณูุง ุฏูุณูุฑูููุชู
ูููุช ุงููู
ุฃูุฏู ุงูุณุช ูุงุจูุฑููู
''').write_pdf()
Could you give a shot with some basic lipsom Arabic text like:
I can reproduce your bug, thank you.
We were very lucky to have your original bug fixed, but as I said earlier: right-to-left scripts are poorly supported. The code is full of assumptions that only work with Latin or Cyrilic scripts, and even some left-to-right languages should fail.
I understand this. I would be glad if I could add something to this great
project that is been helping me for years. I'll try to work out something.
On Fri, Mar 22, 2019 at 6:38 PM Guillaume Ayoub notifications@github.com
wrote:
Could you give a shot with some basic lipsom Arabic text like:
I can reproduce your bug, thank you.
We were very lucky to have your original bug fixed, but as I said earlier:
right-to-left scripts are poorly supported. The code is full of assumptions
that only work with Latin or Cyrilic scripts, and even some left-to-right
languages should fail.โ
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub
https://github.com/Kozea/WeasyPrint/issues/828#issuecomment-475645608,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AATVPnGsbqRhyUevlEwIFlvHXU049zvnks5vZOrTgaJpZM4b5lqc
.
LiZe, what's the real purpose of this:
assert between in line_breaks #inlines.py line1022
after commenting this line (and the following two) in an attempt to see where the code is breaking the error message was gone. Could you please elaborate more on where in your code is the line breaks being generated? I assume it's get_next_linebox but since I'm new to your codebase I'm unable to find my way through it.
UPDATE:
Another observation is after commenting lines 1022, 1023, and 1024 in inlines.py what I get is a text that is overflowing over the page width. I'm starting to assume it has something to do with iterating over bytes rather that chars. Your help here would allow me to fix this issue.
what's the real purpose of this:
This assert is here to check that there's no text between the end of a line and the beginning of the next line. If this assert is wrong, it basically means that some text is missing in the layout. So it's generally a good idea to keep itโฆ
I'm starting to assume it has something to do with iterating over bytes rather that chars. Your help here would allow me to fix this issue.
The problem is probably in text.split_first_line. The whole line splitting algorithm has been created with left-to-right scripts in mind, and even if Pango is really good at dealing with right-to-left scripts, our code doesn't use it "the right way".
If you want to debug it, your lipsum sample is perfect. You can comment the assert, see in the PDF where the lines should be split, and check the values returned by split_first_line. resume_at is basically the place in the document where the split occurs (i.e. the number of characters that have already been displayed). It is used as a string index, and it may be wrong with RTL scripts.
Hello.
@liZe, there's some funny business going on here.
I realised an old app that i've written in Python as well sometime before is still working and it has a condition where an overlapping line in Arabic is there. However, unline version 45 and 46, no errors nor any part of the lines are getting hidden. The app was written with version 43 of weasyprint.
I tried having a deep look at the diff between master and the tagged version and found nothing to tell me about any kind of related issue to the previously discussed attrs; split_first_line and resume_at. Maybe since you are the author of most of the lib you can find the issue way faster than me.
I'm still digging into the issue though in order to find which commit changed the behaviour.
Hello,
Thanks a lot for your hints.
I've bisected using your script and the bad commit is 0a3db7d. There's probably a problem with the index we get from pango_layout_iter_get_index.
Hello
We have the same problem. With older version of weasyprint our rtl documents were generated (not perfect but we have pdfs). Now we get an error :
line_children)) File "/usr/local/lib/python3.7/dist-packages/weasyprint/layout/inlines.py", line 584, in split_inline_level context, box, max_x - position_x, skip) File "/usr/local/lib/python3.7/dist-packages/weasyprint/layout/inlines.py", line 1024, in split_text_box 'Expected nothing or a preserved line break' % (between,)) AssertionError: Got ' ูุฑูุณู- ุฃุฑุฏูู ุจุฅุฏุงุฑ' between two lines. Expected nothing or a preserved line break ' on stderr, on output. CMD : /usr/local/bin/weasyprint --encoding utf8 --format pdf
As for AjawadMahmoud the problem was skipped by commenting lines (1022 to 1024 in inlines.py) but it is not a long-term solution. And we don't evaluate the side effects yet.
We have a lot of arabic documents and it is a real problem if we can't get pdf versions.
Do you plan to fix this bug soon ? just to think how we can do
Thank you for your answer
Have a good day
It's now fixed, but the fix is probably bad for performance. I'll release version 47 and try to improve performance later.
I want to thank you all for fixing it!
Thanks a lot, @liZe. I really wanted to give you more hand in here to help you fix it but my schedule is really full.
Thanks for @lnprieto as well for adding to conversation.
Most helpful comment
It's now fixed, but the fix is probably bad for performance. I'll release version 47 and try to improve performance later.