Wkhtmltopdf: large gaps and undesirable page breaking for tables

Created on 15 Apr 2014  路  33Comments  路  Source: wkhtmltopdf/wkhtmltopdf

After the latest patch to fix blank pages that occur with large nested tables, there are no more blank pages at the start of the document, but in certain scenarios there are large gaps between tables within pages (not always a page break), and even medium sized nested tables (<50% page height) are being put onto the next page.

Could we fix this by only breaking the page if the TD (that will be moved to the next page) contains no nested tables? Isn't this really what we're trying to achieve?

Fixed

Most helpful comment

Thanks @repher , your solution works for me

tr { page-break-inside: avoid; }

Its nice ..

image

Even with tfoot

image

:+1:

All 33 comments

Can you post a sample test case?

A sample test case would be helpful to understand the exact scenario.

It's occurring in one of my client's reports that had a lot of sensitive data in it. When I tried filtering out all the sensitive stuff, it wouldn't exhibit the problem. I'll try again when I get a moment.

Please reopen the issue when you are able to reproduce it.

Sorry this took so long. The issue still exists with the current version.
http://myworkflowfirst.com/hotfix/portrisk.htm
...generates this:
http://myworkflowfirst.com/hotfix/portrisk.pdf

The report is auto-generated and consists of multiple nested tables.

Would it be possible to change the page break logic so that it only pushes the TD to the next page if it DOESN'T contain a table?

@ashkulz - not sure how to re-open, could you do it for me?

@tenshon: it looks like the HTML was generated from Word, is that correct? You might want to clean it up before passing it on to wkhtmltopdf.

@ashkulz - I think it has additional styles in there for Word because the users wanted to open it in Word or something - not sure, I didn't create it. I could try to create a cleaner HTML to reproduce, but still I don't think wkhtmltopdf should be creating those page breaks...

What do you think of my idea of only breaking if the TD doesn't contain a table?

With the kind of junk HTML Word generates, the document switches to "Quirks mode" -- which may cause all sorts of rendering issues. Can you try to see if you get different output after cleaning up the HTML so that it renders in "Standards compliance mode"?

quirks-mode

@ashkulz - Ok I'll edit the HTML to get it out of quirks mode and let you know.

@ashkulz - ok the guy who obfuscated the figures used Word to find/replace, which messed it up.

I've attached a cleaner version here:
http://myworkflowfirst.com/hotfix/portrisk2.htm
http://myworkflowfirst.com/hotfix/portrisk2.pdf

This does look better (much better than the normal generated report) - but still at the bottom of the first page it's moving the table onto the next page. I presume this is because it's wrapped in a TD - but again, if you only pushed TDs to the next page that don't contain tables, wouldn't it fix it?

I don't think that is really fixable without changing the current approach -- you should never implement any row moving logic _unless_ you have a page-break-inside: avoid defined at the tr level. Then the default behaviour would be as earlier, and you could use CSS to enable the row moving logic for specific tables/rows insead of the current hacky approach.

Is there any way to disable the current logic, in factor of explicitly putting in "page-break-inside: avoid" ?

@ashkulz - ok I just read through the previous thread and looked at the code changes that were made, and right - it doesn't look like it can be disabled in CSS. Would you be able to implement the "less hacky" approach of requiring page-break-inside:avoid any time soon? My clients are pretty eager to have the issue resolved, it's a pretty big deal to them...

Nope, not really. I work on wkhtmltopdf in my spare time, and this doesn't look like something which can be tackled very easily (although the actual changes required may be less, understanding the Webkit code enough to know _what_ needs to be done will take a lot of time).

@ashkulz - I appreciate all the work you do on this project - but I don't suppose you consider doing work on wkhtmltopdf on a paid basis do you?

A development snapshot 0.12.2-dev-5dea253 is available, which should fix this issue. Please refer to the breaking changes in this release and report back if your issue is not solved with the above snapshot or there is a regression from existing behavior.

@ashkulz OK continuing here. So I tried to check what the difference might be between your html and mine but without success. It is easier said than done, I spent hours trying out css woodoo ...

I think that we need a more deterministic approach by specifying and building the right TOOLS to find this kind of problems so we can fix them.
A first step would be if wkhtmltopdf could provide a debug output focused on this kind of issues
i.e.when page break happens and what where the effective css rules leading to this...

What do you think?

Strip out all the CSS and see if it still gets cut, then it is a clear content related issue. If not, then go on adding till you find the offending declaration (it can help to use a binary search approach -- remove half of the CSS which is know to cause the problem and repeat this step).

See https://gist.github.com/Ognian/813dedee3bd28633784c
I've striped out all css, left only the
thead, tfoot {display: table-row-group; }
generating the pdf via

wkhtmltopdf --page-size A2 --orientation landscape test1.html test1.pdf

and still get a cut line
bildschirmfoto 2014-12-01 um 10 58 32
even if removing thead, tfoot {display: table-row-group; } it is still cut at that line...
So, what do you mean that this is a content related issue? (It is a correct html and it is displayed correct in the browser)....

@Ognian: it looks like you are using nested tables. If you have a row in the outer table which spans multiple pages, it is going to get cut anyway -- so it makes sense to apply the rule to the inner table only e.g. tr.json-array-table { page-break-inside: avoid } so that its rows are kept together. Still, the line should not get cut even then -- I'll have to investigate it further.

@Ognian: 715cc3dd48a1f5a1be70be612afd732003d0eeda is a step in this direction, but unfortunately the reported output is not accurate for table cells. That's something to be done in the future :smile:

tried tr.json-array-table { page-break-inside: avoid } but, the line is still cut; so I hope you'll find something...

Just stumbled accross this discussion after having the same issue as most of you (overlapping, line cut in half). After installing 0.12.2-dev-5dea253 and adding tr { page-break-inside: avoid; } everything works for me. Even when using thead for repeating headers and extrem combination of rowspan="x".

Thanks for your work!

Unfortunately I am having the same issues as @Ognian. The fix @repher mentioned does not work for me.

@ashkulz thanks for the awesome work. I am also not using nested tables. I am just using a default table.

Here is my code. I hope it's ok to post my slim template.

  table.table
    thead
      tr
        th.span-1 WID
        th.span-4 Work Item Description
        th.span-2 Unit
        th.span-1 Price
        th.span-1 Quantity
        th.span-1 Discount
        th.span-1 Subtotal
    tbody
      - section.line_items.each do |li|
        tr
          td= li.wid
          td= li.description
          td= li.unit
          td= humanized_money_with_symbol @report.convert_money(li.price)
          td= li.quantity
          td= number_to_percentage li.discount, precision: 2
          td= humanized_money_with_symbol @report.convert_money(li.budget)
        - if li.custom_description.present?
          tr
            td colspan="7"
              .pre-wrap= li.custom_description

Here is a screenshot of how it looks like:
image

Edit: I removed all stylesheets. Except the following:

.hidden-in-print
  display: none

body
  background: #fff

.main-container
  background: #fff
  padding: 0px

.page-break
  height: 1px
  page-break-before: always

table
  border-collapse: collapse
  tr, td, th
    page-break-inside: avoid !important

It still looks like this:

image

The wkhtml version I am using is: wkhtmltopdf 0.12.2-dev-5dea253 (with patched qt) for OSX

@hendricius: Please post actual HTML and CSS so that I can verify it.

0.12.2 has been released, which includes changes related to this issue.

I tried with 0.12.2 but unfortunately it doesn't solve my issues...
Any idea when a 0.13 alpha for OS X will be available, so that I can test with that?
Thanks
Ognian

0.13 alpha won't help you, see the release plan. I'll have to port all these patches to the 0.13 version (which should be done at 0.13 rc). Your problem is basically that

when page-break-inside: avoid for rows is used, lines within a row may break in between pages

which is a consequence of the way the patch was done -- in RenderBlock::layoutRunsAndFloats the position of the line is not adjusted if it is a table cell which has pagination -- which causes the break. There's no easy way to avoid it.

Please open up a new issue to track this.

was having the same issue, everything fixes when i add a style table tr {page-break-inside: avoid}

Thanks @repher , your solution works for me

tr { page-break-inside: avoid; }

Its nice ..

image

Even with tfoot

image

:+1:

Your "solutions" don't work in my case. I get both your issues (1) Overlapping table headers (2) Lineabreaks which b0rk the table content. I regret becoming a programmer because of CSS.

I was facing the same issue add after a lot of trial n errors this css solved the issue

tr {
display: inline-table;
}

Was this page helpful?
0 / 5 - 0 ratings