WeasyPrint: Declarative way to add accessibility tags to PDFs

Created on 25 Mar 2020 · 10 Comments · Source: Kozea/WeasyPrint

In many industries, accessibility is a big deal. It's impossible for a PDF to be considered accessible unless it's "tagged":

  1. WebAIM's guide to accessible PDFs
  2. Section 508's guide to creating accessible PDFs
  3. Acrobat's Accessibility Portal
  4. U of MN's guide to Accessible PDFs
  5. WebAIM mailing list archives that mention WeasyPrint

If accessibility tagging were added to WeasyPrint, it would be an ace in the hole for many industries. The current state of the art seems to be a whole lot of manual clicking in Acrobat.

If the input HTML had certain special attributes, WeasyPrint could apply the equivalent accessibility tags.
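
To make the idea concrete, here is a minimal sketch of what reading such attributes could look like. The attribute name `data-pdf-tag` and the helper names are invented for illustration; WeasyPrint defines no such attribute, and this uses only the standard library's HTML parser rather than WeasyPrint's own machinery.

```python
from html.parser import HTMLParser

# Hypothetical attribute name, chosen only for this sketch.
TAG_ATTRIBUTE = "data-pdf-tag"


class TagHintCollector(HTMLParser):
    """Collect (element name, requested PDF tag) pairs from HTML source."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        requested = dict(attrs).get(TAG_ATTRIBUTE)
        if requested:
            self.hints.append((tag, requested))


def collect_tag_hints(html):
    """Return the tagging hints found in the given HTML string."""
    collector = TagHintCollector()
    collector.feed(html)
    collector.close()
    return collector.hints


hints = collect_tag_hints(
    '<div data-pdf-tag="Sect"><p>Intro</p>'
    '<span data-pdf-tag="H1">Title</span></div>'
)
print(hints)
```

Elements without the attribute are simply skipped here; a real implementation would still need to decide how untagged content is handled.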

feature

All 10 comments

It would be possible if we had our own PDF generator or post-processor. If someone is interested in replacing Cairo…

@liZe

  1. How much work do you think it would be to replace Cairo?
  2. What would you suggest replacing it with?

  1. How much work do you think it would be to replace Cairo?

A lot (see #841).

  2. What would you suggest replacing it with?

A pure-Python library generating PDF. I’m not fond of reportlab, but something like that should be OK.

(I’m currently reading the PDF spec to write such a library, but that’s a secret :wink:.)

This sounds very interesting. I would really appreciate the effort to implement the tagging of PDF documents.

Dear Lize!

We also now have the requirement of a tagged and accessible PDF. After some research I found:
https://www.cairographics.org/news/cairo-1.16.0/

The PDF backend has gained support for .. tags. Tags permit adding
logical info such as headings, tables, figures, etc. that facilitates
indexing, accessibility, text reflow, searching, and extraction of the
tagged items to other software. For details on this new PDF
functionality, see:
https://lists.cairographics.org/archives/cairo/2016-June/027427.html

And directly in the Cairo docs:
https://www.cairographics.org/manual/cairo-Tags-and-Links.html#doc-struct

It does seem to me that Cairo does indeed support the required structural tagging.

Since standard HTML already uses all these tags, it should be quite straightforward to map them to the appropriate PDF tags?
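
As a rough illustration of such a mapping, the sketch below pairs common HTML element names with PDF standard structure types. The selection, the dictionary name and the `pdf_tag_for` helper are assumptions made for this example, not part of WeasyPrint or Cairo, and the table is far from exhaustive.

```python
# HTML element name -> PDF standard structure type.
# Illustrative selection only; a real mapping would need more care
# (for example, links and figures usually need extra attributes).
HTML_TO_PDF_TAG = {
    "h1": "H1", "h2": "H2", "h3": "H3",
    "h4": "H4", "h5": "H5", "h6": "H6",
    "p": "P",
    "section": "Sect",
    "div": "Div",
    "span": "Span",
    "blockquote": "BlockQuote",
    "ul": "L", "ol": "L", "li": "LI",
    "table": "Table", "caption": "Caption",
    "thead": "THead", "tbody": "TBody", "tfoot": "TFoot",
    "tr": "TR", "th": "TH", "td": "TD",
    "img": "Figure",
}


def pdf_tag_for(element_tag):
    """Return the PDF structure type for an HTML element, or None."""
    return HTML_TO_PDF_TAG.get(element_tag.lower())


print(pdf_tag_for("h2"), pdf_tag_for("LI"), pdf_tag_for("video"))
```

Returning `None` for unknown elements leaves the decision to the caller: such content could be left untagged or wrapped in a generic type.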

Let me know, if we can assist in any way to help this issue along!

As always, thanks a lot for your great library!

Johannes

It does seem to me that Cairo does indeed support the required structural tagging.

It does, you’re right!

It means that it could be possible to add tags using Cairo. I don’t think that there’s currently an easy way to do this using only the public API of WeasyPrint, even with the new finisher option. We’ll drop Cairo soon, but it will be possible to do this with another library too.

If anyone wants to work on this, I can help!

Great news!
1) Without any promises, how soon will you be dropping Cairo?
2) Assuming that 1) will not be happening in the next 4 weeks, if we supported your patreon campaign, would you be willing to tackle this issue still using Cairo?
3) If not, I would welcome some ideas on where and how you would like to see me implement this..

  1. Without any promises, how soon will you be dropping Cairo?

The next release will come in September (I hope); the one after that will be without Cairo (it may take time for users to test and report broken corner cases).

  2. Assuming that 1) will not be happening in the next 4 weeks, if we supported your patreon campaign, would you be willing to tackle this issue still using Cairo?

Big secret: we’re currently building a small structure dedicated to WeasyPrint and its dependencies (and misc free software). You shouldn’t give to the Patreon campaign; wait until the end of the month instead, when we’ll have more time and more resources to implement the features you want :wink:.

  3. If not, I would welcome some ideas on where and how you would like to see me implement this..

It will be easier without Cairo, as we’ll have a dedicated library to create PDF files, and won’t have the Cairo surface / PDF bytestring separation that exists now.

Sounds great! Looking forward to a bright future, where PDF writing is fully under your control :-)

But I can't put our current project on that timetable, so I just looked into the code. It was very straightforward to put a few calls to context.tag_begin and tag_end into draw.py with the appropriate mappings from element_tag. Works great! Checked with PAC3 and Adobe Acrobat. The next step will be nested sections and table structures.
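
The general shape of that approach can be sketched as follows. This is not the actual patch to draw.py, which was not posted; the `tagged` helper and the `RecordingContext` stand-in are hypothetical names used so the example runs without Cairo installed. The only calls assumed on the context are `tag_begin` and `tag_end`, as named in the comment above.

```python
from contextlib import contextmanager


@contextmanager
def tagged(context, pdf_tag):
    """Wrap drawing calls in a structure tag, closing it even on error."""
    context.tag_begin(pdf_tag, "")  # second argument: attribute string
    try:
        yield context
    finally:
        context.tag_end(pdf_tag)


class RecordingContext:
    """Stand-in for a cairo context; it only records the calls it gets."""

    def __init__(self):
        self.calls = []

    def tag_begin(self, tag_name, attributes=""):
        self.calls.append(("begin", tag_name))

    def tag_end(self, tag_name):
        self.calls.append(("end", tag_name))

    def show_text(self, text):
        self.calls.append(("text", text))


context = RecordingContext()
with tagged(context, "Sect"):  # nesting follows the with-blocks
    with tagged(context, "H1"):
        context.show_text("Title")
    with tagged(context, "P"):
        context.show_text("Body text")

for call in context.calls:
    print(call)
```

With a real context from Python bindings built against Cairo 1.16 or later, the same helper should apply unchanged, since it relies only on `tag_begin` and `tag_end`. Using a context manager keeps begin and end calls balanced, which matters once sections and tables are nested.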

Will keep you posted!

So I just looked into the code. It was very straightforward to put a few calls to context.tag_begin and tag_end into draw.py with the appropriate mappings from element_tag.

Would you be willing to share a snippet that explains how to implement this strategy @JohannesMunk ?
