Almanac.httparchive.org: Streamline figures in markdown

Created on 1 Aug 2020 · 10 Comments · Source: HTTPArchive/almanac.httparchive.org

I'd like to propose a change in the way figures are described by authors in the markdown and rendered internally.

In css.md, the corresponding markdown (using embedded HTML) for the figure above is:

<figure>
  <a href="/static/images/2019/css/fig12.png">
    <img src="/static/images/2019/css/fig12.png" alt="Figure 12. Adoption of flexbox." aria-labelledby="fig12-caption" aria-describedby="fig12-description" width="600" height="371" data-width="600" data-height="371" data-seamless data-frameborder="0" data-scrolling="no" data-iframe="https://docs.google.com/spreadsheets/d/e/2PACX-1vQO5CabwLwQ5Lj1_9bbEFnFM1qEqCorymaBHrcaNiMSJ7sYDKHUI5iish5VAS-SxN447UTW-1-5-OjE/pubchart?oid=2021161093&amp;format=interactive">
  </a>
  <div id="fig12-description" class="visually-hidden">Bar chart showing 49% of desktop pages and 52% of mobile pages using flexbox.</div>
  <figcaption id="fig12-caption">Figure 12. Adoption of flexbox.</figcaption>
</figure>

One thing I'd like to improve is the figure numbering. The figure ID is automatically generated, but we're still manually writing Figure 12 in the <figcaption> and managing the IDs needed for the accessible description. Can all of this be automated? I'd also like to make all figure numbers unique and prefixed by chapter number. So rather than Figure 12 this would be Figure 2.12.

The fig12.png image shouldn't be named according to its figure number. These names aren't descriptive and must be kept in sync with the figure ordering in the chapter. Instead, we should name figure images descriptively, like flexbox-adoption.png. We should use this name as the <figure id> and permalink so that reordering figures wouldn't make old links obsolete.

I would also love to see all figures accompanied by links to their metadata like the corresponding SQL file and results sheet. This would enable anyone to see how the metrics were calculated and run it themselves. This requires some design choices like how to provide these links unobtrusively and accessibly. My idea is a menu in the corner of the figure, similar to the HTTP Archive metrics:

Rather than generating figure numbers in the build process, I wonder if this could be done entirely in the templates. For example, could authors include a Jinja macro like this:

{%
  figure(
    # Figure ID corresponding to the `png` image and `sql` file name.
    'flexbox-adoption',
    # Figure caption.
    'Adoption of flexbox.',
    # Detailed figure description.
    'Bar chart showing 49% of desktop pages and 52% of mobile pages using flexbox.',
    # Embedded Sheets URL. (maybe we can add the base of this URL to the chapter yaml and only include the relevant IDs?)
    'https://docs.google.com/spreadsheets/d/e/2PACX-1vQO5CabwLwQ5Lj1_9bbEFnFM1qEqCorymaBHrcaNiMSJ7sYDKHUI5iish5VAS-SxN447UTW-1-5-OjE/pubchart?oid=2021161093&amp;format=interactive',
    # Tab ID in Sheets for this metric's results. For example `/edit#gid=1861654265`.
    '1861654265',
    # Optional width and height if non-standard (600x371).
    600, 371
  )
%}
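To make the idea more concrete, here is a rough Python sketch of what such a macro might expand to. It is only an illustration under stated assumptions: the function name `render_figure`, the `year` and `chapter` parameters, and the `fig-` prefix on the IDs are hypothetical, and the `data-*` attributes for the interactive embed are left out for brevity. The point is that the image path, the anchor, and the ARIA IDs can all be derived from the one descriptive figure ID.

```python
from html import escape


def render_figure(fig_id, number, caption, description,
                  year=2019, chapter="css", width=600, height=371):
    """Return <figure> HTML whose path, anchor and ARIA IDs derive from fig_id.

    All names here are hypothetical; this is not the project's actual macro.
    """
    src = f"/static/images/{year}/{chapter}/{fig_id}.png"
    label = f"Figure {number}. {caption}"
    return "\n".join([
        f'<figure id="fig-{fig_id}">',
        f'  <a href="{src}">',
        f'    <img src="{src}" alt="{escape(label)}"'
        f' aria-labelledby="fig-{fig_id}-caption"'
        f' aria-describedby="fig-{fig_id}-description"'
        f' width="{width}" height="{height}">',
        "  </a>",
        f'  <div id="fig-{fig_id}-description" class="visually-hidden">'
        f"{escape(description)}</div>",
        f'  <figcaption id="fig-{fig_id}-caption">{escape(label)}</figcaption>',
        "</figure>",
    ])


print(render_figure(
    "flexbox-adoption",
    "2.12",
    "Adoption of flexbox.",
    "Bar chart showing 49% of desktop pages and 52% of mobile pages using flexbox.",
))
```

Authors would only supply the descriptive ID, caption, and description; the figure number would come from whatever numbering mechanism we settle on.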

There are other kinds of figures, like big numbers:

<figure>
  <div class="big-number">2%</div>
  <figcaption>Figure 13. Percent of websites using grid.</figcaption>
</figure>

Even though the markup for this figure is much simpler, I would love to see another macro to standardize the boilerplate across all chapters so that the developers have more centralized control over how these are generated. We could have a macro like this:

{%
  figure_big_number(
    # Figure ID corresponding to the `sql` file name.
    'grid-adoption',
    # The big number.
    '2%',
    # Figure caption.
    'Percent of websites using grid.',
    # Tab ID in Sheets for this metric's results.
    '1459448594'
  )
%}

We should build some customizability into these macros. For example, we needed to adjust the appearance of the z-index "really big number" figure. Perhaps there could be an optional parameter for a classname to be added to the figure. This could be used to adjust the font size as in the z-index example, create "alternate" themes for the big numbers so that they're not all the same color, etc.
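As a minimal sketch of the optional classname idea, assuming a hypothetical Python helper named `render_big_number` and a hypothetical `really-big-number` class, the extra class could simply be appended to the `<figure>` element when it is provided:

```python
from html import escape


def render_big_number(fig_id, number, value, caption, class_name=None):
    """Render a big-number figure, optionally adding a class to the <figure>.

    Helper name, parameters and the fig- prefix are illustrative assumptions.
    """
    class_attr = f' class="{escape(class_name)}"' if class_name else ""
    return (
        f'<figure id="fig-{fig_id}"{class_attr}>\n'
        f'  <div class="big-number">{escape(value)}</div>\n'
        f"  <figcaption>Figure {number}. {escape(caption)}</figcaption>\n"
        "</figure>"
    )


# Default appearance.
print(render_big_number("grid-adoption", "2.13", "2%",
                        "Percent of websites using grid."))
# Same figure with a theme or size override.
print(render_big_number("grid-adoption", "2.13", "2%",
                        "Percent of websites using grid.",
                        class_name="really-big-number"))
```

Keeping the class optional means existing figures render exactly as before, and the stylesheet decides what each alternate class does.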

We also need to support other types of figures: tables, images, and videos. The 2019 Mobile Web chapter uses all three in the first three figures.

<figure>
<table>
  <tr>
    <th>Connection type</th>
    <td><a href="https://www.gsma.com/r/mobileeconomy/">2G or 3G</a></td>
  </tr>
  <tr>
    <th>Latency</th>
    <td>300 - 400ms</td>
  </tr>
  <tr>
    <th>Bandwidth</th>
    <td>0.4 - 1.6Mbps</td>
  </tr>
  <tr>
    <th>Phone</th>
    <td><a href="https://www.gsmarena.com/samsung_galaxy_s6-6849.php">Galaxy S6</a> — <a href="https://www.notebookcheck.net/A11-Bionic-vs-7420-Octa_9250_6662.247596.0.html">4x slower</a> than iPhone 8 (Octane V2 score)</td>
  </tr>
</table>
<figcaption>Figure 1. Technical profile of a typical mobile user.</figcaption>
</figure>

I think it's ok to include the <figure> HTML although it may be nice to use markdown tables when possible. It may be harder to get the <th> elements right for both horizontal and vertical using only markdown. But we should have a solution to abstract away the figure numbering and IDing. For example:

<figure id="technical-profile">
<table>
  <tr>
    <th>Connection type</th>
    <td><a href="https://www.gsma.com/r/mobileeconomy/">2G or 3G</a></td>
  </tr>
  <tr>
    <th>Latency</th>
    <td>300 - 400ms</td>
  </tr>
  <tr>
    <th>Bandwidth</th>
    <td>0.4 - 1.6Mbps</td>
  </tr>
  <tr>
    <th>Phone</th>
    <td><a href="https://www.gsmarena.com/samsung_galaxy_s6-6849.php">Galaxy S6</a> — <a href="https://www.notebookcheck.net/A11-Bionic-vs-7420-Octa_9250_6662.247596.0.html">4x slower</a> than iPhone 8 (Octane V2 score)</td>
  </tr>
</table>
<figcaption>{{ figure_number() }} Technical profile of a typical mobile user.</figcaption>
</figure>

I've added id="technical-profile" to the <figure> and {{ figure_number() }} to the <figcaption>. The figure_number Jinja macro could generate a monotonically increasing figure number using shared state with the other figure macros. I'd love to hear any ideas to remove even more of the figure/figcaption boilerplate.
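The shared state could be as small as a per-chapter counter. Here is a minimal Python sketch, assuming a hypothetical `FigureCounter` object that every figure macro would call; how it gets exposed to the templates is left open.

```python
class FigureCounter:
    """Hands out chapter-prefixed figure numbers in document order.

    Hypothetical helper: one instance would be created per chapter render.
    """

    def __init__(self, chapter_number):
        self.chapter_number = chapter_number
        self.count = 0

    def next_number(self):
        self.count += 1
        return f"{self.chapter_number}.{self.count}"


counter = FigureCounter(chapter_number=2)
first = counter.next_number()   # "2.1"
second = counter.next_number()  # "2.2"
print(f"Figure {first}", f"Figure {second}")
```

Because the counter only advances when a figure is rendered, inserting or removing a figure renumbers everything after it without anyone touching the markdown.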

Summary

Figure markup is a dark art that should be abstracted away so that authors can focus on the content.
Numeric figure IDs are brittle and should be replaced with descriptive, human readable IDs.
Figure numbers should be entirely automated.
We should include more meta resources with figures to connect the results back to the source.

cc @HTTPArchive/developers

development good first issue help wanted


rviscomi

All 10 comments

I like it, @rviscomi as described, agree that additional meta would provide better UX. I was working on a similar scheme fairly recently. I'll go back and see if there's anything that might be useful in meeting your objectives.

logicalphase on 2 Aug 2020


Thanks for raising and definitely some improvements we could do here before we generate 2020 figures.

A few comments:

The figure ID is automatically generated, but we're still manually writing Figure 12 in the <figcaption> and managing the IDs needed for the accessible description. Can all of this be automated?

This can certainly be automated to bring consistency.

I'd also like to make all figure numbers unique and prefixed by chapter number. So rather than Figure 12 this would be Figure 2.12.

Yes this probably makes sense. Will think on it some more.

The fig12.png image shouldn't be named according to its figure number. These names aren't descriptive and must be kept in sync with the figure ordering in the chapter. Instead, we should name figure images descriptively, like flexbox-adoption.png.

100% agree!

We should use this name as the <figure id> and permalink so that reordering figures wouldn't make old links obsolete.

Less in agreement with this. I do want to avoid making old links obsolete (or worse, pointing to the wrong data), but I'm not convinced moving away from numeric IDs is the answer. Especially as the caption will likely still include the id, as I think the text should be able to reference the figure number (e.g. "we can see in figure XXX that...").

One thing to be aware of is that we have in the past temporarily removed figures, which changed the automatically generated figure numbers, so we need to be cautious: any automatic numbering is still likely to cause an issue here. And I think that could easily be missed these days since we automate the HTML generation so it's not part of the pull request anymore.

Another thing to be aware of is that the 2019 Third Parties chapter doesn't have a figure 6.

So I actually think the id should be set when the figure is inserted by author or analyst and shouldn't be automatically generated. Granted this means some re-ordering if new figures are added (e.g. large figures added during copy editing to break up long pieces of text) or removed, but that should be rare after publication.

I would also love to see all figures accompanied by links to their metadata like the corresponding SQL file and results sheet. This would enable anyone to see how the metrics were calculated and run it themselves. This requires some design choices like how to provide these links unobtrusively and accessibly. My idea is a menu in the corner of the figure, similar to the HTTP Archive metrics:

I like this! It would require using ids for the SQL filename (which I don't like for same reason as we don't want to use them for image alt attributes), having a data-sql attribute (a little extra effort), or using same name for SQL and image. Think either of last two options are do-able and maybe the data-sql attribute gives us most flexibility and allows linking the same SQL to two figures which will probably be needed (e.g. 2019 CDN chapter gives charts and tables of same data).

Rather than generating figure numbers in the build process, I wonder if this could be done entirely in the templates. For example, could authors include a Jinja macro like this:

Yes this is possible, and probably more robust than using jsdom during chapter generation. I would however give the figure id explicitly for reasons discussed above (also makes the numbering easier - especially if using different macros for big figures or none at all for tables for example).

I think it's ok to include the <figure> HTML although it may be nice to use markdown tables when possible. It may be harder to get the <th> elements right for both horizontal and vertical using only markdown. But we should have a solution to abstract away the figure numbering and IDing.

Yes, markdown tables are very limited, so only use them for simple tables. Given my preference not to abstract the numbers, I'd suggest very simple macros in the language templates to allow for ease of translation:

<figure id="{{ figure_id(1) }}" >
...
<figcaption>{{ self.caption(1, "Technical profile of a typical mobile user.") }}</figcaption>

The {{ figure_id(1) }} macro may not even be needed, but it lets us ensure consistency between fig-1 and fig1, which we've failed to do in the past.

Or, if we do want to move away from numeric IDs, then we need to do this to allow the link, but I can see that's already repetitive:

<figure id="technical-profile" >
...
<figcaption>{{ self.caption(1, "technical-profile", "Technical profile of a typical mobile user.") }}</figcaption>

Summary

Figure markup is a dark art that should be abstracted away so that authors can focus on the content.

Agreed!

Numeric figure IDs are brittle and should be replaced with descriptive, human readable IDs.

Disagree. They will still be brittle as long as we use them in descriptions and text (which I think we should!). Human readable IDs will require more effort (could reuse image filenames, but not all figures have images) and require repeating to get links working.

Figure numbers should be entirely automated.

Disagree.

We should include more meta resources with figures to connect the results back to the source.

Agreed!

bazzadp on 2 Aug 2020

To clarify, I'm not suggesting any changes to the way the figures appear other than prefixing with the chapter number, eg the figcaption will still say Figure 2.12. When I say "figure ID" I'm referring to the human readable name which we can reuse as the figure's anchor ID, SQL file name, and ARIA attribute plumbing.

rviscomi on 2 Aug 2020

Yeah I get that. And do see benefit of a non-changing anchor id to avoid link rot, but see the following problems:

still have problem of incorrect ids in text
need to avoid name space clash with chapter headings
need to reuse id and number for tables to make quick link work.
SQL file is not unique as reuse same SQL across different figures.
the big figures and tables won’t have images, so we'd need to come up with a unique name for them, which we currently don’t, so it's a bit more work (naming things is hard!)

bazzadp on 2 Aug 2020

Oh and “human readable name” isn’t great for translations. That's admittedly already a problem for image names, but they are often in English and aren’t really exposed to readers anyway, unlike figure IDs, which would be in deep link URLs.

bazzadp on 2 Aug 2020

One thing I may have communicated poorly is that I'm proposing figures' sequential number to be completely taken out of the markdown. Authors shouldn't have to count figures to determine what their number will be and hardcode that number in the document. What doesn't change is the semantic meaning of the figure, eg flexbox-adoption, so this is the best way to make the figures directly addressable.

The problems you described could be solved with templating*. For example, the template could map figure names (flexbox-adoption) to figure numbers (2.12) determined dynamically based on the order of the figures at runtime. If an author wants to direct readers to figure 2.12, they could invoke a template function like {{ figure_ref('flexbox-adoption') }}, which would render a link like this <a href="#fig-flexbox-adoption">Figure 2.12</a>. (Note I added the prefix fig- here to address your point about heading ID conflicts).

* Maybe not entirely with templating. If a figure is referenced before it's defined, I'm not sure that the template would already have its name mapped to the number. Need to think on this some more, but the compilation step may still be needed to scan through all figures first.
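One way to picture the scan-first approach is as two passes: collect every figure name in document order, then resolve references against that map. The Python sketch below is a simplification under stated assumptions. The function names are hypothetical, and it uses a regular expression over already-rendered `<figure id="fig-...">` tags where a real build step would more likely walk the DOM.

```python
import re

# Assumes figures are rendered as <figure id="fig-some-name" ...>.
FIGURE_ID_PATTERN = re.compile(r'<figure id="fig-([a-z0-9-]+)"')


def number_figures(html, chapter_number):
    """First pass: map each figure name to its number, in document order."""
    numbers = {}
    for match in FIGURE_ID_PATTERN.finditer(html):
        name = match.group(1)
        if name in numbers:
            raise ValueError(f"duplicate figure ID: {name}")
        numbers[name] = f"{chapter_number}.{len(numbers) + 1}"
    return numbers


def figure_ref(name, numbers):
    """Second pass: link to a named figure, wherever it is defined."""
    return f'<a href="#fig-{name}">Figure {numbers[name]}</a>'


chapter_html = (
    "<p>Grid lags behind flexbox, as shown later.</p>"
    '<figure id="fig-flexbox-adoption"></figure>'
    '<figure id="fig-grid-adoption"></figure>'
)
numbers = number_figures(chapter_html, chapter_number=2)
# The reference resolves even though the paragraph precedes the figure.
print(figure_ref("grid-adoption", numbers))
```

Since the map is complete before any reference is rendered, forward references stop being a special case, and a duplicate name fails loudly at build time.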

Reusing the same SQL for multiple figures is a real concern, albeit not the common case. I think the templates should assume the common case where the figure name is the same as the SQL file name. If two figures need to map back to the same SQL file, we can support optional kwargs in the template macro. For example, if the flexbox and grid figures were queried in the same file:

{%
  figure(
    # Figure ID corresponding to the `png` file name.
    'flexbox-adoption',
    # Figure caption.
    'Adoption of flexbox.',
    # ...
    # Optional: ID corresponding to the `sql` file name.
    sql_file='flexbox-grid'
  )
%}
# [...]
{%
  figure(
    # Figure ID corresponding to the `png` file name.
    'grid-adoption',
    # Figure caption.
    'Adoption of grid.',
    # ...
    # Optional: ID corresponding to the `sql` file name.
    sql_file='flexbox-grid'
  )
%}

The same could be done for translations, although I don't think this is necessary. For example, we don't translate the page names in the URL, like the chapter title, contributors, or methodology, so untranslated anchor links are not a new concern. If needed, the macro could support a translation-specific localized_id kwarg:

{%
  figure(
    # Figure ID corresponding to the `png` and `sql` file names.
    'flexbox-adoption',
    # Figure caption.
    'Adopción de flexbox.',
    # ...
    # Optional: Localized ID that overrides the figure ID in anchor names.
    localized_id='flexbox-adopción'
  )
%}

(I assume macros support kwargs, but worst case if not we'd use different macros for each scenario)
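For illustration, the defaulting behind both optional kwargs could look like this Python sketch. The helper name `resolve_figure_names` and the returned keys are hypothetical; the idea is just that the figure ID is the default for everything and each kwarg overrides a single derived name.

```python
def resolve_figure_names(fig_id, sql_file=None, localized_id=None):
    """Derive the image, SQL and anchor names for a figure.

    Hypothetical helper: fig_id is the default, kwargs override one name each.
    """
    return {
        "image": f"{fig_id}.png",
        "sql": f"{sql_file or fig_id}.sql",
        "anchor": f"fig-{localized_id or fig_id}",
    }


# Common case: everything follows the figure ID.
print(resolve_figure_names("flexbox-adoption"))
# Two figures queried in the same SQL file.
print(resolve_figure_names("grid-adoption", sql_file="flexbox-grid"))
# Translated anchor, same image and SQL.
print(resolve_figure_names("flexbox-adoption", localized_id="flexbox-adopción"))
```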

@bazzadp WDYT?

rviscomi on 2 Aug 2020

Yes if we could solve the way of referencing the figures then I agree the numbering is less important and then there is real merit in getting rid of them completely.

Not sure how to do this technically though but will have a play. May have to be a combination of jsdom code at generation and Jinja templating.

Also like your idea of prefixing their id with fig- to avoid name clashes.

Localisation could be solved with the id being set in the markdown as you say.

@bazzadp WDYT

You’re starting to win me round with all these counter arguments! Now just need to think about how to do it! 😀

bazzadp on 2 Aug 2020


Moving discussion from https://github.com/HTTPArchive/almanac.httparchive.org/pull/1589#issuecomment-735402716.

I would suggest we make the Show description of Figure N.M less prominent. Instead of giving this repetitive phrase a line after each figure, we could just add an icon next to the figure caption for the curious ones. However, if the description is something we would like to show everyone (not just for the sake of accessibility), then it can be expanded by default on large screens.

ibnesayeed on 29 Nov 2020

I would suggest we make the Show description of Figure N.M less prominent. Instead of giving this repetitive phrase a line after each figure, we could just add an icon next to the figure caption for the curious ones. However, if the description is something we would like to show everyone (not just for the sake of accessibility), then it can be expanded by default on large screens.

My vote would be to keep the button as is. The point of this is to make it available for visually impaired readers (not all of whom will be using screen readers with access to aria-describedby which links to this text). In fact in #854 we were asked to make this description more available to everyone which is what led to the button as it appears now. I worry that hiding it under an icon, or under the menu we hope to implement as part of this makes it less accessible.

Happy to hear others' thoughts on this.

bazzadp on 29 Nov 2020

I'd like to explore ways to keep the "Show description" functionality accessible and discoverable while tidying up the UI. I do wonder if it would be a better fit in the figure menu options with the other metadata. I've reread #854 and my understanding is that it was important to make long-form descriptions of figures available to everyone, but the prominence of the show/hide functionality wasn't as critical. I think as long as we make the show/hide functionality available to everyone, the original intent of the feature is preserved.

📟 paging @juliemoynat in case they have opinions on this.

rviscomi on 29 Nov 2020
