Website: Remove the use of the capture shortcode (or: how to prepare for Goldmark)

Created on 5 May 2020  路  19Comments  路  Source: kubernetes/website

@zacharysarah asked me to do some investigative work figuring out if it would be possible/how hard to get this site up and running with Goldmark/latest Hugo (i.e. CommonMark compliant).

A lot of work on this has already been done in #19907.

I diffed a sub-set (English content, no blog) using stripped down templates to make it easier to see some patterns. A very common pattern was HTML blocks treated as Markdown, so I created this https://github.com/yuin/goldmark/issues/125 -- which is closed (I agree about that).

So, to use HTML blocks in Markdown and still be CommonMark compliant, we would either have to:

  • Remove all blank newlines in these blocks.
  • Or wrap it in a {{< wrap >}} shortcode or something, so Goldmark never sees it.

Both of the above could probably be easily scripted (esp. the last part), had it now been for the indirection of the {{% capture %}} shortcode, which _produces Markdown_ with template includes etc.

Of the 365 different files (out of 667 samples), 226 of them use the {{% capture %}} shortcode.

A simple example:

{{% capture prerequisites %}}
## Hello
{{< version-check >}}
{{% /capture %}}

The inner {{< version-check >}} uses {{< which marks it as (usually) raw HTML. But since it's included in a {{% block, its content will be passed to Goldmark.

A long story short, to solve this issue I think we either need to

  1. Write a parser extension to Goldmark that treats HTML blocks differently. I'm not sure how easy this is; "easy parsing" was probably the motivation for the current very simple rule.
  2. Remove the use of the capture shortcode so we can get better control on this from the Markdown file itself.

The {{% capture %}} templating was ported by me from the old Jekyll site. I suspect its biggest value is that it sets a common structure for disposition of guides and avoids some repetition. But:

  • It makes the Markdown very unportable, even when browsing on GitHub.
  • It slows down the build significantly
  • A common structure could probably be achieved using Hugo Archetypes
  • This construction is no longer needed to make shortcodes be part of the ToC (as long as you use {{% brackets, they will be part of the ToC).

I think it should be possible to temporarily patch Hugo to make it do a very shallow render that expands this capture one level and writes the files to disk. It would be a big change set, but it would be a correct one.

Most helpful comment

I would recommend you do:

## {{% heading-whatsnext %}}

The % makes sure that Goldmark sees the expanded content (for TableOfContents etc.)

All 19 comments

A side effect of the {{% capture ... %}} shortcodes is that it adds high level structure and helps reviewers see what is present / missing in a PR.

I'm OK with losing that. Instead, that kind of check is more of a job for a linter or similar automated process. Archetypes also sound like a helpful approach.

We could still use a shortcode for the section titles? Eg

## {{< heading-whatsnext >}}

Example future lints:

  • if you use the version-check shortcode, warn when min-kubernetes-server-version isn't set
  • if you create a task, warn if it doesn't have a prerequisites section
  • if you create a page that ends with an unordered list where each list item contains a hyperlink, warn if that part of the page isn't headed with {{< heading-whatsnext >}}

@sftim you would still be able to use all the shortcodes you want, it's just that very special capture magic.

There could be a possibility to change these capture blocks with "something else", e.g. separate them into different Markdown-files, e.g. prerequisites.md and whats-next.md which server the same purpose, but without the troubles.

temporarily patch Hugo to make it do a very shallow render that expands this capture one level and writes the files to disk

Sounds good to me.

@bep, Are you referring to:
blocks.html and block.html? If so, seems like a good thing.
This construction is no longer needed to make shortcodes be part of the ToC (as long as you use {{% brackets, they will be part of the ToC).

I don't fully understand what very shallow render that expands this capture one level?
Can you be more explicit?

I think it should be possible to temporarily patch Hugo to make it do a very shallow render that expands this capture one level and writes the files to disk. It would be a big change set, but it would be a correct one.

@kbhawkey not sure what you mean by block.html -- but my re. my ToC reference. Back when I ported this site from Jekyll to Hugo, the shortcodes vs Markdown behaved a little differently (mostly because how it was implemented).

So, if you had Markdown that looked linke this:

## A title

{{% some-shortcode %}}
### Another title
{{% /some-shortcode %}}

In "earlier days" the "Another title" above would not be part of the TableOfContents produces unless you did some trickery.

I think it should be possible to temporarily patch Hugo

That would be something that I would help out with and would be a one time hack that would make this task practical (it would be a gigantic job if done manually) and, if done correctly, guaranteed to produce the same output as you have today. In short it will inline all the Markdown, but keep the shortcodes as-is.

So, use code from Hugo to generate what we think the files should have looked like in the first place (in an ideal world). Does that summarize that part of your proposal @bep?

@sftim I guess so, but it depends a little on how much you like the current capture setup.

I admit that I don't fully understand the details of the proposal (need to think on this a bit more).
The capture statements define the structure (required) for a given class of page (concept, task, tutorial).
For instance, concept template,

{{ partial "templates/block" (dict "page" .page "block" "overview" "purpose" "states, in one or two sentences, the purpose of this document") }}

{{ .ctx.Scratch.Set "blocks" slice }}
{{ .ctx.Scratch.Add "blocks" (dict "page" .page "block" "body" "purpose" "supplies the body of the page content.")  }}
{{ .ctx.Scratch.Add "blocks" (dict "content" .page.Content) }}
{{ .ctx.Scratch.Add "blocks" (dict "page" .page "block" "whatsnext" "heading" (i18n "whatsnext_heading") "optional" true )  }}
{{ partial "templates/blocks" . }}

Without knowing more details, I typically favor non-custom solutions.
Do we need the capture statements or what is the alternative to achieve similar templating?

As mentioned above, we could remove the requirement of capture statements and instead provide templates for some of the page-specific H2 level headings.
The inline, page TOC (H2 level headings) goes away with the docsy theme (becomes right hand column TOC).
So as not to lose some form of page structure, add linting of the required page headings (capture statements do this already). Would a render hook template work here or can the error be caught earlier in the process?
Linting: Check per page template type. If a given H2 level heading is not
found on the page, fail the build and show how to fix the problem (more than the current capture statement error handling).
For example, possible _required_ H2 level headings:

  • Task pages: prerequisites
  • Concept pages: overview, what's next
  • Tutorial pages: objective and prerequisites
  • Tool/reference pages: synopsis, options

Combining a few ideas, the linter could check for:

## {{< heading-whatsnext >}}

rather than

## What's next

and this helps with localization.

I would recommend you do:

## {{% heading-whatsnext %}}

The % makes sure that Goldmark sees the expanded content (for TableOfContents etc.)

I also like the idea of using render hooks to catch problems. I don't know if it's a good idea / there's a better way.

Archetypes could help with new content creation.
The current page templates provide consistency/structure/maintainability.
Here are some more ideas:

  • Checking for specific H2 level headings does not cover the cases of enforcing structure where previously there was no heading generated (post parse check).
  • Option: Hugo/parser validates the set of ordered headings/strings (add shortcodes to page) and wordCount() between page sections. Cleans up/removes headings that should not be displayed.
  • Option: Move the validation and cleanup of "structure" headings to a post processing step.
    Validate set of ordered page headings (+ wordCount()) and remove (or replace as comments) headings that should not be displayed. Depends on page template configuration (heading1: structure, heading2: display, ...).

@bep, Any thoughts?

I've seen a few issues (see #20942) that look hard to solve without either a significant rewrite _or_ removal of the problematic capture shortcode.

@kbhawkey as to validating the structure without capture I suggest you enforce this using header IDs, and do that in a language-agnostic way, e.g.:

### What's next {#whats-next}

There are some ongoing work creating a render hook for headings in Hugo, see https://github.com/gohugoio/hugo/pull/7133#issuecomment-628638620 -- I have had a little problem seeing "if it's worth it", but I can certainly see a value here. It would be an easy way to validate content structure while building.

I have been a little busy with other work lately, but I'll cc @zacharysarah on this issue; I still think my initial post sums it up fairly good.

I'm finding that the parser will not add an H2 heading from a shortcode
into the TOC.

Here is an example shortcode that gets called as:
{{% myshortcode_heading %}}

{{- $id := T "myshortcode_heading" -}}

<h2 id="{{ $id | anchorize }}">{{- $id  -}}</h2>

<!-- This works standalone -->
<!-- {{- T "myshortcode_heading" -}} -->

Instead, this works (when the shortcode just returns a string):
## {{% myshortcode_heading %}}

I'd rather have the entire statement wrapped in the shortcode so that authors can omit the
opening ##.

PR #21359 is working towards fixing this

Fixed in #21359
/close

@sftim: Closing this issue.

In response to this:

Fixed in #21359
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Was this page helpful?
0 / 5 - 0 ratings