This issue continues a discussion started during a Core Editor chat about the functionality of a Table of Contents (TOC) block. Currently, several PRs and issues propose possible solutions:
Add Table of Contents block (dynamic rendering + hooks version) PR https://github.com/WordPress/gutenberg/pull/21234
"Table of Contents" Block PR https://github.com/WordPress/gutenberg/issues/11047
Closed PR https://github.com/WordPress/gutenberg/pull/15426
From a technical standpoint, when working with a TOC block, how are items that aren't in blocks, like headings and next page tags, counted, and how is it determined whether those items precede the current block? Counting Heading blocks is relatively easy, but counting all headings in the HTML is more difficult, and counting all headings in the HTML preceding the current block seems impossible in some situations. This challenge is compounded when the headings are in a dynamic block.
Resolving these questions impacts:
Specific challenges that need feedback are:
Possible solutions include:
@ZebulanStanphill @mtias @youknowriad @MichaelArestad contributed to the original conversation. Additional feedback here is welcome.
If we create some kind of document outline API, we should probably include page break (<!--nextpage-->) data in it, so you can easily determine what page a block would appear on. That's one of the challenges I've run into with the Table of Contents block PR.
I'm looking forward to having it live.
Should the TOC block only support Heading Blocks?
I strongly recommend starting with just Heading blocks. This greatly simplifies things for both product and implementation, and removes hurdles to getting started. The question of whether there is an opportunity for supporting other blocks (perhaps via an API at the level of the block _type_ or of the block _proper_) or for supporting HTML-level indexing of heading tags (in my opinion, something to avoid) can then be explored separately and on top of a finished base.
Something like allowing a block to declare one of its attributes as contributing to the outline of a document and abstracting away outline/table of contents in a getOutline rather than just getting core/heading.
There are many parallels with the optional _HTML anchor_ feature in core blocks. Recently, #23197 extended this feature to all static core blocks, and it's notable how everything hinges on block types adhering to the feature with a simple supports declaration. One can imagine something similar with ToC:
    "supports": {
        "tableOfContents": true
    }
Any block type declaring the above would be picked up by a ToC hook. This could then mean that such blocks automatically sport a control to include them in the ToC, or could mean a more subtle experience (e.g. adding an HTML anchor to a block that has tableOfContents: true automatically adds the block to the ToC).
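To make the idea concrete, here is a minimal sketch of how a ToC hook might pick up opted-in blocks. Everything in it is an assumption for illustration: the tableOfContents support flag is only the proposal above and not an existing Gutenberg feature, and the outlineAttribute key, the acme/section-title block, and the getOutline function are hypothetical names. The block objects only mirror the editor's { clientId, name, attributes, innerBlocks } shape.

```javascript
// Hypothetical registry: block type name -> settings. The "tableOfContents"
// support flag and the "outlineAttribute" key are assumptions for
// illustration, not existing Gutenberg APIs.
const blockTypes = {
	'core/heading': {
		supports: { tableOfContents: true },
		outlineAttribute: 'content',
	},
	'core/paragraph': { supports: {} },
	'acme/section-title': {
		supports: { tableOfContents: true },
		outlineAttribute: 'title',
	},
};

// Walk a block tree depth-first, in document order, and collect an entry for
// every block whose type opts in via supports.tableOfContents.
function getOutline( blocks, types = blockTypes ) {
	const outline = [];
	for ( const block of blocks ) {
		const type = types[ block.name ];
		const attributes = block.attributes || {};
		if ( type && type.supports && type.supports.tableOfContents ) {
			outline.push( {
				clientId: block.clientId,
				text: attributes[ type.outlineAttribute ],
				anchor: attributes.anchor || null,
			} );
		}
		// Nested blocks (e.g. inside a Group) are visited too.
		outline.push( ...getOutline( block.innerBlocks || [], types ) );
	}
	return outline;
}
```

With something like this, a ToC block would ask for the outline rather than filter for core/heading directly, so a third-party block only has to declare the flag to be included.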
@mcsf There's a bit of a problem with "just supporting Heading blocks" in the case of the Table of Contents block. That's easy to do in the editor, but on the front-end, it's a lot more difficult because the JS APIs are not available there. There's no awareness of blocks in the PHP file dynamically rendering the front-end output. So the front-end implementation ends up having to parse HTML, which results in inconsistency between it and the editor implementation.
The Table of Contents block also needs to support paginated posts properly, and this also currently has to be done two different ways depending on if you're in the editor or the front-end.
Right now, the Table of Contents block works perfectly on the front-end, but relies entirely on HTML parsing (which definitely isn't a performant way to handle it). I can't even change the PHP implementation to only work with core Heading blocks, because there's no concept of blocks anymore at that point. The only way to get the necessary data would be through something kinda like the block context system, and no such API relating to headings and page breaks currently exists.
So as far as I can tell, it's not possible to provide a shippable Table of Contents block right now. There is no clean, simple solution, because what the block tries to do requires data that is currently only available by creating temporary clones of the post in memory to parse and scan for specific HTML tags and comment strings.
As far as I can tell, the Table of Contents block _needs_ a table of contents API.
Specifically, here's what the Table of Contents block needs to know in both the editor and the front-end:
To make this possible, Heading blocks will likely have to provide this data to the API:
Page Break blocks will likely have to tell the API that they mark the start of a new page, and therefore all blocks following them should be considered to be on page 2 (or 3, and so on).
All of the data requirements I have just listed are absolutely necessary to make the Table of Contents block work. If any one of these is not provided by some sort of API, then the block has to resort to messy HTML parsing.
(Remember, you can't just provide a list of Heading block clientIds to the API, because the blocks no longer exist at the dynamic rendering stage, so you can't just pull their data during PHP rendering.)
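The page-numbering rule described above for Page Break blocks can be sketched roughly as follows. This is only an illustration under stated assumptions: getHeadingsWithPages is a hypothetical name, the input is assumed to be a flat, document-ordered list of editor-style block objects, and the heading level is assumed to default to 2 when the attribute is absent.

```javascript
// Assign a 1-based page number to each heading in a flat, document-ordered
// list of blocks. Every core/nextpage (Page Break) block starts a new page,
// so all blocks after it belong to the next page.
// Assumption: the function name and the returned entry shape are hypothetical.
function getHeadingsWithPages( blocks ) {
	let page = 1;
	const headings = [];
	for ( const block of blocks ) {
		if ( block.name === 'core/nextpage' ) {
			page += 1;
		} else if ( block.name === 'core/heading' ) {
			const attributes = block.attributes || {};
			headings.push( {
				content: attributes.content,
				// Assumed default level when none is set explicitly.
				level: attributes.level ?? 2,
				anchor: attributes.anchor ?? null,
				page,
			} );
		}
	}
	return headings;
}
```

The returned entries are plain data (content, level, anchor, page), so they do not depend on clientIds that no longer exist at the dynamic rendering stage.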
> That's easy to do in the editor, but on the front-end, it's a lot more difficult because the JS APIs are not available there. There's no awareness of blocks in the PHP file dynamically rendering the front-end output. So the front-end implementation ends up having to parse HTML, which results in inconsistency between it and the editor implementation.
I don't follow; why is the ToC back end not consuming the output of the PHP block parser? Even if the server can't parse as fully as the block editor (_stage I_ is block demarcation and explicit attribute parsing; _stage II_ is full attribute sourcing, validation, migration, and is JS-only), there should be enough to get us started, and it will be much faster and safer than _ad-hoc_ parsing of HTML.
Things like pagination support are not necessarily trivial, but would fall into place as soon as we can use the proper parser on the server to clearly identify — always relying on blocks, not HTML — what is a heading, what is a page boundary, and what else is _heading-like_.
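As a rough illustration of relying on parsed blocks rather than raw HTML, the sketch below walks a tree shaped like the server-side block parser's output (blockName, attrs, innerBlocks, innerHTML). It is written in JavaScript only to stay self-contained; on the server the tree would come from the PHP parser. The collectHeadings name and the returned shape are hypothetical. It assumes that only explicit attributes such as level are available in attrs at this stage, so the heading text and id are read from the block's own innerHTML with deliberately naive regular expressions.

```javascript
// Walk a parser-shaped block tree and list headings with their page numbers.
// Assumption: collectHeadings and its return shape are hypothetical; the
// tag stripping below is naive and only meant for illustration.
function collectHeadings( parsedBlocks ) {
	let page = 1;
	const headings = [];

	function visit( blocks ) {
		for ( const block of blocks ) {
			if ( block.blockName === 'core/nextpage' ) {
				// A page boundary: everything after it is on the next page.
				page += 1;
			} else if ( block.blockName === 'core/heading' ) {
				const html = block.innerHTML || '';
				const idMatch = html.match( /<h[1-6][^>]*\sid="([^"]*)"/ );
				headings.push( {
					// Assumed default level when the attribute is not serialized.
					level: ( block.attrs && block.attrs.level ) || 2,
					text: html.replace( /<[^>]+>/g, '' ).trim(),
					anchor: idMatch ? idMatch[ 1 ] : null,
					page,
				} );
			}
			// Headings nested in container blocks are found as well.
			visit( block.innerBlocks || [] );
		}
	}

	visit( parsedBlocks );
	return { headings, pageCount: page };
}
```

Which blocks count as headings or page boundaries is decided purely by block name here; HTML is only touched inside a block already known to be a heading.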
> What page will I be on in the front-end? (Necessary to support only showing headings from the current page.)
This might be something that the (environment-agnostic) block context API nicely solves.
Hmm... I'd forgotten about the PHP block parser. Thanks for reminding me. You're right that I could use that on the PHP implementation. I'm currently not using it because my current implementation is still trying to support 3rd party heading blocks. If I switch to sourcing the data from block attributes, I have to drop support for all headings outside of the core Heading block.
It's also worth noting that even headings in our own Custom HTML block will be ignored by a Table of Contents implementation that only checks Heading block attributes. My thinking was that if we had a table of contents API, we could at least update the Custom HTML block to provide data to the API so they would work as expected.
Would a Table of Contents block that only supports core Heading and Next Page blocks be acceptable? It feels kind of wrong to ship it without 3rd party block support. But if desired, I can update my PR to work that way.
Still, though, it seems less than ideal to parse the whole post for block data every time a Table of Contents block is rendered.
Also, I'm not certain that post pagination info can be provided through the block context API. If a whole post is considered a single source of data, how can it provide different answers to "what page am I on?"... it seems like you'd have to use "Page" blocks to divide up the post, rather than marker points like the current Next Page block. But maybe the block context API is more powerful than I think?