Gutenberg: Data storage mechanisms for blocks

Created on 10 Apr 2017 · 16 comments · Source: WordPress/gutenberg

I've been thinking about the best way(s) to handle parsing and data storage, which I think will end up being pretty closely related. This is important for us to get right so that we build a solid foundation for other plugins.

I'll start with one of the things that annoys me about WordPress: settings for shortcodes like [gallery]. We see the list of IDs in post_content, but what about the other settings? Many of them are missing from post_content and stored with the attachment post instead, and I think there are a lot of cases where settings are stored in post meta as well. For a concrete example, if you want to use the same image in multiple galleries with different captions, AFAIK this is impossible because the caption is stored as a property of the image only.

I don't know all the permutations of how this works currently, but it's a mess and I shouldn't have to care. I should be able to take whatever appears in post_content for a gallery block, paste it to another post or another location in the same post, and everything should "just work".

I'd really like to avoid repeating this set of mistakes from "classic" WordPress. If we're storing data in post_content, let's store all the data in post_content. If we can also ensure that we serialize block information to semantically-meaningful markup, this will have a lot of other benefits:

  • We get most of the functionality needed for server-side rendering without any extra effort
  • Importing and exporting content between posts/sites is much easier
  • We don't have to worry about the link between blocks and their settings; this would be difficult to manage in cases like duplicating blocks or rearranging their order
  • We avoid changing "global" state outside of a post's content, which can have unintended effects
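
To make the "store all the data in post_content" idea concrete, here is a minimal sketch in Python (chosen only for brevity). The `wp:core/gallery` delimiters follow the comment syntax used elsewhere in this thread; the `data-id` attribute, the per-figure markup, and both function names are assumptions for illustration, not Gutenberg's actual format.

```python
import re
from html import escape, unescape


def serialize_gallery(images):
    """Render a gallery block whose markup carries every setting it needs."""
    figures = "\n".join(
        '<figure data-id="{id}"><img src="{src}" />'
        "<figcaption>{caption}</figcaption></figure>".format(
            id=int(img["id"]),
            src=escape(img["src"], quote=True),
            caption=escape(img["caption"]),
        )
        for img in images
    )
    return "<!-- wp:core/gallery -->\n" + figures + "\n<!-- /wp:core/gallery -->"


# Matches exactly the figure markup produced by serialize_gallery (hypothetical format).
FIGURE_RE = re.compile(
    r'<figure data-id="(\d+)"><img src="([^"]*)" />'
    r"<figcaption>(.*?)</figcaption></figure>",
    re.DOTALL,
)


def parse_gallery(markup):
    """Recover the same settings from the markup alone, with no external lookups."""
    return [
        {"id": int(image_id), "src": unescape(src), "caption": unescape(caption)}
        for image_id, src, caption in FIGURE_RE.findall(markup)
    ]


images = [
    {"id": 7, "src": "https://example.com/a.jpg", "caption": "Sunrise & fog"},
    {"id": 9, "src": "https://example.com/b.jpg", "caption": "Harbour"},
]
block = serialize_gallery(images)

# The block can be pasted anywhere and still yields the same data.
assert parse_gallery(block) == images
```

Because the caption lives in the block's own markup, the same image could carry a different caption in each gallery, which is the case described above as impossible today.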

All 16 comments

I'll start with one of the things that annoys me about WordPress: settings for shortcodes like [gallery].
I'd really like to avoid repeating this set of mistakes from "classic" WordPress. If we're storing data in post_content, let's store all the data in post_content.

You seem to be mixing apples and oranges here. First, the gallery shortcode is not a block. It's a shortcode with a specific purpose. If the gallery block that is being created is meant to replace the shortcode, it would need a different implementation. From what I can see, the gallery block is _not_ a replacement of the shortcode, but a block that contains all the images and captions within it. It is self-contained and does not get information from outside to render it. That sounds to me like a static block, not a dynamic one.

@nylen I understand where you're coming from, but there are also some big advantages to being able to reference data stored externally to the post content. For one, it keeps it DRY. Imagine uploading an image to the gallery and giving it a caption that contains a typo. If you create multiple galleries with that image, it would be sad to have to update each gallery with the fixed caption. Ideally you would be able to update it once, and that is what we have today.

If the gallery block that is being created is meant to replace the shortcode

Yes, this is the (unstated) assumption I was making in the original issue - that in general, the Gutenberg editor will be able to transparently "upgrade" post content from shortcodes to <!-- wp:core/blockname -->.
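
As a rough illustration of what such an "upgrade" might look like, here is a sketch in Python. It is not Gutenberg code: the regular expression only handles the simple `[gallery ids="..."]` form, and the `data-ids` attribute on the placeholder element is a made-up carrier for the IDs.

```python
import re

# Only the plain ids="..." form is recognised; other shortcode attributes are ignored here.
GALLERY_SHORTCODE = re.compile(r'\[gallery\s+ids="([\d,\s]+)"\s*\]')


def upgrade_gallery_shortcodes(post_content):
    """Replace simple [gallery] shortcodes with comment-delimited gallery blocks."""

    def to_block(match):
        ids = [part.strip() for part in match.group(1).split(",") if part.strip()]
        return (
            "<!-- wp:core/gallery -->\n"
            '<div data-ids="{}"></div>\n'  # hypothetical placeholder markup
            "<!-- /wp:core/gallery -->"
        ).format(",".join(ids))

    return GALLERY_SHORTCODE.sub(to_block, post_content)


upgraded = upgrade_gallery_shortcodes('Intro [gallery ids="1, 2,3"] outro')
assert "<!-- wp:core/gallery -->" in upgraded
assert "[gallery" not in upgraded
```

Content without a matching shortcode passes through unchanged, so running the upgrade twice is harmless.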

Imagine uploading an image to the gallery and you give it a caption but there is a typo. If you create multiple galleries with that image, it would be sad to have to update each gallery with the fixed caption.

Maybe a gallery isn't the best example to use here, and we'll end up sticking more closely to the way WP currently works for this particular case. In particular, I'd like to avoid storing information in post meta and instead provide a standard way for blocks to serialize the data they need to markup inside post_content and to deserialize it again.

@westonruter on the other hand, that prevents you from having different captions for the same image in different galleries, which is different from how single images work.

@mtias that's right. So it should be a both/and instead of an either/or, it seems.
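
One way to read "both/and" is a per-gallery caption that overrides a shared default. A small sketch in Python, where `ATTACHMENT_CAPTIONS` stands in for captions stored with the attachment and the optional `caption` key on a gallery item is a hypothetical per-block override:

```python
# Stand-in for captions stored once, alongside the attachment (the DRY source).
ATTACHMENT_CAPTIONS = {42: "Beautiful landscape"}


def resolve_caption(item, shared=ATTACHMENT_CAPTIONS):
    """Prefer the gallery's own caption; fall back to the shared attachment caption."""
    override = item.get("caption")
    if override is not None:
        return override
    return shared.get(item["id"], "")


gallery_a = [{"id": 42}]  # follows the shared caption
gallery_b = [{"id": 42, "caption": "The view from our hotel"}]  # local override

assert resolve_caption(gallery_a[0]) == "Beautiful landscape"
assert resolve_caption(gallery_b[0]) == "The view from our hotel"
```

Fixing a typo in the shared caption would then reach every gallery that has not set its own.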

Yes, I'm thinking reusable dynamic blocks need to be handled differently and probably with an explicit flow.

Yes, I'm thinking reusable dynamic blocks need to be handled differently and probably with an explicit flow.

Sounds like widgets.

Widgets are not really reusable, they are instances.

Widgets are not really reusable, they are instances.

How do you mean?

If I have, say, a text widget in a sidebar I can't reuse it in another place without creating a new instance by hand.

Given the current widget _sidebars_ UI, that's true. But widgets can be used independently of sidebars. For example, there is also the_widget() which allows you to use and re-use a widget instance outside of widget areas. I believe the widget (as opposed to the sidebar) is the closest thing (“prior art”) that core has to dynamic blocks that we'll need (display, form UI, state manipulation), where shortcodes fall _short_ in that they are only concerned with one aspect: the display. Sidebars and widgets certainly need a lot of work, including in the UI, data model, and code architecture, but I think they serve as a starting point for iteration and a migration path for where we could go with dynamic blocks as the next step in the widget's evolutionary journey.

Besides which, the original example of using different captions on the same image is exactly what the widget does, with each instance storing its own version of how to display its data. Again, if you want reusable "same data", you would use a shortcode with the same parameters. If you want reusable "customized", you would use a widget. And if you just want the data in the post with nothing dynamic, then it's just plain old content inserted like usual.

While I've decided to shelve this idea for now, I'll recap an exploration I'd considered yesterday involving using linked data (specifically JSON-LD) as the source of truth data storage mechanism of a block. Instead of parsing the HTML of a block, we'd understand it by its JSON data:

<!-- wp:core/image -->
<script type="application/ld+json">
{
    "@context": "http://schema.org",
    "@type": "ImageObject",
    "caption": "Beautiful landscape",
    "contentUrl": "https://cldup.com/vuGcj2VB8M.jpg"
}
</script>
<figure>
    <img src="https://cldup.com/vuGcj2VB8M.jpg" />
    <figcaption>Beautiful landscape</figcaption>
</figure>
<!-- /wp:core/image -->

(See ImageObject schema)

And, as we have already with the block's save method, we'd be able to generate the editor and saved markup from this data alone, shown here with the figure element generated after the JSON.
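
A minimal sketch of that generation step, written in Python rather than as an actual block save method. The function name `save_image_block` is hypothetical, and the output layout simply mirrors the sample above.

```python
import json
from html import escape


def save_image_block(data):
    """Build the saved markup for an image block from its JSON-LD data alone."""
    # Escape "</" so a value containing "</script>" cannot end the script element early;
    # "\/" is a valid JSON escape for "/", so the data still parses to the same values.
    json_ld = json.dumps(data, indent=4).replace("</", "<\\/")
    figure = (
        "<figure>\n"
        '    <img src="{src}" />\n'
        "    <figcaption>{caption}</figcaption>\n"
        "</figure>"
    ).format(
        src=escape(data["contentUrl"], quote=True),
        caption=escape(data["caption"]),
    )
    return (
        "<!-- wp:core/image -->\n"
        '<script type="application/ld+json">\n' + json_ld + "\n</script>\n"
        + figure
        + "\n<!-- /wp:core/image -->"
    )


markup = save_image_block(
    {
        "@context": "http://schema.org",
        "@type": "ImageObject",
        "caption": "Beautiful landscape",
        "contentUrl": "https://cldup.com/vuGcj2VB8M.jpg",
    }
)
assert "<figcaption>Beautiful landscape</figcaption>" in markup
```

Note that the caption is written twice, once into the JSON and once into the figure, which is the duplication raised later in this thread.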

There are a few benefits of this approach:

  • Data is immediately available in JSON form, no need for markup parsing
  • Search engines and social media not only understand it, they _love_ it, as it's a solution to their need to better understand the content of a page (Google's introduction to the topic)

    • It could even be framed as a "betterment of the web" karma treasure-trove

    • Benefits site owners by enabling Google et al. to display rich snippets for their search results

  • We could register an API endpoint for a block as its context definition. Larger plugin authors could even host these endpoints on their own domain, so a block could be made sense of even if the plugin had since been deactivated on the original site.
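
To illustrate the first benefit, here is a sketch in Python of reading a block's data straight from its JSON-LD. The helper name `read_block_data` is hypothetical, and a regular expression is used only to locate the script element; the figure markup is never inspected.

```python
import json
import re

# Locates the JSON-LD script element; everything else in the block is ignored.
JSON_LD_RE = re.compile(
    r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL
)


def read_block_data(block_markup):
    """Return the block's JSON-LD data as a dict, or None if the block has none."""
    match = JSON_LD_RE.search(block_markup)
    return json.loads(match.group(1)) if match else None


block = """<!-- wp:core/image -->
<script type="application/ld+json">
{
    "@context": "http://schema.org",
    "@type": "ImageObject",
    "caption": "Beautiful landscape",
    "contentUrl": "https://cldup.com/vuGcj2VB8M.jpg"
}
</script>
<figure>
    <img src="https://cldup.com/vuGcj2VB8M.jpg" />
    <figcaption>Beautiful landscape</figcaption>
</figure>
<!-- /wp:core/image -->"""

data = read_block_data(block)
assert data["@type"] == "ImageObject"
assert data["caption"] == "Beautiful landscape"
```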

And of course some downsides:

  • Standard vocabularies like schema.org can satisfy some but not all of the needs of describing object types. This is not necessarily a problem in and of itself since it's fine enough to extend them with custom contexts, but...
  • Sifting through schema type references is not the most pleasant experience from a developer's perspective
  • If JSON-LD is the source of truth, how do we represent embedded elements of a text type? In the image example above, how would I apply a bold or italic effect to the caption, or include a link?
  • Most importantly, some block markup just doesn't make sense as JSON-LD. The text block is most obvious. It'd be silly to duplicate this data. Inevitably this means we'd have to parse the block's markup regardless, which puts us right back where we're at currently.

A few more interesting references I found along the way:

  • Avoiding duplication by integrating JSON-LD with Web Components
  • Microdata could solve some of these problems by embedding the data definitions in the markup (but JSON-LD has gained significant traction and is most recommended by large interests)
  • Cartoon video introductions to Linked Data and JSON-LD are great to clarify some of the parallels I drew to content blocks

To me the major downside of using something like JSON-LD is that we would be duplicating information between the markup and the annotations. This would unavoidably lead to unexpected results and the two sources of information getting out of sync.
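
The drift is easy to demonstrate. The following Python sketch compares the caption in the JSON-LD with the one in the figure markup; `caption_in_sync` is a hypothetical helper, and the regular expressions assume the exact block layout from the example earlier in this thread.

```python
import json
import re
from html import unescape


def caption_in_sync(block_markup):
    """Check whether the JSON-LD caption matches the caption in the figure markup."""
    script = re.search(
        r'<script type="application/ld\+json">(.*?)</script>', block_markup, re.DOTALL
    )
    figcaption = re.search(r"<figcaption>(.*?)</figcaption>", block_markup, re.DOTALL)
    if not script or not figcaption:
        return False
    data = json.loads(script.group(1))
    return data.get("caption") == unescape(figcaption.group(1)).strip()


block = (
    "<!-- wp:core/image -->\n"
    '<script type="application/ld+json">\n'
    '{"@type": "ImageObject", "caption": "Beautiful landscape"}\n'
    "</script>\n"
    "<figure><figcaption>Beautiful landscape</figcaption></figure>\n"
    "<!-- /wp:core/image -->"
)

# Someone edits only the visible markup, for example in a text editor.
edited = block.replace(
    "<figcaption>Beautiful landscape", "<figcaption>Beautiful seascape"
)

assert caption_in_sync(block)
assert not caption_in_sync(edited)
```

Nothing in the format itself stops the second state from being saved, so every consumer would have to decide which copy wins.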

See also #886 and its discussion of defining a schema with the server-side block registration.

Closing; this discussion has served its purpose and further tasks should be split out into separate, smaller issues.
