Web-stories-wp: Storing Story content in the DB (JSON)

Created on 19 Nov 2019 · 31Comments · Source: google/web-stories-wp

Description

Currently, the idea based on the new PoC is to save the data for the editor in JSON format and separately the content for rendering in the FE, in AMP HTML.

We should look into the different ideas and options to determine which is the preferred and viable way for saving the content and if there are better alternatives for the currently suggested idea.

Note that Gutenberg stores data in the post_content field, including both the blocks' configuration and the HTML to render.

Also consider CPT suggestion from ampproject/amp-wp#3743

Acceptance Criteria

AC: There is a document comparing at least 2 different options for storing content in the DB for editor + the FE.
AC: A suggestion is made based on the findings.

Implementation brief

Modify the REST API controller, so it accepts story_data and saves this to the post_content_filtered field on the WP_Post object.
content_filtered will be registered as an array, so care must be taken to use wp_json_encode and json_decode where possible.
Implementation will tracking the following fields in WP
- Title
- slug ( post name ).
- Status ( Post status ) i.e Published, auto-draft or draft.
- Author
useLoadStory will be changed to load these fields into state.
Add a useSavePost, to change the call to the API hook useAPI and setting the isSaving global state.
An isSaving state will be tracked, so a spinner can be displayed to user and save button disabled while saving.

PoC can be found here.

QA testing instructions

Demo

Changelog entry

Source

miina

Most helpful comment

I just recalled that Custom CSS in the Customizer uses post_content_filtered in core. See wp_update_custom_css_post().

It is using post_content_filtered in a way very analagous to what we are looking to do with Stories here. Namely, it stores any pre-processed CSS (e.g. SASS, LESS) in the post_content_filtered and then the transpiled CSS in the post_content.

Only the post_content is used when obtaining CSS to render on the page, per wp_get_custom_css(). The post_content_filtered field here is only used for _editing_, such as when a plugin like Jetpack adds support for CSS preprocessors. The post_content_filtered value is then displayed in the code editor.

In the same way to this, Stories could be manipulating the underlying JSON data structure stored in post_content_filtered which is then rendered out to AMP HTML in post_content. The post_content would be exclusively used for rendering on the frontend. Edits would be performed on post_content_filtered and then rendered onto post_content; when updating a story, both the post content_filtered and post_content fields would be provided in sync. If the data format changes, then it would be responsible for migrating the data structure in post_content_filtered to be in a format that the editor can use to make future updates to post_content.

westonruter on 11 Dec 2019

👍2

All 31 comments

Some things to consider:

If the HTML is saved into post_content then where would the JSON be saved?
Could there be potential scripts etc. which modify the post_content directly, e.g. perhaps some migration plugins?
How could this influence deprecation logic (the custom one we don't have yet)?

miina on 19 Nov 2019

Are we considering storing JSON in a separate DB field? Or as a big comment within the AMP document in the post_content itself?

dvoytenko on 25 Nov 2019

The original idea in the architecture doc was to have a separate field, however, considering how WP works then ideally there would be a single source of truth -- the post_content.

That's something that needs to be analyzed within this ticket (started by the person who picks it up) to come to a conclusion.

miina on 25 Nov 2019

👍1

There are some hoops to go through to store JSON in post_content without WordPress (or other plugins) corrupting it. This was tackled for implementing the customize_changeset post type for the Customizer. It's also something we've wrestled through for storing JSON data in taxonomy description. All that to say, it can be done successfully, but it doesn't look pretty. For improving how core can facilitate JSON storage, see https://core.trac.wordpress.org/ticket/38715

westonruter on 25 Nov 2019

After some researching, I believe the best course of action is to store the json data in post meta. In WordPress 5.3, see the original ticket for details.

To save meta data to a post object using called register_meta. An example of which looks like this.

    $meta_args = array(
        'auth_callback' => '__return_true',
        'type'          => 'object',
        'show_in_rest'  => array(
            'schema' => array(
                'type'       => 'object',
                'properties' => array(
                    'test_field'  => array(
                        'type' => 'string',
                    ),
                    'pages' => array(
                        'type' => 'array',
                    ),
                ),
            ),
        ),
        'description'   => 'Test meta key',
        'single'        => true,
    );
    register_meta( 'amp_stories', 'story_data', $meta_args );

Using post meta adds some key benefits

Build in security, using the auth_callback, meaning a custom permissions model.
Formatting and sanitization.
No issues with storing json objects in post_content ( see above comment from Weston )
Not public by default.
No new REST API endpoints required.

spacedmonkey on 27 Nov 2019

A downside of post_meta is that the content won't be searchable by default in WordPress. To achieve that, you could always extract the text content to store in post_content. But then the post_content would be purely for search and not used for anything else.

westonruter on 28 Nov 2019

👍1

@westonruter The current idea is to keep JSON in post_meta and the FE output (AMP HTML) based on the JSON in post_content.

miina on 28 Nov 2019

👍2

(Added a new issue for the post_content part of the task since the decision was to keep JSON & AMP HTML separately: ampproject/web-stories-wp#57)

miina on 28 Nov 2019

Was the post_content_filtered column considered to store the JSON blob? IIRC some Markdown plugins do that too.

swissspidy on 28 Nov 2019

👍1

I just checked and it doesn't seem like the REST API supports writing to the post_content_filtered.
Also we would have to do some custom handling and validation of JSON if writing to this field.

spacedmonkey on 28 Nov 2019

Was the post_content_filtered column considered to store the JSON blob? IIRC some Markdown plugins do that too.

A key benefit to using post_content_filtered here is that we would be assured that its value would be guaranteed to match post_content. In other words, updating post_content and post_content_filtered would be an atomic operation since they are fields of the same table and are updated with one SQL UPDATE. In contrast, updates to happen in a subsequent SQL operation. So there is a chance that concurrent changes could result in an inconsistent state.

Also, if using post_content_filtered then it would be trivial to add support for revisions just via:

add_filter( '_wp_post_revision_fields', static function ( $fields, $post ) {
    if ( get_post_type( $post ) === 'amp_story' ) {
        $fields['post_content_filtered'] = __( 'Story data', 'amp' );
    }
    return $fields;
}, 10, 2 );

I just checked and it doesn't seem like the REST API supports writing to the post_content_filtered.

This could be achieved by overriding WP_REST_Posts_Controller::update_item() for that post type. The rest_pre_insert_{$post_type} filter may also used.

Aside: In either case, we'll need to make sure the data is persisted when doing import/export WordPress importer.

Also we would have to do some custom handling and validation of JSON if writing to this field.

I think this will be needed regardless for the post_content field. We don't want plugins corrupting the valid AMP HTML in the post_content, so the same logic that is used to bypass any plugin mutations for that field could also be done to preserve pristine JSON in post_content_filtered.

All this to say, though it will require some more code to support in the REST API, I think storing the story JSON in post_content_filtered will be a better choice than storing it in post meta.

westonruter on 3 Dec 2019

Why again was it that the JSON and the HTML representations need to _both_ be stored? If the post_content is AMP HTML, why not just store JSON in the first place and render to HTML at runtime? Storing a separate representation in post_content could make sense for searchability, but all of the keywords would be in the JSON as well so I don't know if there is a difference in that regard?

westonruter on 3 Dec 2019

I think searchability is an important need. But also to avoid dealing with out-dated models in the runtime when plugin is updated. E.g. the content is not regenerated until the editor is saved again. The rendering update without editor saving could be a good feature, but that was the initial tradeoff.

dvoytenko on 3 Dec 2019

So the benefit here is that the data migration logic for new versions only needs to be written in JS and not PHP? Outdated models would still need migration logic (deprecation logic in Gutenberg terminology) in at least JS in order to ensure old stories can still be edited. Right?

westonruter on 3 Dec 2019

Outdated models would still need migration logic (deprecation logic in Gutenberg terminology) in at least JS in order to ensure old stories can still be edited.

Additionally, it ensures that the previously existing stories aren't changing unexpectedly in the FE even if something has changed in the JSON.
Having the static output enables also easier porting to other environments without being dependent on the dynamic rendering.

It's true that we should make sure that the two fields are in sync (as in Gutenberg the block validation logic is checking constantly that the output of saving logic and what's already saved into post_content are in sync).

miina on 3 Dec 2019

We can generate revisions on post meta update as well, I referred this article in https://github.com/ampproject/amp-wp/issues/3786 . It is simple enough to filter to add revisions support to post meta.

I think that the worries about concurrent are a little unfounded. I have used post meta for many things over the years and never had an issue. With in the REST API call, the updates to the two different rows happen in the same request. The likelihood of a breakage is small, as the request would have to fail mid way through. I find it hard to believe that one site would have 100s of stories being created a second to make this issue noticeable.

There is a much higher likely hood that post_content_filtered is be used by a plugin than a meta key we define.

spacedmonkey on 3 Dec 2019

Still, it seems more logical to store this _content_ in wp_posts rather than postmeta.

There is a much higher likely hood that post_content_filtered is be used by a plugin than a meta key we define.

Per ampproject/amp-wp#3788 we should remove or bypass all filters in any case, whether content is being stored in wp_posts or wp_postmeta.

westonruter on 5 Dec 2019

Still, it seems more logical to store this content in wp_posts rather than postmeta.

There are a number of serious downsides to using post_content_filtered. Lack api support being one of the biggest ones for me. It is also a miss use of post_content_filtered, as it is not post content, it is structured data used to generate a AMP markup.

Are there other benefits that 'it keeps everything with in one database row'?

Per ampproject/amp-wp#3788 we should remove or bypass all filters in any case, whether content is being stored in wp_posts or wp_postmeta.

I am not sure about removing filters on the_content. srcset and the upcoming lazy loading elements are added with filters. Features like this are a benefit to what we are doing here.

spacedmonkey on 5 Dec 2019

I am not sure about removing filters on the_content. srcset and the upcoming lazy loading elements are added with filters. Features like this are a benefit to what we are doing here.

Good point. I misspoke. The actual suggestion is to remove all the_content filters when rendering the story post type _other than_ the ones we know we are using.

There are a number of serious downsides to using post_content_filtered. Lack api support being one of the biggest ones for me.

It's not hard to add support for it though, is it?

It is also a miss use of post_content_filtered, as it is not post content, it is structured data used to generate a AMP markup.

I don't think there is a clear definition of what post_content_filtered is or is not supposed to be used for. @swissspidy recalled that it is being used to store the underlying Markdown representation by some Markdown plugins. This seemed like a good precedent for storing an underlying JSON representation for AMP Story content.

westonruter on 6 Dec 2019

So the benefit here is that the data migration logic for new versions only needs to be written in JS and not PHP? Outdated models would still need migration logic (deprecation logic in Gutenberg terminology) in at least JS in order to ensure old stories can still be edited. Right?

@westonruter to note, the migration/deprecation logic is being discussed in ampproject/web-stories-wp#76.

dvoytenko on 6 Dec 2019

I just recalled that Custom CSS in the Customizer uses post_content_filtered in core. See wp_update_custom_css_post().

westonruter on 11 Dec 2019

👍2

Okay, I still do believe this is the best place for this data, but I will update my PoC with REST API changes.

spacedmonkey on 11 Dec 2019

I have updated the IB and PoC.
This PR now has to extend the REST API to enable saving of the post_content_filtered. So will PHP and JS devs to review the PR.

spacedmonkey on 13 Dec 2019

@barklund Worth noting about the PoC here does add some new states. This was required to test the PoC. I in one way thought that this would be final used statement management in the application. I would say we merge my PR here, just so we can have the stories saving and we can change global statement in the ampproject/web-stories-wp#37 where we can discuss and design a data architecture before writing any code.

So will PHP and JS devs to review the PR.

@westonruter This is a little unclear, but that, I mean look at, not give a code review. PoC is just that, hacking some code together to see if the idea works. This process is designed to keep on discussions about how things are built within tickets. I have already made lots of changes in the PoC that make the IB here out of date and unless. To keep this manageable for all of us, let's keep big discussions code is written on the IBR and PR for code formatting and styling.

Should we rename all the fields in the REST API to map to fields in the javascript. What is the benefit of this? Doesn't it just hide the WP implementation. Why is this needed?

spacedmonkey on 16 Dec 2019

@spacedmonkey On naming: note that this implementation will never be part of core where the field might be used for different purposes and perhaps require a more general name. This is specifically for using the post_content_filtered field for storing our story data in the Plugin. For making the usage of this field more obvious then let's use a more explanatory (semantic) name as well (such as story_data).

The FE app doesn't need to be aware of the DB field name, the PHP API implementation can do the mapping, the same way as it's already done in REST API for slug -> post_name where also the field name and then DB field do not match for semantic reasons.

Also, could we do the following and add a note to IB:
post_content and post_content_filtered should never be updated separately, meaning that if there is data only for post_content_filtered, we shouldn't update. Otherwise, the two fields would get out of sync and what's in the editor wouldn't match with what's displayed in the FE.
Alternatively, since currently, post_content implementation is not included yet then I could also add this within ampproject/web-stories-wp#57 instead if you prefer?

miina on 16 Dec 2019

Okay, seems like I am alone in field naming thing. I have renamed the field and updated on the PoC.

spacedmonkey on 16 Dec 2019

Relevant to the discussion here: I think we've discussed this before, but we should try to come up with a data structure for the story itself that is largely independent from WP and editor idiosyncrasies. This is important so that we can re-use templates between story creation tools, and maybe even provide interop between tools. Should I open a separate issue for that?

pbakaus on 19 Dec 2019

When we are building this, we are keeping in mind that this editor will be used with other platforms. But in these early stages we will maybe cut some corners just to get things working. A think it is another piece of work to make it completely portable. I do recommend creating an issue for this, so we can keep it mind. Maybe when we do cut these corner, we can note it on the ticket and we can loop around it on it later.

spacedmonkey on 19 Dec 2019

@spacedmonkey makes sense! I've created another issue here: https://github.com/ampproject/amp-wp/issues/3973

pbakaus on 19 Dec 2019

can this issue be closed?

pbakaus on 13 Jan 2020

Yes, the tasks for this specific issue have been merged.

miina on 14 Jan 2020

Was this page helpful?

0 / 5 - 0 ratings