Ckeditor5: Introduce a "safer" marker conversion

Created on 6 Jul 2020 · 9Comments · Source: ckeditor/ckeditor5

📝 Provide a description of the improvement

🛠 For migration steps see this comment. 🛠

Currently, marker conversion works in a way that a view element (HTML tag) is created at positions where a marker starts and where a marker ends. This solution creates an incorrect HTML (for example, a custom element is inserted in <tr> which accepts only <td>s and <th>s). This leads to two problems:

Data sanitization.
Data parsing from HTML.

Data sanitization becomes problematic because various libraries that perform it may remove the custom tags that are in incorrect places. Similarly, data parsing may lead to errors.

Thus, I propose changing how we would convert markers. If a marker boundary is before or after a view container element, instead of creating "marker tag", we should add an attribute to that element. The attribute will mark that a marker starts/ends after/before that element.

Some other remarks:

Proposed attribute names for view container elements: data-marker-start, data-marker-end.
Those attributes will simply contain a marker name. If there will be multiple, we will separate them with a comma.
I think that we should introduce those new converters as default ones. So that when you are creating a new feature that uses a marker you will not need to provide converters. This means that we need also a default converter for "marker tags". At the moment, I don't really understand why we decided to go with <comment id="..."> instead of unifying it, for example: <ck-marker data-name="...">. I propose that when used with default marker conversion, markers will simply use their names and will be converted to <ck-marker data-name="markerName">.
I wonder whether we should mark already existing marker conversion helpers as deprecated. I think that there should be no need to use anything else other than a marker name to save the marker with the data.
Of course, we will be backward compatible. Already existing features should remain as they are.

BTW. we also discussed using HTML comments instead of regular tags to show marker boundaries but we decided that data sanitizers might remove them as well.

engine improvement

Source

scofalik

Most helpful comment

Instruction for anyone who is processing the editor data.

First, please get familiar with the description of marker-to-data downcast, where we described what kind of HTML may be generated.

If you processed suggestions or comments you have two possibilities:

change your current solution,
apply another layer of processing, which will change the new HTML into the old HTML, and use your current solution.

Below are notes for comments. Suggestions are similar.

Earlier, you have been looking only for <comment> tags, in which id attribute kept the comment id (and sometimes "count" part, which was recently added with multi-cell comments). Also, there was type attribute to know whether the tag points to the beginning or to the end of the marker (however, start tag was always before end tag, so there might be no need to check type attribute).

Also, collapsed markers did not have closing tag, however, in case of comments or suggestions, collapsed markers did not exist (collapsed comment or suggestion was removed).

Now, type attribute got "included" in the tag name, so we have <comment-start> tag and <comment-end> tag. Also, collapsed markers have both tags now. Other than that, id attribute is no longer available, instead, whole marker name (except of comment: prefix) is set in name attribute. For comments, id and marker name were similar, for suggestions the suggestion marker name is more complex.

Earlier: <comment id="commentId:count" type="start">
Now: <comment-start name="commentId:count">

For suggestions:

Earlier: <suggestion id="e8ghd7:e390dk" suggestion-type="formatInline:i4u3xm" type="end" count="3erh">
Now: <suggestion-end name="formatInline:i4u3xm:e8ghd7:e390dk:3erh">

Since marker name for suggestions is complex, below is a function that breaks down marker name for suggestions into meaningful parts:

// `name` is the value from the name attribute.
parseSuggestionName( name ) {
    // Name template is `<type>[:<subType>]:<id>:<authorId>[:<count>]`.
    const parts = name.split( ':' );

    return {
        type: parts[ 0 ], // "insertion", "formatInline", etc.
        subType: parts.length >= 4 ? parts[ 1 ] : null, // hash or null
        id: parts.length < 4 ? parts[ 1 ] : parts[ 2 ], // unique id
        authorId: parts.length < 4 ? parts[ 2 ] : parts[ 3 ], // user id
        count: parts.length == 5 ? parts[ 4 ] : null // additional unique id
    };
}

Again, marker count is an additional unique id for multi-range suggestions. If count is in the suggestion name, a subType is present as well. The opposite is not true.

Keep in mind that the marker name in the editor also contains "group prefix" (for example, marker name for comment would be comment:ek45nd:ep0v not just ek45nd:ep0v.

If this feels too complex, you can always get ids of existing suggestions from the editor and match them with parsed marker names. See track changes API and comments API. This might be even a more future-proof solution.

Other than new tag names, we also introduced marker attributes. These are used if a marker boundary was in a place where text nodes are disallowed. Quoting from the previously linked API docs, we have:

data-[group]-start-before="[name]", for example: data-comment-start-before="commentId",
data-[group]-start-after="[name]", for example: data-comment-start-after="commentId",
data-[group]-end-before="[name]", for example: data-comment-end-before="commentId",
data-[group]-end-after="[name]", for example: data-comment-end-after="commentd".

For example, the following HTML in the new format:

<td data-suggestion-end-after="formatInline:i4u3xm:e8ghd7:e390dk:3erh" data-suggestion-start-before="formatInline:i4u3xm:e8ghd7:e390dk:3erh">Foo</td>

Is equivalent to the following HTML in the old format:

<suggestion id="e8ghd7:e390dk" suggestion-type="formatInline:i4u3xm" type="start" count="3erh"></suggestion>
<td>Foo</td>
<suggestion id="e8ghd7:e390dk" suggestion-type="formatInline:i4u3xm" type="end" count="3erh"></suggestion>

start-before and end-after attributes are preferred, however, start-after and end-before are used when the former cannot be used (for example, a collapsed marker at the end of an element).

If there are multiple markers start/ending before/after given element, their names put one after another in marker attribute: data-[group]-start-before="[name],[name2],[name3]", for example: data-comment-start-before="commentId,commentId2:countId,commentId3" (in this example, the second comment is a multi-range comment).

scofalik on 3 Aug 2020

👀1 ❤1 👍1

All 9 comments

Additionally, we could also get rid of using <$marker> tag, which is now used as an extra step in marker conversion. As a reminder, what we do right now is more-or-less that:

<p>Foo<comment></comment>Bar<comment></comment></p>
<paragraph>Foo<$marker></$marker>Bar<$marker></$marker></paragraph>
<paragraph>Foo[Bar]</paragraph>

We could directly use ck-marker element.

For backward compatibility, <comment> will still be converted to <ck-marker>.

scofalik on 6 Jul 2020

Those attributes will simply contain a marker name. If there will be multiple, we will separate them with a comma.

Can marker have more information than just the name? I'd expect you can preserve more information there, so I'm not sure this is a good format.

I wonder whether we should mark already existing marker conversion helpers as deprecated. I think that there should be no need to use anything else other than a marker name to save the marker with the data.

:+1:

Proposed attribute names for view container elements: data-marker-start, data-marker-end.

I was a bit hesitant at fist whether this isn't leaking implementation specific aspects to the data, in which case that should be avoided. Markers are a mechanism in CKE5 and in that sense this should not be reflected in the data. However, "marker" is also a common word and a legit word in this case, so perhaps it's not a big deal.

However, if all features that use markers in the model will use the same attributes in the data, that will be actually leaking the implementation. If we'll decide to rewrite a feature X to not use markers, will we change the data format for it? If not, then how will a non-marker feature write to the same attribute that is used by marker-related converters?

It will also be quite impractical if all features will use the same marker. Simple data traversal will not let you find out where comments or suggestions are. You will need to parse attribute values for that.

Therefore, IMO, we should use:

data-comment-start, data-suggestion-start, etc.

Reinmar on 6 Jul 2020

Can marker have more information than just the name? I'd expect you can preserve more information there, so I'm not sure this is a good format.

No. Marker keeps all the data. We do keep additional data for comments and suggestions but this data is not sent together with a marker.

Therefore, IMO, we should use:

data-comment-start, data-suggestion-start, etc.

I think that my biggest point against is that if we use different attribute names we cannot make it a default converter. My idea was that we could deprecate the current helper and then not to introduce any and everything would just work.

You will need to parse attribute values for that.

I agree. OTOH, this will be an easy parse. And there can be multiple suggestions on one element, so you might still need some processing of the attribute.

scofalik on 6 Jul 2020

Okay, so the final agreement is that we go with data-comment-start etc., not default converter. One of the additional arguments that popped up is that you might have some markers that you don't want to convert.

scofalik on 7 Jul 2020

We also decided not to go with <ck-marker> instead, we will go with <markerType-start name="..."> and <markerType-end name="..."> to match data-markerType-start and data-markerType-end attributes. This means that there will be no default converter after all.

For example, for comments it will be:

<figure data-comment-start="commentId:uid" ...>...</td>
<comment-start name="commentId:uid"></comment-start>

Multiple comments:

<figure data-comment-start="commentId:uid,commentId2:uid" ...>...</td>
<comment-start name="commentId:uid"></comment-start><comment-start name="commentId2:uid"></comment-start>

While the marker name will still remain comment:commentId:uid

scofalik on 7 Jul 2020

Another change: collapsed markers will have both tags: <markerType-start> and <markerType-end> in the data.

scofalik on 7 Jul 2020

Some final changes:

We decided to go with four attributes for marker boundaries:

data-group-start-before
data-group-start-after
data-group-end-before
data-group-end-after

This was due to some scenarios that we forgot about earlier. For example:

<tr><td><p></p>[</td><td>]<p></p></td></tr>

In such case, only having data-group-start and data-group-end we weren't able to store such marker in the data.

Also, in the final solution, we don't check whether the marker boundary is after/before the view container element. Instead, we check in the model schema whether a text is allowed at the marker boundary position. If so, a tag is added there. If not, an attribute is added to the closest view element.

This prompted adding schema instance to conversionApi.

scofalik on 10 Jul 2020

In this iteration, we introduced new, safer way to convert markers to data. DowncastHelpers#markerToData() and UpcastHelpers#dataToMarker() were introduced. DowncastHelpers#markerToElement() should not be used in the data pipeline (it is still okay to use in the editing pipeline). Also, we marked UpcastHelpers#elementToMarker() as deprecated (and it fires a warning on the console).

We should inform our users about how to cope with this change.

If our users used markerToElement() in the data pipeline of their features, the output generated by this converter will not be compatible with dataToMarker() conversion. It means that for some transition period, documents saved in users' databases will have "old" values. Which means that users would have to use elementToMarker() upcast, which is deprecated.

Fortunately, elementToMarker() upcast converter can be easily substituted with elementToElement() converter:

view part of the configuration object stays the same.
model part of the configuration should return a model element with name $marker and attribute data-name equal to the marker name, that was earlier set in model property, or returned by the callback assigned to the model property.

For transition period, users should introduce new markerToData() and dataToMarker() converters but also keep elementToElement() upcast converter for backward compatibility with not-yet-regenerated documents.

See the example below to better understand how to change the code.

Before:

editor.conversion.for( 'dataDowncast' ).markerToElement( { ... } );

editor.conversion.for( 'upcast' ).elementToMarker( {
    view: oldViewConfiguration,
    model: oldModelCallback
} );

After:

// Use the new helper for data downcast conversion.
// See the documentation to learn how to configure the new converter.
editor.conversion.for( 'dataDowncast' ).markerToData( { ... } );

// Backward compatibility with old documents:
editor.conversion.for( 'upcast' ).elementToElement( {
    view: oldViewConfiguration,
    model: ( viewElement, modelWriter ) => {
        const markerName = oldModelCallback( viewElement );

        return modelWriter.createElement( '$marker', { 'data-name': markerName } );
    }
} );

// Use the new helper for upcast conversion.
// See the documentation to learn how to configure the new converter.
editor.conversion.for( 'upcast' ).dataToMarker( { ... } );

Note, that view property is not changed, while in model callback, instead of returning a marker name, you need to return $marker element with the marker name set as data-name property.

Other than that, the comma (`,`) character is now prohibited in marker names.

scofalik on 29 Jul 2020

👍1

Instruction for anyone who is processing the editor data.

First, please get familiar with the description of marker-to-data downcast, where we described what kind of HTML may be generated.

If you processed suggestions or comments you have two possibilities:

change your current solution,
apply another layer of processing, which will change the new HTML into the old HTML, and use your current solution.

Below are notes for comments. Suggestions are similar.

Also, collapsed markers did not have closing tag, however, in case of comments or suggestions, collapsed markers did not exist (collapsed comment or suggestion was removed).

Earlier: <comment id="commentId:count" type="start">
Now: <comment-start name="commentId:count">

For suggestions:

Earlier: <suggestion id="e8ghd7:e390dk" suggestion-type="formatInline:i4u3xm" type="end" count="3erh">
Now: <suggestion-end name="formatInline:i4u3xm:e8ghd7:e390dk:3erh">

Since marker name for suggestions is complex, below is a function that breaks down marker name for suggestions into meaningful parts:

// `name` is the value from the name attribute.
parseSuggestionName( name ) {
    // Name template is `<type>[:<subType>]:<id>:<authorId>[:<count>]`.
    const parts = name.split( ':' );

    return {
        type: parts[ 0 ], // "insertion", "formatInline", etc.
        subType: parts.length >= 4 ? parts[ 1 ] : null, // hash or null
        id: parts.length < 4 ? parts[ 1 ] : parts[ 2 ], // unique id
        authorId: parts.length < 4 ? parts[ 2 ] : parts[ 3 ], // user id
        count: parts.length == 5 ? parts[ 4 ] : null // additional unique id
    };
}

Again, marker count is an additional unique id for multi-range suggestions. If count is in the suggestion name, a subType is present as well. The opposite is not true.

Keep in mind that the marker name in the editor also contains "group prefix" (for example, marker name for comment would be comment:ek45nd:ep0v not just ek45nd:ep0v.

data-[group]-start-before="[name]", for example: data-comment-start-before="commentId",
data-[group]-start-after="[name]", for example: data-comment-start-after="commentId",
data-[group]-end-before="[name]", for example: data-comment-end-before="commentId",
data-[group]-end-after="[name]", for example: data-comment-end-after="commentd".

For example, the following HTML in the new format:

<td data-suggestion-end-after="formatInline:i4u3xm:e8ghd7:e390dk:3erh" data-suggestion-start-before="formatInline:i4u3xm:e8ghd7:e390dk:3erh">Foo</td>

Is equivalent to the following HTML in the old format:

<suggestion id="e8ghd7:e390dk" suggestion-type="formatInline:i4u3xm" type="start" count="3erh"></suggestion>
<td>Foo</td>
<suggestion id="e8ghd7:e390dk" suggestion-type="formatInline:i4u3xm" type="end" count="3erh"></suggestion>

start-before and end-after attributes are preferred, however, start-after and end-before are used when the former cannot be used (for example, a collapsed marker at the end of an element).

scofalik on 3 Aug 2020

👀1 ❤1 👍1

Was this page helpful?

0 / 5 - 0 ratings