Ckeditor5: Using is( elementName ) considered harmful

Created on 12 Jul 2020  路  13Comments  路  Source: ckeditor/ckeditor5

馃摑 Provide detailed reproduction steps (if any)

I've been recently trying to pass this SVG through the DOM \<-> view conversion:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:wrs="http://www.wiris.com/xml/cvs-extension" height="21" width="17" wrs:baseline="16"><defs><style type="text/css"></style></defs><text font-family="NexusSerifWebPro" font-size="16" font-style="italic" text-anchor="middle" x="4.5" y="16">x</text><text font-family="NexusSerifWebPro" font-size="12" text-anchor="middle" x="12.5" y="9">2</text></svg>

Unfortunately, it throws at one point with:

foo.js:775 TypeError: Cannot read property 'charAt' of undefined
    at DomConverter._processDataFromViewText (boxes.js:70783)
    at DomConverter.viewToDom (boxes.js:70026)
    at DomConverter.viewChildrenToDom (boxes.js:70099)
    at viewChildrenToDom.next (<anonymous>)
    at DomConverter.viewToDom (boxes.js:70071)
    at DomConverter.viewChildrenToDom (boxes.js:70099)
    at viewChildrenToDom.next (<anonymous>)
    at DomConverter.viewToDom (boxes.js:70071)
    at HtmlDataProcessor.toData (boxes.js:47402)
    at stringifyViewItem (boxes.js:570)

The error comes from https://github.com/ckeditor/ckeditor5/blob/2f1568d/packages/ckeditor5-engine/src/view/domconverter.js#L960.

The issue is that the SVG uses a <text> element and that falls into this case:

https://github.com/ckeditor/ckeditor5/blob/2f1568d/packages/ckeditor5-engine/src/view/domconverter.js#L204-L209

Unfortunately, we made a very bad decision to allow checking element names with is( elementName ) instead of forcing a 聽is( 'element', elementName ) notation.

Proposal

Unfortunately, I can't see an option to avoid introducing a breaking change.

Even if we'd replace all our is( elementName ) usage with is( 'element', elementName ) and add warning to is( elementName ) for cases where elementName != NODE_TYPE_ENUM to notify people about using a deprecated and unsafe API, the is( 'text' ) check that we would still have in our code (cause we're looking for a text node) would still make the SVG case fail.

To avoid a BC we'd need to introduce a new API for checking element type but that makes no sense as is() was created for primarily this reason.

Therefore, I'm for:

  • Completely removing support for is( elementName ).
  • Replacing all is( elementName ) usage with is( 'element', elementName ).
  • Optionally: Adding a console warning/error in is() methods when they are used with first param from beyond the NODE_TYPE_ENUM to help with migration. However, since it's fairly simple to find all is() usage, this would not be a big help.

If you'd like to see this fixed sooner, add a 馃憤 reaction to this post.

major engine 3 bug refactor

Most helpful comment

@niegowski & @Reinmar I still find the RFC messy, so I'd try to summarize it in ADRish form (I've used a compilation from example templates). I've also compiled the solution and it sill might need some improvements (you're welcome to update below ADR)


Checking model/view element type and name

Context

The model/view nodes check for type and name:

text.is( 'text' );
text.is( 'model:text' );
element.is( 'tableCell' );
element.is( 'element', 'tableCell' );
element.is( 'model:element', 'tableCell' );

cause troubles if some view element has name that happens to be one of our element type (text, attributeElement, element, etc). The problem touches mostly the view, however nothing is preventing AFAIK defining 'element' in model's schema and conversion.

Decision drivers

  • performance - we already discovered performance gain while refactoring is() check - the new solution should not impact the performance
  • solid API design - we should spend more time discovering potential problems to not refactor this every now and then
  • DX - keep developer experience

Considered options

1. Special notation of text nodes in check :x:

Keep is check as now but change text nodes check:

text.is( '$text' )

for text nodes and keep current consistent form (already ditched as other elements types would also cause troubles).

2. A three part string for is() check

Special notation of is() argument: (model|view:)NODE_TYPE(:ELEMENT_NAME)

text.is( 'text' )
text.is( 'model:text' );
// no form for only "name check" element.is( 'tableCell' );
element.is( 'element:tableCell' );
element.is( 'model:element:tableCell' );

3. Two arguments is() check

Always require first argument as a type and optional second for name. Type argument can be prefixed with model: or view:.

text.is( 'text' )
text.is( 'model:text' );
element.is( 'element' );
element.is( 'element', 'tableCell' );
element.is( 'model:element', 'tableCell' );

Cons & Pros of the Options

1. Special notation

Pros:

  • Less refactoring.
  • In line with model schema declaration, differ, etc.

Cons:

  • Does only work for text nodes, doesn't solve element, attributeElement cases.
  • Minor breaking change.

2. Three part string for is() check

Pros:

  • By providing expected type (either by prefix or using 2 arguments) we avoid conflicts with text and maybe some other elements in future.

Cons:

  • It's breaking change but we don't see any better generic solution.
  • Might impact performance (depending on implementation)
  • Hard to explain in jsdoc (you'd need to read method docs to know how to construct string for this check).
  • Breaking change.

3. Two arguments is() check

Pros:

  • Clear definition of what the method check (no ambiguous typeOrName parameter).
  • No need to read jsdoc - only parameter names in most cases. Check for mode/view would still require to learn about special notation.

Cons:

  • More typing for element.is( 'tableCell' ).
  • Breaking change.

Recommendation

_(to be filled after discussion)_

All 13 comments

Therefore, I'm for:

  • Completely removing support for `is( elementName )`.
  • Replacing all `is( elementName )` usage with `is( 'element', elementName )`.
  • Optionally: Adding a console warning/error in `is()` methods when they are used with first param from beyond the `NODE_TYPE_ENUM` to help with migration. However, since it's fairly simple to find all `is()` usage, this would not be a big help.

My few loose notes on this, for now I don't have strong opinion on where to head with this:

What about model's API? We also have element.is() and from most of the usages I remember (outside engine) we check for element name, like element.is( 'image' ). (Con of removing this is a PITA in if( element.is( 'element', 'codeBlock ) ) .... I'm writing this because we have a pretty model/view symmetrical API and we tend to keep this in other places.

The other thing to consider is - why is() anyway - we already have .name property and it is not obvious which to use. (Pro for removing elementName support by completely removing this. However this would also means that .name cannot be used for "type" thus if BC is an option - why not introduce _type property just for node types.

I'd test the change with logging to the console in this part of API - I worry for a performance hit.

I was wondering if we could make different breaking change (possibly with smaller impact).

Currently in source files we have:

  • all is() calls: 339
  • is( 'text' ) calls: 63
  • is( 'textProxy' ) calls: 20

Also we use $text notation in:

  • Schema
  • Conversion ('insert:$text')
  • Differ entries
  • dev utils stringify() of model data

So maybe it would be more consistent, less breaking and still providing clean API if we replace is( 'text' ) with is( '$text' ) and also maybe is( 'textProxy' ) with is( '$textProxy' )?

Unfortunately, I've got second doubts about is( '$text' ). Since this change will be breaking, it'd be perfect if it's THE_FINAL solution. We can't have THE_FINAL_2 in a year or week.

The problem is with is( 'attributeElement' ) or is( 'uiElement' ). I started thinking about these cases when reviewing https://github.com/ckeditor/ckeditor5/pull/7621/files.

We can't say for sure that no one uses <attributeElement> or <rawElement> in some specific cases. Actually, AFAIR, currently DomConverter does not maintain the letter case so you'd get <attributeelement> which would work, but there's a big chance that we'll have to fix that one day. And then, crashing at <attributeElement> would become a real threat.

What's more. There are more cases. <model:element> and it blows. <view:$text> perhaps as well. And more and more.

So, I'm again thinking about is( 'element', elementName ). It's just absolutely safe and future proof. Something, that we'll never have to revisit.

I know that your concern was that typing is( 'element', 'p' ) will be tiring, but to me it's an over exaggeration. Like with semicolons in JS. I'm for robustness rather than illusory usefulness. Also,

\.is\( '(?!element|text|selection|documentFragment|node|model|view|position|range|\w+Element|CKEditor)

gives me 150 results in src/ compared to 392 of the is( type scenario. So, we check the type more often anyway.

As discussed F2F we came to a solution that would use existing namespacing that we use already (i.e., is( 'model:element' ), is( 'model:node' )) so for elements we would allow (and require) notation: element:paragraph or model:element:paragraph.聽

But since the above would resolve the issue with <text> elements because it would require strict check for is( 'element:text') then there would not be original issue, but since we use $text notation in those places:

Maybe we should also change Text#is( 'text' ) to Text#is( '$text' ) for consistency with mentioned places. We could also allow both notations (current one and new one).

tlldr: To rephrase Kuba's comment, the notation would look like this: (model|view:)NODE_TYPE(:ELEMENT_NAME) and would be similar to event namespaces.

tlldr: To rephrase Kuba's comment, the notation would look like this: (model|view:)NODE_TYPE(:ELEMENT_NAME) and would be similar to event namespaces.

But also $text notation is a case to discuss.

_Trying to put this into some more readable form._

Case 1

node.is( 'text' ) could return true for both Element with name 'text' and Text node. This is causing problems with SVG <text> nodes. Also the same conflict could happen for any other editor objects.

Currently in code we use is() in two scenarios:

  1. Testing if object is an instance of one of editor classes (eg. element, selection, documentFragment, node, position, range...)
  2. Testing name of Element (eg. paragraph, tableCell, p, td...)

Currently is() in some cases accepts prefixed argument like model:element, model:node, view:element. As Elements with specific name could be considered as virtual extension of generic Element so current implementation also accepts tableCell and model:tableCell.聽

Proposal

We could change implementation of Element#is() to always require argument if following form: (model|view:)NODE_TYPE(:ELEMENT_NAME) so NODE_TYPE would always be required and calls like element.is( 'tableCell' ) would become element.is( 'element:tableCell' ) (i guess current alternative notation element.is( 'element', 'tableCell' ) would be not changed. Only first argument would require type prefix if we want to check for specific element name.

Pros

By providing expected type (either by prefix or using 2 arguments) we avoid conflicts with text and maybe some other elements in future.

Cons

It's breaking change but we don't see any better generic solution.

This is RFC so maybe anyone could see come cons?

Case 2聽

At first it was prepared as changing Text#is() behavior to expect $text instead of text, this is consistent with mentioned above other places in editor (Schema, Differ, Conversion, model serialization). This would also avoid SVG text node issue. But this not solves problem of possible name conflict with any of the other build-in classes.聽

Pros

Code consistency with Schema, Differ, Conversion, model parser and serializer

Cons

Solves only case of text and there could be any other conflicts in the future.

Since this is considered "not enough" and SVG problem would be solved by case 1 then this change might be not needed. OTOH this would give more consistency in code. The question is if you see any cons for this change if we would push both cases at once. Both cases require breaking change so it would be better to push it once than in multiple iterations.

@niegowski & @Reinmar I still find the RFC messy, so I'd try to summarize it in ADRish form (I've used a compilation from example templates). I've also compiled the solution and it sill might need some improvements (you're welcome to update below ADR)


Checking model/view element type and name

Context

The model/view nodes check for type and name:

text.is( 'text' );
text.is( 'model:text' );
element.is( 'tableCell' );
element.is( 'element', 'tableCell' );
element.is( 'model:element', 'tableCell' );

cause troubles if some view element has name that happens to be one of our element type (text, attributeElement, element, etc). The problem touches mostly the view, however nothing is preventing AFAIK defining 'element' in model's schema and conversion.

Decision drivers

  • performance - we already discovered performance gain while refactoring is() check - the new solution should not impact the performance
  • solid API design - we should spend more time discovering potential problems to not refactor this every now and then
  • DX - keep developer experience

Considered options

1. Special notation of text nodes in check :x:

Keep is check as now but change text nodes check:

text.is( '$text' )

for text nodes and keep current consistent form (already ditched as other elements types would also cause troubles).

2. A three part string for is() check

Special notation of is() argument: (model|view:)NODE_TYPE(:ELEMENT_NAME)

text.is( 'text' )
text.is( 'model:text' );
// no form for only "name check" element.is( 'tableCell' );
element.is( 'element:tableCell' );
element.is( 'model:element:tableCell' );

3. Two arguments is() check

Always require first argument as a type and optional second for name. Type argument can be prefixed with model: or view:.

text.is( 'text' )
text.is( 'model:text' );
element.is( 'element' );
element.is( 'element', 'tableCell' );
element.is( 'model:element', 'tableCell' );

Cons & Pros of the Options

1. Special notation

Pros:

  • Less refactoring.
  • In line with model schema declaration, differ, etc.

Cons:

  • Does only work for text nodes, doesn't solve element, attributeElement cases.
  • Minor breaking change.

2. Three part string for is() check

Pros:

  • By providing expected type (either by prefix or using 2 arguments) we avoid conflicts with text and maybe some other elements in future.

Cons:

  • It's breaking change but we don't see any better generic solution.
  • Might impact performance (depending on implementation)
  • Hard to explain in jsdoc (you'd need to read method docs to know how to construct string for this check).
  • Breaking change.

3. Two arguments is() check

Pros:

  • Clear definition of what the method check (no ambiguous typeOrName parameter).
  • No need to read jsdoc - only parameter names in most cases. Check for mode/view would still require to learn about special notation.

Cons:

  • More typing for element.is( 'tableCell' ).
  • Breaking change.

Recommendation

_(to be filled after discussion)_

After compiling the above ADR, my thoughts on this:

  1. I worry that the 2 option is hard to explain in the jsdoc (you'd have to read the docs)
  2. The 2 option must be checked for performance.
  3. I'm for option 3 even though I was kinda against it before but taking attribureElement into consideration I think that it might be the best from the proposed.

Thank you, @jodator for writing it this way.

I'm ok with option 2, but I completely agree with you about its cons. Therefore, I also find 3, so the initial idea, a better option.

@jodator I like this ADR approach. I think this should become a standard in our project.聽

3 sounds good (lesser evil) to me.

@Reinmar & @niegowski I've found one place confusing a bit in the current/PR code.

It is about checking for Element being Node. The original is() check could check for is( 'node', 'div' ) AFAICS. The check for element.is( 'node' ) was removed by this comment: https://github.com/ckeditor/ckeditor5/issues/3979#issuecomment-280364928.聽

However, it looks like the check was later on brought back (ckeditor/ckeditor5-engine#1368) - probably due to foo.is( 'node' ) vs foo.is( 'position' ) checks (I didn't dig into it more).

So the current implementation (not the original one) https://github.com/ckeditor/ckeditor5/blob/master/packages/ckeditor5-engine/src/model/element.js#L118-L127:

will check:

element.is( 'node' ); // true
element.is( 'element' ); // true
element.is( 'element', 'name' ); // true
element.is( 'node', 'name' ); // false

Now, I'd like to have a confirmation that element.is( 'node', 'name' ) returning false is OK (nodes do not have name but OTOH if I would do node.is( 'node', 'name' ) would return true because the second parameter would be omitted by Node.is() check.

ps.: This topic Element vs Node comes back again - for me those are the same (AFAIR the Node uses code from Element).

To sum up, this change modifies arguments of is() methods of some model (Element, RootElement, Text, TextProxy) and view (Element, AttributeElement, ContainerElement, EditableElement, EmptyElement, UIElement, Text, TextProxy) classes.

Element name parameter

Before this change is() method allowed to omit element type argument and use only element name. This notation is no longer supported because of name vs type conflicts.

Model

Before:

element.is( 'image' );
rootElement.is( '$root' );

After:

element.is( 'element', 'image' );
rootElement.is( 'element', '$root' );
rootElement.is( 'rootElement', '$root' );

View

Before:

element.is( 'img' );
containerElement.is( 'div' );
emptyElement.is( 'img' );
attributeElement.is( 'b' );
editableElement.is( 'div' );
rootEditableElement.is( 'div' );
uiElement.is( 'span' );

After:

element.is( 'element', 'img' );
containerElement.is( 'element', 'div' );
emptyElement.is( 'element', 'img' );
attributeElement.is( 'element', 'b' ); 
editableElement.is( 'element', 'div' );
rootEditableElement.is( 'element', 'div' );
uiElement.is( 'element', 'span' );

Text and TextProxy type parameter

Before this change is() method expected 'text' / 'textProxy' as an argument but now it is changed to '$text' / '$textProxy' to be more consistent with Schema, Differ, etc. Note that old notation is still accepted for backward compatibility.

Before:

text.is( 'text' );
textProxy.is( 'textProxy' );

After:

text.is( '$text' );
textProxy.is( '$textProxy' );
Was this page helpful?
0 / 5 - 0 ratings

Related issues

pjasiun picture pjasiun  路  3Comments

benjismith picture benjismith  路  3Comments

Reinmar picture Reinmar  路  3Comments

MansoorJafari picture MansoorJafari  路  3Comments

PaulParker picture PaulParker  路  3Comments