Ckeditor5: Markdown autoformat italics bug

Created on 7 Mar 2018  Â·  11Comments  Â·  Source: ckeditor/ckeditor5

Sorry if this was fixed in latest or not, but I wanted to make to discuss this.

From a user:

I was writing some notes and tried writing wrote word1_word2_word3 and while writing it, it did word1word2word3 and word 2 was in italics. I am assuming that's because of the underlines but I am not sure this is intentional behavior. Thanks!

STEPS TO REPRODUCE

  1. Type: word_1 and word_2

EXPECTED RESULTS

  • word_1 and word_2

ACTUAL RESULTS

  • word1 and word2

Ref: https://github.com/mozilla/notes/issues/595

I wonder if auto-format should only format if there are spaces around _ or if it starts the line?

intro autoformat ux improvement

Most helpful comment

Hi guys,
any update on this?
A lot of our clients are complaining because of this issue as they use texts like item_item_item.
We made the following change in the plugin:
this
/(?:^|[^_])(_)([^_]+)(_)$/g
we changed to:
/(?:^|\s)(_)([^_]+)(_)$/g
but this way it is erased after we update the editor.

All 11 comments

I think that GitHub disabled foo_bar_bom at some point because _ is commonly used in variable names. Interestingly, this doesn't apply to foo*bar*bom which works. I guess we could do the same, although, to make it simpler, I'd treat foo_bar_bom and foo*bar*bom in the same way.

The important thing here would be to not filter out too many scenarios because it may be confusing why autoformatting doesn't work in some cases then.

I stuck by this underscore-autofomat issue too

The important thing here would be to not filter out too many scenarios because it may be confusing why autoformatting doesn't work in some cases then.

I think there is no need for filtering since markdown syntax already has BACKSLASH ESCAPES to escape those auto-format characters(*/_), for example, I tried \_ to escape _, but sadly failed. so just bring markdown's backslash escapes here.

Hi guys,
any update on this?
A lot of our clients are complaining because of this issue as they use texts like item_item_item.
We made the following change in the plugin:
this
/(?:^|[^_])(_)([^_]+)(_)$/g
we changed to:
/(?:^|\s)(_)([^_]+)(_)$/g
but this way it is erased after we update the editor.

Hey all, we are running into this issue as well and I just wanted to check to see if there are any updates with this issue? Thanks!

This works better in GitHub Writer. I think we could make this regexp narrower.

I checked other Markdown implementations (technical details at the end) and decided to do as below:

  • Simple regex update, that will require a whitespace or punctuation character before opening delimiter should be enough for most cases. 

    • Considering that Autoformat is mostly an aid in typing, not a full-featured Markdown parser, it does not seem viable to copy the entire regex for punctuation from CommonMark. Still, I am not sure how far we should go with that - any ideas?

  • Seems reasonable and should be cheap to update Bold functionality in the same manner.
  • Nice-to-have: Autoformat catches cases where first the opening delimiter (* or _) is typed in, then closing one. Being able to catch a reversed case would be great. 

    • word → word_ → _word_ ⇒ _word_

Most of the above are based on CommonMark spec. Some key elements of that spec:

  • Delimiter: _ or * (single or many).
  • Whitespace: space, tab, the start of the line, end of the line, etc.
  • Punctuation: A punctuation character is an ASCII punctuation character or anything in the general Unicode categories Pc, Pd, Pe, Pf, Pi, Po, or Ps.
  • ASCII punctuation character: !, ", #, $, %, &, ', (, ), *, +, ,, -, ., / (U+0021–2F), :, ;, <, =, >, ?, @ (U+003A–0040), [, \\, ], ^, _, (U+005B–0060),{,|,}, or~` (U+007B–007E).
  • Simple regex update, that will require a whitespace or punctuation character before opening delimiter should be enough for most cases.

What about what's after the closing delimiter?

  1. aa _bb^cc
  2. type _
  3. aa _bb_^cc

It's a less severe case than the one with foo_bar_^, but still might be irritating.

  • Seems reasonable and should be cheap to update Bold functionality in the same manner.

:+1:

Nice-to-have: Autoformat catches cases where first the opening delimiter (* or _) is typed in, then closing one. Being able to catch a reversed case would be great.

This is a nice catch. It always bothered me that it does not work in our implementation of autoformat. If you could check quickly how big that change would be, that'd help us planning this change for the future.

What about what's after the closing delimiter?

Good point. Another reason to treat whitespace as a separator.

Also, in the example below, you can see better why punctuation characters should join whitespace as a separator for autoformat:

  1. word _word^, word
  2. type _
  3. word _word,_ word

This is a nice catch. It always bothered me that it does not work in our implementation of autoformat. If you could check quickly how big that change would be, that'd help us planning this change for the future.

Regarding this, autoformat seems to grab text from the current line, up to the cursor. Without seeing what is after the cursor we cannot solve the problem of stopping the formatting inside the longer word. This also explains why reversed autoformatting cannot work right now.

The first idea that comes to mind is to grab the entire text line, then check what is after the cursor. If we can find a matching delimiter, followed by whitespace or punctuation, that's great - let's format that, otherwise stop.

Looking ahead only up to the end of the line should stop autoformatting from getting too excited about formatting what's further down in the text :)

Two things I noticed already:

  • With text like a _b_ d^ (which had the autoformatting undone) every time I type something, I still get that _b_ autoformatted. 

    • I am thinking about limiting the scope for autoformatting, nothing concrete yet, unfortunately.

  • Calculation done on the regex match doesn't like the non-capturing groups I used for matching whitespace (and later punctuation). 

    • I can change Autoformatter to expect 5 groups instead of 3: 2 groups for separator matches (whitespace/punctuation), 2 groups for delimiters, and 1 group for content. Wouldn't be that a breaking change?

    • Or maybe work with the code I have now - still trying to figure it out.

At the moment, we expect to have space preceding the opening delimiter (_, *, __ and **). Only then we allow the autoformat to do its job.

Achieving reverted autoformatting starts pretty easy, but, in connection with undo, gets complicated quite fast.

To properly format a reverted case, we have to look ahead of the cursor, possibly up to the end of the line. This breaks undo and makes undone autoformatting format again. We could look ahead and behind the cursor up to a certain point, e.g. first opened/closed delimiter, but the implementation of that becomes too complex for the scope of this bug.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

oleq picture oleq  Â·  3Comments

benjismith picture benjismith  Â·  3Comments

pjasiun picture pjasiun  Â·  3Comments

Reinmar picture Reinmar  Â·  3Comments

hybridpicker picture hybridpicker  Â·  3Comments