Draft-js: Empty blocks not included by convertFromHTML

Created on 25 Mar 2016 · 16 Comments · Source: facebook/draft-js

HTML elements equivalent to a block with no text are omitted from the output of Draft.convertFromHTML.

Steps to reproduce:

const html = '<div>first block</div><div></div><div>third block</div>';
Draft.convertFromHTML(html) //returns an array of 2 blocks (the non-empty ones) instead of three

@hellendag mentioned in Slack that @CLowbrow might know most about this - I'm happy to PR a change to fix it but could use a bit more context on what the condition in joinChunks is and isn't used for. It looks like this is the source of the removal of the block when it's called from here. When removing that last character of chunk A, does its block ever need to be popped? My interpretation is that chunk A has a delimiter and chunk B is empty since its first character is \r.

Thank you!

question


benbriggs

All 16 comments

This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up. I'm open to modifying this. What kinds of things are you trying to preserve?

CLowbrow on 25 Mar 2016

👍2 😄1

Ah I see. I'm using this module along with another that I built to go to/from HTML for saving the contents, so the empty block tags are actually coming from a transformation of ContentState. I didn't use this as my initial example since it added extra complexity, but consider three ordered-list-item blocks:

1. first
2.
3. third

That ContentState is going to have three blocks that'll map to <ol><li>first</li><li></li><li>third</li></ol>. Throwing that back into convertFromHTML omits the middle block, making the last one the second item, which causes noticeable differences from what Draft was rendering originally.

benbriggs on 25 Mar 2016
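
To make the round trip described above concrete, here is a minimal sketch of where the empty <li></li> comes from. The block objects only mimic the shape of Draft.js raw blocks, and listBlocksToHtml is a hypothetical helper written for this example, not the serializer mentioned in the comment.

```javascript
// Hypothetical blocks, shaped like entries in the `blocks` array of raw Draft.js content.
const blocks = [
  { type: 'ordered-list-item', text: 'first' },
  { type: 'ordered-list-item', text: '' },
  { type: 'ordered-list-item', text: 'third' },
];

// Minimal serializer for illustration only: it handles nothing but
// ordered-list-item blocks and does no HTML escaping.
function listBlocksToHtml(listBlocks) {
  const items = listBlocks.map((block) => `<li>${block.text}</li>`).join('');
  return `<ol>${items}</ol>`;
}

const html = listBlocksToHtml(blocks);
console.log(html); // prints "<ol><li>first</li><li></li><li>third</li></ol>"
```

Feeding this string back through convertFromHTML is the step that, per the report, drops the middle item.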

Storing draft.js content as html is probably not the way to go. Internally, we store a serialized content state and just re-instantiate/render from that. The paste handler is more tailored to minimize the amount of follow-up editing after a paste than faithfully recreate document states.

CLowbrow on 25 Mar 2016

👍1
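
As a rough illustration of the "store a serialized content state" approach, the sketch below round-trips a hand-written object through JSON. The object is shaped like the output of Draft.js's convertToRaw; the keys and text are made up. In a real app the object would come from convertToRaw(contentState) and be turned back into content with convertFromRaw.

```javascript
// Hand-written raw content, shaped like what convertToRaw returns.
// Block keys and text are invented for this example.
const rawContent = {
  blocks: [
    { key: 'a1', text: 'first block', type: 'unstyled', depth: 0, inlineStyleRanges: [], entityRanges: [], data: {} },
    { key: 'a2', text: '', type: 'unstyled', depth: 0, inlineStyleRanges: [], entityRanges: [], data: {} },
    { key: 'a3', text: 'third block', type: 'unstyled', depth: 0, inlineStyleRanges: [], entityRanges: [], data: {} },
  ],
  entityMap: {},
};

// Persist as a JSON string, then load it back.
const stored = JSON.stringify(rawContent);
const restored = JSON.parse(stored);

console.log(restored.blocks.length); // prints 3
console.log(restored.blocks[1].text === ''); // prints true
```

Because the empty block is an ordinary entry in the blocks array, it survives storage without going through the paste-oriented HTML cleanup.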

@CLowbrow question, what negatives would you point to in storing content as HTML?

It totally makes sense that the internal convertFromHTML is used for copy-pasting, which is going to have different needs than an HTML (de)serializer, so it probably shouldn't be used for that. But just wondering if beyond that, are there specific things that would make you still not use HTML even if someone wasn't using that util?

(Wondering because my current plan is to store content as HTML, since the API I'm building needs to give it to clients in HTML form, so I figure better to (de)serialize on the client, instead of having to do that at the API level, but I may be missing big downsides.)

ianstormtaylor on 25 Mar 2016

👍1

Nothing huge. In my brain it makes more sense to store document data in the format the editor uses since you don't have to worry about doing the transformation both ways (just have to worry about converting to html on demand). Parsing html client-side to initialize the editor state is a bunch of extra work. If you really need html stored, I would probably double write an editor state and the html representation.

That being said, there are probably a bunch of use cases for writing as html only that I can't think of :P

CLowbrow on 25 Mar 2016

My use case is similar to that of @ianstormtaylor where I have clients that need HTML to process, so I use HTML to persist to the server. In my implementation I'm actually customizing a separate version of convertFromHTML to accommodate a plugin system I'm building (i.e. it's never being used for pasted data) so I'll make the changes to that version to not omit blocks. I figured I'd explore whether this was a change that could be used in the original as well.

benbriggs on 25 Mar 2016

For future reference, I found a solution that seems to work. In the condition in joinChunks linked above, the block was being omitted when joining the ending "block divider chunk" (a chunk with text of \r) to the block's chunk, which is just another divider chunk since it has no text. So the join was essentially of two divider chunks, something that (from what I can tell) only happens when creating an empty block. So I added an additional check that (A.text !== '\r' || B.text !== '\r') before popping off the end of block A.

benbriggs on 31 Mar 2016
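
The check described above can be sketched on a simplified chunk model. This is not the Draft.js source: the chunk here tracks only text and one blocks entry per \r divider, and the keepEmptyBlocks parameter is a hypothetical switch added to compare the two behaviors side by side.

```javascript
// Simplified stand-in for a conversion chunk: `text` uses '\r' as the block
// divider and `blocks` holds one entry per divider. The real chunk carries
// more per-character data; this model is an assumption for illustration.
function dividerChunk(blockType) {
  return { text: '\r', blocks: [{ type: blockType }] };
}

// Join two chunks. `keepEmptyBlocks` is a hypothetical option, not a Draft.js API.
function joinChunks(a, b, keepEmptyBlocks = false) {
  const dividersTouch = a.text.slice(-1) === '\r' && b.text.slice(0, 1) === '\r';
  // Two bare dividers meeting means the block between them had no text.
  const isEmptyBlock = a.text === '\r' && b.text === '\r';

  let text = a.text;
  let blocks = a.blocks.slice();
  if (dividersTouch && !(keepEmptyBlocks && isEmptyBlock)) {
    // Drop the duplicate divider and the block entry that belongs to it.
    text = text.slice(0, -1);
    blocks = blocks.slice(0, -1);
  }
  return { text: text + b.text, blocks: blocks.concat(b.blocks) };
}

const startDivider = dividerChunk('ordered-list-item');
const endDivider = dividerChunk('unstyled');
const dropped = joinChunks(startDivider, endDivider); // empty block collapses
const kept = joinChunks(startDivider, endDivider, true); // both dividers survive
console.log(dropped.blocks.length, kept.blocks.length); // prints "1 2"
```

Chunks that contain text are still de-duplicated either way, so the cleanup of touching blocks is unchanged; only the join of two bare dividers is treated differently.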

I created a duplicate issue about this (https://github.com/facebook/draft-js/issues/578)

I understand that it may be a good idea to remove extra whitespace in many use cases, but I really would prefer there be an option to load the extra whitespace accurately, especially when there's nothing in the documentation that refers to this behavior. My use case: I'm using DraftJS as an email composer, and I have a snippet menu that allows me to insert snippets into the editor.

When I initially create my snippets (with Draft), the HTML _in_ the editor itself as well as the HTML I export from the editor (to save as the snippets in my database) includes <div><br></div>, which gets rendered as a blank line in the email.

When I then try to insert these snippets into the email composer, I'm using convertFromHTML and then inserting the result as a fragment. The problem is that the convertFromHTML function strips out these blank lines.

My current solution is to instead create fragments line by line, insert them, then split the blocks manually, but this is pretty error prone as it depends on specific aspects of the HTML I'm inserting. I'll probably switch to the custom solution @benbriggs noted above, but that feels pretty heavy handed for something that should be configurable.

gdehmlow on 4 Aug 2016

👍4

@gdehmlow since I last wrote that I've open sourced my version: https://github.com/HubSpot/draft-convert

benbriggs on 4 Aug 2016

👍7 ❤2 🎉1

:O that's awesome. Exactly what I've needed.. I had taken the lazy way out and forked from conversion to HTML as well so I could support custom entity exports. Will try to contribute as possible..

gdehmlow on 5 Aug 2016

Agreed the user should have a choice, or at least this should be documented.

Taakn on 5 Oct 2016

👍13 ❤1

Just came across this while having some issues with the problem myself, and I think the answer given originally makes sense:

This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up.

Going to close this for now, but we may revisit it in the future.

flarnie on 10 Sep 2017

Whilst I agree that in most cases it makes sense to have it do that, I feel that it should be an optional param for those many cases where it's not the behaviour which is wanted. Choice is always better than no choice and having to hack your way around it.

GeorgeWL on 16 Mar 2018

I honestly have never hated a library more than this one because of this problem. Watching you just scoff this off is infuriating. I'm sitting here trying to regex replace newline characters with sections just to get custom mentions with the ability to save text as paragraphs working.

Smaxor5 on 26 May 2020

😄1 👍1

@Smaxor5 have you tried using zero-width characters like &zwnj;? It renders "empty lines", the cursor is at the beginning, and it is not messed up when you convert to JSON and back, so you don't even have to strip it on the server side.

some context for zero width characters

f-ricci on 3 Jul 2020
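
One way to apply the zero-width workaround suggested above is to pre-process the HTML before handing it to convertFromHTML. The helper below is a hypothetical sketch: fillEmptyBlocks is not part of Draft.js, the tag list (div, p, li) is an assumption, and regex-based HTML rewriting is brittle for anything beyond simple markup.

```javascript
// Zero-width non-joiner, the character behind the &zwnj; entity.
const ZWNJ = '\u200C';

// Hypothetical helper: puts a ZWNJ inside block tags that are empty or
// contain only whitespace and <br> tags, so they are no longer "no text".
function fillEmptyBlocks(html) {
  const emptyBlock = /<(div|p|li)(\s[^>]*)?>(?:\s|<br\s*\/?>)*<\/\1>/gi;
  return html.replace(
    emptyBlock,
    (match, tag, attrs = '') => `<${tag}${attrs}>${ZWNJ}</${tag}>`
  );
}

const input = '<div>first block</div><div></div><div>third block</div>';
const prepared = fillEmptyBlocks(input);
// `prepared` would then be passed to convertFromHTML in place of `input`.
console.log(prepared.length - input.length); // prints 1
```

Whether the resulting block is kept depends on convertFromHTML treating the ZWNJ as text, as the comment above reports; this sketch only prepares the string.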

That's kinda avoiding the point of the function then, surely? The point being that it reads HTML and converts it, without requiring pre-parsing by the user.

GeorgeWL on 6 Jul 2020
