HTML elements equivalent to a block with no text are omitted from the output of Draft.convertFromHTML.
Steps to reproduce:
const html = '<div>first block</div><div></div><div>third block</div>';
Draft.convertFromHTML(html) //returns an array of 2 blocks (the non-empty ones) instead of three
@hellendag mentioned in Slack that @CLowbrow might know most about this. I'm happy to PR a change to fix it but could use a bit more context on what the condition in joinChunks is and isn't meant to handle. It looks like this is the source of the removal of the block when it's called from here. When removing that last character of chunk A, does its block ever need to be popped? My interpretation is that chunk A has a delimiter and chunk B is empty, since its first character is \r.
Thank you!
This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up. I'm open to modifying this. What kinds of things are you trying to preserve?
Ah I see. I'm using this module along with another that I built to go to/from HTML for saving the contents, so the empty block tags are actually coming from a transformation of ContentState. I didn't use this as my initial example since it added extra complexity but having three ordered-list-item blocks:
That ContentState is going to have three blocks that'll map to <ol><li>first</li><li></li><li>third</li></ol>. Throwing that back into convertFromHTML omits the middle block, so the last item becomes the second one, which causes noticeable differences from what Draft was rendering originally.
Storing draft.js content as html is probably not the way to go. Internally, we store a serialized content state and just re-instantiate/render from that. The paste handler is more tailored to minimize the amount of follow-up editing after a paste than faithfully recreate document states.
@CLowbrow question: what negatives would you point to in storing content as HTML?
It totally makes sense that the internal convertFromHTML is used for copy-pasting, which is going to have different needs than an HTML (de)serializer, so it probably shouldn't be used for that. But just wondering if beyond that, are there specific things that would make you still not use HTML even if someone wasn't using that util?
(Wondering because my current plan is to store content as HTML, since the API I'm building needs to give it to clients in HTML form, so I figure better to (de)serialize on the client, instead of having to do that at the API level, but I may be missing big downsides.)
Nothing huge. In my brain it makes more sense to store document data in the format the editor uses since you don't have to worry about doing the transformation both ways (just have to worry about converting to html on demand). Parsing html client-side to initialize the editor state is a bunch of extra work. If you really need html stored, I would probably double write an editor state and the html representation.
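A minimal sketch of the "double write" idea above, assuming the converters are passed in rather than imported. The names `buildSavePayload`, `toRaw` and `toHTML` are hypothetical: with Draft.js, `toRaw` would typically be `convertToRaw`, and `toHTML` stands in for whatever HTML exporter you already use.

```javascript
// Hypothetical helper: builds the object you would send to the server.
// `toRaw` and `toHTML` are injected so the sketch does not depend on any
// particular library; they are assumptions, not part of Draft's API.
function buildSavePayload(contentState, { toRaw, toHTML }) {
  return {
    // Serialized editor state: used to re-instantiate the editor later.
    raw: JSON.stringify(toRaw(contentState)),
    // Derived HTML: handed to clients that only understand HTML.
    html: toHTML(contentState),
  };
}
```

The editor is then always re-initialized from `raw`, so nothing has to be parsed back out of HTML and empty blocks are never lost.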
That being said there are probably a bunch of use cases for writing as html only that I can't think of :P
My use case is similar to that of @ianstormtaylor: I have clients that need HTML to process, so I use HTML to persist to the server. In my implementation I'm actually customizing a separate version of convertFromHTML to accommodate a plugin system I'm building (i.e. it's never being used for pasted data), so I'll make the changes to that version to not omit blocks. I figured I'd explore whether this was a change that could be used in the original as well.
For future reference, I found a solution that seems to work. In the condition in joinChunks linked above, the block was being omitted when joining the ending "block divider chunk" (a chunk with text of \r) to the block's own chunk, which is just another divider chunk since it has no text. So the join was essentially of two divider chunks, something that (from what I can tell) only happens when creating an empty block. So I added an additional check that (A.text !== '\r' || B.text !== '\r') before popping off the end of block A.
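The extra check described above can be sketched on a simplified model of a chunk. This is not Draft's actual source: the chunk here only has `text` and `blocks` (one entry per `\r` delimiter), and the `keepEmptyBlocks` flag is a hypothetical switch added to show both behaviours side by side.

```javascript
// Simplified stand-in for a chunk join. '\r' is the block delimiter.
// `keepEmptyBlocks` is a hypothetical option, not an existing Draft parameter.
function joinChunks(A, B, keepEmptyBlocks = false) {
  const lastInA = A.text.slice(-1);
  const firstInB = B.text.slice(0, 1);

  // Two delimiters touch at the seam between A and B.
  const touching = lastInA === '\r' && firstInB === '\r';
  // The additional check from the comment: false only when both chunks
  // are bare dividers, i.e. when an empty block is being created.
  const notBothDividers = A.text !== '\r' || B.text !== '\r';

  let text = A.text;
  const blocks = A.blocks.slice();
  if (touching && (!keepEmptyBlocks || notBothDividers)) {
    // Strip the duplicate delimiter and the block that went with it.
    text = text.slice(0, -1);
    blocks.pop();
  }
  return { text: text + B.text, blocks: blocks.concat(B.blocks) };
}
```

With this model, joining two bare `\r` chunks yields a single `\r` by default (the empty block disappears) and `\r\r` when `keepEmptyBlocks` is true, while joining `'first\r'` with `'\rthird'` still collapses the duplicate delimiter in both modes.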
I created a duplicate issue about this (https://github.com/facebook/draft-js/issues/578)
I understand that it may be a good idea to remove extra whitespace in many use cases, but I really would prefer there be an option to load the extra whitespace accurately, especially when there's nothing in the documentation that refers to this behavior. My use case: I'm using DraftJS as an email composer, and I have a snippet menu that allows me to insert snippets into the editor.
When I initially create my snippets (with Draft), the HTML _in_ the editor itself as well as the HTML I export from the editor (to save as the snippets in my database) includes <div><br></div>, which gets rendered as a blank line in the email.
When I then try to insert these snippets into the email composer, I'm using convertFromHTML and then inserting the result as a fragment. The problem is that the convertFromHTML function strips out these blank lines.
My current solution is to instead create fragments line by line, insert them, then split the blocks manually, but this is pretty error prone as it depends on specific aspects of the HTML I'm inserting. I'll probably switch to the custom solution @benbriggs noted above, but that feels pretty heavy handed for something that should be configurable.
@gdehmlow since I last wrote that I've open sourced my version: https://github.com/HubSpot/draft-convert
:O that's awesome. Exactly what I've needed.. I had taken the lazy way out and forked from conversion to HTML as well so I could support custom entity exports. Will try to contribute as possible..
Agreed the user should have a choice, or at least this should be documented.
Just came across this while having some issues with the problem myself, and I think the answer given originally makes sense:
This was on purpose to keep pastes of long text from having a lot of extra spaces between paragraphs. The content coming in can be pretty messy sometimes and this is one of the things we try to do to clean it up.
Going to close this for now, but we may revisit it in the future.
Whilst I agree that in most cases it makes sense to have it do that, I feel that it should be an optional param for the many cases where that's not the behaviour that's wanted. Choice is always better than no choice and having to hack your way around it.
I honestly have never hated a library more than this one because of this problem. Watching you just scoff this off is infuriating. I'm sitting here trying to regex replace newline characters with sections just to get custom mentions with the ability to save text as paragraphs working.
@Smaxor5 have you tried using zero-width characters like ‌? It renders "empty lines", the cursor is at the beginning, and it is not messed up when you convert to JSON and back, so you don't even have to strip it on the server side.
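One way to apply the zero-width workaround is to pad empty block elements before handing the HTML to the converter. This is a rough sketch under stated assumptions: the helper name `padEmptyBlocks` is hypothetical, U+200C (zero-width non-joiner) is an assumed choice of character, only `div`, `p` and `li` are handled, and a regex is used for brevity where a real HTML parser would be more robust. Whether the padded blocks survive conversion should be verified against the Draft version in use.

```javascript
// Hypothetical pre-processing step: fill empty <div>, <p> and <li> elements
// (including ones that only contain whitespace or <br>) with a zero-width
// character so they no longer look like blocks with no text.
const ZERO_WIDTH = '\u200C'; // assumed choice of zero-width character

function padEmptyBlocks(html) {
  const emptyBlock = /<(div|p|li)(\s[^>]*)?>(?:\s|<br\s*\/?>)*<\/\1>/gi;
  return html.replace(
    emptyBlock,
    (match, tag, attrs = '') => `<${tag}${attrs}>${ZERO_WIDTH}</${tag}>`
  );
}
```

The padded string would then be passed to convertFromHTML in place of the original HTML, for example `padEmptyBlocks('<div>first block</div><div></div><div>third block</div>')`.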
That's kind of avoiding the point of the function though, surely? The point being that it reads HTML and converts it, without requiring pre-parsing by the user.