Gutenberg: Out of memory in parser

Created on 4 Dec 2017  路  18Comments  路  Source: WordPress/gutenberg

When trying to load a very old and large post from my blog and I get these errors:

parser-line-96

As far as I've tested this fails randomly at lines 96, 349, or 372 of the parser when I reload the page.

I've tried with increasing the maximum file size allowed for the network but it didn't solve the issue. For context, if I deactivate the Gutenberg plugin, the post is loaded and shows the content.

Not sure how I can share any more details for debugging, just ping me and I'll provide what you need.

[Feature] Parsing [Priority] High [Type] Bug

Most helpful comment

Thanks for the follow ups, @nosolosw. :)

All 18 comments

cc @aduth @dmsnell @pento as you did recent work on that part of the code.

Can you share the source of the post?

@dmsnell you can find it here. Expect broken links, non-existing media, and the like. I haven't touched it since I discovered the bug, so we can reproduce and debug what happened.

Hm. Not that big of a document. I'll start thinking about this - thanks!

I can reproduce similar excessive memory usage with Gutenberg on this post:

  • Hello World post, Gutenberg disabled: 4.1MB
  • Hello World post, Gutenberg enabled: 4.2MB
  • This post, Gutenberg disabled: 4.3MB
  • This post, Gutenberg enabled: 46.6MB

I believe that I know _why_ it's occurring and it's probably because of how we're handling those freeform HTML blocks. we build a list of each character in the parser and then later join them into a single string.

this was a fear of mine when converting to the nested parser with our move back to the idea of tokens (necessary) but not sure yet how to fix this.

A follow-up to this in Gutenberg 2.0: the post loads and the memory issue is gone.

Not sure if related to the fix or not, but when the post loads it has the raw HTML and the text is grey:

post-load

The console shows an error:

post-load-console

Attached the error log.

When I convert this to blocks, it's done but some paragraphs and items within lists are stripped.

For the record, the mismatch that fails the validation is based on   around tags:

screen shot 2018-01-15 at 14 43 32

In contrast, _pasting_ the raw content works, though it's very slow (several seconds on a 3.1 GHz i7).

I can confirm that @aduth https://github.com/WordPress/gutenberg/pull/4591 fixed the issue: convert to blocks works. Thanks!

I have found some issues with a couple of lists when they are converted to Gutenblocks, but will report in a separate issue. The source of the error is likely an originally bad-formatted HTML.

Thanks for the follow ups, @nosolosw. :)

I've been battle-testing Gutenberg again with this post I wrote 10 years ago. I've tried to convert it to blocks using Gutenberg 4.4:

  • Open the post in Gutenberg, the whole text is embedded in the classic block.
  • Check the document stats: it's 10.605 words long.
  • Convert it to blocks.
  • Check the document stats: it's 8.269 words long. One of the last sections was entirely gone (anything following the "Una panor谩mica general de las herramientas").

cc @dmsnell @mcsf @mtias @aduth @youknowriad

Going to see if I can bisect when this started to happen. Will post any follow ups here.

Bisected from v4.0.0-rc.1: it looks like the content loss may have started happening in 489eb792391a4d2bcb2bde701b52507f1d7334d1 cc @iseulde

Note that before that commit, Gutenberg reported 9.981 words after converting the old post to blocks (less than 10.605 in the original) but I couldn't find noticeable content loss.

@nosolosw could you report a new issue with it?

Thanks for battle-testing and dog-fooding, @nosolosw. Out of curiosity, since this issue was originally about the parser, have you noticed any issues with calling the server-side parser on that post of yours? Because since #10463 we no longer skip the parser (as fixed in #4591) but rather use the improved parser implementation added in #8083.

I wasn't able to become familiar with all Gutenberg components in these few weeks in the project, so I'm not sure what the server-side parsing means. I'm happy to try if I'm given some steps to reproduce.

Oh, now that I had some coffee, it occurred to me that by _server-side parsing_ you meant if the content within the classic block in Gutenberg was the same that the content in the classic editor? I've just compared both and yes, the content is equal.

cc @mcsf

Oh, now that I had some coffee

馃槃

it occurred to me that by _server-side parsing_ you meant if the content within the classic block in Gutenberg was the same that the content in the classic editor? I've just compared both and yes, the content is equal.

@nosolosw, actually I meant whether PHP's gutenberg_parse_blocks ever acts up with your largest posts. At its simplest, this means: when loading the front-end for a post, does anything ever fail? In more detail, this could mean examining the output of gutenberg_parse_blocks (called in do_blocks) for anything unexpected.

Was this page helpful?
0 / 5 - 0 ratings