Hugo: --minify breaks on JSON since 59.0

Created on 1 Nov 2019 · 23 Comments · Source: gohugoio/hugo

Since 0.59.0, Hugo throws errors when trying to minify JSON files.

Building sites … ERROR 2019/11/01 16:57:44 parse error:1:1: unexpected character
    1: {"output":{"data":{"created":"2010-02-11T12:46:02Z","draf...
       ^
ERROR 2019/11/01 16:57:44 parse error:1:1: unexpected character
    1: {"output":{"data":{"created":"2010-03-08T16:25:25Z","draf...
       ^

It appears the minify lib was updated to v2.5.2 in Hugo 0.59.0, so I wondered whether that could have triggered the issue: https://github.com/gohugoio/hugo/commit/b401858ebd346c433dd69a260eba7098bded5a30

I'll try and share a repo later.

Thanks!

Bug NeedsInvestigation Upstream

All 23 comments

/cc @anthonyfok

Hi Regis and Bjørn Erik,

Thank you for bringing this to my attention. Unfortunately, I am unable to reproduce the error you are seeing, partly due to my unfamiliarity with the JSON output feature, or perhaps because the JSON test cases I have on hand are too simplistic. Please do share a test repo when you have time. Many thanks!

Yea, I added a quick test for this myself, and it doesn't look like "JSON is broken" is true in the general sense (which would have surprised me), so there must be something "special" about the JSON file used.

Will try and look into it next week. Thanks a lot.

here you go @anthonyfok : https://github.com/theNewDynamic/gohugo-6472

Thanks a lot for your patience.

  • Before: parse v2.3.5 and minify v2.3.7 (Hugo ≤ v0.58.3), no error.
  • After: parse v2.3.9 and minify 2.5.2 (Hugo ≥ v0.59.0), error: unexpected character.

The error is reported from parse in commit tdewolff/parse@776314151e8e151960ee21059413a965427422fc (between v2.3.5 and v2.3.6): "Improve error messages for NULL and unknown values"

So, according to @regisphilibert's MWE (minimal working example) https://github.com/theNewDynamic/gohugo-6472:

Error: Error building site: failed to render pages: parse error:1:1: unexpected character
    1: <!DOCTYPE html><html><head><title>http://example.org/arti...

or fuller report after I tweaked tdewolff/parse code:

Error: Error building site: failed to render pages: parse error:1:1: unexpected character '<'
    1: <!DOCTYPE html><html><head><title>http://example.org/article/</title><link rel="canonical" href="http://example.org/article/"/><meta name="robots" content="noindex"><meta charset="utf-8" /><meta http-equiv="refresh" content="0; url=http://example.org/article/" /></head></html>

It would appear that Hugo is somehow asking Minify to process HTML as if it were JSON, and that before commit tdewolff/parse@776314151e8e151960ee21059413a965427422fc, such errors were silently ignored by tdewolff/parse. So...

But then, that does not seem to explain the error message that @regisphilibert originally encountered:

Building sites … ERROR 2019/11/01 16:57:44 parse error:1:1: unexpected character
    1: {"output":{"data":{"created":"2010-02-11T12:46:02Z","draf...
       ^
ERROR 2019/11/01 16:57:44 parse error:1:1: unexpected character
    1: {"output":{"data":{"created":"2010-03-08T16:25:25Z","draf...
       ^

Since { (the "open curly bracket") is certainly a valid starting character for a JSON file, calling it an "unexpected character" hints at a bug in tdewolff/parse... ?

So:

  • Hypothesis 1: Newer version of parse uncovers a previously hidden bug in Hugo
  • Hypothesis 2: A bug in the newer versions of parse and/or minify
  • Hypothesis 3: Both of the above.

@regisphilibert, I know you are very busy, but, if possible, could you please also try to produce an MWE that reproduces the error you saw initially, i.e. "unexpected character" on seemingly valid JSON code?

@tdewolff, please help take a look and see if you can shed some light on this. Thank you!

(Sorry, I am really slow at trying to understand the code, and I have exceeded my quota for today, hence this "progress report" and plea for help.)

@anthonyfok I just updated the example repo so it complies with what you requested. Not sure what did it, but it seems to be assigning JSON on RegularPage as well as lists.

Note to self: In the example that @regisphilibert provided, Hugo apparently calls Minify() from func Transformer() in minifiers/minifiers.go:

    return min.Minify(m.m, ft.To(), ft.From(), params)

I was going to try to decipher what m.m (mediatype) reads, but ran out of time.

@anthonyfok I just updated the example repo so it complies with what you requested. Not sure what did it, but it seems to be assigning JSON on RegularPage as well as lists.

Wow, that was really quick! Thank you so much @regisphilibert!

Error: Error building site: failed to render pages: parse error:1:1: unexpected character '<'
    1: {"content":"Be With by Forrest Gander wins  2019 Pulitzer Prize for PoetryNew Directions  overjoyed by  news that our poet, translator,  dear friend Forrest Gander—whom we first published  1998—has won  year\u0026rsquo;s Pulitzer Prize!Read \u0026ldquo;Epitaph\u0026rdquo; 
       ^

Hmm... strange, the ^ points at {, but the unexpected character seems to be < still. Intriguing!

That '<' I got from modifying tdewolff/parse/json/parse.go from:

    p.err = parse.NewErrorLexer("unexpected character", p.r)

to

    p.err = parse.NewErrorLexer("unexpected character '"+string(c)+"'", p.r)

What I think is happening is that passing HTML is causing the error (i.e. the < character), but that the error message is displaying the "first character is wrong" context from the wrong file. This hypothesis implies two bugs: one in Hugo for putting HTML through the JSON parser, and one in the error reporting of parse.

When the parser finds an error, it records the i-th position at which it occurred in the file. The error recovery process then parses the file again to find the line number and column number, as well as the context (surrounding text). When the file buffer is changed in between, the error recovery process reads the i-th position from the wrong file. This is probably a subtle bug in parse, but it might be caused by how Hugo uses it... let me investigate further next week when I have some time.
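To illustrate the hypothesis above, here is a minimal sketch in Go. It is not the actual tdewolff/parse code: `position` is a hypothetical stand-in for the error recovery step, and the two buffers are made-up inputs. It only shows how a stored offset, re-read against a different buffer, can produce a plausible but misleading context line.

```go
package main

import (
	"bytes"
	"fmt"
)

// position is a hypothetical stand-in for the error recovery step: given a
// buffer and a byte offset, it recomputes the 1-based line and column and
// returns the text of the line containing that offset as context.
func position(buf []byte, offset int) (line, col int, context string) {
	if offset < 0 {
		offset = 0
	}
	if offset > len(buf) {
		offset = len(buf)
	}
	line = 1 + bytes.Count(buf[:offset], []byte("\n"))
	lineStart := bytes.LastIndexByte(buf[:offset], '\n') + 1
	col = offset - lineStart + 1
	lineEnd := bytes.IndexByte(buf[lineStart:], '\n')
	if lineEnd < 0 {
		lineEnd = len(buf)
	} else {
		lineEnd += lineStart
	}
	return line, col, string(buf[lineStart:lineEnd])
}

func main() {
	htmlDoc := []byte("<!DOCTYPE html><html></html>")
	jsonDoc := []byte(`{"output":{"data":{}}}`)

	// Assume the JSON parser failed on the first byte of the HTML document.
	errOffset := 0

	// Read back from the same buffer, the context points at '<'.
	fmt.Println(position(htmlDoc, errOffset))

	// If the buffer has been swapped in between, the same offset now
	// points at '{', which looks like a valid JSON start.
	fmt.Println(position(jsonDoc, errOffset))
}
```

In both calls the line and column are 1:1; only the context text differs, which would match the confusing report where ^ points at { while the offending character is <.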

I have fixed this problem partially in tdewolff/[email protected] (with tdewolff/[email protected]). What remains is that Hugo parses a generated HTML document by a JSON parser.

This can be confirmed by adding:

start := ft.From().Bytes()[:0]
if len(ft.From().Bytes()) > 30 {
    start = ft.From().Bytes()[:30]
}
fmt.Printf("Transform: %s %s\n", mediatype, string(start))

before line 54 in hugo/minifiers/minifiers.go. It outputs:

Transform: application/json <!DOCTYPE html><html><head><ti

Stack trace:

goroutine 121 [running]:
runtime/debug.Stack(0x3b, 0x0, 0x0)
    /usr/lib/go/src/runtime/debug/stack.go:24 +0x9d
runtime/debug.PrintStack()
    /usr/lib/go/src/runtime/debug/stack.go:16 +0x22
github.com/gohugoio/hugo/minifiers.Client.Transformer.func1(0x1e5e580, 0xc00149be80, 0xc001691260, 0x115)
    /home/taco/go/src/github.com/gohugoio/hugo/minifiers/minifiers.go:60 +0x249
github.com/gohugoio/hugo/transform.(*Chain).Apply(0xc00163f298, 0x1e4d940, 0xc001ea3110, 0x1e4d920, 0xc001accfc0, 0x0, 0x0)
    /home/taco/go/src/github.com/gohugoio/hugo/transform/chain.go:105 +0x25e
github.com/gohugoio/hugo/publisher.DestinationPublisher.Publish(0x1e94f20, 0xc00057d4a0, 0x1, 0xc00057a2c0, 0x1e4d920, 0xc001accfc0, 0x1aa051e, 0x4, 0x1ac7621, 0xb, ...)
    /home/taco/go/src/github.com/gohugoio/hugo/publisher/publisher.go:100 +0x373
github.com/gohugoio/hugo/hugolib.(*Site).publishDestAlias(0xc00023b500, 0xc001471e00, 0xc000cf6580, 0x1a, 0xc001471ee0, 0x1b, 0x1aa051e, 0x4, 0x1ac7621, 0xb, ...)
    /home/taco/go/src/github.com/gohugoio/hugo/hugolib/alias.go:124 +0x3d4
github.com/gohugoio/hugo/hugolib.(*Site).writeDestAlias(...)
    /home/taco/go/src/github.com/gohugoio/hugo/hugolib/alias.go:97
github.com/gohugoio/hugo/hugolib.(*Site).renderPaginator(0xc00023b500, 0xc000ff0090, 0xc0004e4900, 0x24, 0x30, 0x13, 0xc000ff0090)
    /home/taco/go/src/github.com/gohugoio/hugo/hugolib/site_render.go:181 +0x35e
github.com/gohugoio/hugo/hugolib.pageRenderer(0xc00147cc20, 0xc00023b500, 0xc00179e120, 0xc00100b020, 0xc0015be9f0)
    /home/taco/go/src/github.com/gohugoio/hugo/hugolib/site_render.go:157 +0x675
created by github.com/gohugoio/hugo/hugolib.(*Site).renderPages
    /home/taco/go/src/github.com/gohugoio/hugo/hugolib/site_render.go:73 +0x160

Edit:
Looks like Page(/article/_index.md) with output format JSON (and thus mediatype application/json) is sent through the pages channel in hugolib/site_render.go:127 while the content is clearly HTML. I'm not very familiar with the code base of Hugo, so I don't know where this channel item is coming from. @anthonyfok any ideas?
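For anyone who wants to catch this kind of mismatch outside of Hugo, a guard along these lines could compare the declared media type with the actual content. This is only a sketch under assumptions: `checkMediaType` and `looksLikeJSON` are hypothetical names, not part of Hugo or minify, and the check relies solely on `json.Valid` from the Go standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// looksLikeJSON is a hypothetical helper (not part of Hugo or minify).
// It reports whether content is syntactically valid JSON.
func looksLikeJSON(content []byte) bool {
	return json.Valid(content)
}

// checkMediaType returns an error when the declared media type is
// application/json but the content does not parse as JSON.
func checkMediaType(mediaType string, content []byte) error {
	if mediaType != "application/json" || looksLikeJSON(content) {
		return nil
	}
	// Show only the first bytes, like the debug print in the comment above.
	start := content
	if len(start) > 30 {
		start = start[:30]
	}
	return fmt.Errorf("declared %s but content starts with %q", mediaType, start)
}

func main() {
	alias := []byte(`<!DOCTYPE html><html><head><title>http://example.org/article/</title></head></html>`)
	fmt.Println(checkMediaType("application/json", alias))
	fmt.Println(checkMediaType("application/json", []byte(`{"output":{"data":{}}}`)))
}
```

Validating the whole document is more expensive than peeking at the first byte, but it avoids guessing which leading characters are legal in JSON.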

FWIW, I upped from 0.59.1 to 0.61.0 and now --minify is producing malformed AMP HTML Pages.. will turn off until it's sorted out..

Could you please give me the HTML page that is getting malformed? The only changes made for v2.6.0 are that now only HTML5 tags are minified (not AMP-introduced tags, for example) and that entities (such as &quot;) get minified. These new features could contain bugs, though they seem to work fine for the test cases. For CSS there have only been bugfixes.

The repo shared in https://github.com/gohugoio/hugo/issues/6472#issuecomment-549832471 still triggers the error, but I think something needs to be done on the Hugo side even after upgrading the dependency.

@tdewolff

Files attached. I believe it does have something to do with the CSS.
If you run them through https://validator.ampproject.org/ you'll see that the minified one fails.

minified_vs_nonmin.zip

@mlake Phew, it's terrible what some specifications require. In this case, the validator checks that the content of the <style> tag matches a predefined string. Naturally, we minify the CSS, and the result is that the string doesn't match anymore. Specifically:

-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;

becomes

-webkit-animation:-amp-start 8s steps(1,end)0s 1 normal both; (notice the lack of space after ))

which is valid CSS but doesn't match the regex that AMP validator has. Not sure how to proceed. You could disable the CSS minifier for your webpage? We could file an issue with the AMP regex? I could add spaces after )? In fact, in the future this will probably be minified further to:

-webkit-animation:-amp-start 8s steps(1,end)both;
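The mismatch can be reproduced with a small Go sketch. Note the assumptions: `matchesBoilerplate` is a hypothetical stand-in that does a plain substring match on the one declaration quoted above; the real AMP validator uses its own pattern, which is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// expected is the declaration quoted above, in its un-minified form.
const expected = "-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;"

// matchesBoilerplate is a hypothetical stand-in for the validator's check.
// It only tests for the exact expected declaration.
func matchesBoilerplate(css string) bool {
	return strings.Contains(css, expected)
}

func main() {
	original := "-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;"
	minified := "-webkit-animation:-amp-start 8s steps(1,end)0s 1 normal both;"

	fmt.Println(matchesBoilerplate(original)) // true
	fmt.Println(matchesBoilerplate(minified)) // false: the space after ')' is gone
}
```

Both strings are equivalent CSS, so any check based on exact text rather than parsed CSS will reject the minified form.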

ouch..
Does hugo have a way of using --minify and excluding particular pages from minification?

In the short term, I need to keep google search console happy which I guess must use the same regex because that's what notified me of the errors. guess I'll just go unminified for the time being..

Google Search Console must recognize it is AMP HTML and use the AMP HTML validator. In any case, it should not influence how the website is displayed, but it may be that the Search Console ranks you lower when it has AMP warnings.

In any case, I've added an exception in the HTML minifier so that it does not minify the contents of a <style amp-boilerplate> tag. See https://github.com/tdewolff/minify/commit/3b9d2321f0e915b8771d7ed4415970e3944796d3. New release will come soon.

@tdewolff I noticed when testing the CSS minification here that it works correctly. Is that using this change and detecting the CSS? Any update on when a new release will be out?

Un-minified:

body {
    -webkit-animation: -amp-start 8s steps(1,end) 0s 1 normal both;
    -moz-animation: -amp-start 8s steps(1,end) 0s 1 normal both;
    -ms-animation: -amp-start 8s steps(1,end) 0s 1 normal both;
    animation: -amp-start 8s steps(1,end) 0s 1 normal both
}

Minified:

body{-webkit-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-moz-animation:-amp-start 8s steps(1,end) 0s 1 normal both;-ms-animation:-amp-start 8s steps(1,end) 0s 1 normal both;animation:-amp-start 8s steps(1,end) 0s 1 normal both}

The website is running an older version of minify which doesn't include the breaking change for AMP. A new version for minify is coming but I have (very) limited time and there are some bigger changes that need testing.

@braderhart I split out the new changes and pushed this fix on master. It's added from version v2.6.2 upwards. Reminder that what remains is a bug in Hugo: https://github.com/gohugoio/hugo/issues/6472#issuecomment-559238453

Hi, I got the following error using hugo --minify

Total in 1624 ms
Error: Error building site: failed to render pages: JSON parse error: expected comma character or an array or object ending on line 27 and column 1
27: {
^
I am using ubuntu 18.04 with snap hugo (extended/stable) 0.72.0 from Hugo Authors refreshed.

Looks like your JSON is malformed, the error tells you where the syntax error is. However, if you think your JSON is well formatted, please open a bug report at https://github.com/tdewolff/minify
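To double-check a generated JSON file independently of minify, something like the following Go sketch could be used. It relies only on the standard library's `encoding/json`; `firstSyntaxErrorLine` is a hypothetical helper name, and the sample input (two objects back to back) is made up for illustration, not taken from the report above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// firstSyntaxErrorLine is a hypothetical helper. It returns the 1-based
// line of the first JSON syntax error in data, or ok=false if encoding/json
// reports no syntax error.
func firstSyntaxErrorLine(data []byte) (line int, msg string, ok bool) {
	var v interface{}
	err := json.Unmarshal(data, &v)

	var syn *json.SyntaxError
	if !errors.As(err, &syn) {
		return 0, "", false
	}
	// Offset counts the bytes read before the error was detected.
	off := int(syn.Offset)
	if off > len(data) {
		off = len(data)
	}
	line = 1 + bytes.Count(data[:off], []byte("\n"))
	return line, syn.Error(), true
}

func main() {
	// Made-up input: two top-level objects, which is not a single JSON value.
	data := []byte("{\"a\":1}\n{\"b\":2}")
	if line, msg, ok := firstSyntaxErrorLine(data); ok {
		fmt.Printf("syntax error on line %d: %s\n", line, msg)
	} else {
		fmt.Println("no syntax error found")
	}
}
```

If this reports no error on a file that minify rejects, that would be a good starting point for a bug report.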
