Hugo: plainify should remove <style> and <script> sections

Created on 24 Mar 2017  ·  18 Comments  ·  Source: gohugoio/hugo

plainify doesn't strip contents of <style> tags

See source of https://www.liwen.id.au/barbershop-charts/, line 34

Where

<meta property="og:description" content="{{ .Content | plainify | htmlEscape | truncate 200 }}">

is resulting in

<meta property="og:description" content="Here are some easy-to-follow flowcharts to help you remember the words to popular barbershop songs.
  figure { border: 1px solid lightgrey; } 
(Click to zoom)
Hello Mary Lou .gallery .img4 …">

plainify should probably strip out the contents of any non-displaying tags e.g. <style>, <script>, (any others?)

Enhancement Keep

Most helpful comment

If no one has started working on this, I would be interested in tackling this issue.

All 18 comments

I think this is an issue where the tags get escaped before they get to plainify.

Did another test to confirm:

{{ plainify "One <span class='red'>red</span> apple. <style>.red {color:red;}</style>" }}

results in

One red apple. .red {color:red;}

Then I don't see how we can do anything about this ...?

Having looked at how the StripHTML function is implemented - I guess not.

If I understand correctly (I've never programmed in Go before), StripHTML walks through the string one char at a time, so there is no way for it to tell what type of tag it has encountered. I'm guessing it's done this way for speed? (Faster than using regexp?)
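To illustrate the point about walking the string one character at a time, here is a minimal Go sketch. It is not Hugo's actual StripHTML; `stripTagsNaive` is a hypothetical, simplified stand-in that only tracks whether it is currently inside `<...>`. Because it never looks at the tag name, it has no basis for treating `<style>` differently from `<span>`.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTagsNaive is a hypothetical, simplified illustration of a
// character-at-a-time tag stripper. It is NOT Hugo's StripHTML.
// It only knows "inside a tag" vs "outside a tag"; it never inspects
// the tag name, so the text between <style> and </style> is kept.
func stripTagsNaive(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>':
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	in := "One <span class='red'>red</span> apple. <style>.red {color:red;}</style>"
	fmt.Println(stripTagsNaive(in))
	// prints: One red apple. .red {color:red;}
}
```

The output matches what the plainify test above produced: the tags are gone, but the CSS rule survives.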

I have a workaround:

{{ "One <span class='red'>red</span> apple. <style>.red {color:red;}</style>" 
    | replaceRE "(<style.+?</style>|<script.+?</script>)" "" | plainify }}
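The same pre-pass can be sketched in plain Go. This assumes replaceRE follows Go's regexp syntax, where `.` does not match a newline unless the `s` flag is set, so the pattern as written above would likely miss a `<style>` block that spans several lines. The sketch adds `(?is)` (case-insensitive, dot matches newline); `dropNonDisplay` and `nonDisplayRE` are hypothetical names, not part of Hugo.

```go
package main

import (
	"fmt"
	"regexp"
)

// nonDisplayRE matches whole <style>...</style> and <script>...</script>
// sections. (?i) makes the match case-insensitive; (?s) lets "." match
// newlines so multi-line blocks are covered. The lazy ".+?" stops at the
// first closing tag.
var nonDisplayRE = regexp.MustCompile(`(?is)<style.+?</style>|<script.+?</script>`)

// dropNonDisplay is a hypothetical helper: it removes style and script
// sections, leaving all other markup for a later tag-stripping step.
func dropNonDisplay(s string) string {
	return nonDisplayRE.ReplaceAllString(s, "")
}

func main() {
	in := "One <span class='red'>red</span> apple. <style>\n.red {color:red;}\n</style>"
	fmt.Printf("%q\n", dropNonDisplay(in))
	// prints: "One <span class='red'>red</span> apple. "
}
```

If that assumption holds, the equivalent change in the template would be to prefix the replaceRE pattern with `(?is)`. A regex pre-pass like this is still a workaround: it does not handle every malformed or nested case that a real HTML parser would.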

I think the docs for plainify should state that it doesn't remove the innerHTML of any tags particularly