Cheerio: Keeps line and indentation on remove()

Created on 17 Aug 2012 · 7 Comments · Source: cheeriojs/cheerio

When I remove an indented element that takes up a whole line, Cheerio preserves the line and its indentation. A document with many remove() calls ends up looking dirty, since it's filled with otherwise empty lines that contain only indentation whitespace.

    <span>lorem ipsum</span>

IMO it should remove the line if all of its contents are gone.

All 7 comments

We use Cheerio in Yeoman to do HTML manipulations in our scaffolder and it's great, but because of this issue it leaves a lot of empty lines and trailing whitespace, which is annoying to the end user. Hopefully this can be fixed soon :)

I'm not sure I understand the example, but a while ago we removed the tidying features. IMO it was feature creep and should be left to a tidy library.

If I remove() the div#test from this markup:

<div>
    <div id="test">dsf</div>
</div>

The resulting HTML is (with trailing whitespace):

<div>

</div>

The resulting HTML should be:

<div>
</div>

Or even

<div></div>

In this situation.

<div>
    <div id="test">dsf</div>
</div>

This is a div element with three children: a text node containing a newline and 4 spaces, a div element with an id of "test", and a text node containing a newline. Removing the div child element does not (and should not) remove the text nodes.
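To make the three-children point concrete, here is a toy model of that child list as plain JavaScript objects. The object shapes and the `serialize` helper are made up for illustration and are not Cheerio's internal node format; the sketch only shows why dropping the element node leaves the whitespace behind.

```javascript
// Toy model of the outer <div>'s children. These object shapes are
// illustrative assumptions, not Cheerio's real node representation.
const children = [
  { type: 'text', data: '\n    ' },                      // newline + 4 spaces
  { type: 'tag', name: 'div', id: 'test', text: 'dsf' }, // <div id="test">dsf</div>
  { type: 'text', data: '\n' },                          // trailing newline
];

// Serialize a child list back to an HTML string (hypothetical helper).
function serialize(nodes) {
  return nodes
    .map((n) =>
      n.type === 'text'
        ? n.data
        : `<${n.name} id="${n.id}">${n.text}</${n.name}>`
    )
    .join('');
}

// Removing the element only drops the element node; both text nodes remain.
const afterRemove = children.filter((n) => n.type !== 'tag');

const before = `<div>${serialize(children)}</div>`;
const after = `<div>${serialize(afterRemove)}</div>`;
// after === '<div>\n    \n</div>' (the indented, otherwise empty line survives)
```

The two remaining text nodes are exactly the "empty line with trailing whitespace" reported above.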

I do understand what you want: you want the HTML that Cheerio produces to be reformatted according to your preferred style. However, this is not (currently) a functional goal of Cheerio, and it is best achieved by post-processing Cheerio's output with another function. I would recommend the mature and stable js-beautify for HTML post-processing. It provides a number of options for formatting HTML to your standards.
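If a full beautifier is more than you need, a dependency-free post-processing step might look like the sketch below. This is not js-beautify and not part of Cheerio; `stripBlankLines` is a hypothetical helper that takes the string you would get from `$.html()`, trims trailing whitespace, and drops lines left empty.

```javascript
// Hypothetical helper: trim trailing whitespace from every line and
// drop lines that end up empty. Operates on a plain HTML string.
function stripBlankLines(html) {
  return html
    .split('\n')
    .map((line) => line.replace(/\s+$/, '')) // trim trailing spaces, tabs, \r
    .filter((line) => line.length > 0)       // drop whitespace-only lines
    .join('\n');
}

// The string below stands in for the result of $.html() after remove().
const output = '<div>\n    \n</div>';
const cleaned = stripBlankLines(output);
// cleaned === '<div>\n</div>'
```

Note that this is a blunt tool: it also removes intentional blank lines and changes whitespace inside elements such as `<pre>`, which is one reason a dedicated formatter is the safer choice.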

OK, I didn't think Cheerio concerned itself with text nodes.

I do, however, think that Cheerio should have an option in $.html() or something similar to run html-beautify. I can't think of any scenario where I would want trailing spaces left in the source.

Right, but most node modules use cheerio to do screen scraping, where content, not source, is most important.

From a quick look at js-beautify and node-beautifier, it looks to be as simple as:

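// Assumes the js-beautify npm package and a loaded Cheerio document ($).
var beautify = require('js-beautify');

var html = $.html(),
    beauty = beautify.html_beautify;

html = beauty(html); // reformat the serialized markup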

It would be super simple to add to Yeoman, and it would do a better job than something we hack together for Cheerio.

Closing this issue unless a more compelling reason to add tidying comes up.

Not saying it would be hard to add to Yeoman; obviously it's not. It would just be a nice convenience to have in Cheerio, instead of having to evaluate the options, add one as a dependency, import it, look up the API, and then finally beautify. But whatever.
