When I remove an indented element that makes up a whole line, the line and its indentation are preserved. This makes a document with many remove() calls look dirty, since it's filled with whitespace-only lines.
<span>lorem ipsum</span>
It should IMO remove the line if all of its content is gone.
We use Cheerio in Yeoman to do HTML manipulations in our scaffolder and it's great, but because of this, it leaves a lot of empty lines and trailing whitespace, which is annoying to the end-user. Hopefully this can be fixed soon :)
I'm not sure I understand the example, but a while ago we removed the tidying features. IMO it was feature creep and should be left to a tidy library.
If I remove() the div#test
<div>
    <div id="test">dsf</div>
</div>
The resulting HTML is (with trailing whitespace):
<div>
    
</div>
The resulting HTML should be:
<div>
</div>
Or even
<div></div>
In this situation.
<div>
    <div id="test">dsf</div>
</div>
This is a div element with three children. A textnode containing a newline and 4 spaces, a div element with an id of "test", and a textnode containing a newline. Removing the div child element does not (and should not) remove the text nodes.
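To make the three-children point concrete, here is a minimal sketch in plain JavaScript. It is a simplified stand-in for a parsed tree, not Cheerio's actual data structures: the node shapes and the `serialize` helper are assumptions made up for illustration. It shows that dropping the element leaves both text nodes in place, which is where the whitespace-only line comes from.

```javascript
// Simplified model of the outer div's children (hypothetical node shapes,
// not Cheerio's real internals).
const children = [
  { type: 'text', data: '\n    ' }, // newline + 4 spaces of indentation
  { type: 'tag', name: 'div', attribs: { id: 'test' },
    children: [{ type: 'text', data: 'dsf' }] },
  { type: 'text', data: '\n' },     // newline before the closing tag
];

// Hypothetical serializer: text nodes are emitted verbatim.
function serialize(node) {
  if (node.type === 'text') return node.data;
  const attrs = Object.entries(node.attribs)
    .map(([key, value]) => ` ${key}="${value}"`)
    .join('');
  const inner = node.children.map(serialize).join('');
  return `<${node.name}${attrs}>${inner}</${node.name}>`;
}

// "Removing" div#test in this model: drop the element, keep its siblings.
const remaining = children.filter(
  (node) => !(node.type === 'tag' && node.attribs.id === 'test')
);

const result = serialize({ type: 'tag', name: 'div', attribs: {}, children: remaining });
console.log(JSON.stringify(result)); // "<div>\n    \n</div>"
```

The two surviving text nodes sit next to each other, so the output contains a line made up of nothing but the original indentation.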
I do understand what you want: you want the HTML output by Cheerio to be reformatted according to your preferred style. However, this is not (currently) a functional goal of Cheerio, and it is functionality best achieved by processing Cheerio's output with another function. I would recommend the mature and stable js-beautify for HTML post-processing. It provides a number of options to format HTML to your standards.
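For the narrow case raised in this issue (whitespace-only lines and trailing spaces), the post-processing step can be as small as a string filter. A minimal sketch, assuming the serialized HTML is already in a string; `stripBlankLines` is a hypothetical helper name, not part of Cheerio or js-beautify:

```javascript
// Hypothetical helper (not a Cheerio or js-beautify API): removes trailing
// whitespace from every line, then drops lines that end up empty.
function stripBlankLines(html) {
  return html
    .split('\n')
    .map((line) => line.replace(/[ \t]+$/, '')) // trim trailing spaces/tabs
    .filter((line) => line !== '')              // drop now-empty lines
    .join('\n');
}

const cleaned = stripBlankLines('<div>\n    \n</div>');
console.log(JSON.stringify(cleaned)); // "<div>\n</div>"
```

This is deliberately naive: it also touches whitespace inside elements such as pre or textarea, where blank lines are significant, which is one reason a dedicated formatter is the safer choice for real documents.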
Ok, I didn't think Cheerio concerned itself with text nodes.
I do however think that Cheerio should have an option in $.html() or something to run the html-beautify. I can't think of any scenario where I would want trailing spaces left in the source.
Right, but most node modules use cheerio to do screen scraping, where content, not source, is most important.
From a quick look at js-beautify and node-beautifier, it looks to be as simple as:
// $ is the already-loaded Cheerio document
var beautify = require('js-beautify');
var html = $.html(),
    beauty = beautify.html_beautify;
html = beauty(html);
Would be super simple to add to yeoman, and it would do a better job than something we hack together for cheerio.
Closing this issue, unless a more compelling reason to add a tidy feature comes up.
Not saying it would be hard to add to Yeoman, obviously it's not. It would just be a nice convenience in Cheerio: not having to evaluate the options, add a dependency, import it, look up the API, and then finally beautify. But whatever.