Cheerio: Get text with html tags replaced by white spaces

Created on 9 Jul 2019 · 3 Comments · Source: cheeriojs/cheerio

Today we're using .text(), but when the HTML is

<div>
  By</div><h2 class="authorh2">John Smith</h2></div>
</div>

Visually on the page, the </div> after the word "By" ensures there is a space or a line break, but when applying cheerio's .text(), we get as a result

ByJohn Smith, which is wrong.

Generally speaking, is it possible to get the text in a slightly special way, so that ANY HTML tag is replaced by a white space? (We're OK with collapsing the resulting runs of white space afterwards.)

We'd like to have as output: By John Smith
Thanks.

Not a bug

All 3 comments

This would break legitimate content, such as <a href="">My great link</a>!. Unfortunately, this is not something we can fix on our side.
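To make the trade-off concrete, here is a minimal sketch of the "replace every tag with a space" idea, written as a plain string function rather than a cheerio feature. The name `stripTagsWithSpaces` is hypothetical, the input is a trimmed version of the markup from the question, and the regex is a naive tag matcher that does not handle comments, scripts, or entities. In practice the string would presumably come from cheerio's `.html()`.

```javascript
// Hypothetical helper, not part of cheerio: replace every tag with a space,
// then collapse whitespace. Naive regex; no entity decoding.
function stripTagsWithSpaces(html) {
  return html
    .replace(/<[^>]*>/g, ' ') // each tag becomes a single space
    .replace(/\s+/g, ' ')     // collapse runs of whitespace
    .trim();
}

// The byline from the question now gets its space...
const byline = '<div>\n  By</div><h2 class="authorh2">John Smith</h2>';
console.log(stripTagsWithSpaces(byline)); // "By John Smith"

// ...but inline markup gains a space that was never on the page.
const link = '<p><a href="">My great link</a>!</p>';
console.log(stripTagsWithSpaces(link)); // "My great link !"
```

The second result shows the breakage described above: the closing `</a>` turns into a space before the exclamation mark.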

I know that it would not be consistent with .text(), but isn't there another way, within the richness of cheerio's methods, to achieve this?

The question might need to be rephrased, but the point stands. I'm currently facing the same problem: when trying to get all tags, there are no spaces or breaks, and there is no way to add them without writing specific code per tag, since the text itself is in an array and the position changes based on the tags' contents.
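One possible way to avoid per-tag code is to insert a space only at block-level tags and drop inline tags silently. The sketch below is an assumption-laden workaround, not a cheerio API: `blockAwareText` and the `BLOCK_TAGS` list are hypothetical, the list is incomplete, and the regex-based tag matching is naive (no entity decoding, no handling of scripts or comments). The HTML string would presumably be obtained from cheerio's `.html()`.

```javascript
// Assumed, incomplete list of block-level tag names; extend as needed.
const BLOCK_TAGS = [
  'address', 'article', 'aside', 'blockquote', 'br', 'dd', 'div', 'dl', 'dt',
  'footer', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hr', 'li', 'main',
  'nav', 'ol', 'p', 'pre', 'section', 'table', 'td', 'th', 'tr', 'ul',
];

// Matches opening, closing, and self-closing forms of the tags listed above.
const BLOCK_TAG_RE = new RegExp(
  '<\\/?(?:' + BLOCK_TAGS.join('|') + ')\\b[^>]*>',
  'gi'
);

// Hypothetical helper, not part of cheerio.
function blockAwareText(html) {
  return html
    .replace(BLOCK_TAG_RE, ' ') // block-level tags become a space
    .replace(/<[^>]*>/g, '')    // remaining (inline) tags are dropped
    .replace(/\s+/g, ' ')       // collapse runs of whitespace
    .trim();
}

const html =
  '<div>\n  By</div><h2 class="authorh2">John Smith</h2>' +
  '<p><a href="">My great link</a>!</p>';
console.log(blockAwareText(html)); // "By John Smith My great link!"
```

This keeps inline content such as links intact while separating the byline from the heading. It still ignores CSS, so an element styled as inline or block differently from its default will not be treated the way a browser renders it.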
