URL preview does not convert the charset for pages that use a non-Unicode encoding.
Sample URL that shows the problem: https://freak.no/forum/forumdisplay.php?f=45
The error shows up in the "description" and/or "og:title" meta tags. One of these is used for the URL preview, which comes out wrong.
That particular site sends content-type: text/html; charset=ISO-8859-1
So the fix would be converting it to UTF-8.
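As a rough illustration of that fix, here is a minimal sketch that reads the charset from the Content-Type header and decodes the body accordingly. It assumes the raw response bytes are available before any string conversion happens. The function names `charsetFromContentType` and `decodeBody` are made up for this example and are not part of the existing code. It uses the built-in `TextDecoder` rather than any particular conversion library.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value.
function charsetFromContentType(contentType) {
	const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
	return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: decode raw body bytes into a JavaScript string,
// using the charset from the header and defaulting to UTF-8.
function decodeBody(bodyBytes, contentType) {
	const charset = charsetFromContentType(contentType) || "utf-8";

	try {
		return new TextDecoder(charset).decode(bodyBytes);
	} catch (err) {
		// The label was not recognised by TextDecoder: fall back to UTF-8.
		return new TextDecoder("utf-8").decode(bodyBytes);
	}
}

// "blåbær" as ISO-8859-1 bytes (å = 0xE5, æ = 0xE6).
const latin1Body = Uint8Array.from([0x62, 0x6c, 0xe5, 0x62, 0xe6, 0x72]);
const decoded = decodeBody(latin1Body, "text/html; charset=ISO-8859-1");
console.log(decoded);
```

Once the body is a proper string, the meta tags can be parsed as before and the preview text is sent out as UTF-8 like everything else.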
Hi,
I looked into this issue and the only way I can see to fix it is the following:
Get the page encoding from the HTML head (should be no problem with the already used cheerio lib), and maybe use the already added is-utf8 lib
Use iconv to convert if it is not UTF-8
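The two steps above could look roughly like the sketch below. To keep it dependency-free, it swaps in a regular expression for cheerio, a strict `TextDecoder` for is-utf8, and a plain `TextDecoder` for iconv, so it is not the proposed implementation itself. All function names are hypothetical. The choice to check for valid UTF-8 first and to fall back to windows-1252 is my own assumption.

```javascript
// Hypothetical stand-in for is-utf8: true when the bytes are well-formed UTF-8.
function isValidUtf8(bytes) {
	try {
		new TextDecoder("utf-8", { fatal: true }).decode(bytes);
		return true;
	} catch (err) {
		return false;
	}
}

// Hypothetical stand-in for the cheerio lookup: find a charset declared in a
// <meta> tag near the start of the document.
function charsetFromMeta(bytes) {
	// Charset declarations are plain ASCII, so a latin1 view of the first
	// kilobyte is enough to search for them.
	const head = new TextDecoder("latin1").decode(bytes.subarray(0, 1024));
	const match = /<meta[^>]+charset\s*=\s*["']?([\w-]+)/i.exec(head);
	return match ? match[1].toLowerCase() : null;
}

// Decode the page, converting only when the bytes are not already UTF-8.
function toUtf8String(bytes) {
	if (isValidUtf8(bytes)) {
		return new TextDecoder("utf-8").decode(bytes);
	}

	// Assumption: fall back to windows-1252 when nothing usable is declared.
	const charset = charsetFromMeta(bytes) || "windows-1252";

	try {
		return new TextDecoder(charset).decode(bytes);
	} catch (err) {
		return new TextDecoder("windows-1252").decode(bytes);
	}
}

const html =
	'<html><head><meta charset="iso-8859-1">' +
	'<meta name="description" content="Kjøp og salg"></head></html>';
const pageBytes = Buffer.from(html, "latin1");
console.log(toUtf8String(pageBytes) === html);
```

Checking for valid UTF-8 first means the meta lookup and conversion only run for the rare non-UTF-8 page, which should keep the extra overhead small for the common case.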
This would however add some overhead.
I looked at some popular Norwegian sites and they all use UTF-8. This particular example seems to be an older site.
@xPaw what is your opinion?
The way I see it, there are two options:
I lean towards the edge case too.
Unfortunately, a bunch of Japanese sites are still not using UTF-8, so I see garbled text a bit more frequently than I'd like.
Those are just the places that I either frequent or just happen to have the tabs open.