HTML filters convert text to Unicode, but Firefox still chooses language-based encoding.
https://vk.com/rbc
https://new.pikabu.ru/best


Cyrillic (Windows).
`vk.com##^.current_text`

Cyrillic (Windows).

Thanks for the report. This is going to require some investigation and thought.
uBO uses `TextEncoder` to convert the modified DOM back into the array buffer expected by the browser's `StreamFilter`. However, `TextEncoder` cannot encode to anything other than UTF-8. I made uBO add a `<meta charset="utf-8">` at the top of the DOM to make sure the browser decodes the data properly, but this does not seem to work in your repro case.
So I will need to investigate what can be done, if anything.
Ultimately, if all else fails and nothing can be done with `TextEncoder`, I might have to consider custom encoders, possibly using WebAssembly, if I want HTML filtering to be available to more than just UTF-8-encoded pages.
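To illustrate the limitation described above, here is a minimal sketch using only the standard `TextEncoder` API. The sample string is an arbitrary choice and is not taken from the reported pages.

```javascript
// TextEncoder has no way to select a target charset: it always emits UTF-8.
const encoder = new TextEncoder();
const utf8Bytes = encoder.encode('Привет');

console.log(encoder.encoding);  // "utf-8"
// Each of the 6 Cyrillic letters takes 2 bytes in UTF-8, whereas
// Windows-1251 would use 1 byte per letter.
console.log(utf8Bytes.length);  // 12
```

If the browser then decodes these bytes as Windows-1251, every two-byte sequence is shown as two unrelated characters, which matches the garbled text in the report.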
> I made uBO add a

Just checking: you meant `<meta charset="utf-8">`? The `=` is missing in your reply.
OK, initial findings suggest I will have to create encoders. When a document is encoded in a charset for which there is no encoder, response data modification will have to be forfeited.
I think I will go on a usage basis. According to this page, it is probably worth providing encoders for the most common charsets:
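As a rough sketch of what such a custom encoder could look like, the hypothetical function below handles a deliberately partial subset of Windows-1251 (ASCII, the basic Cyrillic letters, and Ё/ё). The function name and the choice to return `null` for unmapped characters are assumptions made for illustration, not uBO's actual code.

```javascript
// Hypothetical, partial Windows-1251 encoder. Returns a Uint8Array, or
// null when a character has no mapping so the caller can leave the
// response unmodified.
function encodeWindows1251(text) {
    const out = new Uint8Array(text.length);
    for (let i = 0; i < text.length; i++) {
        const c = text.charCodeAt(i);
        if (c < 0x80) {
            out[i] = c;                     // ASCII maps to itself
        } else if (c >= 0x0410 && c <= 0x044F) {
            out[i] = c - 0x0410 + 0xC0;     // А..я map to 0xC0..0xFF
        } else if (c === 0x0401) {
            out[i] = 0xA8;                  // Ё
        } else if (c === 0x0451) {
            out[i] = 0xB8;                  // ё
        } else {
            return null;                    // unmapped: forfeit the modification
        }
    }
    return out;
}

const encoded = encodeWindows1251('Привет');   // bytes CF F0 E8 E2 E5 F2
const unsupported = encodeWindows1251('日本語'); // null
```

A complete encoder would also need the remaining 0x80 to 0xBF range of the code page, which is omitted here.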
charset | usage Jan 2018
---- | ----
UTF-8 | 90.5% (nothing to fix)
ISO-8859-1 | 4.3% (fixed)
Windows-1251 (case here) | 1.5% (fixed)
Shift JIS (#3399) | 0.8%
Windows-1252 | 0.7% (fixed)
GB2312 | 0.6%
... |
Windows-1250 (#3397) | 0.1% (fixed)
Total fixed | 97.1%
The Shift_JIS and GB2312 mappings will have to be loaded dynamically, on an as-needed basis, because these are huge character sets.
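One possible way to load large mappings only when needed is a small cache of promises, sketched below. All names here (`tableCache`, `tableLoaders`, `loadEncodingTable`) are hypothetical, and the loader is a stand-in returning a single sample entry rather than a real Shift_JIS table.

```javascript
// Cache of in-flight or completed table loads, keyed by charset label.
const tableCache = new Map();

// Hypothetical loaders. A real one might fetch a file bundled with the
// extension; this stand-in resolves immediately with one sample entry.
let shiftJisLoads = 0;
const tableLoaders = new Map([
    ['shift_jis', () => {
        shiftJisLoads += 1;
        return Promise.resolve(new Map([[0x3042, [0x82, 0xA0]]])); // あ
    }],
]);

// Returns a promise for the mapping table, starting the load only once.
function loadEncodingTable(charset) {
    const key = charset.toLowerCase();
    if (!tableLoaders.has(key)) {
        return Promise.resolve(null);   // unknown charset: nothing to load
    }
    if (!tableCache.has(key)) {
        tableCache.set(key, tableLoaders.get(key)());
    }
    return tableCache.get(key);
}

const first = loadEncodingTable('Shift_JIS');
const second = loadEncodingTable('shift_jis');
console.log(first === second);  // true
console.log(shiftJisLoads);     // 1
```

Caching the promise rather than the resolved table means concurrent requests for the same charset share one load.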