Notepad3: What is wrong with encoding auto-detection?

Created on 28 Jan 2020  ·  9 Comments  ·  Source: rizonesoft/Notepad3

On Win7 x64 with v5.20.127.2715

I don't know whether the problem occurs when I (Firefox) save the file or when I open it...

Check image

1° Just " í "
ImgSnap - 28-01-20 ~ 03 09 30

2° "Subtítulos"
ImgSnap - 28-01-20 ~ 03 12 30

3° Now changed again, this time written (misspelled) as "Súbtitulos"

ImgSnap - 28-01-20 ~ 03 12 57

2 and 3 have the same text, with just two letters changed...

With Notepad++
ImgSnap - 28-01-20 ~ 03 02 49

With the Notepad2 fork it works fine.

Files for test
files.zip

encoding detection

All 9 comments

Hello @RaiKoHoff ,
It's the same issue with a "Single UTF-8 Character" as in #1831 and #1848 😟

2020-01-28_100832

The Encoding Detection is based on byte-sequence analysis (frequency counts of byte sequences in typical text files of each encoding) and on the probability that the byte sequences of an unknown text fit that encoding.
Basing a probability on a single character in the text is a bad idea. So if you have lots of texts of that kind, it makes sense to switch OFF the ANSI Encoding Detection (or at least increase the confidence threshold).
Likewise, if you work mostly with UTF-8 files, it makes sense to switch OFF the ANSI Encoding Detection.
image
or
[Settings2] AnalyzeReliableConfidenceLevel=85
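
To illustrate why a single non-ASCII character is so ambiguous, here is a minimal Python sketch (not Notepad3 code). It encodes the strings from the report as UTF-8 and decodes the same bytes with two ANSI code pages. The choice of cp1252 and gbk is an assumption for illustration only; the thread does not say which encoding Notepad3 picked.

```python
# Sketch: the same bytes decode without error under several encodings,
# so one or two non-ASCII characters give a statistical detector very
# little evidence to separate them.
SAMPLES = [" í ", "Subtítulos", "Súbtitulos"]  # strings from the report
CODECS = ["utf-8", "cp1252", "gbk"]  # illustrative choices, not from the thread


def decode_all(text):
    """Encode text as UTF-8, then decode those bytes with each codec."""
    raw = text.encode("utf-8")
    return {codec: raw.decode(codec) for codec in CODECS}


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample.encode("utf-8").hex(), decode_all(sample))
```

All three decodings succeed, so byte validity alone cannot rule any of them out; only the statistics of the surrounding text can, and a text this short has almost none.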

With [Settings2] AnalyzeReliableConfidenceLevel=85 it works fine.

Hello @Mitezuss ,
I read your previous (deleted) message about the wrong encoding persisting after you set "AnalyzeReliableConfidenceLevel=85" in Notepad3.ini.

This is due to the setting "Remember Recent Files" ("Recordar archivos recientes"): Notepad3.ini keeps a list with the name, path and encoding (correct or incorrect) of each file that has been opened.

To resolve this issue, it is not necessary to modify Notepad3.ini manually.
Simply empty this list by unchecking "Remember Recent Files", quitting Notepad3, and then reopening it.
After that, if you wish, you can check "Remember Recent Files" again.

2020-01-28_232514

@hpwamr Oh, I see. Good to know.

Hello @Mitezuss ,

Feel free to test the BETA version "Notepad3Portable_5.20.131.2720_BETA.paf.exe.7z" or higher.
See "Notepad3 BETA-channel access #1129" or here Notepad3Portable_5.20.131.2720_BETA.paf.exe.7z.

Note: "Notepad3Portable BETA" can be used in "2 flavors" (with or without the extension ".7z").

Your comments and suggestions are always welcome... 😃

So I don't need to set that setting?

Increasing the "out-of-the-box" default confidence threshold (reliability of detection) means:
the "Encoding Detector" result is used only if the detector is very sure of that encoding.
The reason is that the Encoding Detector is optimized for prose text, whereas Notepad3 is aimed more at scripts and development source code, which are mostly a mix of ASCII keywords with (sometimes) language-specific strings ...
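
As a rough illustration of what such a threshold does, here is a minimal Python sketch. It is not Notepad3's implementation: the function name `choose_encoding`, the `>=` comparison, and the UTF-8 fallback are all assumptions made for the example. Only the threshold values 85 and 92 come from this thread.

```python
def choose_encoding(detected, confidence, threshold=92, fallback="utf-8"):
    """Accept the detector's guess only when it is confident enough.

    detected   -- encoding name proposed by a (hypothetical) detector
    confidence -- detector confidence in percent, 0 to 100
    threshold  -- minimum confidence required to trust the guess
    fallback   -- encoding used otherwise (assumed here, not from the thread)
    """
    return detected if confidence >= threshold else fallback


if __name__ == "__main__":
    # A weak guess based on very little evidence is rejected.
    print(choose_encoding("gbk", 60))
    # The same borderline guess passes at 85 but not at 92.
    print(choose_encoding("cp1252", 90, threshold=85))
    print(choose_encoding("cp1252", 90, threshold=92))
```

Raising the threshold trades a few missed ANSI detections for fewer wrong ones, which suits files that are mostly ASCII with an occasional accented character.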

AnalyzeReliableConfidenceLevel=92 is now the new internal value by default.
