The files.autoGuessEncoding=true setting doesn't work well in some circumstances.
I think it would be good to add a setting like files.forceEncoding="encode1:encode2,encode3:encode4".
That way a file detected as 'encode1' can be forced to 'encode2'. I think that would be a solution for wrong encoding detection.
Yes, I totally agree, because auto guess is so weak.
Adding a candidate list may be better!
For me, and maybe for many Chinese coders, UTF-8 and GB18030 are the only encodings we commonly meet, but auto-guess gives me Windows 1532??? I think it is easier to detect within the user's encoding candidates.
I agree. In my environment we have files in two encodings - UTF-8 and Windows1251 (most popular text file encoding in Russia), so I need to use encoding detection. However, it sometimes detects windows1251-encoded files as "maccyrillic" or "Windows1252" or some other encoding that I've never seen in my life :D
Definitely need a setting like
files.detectEncodings=["utf8","windows1251"]
So instead of just "true", you can specify which encodings you want it to detect from. As far as I know, encoding detection works based on probabilities (you can't say with 100% certainty which file is in which encoding, so the software has to pick the most probable answer), so I think it is possible to implement: just filter the list of possible encodings down to those the user selected.
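To illustrate the filtering idea above, here is a minimal sketch. It assumes the detector can hand back several candidates with a confidence each, which is an assumption and not something confirmed for jschardet in this thread. The name pick_allowed and the sample scores are made up for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical detector output: (encoding name, confidence in [0, 1]).
Candidate = Tuple[str, float]


def pick_allowed(candidates: Sequence[Candidate],
                 allowed: Sequence[str]) -> Optional[str]:
    """Return the most probable candidate that the user listed, or None."""
    allowed_set = {name.lower() for name in allowed}
    in_list = [c for c in candidates if c[0].lower() in allowed_set]
    if not in_list:
        return None  # nothing the user asked for was among the guesses
    return max(in_list, key=lambda c: c[1])[0]


# Made-up scores: the top guess is wrong, but the second one is in the list.
guesses = [("maccyrillic", 0.62), ("windows1251", 0.58), ("windows1252", 0.31)]
print(pick_allowed(guesses, ["utf8", "windows1251"]))
```

With these made-up scores the unwanted top guess is skipped and windows1251 is returned. What to do when nothing matches (the None case) is left open here.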
Verification: There is now a files.guessableEncodings setting where you can fill in encodings to support when guessing. From the explanation: If provided, will restrict the list of encodings that can be used when guessing. If the guessed file encoding is not in the list, the default encoding will be used.
Update: I decided to rename the setting to files.guessableEncodings
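As a rough sketch of the behavior described in that explanation (not the actual VS Code code), the rule can be written as a small function. The name resolve_guess, the "utf8" default, and the treatment of an empty list as "no restriction" are assumptions for illustration.

```python
from typing import Sequence


def resolve_guess(guessed: str,
                  guessable: Sequence[str],
                  default: str = "utf8") -> str:
    """Keep the detector's single guess only if it is in the user's list."""
    if not guessable:
        # Assumption: an empty or missing list means "do not restrict".
        return guessed
    return guessed if guessed in guessable else default


print(resolve_guess("windows1252", ["gbk"]))  # guess not in the list
print(resolve_guess("gbk", ["gbk"]))          # guess in the list
```

Note that the list only filters the detector's one answer. It does not make the detector choose among the listed encodings.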
@bpasero With these settings:
"files.autoGuessEncoding": true,
"files.guessableEncodings": [
"gbk"
]
I still get this file as UTF-8. It is in gbk encoding with two Chinese characters.
@octref you have to use a file that jschardet can detect properly. In your case it tells me:

So it makes sense that UTF-8 is used.
To verify you can use src/vs/base/test/node/encoding/fixtures/some.cp1252.txt with CP1252 encoding!
@bpasero I see, the logic is
files.guessableEncodings
But I would argue this doesn't solve the users' problems. Let's say the user has a bunch of files that he knows are in gbk encoding, but jschardet could have guessed either of these:

If the user wants all files to be opened as gbk, this setting would not work for him.
The original request is more for being able to set fallbacks. For example,
gb2312, gb18030, fall back to gbk.
Everything else falls back to utf-8.
A setting like this would be more useful:
{
    "files.encodingAssociations": {
        "gbk": ["gb2312", "gb18030"],
        "cp950": ["big5hkscs"]
        // Everything else falls back to "utf-8"
    }
}
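A minimal sketch of how such a mapping could be resolved, assuming the detector returns a single encoding name. files.encodingAssociations is only a proposal in this thread, and resolve_association is a hypothetical helper. Keeping a guess that already equals a target (for example "gbk") is also an assumption.

```python
from typing import Dict, List

# Mirrors the proposed setting: target encoding -> guesses that map to it.
ASSOCIATIONS: Dict[str, List[str]] = {
    "gbk": ["gb2312", "gb18030"],
    "cp950": ["big5hkscs"],
}


def resolve_association(guessed: str,
                        associations: Dict[str, List[str]],
                        fallback: str = "utf-8") -> str:
    """Map a guessed encoding to its configured target, else the fallback."""
    for target, sources in associations.items():
        # Assumption: a guess equal to the target itself is kept as-is.
        if guessed == target or guessed in sources:
            return target
    return fallback


for guess in ("gb2312", "big5hkscs", "maccyrillic"):
    print(guess, "->", resolve_association(guess, ASSOCIATIONS))
```

The user still has to know which wrong guesses to list on the right-hand side for each target.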
Maybe someone from this issue could comment if that was the desired solution or not (@JasonJunMa).
@bpasero in the implementation from the original pull request, the encoding falls back to the first one in the list instead of utf-8. It was definitely not a great solution. I think @octref's solution will resolve the issue.
It looks like @JasonJunMa and @phobos2077 both made different suggestions and the current solution is more towards https://github.com/Microsoft/vscode/issues/36951#issuecomment-344534006 while https://github.com/Microsoft/vscode/issues/36951#issue-268634326 is more towards https://github.com/Microsoft/vscode/issues/36951#issuecomment-425162895
Since we are late for the endgame and the feature is not clear, I will remove it from the release until we have figured out what the best solution is.
I'm facing this issue too, since a large portion of the codebase I work with is encoded in windows-1251 but is often guessed as maccyrillic.
From my point of view, @octref's solution is bulletproof, but it requires a user to learn which encoding jschardet will guess for pretty much each file in the codebase and to fine-tune preferences every time a new false-positive encoding is guessed. I think this behaviour can be implemented as a temporary solution.
From my point of view, the best solution is to fork jschardet and make it return a list of possible encodings with a probability for each. Then we can add a new setting (something like files.preferableEncodings) that holds a list of encodings; if an encoding from this list passes a certain threshold (which may also be configurable), it is chosen for opening the file instead of the most probable one. I think this solution will cover most cases, but if not, a user can fall back to the files.encodingAssociations setting proposed by @octref.
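The threshold idea could look roughly like the sketch below. It assumes a forked detector that returns (encoding, probability) pairs, which does not exist in this thread. files.preferableEncodings is only the proposed name, and choose_encoding, the sample scores, and the default threshold are made up. When several preferred encodings pass the threshold, this sketch takes the most probable one, which is also an assumption.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical output of a forked detector: (encoding, probability).
Candidate = Tuple[str, float]


def choose_encoding(candidates: Sequence[Candidate],
                    preferred: Sequence[str],
                    threshold: float = 0.2) -> Optional[str]:
    """Prefer a user-listed encoding if it is probable enough."""
    if not candidates:
        return None
    passing = [c for c in candidates
               if c[0] in preferred and c[1] >= threshold]
    if passing:
        # Assumption: among preferred encodings, take the most probable.
        return max(passing, key=lambda c: c[1])[0]
    # No preferred encoding is probable enough: keep the top guess.
    return max(candidates, key=lambda c: c[1])[0]


guesses = [("maccyrillic", 0.70), ("windows-1251", 0.60)]
print(choose_encoding(guesses, ["utf-8", "windows-1251"], threshold=0.5))
print(choose_encoding(guesses, ["utf-8", "windows-1251"], threshold=0.65))
```

With a threshold of 0.5 the preferred windows-1251 wins over the higher-scored maccyrillic. With 0.65 it no longer passes, so the top guess is kept.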
@bpasero @octref what do you think about this solution: use the same setting as was previously implemented (a single list of encodings), but with the last one on the list used as a fallback? It makes sense in terms of my original suggestion (narrow down the list of possible encodings to only the ones you need). But it is not as flexible as @octref's suggestion.
Edit: noticed this was already suggested before... How about this:
This should be easier to set up for most cases (like my case), but at the same time flexible enough for more complicated cases.
I use only 2 encodings: UTF-8 and Windows-1250 (Central European ANSI code page).
I set Auto Guess Encoding = true.
The problem is that Visual Studio Code incorrectly detects Windows-1250 as ISO 8859-2, and some letters are not displayed correctly.
What should I set files.guessableEncodings to, and where, in order to use Windows-1250 (Polish letters)?
I have the same use case as @Tomek-PL: we only use utf-8, windows-1250 or windows-1252. Files get detected as ISO 8859-7, which renders characters incorrectly.
Neither files.restrictGuessedEncodings nor files.guessableEncodings works.
Click "upvote" in the first post. This will increase the chance that someone will take care of it
Hi all,
What I need is something just like fileencodings in Vim (see https://vim.fandom.com/wiki/Working_with_Unicode).
It simply gives Vim an ordered list of encodings to try. I think it can resolve most ambiguous encoding detection, as I haven't had garbled text in Vim with the correct setting.
For example, I only use GB18030 and UTF-8, so I set the following in .vimrc:
fileencodings=gb18030,utf8
I think it is trivial to implement. @octref's logic is a bit complex, and in my view it may not be needed.
@bpasero's implementation may be OK if the guess list is tried in the order it is defined (but I haven't seen the implementation in a VS Code release).
Overall, we may
This is a rough suggestion; forgive me if it is wrong or a bother. Thanks.
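For reference, the ordered-list idea can be sketched without any detector at all: try each encoding in order and keep the first one that decodes the bytes without error. This is only an illustration of the approach, not how Vim or VS Code implement it. decode_with_candidates and the latin-1 last resort are assumptions.

```python
from typing import Sequence, Tuple


def decode_with_candidates(data: bytes,
                           encodings: Sequence[str],
                           last_resort: str = "latin-1") -> Tuple[str, str]:
    """Return (encoding, text) for the first encoding that decodes cleanly."""
    for enc in encodings:
        try:
            return enc, data.decode(enc)  # strict: raises on invalid bytes
        except UnicodeDecodeError:
            continue
    # Assumption: latin-1 as a last resort, since it accepts any byte.
    return last_resort, data.decode(last_resort, errors="replace")


gb_bytes = "编码".encode("gb18030")
utf8_bytes = "编码".encode("utf-8")
print(decode_with_candidates(gb_bytes, ["utf-8", "gb18030"]))
print(decode_with_candidates(utf8_bytes, ["utf-8", "gb18030"]))
```

The order matters: a permissive encoding placed early in the list may accept bytes that were actually written in a later one, so stricter encodings are better tried first.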
I just wanna say, the general issue here is that VSCode guesses encodings that are - from a human perspective - unlikely to appear in the user's environment.
I like @memeda's approach with the ordered list, that way you can specify what's most likely and VSCode takes that into account when guessing. It's just teaching the tool what's common sense to the user.
Think of how humans would interact:
That's IMHO the smartest and most user-friendly way.
I am also patiently waiting for this feature. At work we only use Windows-1252 and UTF-8, but VS Code keeps guessing Greek or maccyrillic or whatever.
Please upvote this thread. This will increase the chance that someone will take care of it.
Why is this still open? It's so annoying. The solution from 2.5 years ago would have been great...
It looks like @JasonJunMa and @phobos2077 both made different suggestions and the current solution is more towards #36951 (comment) while #36951 (comment) is more towards #36951 (comment)
Since we are late for the endgame and the feature is not clear, I will remove it from the release until we have figured out what the best solution is.
@bpasero
We need this https://github.com/microsoft/vscode/issues/36951#issuecomment-600964911
I am currently not able to catch up on this, but if someone can come up with a reasonable PR that includes the outcome of the discussions we had, then I can try to review it, time permitting.
It is issue grooming month and I am looking into this issue to understand the latest thinking. There are different proposals here, but I think my initial attempt showed that something like Vim's fileencodings config will not work, because of this case:
Let a user configure fileencodings: "gbk", "utf8". Let the user open a gbk file that jschardet wrongly detects as something else. Now we would use utf8 and not gbk because that other encoding is not in the list and also not wanted.
Bottom line, unless jschardet changes to a different model or we switch to another encoding guessing library, I do not really see how VSCode can solve this?
PS: I would like to merge https://github.com/microsoft/vscode/issues/84503 and this issue into one as I think they are very similar.
I think the ideal way is for the chardet lib itself to guess within a given range of encodings.
Otherwise, if the lib can return a list of guesses with confidence values, filter it by the user's "fileencodings" setting.
When the lib can only return one result and it is not in "fileencodings", which seems to be the current case, then without changing the lib, maybe show a notice saying the guess failed? It doesn't really solve the problem, but it's better than now.