Vscode: Search with non-standard encodings not supported

Created on 8 Feb 2019  ·  29 Comments  ·  Source: microsoft/vscode

Issue Type: Bug

Set workspace encoding to cp437.
Do a workspace search for anything.

The search box is surrounded in red, a popup appears underneath it saying "Unknown encoding: cp437".

I had the same problem once and found that I had to disable the search.useRipgrep option to get search working. That worked. But now this preference carries a warning that says "deprecated" and tells me to use pcre (which doesn't work).

That's a regression.

VS Code version: Code 1.31.0 (7c66f58312b48ed8ca4e387ebd9ffe9605332caa, 2019-02-06T08:51:24.856Z)
OS version: Linux x64 4.15.0-1032-oem


System Info

|Item|Value|
|---|---|
|CPUs|Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz (8 x 2874)|
|GPU Status|2d_canvas: enabled<br>checker_imaging: disabled_off<br>flash_3d: enabled<br>flash_stage3d: enabled<br>flash_stage3d_baseline: enabled<br>gpu_compositing: enabled<br>multiple_raster_threads: enabled_on<br>native_gpu_memory_buffers: disabled_software<br>rasterization: disabled_software<br>surface_synchronization: enabled_on<br>video_decode: unavailable_off<br>webgl: enabled<br>webgl2: enabled|
|Load (avg)|2, 3, 3|
|Memory (System)|15.39GB (1.47GB free)|
|Process Argv||
|Screen Reader|no|
|VM|0%|

Extensions (20)

Extension|Author (truncated)|Version
---|---|---
project-manager|ale|10.3.2
quitcontrol-vscode|art|3.0.0
better-toml|bun|0.3.2
whitespace-plus|dav|0.0.5
mustache|daw|1.1.1
gitlens|eam|9.5.0
EditorConfig|Edi|0.12.8
githd|hui|2.1.0
rpm-spec|Lau|0.2.3
vscode-duplicate|mrm|1.2.1
indent-rainbow|ode|7.2.4
vscode-subword-navigation|ow|1.2.0
vscode-docker|Pet|0.5.2
rust|rus|0.5.3
whitespace|san|0.0.5
crates|ser|0.3.6
code-settings-sync|Sha|3.2.4
local-history|xyz|1.7.0
plsql-language|xyz|1.7.0
markdown-all-in-one|yzh|2.0.1


feature-request search upstream

All 29 comments

Yes, searching this encoding is no longer supported. We could default to searching it as utf8 which would work for a-z at least, if that helps.
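To illustrate why a UTF-8 fallback would only help for a-z, here is a minimal sketch comparing the bytes that cp437 and UTF-8 produce for the same text. The helper name `encoded_forms` is made up for this illustration and is not part of VS Code or ripgrep.

```python
# Hypothetical helper, for illustration only: shows how the same text
# is stored on disk under cp437 and under UTF-8.
def encoded_forms(text):
    """Return the (cp437, utf-8) byte sequences for text."""
    return text.encode("cp437"), text.encode("utf-8")


ascii_cp437, ascii_utf8 = encoded_forms("search")
accent_cp437, accent_utf8 = encoded_forms("é")

# Plain ASCII letters are byte-identical in both encodings.
print(ascii_cp437 == ascii_utf8)
# An accented letter is one byte in cp437 but two bytes in UTF-8,
# so a UTF-8 search pattern never matches the cp437 bytes on disk.
print(accent_cp437, accent_utf8)
```

A pattern made only of ASCII characters would therefore still match, while anything accented would silently miss.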

Yes I can't change the workspace encoding to utf8 because file contents would be mangled. So only the search should change.
But then without a collation (I think that's the term) to flatten for example é, è, ê and ë to e (and also capitals), the search would fail in most cases, so I don't know to what extent it would be useful.
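The kind of flattening described here can be sketched with Unicode normalization. This is only an illustration of accent folding in general, not something VS Code search is known to do; `fold_accents` is a hypothetical name.

```python
import unicodedata


def fold_accents(text):
    """Strip combining accents and fold case (hypothetical helper)."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() also flattens capitals.
    return stripped.casefold()


folded = fold_accents("é è ê ë É")
print(folded)
```

Folding both the query and the file contents this way would let "e" match every accented variant, at the cost of no longer distinguishing them.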

Saying "it is no longer supported" is not a solution. What do we do now? That's the question!
The proposed workaround (utf8) is not a solution. We have accented characters in our code, and searching in cp437 encoding is a real need.

Yes I can't change the workspace encoding to utf8 because file contents would be mangled

I mean that you could convert the actual encodings of the files. Unfortunately I don't have another solution for this right now.
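For those who can convert their files, a minimal sketch of such a conversion might look like the following. It assumes the files really are cp437; `convert_file` is a hypothetical helper, and it rewrites the file in place, so it should only be run on a copy or under version control.

```python
import tempfile
from pathlib import Path


def convert_file(path, source_encoding="cp437"):
    """Re-encode one file from source_encoding to UTF-8, in place."""
    # Work on raw bytes so line endings are left exactly as they were.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


# Small demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"caf\x82\r\n")  # "café" stored as cp437
    convert_file(sample)
    converted = sample.read_bytes()

print(converted)
```

As later comments point out, this is not an option when the files must stay in their legacy encoding.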

If it could help:

When typing a letter and then type ENTER, it displays the results for less than one second:
image

And then the message "Unknown encoding : cp437" appears:

image

For information: search with cp437 encoding worked until the last release.

You are seeing some results from files that you have open because there we just search in the open buffer. But it is not able to search the full workspace.

It would be great if this feature were added. I need it for CP 852.
Anyway the vscode is awesome!

Why is it no longer supported when it's needed very often? How are we expected to go around it?
The VS Search function is basically useless atm

I use Windows-1252 encoding in some projects.
The files are being displayed and saved correctly.
The search shows broken characters every once in a while.

I am having this issue as well. Changing the encoding of the source files is not an option because, among other reasons, they contain strings that work with certain legacy devices (serial printers, displays, etc.) that only understand a particular encoding. Sure, I could add translation functions to those cases, convert everything to UTF8 to preserve comments, but that is considerable additional work just to accommodate to a feature that stopped working in the text editor we were using.

Same here for CP850 - this is really bad, since i have to seek for Regex-Patterns and can't seek for my language specific characters due to this issue.

I've just updated my VS Code to "April 2019 (version 1.34)", and the issue is still there...
Can we hope for a solution?

2019-05-17_10-08-31

Setting files.encoding per language works for me.
Just add the settings below to .vscode/settings.json.
"[c]": {
    "files.encoding": "cp950"
},
"[cpp]": {
    "files.encoding": "cp950"
}

@MaiGuybrush That way it works only partially. Git diff support and multi-file searching are broken for files with those encodings then.

@Somnium7 I confirm !

The issue is still there

I had the same problem. For me _maccenteuro_ was my non-standard encoding. It is used in specific language files where the encoding just can't be changed no matter what.

Solution (in settings.json):
1) I had default/global parameter _"files.encoding": "maccenteuro",_ - changed it back to default _"files.encoding": "utf8",_
2) I had _"search.useRipgrep": false,_ - commented that out
3) and under [myspecificlanguage] add that _"files.encoding": "maccenteuro"_

multi-file search works, open file search works, file is still in _maccenteuro_ and special characters didn't break
other language file searching (like .js for example) isn't affected

--
Make sure you don't have extra .vscode/settings.json in your source code folder. If you do, then either remove it or apply/merge the changes from global settings.json(user folder) to this .vscode/settings.json file as well.
I didn't have anything meaningful in my .vscode/settings.json so i just removed it.

Hi @gjans !

It works!

My problem was finally because I have two "settings.json"
The first is located in [Drive]:\USERS\[User]\AppData\Roaming\Code\User
The second is in the folder .vscode located in the root folder of the git directory

The modification given by @gjans must be done in both files!

@JSchiffmacher, happy this worked for you.
Good point about folder-specific settings - i've updated my previous message with extra info about that as well.

The problem for me is that I don't have an encoding per language but per project. Some legacy C++98 projects are in CP437 because their target platforms are tied to that encoding (serial printers, segment displays, etc.); newer platforms use UTF8 and C++17, for example. So per-project encoding settings are mandatory for me.

@Trucoto, if your projects can be referenced as folders, try creating a separate .vscode/settings.json file for each folder (project). Don't use a [language] sub-group, just put a single parameter _"files.encoding": "CP437"_.
As I understand this should overwrite your default encoding setting, but only inside the scope of a specific folder.

@gjans, that's how I have it. It works when searching within a file, but not with find in files (it reports "Unknown encoding: cp437"). I tried commenting out "search.useRipgrep": false and "search.useLegacySearch": true, but to no avail.
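While find in files rejects the encoding, a stopgap outside the editor is to decode each file explicitly before matching. The sketch below is only that, a sketch: `search_tree` is a hypothetical helper, it reads every file under the root regardless of type, and it has nothing to do with how VS Code or ripgrep search internally.

```python
import re
import tempfile
from pathlib import Path


def search_tree(root, pattern, encoding="cp437"):
    """Yield (path, line_number, line) for each line matching pattern."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Decode with the project's encoding; undecodable bytes (possible
        # with other encodings) are replaced instead of raising an error.
        text = path.read_bytes().decode(encoding, errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line


# Small demonstration on a throwaway cp437 file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes("première ligne\ncafé\n".encode("cp437"))
    hits = [(number, line) for _, number, line in search_tree(tmp, "é")]

print(hits)
```

Because the pattern is matched against decoded text, accented characters and regular expressions behave the same way as they do for ASCII.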

@gjans That does not work entirely. Search only works for strings without accented characters. Also, git diff incorrectly shows rows with accented characters as changed. Earlier, "search.useRipgrep": false worked, but not now.

@Somnium7 : search with accented characters is partially functional: it does not return all the files containing search string with accented characters. It's still better than before because before that did not work.

image

In the search above, I should have more than 1000 results ... and I only have 4!

People, please put "thumbs up" on this issue, so VS Code team can see it's important!

I had the same problem. For me _maccenteuro_ was my non-standard encoding. It is used in specific language files where the encoding just can't be changed no matter what.

Solution (in settings.json):

  1. I had default/global parameter _"files.encoding": "maccenteuro",_ - changed it back to default _"files.encoding": "utf8",_
  2. I had _"search.useRipgrep": false,_ - commented that out
  3. and under [myspecificlanguage] add that _"files.encoding": "maccenteuro"_

multi-file search works, open file search works, file is still in _maccenteuro_ and special characters didn't break
other language file searching (like .js for example) isn't affected

--
Make sure you don't have extra .vscode/settings.json in your source code folder. If you do, then either remove it or apply/merge the changes from global settings.json(user folder) to this .vscode/settings.json file as well.
I didn't have anything meaningful in my .vscode/settings.json so i just removed it.

I tried that, but it only gives results from the files that I've opened. Is there any other setting to change? I mainly use java and javascript:

{
    "workbench.startupEditor": "newUntitledFile",
    "workbench.colorTheme": "Oceanic Next",
    "explorer.confirmDragAndDrop": false,
    "explorer.confirmDelete": false,
    "window.zoomLevel": 0,
    // "search.useRipgrep": true,
    "files.encoding": "utf8",
    "java.errors.incompleteClasspath.severity": "ignore",
    "terminal.integrated.rendererType": "dom",
    "editor.suggestSelection": "first",
    "vsintellicode.modify.editor.suggestSelection": "automaticallyOverrodeDefaultValue",
    "java.configuration.checkProjectSettingsExclusions": false,
    // "editor.formatOnSave": true,
    // "prettier.trailingComma": "es5"
}

@DOHere You cannot do anything to make it work completely (aside from converting all your files to a standard encoding, which is not possible in many cases). That's the point of this issue.

@Somnium7 thanks for letting me know
rather than converting them to standard encoding, all of my scripts should have utf8 encoding, so shouldn't it work?

@DOHere I meant UTF8 as the standard encoding. When dealing with legacy software (and hardware), it's often not possible to convert your codebase to UTF8 because then it won't work.

Just had it today. I commit files with Windows-1252 encoding to a git repo.
Left is the version in git, right is the local version. I did not touch that line; the left pane just uses the standard encoding UTF-8 when reading the file from history, not the one I set in settings.json, which reads

{
    "[cfml]": {
        "files.encoding": "windows1252"
    }
}

image
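If the history side of the diff is indeed decoded as UTF-8, as this comment suggests, the mismatch can be reproduced in a few lines. This is a sketch of the decoding effect only, not of how VS Code reads files from git.

```python
# The same on-disk byte, read back with two different encodings.
raw = "é".encode("cp1252")  # a single byte in Windows-1252

# Decoding with the configured encoding recovers the character.
as_cp1252 = raw.decode("cp1252")

# Decoding as UTF-8 fails for this byte; with errors="replace" it
# becomes U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

print(repr(as_cp1252), repr(as_utf8))
```

The two sides of the diff then differ on every accented line even though the bytes in the file never changed.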
