Issue Type: Bug
This issue will happen when code is copied from an external source such as the internet.
At C:\Scripts\fail.ps1:2 char:12
+ New-Item –Path $env:TEMP\hello2.txt
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
The string is missing the terminator: ".
Extension version: 1.7.0
VS Code version: Code 1.22.2 (3aeede733d9a3098f7b4bdc1f66b63b0f48c1ef9, 2018-04-12T16:38:45.278Z)
OS version: Windows_NT x64 10.0.15063
System Info
|Item|Value|
|---|---|
|CPUs|Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz (4 x 2808)|
|Memory (System)|15.92GB (7.72GB free)|
|Process Argv|C:\Program Files\Microsoft VS Code\Code.exe|
|Screen Reader|no|
|VM|0%|
When the file created above is opened in PowerShell ISE, line 2 is highlighted and the en dash is shown as:
–
e.g.
New-Item -Path $env:TEMP\hello1.txt
New-Item –Path $env:TEMP\hello2.txt
VS Code displays the above as:
New-Item -Path $env:TEMP\hello1.txt
New-Item –Path $env:TEMP\hello2.txt
Thanks for this! Just curious, what do you expect to happen? That PowerShell would be able to handle the en dash? Or that VSCode would be more upfront about the fact that it is an en dash?
For me, using an en dash or em dash does break highlighting, which is an indicator that the wrong dash is being used, although not a very in-your-face one. This seems like something that should perhaps be handled by PSScriptAnalyzer though, as an en/em dash could be valid in a string and with positional parameters. PSScriptAnalyzer could have some sort of "potential illegal dash" warning that could be suppressed when it's intentional.
That's a great idea @omniomi 👍
We should open an issue on:
https://github.com/PowerShell/PSScriptAnalyzer
to create such a rule.
To clarify, is the problem that an em-dash is working like a dash, or that it's not working like a dash? Or is the problem that the encoding is bad and we're displaying it not as a dash but as a nasty character sequence?
PowerShell accepts all dash variants equally by design. So I guess there's a problem if we're not doing that.
If I type:
Get-Item —Path ~
And execute it with F8, it works.
But we don't get any completions on it, so we should definitely fix the regex/completions mechanism there.
@gbuktenica Can you explain what the hold-Alt functionality does and what provides it?
@rjmholt The Alt code is just there to show how to recreate the problem. A developer would never normally do this. Normally this issue arises when a segment of code is copied from an external source, say GitHub, Stack Overflow, or more likely a blog, and during the copy and paste a valid dash is converted to an en dash. Potentially the original hosting platform had already converted the dash.
Similar behavior can be seen with double quotes.
Type Write-Output "Blah" into Microsoft Word, then copy and paste it into VS Code and save.
Word converts the straight double quotes into opening and closing (curly) double quotes.
When the same text is pasted into PowerShell ISE, the opening and closing double quotes are correctly converted back to straight double quotes.
@omniomi and @tylerl0706 I have raised a request on PSScriptAnalyzer.
New Rule: Detect common Unicode character substitutions.
https://github.com/PowerShell/PSScriptAnalyzer/issues/981
So I've done some research and think I finally understand the Alt-code thing as a Windows CP1252 code-point entry feature, except that I can't find any specification of how it's supposed to function these days (i.e. how much code-point translation it does).
In CP1252, en-dash is code-point 150, but it looks like VSCode converts that to UTF-8, where it is encoded as 0xE2 0x80 0x93. Those bytes, when interpreted as CP1252, render as – (see the CP1252 link above again).
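The byte-level round trip described above can be reproduced directly. This is a minimal sketch, assuming a session where code page 1252 is available through `[System.Text.Encoding]::GetEncoding(1252)` (the case on Windows PowerShell); the variable names are illustrative only.

```powershell
# The en dash, U+2013, built from its code point so this snippet stays ASCII-only.
$enDash = [string][char]0x2013

# Encode it as UTF-8 and format the resulting bytes as hex.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($enDash)
$hex = ($utf8Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '

# Decode the same bytes as CP1252, as a reader that assumes an ANSI file would.
$cp1252  = [System.Text.Encoding]::GetEncoding(1252)
$garbled = $cp1252.GetString($utf8Bytes)

# List the code points of the mis-decoded text.
$codePoints = ($garbled.ToCharArray() | ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '

$hex         # E2 80 93
$codePoints  # U+00E2 U+20AC U+201C
```

The last code point, U+201C, is a left double quotation mark. Since PowerShell accepts typographic double quotes as string delimiters, this is a plausible explanation for the "missing the terminator" error when a UTF-8 file without a byte order mark is read as CP1252.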
Because I haven't been able to reproduce the Alt-code input, I don't know whether VSCode is inserting – into the buffer or putting – in, as it should. In the first case, that looks like a bug in VSCode not dealing with Windows Alt-codes. The second case seems like pretty reasonable behaviour: converting the Alt-code to the correct encoding to display the intended glyph in the document. The third possibility, that VSCode is displaying the en-dash but sending – to EditorServices for execution, would be especially bad, but it would definitely be a bug in VSCode.
But I did copy and paste an em-dash (—) into the buffer and it worked properly — recognised as a dash by the PowerShell parser/tokenizer, so that New-Item —Path ~/myfile.ps1 works when I press F8 (just not completions or syntax highlighting, which we should fix).
In terms of other applications converting characters and then copying them back into VSCode, I don't really see how we can filter characters copied into the document. Even if VSCode allows the extension to do it, PowerShell supports unicode strings, so backquotes and other things generated by MSWord are legitimate string contents.
Also, PowerShell already supports using left/right double quotes and en/em-dashes where ASCII quotes and dashes are conventionally used respectively. That is, they're not wrong or invalid — you're allowed to use them.
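One way to check this claim without running a command is to ask the PowerShell tokenizer how it classifies a parameter written with an en dash. This is a sketch using the public `Parser.ParseInput` API; the sample command line and variable names are illustrative.

```powershell
# Build a command line whose parameter starts with an en dash (U+2013).
$enDash = [char]0x2013
$line   = "New-Item ${enDash}Path hello2.txt"

# Parse the text only; nothing is executed.
$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput($line, [ref]$tokens, [ref]$errors)

# Keep only the tokens the tokenizer classified as parameters.
$parameterKind = [System.Management.Automation.Language.TokenKind]::Parameter
$parameters = @($tokens | Where-Object { $_.Kind -eq $parameterKind })

$errors.Count                  # number of parse errors
$parameters[0].ParameterName   # parameter name without the leading dash
```

If the tokenizer treats the en dash as a dash, there are no parse errors and the parameter name is reported as `Path`.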
Also @gbuktenica, hello from a fellow sandgroper.
@rjmholt Yes, I included the Alt-code thing so that a Windows developer could recreate the Unicode character I had the issue with (em-dash).
My issue arose when I copied example code into a ps1 file using VS Code and one of the dashes was an em-dash. Another developer using PowerShell ISE pulled my code from source control and got the execution error above, and the syntax highlighting showed the dash as –
Ah, I see!
Well, for completeness, VSCode lets you change your file encoding.
And it also looks like there's a way to make the ISE use UTF-8.
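As an illustration of the encoding route: Windows PowerShell and the ISE recognise UTF-8 when the file starts with a byte order mark. This is a sketch that writes a script as UTF-8 with a BOM using .NET file APIs; the temporary file name `hello-bom.ps1` is made up for the example.

```powershell
# Hypothetical file name in the temp directory, used only for this demonstration.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'hello-bom.ps1'
$text = 'New-Item ' + [char]0x2013 + 'Path $env:TEMP\hello2.txt'

# UTF8Encoding($true) emits the byte order mark EF BB BF at the start of the file.
$utf8WithBom = New-Object System.Text.UTF8Encoding $true
[System.IO.File]::WriteAllText($path, $text, $utf8WithBom)

# Read the first three bytes back to confirm the BOM is present.
$firstBytes = [System.IO.File]::ReadAllBytes($path)[0..2]
$bomHex = ($firstBytes | ForEach-Object { $_.ToString('X2') }) -join ' '

$bomHex   # EF BB BF

Remove-Item -Path $path
```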
So now that we have an issue on PSSA, do we have any action to take in the extension? I think no - and we can probably close this.
@rjmholt and I are discussing this. I think we generally agree that this is already being handled correctly in PowerShell, and that we shouldn't automatically transform what people paste into VS Code.
That being said, it does make sense to me to have a PSSA rule that can help detect where these characters are showing up in your file (and said PSSA rule can have a "quick fix" in VS Code that allows you to snap to one specific dash universally). However, I wouldn't place it at a particularly high priority.
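Until such a rule exists, a plain text scan gives a rough idea of what it would report. The function below is a hypothetical helper, not a PSScriptAnalyzer rule and not part of the extension; its name, the character ranges, and the output shape are all assumptions made for illustration.

```powershell
# Hypothetical helper: report typographic dashes and quotes in lines of script text.
function Find-TypographicCharacter {
    param(
        [Parameter(Mandatory)]
        [string[]] $Line
    )

    # U+2013..U+2015: en dash, em dash, horizontal bar.
    # U+2018..U+201E: curly single and double quote variants.
    $pattern = '[\u2013-\u2015\u2018-\u201E]'

    for ($i = 0; $i -lt $Line.Count; $i++) {
        foreach ($match in [regex]::Matches($Line[$i], $pattern)) {
            [pscustomobject]@{
                LineNumber = $i + 1
                Column     = $match.Index + 1
                CodePoint  = 'U+{0:X4}' -f [int][char]$match.Value
            }
        }
    }
}

# Sample input mirroring the report: the second line uses an en dash before Path.
$sample = @(
    'New-Item -Path $env:TEMP\hello1.txt'
    ('New-Item ' + [char]0x2013 + 'Path $env:TEMP\hello2.txt')
)
$findings = @(Find-TypographicCharacter -Line $sample)
$findings
```

For a real file, the lines could come from `Get-Content -Encoding UTF8`. Because these characters are legitimate inside strings, a scan like this can only flag candidates for review, which matches the suggestion that the warning be suppressible.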