Vscode-powershell: Incorrect handling of Umlaut when debugging with PowerShell Extension

Created on 5 Feb 2019 · 7 Comments · Source: PowerShell/vscode-powershell

Issue Type: Bug

Debugging this code

$MyHashTable = [ordered]@{}
$MyHashTable.Add("Key1","Value1")
$MyHashTable.Add("KeyÄ2","ValueÄ2")
$MyHashTable

in VSCode with PowerShell Extension, incorrectly throws the following exceptions:

At C:\Untitled-1.ps1:3 char:24
+ $MyHashTable.Add("KeyÄ2","ValueÄ2")
+                        ~
Missing ')' in method call.

At C:\Untitled-1.ps1:3 char:24
+ $MyHashTable.Add("KeyÄ2","ValueÄ2")
+                        ~~~~~~~~~~~~~
Unexpected token '2","ValueÄ2"' in expression or statement.

At C:\Untitled-1.ps1:3 char:37
+ $MyHashTable.Add("KeyÄ2","ValueÄ2")
+                                     ~
Unexpected token ')' in expression or statement.

However, the code runs fine in PowerShell and debugs correctly in the ISE.

Extension version: 1.11.0
VS Code version: Code 1.30.2 (61122f88f0bf01e2ac16bdb9e1bc4571755f5bd8, 2019-01-07T22:54:13.295Z)
OS version: Windows_NT x64 10.0.17763


System Info

|Item|Value|
|---|---|
|CPUs|Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz (8 x 2712)|
|GPU Status|2d_canvas: enabled<br>checker_imaging: disabled_off<br>flash_3d: enabled<br>flash_stage3d: enabled<br>flash_stage3d_baseline: enabled<br>gpu_compositing: enabled<br>multiple_raster_threads: enabled_on<br>native_gpu_memory_buffers: disabled_software<br>rasterization: enabled<br>video_decode: enabled<br>video_encode: enabled<br>webgl: enabled<br>webgl2: enabled|
|Memory (System)|31.92GB (23.73GB free)|
|Process Argv||
|Screen Reader|no|
|VM|0%|


Area-Debugging Area-Documentation

All 7 comments

Thanks @MarcusLerch this seems like the same issue as #1680
We need to figure out how to properly configure the encoding settings and then document that.

This occurs because your PowerShell encoding and VSCode encoding are not configured the same way: your script file is encoded in UTF-8 and PowerShell is trying to read it as Latin-1, which is a setting that the PowerShell extension can't see or change. Can you try this https://github.com/PowerShell/vscode-powershell/issues/1680#issuecomment-453278540 and let us know if it works for you?
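
To make the mismatch concrete, here is a minimal sketch of what happens to a single 'Ä' when its UTF-8 bytes are decoded with the wrong code page. It assumes Windows-1252 is the encoding being applied (Latin-1 and Windows-1252 agree on most bytes, but not on 0x84) and that code page 1252 is available in the session. The variable names are illustrative only.

```powershell
# Build the character from its code point so that this snippet is immune
# to the very encoding problem it demonstrates.
$umlaut = [string][char]0x00C4    # 'Ä'

# The bytes a UTF-8 save writes to disk for 'Ä'.
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($umlaut)
$hex = ($utf8Bytes | ForEach-Object { $_.ToString('X2') }) -join ' '

# What a reader sees if it decodes those bytes as Windows-1252 instead.
$cp1252 = [System.Text.Encoding]::GetEncoding(1252)
$misread = $cp1252.GetString($utf8Bytes)
$codePoints = ($misread.ToCharArray() |
    ForEach-Object { 'U+{0:X4}' -f [int]$_ }) -join ' '

"UTF-8 bytes : $hex"           # C3 84
"Misread as  : $codePoints"    # U+00C3 U+201E
```

The second misread character, U+201E, is a low double quotation mark. As far as I know, PowerShell's tokenizer accepts typographic double quotes as string delimiters, which would explain why the parser in the report stops at char 24: the string appears to end right after the misread character, and the `2` that follows is unexpected.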

@SydneyhSmith that is actually fixing the problem. If I manually set the file to UTF8 Encoding debugging and running gives the same correct output. VSCode opened the file as UTF8 with BOM which then produces the error.
So I'm closing this thread.
Thanks for your support!

@MarcusLerch would you be able to share the output of $PSVersionTable? I'm interested in the fact that the BOM seemed to cause rather than resolve the problem.

My reasoning is:

  • UTF-8 is generally a good choice for non-ASCII characters, since the extended ASCII encoding for each character set is often different (e.g. latin-1 might solve the problem for 'ü' but not for Cyrillic characters, whereas UTF-8 solves it once and for all).
  • UTF-8 without BOM has become the dominant standard encoding scheme, but looks just like ASCII or latin-1 to things like Windows PowerShell. You can try to change the settings of that, but I see mixed reports about that working.
  • A lot of .NET BCL reader types (like System.IO.StreamReader) look for a BOM, and Windows PowerShell automatically "just works" when it sees a BOM like this (possibly why the configuration settings are so hard to make work).

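For reference, the UTF-8 BOM is just the three bytes EF BB BF at the start of the file. The sketch below shows one way to check for it. `Test-Utf8Bom` is a hypothetical helper written for this illustration, not part of PowerShell or the extension.

```powershell
# Hypothetical helper: returns $true when the bytes start with the UTF-8 BOM.
function Test-Utf8Bom {
    param([byte[]]$Bytes)
    return ($Bytes.Length -ge 3 -and
            $Bytes[0] -eq 0xEF -and
            $Bytes[1] -eq 0xBB -and
            $Bytes[2] -eq 0xBF)
}

# Encode the same text with and without the preamble (the BOM).
$text = 'Key' + [char]0x00C4 + '2'
$encoding = New-Object System.Text.UTF8Encoding($true)
[byte[]]$plainBytes = $encoding.GetBytes($text)
[byte[]]$bomBytes = $encoding.GetPreamble() + $encoding.GetBytes($text)

$plainHasBom = Test-Utf8Bom -Bytes $plainBytes    # False
$bomHasBom = Test-Utf8Bom -Bytes $bomBytes        # True
"Without BOM: $plainHasBom, with BOM: $bomHasBom"
```

To check a real script, pass it the file's bytes, for example `Test-Utf8Bom -Bytes ([System.IO.File]::ReadAllBytes($fullPath))`, where `$fullPath` is the absolute path to your own file.
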
@rjmholt sure, here you go

PS C:\> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.17763.134
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.17763.134
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

Taken from the VSCode PowerShell Terminal.
If I can check anything else to help find the cause, just let me know.

Just tried to reproduce the behavior and @rjmholt, you are correct.
UTF8 with BOM solves the problem, not plain UTF8.
I saved the file as UTF8 with BOM and now it runs and debugs correctly!
I then saved it as UTF8 and debugging no longer works.
So here are the steps to reproduce the error:

  1. Create a new file in VSCode
  2. Paste the following PowerShell code:
$MyHashTable = [ordered]@{}
$MyHashTable.Add("Key1","Value1")
$MyHashTable.Add("KeyÄ2","ValueÄ2")
$MyHashTable
  3. Save the file as a PS1 file; VSCode automatically saves it with UTF8 encoding
  4. Set a breakpoint and start debugging -> you get the error
  5. Save the file with encoding set to UTF8 with BOM -> everything works fine
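
The same comparison can be sketched without the debugger by writing the script in both encodings and asking the PowerShell parser to read each file. This is only an approximation of the steps above: the temp file names and the `Get-ParseErrorCount` helper are made up for this example, and the error count for the BOM-less file depends on the PowerShell version and, on Windows PowerShell, on the system's ANSI code page, so no expected output is given for it.

```powershell
# Build the script text from a code point so this file's own encoding
# cannot affect the result.
$umlaut = [char]0x00C4
$script = @(
    '$MyHashTable = [ordered]@{}'
    '$MyHashTable.Add("Key1","Value1")'
    ('$MyHashTable.Add("Key{0}2","Value{0}2")' -f $umlaut)
    '$MyHashTable'
) -join "`r`n"

# Write the same text twice: UTF-8 without BOM and UTF-8 with BOM.
$dir = [System.IO.Path]::GetTempPath()
$plainPath = Join-Path $dir 'umlaut-utf8.ps1'
$bomPath = Join-Path $dir 'umlaut-utf8bom.ps1'
[System.IO.File]::WriteAllText($plainPath, $script, (New-Object System.Text.UTF8Encoding($false)))
[System.IO.File]::WriteAllText($bomPath, $script, (New-Object System.Text.UTF8Encoding($true)))

# Hypothetical helper: parse a script file and count the parse errors.
function Get-ParseErrorCount {
    param([string]$Path)
    $tokens = $null
    $errors = $null
    [void][System.Management.Automation.Language.Parser]::ParseFile($Path, [ref]$tokens, [ref]$errors)
    return $errors.Count
}

$plainErrors = Get-ParseErrorCount -Path $plainPath
$bomErrors = Get-ParseErrorCount -Path $bomPath
$sizeDifference = (Get-Item -LiteralPath $bomPath).Length - (Get-Item -LiteralPath $plainPath).Length

"Parse errors without BOM: $plainErrors"
"Parse errors with BOM   : $bomErrors"
"Size difference in bytes: $sizeDifference"
```

The only difference between the two files is the three-byte BOM at the start, so any difference in parse errors comes from how the reader picks its encoding.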

Yeah I thought this might be the case -- basically the PowerShell tokenizer will detect the BOM and reconfigure itself. The .NET StreamReader classes all do something like this, and it often causes headaches.

When there's no BOM, PowerShell assumes by default that the encoding is latin-1/CP-1252 and that's when things break.
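
The detect-then-fall-back behaviour can be sketched with a plain .NET `StreamReader`. This is not PowerShell's actual script reader, only the same pattern: honour a BOM if one is present, otherwise decode with a fallback, here assumed to be code page 1252. `Read-WithFallback` is a hypothetical helper, and the `::new()` syntax assumes PowerShell 5.0 or later.

```powershell
# Hypothetical helper: decode bytes the way a BOM-aware reader with a
# Windows-1252 fallback would.
function Read-WithFallback {
    param([byte[]]$Bytes)
    $fallback = [System.Text.Encoding]::GetEncoding(1252)
    $stream = [System.IO.MemoryStream]::new($Bytes)
    # Third argument: detectEncodingFromByteOrderMarks.
    $reader = [System.IO.StreamReader]::new($stream, $fallback, $true)
    try {
        $text = $reader.ReadToEnd()
        # CurrentEncoding reflects the detected encoding only after a read.
        return [pscustomobject]@{
            Text     = $text
            Encoding = $reader.CurrentEncoding.WebName
        }
    }
    finally {
        $reader.Dispose()
    }
}

$text = 'Key' + [char]0x00C4 + '2'
$utf8 = [System.Text.UTF8Encoding]::new($true)
[byte[]]$plainBytes = $utf8.GetBytes($text)
[byte[]]$bomBytes = $utf8.GetPreamble() + $plainBytes

$plainResult = Read-WithFallback -Bytes $plainBytes
$bomResult = Read-WithFallback -Bytes $bomBytes

"No BOM : decoded as $($plainResult.Encoding), $($plainResult.Text.Length) characters"
"BOM    : decoded as $($bomResult.Encoding), $($bomResult.Text.Length) characters"
```

With the BOM, the text comes back as the original four characters. Without it, the fallback turns the two UTF-8 bytes of 'Ä' into two separate characters, so the string grows to five.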

You can try to configure PowerShell's encoding (@rkeithhill being a StackOverflow legend as always!), but in versions prior to 6, trying to get PowerShell to keep and honour your encoding settings is quite a dance.

So a BOM is the easiest way to be sure.

PowerShell 6+ defaults to UTF-8 without a BOM (but will happily accept a BOM as well), so you'll likely find this isn't a problem in PowerShell 6+.
