PowerShell: Variable assignment to files ignores all encoding settings and uses ... ASCII?

Created on 15 Sep 2016  ·  5 comments  ·  Source: PowerShell/PowerShell

In PowerShell 5.1 and 6, we can control the encoding used by the redirection operators (> and >>), so we can get UTF-8 output when redirecting to a file. However, this does not work when writing to a file through variable syntax and assignment.

Steps to reproduce

Setting (or appending to) a file's content using variable notation with = or += ignores all of the encoding settings.

PS C:\Users\Jaykul> ${C:gear.txt} = "$([char]0x263C)"
PS C:\Users\Jaykul> gc .\gear.txt -Raw -Encoding Byte | % { "{0:x2} " -f $_ }
3f 0d 0a
PS C:\Users\Jaykul> ${C:gear.txt}
?

Expected behavior

It should work the same way that pipe redirection works:

PS C:\Users\Jaykul> [char]0x263C > gear.txt
PS C:\Users\Jaykul> ${C:gear.txt}
☼
PS C:\Users\Jaykul> gc .\gear.txt -Raw -Encoding Byte | % { "{0:x2} " -f $_ }
ff fe 3c 26 0d 00 0a 00

In PS 5.1 and 6, I can change the default to UTF-8, and redirection then encodes the output that way:

PS C:\Users\Jaykul> $PSDefaultParameterValues["Out-File:Encoding"] = "utf8"
PS C:\Users\Jaykul> [char]0x263C > gear.txt
PS C:\Users\Jaykul> gc .\gear.txt -Raw -Encoding Byte | % { "{0:x2} " -f $_ }
ef bb bf e2 98 bc 0d 0a
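To make the byte dumps above easier to read, here is a small sketch that reproduces them with .NET encodings instead of files. Get-HexDump is a hypothetical helper introduced only for this illustration; the trailing 0d 0a (or 0d 00 0a 00) in the transcripts is just the newline that the file-writing path appends, so it is left out here.

```powershell
# Hypothetical helper: render bytes the same way as the transcripts above
function Get-HexDump([byte[]] $Bytes) {
    ($Bytes | ForEach-Object { '{0:x2}' -f $_ }) -join ' '
}

$gear = [string][char]0x263C   # U+263C, the "☼" character

# ASCII has no code for U+263C; .NET's replacement fallback substitutes '?' (0x3f)
$ascii = Get-HexDump ([System.Text.Encoding]::ASCII.GetBytes($gear))

# UTF-16LE ("Unicode"): BOM ff fe, then the code unit 0x263C in little-endian order
$utf16 = [System.Text.Encoding]::Unicode
$unicode = Get-HexDump ($utf16.GetPreamble() + $utf16.GetBytes($gear))

# UTF-8 with BOM: ef bb bf, then the three-byte sequence for U+263C
$utf8 = [System.Text.Encoding]::UTF8
$utf8Dump = Get-HexDump ($utf8.GetPreamble() + $utf8.GetBytes($gear))

$ascii; $unicode; $utf8Dump
```

The ASCII result is what the variable-assignment repro produces, which is why reading the file back shows ? rather than ☼.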

Actual behavior

Neither the $PSDefaultParameterValues entries for Out-File, Set-Content, or Add-Content nor $OutputEncoding are respected; variable assignment still writes ☼ as 3f (?), as shown above.

Environment data

> $PSVersionTable

Name                           Value
----                           -----
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
GitCommitId                    v6.0.0-alpha.9
BuildVersion                   3.0.0.0
PSEdition                      Core
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSVersion                      6.0.0-alpha
WSManStackVersion              3.0
CLRVersion
Labels: Issue-Enhancement, Up-for-Grabs, WG-Language

All 5 comments

I'm curious why you think it's worth changing (improving?) this obscure syntax.

I did notice a curious performance difference, though, which suggests that changing the implementation to use a cmdlet _could_ have some value:

#153 PS> $x = "a"*100mb
#154 PS> measure-command { $x > .\foo.txt }

0.843s

#155 PS> measure-command { ${.\foo.txt} = $x }

1.443s

It was brought up in the community Slack last night, and I agree it's obscure, but it should really just be another way of performing file redirection.

File redirection is currently limited to writing files. Variable assignment can do much more.

So by that logic, should file redirection support everything that variable assignment does? I'd say probably not.

I'm not bothered about whether it's done via cmdlet or not, as long as there is _some way_ to control the encoding when necessary. And honestly, I _personally_ only want control over the encoding because I want consistency. I filed this because someone on Slack asked whether there was a way and, when I observed there was no mechanism, didn't want to file it themselves.

I think the syntax is particularly useful (and therefore worth improving with an encoding setting).

  • += doesn't inject line breaks the way >> does
  • I can append to the last line, rather than a new line
  • I can control whether the linebreaks are \r\n or \n
  • I can _get_ content with the same syntax
  • I can cast the data, modify it, and put it back...

I don't need encoding for this, but consider this as a simple example of the usefulness of the syntax. Keeping a persistent counter on disk:

(++[int]${C:counter})
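For comparison, here is a rough cmdlet-based sketch of the same persistent counter, written so the encoding can be chosen explicitly. Step-Counter and the counter.txt file in the temp directory are hypothetical names used only for illustration, and a missing file is treated as 0.

```powershell
# Hypothetical location for the counter file
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'counter.txt'

# Hypothetical helper: read the counter, increment it, write it back, return it
function Step-Counter {
    param([string] $Path)

    # A missing file counts as 0
    $value = 0
    if (Test-Path -LiteralPath $Path) {
        # Trim() drops the newline that Set-Content appended on the last write
        $value = [int](Get-Content -LiteralPath $Path -Raw -Encoding UTF8).Trim()
    }
    $value++

    # Unlike namespace notation, the encoding is explicit here
    Set-Content -LiteralPath $Path -Value $value -Encoding UTF8
    $value
}
```

It is clearly wordier than (++[int]${C:counter}), which is rather the point being made about the syntax.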

Intriguing, but the syntax indeed needs rescuing from obscurity: neither Get-Help about_Variables nor Get-Help about_Scopes mentions it.

At the very least, given that it still works on Windows, it should work on Unix too, which appears not to be the case as of v6-alpha16:

> Set-Location $HOME; ${/:foo.txt} = 'bar'
Access to the path '/:foo.txt' is denied.
...

Note that using /: as the namespace modifier is the analog of how the feature works on Windows: you use the _name of a PowerShell drive_ as the prefix. (More on that below.)

@lzybkr: That's why ${.\foo.txt} in your snippet doesn't actually create a file; it creates a regular variable literally named .\foo.txt. Your performance comparison therefore doesn't apply (its results are surprising, but that's a moot point). To verify, try ${.\foo.txt} = 'bar'; Get-Variable '.\foo.txt'.


The book _PowerShell in Action, 2nd Edition_ calls the feature _namespace variable notation_ and explains it as follows (emphasis added):

Along with the scope modifiers, the namespace notation lets you get at any of the resources surfaced in PowerShell as drives. For example, to get at the environment variables, you use the env namespace:
[...]
Using variable notation to access a file can be startling at first, but it’s a logical consequence of the unified data model in PowerShell. Because things like variables and functions are available as drives, things such as drives are also available using the variable notation. In effect, this is an application of the Model-View-Controller (MVC) pattern. Each type of data store (file system, variables, Registry, and so forth) is a “model.” The PowerShell provider infrastructure acts as the controller, and there are (by default) two views: the “file system” navigation view and the variable view. The user is free to choose and use the view most suitable to the task at hand.

This tells us that expressions such as $env:PATH are an application of this technique, and probably the only widely used one, even though most users are unaware of the underlying mechanism.

In practice, as of v5.1, only the following drive providers support namespace notation:

  • Environment (Env:)
  • Function (Function:)
  • Alias (Alias:)
  • FileSystem (C:, ...)
  • Variable (variable:) - though virtually pointless, given that omitting the namespace accesses variables by default

Notably, the following standard providers do not, with PowerShell complaining about IContentCmdletProvider not being implemented: Registry, Certificate, WSMan.
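One way to check which loaded providers qualify is to ask each provider's implementing type whether it implements IContentCmdletProvider. This is a sketch: the result depends on the platform and on which providers are loaded (Registry, Certificate, and WSMan exist only on Windows).

```powershell
# The provider interface behind Get-Content / Set-Content (and thus namespace notation)
$contentInterface = [System.Management.Automation.Provider.IContentCmdletProvider]

# Keep only the loaded providers whose implementing .NET type supports item content
$contentProviders = @(
    Get-PSProvider |
        Where-Object { $contentInterface.IsAssignableFrom($_.ImplementingType) } |
        ForEach-Object Name
)

$contentProviders
```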

That is, only providers that implement the IContentCmdletProvider interface (i.e., that support getting and setting an item's _content_) implicitly support namespace variable notation; e.g.:

  • $env:HOME is the same as Get-Content env:HOME
  • $env:foo = 'bar' is the same as Set-Content env:foo bar
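A quick sketch of that equivalence, assuming no environment variable named foo already exists in the session: the value written with namespace notation is visible through the content cmdlets, and vice versa.

```powershell
$env:foo = 'bar'                              # write via namespace notation
$viaCmdlet = Get-Content -Path env:foo        # read the same item via the content cmdlet

Set-Content -Path env:foo -Value 'baz'        # write via the content cmdlet
$viaNotation = ${env:foo}                     # braces form; same as $env:foo

Remove-Item -Path env:foo                     # clean up the demo variable
```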

Another potentially interesting application is in-place updating of files: the content is read and modified in memory, and the original file is updated directly. Note, however, that _the default encoding is invariably applied_:

${c:foo.txt} = 'hi'        # Write to file foo.txt.
${c:foo.txt} = ${c:foo.txt} -replace 'i', '@' # Update foo.txt *in place*

In addition to the noted inability to control the character encoding when writing to files with namespace notation, the notation is generally limited to _literal_ paths, so you can only target files by literal name or path, either relative to the current location or absolute.
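Until namespace notation gains an encoding setting, a cmdlet-based sketch of the same in-place update avoids both limitations. The foo.txt file in the temp directory is a hypothetical target. Note that -Encoding UTF8 writes a BOM in Windows PowerShell 5.1 but not in PowerShell 6+, where utf8BOM and utf8NoBOM can be used to choose explicitly.

```powershell
# Hypothetical target file; -LiteralPath mirrors namespace notation,
# while -Path would additionally accept wildcard patterns
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'foo.txt'

# Counterpart of ${c:foo.txt} = 'hi', but with an explicit encoding
Set-Content -LiteralPath $path -Value 'hi' -Encoding UTF8

# Counterpart of the in-place -replace update. The parentheses make Get-Content
# read the whole file before Set-Content reopens it for writing; -NoNewline keeps
# the trailing newline already present in the -Raw content from being doubled.
(Get-Content -LiteralPath $path -Raw -Encoding UTF8) -replace 'i', '@' |
    Set-Content -LiteralPath $path -Encoding UTF8 -NoNewline
```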
