Set-Content -Value $null -Path .\zero.txt -NoNewline -Encoding Ascii
(get-item .\zero.txt).Length -eq 0 # returns $true
$content = gc .\zero.txt -Raw -Encoding Ascii
$content -eq $null # returns $true
$null -eq $content # returns $true
$content -match 'anything' # returns nothing, but should return $false
[System.Management.Automation.Internal.AutomationNull]::Value -match 'anything' # ditto
$null -match 'anything' # returns $false
Expected behavior:
$true
$true
$true
$false
$false
$false
Actual behavior:
$true
$true
$true
# returns nothing at all
# returns nothing at all
$false
This makes it more difficult to write scripts that process content in files, because tests that should fail return nothing instead. If you were checking for a failure and then jumping to the next iteration of the loop with continue, your continue does not get called, and unpredictable things can happen as a result.
Reproduced in PowerShell 5.1 and 6.0.
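To make the `continue` pitfall concrete, here is a minimal sketch (all variable names are hypothetical) contrasting a check that compares the operator's result with `$false` against one that uses `-not`. Because an empty array is falsy, the `-not` form treats AutomationNull.Value (called `$nullCollection` below) as "no match" whether `-match` returns an empty array, as reported here, or `$false`.

```powershell
# Stand-ins for what Get-Content might hand back for an empty file (names are hypothetical).
$scalarNull     = $null
$nullCollection = [System.Management.Automation.Internal.AutomationNull]::Value

# Fragile: in the affected versions, $nullCollection -match 'anything' yields an
# empty array; comparing that with $false yields another empty array, which is
# falsy, so a "skip this file" branch guarded like this never runs.
$fragileSkip = [bool](($nullCollection -match 'anything') -eq $false)

# More robust: -not treats both $false and an empty array as "no match".
$robustSkipScalar     = -not ($scalarNull -match 'anything')
$robustSkipCollection = -not ($nullCollection -match 'anything')
$robustSkipMatch      = -not ('has anything in it' -match 'anything')

"fragile skip: $fragileSkip; robust skips: $robustSkipScalar, $robustSkipCollection, $robustSkipMatch"
```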
@KirkMunro Thanks for your report! Do you plan to make the fix?
Interesting issue.
AutomationNull.Value is intended to convey "no results", which is different from $null. So I think this is by design.
And indeed, here is the code that explicitly treats AutomationNull.Value as an empty collection when we are checking whether an object is a collection.
Maybe AutomationNull.Value could have been an empty collection in the first place (and hence not equal to $null), but that decision was made before I started.
I hope someone will resolve this annoying bug.
If the AutomationNull.Value behaviour is by design, maybe this is an issue with Get-Content in the FileSystem provider.
My expectation is that when I invoke Get-Content _filename_ on a text file, I will get back either a string or an array of strings, depending on the content and on whether I used -Raw. I have this expectation especially when I use the -Raw switch, but also when I don't. Certainly for a zero-byte text file, this should give me back an empty string. This expectation is not surprising, given that the command metadata reports the OutputType as System.Byte or System.String. Further evidence that supports my expectation is the following:
# Create a zero-byte, empty ASCII file
Set-Content -LiteralPath .\empty.txt -Value '' -NoNewLine -Encoding Ascii
# I created the content using an empty string, so when I Get-Content -Raw,
# shouldn't I get back an empty string?
$content = Get-Content -LiteralPath .\empty.txt -Raw
$content -is [string] # returns $false
That script shows that you cannot round-trip empty content into a text file and back out again, because the command returns AutomationNull.Value instead.
For this specific issue, given the questions about whether or not AutomationNull.Value should be treated as a collection, I think fixing Get-Content would be helpful; however, would that be a breaking change?
Maybe this is going to force me into using strong typing for my variables, because forcing the results of Get-Content into a string makes this problem go away. I feel that force shouldn't be necessary though, especially because I asked for the -Raw string output from the file.
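A minimal sketch of the strong-typing workaround, assuming a scratch file in the temp directory (the file name is hypothetical): constraining the variable to [string] converts whatever Get-Content -Raw produces for a zero-byte file into an empty string.

```powershell
# Hypothetical scratch file; -Force (re)creates it as a zero-byte file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-raw-empty-sketch.txt'
$null = New-Item -ItemType File -Path $path -Force

# Without a type constraint (in the versions discussed in this issue),
# the result for an empty file is not a string.
$untyped = Get-Content -LiteralPath $path -Raw

# With a [string] type constraint, "no output" is converted to ''.
[string]$typed = Get-Content -LiteralPath $path -Raw

"untyped is [string]: $($untyped -is [string]); typed is [string]: $($typed -is [string]); typed length: $($typed.Length)"
Remove-Item -LiteralPath $path
```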
From the docs:
This cmdlet returns strings or bytes. The output type depends upon the content that it gets.
So I expect:
Why would you expect Get-Content -Raw to return $null? It always returns a string, even for binary files, unless the file is empty (in which case it would make sense for it to return an empty string, no?).
From the FileSystem provider documentation:
-Raw
Ignores newline characters. Returns contents as a single item.
Yes, Get-Content always returns strings, with -Raw and without. I agree that the cmdlet should return an empty string for an empty file.
"Certainly in a zero-byte text file, this should give me back an empty string."
How does one know what the format of a zero-byte file is? If PowerShell were required to maintain a mapping of all known file extensions/MIME types, each with an associated "empty" result, that would be unmanageable. It would also be impossible on Linux, which has no hoots to give about file TLEs. :)
Update: This isn't directed at you, Kirk. Just a general statement. I realize that the standard seems to be byte or string. I guess strings are seen as just more manageable than void or $null - even empty ones.
Also, isn't $null coerced to an empty string if required? I think the AutomationNull.Value idea was sound but it seems difficult to be consistent with. Argh...
@oising But with Get-Content -Raw, in my testing it always returns a string, regardless of the file's format (PDF, BMP, ZIP, etc.), so the file format has nothing to do with it. I may have seen it return bytes at one point, but right now I'm not sure when it returns bytes instead of strings. Since it returns strings for all files, I think it makes sense for it to return an empty string for a zero-byte file.
I think AutomationNull.Value when you invoke a command to get object data like services or processes and nothing comes back is sound. I'm not sold on AutomationNull.Value as a way to represent an empty file though, when all other file content comes back as string.
I can work around this in all sorts of ways (for example, by strongly typing a variable as string and assigning the results of Get-Content to it), but beyond the inconsistency, I think the potential for scripts to do unexpected (or maybe undesirable) things when they encounter a zero-byte file warrants rethinking the original design decision (while evaluating whether or not it's a breaking change that could break someone's code). The current behaviour is not intuitive enough for script authors to anticipate, which is why I brought it here as a bug to discuss.
Get-Content will return bytes if you specify -Encoding Byte. If the file is empty, then you get a $null. So the user can determine whether the file is read as text (encoded as ASCII, Unicode, UTF-8) or as binary by specifying the appropriate encoding.
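A sketch of reading a file as bytes, assuming a scratch file (hypothetical name) that holds two known bytes. -Encoding Byte is the Windows PowerShell 5.1 spelling; as far as I know, PowerShell 6 and later replaced it with the -AsByteStream switch, hence the version check.

```powershell
# Hypothetical scratch file containing the two bytes 'A' (65) and 'B' (66).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-bytes-sketch.bin'
[System.IO.File]::WriteAllBytes($path, [byte[]](65, 66))

# Pick the byte-reading parameter that matches the running version.
$byteParams = if ($PSVersionTable.PSVersion.Major -ge 6) {
    @{ AsByteStream = $true }   # PowerShell 6+
} else {
    @{ Encoding = 'Byte' }      # Windows PowerShell 5.1
}

# Each byte is emitted as a separate object; @() collects them into an array.
$bytes = @(Get-Content -LiteralPath $path @byteParams)
"read $($bytes.Count) byte(s): $($bytes -join ', ')"
Remove-Item -LiteralPath $path
```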
Thanks @rkeithhill. I knew I had seen it before, but it wasn't something I have used frequently.
There are two distinct issues here:
(a) Get-Content's behavior
(b) [System.Management.Automation.Internal.AutomationNull]::Value behavior as the LHS of array-aware operators.
(a) Get-Content's behavior
I agree that Get-Content -Raw, when given an empty input file, should return a _scalar_ rather than [System.Management.Automation.Internal.AutomationNull]::Value, the latter signaling an empty _collection_.
By contrast, it is appropriate - and consistent with current behavior - for Get-Content _without_ -Raw to return [System.Management.Automation.Internal.AutomationNull]::Value, because a _collection_ is expected - be that one of _lines_ or _bytes_ (with -Encoding Byte).
Arguably, with -Raw that scalar should be '' (the empty string, which unambiguously implies an empty file), but even a _bona fide_ $null is preferable to the current behavior.
@PetSerAl has done great sleuthing on SO to come up with a way to inspect whether a given value is actually $null or [System.Management.Automation.Internal.AutomationNull]::Value:
New-Item -Type File zero.txt # create 0-byte file
$refEquals=[Object].GetMethod('ReferenceEquals')
# Should be and is $True
$refEquals.Invoke($null, @((Get-Content zero.txt), [System.Management.Automation.Internal.AutomationNull]::Value))
# Should be and is $True
$refEquals.Invoke($null, @((Get-Content -Encoding Byte zero.txt), [System.Management.Automation.Internal.AutomationNull]::Value))
# !! Should be $False, but is $True
$refEquals.Invoke($null, @((Get-Content -Raw zero.txt), [System.Management.Automation.Internal.AutomationNull]::Value))
(b) [System.Management.Automation.Internal.AutomationNull]::Value behavior as the LHS of array-aware operators
In short: The treatment of [System.Management.Automation.Internal.AutomationNull]::Value is _inconsistent_:
* -match interprets [System.Management.Automation.Internal.AutomationNull]::Value as an _array_ (collection).
* -eq, -le, -ge and their variations treat it as (scalar) $null (haven't looked at others).
[System.Management.Automation.Internal.AutomationNull]::Value -match 'anything' returning "nothing" (an empty [System.Object[]] instance) is defensible: an empty _collection_ as the LHS to which a filtering operator is applied can only ever return that empty collection, albeit converted to an _empty array_ by PowerShell.
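The scalar-vs.-collection split in -match that this argument relies on can be seen with ordinary values. A quick sketch (the sample strings are arbitrary):

```powershell
# Scalar LHS: -match returns a Boolean.
$scalarResult = 'banana' -match 'an'

# Collection LHS: -match acts as a filter and returns the matching elements.
$filtered = @('apple', 'banana', 'cherry') -match 'an'

# Empty collection LHS: the filter can only ever return an empty array.
$emptyResult = @() -match 'an'

"scalar: $($scalarResult.GetType().Name) $scalarResult; filtered: $($filtered -join ','); empty count: $($emptyResult.Count)"
```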
By contrast, here are some sample commands that demonstrate (scalar) $null treatment with -eq, -le, and -ge:
> [System.Management.Automation.Internal.AutomationNull]::Value -eq $null; $null -eq $null
True
True
> [System.Management.Automation.Internal.AutomationNull]::Value -eq 0; $null -eq 0
False
False
# With any negative RHS value, both yield $False.
> [System.Management.Automation.Internal.AutomationNull]::Value -le 0; $null -le 0
True
True
# With any negative RHS value, both yield $True.
> [System.Management.Automation.Internal.AutomationNull]::Value -ge 0; $null -ge 0
False
False
On a side note, I find it baffling that comparing $null to anything other than $null can return $true: for instance, why are $null -lt 0 and $null -gt -1 both $true?
You actually don't need to use reflection to identify automation null. For example:
$x = $null
$y = [System.Management.Automation.Internal.AutomationNull]::Value
foreach ($item in 'x','y') {
    $value = Get-Variable -Name $item -ValueOnly
    if ($value -eq $null) {
        if (@($value).Count -eq 0) {
            "`$${item} is [System.Management.Automation.Internal.AutomationNull]::Value"
        } else {
            "`$${item} is `$null"
        }
    }
}
I just added a comment to @PetSerAl's post sharing the same information.
For Get-Content's behaviour, I still expect an empty string when invoking Get-Content with encoding set to anything other than Byte. If you invoke Get-Content against a file containing a single line of text, you get back a string, not an array. An empty string is a much better representation of an empty file than $null. Consider non-ASCII files (e.g. UTF-8): they can have a byte order mark included in them, so would $null be a good representation of their content when retrieved using the proper encoding?
All of these details aside, before I spend more time on this and before I could consider looking at the code to apply a fix for this, my concern is that these changes, regardless of what form they would take, would be breaking changes and rejected accordingly, resulting in wasted time and effort. The more I think about it, the more I feel that is what would happen, because someone may very well have scripts written that look something like this:
foreach ($filePath in Get-ChildItem -Recurse -Filter *.txt) {
    $content = @(Get-Content $filePath)
    # If the file is empty, skip it
    if ($content.Count -eq 0) {
        continue
    }
    # Other file processing goes here...
}
Or, considering the use of -Raw, someone may have scripts that do this:
foreach ($filePath in Get-ChildItem -Recurse -Filter *.txt) {
    $content = Get-Content $filePath -Raw
    # If the file is empty, skip it
    if ($content -eq $null) {
        continue
    }
    # Other file processing goes here...
}
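For scripts that must behave the same however this issue is resolved, one option is to test for null-or-empty instead of comparing with $null. The sketch below assumes a throwaway directory with one empty and one non-empty file (all names are hypothetical); [string]::IsNullOrEmpty() reports both $null (a null collection is passed to a .NET method as null) and '' as empty.

```powershell
# Hypothetical throwaway directory with one empty and one non-empty file.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('gc-skip-sketch-' + [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $dir
$null = New-Item -ItemType File -Path (Join-Path $dir 'empty.txt')
Set-Content -LiteralPath (Join-Path $dir 'full.txt') -Value 'hello'

$processed = foreach ($file in Get-ChildItem -LiteralPath $dir -Filter *.txt) {
    $content = Get-Content -LiteralPath $file.FullName -Raw
    # Skip the file whether -Raw yields $null, a null collection, or ''.
    if ([string]::IsNullOrEmpty($content)) {
        continue
    }
    # Other file processing goes here...
    $file.Name
}

"processed: $($processed -join ', ')"
Remove-Item -LiteralPath $dir -Recurse
```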
With those possibilities in mind, the proposed changes to Get-Content should be rejected as breaking changes, regardless of whether or not we change the result when it is not invoked with -Raw, shouldn't they?
That brings me back to how AutomationNull.Value is treated like a collection when used with -match/-notmatch or -like/-notlike, but not -eq/-ne. If it looks like $null but doesn't act like $null, it must be AutomationNull.Value. Try explaining how AutomationNull.Value works, coupled with considerations you should take into account when you are scripting around AutomationNull.Value, to a classroom and see how well they understand it afterwards.
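One concrete way in which AutomationNull.Value "looks like $null but doesn't act like $null" is the pipeline: $null is sent through as one object, whereas AutomationNull.Value enumerates to nothing. A minimal sketch (variable names are hypothetical):

```powershell
$scalarNull     = $null
$nullCollection = [System.Management.Automation.Internal.AutomationNull]::Value

# In an expression, both compare equal to $null.
$bothEqualNull = ($null -eq $scalarNull) -and ($null -eq $nullCollection)

# In the pipeline, only the scalar $null is passed through to the script block.
$fromScalar     = @($scalarNull | ForEach-Object { 'ran' })
$fromCollection = @($nullCollection | ForEach-Object { 'ran' })

"equal to null: $bothEqualNull; runs via `$null: $($fromScalar.Count); runs via AutomationNull: $($fromCollection.Count)"
```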
@KirkMunro:
Thanks for that handy alternative for detecting [System.Management.Automation.Internal.AutomationNull]::Value; to summarize:
> $scalarNull = $null; $collectionNull = [System.Management.Automation.Internal.AutomationNull]::Value
> @($scalarNull).Count
1
> @($collectionNull).Count
0
@KirkMunro:
For Get-Content's behaviour, I still expect an empty string when invoking Get-Content with encoding set to anything other than Byte
I would expect that with -Raw _only_, but not otherwise. Without -Raw, Get-Content inherently retrieves a _collection_ of lines, and returning a value that signals "no items in this collection" seems appropriate.
Consider non-ASCII files (e.g. UTF-8). They have a byte order mark included in them, so would $null be a good representation of their content when retrieved using the proper encoding?
With -Raw, distinguishing between a true zero-byte file and one _solely_ comprising a BOM (Unicode signature) would be the only argument for using $null for a zero-byte file and '' for a Unicode-signature-only file.
But my sense is that this distinction is not worth making.
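If that distinction ever did matter, the file's length on disk can make it, independently of what Get-Content returns. A sketch assuming two scratch files (hypothetical names), one zero-byte and one holding only a UTF-8 BOM; it relies on Get-Content recognizing the BOM when no -Encoding is given.

```powershell
# Hypothetical scratch files: one zero-byte, one holding only a UTF-8 BOM.
$tmp      = [System.IO.Path]::GetTempPath()
$zeroPath = Join-Path $tmp 'gc-zero-sketch.txt'
$bomPath  = Join-Path $tmp 'gc-bom-only-sketch.txt'
[System.IO.File]::WriteAllBytes($zeroPath, [byte[]]@())
[System.IO.File]::WriteAllBytes($bomPath, [byte[]](0xEF, 0xBB, 0xBF))

# Read both as text; the [string] constraint turns "no output" or $null into ''.
[string]$zeroText = Get-Content -LiteralPath $zeroPath -Raw
[string]$bomText  = Get-Content -LiteralPath $bomPath -Raw

# The text is empty in both cases; only the on-disk length tells them apart.
$zeroLength = (Get-Item -LiteralPath $zeroPath).Length
$bomLength  = (Get-Item -LiteralPath $bomPath).Length

"zero-byte: '$zeroText' ($zeroLength bytes); BOM-only: '$bomText' ($bomLength bytes)"
Remove-Item -LiteralPath $zeroPath, $bomPath
```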
would be breaking changes and rejected accordingly
Note that a proposed change being breaking _may_ be a reason for rejection, but doesn't have to be.
because someone may very well have scripts written that look something like this
$content = @(Get-Content $filePath)
# If the file is empty, skip it
if ($content.Count -eq 0) {
That should continue to work fine if a change is restricted to -Raw's behavior.
Or, considering the use of -Raw, someone may have scripts that do this:
$content = Get-Content $filePath -Raw
if ($content -eq $null) {
continue
}
That would indeed be a breaking change (unless we make -Raw return $null for both zero-byte _and_ Unicode-signature-only files - which is still worth considering, given that $null behaves like '' in most contexts).
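A quick sketch of the distinction this hinges on: '' and $null convert to the same string and are both falsy, but -eq does not treat them as equal. That is why a `$content -eq $null` check would stop skipping empty files if -Raw returned '' rather than $null.

```powershell
# Alike: both convert to the same (empty) string, and both are falsy.
$sameString = ([string]$null -eq '') -and ("$null" -eq '')
$bothFalsy  = (-not $null) -and (-not '')

# Different: '' is not equal to $null, in either operand order.
$emptyEqualsNull = ('' -eq $null) -or ($null -eq '')

"same string: $sameString; both falsy: $bothFalsy; '' equals `$null: $emptyEqualsNull"
```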
That brings me back to how AutomationNull.Value is treated like a collection when used with -match/-notmatch or -like/-notlike, but not -eq/-ne. If it looks like $null but doesn't act like $null, it must be AutomationNull.Value. Try explaining how AutomationNull.Value works, coupled with considerations you should take into account when you are scripting around AutomationNull.Value, to a classroom and see how well they understand it afterwards.
I agree that the current behavior is inconsistent and confusing.
No PowerShell user should ever have to learn about [System.Management.Automation.Internal.AutomationNull]::Value (unless they like that sorta thing), but if the fundamental scalar / collection distinction worked _consistently_, they wouldn't _need_ to.
To summarize: If backward compatibility _weren't_ an issue, the following _should_ be fixed:
* What Get-Content -Raw returns.
* Ensuring consistent behavior of array-aware operators with [System.Management.Automation.Internal.AutomationNull]::Value as the LHS.
Assuming we're in agreement there: What do the powers that be think?
There's one point in your comment that I get stuck on.
Without -Raw, Get-Content inherently retrieves a collection of lines, and returning a value that signals "no items in this collection" seems appropriate.
That's actually not true. If a file contains one line, Get-Content does not return a collection of one item. It just returns the only line that is in the file (i.e. it returns a string). That's why I was leaning towards the behaviour of both Get-Content and Get-Content -Raw being consistent when the file is either empty or contains one line.
Regardless, I also want to hear from the PowerShell team because the point may be moot otherwise.
It just returns the only line that is in the file (i.e. it returns a string).
And that is the PowerShell way, no? I mean that is one reason we have @() to force an array when we get a scalar. It is also why foreach will iterate a scalar (exactly once).
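A small sketch of those two points, using string literals as hypothetical stand-ins for what Get-Content returns for one-line and two-line files:

```powershell
$oneLine  = 'only line'           # like Get-Content on a one-line file: a scalar [string]
$twoLines = 'first', 'second'     # like Get-Content on a two-line file: an [object[]]

# @() guarantees an array, so .Count reflects the number of lines either way.
$counts = @($oneLine).Count, @($twoLines).Count

# foreach iterates a scalar exactly once.
$iterations = 0
foreach ($line in $oneLine) { $iterations++ }

"counts: $($counts -join ', '); iterations over scalar: $iterations"
```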
If you'll indulge me:
The arc of PowerShell history is long, but it bends toward collections.
The special-casing of one-element collections has always been a pain point - until PSv3, the Great Unifier, came along and allowed us to treat even scalars as if they're collections.
that is one reason we have @()
In that vein: @() is something we now _mostly do not need anymore_ (except if there's a chance that an element of the collection has .Length / .Count properties of its own or itself supports indexing).
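The caveat in parentheses is easiest to see with a string, which has its own .Length and indexer. A sketch, again with a literal as a hypothetical stand-in for one-line Get-Content output:

```powershell
$lines = 'hello'   # what a one-line file yields: a scalar string, not an array

$firstChar = $lines[0]        # indexes into the string: the character 'h'
$charCount = $lines.Length    # 5 characters, not 1 line
$firstLine = @($lines)[0]     # @() restores array semantics: 'hello'
$lineCount = @($lines).Count  # 1

"first: $firstChar; length: $charCount; first line: $firstLine; line count: $lineCount"
```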
In short: The PowerShell Way, methinks, is: Everything's a collection, unless told otherwise (such as with Get-Content -Raw).
Based on the discussion above, and considering that Get-Content behavior can be simulated by other (third-party) cmdlets, we should exclude Get-Content from the issue and consider only the "LHS of array-aware operators" option.
Good idea to separate the two discussions:
The Get-Content issue, irrespective of its _specific_ significance, is worth considering separately, because I think getting clarity on the underlying scalar-vs.-collection debate is important for the future (as an aside: I don't think deciding whether or not to change Get-Content should be based solely on whether the issue _can be worked around_):
Now that the focus of this issue is the behavior of array-aware operators with [System.Management.Automation.Internal.AutomationNull]::Value as the LHS, let me summarize:
Note: For brevity, and to give the construct a more memorable name, I'll refer to [System.Management.Automation.Internal.AutomationNull]::Value as a _null collection_ from now on.
All array-aware operators should treat a null collection _the same_, which is currently not the case: among the operators discussed so far, only -match treats a null collection as an _array_, whereas the others (-eq, -ge, ...) treat it like (scalar) $null.
If null collections should categorically be treated like _arrays_ (which makes sense to me), the secondary question is whether they should - _invariably, by definition_ - return:
* an empty [System.Object[]] instance, as -match currently does, or
* a null collection ([System.Management.Automation.Internal.AutomationNull]::Value) too.
The alternative approach is to categorically treat null collections as $null in the context of expressions, because, according to the documentation:
When received in an evaluation where a value is required, it should be replaced with null.
To help with experimenting, I thought I'd provide a convenience function, Test-Null, that makes it easier to distinguish between $null and [System.Management.Automation.Internal.AutomationNull]::Value:
A few sample calls:
> New-Item -Type File zero.txt
> (Get-Content zero.txt) | Test-Null
(null collection)
> (Get-Content -Raw zero.txt) | Test-Null
(null collection)
> (Get-Content -Encoding Byte zero.txt) | Test-Null
(null collection)
> $null | Test-Null
$null
> Test-Null ((Get-Content -Raw zero.txt) -match 'anything')
[System.Object[]] # an empty array - null collection was treated like empty array
> Test-Null ((Get-Content -Raw zero.txt) -gt 0)
[System.Boolean] # Boolean - null collection was treated like scalar $null
> Test-Null ($null -gt 0)
[System.Boolean]
Important:
* To distinguish a null collection from an empty collection object, use the (implied) -InputObject _parameter_.
* To distinguish $null from a null collection ([System.Management.Automation.Internal.AutomationNull]::Value), use the _pipeline_.
<#
.SYNOPSIS
Tests if the (first) input object is non-$null, an explicit (scalar) $null, or
a null collection.
.DESCRIPTION
IMPORTANT: Choose between pipeline and parameter input depending on what cases
you need to distinguish:
* To distinguish between $null and a null collection, use *pipeline* input.
* To distinguish between a null collection and an empty collection object,
use the (implied) -InputObject *parameter*.
* Note: Any collection you specify is treated as a *single* input object.
Output is a string that indicates one of 3 conditions; if there is more than
1 (non-null-collection) value in the pipeline, ' ...' is appended.
* '$null' ... an explicit, scalar $null value
* '(null collection)' ... the [System.Management.Automation.Internal.AutomationNull]::Value
singleton that is returned behind the scenes by cmdlet or function calls
that produce no output.
* '[<type>]' ... the full type name of the (first) input object, which implies
an object that is neither $null nor the null collection.
.NOTES
Caveat re multiple pipeline input objects:
The type of the 1st object OTHER THAN
[System.Management.Automation.Internal.AutomationNull]::Value is reported.
Hypothetically, you could send something like
[System.Management.Automation.Internal.AutomationNull]::Value, 'foo' |
Test-Null
in which case it is "foo"'s type - [System.String] - that is reported.
.EXAMPLE
> $noSuchVar | Test-Null
$null
.EXAMPLE
> Get-ChildItem noSuchFiles* | Test-Null
(null collection)
.EXAMPLE
> Get-ChildItem / | Test-Null
[System.IO.DirectoryInfo] ...
.EXAMPLE
> & { return } | Test-Null
(null collection)
.EXAMPLE
> & { return $null } | Test-Null
$null
#>
function Test-Null {
    param(
        [AllowEmptyCollection()]
        [AllowEmptyString()]
        [AllowNull()]
        [Parameter(ValueFromPipeline)]
        $InputObject
    )
    begin {
        $havePipelineInput = $MyInvocation.ExpectingInput
        $didEnumerate = $false
        $multiplePipelineObjects = $False
        $firstInputObj = $InputObject
    }
    process {
        if ($didEnumerate) { $multiplePipelineObjects = $true; return }
        $firstInputObj = $InputObject
        $didEnumerate = $True
    }
    end {
        if ($havePipelineInput -and -not $didEnumerate) {
            '(null collection)'
            # Issue a courtesy hint re inability to detect an *empty collection* object via the pipeline.
            Write-Verbose -Verbose 'Hint: To distinguish a null collection from an empty collection object, use -InputObject.'
        } elseif (-not $havePipelineInput -and -not $PSBoundParameters.ContainsKey('InputObject')) {
            Throw "Please provide input either via the pipeline or via the (implied) -InputObject parameter."
        } else {
            if ($null -eq $firstInputObj) { # $null
                '$null' + ' ...' * $multiplePipelineObjects
                if (-not $havePipelineInput) {
                    # Issue a courtesy hint re inability to detect a null collection as a *parameter* value.
                    Write-Verbose -Verbose 'Hint: To distinguish $null from a null collection, use the pipeline.'
                }
            } else { # (at least 1) non-$null object
                "[$($firstInputObj.GetType().FullName)]" + ' ...' * $multiplePipelineObjects
            }
        }
    }
}