Create a file named "logs.log" with the following content:
2017-09-06T19:37:17.115+00:00 node2 2017-09-06 19:37:17,114 Level="WARNING" Name="support.pulse" Message="Irregular pulse duration detected" Duration="2.31735133001"
2017-09-06T21:20:27.829+00:00 node1 2017-09-06 21:20:27,827 Level="WARNING" Name="support.pulse" Message="Irregular pulse duration detected" Duration="6.652835013"
2017-09-06T21:21:04.163+00:00 node1 2017-09-06 21:21:02,207 Level="WARNING" Name="support.pulse" Message="Irregular pulse duration detected" Duration="2.33411307697"
Now I run the following (I would usually use a complex RegEx pattern to match, but this one is used for simplicity):
Get-Content logs.log | Select-String -Pattern '6.6' | Out-File out.txt
I would expect that, in all cases, I get a file with one line:
2017-09-06T21:20:27.829+00:00 node1 2017-09-06 21:20:27,827 Level="WARNING" Name="support.pulse" Message="Irregular pulse duration detected" Duration="6.652835013"
The number of lines in the file depends on the size of the console window when the command is run. If I run the console maximised (on this system, I am running a 1920x1200 display with 100% font scaling), I get the desired result. However, if I window the console and run the command, I could get something like:
2017-09-06T21:20:27.829+00:00 node1 2017-09-06 21:20:27,827 Level="WARNING" Name="support.pulse"
Message="Irregular pulse duration detected" Duration="6.652835013"
That's two separate lines
> $PSVersionTable
Name Value
---- -----
PSVersion 5.1.16353.1000
PSEdition Desktop
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...}
BuildVersion 10.0.16353.1000
CLRVersion 4.0.30319.42000
WSManStackVersion 3.0
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
This is the documented behavior of Out-File. See the -Width parameter.
Get-Content logs.log | Select-String -Pattern '6.6' | Set-Content out.txt
performs the behavior you are expecting.
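As a sketch of the -Width route mentioned above: passing an explicit, generous width to Out-File should keep each match on a single line regardless of the console size. The width of 4096 and the temp-file paths are arbitrary choices for illustration, and details of the output (such as surrounding blank lines) may differ between PowerShell versions.

```powershell
# Assumption: the directory name, file names and the width of 4096 are illustrative only.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'outfile-width-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$log = Join-Path $dir 'logs.log'
$out = Join-Path $dir 'out.txt'

$sample = '2017-09-06T21:20:27.829+00:00 node1 2017-09-06 21:20:27,827 Level="WARNING" Name="support.pulse" Message="Irregular pulse duration detected" Duration="6.652835013"'
Set-Content -Path $log -Value $sample

# -Width overrides the line length that Out-File would otherwise take from the host.
# The dot is escaped so that the pattern matches a literal '.' rather than any character.
Get-Content $log | Select-String -Pattern '6\.6' | Out-File -FilePath $out -Width 4096

Get-Content $out
```

Set-Content remains the simpler choice when only the matched text is wanted; an explicit -Width is mainly useful when the default formatting should be kept.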
@markekraus thanks, however, simply running
Get-Content logs.log | Out-File out.txt
yields a normal-width file - all the original lines are still on one line when you run the command in a windowed console. What is the difference after the string has been piped through Select-String?
Either way, this is confusing behaviour.
The difference is the type of object being processed. Select-String returns a Microsoft.PowerShell.Commands.MatchInfo (docs), whereas Get-Content is returning Strings. The formatting is different.
This also produces the expected results:
Get-Content logs.log | Select-String -Pattern '6.6' | %{$_.ToString()} | Out-File out.txt
# or
Get-Content logs.log | Select-String -Pattern '6.6' | %{$_.Line} | Out-File out.txt
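The type difference described above can be checked directly. This is a minimal sketch; the sample line is a shortened, made-up stand-in for the log lines in the original report.

```powershell
# Assumption: $line is a shortened, hypothetical sample, not the real log line.
$line  = 'Level="WARNING" Duration="6.652835013"'
$match = $line | Select-String -Pattern '6\.6'   # dot escaped to match a literal '.'

$line.GetType().FullName    # System.String
$match.GetType().FullName   # Microsoft.PowerShell.Commands.MatchInfo

# Both of these hand a plain [string] to the next command in the pipeline,
# which is why Out-File then treats the output like ordinary strings.
$match.Line.GetType().FullName         # System.String
$match.ToString().GetType().FullName   # System.String
```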
Thanks @markekraus :)
It may be the documented behavior of Out-File, but I don't know anyone who likes this behavior.
@lzybkr Kind of. The documented behavior of Out-File is "somewhat" useful for "somewhat" emulating what the console would look like with the output. Most of the behavior complaints I have personally seen are addressed by using Set-Content instead. I always just assumed that Out-File was intended for preserving formatting.
In this particular instance, perhaps changing the default format behavior of Microsoft.PowerShell.Commands.MatchInfo makes sense (if possible). One would reasonably expect Select-String to return a String and for its output to behave like strings... even though it is documented that it returns MatchInfo or Bool.
This behavior seems to break the principle of least astonishment to me.
If we send anything as formatted output to the screen, how can we route the same output to a file? Is the Out-File cmdlet designed for that? What are the Out-* cmdlets designed for?
IMO, it is Select-String that breaks the principle of least astonishment. Out-File has its purpose. Maybe the default behavior is debatable, but that behavior is what distinguishes it from the *-Content cmdlets.
Select-String, on the other hand, should probably be returning Strings. If it did, the formatting would not result in Out-File line-wrapping or truncating. This is not to say returning MatchInfo is not useful, just not expected as the default behavior, IMO. The name implies it selects Strings. The output behavior could be changed, but I have seen code that does rely on the MatchInfo objects. That behavior would need to be preserved (possibly with a switch parameter or something).
Select-String returning a MatchInfo is definitely useful for the usual reasons returning objects is useful. Here is one example - custom formatting to highlight the match: https://gist.github.com/lzybkr/dcb973dccd54900b67783c48083c28f7
@lzybkr Absolutely! That's why I originally suggested possibly modifying the default format of MatchInfo instead of switching to String. If the default format of MatchInfo behaved like String, it would behave closer to what one would expect in most situations without breaking all the many things that have been made to work with MatchInfo.
The issue is not specific to Select-String (whose naming may be debatable, but there's only so much information you can cram into a cmdlet name, and it does operate on strings _as input_); it is fundamental to PowerShell's default output formatting:
Unfortunately - from what I can tell, without having dug into the source - _all_ Out-* cmdlets use the same window width-sensitive formatting, _even though respecting the host's window width really only makes sense for actual host (console) output (by default or via Out-Host)_.
Yes, the behavior may be documented, but as @lzybkr and @SteveL-MSFT observe, it is undesirable / unexpected.
This is especially true for the > redirection operator, which is _effectively_ an alias of Out-File.
Thus, the real fix - which is a breaking change, though hopefully a bucket 3 change - would be to remove any host window-width sensitivity from Out-* cmdlets that don't target the host (console), namely Out-File and Out-String.
A simpler demonstration of the problem:
# Default formatting truncates the property value and appends "..." to fit the window width.
> [pscustomobject] @{ str = 'x' * 1024 + '!' }
str
---
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
# Unfortunately, the same truncation happens when you use Out-File, resulting in LOSS OF DATA:
> [pscustomobject] @{ str = 'x' * 1024 + '!' } | out-file tmp.txt; Get-Content tmp.txt
str
---
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
As an aside: There seems to be a bug with respect to using > to redirect a [pscustomobject] instance to a file - see #4812.
If '>' is an alias for Out-File, can we replace it with Set-Content?
Unfortunately - from what I can tell, without having dug into the source - all Out-* cmdlets use the same window width-sensitive formatting, _even though respecting the host's window width really only makes sense for actual host (console) output (by default or via Out-Host)._
Right, which is what I have always assumed the Out-* commands are intended for. And yes, it does make sense, IMO, to have that functionality for files as well as the console. Sometimes you want the UX captured and Out-File accomplishes that goal.
I don't see the point in making Out-File behave outwardly like Set-Content / Add-Content with a different internal implementation.
In defense of Out-File:
Out-File always did what I expected it to do _when I was new to the language_. What I saw on screen I wanted in a file, and Out-File did just that. It wasn't until I was more familiar with the language and understood the "object" aspect of output (I came from heavily text-based languages) that I even saw this as a limitation.
@iSazonov: That would be a huge breaking change, because Set-Content performs simple _stringification_ on its input objects, whereas Out-File uses the default formatter (there are also the different character encodings, though that will go away soon, at least in Core):
> @{ one = 1 } > t.txt; Get-Content t.txt
Name Value
---- -----
one 1
> @{ one = 1 } | Set-Content t.txt; Get-Content t.txt
System.Collections.Hashtable # same as: @{ one = 1 }.psobject.ToString()
On a side note: The fact that Set-Content seems to use (the equivalent of) .psobject.ToString() means that it acts culture-_sensitively_, unlike string _expansion_ (string interpolation in double-quoted strings):
> [cultureinfo]::CurrentCulture = 'de-DE'; 1.2 | sc t.txt; gc t.txt; "$(1.2)"
1,2 # Set-Content: culture-SENSITIVE
1.2 # string expansion: culture-INsensitive
Out-File is generally culture-sensitive as well, including with respect to .NET _base_ types such as [double].
@markekraus:
Sometimes you want the UX captured and Out-File accomplishes that goal
Perhaps, but making capturing of console output _as-is_ the _default behavior_ of Out-File and especially > is very problematic. (I could see an _opt-in_ to mimicking the console in _all_ aspects via -Width as helpful.)
Out-File and Out-String definitely need to continue to use the default formatter, but, as stated, without the _incidental_ aspect of window width; what would make sense to me is to make them behave as if there's no limit on line length.
What I saw on screen I wanted in a file and Out-File did just that
Yes, but not _literally_, just like users understand that, for instance, word wrapping in a text editor is an artifact of _windowed display_, and that the line breaks aren't part of the underlying file.
To recap: line wrapping and value truncation (...) are fine for _console display_, but have no business in an output file (unless explicitly requested).
As a (naive) user, I expect Out-File / > to send _data_ to a file, not a "screen shot" of the console.
(The fact that default formatting is not the right format to use for stable long-term storage is a separate, more advanced topic.)
Yes, but not literally, just like users understand that, for instance, word wrapping in a text editor is an artifact of windowed display, and that the line breaks aren't part of the underlying file.
Which is why I'm suggesting that Select-String or the default format for MatchInfo is at fault here. Normal strings do not wrap in Out-File. Select-String does not return a string, but one would expect its output to behave like strings if that's what it is supposed to be selecting, as indicated by its name. In every other case where you are dealing with strings you will get the expected result with Out-File. If Select-String is treated like the PowerShell equivalent of grep, one would expect what @swinster initially reported to work. And, if Select-String actually returned strings or objects that format like strings, then it would. I don't think Out-File is to blame for doing what it is supposed to do.
Compare
Get-Content logs.log | Select-String -Pattern '6.6' | Out-File out.txt
to
Get-Content logs.log | Where-Object {$_ -match '6.6'} | Out-File out.txt
The latter performs as expected, whereas the former line-wraps. The same is true of the console. Depending on how you have your console oriented, the latter will extend further before wrapping, whereas the former will wrap at whatever width the console is set to.
In every other case where you are dealing with strings you will get the expected result with Out-File
Yes, but you're typically _not_ dealing with strings when you use Out-File / > (if you know you're dealing with strings only, just use Set-Content - no need to involve the default formatter).
As such, it is the _typical_ use case that - potentially, circumstantially - exhibits the undesirable / unexpected behavior, especially with respect to > if you come from a text-only shell such as cmd.exe or bash.
That the redirection operator > / a cmdlet named Out-_File_ would have a coupling with _console window size_ is absolutely counter-intuitive - the fact that it is documented notwithstanding.
As for >, we have an issue for it.
As for Out-File, if its behavior is documented, maybe add a switch to disable the console-width formatting?
The question I have is: how do you determine when formatting applies or not in Out-File?
For example:
[pscustomobject]@{
Property01="aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
Property02="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
} | Out-File C:\temp\test.txt
Would you really expect Out-File to show the entire "stringified" contents of each property?
Would you really expect the Out-file to show the entire "stringified" contents of each property?
Indeed I would - in a file, line length needn't be constrained, and including the whole value - i.e., _avoiding loss of data_ - is more important than limiting line length (at least _by default_)
Also, remember that not all hosts are consoles and in some the concept of a window may not even apply.
avoiding loss of data
This is an "out" operation, not an "export" operation. I would not expect complete serialization from Out-File commands; I would expect WYSIWYG. If I need the full contents of a property, I would pipe that property to Out-File. Again, I would expect strings to be unmolested by Out-File (and they are), but for objects I would expect formatting. If I have custom formatting on an object, I would not want that formatting lost to Out-File, even if that means data is truncated or wrapped. If I need the unmolested data, I would serialize and export. That is what differentiates Out-File from Set-Content or Export-Clixml.
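To illustrate the "serialize and export" route mentioned above, here is a minimal round-trip sketch with Export-Clixml and Import-Clixml. The temp-file name is an arbitrary choice for illustration; the object is the same shape as in the earlier truncation demonstration.

```powershell
# Assumption: the file name 'roundtrip-demo.xml' is illustrative only.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'roundtrip-demo.xml'

$obj = [pscustomobject] @{ str = 'x' * 1024 + '!' }

# Serialization stores the property value itself, not a formatted view of it,
# so the console width plays no role here.
$obj | Export-Clixml -Path $path
$restored = Import-Clixml -Path $path

$restored.str.Length          # 1025
$restored.str.EndsWith('!')   # True
```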
This is an "out" operation, not an "export" operation.
Yes, as stated, output formatting is primarily for _display_ purposes (human eyeballs), and therefore not suitable as a complete, machine-parseable format with longterm format stability.
That doesn't mean it shouldn't exhibit the most helpful / least surprising _default_ behavior, however.
I would expected WYSIWYG.
That's the crux: my sense is that most people do NOT expect that, especially with >, and they may only notice when it's too late.
Yes, you _can_ conceive of Out-File / > as a _console transcript_ (which is not documented as such, though the role of -Width is, somewhat), but I think it is ill-advised, for all the reasons discussed (truncation, unexpected line breaks, non-obvious length limit for non-console hosts).
I would not want that formatting lost to Out-File, even if that means data is truncated or wrapped
You still get the formatting, just not the truncation / wrapping - because that is the unhelpful / unexpected part.
We could still preserve the ability to limit line length on an _opt-in_ basis; we already have -Width; for convenience, perhaps a new -ConsoleWidth _switch_ could be added that uses the console window's current width.
@iSazonov:
As for > we have Issue for it.
I couldn't find any existing issue - can you point us to it?
As for Out-File if it's behavior is documented so maybe add switch to dsiable the console width formatting?
If we reversed the logic (_opt-in_ to limit line length), it would technically be a breaking change, but my sense is that it would fall into Bucket 3: Unlikely Grey Area, and is therefore worth making.
Re still getting the formatting: I do see that with _tabular_ output you may end up with misaligned columns in the file if values are never truncated, and imposing a line-length limit - albeit a high, fixed one - still makes sense.
Taking another step back: it would be unfortunate, but it's conceivable that there's code out there that relies on parsing Out-File
/ >
output and would be affected by a change in default behavior.
That's the crux: my sense is that most people do NOT expect that, especially with >, and they may only notice when it's too late.
And they don't... for strings. They would... for objects. And in that respect, Out-File works as intended, except when a command like Select-String doesn't actually return strings. :)
Taking another step back: it would be unfortunate, but it's conceivable that there's code out there that relies on parsing Out-File / > output and would be affected by a change in default behavior.
With object streams, there really is no need to parse the text, so formatting has never been considered a contract in PowerShell.
@markekraus: Overall, we're talking about sensible default behavior that doesn't violate the principle of least astonishment: When I send output to a _file_, I'm not thinking about the (usually incidental) width of my console window - if there even is one.
Considering the specific output type of a command and choosing between Set-Content and > accordingly is definitely an advanced technique.
As for Select-String specifically: the String name part is defensible in that it refers to the type of the _input_ objects - they either already are strings or they're forced to strings, using (culture-invariant) string expansion.
And in terms of _functionality_, I think we're in agreement that it is helpful to output [Microsoft.PowerShell.Commands.MatchInfo] instances rather than mere strings.
There is no good reason for the default output to actively _insert_ a newline instead of just letting the console wrap lines that exceed the window width.
(Another perhaps unexpected aspect of current Select-String default output formatting is that it produces leading and trailing empty lines.)
Therefore, the right resolution for the specific Select-String issue is the one you already suggested yourself:
If the default format of MatchInfo behaved like String, It would behave closer to what one would expect in most situations without breaking all the many things that have been made to work with MatchInfo.
Do note, however, that Select-String's default output format is context-dependent: the input line only, if the input doesn't come from _files_, and a filename + line-number _prefix_, if it does; to always get the input line only, the .Line property must be used.
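The context dependence described above can be seen by comparing pipeline input with file input. This is a sketch with a made-up, shortened log line and an arbitrary temp directory.

```powershell
# Assumption: directory name and sample line are illustrative only.
$dir  = Join-Path ([System.IO.Path]::GetTempPath()) 'matchinfo-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
$file = Join-Path $dir 'logs.log'
$line = 'Message="Irregular pulse duration detected" Duration="6.652835013"'
Set-Content -Path $file -Value $line

$fromPipe = $line | Select-String -Pattern '6\.6'        # input from the pipeline
$fromFile = Select-String -Path $file -Pattern '6\.6'    # input from a file

$fromPipe.ToString()    # the line only
$fromFile.ToString()    # file path, line number, then the line

# These properties give the individual pieces regardless of the input source.
$fromFile.Line
$fromFile.LineNumber
$fromFile.Filename
```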
There is no good reason for the default output to actively insert a newline instead of just letting the console wrap lines that exceed the window width.
Do you suggest removing the window-width formatting from the formatting subsystem entirely and migrating to console wrapping?
@iSazonov:
That's a good question: I guess I shouldn't have related this to the _console_ at all, given that the formatting subsystem is really console-independent and explicitly creates lines with a hard length limit.
It is the fact that it derives that limit situationally from the console window width when _not_ printing to the console that is problematic.
So perhaps simply _defaulting to_ -Width ([int]::MaxValue) in the case of Out-File / > and Out-String is the right solution - or perhaps a lower, but reasonably high value, so as not to let file sizes get out of hand:
E.g., [implicit] Format-Table output composed of a large number of rows with even only a single long value could create unexpectedly large files; in the following example, with no limit on line length, all 3 lines end up 205 characters long, because right-space-padding is applied to the rightmost column too:
([pscustomobject] @{ one = 1; two = 'x' * 100; three = 3 }),
([pscustomobject] @{ one = 1; two = 2; three = 'x' * 100 }),
([pscustomobject] @{ one = 1; two = 2; three = 3 })
At the end of the day, users still need to understand that the only robust use of Out-File / > is with _strings_ only - in which case Set-Content is the better choice anyway - but by not imposing the console window width we can make the default behavior of Out-File / > and Out-String more useful.
The formatting subsystem is a complex beast, however, and I only know it superficially, so do tell me if I'm missing something.
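Until the default changes, a per-session approximation of the "fixed, reasonably high width" idea can be sketched with $PSDefaultParameterValues. The value 4096 and the temp-file name are arbitrary assumptions; whether the > operator also picks up this default depends on the PowerShell version and should be verified rather than assumed, and values may come out padded to the column width.

```powershell
# Assumption: 4096 is an arbitrary "high enough" width, not a recommended value.
$PSDefaultParameterValues['Out-File:Width'] = 4096

$tmp = Join-Path ([System.IO.Path]::GetTempPath()) 'width-default-demo.txt'

# Same object as in the truncation demonstration; Out-File now formats it
# against a 4096-character line instead of the current console width.
[pscustomobject] @{ str = 'x' * 1024 + '!' } | Out-File $tmp

# Length of the longest line that was written to the file.
(Get-Content $tmp | Measure-Object -Property Length -Maximum).Maximum

# Remove the session default again so later commands are unaffected.
$PSDefaultParameterValues.Remove('Out-File:Width')
```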
This was my point of the original post. I'm glad it has provoked discussion and it certainly has enhanced my understanding, but yes, my goal was to pipe out results to a text file, just as you would with grep in Linux. What I ended up with was unexpected behaviour, be it documented or otherwise.
Now all I have to understand is "why o why" the native PowerShell Select-String performs so poorly compared to the ported version of grep in the GoW tools (https://github.com/bmatzelle/gow). In a basic test across a number of files (33), Select-String appears to be 400% slower than this grep. Of course, this is likely a topic for another discussion entirely.
Years ago I tried setting the default width to int.MaxValue - in the current implementation that kinda works in some places, but in others, it turns out to be a terrible idea - you will get padding to the width you asked for.
I do think we need to detect where the output is going - and not just for the width. If the output is to a file, we might also want to strip ansi escape sequences (with some sort of option to not do so).
As for performance - I lost any incentive to improve the perf of Select-String after finding ripgrep. It even has an option to output in a PowerShell-friendly way, so it wouldn't be too hard to write a wrapper and get objects, but I haven't done that yet.
Is the CoreFX RegEx implementation that bad?
I haven't compared the CLR regex to Rust's regex directly, but I'm confident we can do much better without touching the CLR's regex code.
To continue on the tangent, more generally:
The ability to call out to external utilities is important - even more so in the Unix world.
I'm sure there are other cmdlets whose performance with large input sets will necessitate using external utilities instead.
Conversely, other shells may want to take advantage of PowerShell's high-level features on demand, by calling PowerShell's CLI as needed.
The interfaces to properly support these scenarios aren't quite there yet / may never get there, unfortunately:
Problems with calling _out_ of PowerShell (calling external utilities) - from what I understand, these _will_ be fixed:
bash -c 'echo "hi there"' (prints just hi; #3049) and /bin/echo "a b\" (prints a b", #4358) being two notable cases.

Problems with calling _into_ PowerShell (powershell -command ....):
PowerShell's nonstandard startup behavior is problematic, most notably that the profile is loaded by default.
The above link lists all issues, but two are worth calling out, relating to the -Command and -File parameters:
They exhibit problematic interactive shell-like behaviors (with or without -NonInteractive) and lack consistent argument support - see #3223.
-File - (to provide a script via stdin, -) is not officially supported and doesn't accept _arguments_ (in addition to exhibiting problematic interactive-like behavior).
-Command - is officially supported, but _by design_ doesn't accept arguments (and also exhibits problematic interactive-like behavior); see next point.
The way -Command currently parses the remaining arguments is problematic in itself and fundamentally incompatible with POSIX-like shells - see https://github.com/PowerShell/PowerShell/issues/4024#issuecomment-312267611
Passing the string don't stored in a Bash variable to powershell -command as-is currently requires the following acrobatics:
v="don't"; powershell -command Write-Output "'${v//\'/\'\'}'"
If the -Command argument were treated like an ad-hoc _script_ to which _arguments can be passed_, as in POSIX-like shells (instead of reassembling all arguments into a single string that is then _as a whole_ interpreted as PowerShell code, as currently happens), this would work as follows:
v="don't"; powershell -command 'Write-Output $Args' "$v"

The -Command issue will not be fixed so as to preserve backward compatibility, which I personally find very problematic; not sure about the others.
I went through the ripgrep repo and see that it's a very young, one-man project. I saw a message that its public API should be redesigned. Also, it still doesn't have a library - I'd prefer to use it the same way we use Newton.
I read the ripgrep perf analysis. It is very interesting. We can benefit from that. For example, we could make our FileSystem provider (dir traversal) faster. It seems again related to the migration to ETS.