PowerShell: Make Select-String more intelligent for non-primitive types

Created on 8 Oct 2019  ·  31 Comments  ·  Source: PowerShell/PowerShell

Summary of the new feature/enhancement

As a user,
I want to be able to easily use Select-String to find string data in formatted output,
so that I can find and process data in PowerShell more quickly.

By default, when you pass non-string data into the Select-String cmdlet, it will find matches based on the object's ToString() method results. This is great when the ToString output actually represents the data you want to match, but very often ToString results do not represent the data that you see in PowerShell, and this can result in incorrect or failed matches.

For example, consider this script:

# First, capture the date:
$date = Get-Date

# Now, get the current month in string format:
$month = $date.ToString("MMMM")

# Now, look at how the date renders in PowerShell, showing the month as a string
$date

# Now, try to match the month using Select-String. This returns nothing.
Get-Date | Select-String -Pattern $month

That script shows the current date, including the month in string format, but if you try to select that string based on the actual string month, there are no matches. Why does this happen? Because ToString() on DateTime objects returns the date time with a numeric month, not the string month.

Now let's look at a more realistic example:

# Get some services, including the Windows Update service, and filter output on the
# string "Update"
Get-Service wuauserv,bits | Select-String Update

That returns nothing. Why? Because the ToString method on service objects returns the name of the service, so you can't filter output based on a partial match of a display name string this way.

Here is one more example:

Get-Process -Id $PID | Format-List * | Select-String Memory

This also returns nothing, because the ToString method on Format cmdlet output objects returns their type name, and none of those type names match Memory.

_It is reasonable for a user to expect to be able to easily and consistently parse/filter output that is rendered in the PowerShell console, but this is not possible unless they pipe to Out-String -Stream before they then pipe that streamed result to Select-String._
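The workaround mentioned above can be sketched as follows, reusing the date and month from the first example. How a date is displayed depends on the current culture, so the assumption here is a culture such as en-US, where the formatted output contains the month name.

```powershell
# Capture the date and the month name, as in the first example.
$date  = Get-Date
$month = $date.ToString('MMMM')

# Direct piping: Select-String searches the object's string conversion,
# which uses a numeric month, so nothing matches.
$viaToString = $date | Select-String -Pattern $month -SimpleMatch

# Workaround: render the object through the formatting system first,
# one line at a time, and only then search the resulting text.
$viaFormatting = $date | Out-String -Stream | Select-String -Pattern $month -SimpleMatch
```

Under that assumption, $viaToString stays empty while $viaFormatting holds the line that shows the month name.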

Proposed technical implementation details (optional)

I want to make Select-String better by adding a new -FromFormattedOutput switch parameter (originally proposed as -ConsoleOutput; suggestions for a better name are welcome). The switch would indicate that you want to select strings based on the console output of the data that is piped into Select-String. The cmdlet would automatically take care of the formatting and output of non-value and non-string types (and non-MatchInfo, but that's a special case internally for Select-String), and would select string matches based on that output rather than on the ToString method output of individual objects.

Additional details

Personally I would prefer if Select-String worked this way by default for non-value and non-string types, but that would be a breaking change at this point, so that's not an option; however, users who want it to work this way by default can use $PSDefaultParameterValues['Select-String:FromFormattedOutput'] = $true, and that will have the same result.

Committee-Reviewed Issue-Enhancement Up-for-Grabs

All 31 comments

From the examples above it is not clear what the desired output is. Can we get it with a workaround such as Out-String or something similar?

I both love the idea and agree that it should have been the default behavior.

@iSazonov, I think of it as the equivalent of piping to Out-String -Stream first, and then performing the usual line-by-line search.

In fact, in my profile I have the following _simple_ function:

Set-Alias slsd Select-DisplayString
function Select-DisplayString
{
  # Note: Since we want to use $Input for simplicity, we canNOT
  #       make this an advanced function.
  param([string[]] $Pattern)

  if ($args.Count) { Throw "Unexpected arguments specified: $args" }

  if ($Pattern) {
    $input | Out-String -Stream | Select-String -Pattern $Pattern
  }
  else {
    $input | Out-String -Stream
  }

}

Example use, in a directory that has files / directories that are symlinks, such as / on macOS:

PS> Get-ChildItem | Select-DisplayString ^l
l-r--            7/2/2012  3:31 PM                User Guides And Information -> /Library/Documentation/User Guides And Information.localized
l-r--          12/20/2012  5:55 PM                User Information -> /Library/Documentation/User Information.localized

That is, the for-display output of Get-ChildItem was filtered by the first character on each line, which is the Mode column, whose first character being l indicates a symlink.

I wonder if we could do a simple check... Something like:

string objectString = LanguagePrimitives.ConvertTo<string>(InputObject);
if (objectString == InputObject.BaseObject.GetType().FullName)
{
    // Use out-string on InputObject
}
else
{
    // Use objectString
}

That way we only have the object being resolved this way if there are no other conversion paths available. This would also be a good candidate for an experimental feature if we have concerns about the possibly breaking change.

To get straight to bike-shedding regarding what to name the new switch:

We already have switches that modify _output_ behavior with prefix As, such as -AsHashtable and -AsString for Group-Object.

As for _input_ treatment, the only instance I could find that _somewhat_ fits is Copy-Item's -FromSession.

From as the prefix strikes me as reusable, so perhaps -FromDisplay [representation] or -FromDisplayOutput.

In any event, incorporating the word _display_ strikes me as preferable to _console_.

@vexx32: That test isn't sufficient, I'm afraid, because there are cases with different .ToString() behavior that is still distinct from PowerShell's output formatting; e.g., on _Windows_:

PSonWin> (Get-Process)[0].psobject.BaseObject.ToString()
System.Diagnostics.Process (ApplicationFrameHost)

Edit: Note how the output is _more_ than just the full type name - the process image name is appended.

@mklement0: I like the -From prefix for the parameter name. Maybe we should also borrow the fact that this is related to the Out-String -Stream parameter, and call it -FromInputStream (or maybe -StreamInput, since that's stating what the switch would do?). I like those better than -FromDisplay, because this is more about streaming object input into the command. It could be described as follows:

Indicates that the cmdlet searches for text and text patterns in streamed input. Objects piped into this command will first be rendered in their default format, and that output will be streamed into this command for text matching. By default, objects piped into this command are converted into their string equivalent using their ToString method, and that output is used in this command for text matching.

That's a little wordy, but something along those lines should do.

@mklement0 LanguagePrimitives.ConvertTo<string>() is not the same as .ToString(); it will follow the same behaviour as a [string] cast in PowerShell.
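The difference between the two conversions is easiest to see with an array, a case chosen here purely for illustration. In this minimal sketch, .ToString() falls back to the type name, while the cast and LanguagePrimitives.ConvertTo both apply PowerShell's own conversion rules (assuming $OFS has not been changed from its default).

```powershell
$items = 1, 2, 3

# .ToString() is not overridden for arrays, so it returns the type name.
$viaToString = $items.ToString()      # System.Object[]

# A [string] cast joins the elements with $OFS (a single space by default).
$viaCast = [string] $items            # 1 2 3

# LanguagePrimitives.ConvertTo follows PowerShell's conversion rules,
# so it agrees with the cast rather than with .ToString().
$viaConvertTo = [System.Management.Automation.LanguagePrimitives]::ConvertTo($items, [string])
```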

Besides, I thought that's exactly what we wanted -- if the object does not have a direct string conversion, we go a different method and run it through Out-String first before having Select-String parse it. So that would hit the test correctly -- Select-String would grab the value, see it's the same as the object's type name, and then pass the PSObject through Out-String before attempting to process the value.

EDIT: Oh, I see what you mean, there's an extra piece attached to that ToString(), huh... Hmm, maybe if the conversion to string .StartsWith() the type name?

@vexx32 That's still not sufficient, because you're dependent on how the creator of the object wanted it to render as a single string in that case. For example:

image

This is different. It's about being able to explicitly indicate you want the object streamed into Select-String with its default output, so that you can select string matches from that.

@KirkMunro aye, I think the explicit switch is a good idea. However, I also think that (perhaps as an experimental feature), Select-String could (should?) attempt to detect when its InputObjects are being converted into useless data and attempt to automatically compensate, using the behaviour that can also be forced with a switch.

@vexx32 Here's another example showing how it is different:

image

If I want to select "October", I can only do that from the rendered output.

However, I also think that (perhaps as an experimental feature), Select-String could (should?) attempt to detect when its InputObjects are being converted into useless data and attempt to automatically compensate, using the behaviour that can also be forced with a switch.

Maybe. That might just be another source for confusion though since the way an object renders as string is 100% dependent on how the author decided to implement it, and there's really no model for consistency there.

I'd just like to see it work this way by default. Select-String is often associated with grep, but users of grep expect to select information that they would see in a console if they were to output it there, not information that comes from a method and varies from object to object. If you're getting into object methods or properties, there are better commands for that.

That's a good point... Yeah, I can't really see any reason I'd pipe complex objects into Select-String without Out-String at present.

Note to self: when implementing this, consider #8963 (i.e. What happens if you're selecting text that is formatted and possibly colored/highlighted already -- will the highlighting still work properly and provide a good experience for users)?

@vexx32: Re .ToString(): I should have worded my comment more clearly (since amended), but it sounds like we're all on the same page now.

@KirkMunro:

Maybe we should also borrow the fact that this is related to the Out-String -Stream parameter, and call it -FromInputStream (or maybe -StreamInput, since that's stating what the switch would do?)

I think _streaming_ is too general a term, and the connection with Out-String may not be obvious; after all, the term is also applied to how the pipeline functions _fundamentally_: _objects_ stream through the pipeline.

What's specific to the scenario at hand is that it is _text-line-by-text-line_ streaming derived from the _for-display_ output (as rendered by PowerShell's formatting system).

I think the word _display_ offers an important hint, and if we want to also incorporate the line-by-line aspect, we could use -FromDisplayOutputLines (the use of the plural is nonstandard, but the singular sounds awkward), though that is quite verbose.

My thinking was that mentioning lines wouldn't be necessary, as my sense is that line-by-line search is what users would naturally expect, in line with grep and findstr.

Either way we should also think about a short alias name, such as -dl.

Seems kind of wordy... hmm. What do you think of something like -FromFormat?

My original thought was that _display_ was a less formal term to guide those potentially not even aware of PowerShell's output formatting system.

However, _format_ to refer to the formatting system is technically more accurate, so I like the idea, and it would also give us a nicely alliterative alias name, -ff.

I'm still tempted to be a little more wordy - -FromFormattedOutput - and then rely on prefix name matching (-FromFormat) or the alias name for brevity.

I think the value proposition here is clear, and I like the ideas around the parameter name (thanks for participating in that discussion @vexx32 and @mklement0!), so I'll probably move forward with -FromFormattedOutput for now in a PR and then change it later if the PowerShell Committee has something else they would prefer.

@mklement0 I like the -ff alias' similarity to -f for -Force, as though it's an "I really don't care _how_ you do it, just _make_ it a string and work with it!" kind of switch, which is kind of fitting.

I was just watching a 30-minute demo of Docker streamed from Ignite, when the presenter ran this command:

Get-Process | findstr smss

This is exactly what Select-String should be able to do by default, rather than switching to findstr or grep. In fact, consulting the Select-String documentation shows that it is even documented to work similarly to both grep and findstr, yet as demonstrated by the difference in output from gps | findstr smss and gps | sls smss, you can see that Select-String does not function like findstr or grep, which work against the textual output of a command.

Given that's the case, I'm hopeful the PowerShell Committee votes in favor of not having an additional parameter to get this behavior.

Yeah, since this is about selecting strings, you'd think that when the command doesn't get string input, it would automatically run the object through the formatting engine to get the formatted string to search.

Good points, @KirkMunro and @rkeithhill; I think changing the default behavior makes perfect sense and strikes me as a bucket 3 change:

  • On the _input_ side, there are probably not too many people who rely on the current behavior, as it is either near-useless or hard to predict (you'd have to know what a particular input type stringifies to with .ToString(), which is often just the full type name, and otherwise non-standardized and not typically visible in PowerShell).

  • I wish we could also change the default behavior on the _output_ side - have Select-String output _strings_ by default - but that would obviously be a prohibitive breaking change; at least we have the -Raw switch now, however.

at least we have the -Raw switch now, however.

It looks very specific and Out-String works.

@mklement0 If I understand correctly, you are saying there is no need for a FromFormattedOutput parameter?

It looks very specific and Out-String works.

The -Raw switch is already implemented - even though you could always do Out-String -Stream or (...).Line, -Raw is clearly an improvement in terms of both convenience and performance.
I simply pointed -Raw out, because it isn't well-known yet - your response being a case in point.
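For readers who have not met -Raw yet, here is a minimal sketch of the difference on plain string input; the sample strings are made up for illustration, and -Raw requires PowerShell 7.0 or later.

```powershell
$lines = 'foo', 'bar', 'food'

# Default output: MatchInfo objects; the matching text is in the .Line property.
$matchInfos = $lines | Select-String -Pattern 'foo'
$viaLine    = $matchInfos.Line

# With -Raw, the matching strings themselves are emitted.
$viaRaw = $lines | Select-String -Pattern 'foo' -Raw
```

Both $viaLine and $viaRaw end up holding the same two strings; -Raw simply skips the MatchInfo wrapper.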

If I understand correctly, you are saying there is no need for a FromFormattedOutput parameter?

Yes, we are now advocating simply changing the _default behavior_ for non-string input.

Heading into @PowerShell/powershell-committee meeting in a few mins, just got through this thread and wanted to get a few thoughts out:

  • I absolutely agree that it's a valuable scenario to want to filter the formatted output received at the console. I use findstr and grep all the time (and if I ever get around to setting up this, I'll just use grep everywhere).
  • My inclination in reading through the thread is actually to lean towards an entirely new cmdlet with the reasoning that it sounds like there's multiple behaviors one might want, and an expanding matrix of parameters on Select-String doesn't sound awesome. Don't get hung up on the name, but I think something like a Select-FormattedOutput, Select-FormattedString, Select-Output, Select-HostOutput, Select-HostText, Select-Text etc. would be helpful.
  • I'm very worried about breaking changes to Select-String. Not picking on you at all, @mklement0 (I saw this sentiment throughout the thread), but with regards to this statement:

    On the input side, there are probably not too many people who rely on the current behavior, as it is either near-useless or hard to predict

    The issue with breaking changes is that often folks have hardcoded themselves to work around "hard to predict" behaviors. Take @KirkMunro's Get-Date example: the default ToString() doesn't return what's output by the default formatter. Maybe I did it once, realized that the string name of the month wasn't in the part parsed by Select-String, and now I'm actually trying to parse "12/18/2019 2:52:23 PM" as a mm/dd/yyyy string and hand it to some native utility. This breaks me catastrophically.

To that end, I propose that one or two people actually build this, throw it up on the Gallery, and then we can actually play with an implementation and decide what fits right across multiple platforms (I have a strong feeling we won't get it right on the first try).

Might it actually just be a wrapper on findstr and grep depending on platform? Or is there a reason we would want this to be fully implemented in PowerShell land?

@PowerShell/powershell-committee discussed this, and we agree it should be started in the Gallery by folks in the community. Primarily, we question the overall usefulness of this given the existing findstr and grep on platforms. If there's an elegant design that proves to be popular, we can consider including it in the future.

We do agree that we should stop telling new users that Select-String/sls is our grep equivalent and document the existing utilities and foo | Out-String | Select-String 'bar' workaround better.
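One detail matters when documenting that workaround: without -Stream, Out-String emits a single multi-line string, so a match returns the whole block rather than one line. A minimal sketch with made-up input strings:

```powershell
$text = 'alpha', 'beta', 'gamma'

# Without -Stream: ONE multi-line string goes into Select-String,
# so the match's .Line contains all three lines.
$whole = $text | Out-String | Select-String -Pattern 'beta'

# With -Stream: each line is a separate string,
# so only the matching line comes back.
$perLine = $text | Out-String -Stream | Select-String -Pattern 'beta'
```

For grep-like, line-by-line results, the -Stream form is the one that behaves as expected.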

Oh, and we bikeshedded on the name of the cmdlet for a while. No one's in agreement, so I threw every possible one that folks threw out into my previous comment

@joeyaiello What is the perceived value in adding another cmdlet to do essentially the same job? :confused:

The way I see it at the moment is:

  1. We _have_ a Select-String cmdlet,
  2. for which a lot of potential piped input is effectively useless, because
  3. most objects' ToString() values are significantly less useful than their Out-String values.

Adding another cmdlet effectively only duplicates Select-String (and likely quite a lot of code if we're not very careful and do a lot of refactoring of the original cmdlet _as well_), and would effectively make Select-String a second-class citizen -- we introduce a new cmdlet that does everything Select-String does, _and more_, and then Select-String becomes effectively useless, since the new cmdlet would already do what Select-String does.

I don't see a particular need for this to _be_ a whole extra cmdlet, and nor do I think adding one additional parameter would be a huge change for this cmdlet. If we're talking adding parameters to Invoke-WebRequest, sure, cause for caution. Here? Not so much, in my opinion.

That said, looking at the current syntax diagram for Select-String, we clearly need to do something about excessive listing of possible values for -Culture, but on the whole it doesn't have that many parameters, and that issue is something we need to fix in the help system, not the cmdlet itself.

Some small comments:

@KirkMunro

Now, try to match the month using Select-String. This returns nothing.

Get-Date | Select-String -Pattern $month

This seems rather artificial to me given that PowerShell is an object-based shell. Why are you not doing:

Get-Date | where Month -match $month

And if you really want to grep against formatted output, it's simple:

Get-Date | Out-String -Stream | Select-String -Pattern $month
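One caveat for the property-based filter: DateTime.Month is an integer (1 through 12), so matching it against a month name finds nothing. A minimal sketch of an object-based alternative, assuming $month holds the full month name as in the original example:

```powershell
$month = (Get-Date).ToString('MMMM')

# Compare against the month name derived from the object,
# rather than against the numeric Month property.
$result = Get-Date | Where-Object { $_.ToString('MMMM') -eq $month }
```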

@mklement0

On the input side, there are probably not too many people who rely on the current behavior,

On what basis are you making this claim? It would certainly break my daily workflow. Oh and BTW, Select-String works against files too so that has to be taken into account.

Finally

  1. Going through the formatter is an order of magnitude slower than where or sls (as it is today).
  2. Formatted output has no guarantee of compatibility. Changing how objects are rendered is one of the things that is explicitly called out as being completely open to change. So writing scripts against formatted output is discouraged.

@bpayette There's scripting in scripts, where you want performance and you want to use .NET objects, and there's ad-hoc use where you are pulling data and you want to get chunks of it quickly and easily. That's why grep or findstr are used even by folks on the PS Team or PMs not on the PS team in place of Select-String, and in place of using Out-String -Stream, Where-Object, etc. Being able to get information quickly and easily is highly useful, which is the motivation here. The motivation here, at least for me, isn't to use Select-String to filter output in scripts.

My only argument for updating Select-String vs a new command is to have a single command to do easy filtering with highlighting of what it found in-box so that I don't have to worry about what system I'm on when I use it. Given PowerShell is a shell, this seems like solid value to offer across the board. I can use grep or findstr, but I tend to stick to native, in-box PS that just works everywhere. That's where I think augmenting Select-String vs using @SteveL-MSFT's Select-Text in his upcoming module adds value.

Given the pushback on augmentation of Select-String and the concern around breaking changes, I'll just hope that a successor gets enough traction to replace Select-String with a suitably short alias in-box, and then just switch to that.

In addition to the points made by @vexx32 and @KirkMunro:

And if you _really_ want to grep against formatted output, it's simple:

  • Select-String's sole purpose is to search through _strings_ (and being able to do that "quick and dirty", as @KirkMunro explains, is an invaluable _interactive_ tool).
  • Now, if the input isn't composed of strings, which _string_ representation of the input makes more sense to search?

    • What you see in the console (host), i.e. the formatted representations?

    • Or the result from a .ToString() call which produces a near-useless and hard-to-predict stringification you don't typically get to see elsewhere (which answers the _on what basis_ question).

Piping to | Out-String -Stream isn't simple: it's an obscure, cumbersome workaround for something the cmdlet should have done automatically _to begin with_.

Oh and BTW, Select-String works against files too so that has to be taken into account.

Yes, the -LiteralPath binding via Get-Item / Get-ChildItem output for file-content searching would have to be retained, which makes for a (preexisting) inconsistency - but an easily explained one.

That is, if you really wanted to search a directory listing as printed to the screen, then - and only then - would you need | Out-String -Stream.
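The two behaviors can be compared side by side with a throwaway directory. This is a minimal sketch; the temporary path and the file name sample.txt are made up for illustration.

```powershell
# Create a scratch directory containing one small text file.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'sample.txt') -Value 'needle in a haystack'

# File objects piped directly: Select-String searches the file CONTENTS.
$contentMatch = Get-ChildItem -Path $dir | Select-String -Pattern 'needle'

# Directory listing as displayed: format first, then search the text lines.
$listingMatch = Get-ChildItem -Path $dir | Out-String -Stream | Select-String -Pattern 'sample'

# Clean up the scratch directory.
Remove-Item -Path $dir -Recurse -Force
```

Here $contentMatch reports a hit inside sample.txt, while $listingMatch holds the listing line that shows the file name.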

Apparently we have an oss function wrapper for that now - that such a function was created speaks to how often you currently have to resort to that workaround.

It would certainly break my daily workflow.

Assuming the -LiteralPath binding is retained, what would break?

Other than someone inappropriately using Select-String with non-string input _in a script_ (the scenario that's @joeyaiello's concern) - which is where you should definitely check _object properties_ instead - nothing should break, and much is gained.

To me, that makes it a bucket 3 change, which spares us the confusion of introducing another cmdlet.

Revisiting some of the earlier points:

Primarily, we question the overall usefulness of this given the existing findstr and grep on platforms.

The different syntax and varying capabilities of these utilities, let alone having to use _different_ utilities on different platforms to begin with, are reason enough to make Select-String support the intuitively expected behavior of searching the formatted representations.

Going through the formatter is an order of magnitude slower than where or sls (as it is today).

If the intent is to search the formatted representations, that cost is inevitable - and using findstr or grep already incurs it.

Again, the primary use case is quick-and-dirty _interactive_ search.
In _programmatic_ use, Where-Object should be used instead.
Providing such guidance - along with conversely saying: in _scripts_, use Select-String only with _string_ input - as part of the documentation should be sufficient.

throw it up on the Gallery, and then we can actually play with an implementation

It's not in the Gallery, but I've created a Select-StringFormatted wrapper function (which will be slow compared to a proper implementation as part of Select-String) in this Gist (also part of this SO answer), which has Out-String -Stream built in - this is what Select-String itself should do.

I recommend defining scs as its alias, which is what Select-String's alias should always have been, had its name followed the naming conventions.

Assuming you have looked at the linked code to ensure that it is safe (which I can personally assure you of, but it's always good to check), you can install it as follows:

irm https://gist.github.com/mklement0/46fea9e6e5ef1a3ceaf681c976cb68e3/raw/Select-StringFormatted.ps1 | iex

Sample call (assumes Set-Alias scs Select-StringFormatted):

PS> Get-Process | scs service

     16     3.74      14.09       0.00    5664   0 SecurityHealthService
     10     3.61       6.84       0.00     632   0 services
     12     3.21       7.15       0.00    2388   0 VGAuthService

While this is already very convenient, there's room for improvement, given that simply string-filtering the lines doesn't include the header line with the column names; for (implicitly to be) Format-Tabled input, the header line could be included _for display_, analogous to the existing _for-display_ enhancement of coloring the matching part of the line.
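The header idea could be prototyped along these lines. This is only a sketch: Select-TableLine is a hypothetical name, not part of the Gist or of PowerShell, and it assumes simple table-formatted input without group headings, where the first two non-blank lines are the column names and the separator.

```powershell
function Select-TableLine {
  # Deliberately a simple (non-advanced) function, so that $input
  # holds all pipeline input, as in Select-DisplayString above.
  param([string] $Pattern)

  # Render the input for display and drop blank lines.
  $lines = @($input | Out-String -Stream | Where-Object { $_.Trim() })
  if ($lines.Count -lt 2) { return $lines }

  # Assumption: the first two lines are the header and its separator.
  $lines[0..1]

  # Emit only the data lines that match the pattern.
  $lines | Select-Object -Skip 2 | Where-Object { $_ -match $Pattern }
}

# Usage with made-up objects that render as a two-column table.
$rows = [pscustomobject]@{ Name = 'alpha'; Id = 1 }, [pscustomobject]@{ Name = 'beta'; Id = 2 }
$out  = $rows | Select-TableLine -Pattern 'beta'
```

A real implementation would need to detect whether the input is actually table-formatted, which this sketch does not attempt.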
