PowerShell: Close the feature gap between the Where-Object cmdlet and the .Where() array method, introduce -First, -Last, ... switches

Created on 22 Oct 2020 · 28 Comments · Source: PowerShell/PowerShell

Summary of the new feature/enhancement

_Update for clarification_: The gist of this proposal is to bring valuable filter functionality that currently only exists in the _in-memory_ .Where() array method to its _pipeline_ counterpart, Where-Object.
(Always using .Where() isn't an option with large input sets, because you'd have to collect all input in memory first.)

The .Where() array method, which is the in-memory equivalent of the pipeline-based Where-Object cmdlet, offers a number of additional features, such as the ability to optionally stop matching after the first match is found, which can be an important optimization technique, and the ability to output all objects that come before/after a matching one:

First / -First example:

# OK - very fast, because - despite the large input collection - processing stops after the first match.
(1..1e6).Where({ $_ -eq 10 }, 'First')  # -> 10

# WISHFUL THINKING: Add a -First switch (among several others - see below).
1..1e6 | Where-Object { $_ -eq 10 } -First

# Without it, ALL elements are processed - SLOW
1..1e6 | Where-Object { $_ -eq 10 }

# Current workaround is cumbersome.
1..1e6  | Where-Object { $_ -eq 10 } | Select-Object -First 1

SkipUntil / -SkipUntil example:

(1, 2, 42, 43).Where({ $_ -eq 42 }, 'SkipUntil') # -> 42, 43

# WISHFUL THINKING: Add a -SkipUntil switch.
1, 2, 42, 43 | Where-Object { $_ -eq 42 } -SkipUntil

Based on the current .Where() features, the following switches should be introduced with the same behavior, mirroring the WhereOperatorSelectionMode enumeration values:

-First, -Last, -SkipUntil, -Until, -Split
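For reference, the existing in-memory behavior that these switches would mirror can be seen by running each .Where() mode against a small array. This is only an illustrative sketch; the variable names are made up for the example.

```powershell
$nums = 1, 2, 42, 43, 42

$first     = $nums.Where({ $_ -eq 42 }, 'First')      # 42 (stops at the first match)
$last      = $nums.Where({ $_ -eq 42 }, 'Last')       # 42 (the final match)
$skipUntil = $nums.Where({ $_ -eq 42 }, 'SkipUntil')  # 42, 43, 42
$until     = $nums.Where({ $_ -eq 42 }, 'Until')      # 1, 2

# 'Split' returns two collections: the matches first, then everything else.
$matching, $rest = $nums.Where({ $_ -eq 42 }, 'Split')
# $matching: 42, 42    $rest: 1, 2, 43
```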

Note:

  • -Split partitions the input into two and returns two collections, so it would in effect require collecting all input in memory first before producing output, which would have to be clearly documented.

  • The .Where() method has an optional numeric parameter, numberToReturn, that modifies the operations, such as 2 with First returning the first _two_ matches; since we don't have a syntax for switches with _optional (non-Boolean) arguments_ for commands (see #12104), we have the following options:

    • Option A: Simply omit this aspect of the functionality and invariably default to 1, which is likely fine in the majority of cases; if a different number is needed, Select-Object -First/-Last $n can be piped to.

    • Option B: Implement -First and -Last not as _switches_, but as -First <int> and -Last <int>; the potential downside is that a number argument is then _mandatory_.

    • Option C: Given that numberToReturn also modifies all other functionality - though there's likely less of a need for that - implement a separate -Count <int> parameter, which in the absence of any of the switches would imply -First.

  • For these behaviors to be implemented efficiently, they have to stop the pipeline on demand, as Select-Object already does. However, the latter does so in a problematic fashion - not giving other cmdlets a chance to run their End blocks - which should be addressed (independently) as well: see #7930
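To make the numberToReturn discussion concrete, here is a small sketch of what the method does today and of what Option A would look like in the pipeline. Only existing features are used; the variable names are illustrative.

```powershell
# In-memory: the optional third argument caps the number of matches returned.
$firstTwo = (1..10).Where({ $_ % 2 -eq 0 }, 'First', 2)   # 2, 4
$lastTwo  = (1..10).Where({ $_ % 2 -eq 0 }, 'Last', 2)    # the last two matches (8 and 10)

# Pipeline equivalent under Option A: delegate the count to Select-Object.
$viaPipeline = 1..10 | Where-Object { $_ % 2 -eq 0 } | Select-Object -First 2   # 2, 4
```

Under Option C, the hypothetical -Count parameter would take the place of that trailing Select-Object call.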

On a meta note: You can skip the obsolete comments that follow and resume reading at this comment.

Area-Cmdlets-Core Issue-Enhancement

All 28 comments

Eh, I'd say the only reason .Where() handles that is because we don't have a .Select() magic method.

I don't see a real need to add this to Where-Object given that Select-Object already performs this function. 🤷

@vexx32:

Select-Object is for _transforming_ input and/or extracting objects _by position_, which is completely separate from the _filtering_ that Where-Object performs.

As an aside, there's no need for a .Select() array method, because the .ForEach() array method and indexing fulfill that role for in-memory collections.

The point is that .Where() provides useful functionality for in-memory collections that would be equally useful for Where-Object's pipeline processing, and in the case of -First offers an important performance optimization.

Select-Object -First $x already short-circuits the pipeline.

Adding it to Where-Object as well would 1) duplicate the code path (unless you move that code somewhere common) and 2) not really gain a whole lot above that, I wouldn't think.
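The short-circuiting mentioned here can be observed with a counter placed upstream of the filter. This is a minimal sketch, assuming the counter variable lives in the same scope as the pipeline; $processed and $result are illustrative names.

```powershell
$processed = 0

# The ForEach-Object stage only counts how many objects enter the pipeline.
$result = 1..1000 |
    ForEach-Object { $processed++; $_ } |
    Where-Object { $_ -eq 10 } |
    Select-Object -First 1

$result      # 10
$processed   # 10, not 1000: upstream processing stopped at the first match
```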

You misunderstand the intent of what I'm proposing:

  • Select-Object -First $n _unconditionally_ selects the _first $n_ input objects (possibly modified by the equally positional -Skip $m).

  • Where-Object { <condition> } -First would select (only) the first input object _that matches the filter criterion_.

E.g.:

# WISHFUL THINKING
PS> 'foo', 'bar', 'baz', 'bar', 'bar' | Where-Object { $_ -eq 'bar' } -First
bar  # only the first 'bar' match

The above is the pipeline equivalent of the following in-memory operation based on the .Where() _method_:

PS>  ('foo', 'bar', 'baz', 'bar', 'bar').Where({$_ -eq 'bar' }, 'First')
bar

No, I understand well enough.

I'm just saying I don't think that's meaningfully or usefully better than just doing $stuff | Where-Object { $_ -eq 'bar' } | Select-Object -First 1

No, I understand well enough.

I guess I was confused by you repeating the cumbersome workaround already shown in the OP.

First, the -First and -Last switches alone have two benefits:

  • primarily: concision

  • secondarily: performance (though probably negligible in practice)

Second, the -SkipUntil, -Until, -Split functionality doesn't even have straightforward-but-cumbersome Select-Object workarounds.

These are natural extensions, especially given that the conceptually equivalent _method_ already offers these features.
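To illustrate what has to be hand-written today, here is a rough sketch of a streaming -SkipUntil equivalent as a standalone function. Skip-UntilMatch is a hypothetical name, not an existing command, and the sketch assumes scalar input objects.

```powershell
# Hypothetical helper: emits the first matching input object and everything after it.
function Skip-UntilMatch {
    param(
        [Parameter(Mandatory, Position = 0)]
        [scriptblock] $FilterScript,

        [Parameter(ValueFromPipeline)]
        $InputObject
    )
    begin { $matched = $false }
    process {
        if (-not $matched) {
            # Evaluate the filter with $_ bound to the current input object.
            $matched = [bool] (ForEach-Object -InputObject $InputObject -Process $FilterScript)
        }
        if ($matched) { $InputObject }
    }
}

$tail = 1, 2, 42, 43 | Skip-UntilMatch { $_ -eq 42 }   # 42, 43
```

An -Until equivalent would additionally want to stop the pipeline once the first match is seen, which a plain function cannot do cleanly.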

On a more general note: Duplicating functionality is nothing new, and has proven helpful in other cases:

Get-Content -First 2 file.txt
# vs.
Get-Content file.txt | Select-Object -First 2

'foo' | Select-String 'o' -Raw
# vs.
'foo' | Select-String 'o' | Select-Object -ExpandProperty Line

In the case of Get-Content it's able to take additional shortcuts due to how the IO reading works.

As for whether there should be a switch/parameter on Where-Object vs. using Select-Object, I think there's even less reason to do it. If you really want it I guess I have nothing specifically against it, I just don't consider it worth the time to do. 🙂

While that is a matter of preference with respect to -First and -Last, your Select-Object workaround doesn't apply to the remaining 3 switches proposed: -SkipUntil, -Until, -Split

Yeah, those are probably worth adding and more aligned with Where-Object's general role IMO.

@vexx32, let me try to summarize, and, if we agree, I suggest we hide all previous comments:

The gist of this proposal is to bring valuable filter functionality that currently only exists in the _in-memory_ .Where() array method to its _pipeline_ counterpart, Where-Object.
(Always using .Where() isn't an option with large input sets, because you'd have to collect all input in memory first.)

The proposed new switches, -First, -Last, -SkipUntil, -Until, -Split would directly correspond to the WhereOperatorSelectionMode operation enumeration values (except for Default) accepted by the .Where() method.

You, @vexx32, disagree with this proposal _in part_:

  • You agree that the functionality of -SkipUntil, -Until, -Split is worth adding,
  • while thinking that -First and -Last aren't needed, because you can pipe to Select-Object -First 1 and Select-Object -Last 1 instead and therefore do not strictly need this functionality in Where-Object itself.

I disagree with this for the following reasons:

  • Concision and convenience: 1..1e6 | Where-Object { $_ -eq 10 } -First is much shorter to type than
    1..1e6 | Where-Object { $_ -eq 10 } | Select-Object -First 1

    • There is precedent for duplicating functionality in-cmdlet that can also be had by combining cmdlets, notably Get-Content -First $n instead of Get-Content | Select-Object -First $n and Select-String -Raw instead of Select-String | Select-Object -ExpandProperty Line
  • Symmetry: Only adding _some_ of the operations of the analogous .Where() method is awkward.

  • Performance: While the performance gain is negligible in the -First case, with -Last it matters:

    • If many input objects match in a Where-Object call, an internal -Last implementation could keep only the most recent match and replace it if another one comes around; by contrast, if the -Last functionality is provided by Select-Object, Where-Object would send _all_ matches through the pipeline before Select-Object can select the last.
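The "keep only the most recent match" idea can be sketched as a standalone function. Select-LastMatch is a hypothetical name used purely for illustration; a real implementation would live inside Where-Object itself.

```powershell
# Hypothetical helper: remembers only the latest match and emits it at the end.
function Select-LastMatch {
    param(
        [Parameter(Mandatory, Position = 0)]
        [scriptblock] $FilterScript,

        [Parameter(ValueFromPipeline)]
        $InputObject
    )
    begin { $found = $false; $last = $null }
    process {
        # Evaluate the filter with $_ bound to the current input object.
        if ([bool] (ForEach-Object -InputObject $InputObject -Process $FilterScript)) {
            $found = $true
            $last  = $InputObject   # overwrite the previous match
        }
    }
    end { if ($found) { $last } }
}

$lastEven = 1..10 | Select-LastMatch { $_ % 2 -eq 0 }   # 10
```

Only one object is ever passed downstream, whereas Where-Object | Select-Object -Last 1 sends every match through the pipeline first.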

What about new commands? Select-Object -First 1 doesn't feel great interactively. That's why I have first, last, skip, and at (indexing) defined in my profile (Source).

e.g.

gci | skip 2 | at -1 | % children | first

Also would be nice to be able to stop the pipeline without killing it completely like Select-Object -First 1 does (well, an easier way than the nightmare of reflection and implementation detail abuse I linked above).
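The "killing it completely" behavior can be made visible with an upstream function that records whether its end block ran. This is a sketch of the behavior as described in #7930; Invoke-PassThru and $state are illustrative names.

```powershell
$state = @{ EndRan = $false }

# Illustrative pass-through function that flags when its end block executes.
function Invoke-PassThru {
    param([Parameter(ValueFromPipeline)] $InputObject)
    process { $InputObject }
    end     { $state.EndRan = $true }
}

# Select-Object -First stops the upstream commands as soon as it has its object.
$null = 1..5 | Invoke-PassThru | Select-Object -First 1
$endRanWithFirst = $state.EndRan   # $false: the end block was skipped

# Select-Object -Last has to consume all input, so the pipeline completes normally.
$state.EndRan = $false
$null = 1..5 | Invoke-PassThru | Select-Object -Last 1
$endRanWithLast = $state.EndRan    # $true
```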

If this moves forward, I'd love to see -First and -Last implemented as <Switchable[int]> values, so that you don't need to provide a number if you are getting the very first one or the very last one. If you don't know what I mean, I proposed on another PR that there are scenarios where we might want a parameter to have a default value when used as a switch, or accept a specific value as input as well. This is one of those scenarios.

Are these new parameters being proposed only for the -ScriptBlock parameter sets? Or for all parameter sets. Maybe we could have a new poster boy for parameter set nightmares. Let's see...31 parameter sets times 5 new incompatible, optional switches per parameter set, so 135 parameter sets in total. 🤣

Also FWIW, I would agree with all 5 new parameters to cover all options, especially given @mklement0's argument for -Last.

@SeeminglyScience, those are interesting ideas, but it's really a separate discussion.

@KirkMunro :

😁 re parameter sets, though perhaps that will inspire us to improve matters there, such as implementing simpler mutual exclusion... - ideally, we wouldn't have to restrict this to the script-block parameter set solely for technical reasons.

Re <Switchable[int]> - something like that was also proposed in #12104, which I mention in the OP (couldn't find the PR you're referring to).

While I like the idea in general, in this particular case I'm leaning toward option C from the (updated) OP: A separate, optional -Count <int> parameter, given that such a quantifier applies to _all_ operations supported by the .Where() method - we would need that for full feature parity.

@mklement0: The PR discussion I was referring to can be found here.

@SeeminglyScience, those are interesting ideas, but it's really a separate discussion.

It's not, I'm suggesting the functionality be built out into separate commands instead of continuing to add to the monolith that is Where-Object. It's already got some super complicated parameter binding.

@SeeminglyScience:

It's not

It is: _Unequivocally_ for -SkipUntil, -Until, -Split, and _debatably_ for -Last (pros: symmetry, concision, performance) and -First (pros: symmetry, concision)

instead of continuing to add to the monolith that is Where-Object. It's already got some super complicated parameter binding.

_Implementation_ concerns should not trump what is _conceptually_ the right thing to do.

It is: _Unequivocally_ for -SkipUntil, -Until, -Split, and _debatably_ for -Last (pros: symmetry, concision, performance) and -First (pros: symmetry, concision)

I understand you disagree with the suggestion, that doesn't mean it's a separate discussion. Maybe I'm missing which part of this is clarification.

_Implementation_ concerns should not trump what is _conceptually_ the right thing to do.

Don't know what to tell ya there. It'd be nice, but implementation concerns are a big factor in what gets implemented and how.

Yeah, frankly... good luck trying to mess with Where-Object much further without introducing some _really_ nasty and hard to pin down parameter binder issues. It's already pushing the limit of what the parameter binder can handle.

If you wanna try, go right on ahead, but I have my doubts this is likely to be implemented otherwise. ^^

Maybe I'm missing which part of this is clarification.

You could address the arguments made.

Implementation concerns are a big factor in what gets implemented and how.

You could choose to retain the intellectual freedom to distinguish between what is _conceptually_ the right thing to do from what _regrettably cannot be done due to real-world constraints_, instead of using the latter to argue against the former.

You could address the arguments made.

I'm not trying to be snarky I really don't see them. All I see is the comment I quoted and you saying it should be a different discussion.

You could choose to retain the intellectual freedom to distinguish between what is _conceptually_ the right thing to do from what _regrettably cannot be done due to real-world constraints_, instead of using the latter to argue against the former.

I really don't know what you want from me here. We're talking about what thing gets implemented. There are a lot of great ideas that aren't feasible; if it's not actionable, there isn't a lot of point talking about them.

I'm not trying to be snarky I really don't see them.

I've tried my best to lay them out. Reaching a shared understanding is always what I strive for, but if points made aren't even acknowledged as such, it is time to stop.

I really don't know what you want from me here

I want you to realize that it's important to always acknowledge what is at least _hypothetically_ the right thing to do, even if it cannot (currently) be done (and, for the record, I'm not saying that this is necessarily the case here).

Conversely, I want you to realize that it is detrimental to champion / advocate against something solely on the basis of (current) real-world constraints, thereby obscuring what could (some day) be a better solution.

I've tried my best to lay them out. Reaching a shared understanding is always what I strive for, but if points made aren't even acknowledged as such, it is time to stop.

All I was asking for is clarification to what you want a response to, just a link. If you want to move on, that's fine too.

I want you to realize that it's important to always acknowledge what is at least _hypothetically_ the right thing to do, even if it cannot (currently) be done (and, for the record, I'm not saying that this is the case here).

Conversely, I want you to realize that it is detrimental to champion something solely on the basis of (current) real-world constraints, thereby obscuring what could (some day) be a better solution.

If it's not feasible, I'm not going to argue about why something isn't a good idea conceptually. I may not even spend much time considering if it's a good idea, assuming the fix that would make it feasible isn't likely to happen in the nearish future.

If I bring up an insurmountable implementation challenge, and then spend several paragraphs tearing the idea apart, that doesn't really help anyone. If/when it becomes actionable, that's when it's useful to debate. The thread can always be bumped, or a new issue can be made.

@vexx32, I forgot to address your comments:

After this summary was posted to bring closure to my misinterpretation of your objections and to summarize the state of the discussion, I hid my comments that related to this part of the exchange. Your comments above the linked comment - now lacking context without their since-hidden responses - are still visible and create a needless distraction: please hide them too.

As for the evolution of your stance:

Going from this (referring to the -SkipUntil, -Until, -Split switches among the proposed ones):

Yeah, those are probably worth adding and more aligned with Where-Object's general role IMO.

to this:

Yeah, frankly... good luck trying to mess with Where-Object much further without introducing some really nasty and hard to pin down parameter binder issues.

without even so much as acknowledging this baffling change of heart - let alone offering how, if the functionality is deemed worthwhile, it should be offered _differently_ - is perhaps even more disconcerting and disheartening than the I'm-dismissing-this-for-implementation-concerns-alone logic discussed above.

If that baffles you, you've clearly never tried to add parameters to Where-Object ^^

We can talk philosophy all day, but I find it exhausting and wasteful to continue discourse which can derive no useful and/or practical solution. 🙂

is perhaps even more disconcerting and disheartening than the I'm-dismissing-this-for-implementation-concerns-alone logic discussed above.

Come on man, is that really necessary? We can have a minor disagreement and still remain civil.

@SeeminglyScience:

What is necessary is to address the issue of how this debate unfolded so as to inform future debates, and my comment was part of that.

My intent was not to be uncivil: perhaps I was dramatic in my (genuine) expression of my disappointment, but that was only the _garnish_ to the arguments made, and I ask you to focus on that, just as I am trying to see past the "Ehs", "🤷", "Don't know what to tell ya"s, "Yeah, frankly"s, "good luck!"s, and "man"s - civility is in the eye of the beholder.
But, point taken: I will try without expressions of disappointment going forward.

So let us exhaust the exhaustion, and lay waste to the wastefulness:

@vexx32:

If that baffles you, you've clearly never tried to add parameters to Where-Object ^^

My bafflement was about something different: It was about first saying "let's do this" and then seamlessly transitioning to "good luck trying to do that!", and considering that to be the end of the discussion.

Again, having clarity on what _should be done_ even if it _currently cannot be done_ and to also _communicate it that way_ is important - it is an investment in a potentially better future.

Conversely, the clarity on what should be done informs the investigation into whether it _can_ currently be done, after all.

Concluding the discussion with "Good luck trying to do that!" thwarts both those goals.

Notice how @KirkMunro was the first to bring up implementation concerns above, yet he commendably did _not_ use that to dismiss the proposal.

Returning to the issue at hand and its implementation challenges:

  • The proliferation of parameter sets can be worked around by implementing the mutual exclusion inside the command - not great, but doesn't impact users much.

  • Re stopping the pipeline prematurely, which is necessary to implement the proposed behaviors efficiently: Agreed that it should be done in a way that gives the other cmdlets in the pipeline a chance to run their End blocks, as @SeeminglyScience has advocated, albeit in the context of an unrelated proposal that couldn't be used to implement the bulk of this proposal's functionality, but that too is a separate issue: #7930
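The first workaround above (enforcing mutual exclusion inside the command rather than through parameter sets) might look roughly like this. Get-SelectionMode is a hypothetical stand-in for the relevant part of Where-Object, not a proposal for an actual command.

```powershell
# Hypothetical stand-in: all mode switches share one parameter set and are validated in code.
function Get-SelectionMode {
    [CmdletBinding()]
    param(
        [switch] $First,
        [switch] $Last,
        [switch] $SkipUntil,
        [switch] $Until,
        [switch] $Split
    )

    # Collect the names of all mode switches that were actually turned on.
    $chosen = @(
        foreach ($name in 'First', 'Last', 'SkipUntil', 'Until', 'Split') {
            if ($PSBoundParameters.ContainsKey($name) -and $PSBoundParameters[$name]) { $name }
        }
    )

    if ($chosen.Count -gt 1) {
        throw "Parameters -$($chosen -join ', -') are mutually exclusive."
    }

    if ($chosen.Count -eq 1) { $chosen[0] } else { 'Default' }
}

Get-SelectionMode -Last   # Last
Get-SelectionMode         # Default
```

The trade-off is that the conflict is reported at run time rather than by the parameter binder, and it is not reflected in the command's syntax diagram.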

But, point taken: I will try without expressions of disappointment going forward.

You can express disappointment without being disrespectful. Specifically "than the I'm-dismissing-this-for-implementation-concerns-alone logic discussed above." is incredibly rude. I'm no stranger to folks being rude on the internet and typically I don't say anything, but if you actually want to have productive discussions that's a surefire way to kill that possibility.

My bafflement was about something different: It was about first saying "let's do this" and then seamlessly transitioning to "good luck trying to do that!", and considering that to be the end of the discussion.

Folks are free to change their mind in a discussion. The 32 already existing parameter sets were probably not at the top of his mind. Also saying something "is probably worth adding" is not a promise to fight for the thing you want.

Again, having clarity on what _should be done_ even if it _currently cannot be done_ and to also _communicate it that way_ is important - it is an investment in a potentially better future.

This is your issue. While it's certainly a nice bonus for folks to come up with an alternate solution to implement the thing you are asking for, it is not a requirement or even an expectation.

Notice how @KirkMunro was the first to bring up implementation concerns above, yet he commendably did _not_ use that to dismiss the proposal.

I want to point out that at no point did I actually dismiss anything. I proposed an alternate solution, and you dismissed it as irrelevant to the conversation completely. For the majority of the discussion following that, I was mainly asserting that it did not belong in a different issue.

More than that though, I don't know why you find the idea of dismissing an implementation due to concerns of said implementation to be so horrible. The benefit of a feature or change always has to outweigh the amount of effort and risk involved.

I know this applies to you as well, but please keep in mind none of us are being paid to be here. If someone chimes into a thread with an implementation concern, it's not fair for you to demand that they also debate you on the merits of the idea that they do not think is actionable.

You can express disappointment without being disrespectful.

Like civility (and flippancy), disrespect is in the eye of the beholder. To me, "I'm-dismissing-this-for-implementation-concerns-alone" still seems like a condensed, but accurate summary of your comments.
I'm not taking responsibility for this being "incredibly rude", but I'm sorry that your perception of it as such caused you distress - it wasn't my intent.

Folks are free to change their mind in a discussion.
not a promise to fight for the thing you want.

Absolutely, but that wasn't my point at all.

While it's certainly a nice bonus for folks to come up with an alternate solution

That's not what I was asking for, though it would indeed be nice.

I proposed an alternate solution

I've argued multiple times why your alternate solution isn't one (to recap again: you can't use it to implement the bulk of the functionality in this proposal - perhaps I should have led with that, but it seemed obvious to me), but there was no engagement on these points.

I don't know why you find the idea of dismissing an implementation due to concerns of said implementation to be so horrible

I don't find it horrible either - it _may_ just be a fact of life, recognized as such _after an earnest investigation_ - but, again, not my point.

If someone chimes into a thread with an implementation concern, it's not fair for you to demand that they also debate you on the merits of the idea that they do not think is actionable.

What is fair is to _expect_ and _ask for_ - not demand - that implementation concerns be presented in a _constructive_ manner, and a flippant about-face to "Good luck with that!" doesn't qualify.

Similarly, it is fair to ask that the merits of the proposal be evaluated _separately_, because even if something cannot get implemented _now_, it's still valuable to have clarity on what _some day_ can be done, perhaps in a different context.

And it is perfectly fine for someone to only contribute to the former, constructively, without wanting to engage in the latter (which some may consider wasted effort) - but that's not what happened here.

A constructive discussion can be had that way, and those that agree that the proposal has merit can then discuss the actual implementation concerns to assess whether it can be done and, if so, with how much effort.

It's ultimately always a tradeoff, but it should be driven by clarity on the merits of the proposal. And only with an interest in the proposal is someone likely to expend energy _investigating_ the implementation challenges.
Conversely, using a lack of interest / objection to the proposal to make a categorical claim of infeasibility is unhelpful.

Of course, the outcome may still be that implementation is currently not feasible, but that determination would be the result of a constructive collaborative process, not a foregone conclusion.
