At the moment, PowerShell doesn't really have any reliable and syntactically tidy methods of comparing two arrays.
There is probably a lot more that could be done in this area, but one possible addition is a subset operator. We have -contains and -in, but they only work for an array being compared against a single value; what we need is a -subset operator or similar.
Is -Intersects an appropriate verb there?
The standard operators could definitely use better collection-against-collection comparisons.
That would be a great expression-mode complement to the improvements to Compare-Object proposed in #4316.
@vexx32 You might want to adjust the title - to me, 'subset' means 'subset of the members of the collection' which is expressed in PowerShell as $myarray[$start..$end]. I would also suggest against using Intersect (System.Linq.Enumerable.Intersect) since set intersection is a different concept. The name for the LINQ comparison operator is SequenceEqual (since Equals by default means "reference equals"). Note: as a simple workaround for now, you can define an Eq method on arrays that compares them:
Update-TypeData -MemberType ScriptMethod -MemberName Eq -TypeName System.Array -Value {
param ($x)
[System.Linq.Enumerable]::SequenceEqual([object[]] $this, [object[]] $x)
}
which is used like:
PSCore (1:185) > $a=1,2,3
PSCore (1:186) > $b=1,2,3
PSCore (1:187) > $a.eq($b)
True
PSCore (1:188) > $a.eq((2,3))
False
PSCore (1:189) >
Ultimately, I don't like introducing a new operator for something so basic. The current behaviour for -eq with two arrays is to do a reference comparison of the RHS against the members of the LHS. The _only_ time you get a positive result is if you do something like:
$a=1,2,3
, $a -eq $a
(note the leading comma) which is pretty obscure. Obviously changing the behaviour of -eq is a breaking change. The question is: is it worth it? My gut feel is that we should at least have the discussion.
You misunderstand, I think. I don't need an $array -isexactly $otherarray. It'd be interesting to have, perhaps, but that might be about it.
I'm looking for:
$SmallArray = 1,2
$BigArray = 1,2,3,4,5
> $SmallArray -issubsetof $BigArray
$true
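For comparison, here is a minimal sketch of what that check takes today without a dedicated operator. Test-IsSubsetOf is a hypothetical helper name, not a built-in, and it assumes the default -contains equality (case-insensitive for strings) is acceptable:

```powershell
# Hypothetical helper (not a built-in): returns $true when every element
# of $Subset also occurs in $Superset, using the default -contains equality.
function Test-IsSubsetOf {
    param(
        [object[]] $Subset,
        [object[]] $Superset
    )
    foreach ($item in $Subset) {
        # Bail out on the first element that is missing from the superset.
        if ($Superset -notcontains $item) { return $false }
    }
    return $true
}

$SmallArray = 1, 2
$BigArray   = 1, 2, 3, 4, 5
Test-IsSubsetOf -Subset $SmallArray -Superset $BigArray   # True
Test-IsSubsetOf -Subset 1, 9 -Superset $BigArray          # False
```

An explicit loop is used rather than testing the truthiness of a filtered pipeline, because a leftover element such as 0 would itself evaluate as false.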
@vexx32 Yes - I did misunderstand, which is sort of the point :-) Can you please update the title to say -IsSubset (Is since it's a boolean operator) instead of -compare? Thanks. Note, when we get around to implementing LINQ, most of these operations will be there (Intersect, Union, etc.) courtesy of LINQ so we won't really need this type of operator (well - _operators_ since there would need to be 3 of them: -IsSubset, -cIsSubset and -iIsSubset. And the syntax would be <collection> -IsSubset <collection>[,<predicateScriptblock>]. Hmm... then there would also need to be a plain -Subset, -cSubset, -iSubset triple, also taking a predicate scriptblock - so 6 operators total if we want to be consistent and complete, which would bring the total operator count to ~78.) So while I like the _functionality_ (thanks for opening the issue), I'm not enamored with the idea of adding more operators to PowerShell. Would the LINQ query "operators" be satisfactory for your purposes?
Note - there's also the alternative approach suggested by @mklement0 in issue #4316 where he proposes that set operations be added to Compare-Object that's worth considering.
I'll update the issue title. :)
I don't really know what LINQ methods would end up looking like in PS, so I can't say for sure, but if the functionality is there, that's all we need.
However, unlike #4316, these operators would return a boolean value. I think it perhaps is still valuable to be able to get a bool wrangled out of two arrays in this fashion without the frankly kludgy code it currently takes to do so! (Though I do love the suggestions made in that issue, some of which are just simply over my head!)
Raw LINQ will look like
$bigArray = 1..10
$smallArray = 6,4,5
if ($smallArray.Intersect($bigArray))
{"it's a subset"}
else
{"It's not a subset"}
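Until such instance-method syntax exists, a similar check can be sketched with the static System.Linq.Enumerable methods, in the same style as the SequenceEqual workaround earlier in the thread. Note that a non-empty intersection only shows that the two arrays share at least one element; this sketch uses Except instead, on the assumption that "subset" means no element of the small array is missing from the big one. The variable $missing is only illustrative:

```powershell
$bigArray   = 1..10
$smallArray = 6, 4, 5

# Except yields the elements of the first sequence that do not occur in the second.
# The [object[]] casts let PowerShell infer the generic type argument,
# and @() collects the lazy result into an array.
$missing = @([System.Linq.Enumerable]::Except([object[]] $smallArray, [object[]] $bigArray))

if ($missing.Count -eq 0)
{"it's a subset"}
else
{"It's not a subset"}
```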
with a predicate, it would look like
(gps sm*).Intersect((gps s*), {param ($x, $y) $x.processname -eq $y.processname})
or maybe
(gps sm*).Intersect((gps s*), {$1.processname -eq $2.processname})
if we finally get around to implementing $1, $2, $3, ... for positional arguments :-)
Personally, I appreciate the ability to define one's own arguments, but either of those work fairly well!
Thanks for the LINQ examples, @BrucePay.
We've talked about method vs. query LINQ syntax before.
Despite the proliferation, I'd love to see _all_ the LINQ set operations - Distinct, Except, Intersect, Union - as _operators too_ (even though these aren't available in C# _query syntax_); e.g.:
$array1 -intersect $array2
$array1 -union $array2
$array1 -except $array2
-distinct $array
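As a point of reference, the same four operations can already be reached through the static System.Linq.Enumerable methods, which is roughly what the proposed operators would wrap. This is only a sketch: the [object[]] casts and the @() wrapper are workarounds for generic type inference and lazy enumeration, not part of any proposed syntax.

```powershell
$array1 = 1, 2, 3, 4
$array2 = 3, 4, 5

# Each call returns a lazy sequence; @() enumerates it into an array.
$intersect = @([System.Linq.Enumerable]::Intersect([object[]] $array1, [object[]] $array2))  # 3, 4
$union     = @([System.Linq.Enumerable]::Union([object[]] $array1, [object[]] $array2))      # 1, 2, 3, 4, 5
$except    = @([System.Linq.Enumerable]::Except([object[]] $array1, [object[]] $array2))     # 1, 2
$distinct  = @([System.Linq.Enumerable]::Distinct([object[]] (1, 1, 2, 2, 3)))               # 1, 2, 3
```

All four use the default equality comparer, so string comparisons here are case-sensitive, unlike PowerShell's own comparison operators.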
I wonder if it's worth also implementing the optional equality-comparer script block as the 2nd RHS operand, or whether to leave this advanced feature to the methods only.
As such, I think of the Compare-Object enhancements in #4316 as the argument-mode (pipeline) _complement_ to these operators, not as an _alternative_.
As for a _Boolean_ form a la -issubset:
I don't have the full picture, but ideally we would enhance the existing -contains and -in operators to accept an array-valued RHS / LHS:
1..4 -contains 3, 2 # -> $True - a RHS is subset of LHS
3, 2 -in 1..4 # -> $True - LHS is subset of RHS
What happens currently is that the unsupported array operand is quietly _stringified_ as it would be inside an expandable string, as the following examples demonstrate:
3, 2 -in 1, '3 2', 4 # -> $True: 3, 2 was stringified to "3 2"
1, '3 2', 4 -contains 3, 2 # -> $True; ditto.
This sounds like a Bucket 3: Unlikely Grey Area change to me - i.e., it is unlikely that users depend on this current behavior.
That said, I may be missing important aspects.
I think implementing the script block parameter as the 2nd RHS operand is a good idea. It is already present in the -replace operator (although I'm not sure that's a LINQ method per se) in PS Core, and although its utility isn't always required, having it at hand without changing syntax is a good thing in my opinion.
(Also, the general habit of having script blocks as parameters to methods tends to lead to messy syntax with nested parentheses and/or braces, and it just looks off.)
Is that what's happening with -contains and -in at the moment? Woah... That explains a lot. And yes, I would love for this to be an expansion to the capabilities of -contains and -in; they're in that sort of place where you'd expect this kind of thing to work... but at the moment, it doesn't.
@vexx32:
The problem with the 2nd RHS operand is that it's potentially ambiguous:
{ 'foo' }, { 'bar' }, { 'baz' } -contains { 'baz' }, { 'foo' }, { <# ... #> }
How do you determine whether the last script block passed is part of the array or whether it is the 2nd operand, the equality-comparer?
To avoid ambiguity you'd have to _nest_ the RHS array (1st operand), which is awkward.
{ 'foo' }, { 'bar' }, { 'baz' } -contains (, { 'baz' }, { 'foo' }), { <# equality comparer #> }
Hmm, that's true.
But in honesty, it's difficult in some scenarios with .NET methods to use methods that have array parameters anyway. PowerShell reads what we normally write as arrays as different method parameters in most cases; you have to really go out of your way to indicate that what you're passing is an array designed to go to one parameter.
It's awkward either way, in my opinion.
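A small sketch of the kind of extra effort described above, using String.Split as the example: the explicit cast turns the comma-separated values into one typed array argument rather than leaving them to be read as separate method parameters. The variable $parts is only illustrative.

```powershell
# The [char[]] cast makes the two separators a single, typed array argument,
# so overload resolution selects String.Split(char[]).
$parts = 'a,b;c'.Split([char[]] (',', ';'))
$parts   # three elements: a, b, c
```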
@vexx32:
Yes, calling _methods_ can be tricky business, given that you're entering a world that wasn't designed with PowerShell in mind. Generally, PowerShell is helpful in anticipating what your intent is when you call methods, but there are definite pitfalls, some of them unavoidable, because they are outside of PowerShell's control, as this striking example demonstrates.
Thus, PowerShell-native features are always preferable.
In this case, however, it seems to me that the only way to get a familiar and predictable PowerShell experience is by _not_ exposing the advanced equality-comparer feature - unless there is a syntax we haven't considered yet that is more PowerShell-like.
Even the awkward _nested_ passing of the RHS array from my previous comment is ultimately ambiguous.
Consider that all operators that currently accept multiple RHS operands (-replace, -split, am I missing any?) treat the list of operands as an _array_ and rely on the operands themselves to be _scalars_, which avoids any ambiguity.
So, unless we can come up with an unambiguous syntax that doesn't require unfamiliar syntax for the majority of use cases (where the default equality comparer will do), I suggest we leave the advanced use case of a custom equality comparer to the realm of methods.
I don't think there's a really good way to do this without altering the default array syntax, honestly, or creating a distinct difference between method parameter delimiters and array item delimiters, e.g.,
[string]::Split('a', 'b', 'c' | 4 | [System.StringSplitOptions]::None)
(That would also be terrible because pipeline, but I'm not sure there're any good candidates for such a thing.)