Copied from TFS item 819298
Original Connect item: https://connect.microsoft.com/PowerShell/feedback/details/635454/add-support-for-linq
When LINQ is available, it can push queries into the native language of the underlying storage without requiring users to learn that query language. This translation can make queries hundreds of times faster, and, when the alternative is to ship large amounts of data over the wire to be filtered locally, thousands of times faster.
In many cases, using LINQ can turn a script that ran too slowly to be feasible into one that we can run every day, and in fact whenever we like -- and it also drastically reduces CPU impact.
For a trivial example, see this post on Stack Overflow http://stackoverflow.com/questions/4559233/technique-for-selectively-formatting-data-in-a-powershell-pipeline-and-output-as where the switch to using LINQ (by embedding the LINQ into a C# function via Add-Type and calling it from PowerShell) provided a 1000 times speedup. It's currently not possible to write that LINQ query at all in pure PowerShell, and the alternative (as it was written in the other answer on that Stack Overflow question, using Where-Object) is extremely slow even on modest data sets.
The bottom line is that the Where-Object and Select-Object cmdlets are very slow, and using them to query databases and other remote systems incurs not only the slowdown of processing thousands of items in Where-Object, but potentially the cost of sending them all over the wire as well.
Since LINQ to events is becoming available through the Rx project, it's more useful, and more important than ever, and we really need a way to write LINQ natively in PowerShell.
Oh please, oh please, oh please ... :wink:
(@powercode you'll probably be interested in this discussion).
Some random thoughts on what implications LINQ/extension methods have for PowerShell...
@Jaykul I added an updated answer to your answer on StackOverflow that replaces all the pipeline use with .where(). This removes the last usage of the pipeline from the code. The result is pretty fast.
@vors The example that @Jaykul provided used XLinq, which is perfectly usable from PowerShell as it stands. (XLinq isn't "really" LINQ anyway, it's just an alternative model for manipulating XML).
To see what the experience might be like, I've been playing with wrapping the [Linq.Enumerable] methods in ScriptMethods so I can write things like:
# Find processes with the most threads on this machine
(Get-Process).select{
param($p)
[pscustomobject] @{
Name=$p.Name;
Id=$p.Id;
Threads=$p.Threads.Count}}.
OrderByDescending{param ($p) $p.threads}.
Take(10)
The result is usually somewhat to much faster than the pipeline, even without lazy evaluation and a lot of copying. But it's also rather more complex to use. Now, while I am personally fond of this sort of thing, is this the experience we want to provide? For example, @mklement0 seems adamantly opposed to pushing methods as a first-class experience. So perhaps we should have a real language experience like from $x where $_ .Size -gt 10 select .... Or a set of cmdlets that build up the query. Or even automatically rewrite parts of the pipeline into LINQ queries to make regular Where/Select/etc. automatically faster. Also note that there is a LINQ for PowerShell module on the gallery. It's incomplete and hasn't been updated in a couple of years (only has 1200 downloads) so maybe people really aren't that keen on LINQ in PowerShell after all.
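For reference, a conventional pipeline version of the thread-count query above might look like the following. This is only a sketch for comparison, not code from the thread; the $top variable name is an assumption introduced here.

```powershell
# Pipeline counterpart of the method-chained query (illustrative sketch).
# The calculated property 'Threads' mirrors the Threads=$p.Threads.Count field.
$top = Get-Process |
    Select-Object Name, Id, @{ Name = 'Threads'; Expression = { $_.Threads.Count } } |
    Sort-Object Threads -Descending |
    Select-Object -First 10

$top
```

Each stage here is a cmdlet invocation, which is where the per-object overhead discussed in this thread comes from.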
And finally, what are the core scenarios for doing this? Is it just to be faster (a good thing)? Or is there specific functionality that you want to unlock. What about LINQ-to-SQL? We have classes now. Is this something we should invest in?
Another "stunt-query" that gets the 10 most common words from the text of "War and Peace". It uses a few extra operators I created (split(), trim() and match()) that aren't standard query operators.
$h=@{}
[io.file]::ReadAllLines("c:\temp\WarAndPeace.txt").
trim().
split().
match('^[a-z]+$').
aggregate{param ($x, $y)
if ($x -isnot [hashtable]) { $h[$x]++; $x = $h } $x[$y]++; $x }.
select{param ($t)
@($t.GetEnumerator()).
select{ param ($w)
[pscustomobject] @{
Word = $w.Key;
Count = $w.Value}}.
OrderByDescending{$args[0].Count}.
Take(10)}
So perhaps we should have a real language experience like
from $x where $_ .Size -gt 10 select ....
To quote @Jaykul: Oh please, oh please, oh please ...
The query-like syntax fits better with the rest of PowerShell - a "whitespacey" experience that doesn't cause the confusion with argument mode that method syntax entails (though it's still an embedded language users will have to learn).
rewrite parts of the pipeline into LINQ queries to make regular Where/Select/etc. automatically faster.
Is that feasible? Certainly would be great.
And finally, what are the core scenarios for doing this?
As you say, a performance gain alone would be useful, but LINQ is such a wonderfully expressive, general-purpose feature that I can see its utility in a wide range of scenarios, including LINQ-to-SQL.
the confusion with argument mode that method syntax entails
I'm confused :-) What confusion? Method syntax is all expression mode. Every bit of the query I wrote above is in expression mode.
As you say, a performance gain alone would be useful,
Playing with this a bit more, I see that there will be a bound on how much faster we can get. LINQ query operators depend heavily on lambdas or, for us, scriptblocks, so the overhead of invoking a scriptblock will be a limiting factor on how fast we can get. (Perhaps things like PSLambda could help, though at the risk of introducing another syntax mode.)
That's good to know; the PSLambda project sounds interesting; unless a solely method-based approach is chosen (see below), another syntax mode sounds unavoidable, though ideally it would feel like a variant of the existing (main) ones (argument mode and operators).
I'm confused :-) Method syntax is _all_ expression mode.
Every bit of the query I wrote above is in expression mode.
And my head is still spinning.
To recap previous arguments:
Method syntax invites confusion with PowerShell's shell-like argument [parsing] mode (notwithstanding the fact that the _option_ of using method syntax on .NET types directly is a great _extensibility_ feature):
From the perspective of a shell user mostly familiar with command syntax (argument mode), the tricky differences are:
Arguments are ,-separated (vs. whitespace-separated in argument mode, where , constructs _arrays_).
With the exception of the - little-known and little-used - .Where() and .ForEach() collection "operators" (see https://github.com/PowerShell/PowerShell-RFC/pull/126 for how to make them more PowerShell-like), PowerShell's _own_, core functionality relies only on _cmdlets / functions_ (argument mode) and _operators_ (expression mode, but _resembling_ argument mode), and my vote is to keep it that way.
@mklement0
PowerShell's own, core functionality relies only on cmdlets / functions (argument mode)
I really don't understand where you got this idea from. It has _never_ been the case that we relied only on cmdlets. There just wasn't enough time to cover all the scenarios (simple example: getting the substring of a string.) We targeted the _most common_ scenarios with cmdlets because they gave the best user experience and then depended on .NET to back fill what remained.
and operators (expression mode, but resembling argument mode),
What do you mean by "resembling" here? They are distinct modes: barewords are allowed in argument mode but not in expression mode. Operators are allowed in expression mode but not in argument mode. They both allow simple expressions (including method calls) so I suppose that's a resemblance. Method calls are part of expression mode and I can't say I've ever seen someone try to use a method like a command, but I've seen a lot of instances where command invocations are written like method calls (and subsequently don't work).
@BrucePay:
I really don't understand where you got this idea from.
To be clear: only _you_ can provide behind-the-scenes glimpses - both historical and present - and I always appreciate them.
By contrast, I can only talk about de-facto behavior and _guess_ at design intent in cases where it isn't documented.
It has never been the case that we relied only on cmdlets.
_De facto_, virtually all of PowerShell's _own_ functionality is provided by cmdlets and operators (with the exception of the useful, but little-known and little-used .ForEach() and .Where() collection "operator" methods that I wish had true operator representation.)
_Whether it was the original intent or not_, the following de-facto separation makes sense and is worth keeping:
Cover _basic_ needs with PowerShell's _own_ constructs - cmdlets and operators - whose syntax - while spanning two distinct parsing modes - doesn't _clash_:
For _advanced_ needs, full access to the .NET framework is _available_, which is a wonderful _option_ to have and sets PowerShell apart from traditional shells; however, this option comes at a cost:
You need to know more about the underlying .NET types and what methods they expose.
Get-Member helps in discovering method _signatures_, but with respect to getting _help_ you're on your own (whereas the purpose of _properties_ is usually self-evident).
You need to know the subtleties of parameter binding when calling .NET methods directly.
Last but not least, you need to be aware of how method syntax differs from command syntax.
It therefore makes sense to limit introduction of new PowerShell-native features to cmdlets and operators.
simple example: getting the substring of a string.
Getting a substring via the .Substring() _method_ is actually a good example of something that is _rarely_ needed: due to PowerShell's OO nature and its excellent regex support via -split, -match, and -replace you typically do _not_ need to deal with .Substring().
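To make the comparison concrete, here is a small sketch that extracts the host part of a name once with .Substring() and then with the regex-based operators. The sample value and variable names are assumptions chosen for illustration, not taken from the thread.

```powershell
# Hypothetical sample input (an assumption for this sketch).
$fqdn = 'server01.contoso.com'

# Method syntax: requires knowing Substring() and IndexOf() and their signatures.
$viaMethod = $fqdn.Substring(0, $fqdn.IndexOf('.'))

# Operator syntax: -split, -replace and -match all take a regex.
$viaSplit   = ($fqdn -split '\.')[0]
$viaReplace = $fqdn -replace '\..*$'
$viaMatch   = if ($fqdn -match '^(?<name>[^.]+)') { $Matches.name }

$viaMethod, $viaSplit, $viaReplace, $viaMatch
```

All four expressions yield the same string; which one reads best is exactly the point under debate here.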
What do you mean by "resembling" here?
Understanding PowerShell's two fundamental parsing modes - argument mode vs. expression mode - is a challenge any PowerShell user faces, but it's the unavoidable price to pay for a shell that also offers the constructs of a "real" programming language (again, kudos to PowerShell).
So, yes, operators belong to the realm of _expressions_, where string literals cannot be barewords (unquoted), for instance.
Aside from that, however, operators being separated from their operands by _whitespace_ (rather than (...) and , - though , has a role in some operators too) is what I meant by "resembling" argument mode.
A negative way of phrasing it: Using literals (quoting need aside) and variable access - including _property_ access on variables - _doesn't clash_ with argument-mode syntax the way that _method_ calls do.
$arr = 'Call', 'me', 'Ishmael'
# Operator syntax - reminiscent of argument mode (whitespace as separator, no (...))
$arr -join ' '
# Method syntax - uses (...) and "," - both of which function differently in argument mode.
[string]::Join(' ', $arr)
I've seen a lot of instances where command invocations are written like method calls (and subsequently don't work).
Indeed, and that's why introducing method calls for basic PowerShell functionality is problematic:
# The web is full of examples of confused syntax like this:
New-Object Foo.Bar($arg1, $arg2) # should be: New-Object Foo.Bar [-ArgumentList] $arg1, $arg2
Set-StrictMode -Version 2 is living proof that the confusion is real:
function foo { param($bar, $baz) "[$bar], [$baz]" }
foo(1, 2) # binds *array* 1, 2 to $bar only - BREAKS with Set-StrictMode -Version 2 or higher.
Where-Object / ForEach-Object is a nod to how argument-mode syntax is easier to type and read.
@mklement0
De facto, virtually all of PowerShell's own functionality is provided by cmdlets and operators
Except for little things like the _entire language_ (statements, etc.). And types (casts). And the ability to call methods. If we hadn't wanted people to use methods we wouldn't have implemented the feature. And we did think a lot about the feature before adding it. The intended experience is not binary, it's a continuum. As you grow with experience, your abilities will also grow. To quote Jeffrey, PowerShell has an extremely wide dynamic range.
BTW: when I say _types_, I'm talking about things like this extremely old joke:
-join [char[]] ([int[]] [char[]] 'hal'| foreach {$_+1})
substring ... a good example of something that is rarely needed:
Hmmm ... I get 2.9 million hits for "powershell substring". Unfortunately. But you're right - we should have cmdlets for basic data manipulation. Please support @FriedrichWeinmann's issue Commands for string methods & operators on the pipeline
Your example
$arr = 'Call', 'me', 'Ishmael'
is all expression mode; whitespace is unnecessary as in
$arr-join' '
And finally
($arr -join ' ').Count
[string]::Join(' ', $arr).Count
All in expression mode, parens are required around both to get the count.
Indeed, and that's why introducing method calls for basic PowerShell functionality is problematic:
No, that's not the problem. The people who make these mistakes are invariably programmers who are used to methods in other languages and try to write PowerShell like it was C# (or whatever). They may not even be aware that PowerShell also has methods.
(You know, it just occurred to me how ironic it is to be having a discussion about how methods are _bad_ in an issue proposing the addition of extension methods.:-))
@BrucePay:
Except for little things like the entire language (statements, etc.). And types (casts).
Yes, I didn't mention the language constructs explicitly, but I did say this:
Understanding PowerShell's two fundamental parsing modes - argument mode vs. expression mode - is a challenge any PowerShell user faces, but it's the unavoidable price to pay for a shell that also offers the constructs of a "real" programming language.
This should tell you that my concern is not about expression mode / control-flow statement syntax _per se_ - on the contrary.
And:
For advanced needs, full access to the .NET framework is available, which is a wonderful option to have and sets PowerShell apart from traditional shells; however, this option comes at a cost:
This should tell you that your saying "If we hadn't wanted people to use methods" is refuting a point that I never made.
it's a continuum
With methods, it is not, for the reasons stated (syntax confusion, parameter-binding pitfalls, separate knowledge domain).
But - again - it's a wonderful _advanced_ option to have.
Please support @FriedrichWeinmann's issue Commands for string methods & operators on the pipeline
Looks like a great idea (I'd already up-voted it).
BTW: when I say _types_, I'm talking about things like this extremely old joke:
I get the joke, but I don't know what you mean re types.
Your example
$arr = 'Call', 'me', 'Ishmael'
is all expression mode; whitespace is unnecessary
Just to drive the point home: my arguments were never about _expression mode_ per se. They were about _method syntax_ and the additional domain of knowledge you enter when using them (and, again, that is a wonderful _advanced_ option).
Yes, you _can_ omit whitespace (as you can _in part_ in argument mode, Get-Item /|select name), but I hope we can agree that it is ill-advised, because the easy readability of commands and operator-based expressions comes from using whitespace.
The people who make these mistakes are invariably programmers who are used to methods in other languages and try to write PowerShell like it was C# (or whatever).
Indeed, you can use PowerShell entirely as if it were C# - virtually all the tools are there. You can get away without ever calling a cmdlet or using PowerShell's (unique to it) operators.
But I hope it is self-evident that that wouldn't be a good idea and that there's a PowerShell way to do things, and if that PowerShell way functions _consistently_ with _the fewer concepts to master the better_, learning it is easier.
For someone who doesn't come from a programming background, argument mode (command-line syntax) and expression mode _with properties_ is a manageable entry point, with an _option_ to _later_ branch out into the method world, as needed.
They may not even be aware that PowerShell also has methods.
With the exception of .Where() and .ForEach(), PowerShell _itself_ has no user-facing methods that I am aware of (I'm not talking about the API), and my suggestion is that we keep it that way for a clean separation between PowerShell's own core functionality and the realm of .NET in general, which avoids conceptual and syntax challenges especially for _beginners_.
For people who _want and need_ method calls, the door is wide open and - have I mentioned this before? - it is a great option to have.
(You know, it just occurred to me how ironic it is to be having a discussion about how methods are bad in an issue proposing the addition of extension methods.:-))
Let me know if you still feel that I've ever made the point that "methods are bad".
There's also no need to _choose_ in this particular case; just like C#, we could provide method syntax _and_ query syntax.
In the case of method syntax, however, to the user this would be just another door into the realm of the underlying .NET world (no matter how much work is needed behind the scenes to "pass the functionality through").
By contrast, query syntax, such as your from $x where $_ .Size -gt 10 select ... example, with its whitespace-based token separation and absence of parentheses and lambdas, feels like a more natural PowerShell fit to me.
On a meta note: I feel that a lot of effort in this debate was spent on not discussing the actual points that I made. Were there specific ways in which I presented my arguments that obscured their intent?
@SteveL-MSFT We definitely need an RFC with a detailed design for this, if only because the standard query operator Where() collides with our existing Where() method. There are also lazy vs. strict semantics to consider, fat-arrow lambdas, and I'm sure a bunch of other things I can't think of right now.
@BrucePay - if the existing Where method turns into a proper extension method, then overload resolution should avoid any issues with the LINQ method because the identity conversion on the first argument (a scriptblock) would be better than the conversion from scriptblock to delegate.
Do you think we should have special syntax for finding extension methods?
like
using extensionsfrom "System.Linq.Enumerable, System.Linq"
Or reuse the existing syntax, like
using assembly System.Linq
I strongly believe we should only get extension methods statically, i.e. what is available at parse time, not by finding the best match at runtime.
I think using assembly System.Linq is sufficient and consistent with C#.
I'd like to just side-step @mklement0's opinions about members. PowerShell is an _object oriented_ .NET programming language. Properties and Methods are a _core part of OOP_ and PowerShell's ETS system is all about adding more of them -- the idea that Methods are somehow "un" PowerShell is preposterous.
So, for the record ...
I think it would be wonderful to have LINQ Query syntax.
I also think it would be awesome to have pipeline syntax that was optimized against LINQ extensions. Do you remember the stuff that Bart de Smet wrote while trying to make "Rx" work for PowerShell pipelines? It was complicated to work with, but exposed awesome functionality!
But it should be clear that although query syntax may have been the original reason for adding extension methods to the .NET Framework, the reality is that extension methods are used extensively throughout every type of library and API. The whole "fluent programming" fad was basically built upon them. We can't avoid exposing extension methods _as methods_ by adding a query syntax: there are lots of extension methods that are really painful to call statically, and lots of them require generic type arguments which makes them _basically impossible_ to call...
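As a rough illustration of the pain point, the sketch below calls two System.Linq.Enumerable extension methods as static methods. Where() works because PowerShell can infer the generic argument from the arguments passed; OfType() takes its type only as a generic argument, so this sketch falls back to reflection. The variable names and sample data are assumptions made for the example.

```powershell
# Static call: TSource = int is inferred from the int[] argument and the delegate type,
# and the script block is converted to a Func[int,bool] delegate.
$numbers = [int[]] (1..10)
$evens = [System.Linq.Enumerable]::Where($numbers, [Func[int,bool]] { param($n) $n % 2 -eq 0 })
$evenText = @($evens) -join ', '
$evenText

# OfType<TResult>() has no argument to infer TResult from, so the generic
# method is constructed by hand via reflection (illustrative workaround).
$mixed = @(1, 'two', 3, 'four')
$ofTypeInt = [System.Linq.Enumerable].GetMethod('OfType').MakeGenericMethod([int])
$ints = $ofTypeInt.Invoke($null, (, $mixed))   # (, $mixed) wraps the array as the single argument
$intText = @($ints) -join ', '
$intText
```

Compared with the C# form mixed.OfType<int>(), the reflection detour shows why exposing extension methods as ordinary instance-style methods is being requested.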
P.S.: I think it is a well known truism about PowerShell that Methods vs. Pipeline is a tradeoff. The PowerShell pipeline is fluent and expressive, and has the lower _memory usage_ of streaming actions on objects instead of storing the whole collection in memory -- similar to the benefits of LINQ. However, pipelines require cmdlets, which have significant performance overhead -- so despite the benefits of pipelines, chaining methods and pure language without cmdlets is nearly always faster, and extension methods (and in particular, LINQ collection methods) will continue that expectation.
@Jaykul:
I think it would be wonderful to have LINQ Query syntax.
I'm glad to hear it.
I too think that exposing LINQ functionality via methods _as well_ is definitely called for:
generally, providing access to [LINQ] extension methods more thoroughly fulfills PowerShell's promise of full access to the .NET framework.
specifically, certain LINQ features are accessible via method syntax only.
We therefore needn't continue the methods-in-PowerShell debate here, so let me try to bring closure to it:
PowerShell is an object oriented .NET programming language.
PowerShell is also a _shell_, and the mix of syntax modes - unavoidable, as discussed - is a challenging thing to master. While the OOP aspect of accessing _properties_ is easy to grasp, _methods_ can cause confusion and present additional challenges (to briefly recap: syntax confusion with command mode, parameter-binding subtleties, introduction of a new knowledge domain).
As stated many times before, having the _option_ to take full advantage of .NET types and their methods is unequivocally wonderful and sets PowerShell apart from other shells.
That option, however, is an _advanced_ feature - as is the ETS:
Writing your own ETS definitions as an end user is definitely an advanced task (irrespective of whether you use it to implement properties or methods).
PowerShell's built-in ETS definitions are transparent to the end user and in the vast majority of cases add _properties_, not methods (in fact, if we leave .ToString() aside, which users needn't and are unlikely to call directly, the PSv5.1 ETS definitions define only 2 methods overall, for the Microsoft.Management.Infrastructure.CimInstance#MSFT_* types: .GetSecurityDescriptor() and .SetSecurityDescriptor().)
In short, the current (largely de-facto) separation - cmdlets and operators for PowerShell's own functionality (API aside), with optional access to the full .NET framework if and as needed for advanced tasks - is useful and worth keeping.
As stated, adding support for LINQ falls into the full-access-to-the-full-.NET-framework category and it just so happens that LINQ's query syntax - to me - fits in better with PowerShell syntax; additionally, adding PowerShell-like syntactic sugar sounds like a worthwhile endeavor (hinted at in @BrucePay's example: from $x where $_ .Size -gt 10 select ... vs. from $obj in $x where $obj.Size -gt 10 select ...)
On a meta note:
Please stop calling others' arguments "preposterous". It adds nothing to the discussion and only serves to antagonize.
@PowerShell/powershell-committee reviewed this and agrees an RFC should be authored
@Jaykul I found only that link on Bart de Smet's blog LINQ THROUGH POWERSHELL
Are you speaking about this implementation?
I tried it with an updated Linq.Dynamic https://github.com/StefH/System.Linq.Dynamic.Core
Add-Type -AssemblyName System.Data
Add-Type -AssemblyName System.Data.DataSetExtensions
Add-Type -AssemblyName System.Data.Linq
Add-Type -Path "$pwd\System.Linq.Dynamic.Core\1.0.8.9\lib\netstandard2.0\System.Linq.Dynamic.Core.dll"
-------------------------------------------------------------------------------------
using namespace System.Collections
using namespace System.Data
using namespace System.Linq
using namespace System.Linq.Dynamic.Core
$t1 = [DataTable]::new()
$t1.Columns.AddRange(@(
[DataColumn]@{ ColumnName = 'FundId' ; DataType = [int] }
[DataColumn]@{ ColumnName = 'Date' ; DataType = [datetime] }
[DataColumn]@{ ColumnName = 'CodeA' ; DataType = [string] }
[DataColumn]@{ ColumnName = 'Orders' ; DataType = [int32[]] }
))
$t1.Rows.Add(1, [DateTime]::Now, 'A1', [int32[]]@(1) ) > $null
$t1.Rows.Add(2, [DateTime]::Now.AddDays(-365), 'A2', [int32[]]@(1,2,3,4) ) > $null
$t1.Rows.Add(3, [DateTime]::Now.AddHours(-12), 'A3', [int32[]]@(1,2,3) ) > $null
$sourceTable = [Queryable]::AsQueryable([DataTableExtensions]::AsEnumerable($t1))
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
$sourceTable
FundId Date CodeA Orders
------ ---- ----- ------
1 28/05/2018 10:02:51 A1 {1}
2 28/05/2017 10:02:51 A2 {1, 2, 3, 4}
3 27/05/2018 22:02:51 A3 {1, 2, 3}
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[DynamicQueryableExtensions]::Where($sourceTable, "DateTime(Date) >= DateTime(2018, 1, 1)")
FundId Date CodeA Orders
------ ---- ----- ------
1 28/05/2018 10:01:30 A1 {1}
3 27/05/2018 22:01:30 A3 {1, 2, 3}
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[DynamicQueryableExtensions]::Where($sourceTable, "int32(FundId) > 2")
FundId Date CodeA Orders
------ ---- ----- ------
3 27/05/2018 22:02:51 A3 {1, 2, 3}
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[DynamicQueryableExtensions]::Where($sourceTable, 'int32(Orders.Count) >= 3')
An error occurred while enumerating through a collection: Target object is not an ExpandoObject.
At line:2 char:1
+ [DynamicQueryableExtensions]::Where($sourceTable, 'int(Orders.Count) ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (System.Linq.Enu...m.Data.DataRow]:WhereEnumerableIterator`1) [], RuntimeException
+ FullyQualifiedErrorId : BadEnumeration
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[DynamicQueryableExtensions]::Select($sourceTable, 'new(FundId as Id, Date as LastWriteTime)')
{ Id = 1, LastWriteTime = 28/05/2018 10:02:51 }
{ Id = 2, LastWriteTime = 28/05/2017 10:02:51 }
{ Id = 3, LastWriteTime = 27/05/2018 22:02:51 }
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
$Query = [DynamicQueryableExtensions]::Where($sourceTable, "DateTime(Date) >= DateTime(2018, 1, 1)")
$Query.ToString()
$Query = [DynamicQueryableExtensions]::OrderBy($Query, 'CodeA descending')
$Query.ToString()
System.Data.EnumerableRowCollection`1[System.Data.DataRow].Where(Param_0 => (Convert(Param_0.get_Item("Date")) >= new DateTime(2018, 1, 1)))
System.Data.EnumerableRowCollection`1[System.Data.DataRow].Where(Param_0 => (Convert(Param_0.get_Item("Date")) >= new DateTime(2018, 1, 1))).OrderByDescending(Param_1 => Param_1.get_Item("CodeA"))
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Poke
Hi guys,
Just discovered what LINQ is, and how it can accelerate some of the data manipulation I'm doing on a few projects.
Has this died? Where do we stand on this?
There aren't many people using LINQ, because PowerShell is still a growing language. Most of the people who picked up PowerShell in the first place were administrators, and their need for LINQ is quite limited. If we want more types of users to use PowerShell as their daily driver (DataScience, Reporting, BigData), we should introduce more ways to interact with the data in a user-friendly way. If not cooked in, at least as a robust module.
The question is, how can we simplify the process of invoking a LINQ expression through PowerShell?
Can we do something like this:
Get-Process | Invoke-LINQExpression -Command '(from p in items select p).GroupBy(g => g.name)'
Simply parse the string into a PowerShell invokable LINQ expression?
IMO, we should keep the LINQ syntax unchanged so users who are already familiar with the language, can simply port over their expressions to powershell.
I realize this might be too 'simple', but exposing a simpler way to invoke LINQ expressions from the pipeline would help novice users, like myself, have an easier time accessing some of these faster speeds for these scenarios.
Or even having as a method:
$Process.LINQ('(from p in items select p).GroupBy(g => g.name)')
There's no need to do so. LINQ methods should be able to be mapped to script blocks fairly directly; I believe the main issue there is that some of them are generically-typed methods, which PowerShell doesn't yet support.
Existing methods that require delegates or similar can _already_ be used in PowerShell, via script blocks:
using namespace System.Collections.Generic
[List[int]] $Collection = 1..50 # int elements, so that -gt 10 compares numerically rather than as strings
$Collection.RemoveAll(
{
param($x)
$x -gt 10
}
)
$Collection # outputs numbers 1 through 10
I think the main issue is designing adapters to work with the different types of inline functions that are available from the C# methods, and potentially also upgrading the parser to recognise and properly invoke generic methods, as many LINQ methods are generic.
Hey guys, am I missing something?
Just did some quick tests to see the differences between all 3 types of Where filtering:
$InputObjects = (1..500000) | Select-Object @{n="Name";e={$_}}
Write-Host "Invoking: Where-Object"
$Measure = Measure-Command {
$InputObjects | Where-Object { $_.Name -eq 5000 }
}
Write-Host "ExecutionTime: $($Measure.TotalSeconds)`r`n"
Write-Host "Invoking: .Where()"
$Measure = Measure-Command {
$InputObjects.Where({ $_.Name -eq 5000 })
}
Write-Host "ExecutionTime: $($Measure.TotalSeconds)`r`n"
Write-Host "Invoking: [System.Linq.Enumerable]::Where()"
$Measure = Measure-Command {
[System.Linq.Enumerable]::Where($InputObjects, [Func[object,bool]]{ param($x) $x.Name -eq 5000 })
}
Write-Host "ExecutionTime: $($Measure.TotalSeconds)`r`n"
Results:
Invoking: Where-Object
ExecutionTime: 87.4306334
Invoking: .Where()
ExecutionTime: 31.1690778
Invoking: [System.Linq.Enumerable]::Where()
ExecutionTime: 59.088012
I'm not seeing any speed/performance improvement in this simple example. Can someone help me improve my methodology?
I don't think there's anything wrong with your methodology -- it clearly shows that even without using PSLambda, LINQ methods are a lot faster than pipeline cmdlets.
The .Where() easter egg method _did not exist at all_ when this issue was originally created in Connect, but it doesn't affect this story except as proof that exposing (faster) methods is useful, even when they're not very discoverable. Unfortunately, the way that .Where() was added requires adding each method by hand, so it doesn't resolve the general request about generic methods or extension methods in general.
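Since the intrinsic .Where() method is not very discoverable, here is a brief sketch of how it can be called, including its optional selection-mode and count arguments. The sample data and variable names are assumptions for illustration only.

```powershell
$numbers = 1..10

# Plain filter, comparable to Where-Object but without the pipeline.
$big = $numbers.Where({ $_ -gt 5 })

# 'First' mode with a count: stop after the first two matches.
$firstTwo = $numbers.Where({ $_ -gt 5 }, 'First', 2)

# 'Split' mode: returns the matching and the non-matching items as two collections.
$even, $odd = $numbers.Where({ $_ % 2 -eq 0 }, 'Split')

@($big) -join ','
@($firstTwo) -join ','
@($even) -join ','
@($odd) -join ','
```

Unlike the LINQ operator of the same name, this method evaluates eagerly and returns a collection rather than a lazy query.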
TANGENT: For what it's worth @mkellerman -- if you want the fastest of the fast, you just need to use PSLambda to make the delegate. I'm using a strongly typed collection here just because I prefer that when working with LINQ, but it's probably not really necessary.
Write-Host "Initializing Collection"
[System.Collections.Generic.KeyValuePair[string, int][]]$InputObjects = (1..500000).ForEach({ [System.Collections.Generic.KeyValuePair[string, int]]::new("$_", $_) })
Write-Host "Invoking: Where-Object"
$Measure = Measure-Command {
$InputObjects | Where-Object { $_.Key -eq "5000" }
}
Write-Host "ExecutionTime: $($Measure.TotalSeconds)`r`n"
Write-Host "Invoking: .Where()"
$Measure = Measure-Command {
$InputObjects.Where({ $_.Key -eq "5000" })
}
Write-Host "ExecutionTime: $($Measure.TotalSeconds)`r`n"
Write-Host "Invoking: [System.Linq.Enumerable]::Where()"
$Measure = Measure-Command {
[System.Linq.Enumerable]::Where($InputObjects, [Func[[System.Collections.Generic.KeyValuePair[string, int]],bool]]{ param($x) $x.Key -eq "5000" })
}
Write-Host "ExecutionTime: $($Measure.TotalSeconds)`r`n"
Write-Host "Invoking: [Linq.Enumerable]::Where with PSLambda"
$Measure = Measure-Command {
$lambda = New-PSDelegate { param($x) $x.Key -eq "5000" } -Delegate ([Func[[System.Collections.Generic.KeyValuePair[string, int]],bool]])
[System.Linq.Enumerable]::Where($InputObjects, $lambda)
}
Write-Host "ExecutionTime: $($Measure.TotalSeconds)`r`n"
Invoking: Where-Object
ExecutionTime: 15.1560668
Invoking: .Where()
ExecutionTime: 4.6443658
Invoking: [System.Linq.Enumerable]::Where()
ExecutionTime: 9.4843462
Invoking: [Linq.Enumerable]::Where with PSLambda
ExecutionTime: 0.1397667 😮
🏃💨💨💨
Oh dang! That's sexy!
Okay, so how can we wrap this up better for the average PS user? I'm going to go read up on PSDelegate and PSLambda.
@Jaykul Why is PSLambda so much swifter, and how can we perhaps repurpose that in the built-in conversion paths for, say, a script block delegate?
Well, the reason LINQ is so slow is that inside of the LINQ method, it's running that scriptblock (back out into PowerShell's runspace, feeding it the parameter) half a million times...
Using LINQ with PSLambda is faster because we're basically not using PowerShell at all 😊
As Bruce said earlier, PSLambda represents _yet another language mode_ (and one that's even more restrictive than classes). We're accepting the limitation that we can't use cmdlets in exchange for writing compile-able PowerShell, and getting a compiled (IL) .NET lambda method. Then, when we invoke the LINQ Where, we're not calling back out into the PowerShell runspace (and copying variables around) over and over...
And of course, I had to tell it the type/signature of the lambda, too. I'm sure we could do something intelligent there (i.e. detect parameter and return types), but if we do it all transparently, it's going to be hard to explain why it's only super fast _sometimes_ (when you don't use cmdlets)...
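To make the difference concrete without depending on the PSLambda module, here is a minimal sketch that builds the same kind of predicate as a compiled .NET delegate with System.Linq.Expressions. It only illustrates the idea of a delegate that never re-enters the PowerShell runspace; it is not how PSLambda itself is implemented, and the variable names and the plain int data are assumptions made for the example.

```powershell
# Assumption: plain int data rather than the KeyValuePair collection used above.
[int[]]$numbers = 1..20000

# Scriptblock delegate: every call goes back into the PowerShell engine.
$scriptBlockPredicate = [Func[int, bool]]{ param($n) $n -eq 5000 }

# Compiled delegate: x => x == 5000, built as an expression tree and compiled to IL.
$x    = [System.Linq.Expressions.Expression]::Parameter([int], 'x')
$body = [System.Linq.Expressions.Expression]::Equal(
    $x,
    [System.Linq.Expressions.Expression]::Constant(5000, [int]))
$compiledPredicate = [System.Linq.Expressions.Expression]::Lambda(
    [Func[int, bool]],
    $body,
    [System.Linq.Expressions.ParameterExpression[]]@($x)).Compile()

# @(...) forces enumeration; Enumerable.Where is lazy and does no work until enumerated.
$slow = @([System.Linq.Enumerable]::Where($numbers, $scriptBlockPredicate))
$fast = @([System.Linq.Enumerable]::Where($numbers, $compiledPredicate))

# Timing both variants the same way as the tests above (numbers will vary by machine).
$slowTime = Measure-Command { @([System.Linq.Enumerable]::Where($numbers, $scriptBlockPredicate)) }
$fastTime = Measure-Command { @([System.Linq.Enumerable]::Where($numbers, $compiledPredicate)) }
Write-Host "Scriptblock delegate: $($slowTime.TotalMilliseconds) ms"
Write-Host "Compiled delegate:    $($fastTime.TotalMilliseconds) ms"
```

Both calls return the same single match; only the cost per predicate invocation differs. The trade-off is the one described above: the compiled predicate is plain .NET, so it cannot call cmdlets.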
Pragmatic solution from my perspective would seem to be perhaps you just make it work when it can, and if someone uses a cmdlet in a LINQ method maybe a PSSA warning to indicate it will be slower than usual?
@Jaykul
Well, the reason LINQ is so slow is that inside of the LINQ method, it's running that scriptblock (back out into PowerShell's runspace, feeding it the parameter) half a million times...
Yeah, there is a significant amount of overhead in scope creation, and the default ScriptBlock-to-delegate converter will create a new scope for every invocation.
We're accepting the limitation that we can't use cmdlets in exchange for writing compile-able PowerShell, and getting a compiled (IL) .NET lambda method.
In the case of PSLambda, it's a lot more restrictive than that. It's statically typed, has static member resolution, can't redefine variables, no pipeline support, and probably more. It could be changed to use the DLR, generate enumerable state machines for the pipeline, etc, but it would definitely take a bit of a performance hit.
we're not calling back out into the PowerShell runspace (and copying variables around) over and over...
It's worth noting that you can actually use (and mutate) variables from the scope that defined the delegate. A thread-safe wrapper for PSVariable is created if a reference is found during compilation.
And of course, I had to tell it the type/signature of the lambda, too. I'm sure we could do something intelligent there (i.e. detect parameter and return types)
Yeah, some basic support for that already exists in PSLambda, but only inside the delegate. This works, for example:
$delegate = New-PSDelegate {
$ExecutionContext.InvokeProvider.ChildItem.Get('function:', $true).
Select{ $pso => $pso.BaseObject }. # Func<PSObject, object>
OfType([g[System.Management.Automation.FunctionInfo]]).
FirstOrDefault{ $f => $f.Name.Equals('TabExpansion2') } # Func<FunctionInfo, bool>
}
$delegate.Invoke()
# CommandType Name Version Source
# ----------- ---- ------- ------
# Function TabExpansion2
Support for that during psdelegate conversion would be possible within the engine; I just didn't have any visibility into the call site from the perspective of a PSTypeConverter.
@vexx32
Pragmatic solution from my perspective would seem to be perhaps you just make it work when it can, and if someone uses a cmdlet in a LINQ method maybe a PSSA warning to indicate it will be slower than usual?
It's a bit more complicated than that. That could really only be done in the most basic of expressions, otherwise behavior would differ wildly without an extraordinary amount of work to refine the compiler.
My opinion is that if a strict, statically compiled language mode were to be added then it should require some sort of explicit opt-in.
Also, for reference, here's a working version of @BrucePay's stunt query:
using namespace System
using namespace System.Collections.Generic
using namespace System.IO
using namespace System.Text.RegularExpressions
$delegate = [psdelegate]{
# Generic return type inference seems to be broken for SelectMany, oops
$selectManyDelegate = [Func[string, IEnumerable[string]]]{
$line => $line.Split([array]::Empty([g[char]]))
}
return ([File]::
ReadAllLines('c:\temp\WarAndPeace.txt').
Select{ $line => $line.Trim() }.
SelectMany($selectManyDelegate).
Where{ $word => { [regex]::IsMatch($word, '^[a-z]+$', [RegexOptions]::IgnoreCase) }}.
Aggregate(
<# seed: #> [Dictionary[string, int]]::new([StringComparer]::OrdinalIgnoreCase),
<# func: #> {
($map, $word) => {
if ($map.ContainsKey($word)) {
$map[$word]++
return $map
}
$map.Add($word, 1)
return $map
}
# Also a bug in finding the IEnumerable<> implementation on Dictionary<,>
}) -as [IEnumerable[KeyValuePair[string, int]]]).
OrderByDescending{ $kvp => $kvp.Value }.
Take(10)
}
$delegate.Invoke()
# Key Value
# --- -----
# The 34258
# and 21396
# to 16500
# of 14904
# a 10388
# he 9298
# in 8733
# his 7930
# that 7412
# was 7200
I recently wrote a module (EFPosh) that exposes LINQ functionality through cmdlets in what I think is a PowerShell-friendly way. I wanted to post it here and see what people thought.
Instead of trying to figure out how to write the Func delegates in PowerShell using .Net methods, I decided to handle that on the backend and expose query functionality through these commands:
New-EFPoshQuery
Start-EFPoshQuery
Add-EFPoshQuery
A query would go like this (pseudo-code as I wrote this for EntityFramework):
New-EFPoshQuery -Object $ObjectToQuery
Add-EFPoshQuery -Property '<TabComplete list of properties>' -Equals 'Value' -And
Add-EFPoshQuery -Property '<TabComplete list of properties>' -GreaterThan 5 -Or
Add-EFPoshQuery -Property '<TabComplete list of properties>' -Contains @('Array')
Start-EFPoshQuery -ToList -Distinct
The idea is that once New-EFPoshQuery is executed, tab completion works for Add-EFPoshQuery through ArgumentCompleters. This way the syntax works both from the command line and in a script.
It's a little weird to think about not executing the query in the line you start it, but deferred execution is an IQueryable idea that can come to PowerShell also!
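As a small illustration of deferred execution, the sketch below uses LINQ to Objects rather than an IQueryable provider (an assumption made so that it runs without Entity Framework). The query is defined first, the source changes afterwards, and the filter only runs when the results are enumerated.

```powershell
# Source collection; a List[int] stands in for whatever the real query would target.
$list = [System.Collections.Generic.List[int]]::new()
$list.AddRange([int[]](1..5))

# Defining the query does not run it; no element has been tested yet.
$query = [System.Linq.Enumerable]::Where($list, [Func[int, bool]]{ param($n) $n -gt 3 })

# Change the source after the query was defined.
$list.Add(10)

# Enumeration happens here, so the newly added element is included.
$results = @($query)
$results -join ', '   # 4, 5, 10
```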
On the back end I'm using the DynamicLinq library but that is more out of wanting to get something out the door rather than a necessity. It wouldn't be difficult (and might be easier) to build the delegates without a 3rd-party library.
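For a rough idea of what building such a delegate without a third-party library could look like, here is a sketch using System.Linq.Expressions. The function name New-PropertyEqualsPredicate and its parameters are hypothetical and are not part of EFPosh; it only covers a single "property equals value" condition and compiles to an in-memory delegate instead of producing a provider query.

```powershell
# Hypothetical helper: builds item => item.<Property> == <Value> as a compiled Func[T, bool].
function New-PropertyEqualsPredicate {
    param(
        [Parameter(Mandatory)] [type]   $Type,
        [Parameter(Mandatory)] [string] $Property,
        [Parameter(Mandatory)] [object] $Value
    )
    $item   = [System.Linq.Expressions.Expression]::Parameter($Type, 'item')
    $member = [System.Linq.Expressions.Expression]::PropertyOrField($item, $Property)

    # Coerce the supplied value to the member's type using PowerShell's conversion rules.
    $typedValue = [System.Management.Automation.LanguagePrimitives]::ConvertTo($Value, $member.Type)
    $constant   = [System.Linq.Expressions.Expression]::Constant($typedValue, $member.Type)
    $body       = [System.Linq.Expressions.Expression]::Equal($member, $constant)

    # Close Func<,> over the item type and bool to get the delegate type for Where.
    $funcType = [Func[int, bool]].GetGenericTypeDefinition().MakeGenericType([type[]]@($Type, [bool]))
    [System.Linq.Expressions.Expression]::Lambda(
        $funcType,
        $body,
        [System.Linq.Expressions.ParameterExpression[]]@($item)).Compile()
}

# Usage with the same strongly typed collection shape as the earlier benchmark.
[System.Collections.Generic.KeyValuePair[string, int][]]$pairs =
    (1..10).ForEach({ [System.Collections.Generic.KeyValuePair[string, int]]::new("$_", $_) })

$predicate = New-PropertyEqualsPredicate -Type ([System.Collections.Generic.KeyValuePair[string, int]]) -Property Key -Value '5'
$found = @([System.Linq.Enumerable]::Where($pairs, $predicate))
```

Combining several conditions would presumably use Expression.AndAlso and Expression.OrElse on the bodies before compiling, and a real IQueryable provider would need the uncompiled expression rather than the delegate.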
Here's the code if you want to see what I'm doing:
https://github.com/Ryan2065/EFPosh/tree/master/src/Module/EFPosh/Commands
Thinking about it some more: in my module I store the IQueryable internally and let you build on it, and it's disposed when Start is run or New is re-run. Another way to write this that would be more PowerShell-friendly is this:
$Results = New-EFPoshQuery -Object $ObjectToQuery |
Add-EFPoshQuery -Property '<TabComplete list of properties>' -Equals 'Value' -And |
Add-EFPoshQuery -Property '<TabComplete list of properties>' -GreaterThan 5 -Or |
Add-EFPoshQuery -Property '<TabComplete list of properties>' -Contains @('Array') |
Start-EFPoshQuery -ToList -Distinct
So return the IQueryable and pipe it to the next command. That's more Poshey and allows you to write the query on multiple lines.