PowerShell 🚀 - PowerShell should support creating an List similar to how it supports arrays

I like the idea but to truly impact performance you'd need to be operating on large lists. For a convenient way to create large lists, I would expect something like this to work @[Get-ChildItem C:\Windows -r -file *.dll -ea 0]. While the list literal form is nice, I don't see folks creating lists large enough to gain much of a perf benefit over using an array. Well, unless the list literal is created inside a busy (large n) loop.

rkeithhill on 7 Dec 2017

Two points:

@() says - make sure the thing inside is an array. It's not necessary if you use , because the comma operator always creates an array.
I've often wondered if the comma operator could create a list instead of an array. I have a feeling most scripts would never notice a difference because of how freely things are converted to an object array.

lzybkr on 7 Dec 2017

@lzybkr I updated my description based on @lzybkr 's comments

TravisEz13 on 7 Dec 2017

I like the idea of having a list literal in PowerShell. I think it could have a syntax like @[1, 2, 3] to directly create a list with elements 1, 2 and 3 without first creating an array literal from 1,2,3 and then making it a list using @[].

daxian-dbw on 7 Dec 2017

@lzybkr is it something like this?
1,2,3
(1,2,3).length
( , (1,2,3) ).length
( @(1,2,3) ).length
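
For reference, a small sketch of what these expressions report in current PowerShell. The variable names are illustrative only and do not come from the thread.

```powershell
# Comma operator: builds a 3-element object[]
$plain = 1, 2, 3
$plain.Length            # 3

# Unary comma wraps its operand in a new one-element array
$wrapped = , (1, 2, 3)
$wrapped.Length          # 1
$wrapped[0].Length       # 3

# @() only guarantees an array; an existing array is not nested again
$ensured = @(1, 2, 3)
$ensured.Length          # 3
```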

cchiu1979 on 7 Dec 2017

Sure, alias properties would be needed to make lists work just like arrays.

lzybkr on 7 Dec 2017

I've often wondered if the comma operator could create a list instead of an array.

If that's not considered too much of a breaking change, it would certainly be the best solution.

Otherwise:

@rkeithhill:

I get what you're saying about large lists, but that's where += comes in as a convenient syntax for appending to the list (calling .Add() or .AddRange() on the [System.Collections.ArrayList] or [System.Collections.Generic.List[object]] instance behind the scenes - unlike today's behavior of +=, which either silently recreates the variable content as an _array_ or, if the variable was type-constrained, as a _new instance_).

In other words: something like the following would make sense:

$al = @[] # simpler than: [System.Collections.ArrayList]::new()

for ($i = 0; $i -lt 1000; ++$i) {
  $al += $i # simpler than: $null = $al.Add($i)
}
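
For contrast, here is a sketch of how += behaves on an ArrayList today, which is the behavior the proposal would replace. The variable names are assumptions made for illustration.

```powershell
$al = [System.Collections.ArrayList]::new()
$null = $al.Add(1)          # Add() returns the new index, so discard it
$al.GetType().Name          # ArrayList

$al += 2                    # today: builds a new object[] and reassigns $al
$al.GetType().Name          # Object[]

# Type-constrained variable: the result is converted back, yielding a new instance
[System.Collections.ArrayList] $typed = @(1)
$before = $typed
$typed += 2
$typed.GetType().Name                        # ArrayList
[object]::ReferenceEquals($before, $typed)   # False
```

In both cases the original collection is left untouched, so every += pays for a full copy.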

mklement0 on 15 Dec 2017

I submitted two PRs (WIP) with different designs for the List support in PowerShell:

#5762 -- Support @[], similar to @()
#5761 -- Support ListLiteralExpression '[]', similar to ArrayLiteralExpression

@[] is my first design. However, I ran into a blocking issue regarding the closing bracket character ']'. Quoted from #5762:

@[] has a SubExpression like '@()' and '$()'. However, unlike the closing parenthesis character ')', the closing bracket character ']' doesn't always force the start of a new token, and it can be included in a generic token, meaning that ']' can appear in a command name, argument, or function name. This makes it impossible for @[dir] to determine the end of the list expression because dir] will be treated as a single generic token.

This PR adds the property InListSubExpression to Tokenizer, and makes ']' a force-to-start-new-token character when _tokenizer.InListSubExpression is set. This approach solves the most common UX problem but is by no means perfect. For example, compared to @(funcHas[]inName) or @(dir has[]inpath), @[funcHas[]inName] and @[dir has[]inpath] won't work because the first ']' will force the command name to end.

Without a breaking change, I think the best we can do is probably to make ']' a force-to-start-new-token character when parsing a command invocation pipeline in @[] but not when parsing any nested expression or statement within the @[].

At the same time, I started to think about an alternative: add ListLiteralExpression, like the ArrayLiteralExpression. In that case, a list can only contain Expression elements and hence command names, arguments, and function names won't be a problem for the ending bracket. PR #5761 is for that design, where we use '[]' (same token pair as TypeConstraint and Attribute).

I hope those 2 PRs can draw more discussion on the design.

daxian-dbw on 30 Dec 2017

@daxian-dbw since this code is beyond my understanding, does it attempt to create a strongly typed list, or is it always List<Object>?

markekraus on 30 Dec 2017

@markekraus It always attempts to create List<object>, like @() always creates an object[].

daxian-dbw on 30 Dec 2017

@lzybkr proposed to use new token pairs instead of @[] to represent a ListExpression in https://github.com/PowerShell/PowerShell/pull/5762#issuecomment-354512131:

You could consider 2 character tokens.
For example, F# uses this syntax for an array literal:

[| 1; 2 |]
There are other possibilities that probably aren't breaking changes, e.g. [< 1, 2 >].
The key here is to use a second character that can't be in a command name.

It would be great to have @[] to represent ListExpression, but I'm fine with new token pairs. ~I will prototype with [<>]~. [<>] won't work because '>]' is allowed in a generic token. [| .. |] may work. If new token pairs are acceptable, I definitely prefer ListExpression over ListLiteral.

daxian-dbw on 30 Dec 2017

@daxian-dbw: I'm really glad to see you take this on, but before we go any further with the syntax debate:

Is the consensus that we _cannot_ just simply switch ',', the array construction operator, to an array-list/generic-list implementation behind the scenes, as @lzybkr hinted at - for reasons of backward compatibility?

The answer may well be that yes, it's too risky to make that change (I personally cannot tell), but if it happens to be no, after all, there's no need for a syntax debate.

mklement0 on 30 Dec 2017

@mklement0 IMHO, there would be 3 problems if we simply change the comma operator ',' to return a list:

1. The AST type name ArrayLiteralAst would be inconsistent, but changing it would be a huge breaking change. There would be other breaking changes like the returned value of the StaticType property, but the AST type name would be the most problematic one I guess.
2. With the comma operator, we wouldn't be able to create an empty list.
3. The comma operator only takes Expression elements, not arbitrary statements like @() does. For example, compared to @(dir), you would have to use ,(dir). Besides, the comma operator doesn't unwrap the Expression value because it's literal (ArrayLiteralAst). So ,(dir) would return a one-element list that contains an object array.
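
The second and third points can be sketched with today's array operators. Get-Three is a hypothetical helper that stands in for a command such as dir.

```powershell
# Hypothetical helper that streams three objects, standing in for a command like dir
function Get-Three { 1; 2; 3 }

$viaArrayExpr = @(Get-Three)      # collects the pipeline output into an object[]
$viaArrayExpr.Count               # 3

$viaComma = , (Get-Three)         # wraps the collected output as a single element
$viaComma.Count                   # 1
$viaComma[0].Count                # 3

$empty = @()                      # the comma operator has no equivalent for this
$empty.Count                      # 0
```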

I prefer a ListExpressionAst '@[]' over a ListLiteralAst '[]' because of the 3rd one above.

daxian-dbw on 31 Dec 2017

@daxian-dbw Thanks for great prototypes!
I'd prefer @[] if it is possible to implement. I would be very surprised to see something like [| ... |]; if we have no other way, I'd rather see a simple List( ... ) or [List]1,2,3.

If we have a problem with the last ] in @[], could we use @[ 1, 2, 3 ]@, like multiline string literals?
@@[] doesn't resolve the problem.
We could reuse parentheses with another prefix: if @() is an array and $() a singleton, then %() or &() or *() could be a list.

I personally like *().

iSazonov on 1 Jan 2018

*() would be somewhat ambiguous. Should 5*(Get-Random) throw a ~RuntimeException for missing op_Multiply on List~ a CommandNotFoundException or should it multiply a random number by 5?

markekraus on 1 Jan 2018

'%(1)' is parsed into a CommandAst today, where '%' is the command name (foreach-object), and the argument is (1).
'&(1)' is parsed into a CommandAst today, where '&' is the invocation operator and the command name is (1).
'*()' is also ambiguous, as @markekraus pointed out.
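
One way to check such claims is to ask the parser directly. This is a rough sketch using the public System.Management.Automation.Language API; Get-FirstCommandAst is a hypothetical helper name, and the printed values depend on the PowerShell version in use.

```powershell
using namespace System.Management.Automation.Language

# Hypothetical helper: parse a snippet and return its first CommandAst, if any
function Get-FirstCommandAst([string] $Source) {
    $tokens = $null
    $errors = $null
    $ast = [Parser]::ParseInput($Source, [ref] $tokens, [ref] $errors)
    $ast.Find({ param($node) $node -is [CommandAst] }, $true)
}

$percent = Get-FirstCommandAst '%(1)'
$percent.GetCommandName()        # the command name as the parser sees it
$percent.CommandElements.Count   # the name plus its arguments

$amp = Get-FirstCommandAst '&(1)'
$amp.InvocationOperator          # TokenKind of the invocation operator
```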

daxian-dbw on 2 Jan 2018

Minor correction: %{} would be ForEach-Object. %() is ambiguous with modulo, e.g. 5%(Get-Random -Minimum 1 -Maximum 5)

Also, @@[] would possibly be problematic for extended splat literals (if they ever make their way out of RFC).

Outside of the literals, I like the idea of Lists getting an accelerator, but only if it works similarly to using namespace System.Collections.Generic, which makes $MyList = [List[MyClass]]::New() easier. I would not like a [List] accelerator without the ability to set the type, unless it could play nice and create List<Object> by default but still allow creating lists of a desired type.
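
A minimal sketch of the using namespace approach mentioned here, with [int] standing in for the hypothetical MyClass:

```powershell
using namespace System.Collections.Generic

# With the namespace imported, the generic type needs no full qualification
$ints = [List[int]]::new()
$ints.Add(1)
$ints.Add(2)

# The element type still has to be spelled out, including for List<object>
$objects = [List[object]]::new()
$objects.Add('text')
$objects.Add(42)

$ints.GetType().FullName
$objects.Count               # 2
```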

markekraus on 2 Jan 2018

I have only one question: where can I buy a Unicode keyboard with 32000 buttons to replace my 102-key keyboard? 😄

We could combine the accelerator idea and list literals:

@[int](1,2,3)
@[string](dir C:\)
@[](1,2,3) as short cut of @[object](1,2,3)

iSazonov on 2 Jan 2018

@daxian-dbw: Thanks for the detailed feedback.

I can't speak to 1. (AST names), but perhaps the answer is to _special-case_ @() for the @(<empty-or-scalar-or-array-literal>) cases, such as @(), @(3), or @(1, 2, 3) (note that @(<array-literal>) already _is_ special-cased - see #4280), while leaving any @() that involves a _command_ and/or _multiple statements_ to work as it does now.

The alternative is to simply make @() _always_ return a list. This has the advantage of allowing the definition of lists as a series of individual expression statements (defining an element each), obviating the need for , in _multiline_ definitions (in which case the line breaks take the place of the _statement_-separating ;). The down-side is that lists would be created in many situations where an array will do; while @(Get-ChildItem) is more convenient than @((Get-ChildItem)), creating a list in such a case strikes me as less important.

Again, it might be too risky, but it would solve the syntax problem.

That said, that alone wouldn't address the desire for explicit typing.

Perhaps the special casing could be tweaked to translate something like
@([string[]] (...)) into a List<string> instance.

The need for the inner (...) - due to operator precedence - makes this slightly awkward, however, and forgetting them can easily go unnoticed, because you quietly get [object[]].
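
The precedence pitfall can be seen with plain casts today, independent of any list proposal. A short sketch, with illustrative variable names:

```powershell
# Without parentheses the cast binds only to the first element
$unintended = [string[]] 1, 2, 3
$unintended.GetType().Name       # Object[]
$unintended[0].GetType().Name    # String[]
$unintended[1].GetType().Name    # Int32

# With parentheses the cast applies to the whole array literal
$intended = [string[]] (1, 2, 3)
$intended.GetType().Name         # String[]
$intended[1].GetType().Name      # String
```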

On the other hand, explicit typing is a more advanced use case, and optimizing for the typical case is arguably more important.

mklement0 on 5 Jan 2018

The alternative is to simply make @() always return a list.

I talked to @jpsnover about this today and he also brought up changing the semantics of @() to return a list. The downsides are:

1. The AST type name ArrayExpressionAst would be inconsistent with the semantics and the StaticType property.
2. A list is created in some situations where you need an array, but PowerShell can convert List<object> to object[] implicitly, so this might not be an issue.
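
The implicit conversion mentioned in the second point already works today. A small sketch; Measure-Items is a hypothetical function used only to show parameter binding:

```powershell
$list = [System.Collections.Generic.List[object]]::new()
$list.Add(1)
$list.Add('two')

# A parameter typed object[] accepts the list; PowerShell converts it on binding
function Measure-Items([object[]] $Items) {
    $Items.GetType().Name + ':' + $Items.Count
}
Measure-Items $list          # Object[]:2

# The same conversion applies to a type-constrained variable
[object[]] $asArray = $list
$asArray.GetType().Name      # Object[]
```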

For (1), could it be OK to have this inconsistency?

daxian-dbw on 5 Jan 2018

The Ast type name doesn't matter that much.

There are many examples outside of PowerShell where the name can be misleading - ArrayList is a good one.

Lua is another good example - quoting from here.

Tables in Lua are not a data structure; they are the data structure. All structures that other languages offer---arrays, records, lists, queues, sets---are represented with tables in Lua. More to the point, tables implement all these structures efficiently.

lzybkr on 5 Jan 2018

but powershell can convert List<object> to object[] implicitly, so this might not be an issue.

If $a = @(1, 2, 3) defines a List, then I'd expect that $a = $a + 4 or $a += 4 doesn't convert the List to an Array. We could use $a.ToArray(). In that case we should add a magic ToArray() to arrays too, as we added the magic Count, Length, Where() and ForEach().
Also I expect many customers will ask about typed lists like [int]@(1, 2, 3) or @[int](1, 2, 3).

iSazonov on 5 Jan 2018

@iSazonov:

If $a = @(1, 2, 3) defines a List, then I'd expect that $a = $a + 4 or $a += 4 doesn't convert the List to an Array.

Actually, I would expect that to work with instances of _any type that implements the IList interface_ and therefore has an .Add(Object) method - irrespective of this specific issue; see #5805

Also I expect many customers will ask about typed lists like [int]@(1, 2, 3) or @[int](1, 2, 3).

While slightly awkward, as discussed, @([int[]] (1,2,3)) has the advantage of not introducing new syntax (only new semantics).

mklement0 on 6 Jan 2018

Hi.

Not only adding ListLiteralExpression ([]) but also changing the behavior of @() would be a very big breaking change.
I think we should issue an RFC; we need more open discussion and documentation of the specification.

stknohg on 11 Jan 2018

I found a related RFC (rejected).

RFC0014-Language enhancements for collections

stknohg on 11 Jan 2018

@PowerShell/powershell-committee reviewed this. Based on the feedback, we will accept RFC0014 for the [list] accelerator to be System.Collections.Generic.List[object]. The concern with adopting @() for list is the breaking-change impact due to nuances between arrays and arraylist/list. We can always revisit this in the future based on additional user feedback.

SteveL-MSFT on 18 Jan 2018

A new type accelerator might not be a good idea. Consider:

using namespace System.Collections.Generic
[list]$x = $null
[list[string]] = $null

This is confusing. E.g., The following won't work, but some people might expect it to:

param([type]$t)
[list].MakeGenericType($t) # Create generic types like [list[string]]

You would also want to better specify how type lookup works. I think it will just work if you add the accelerator, but it will be confusing as to why - because type accelerators take precedence over everything else. The magic that makes it work is if there are generic arguments, we ignore a type found that has no generic arguments and instead look for another type with `1 or the number of generic arguments appended, so we'd ignore List[object] that would be found and continue looking for List`1.

lzybkr on 18 Jan 2018

@lzybkr Given that there's no advantage to using generics with object-element collections (or am I missing something?), [list] could refer to [System.Collections.ArrayList] (arguably, ArrayList should have been List all along, following the example of other non-generic/generic type pairs).

mklement0 on 19 Jan 2018

It is regrettable that v6 has shipped already - given the numerous breaking changes in v6, switching @() to creating variable-sized collections would have been but another - but one with presumably manageable real-world impact while providing great automatic benefits - assuming that += is also changed to update in place.

mklement0 on 19 Jan 2018

@mklement0 - Strongly typed collections do provide roughly the same value in PowerShell as they do in any other .Net language.

I say roughly because PowerShell does many more conversions than other languages, so for example, adding to a List[string] will work for almost any object because PowerShell would call ToString() if there isn't a better conversion available. But a List[ProcessInfo] collection will most likely throw an error if you try to add something other than a ProcessInfo.
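
A sketch of the conversion behavior described above. It uses [int] rather than a process type for the failing case, which is a substitution made here so that the example stays self-contained.

```powershell
using namespace System.Collections.Generic

$strings = [List[string]]::new()
$strings.Add(42)                 # converted to the string '42'
$strings[0].GetType().Name       # String

$numbers = [List[int]]::new()
$numbers.Add('7')                # convertible, stored as the integer 7
$failed = $false
try {
    $numbers.Add('not a number') # no conversion to int is available
} catch {
    $failed = $true
}
$failed                          # True
$numbers.Count                   # 1
```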

And of course consumers of the collection can be certain about the type of every item in the generic collection.

It seems somewhat safe to assume enhancements to List<T> happen before ArrayList, so that is one argument against ArrayList. Another is the annoying non-void return from ArrayList.Add.

lzybkr on 20 Jan 2018

@lzybkr:

Strongly typed collections do provide roughly the same value in PowerShell as they do in any other .Net language.

Yes, but my point was that specifically when using generics with System.Object elements - the root of the object hierarchy - the advantages of generics are a moot point - save for the potential future enhancements to the types you mention.

Couldn't we make [List] refer to [System.Collections.ArrayList] while also providing accelerators such as [List[int]] for [System.Collections.Generic.List[int]]?

Another is the annoying non-void return from ArrayList.Add.

That is indeed annoying, but by making += call .Add() behind the scenes (and not emitting the return value) the problem mostly goes away.

mklement0 on 20 Jan 2018

Sure, [list] could mean [ArrayList], but I'm not seeing a benefit compared to List[object] - only drawbacks.

And I do think folks will still use .Add. Intellisense will suggest it and many folks won't use += because:

lack of awareness

habit

being conservative (carryover from being told to not use += on arrays)

I also think it would be weird that some references to list are ArrayList and others List<T> - another drawback in my book.

lzybkr on 20 Jan 2018

The reason I'm suggesting ArrayList is to avoid the awkwardness of having a type literal - [list] - that _looks_ like it's non-generic while actually being generic.

That said, perhaps that's not really a problem in practice: I don't think that anyone will truly expect something like [list].MakeGenericType($t) to work, ~~given that PowerShell has no literal representation for _uninstantiated_ generic types~~.

mklement0 on 20 Jan 2018

One can specify a generic type without type arguments by using its real name, e.g.: [System.Collections.Generic.List`1] or [System.Collections.Generic.Dictionary`2]

lzybkr on 20 Jan 2018

@lzybkr: That's good to know, thanks - I had no idea.

In light of that, it's even less likely that someone would expect [list].MakeGenericType($t) to work:

If they know that [list] is an _instantiated_ generic type, they'll know that an intervening .GetGenericTypeDefinition() call is required

If they mistakenly believe that [list] is non-generic, they shouldn't expect .MakeGenericType() to work at all.
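
The distinction between the uninstantiated and the instantiated type can be sketched with standard reflection calls. Full type names are used here because the [list] accelerator under discussion is not assumed to exist.

```powershell
$open   = [System.Collections.Generic.List`1]         # generic type definition
$closed = [System.Collections.Generic.List[object]]   # instantiated generic type

$open.IsGenericTypeDefinition      # True
$closed.IsGenericTypeDefinition    # False

# MakeGenericType() works on the definition...
$fromOpen = $open.MakeGenericType([string])

# ...and on an instantiated type only after stepping back to its definition
$viaDefinition = $closed.GetGenericTypeDefinition().MakeGenericType([string])
$viaDefinition -eq [System.Collections.Generic.List[string]]   # True
```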

mklement0 on 20 Jan 2018

Please note that List[T] provides a ForEach method, so our magic ForEach method will not work without some additional work. This may break scripts that work with arrays but fail with List<T>.
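
A sketch of the two ForEach members in question. The array call uses PowerShell's intrinsic method, while member lookup on the list reports the .NET method declared by List<T>.

```powershell
# Arrays: the PowerShell-provided ForEach() returns the script block's results
$doubled = (1, 2, 3).ForEach({ $_ * 2 })
$doubled.Count                          # 3

# List<T>: the type declares its own ForEach, which takes an Action<T> and returns void
$list = [System.Collections.Generic.List[int]]::new()
$list.AddRange([int[]] (1, 2, 3))
$list.ForEach.OverloadDefinitions       # shows the .NET signature
```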

powercode on 22 Jan 2018

Changing the semantics of any of the existing operators would be an unacceptable breaking change as described in bucket 1 in the Breaking Change Contract. New infix operators in expression mode would be fine. Prefix operators might fall into the acceptable breaking change buckets.

@lzybkr In fact I always thought of 1, 2, 3 as (cons 1 2 3) . Changing ',' would be breaking but we could use a new ','-like operator such as :: to build lists e.g. 1 :: 2 :: 3. That said, for creating new kinds of list-like objects, we could simply use commands

$list = New-List 1 2 3 4
$complexList = New-List 1 (New-List 2 3) 4

Anyway, having a REAL list (car/cdr) type would probably have significant perf advantages since the most common scenario for collections seems to be streaming/enumerating the collection. And it would be nice to make

$head, $list = $list

efficient.
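
For context, this is how the head/tail assignment behaves with arrays today; the remainder is a new array, which is the cost a real linked list could avoid.

```powershell
$list = 1, 2, 3, 4

# The first target receives one element; the last target receives the remainder
$head, $list = $list
$head            # 1
$list.Count      # 3  (a new object[] holding 2, 3, 4)
```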

For the "core" PowerShell types ("numbers", strings, arrays and hashtables) the + - * \' operators are magical. For other types, we look for op_Addition(), op_Subtraction(), etc. This is why [datetime] addition works:

PS[1] (31) > [datetime]::now + 1gb
Thursday, March 29, 2018 4:23:58 PM

The .NET framework designers chose not to implement these methods, favoring Add() instead of an operator. We (or others) could add our own collection type that implemented the op_* methods and then the operators would just work.
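
As a rough sketch of that idea, a PowerShell class can expose a static op_Addition so that + and += append in place. AppendableList is a hypothetical type invented for this illustration, not an existing or proposed API.

```powershell
# Hypothetical wrapper type; the name AppendableList is an assumption for illustration
class AppendableList {
    [System.Collections.Generic.List[object]] $Items = [System.Collections.Generic.List[object]]::new()

    # PowerShell looks for this static method when + is applied to a non-core type
    static [AppendableList] op_Addition([AppendableList] $left, [object] $right) {
        $left.Items.Add($right)   # append in place instead of copying
        return $left
    }
}

$bag = [AppendableList]::new()
$bag += 1
$bag = $bag + 'two'
$bag.Items.Count     # 2
```

Returning the left operand keeps the same instance alive across +=, at the price of + no longer being side-effect free.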

@mklement0 ArrayList is a horrible, horrible type. Its methods uselessly return values, which leads to all sorts of bugs in PowerShell scripts:

PS[1] (32) > $al = [System.Collections.ArrayList]::new()
PS[1] (33) > $al.Add(1)
0
PS[1] (34) > $al.Add(2)
1
PS[1] (35) > foreach ($i in 1..5) { $al.Add($i) } # would reasonably expect this to return nothing but!
2
3
4
5
6

@powercode

Please note that List[T] provides a Foreach method, so our magic method

Yeah. That's a problem. Especially since it looks like it kinda works.

@TravisEz13 @rkeithhill @lzybkr @daxian-dbw @powercode @mklement0 This discussion goes well beyond the scope of an issue (so not just a bug). I propose closing the issue in favour of the existing RFC and possibly resurrecting it from the rejected bucket. Thoughts?

BrucePay on 30 Mar 2018

Agreed, there should be a concrete RFC to discuss, though I'd rather see performance improved in many other areas before introducing some new syntax to possibly improve performance of some scripts.

lzybkr on 30 Mar 2018

@lzybkr Yeah - most of this is just syntactic sugar for stuff you can already do. Do you have a list of where you think the effort should go?

BrucePay on 30 Mar 2018

@BrucePay:

The .NET framework designers chose not to implement these methods, favoring Add() instead of an operator. We (or others) could add our own collection type that implemented the op_* methods and then the operators would just work.

Please see my suggestion here, which proposes combining @PetSerAl's efficiently extensible collection type with op_*() methods to preserve familiar += semantics, for instance.

ArrayList is a horrible, horrible type

Well, that's a fair point. Let's get rid of it, then :)

Overall, I think what's worth considering is to _replace arrays [object[]] as the fundamental PowerShell collection type with an efficiently extensible collection type_, so that no one has to think about collection types, their constructors and methods in the vast majority of scenarios, while still getting automatic performance improvements (efficient use of +=, PowerShell-internally no need to convert to a different collection type on output).

While that may not be as breaking as one might think, it certainly is a change so fundamental that it exceeds even the scope of a regular RFC, so what I'm wondering is: where is the place, if any, to discuss such PowerShell "vNext" proposals? A special class of RFCs?

mklement0 on 4 Apr 2018

I agree with @mklement0 that more efficient Collections need a path to becoming the default especially when you are calling REST APIs that return a lot of data. Maybe after the Experimental Flag RFC is done, the work can be moved under the Flag so that breaking changes can be found and worked around.

dragonwolf83 on 5 Apr 2018

At the very least I think the pipeline should by default emit an ArrayList or List
