PowerShell supports creating arrays with $array = 'a', 1, '3'. You can then add an element to the array with $array += 4, but this creates a new array, which is not performant.
PowerShell should have a syntax that allows creating lists.
Assuming the operator is @[...], you could create a list with $list = @['a', 1, '3'] and then add an element to the existing list with $list += 4 without PowerShell having to create a new list.
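The following sketch makes the cost concrete. It is not from the original thread. It checks object identity before and after each append, using only the existing [System.Collections.Generic.List[object]] type and [object]::ReferenceEquals rather than the proposed @[...] syntax:

```powershell
# Arrays are fixed-size: += builds a brand-new array and rebinds the variable.
$array = 'a', 1, '3'
$before = $array
$array += 4
[object]::ReferenceEquals($before, $array)   # False: $array now points at a new array
$before.Count                                # 3: the original array is unchanged

# A generic list grows in place: Add() mutates the existing instance.
$list = [System.Collections.Generic.List[object]]::new()
$list.AddRange([object[]]('a', 1, '3'))
$sameList = $list
$list.Add(4)
[object]::ReferenceEquals($sameList, $list)  # True: still the same object
$list.Count                                  # 4
```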
Note: this new operator might function more like @(...).
This design assumes that changing , would be a breaking change. I'm open to discussing changing , as well.
I filed this based on an offline discussion about this comment on a PR: https://github.com/PowerShell/PowerShell/pull/5625#discussion_r155106230
I like the idea, but to truly impact performance you'd need to be operating on large lists. For a convenient way to create large lists, I would expect something like @[Get-ChildItem C:\Windows -r -file *.dll -ea 0] to work. While the list literal form is nice, I don't see folks creating lists large enough to gain much of a perf benefit over using an array, unless the list literal is created inside a busy (large n) loop.
Two points:
@() says: make sure the thing inside is an array. It's not necessary if you use , because the comma operator always creates an array.
@lzybkr I updated my description based on @lzybkr's comments.
I like the idea of having a list literal in PowerShell. I think it could have a syntax like @[1, 2, 3] to directly create a list with elements 1, 2 and 3, without first creating an array literal from 1,2,3 and then making it a list using @[].
@lzybkr is it something like this?
1,2,3
(1,2,3).length
( , (1,2,3) ).length
( @(1,2,3) ).length
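For reference, the sketch below shows what the last three lines evaluate to in current PowerShell. The comments give the expected results:

```powershell
# The binary comma builds a 3-element array.
(1, 2, 3).Length          # 3
# The unary comma wraps its operand (an array) in a new 1-element array.
( , (1, 2, 3) ).Length    # 1
# @() guarantees an array but does not add a level of nesting around one.
( @(1, 2, 3) ).Length     # 3
```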
Sure, alias properties would be needed to make lists work just like arrays.
I've often wondered if the comma operator could create a list instead of an array.
If that's not considered too much of a breaking change, it would certainly be the best solution.
Otherwise:
@rkeithhill:
I get what you're saying about large lists, but that's where += comes in as a convenient syntax for appending to the list (calling ::Add() or ::AddRange() on the [System.Collections.ArrayList] or [System.Collections.Generic.List[object]] instance behind the scenes). That is unlike today's behavior of +=, which either silently recreates the variable content as an _array_ or, if the variable was type-constrained, as a _new instance_.
In other words: something like the following would make sense:
$al = @[] # simpler than: [System.Collections.ArrayList]::new()
for ($i = 0; $i -lt 1000; ++$i) {
    $al += $i # simpler than: $null = $al.Add($i)
}
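Until something like @[] exists, the closest equivalent of that loop is to call .Add() on a generic list explicitly. Here is a minimal sketch using only existing types:

```powershell
# Explicit equivalent of the proposed loop, with no new syntax.
$al = [System.Collections.Generic.List[object]]::new()
for ($i = 0; $i -lt 1000; ++$i) {
    $al.Add($i)   # List[T].Add returns void, so nothing leaks into the output stream
}
$al.Count         # 1000
```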
I submitted two PRs (WIP) with different designs for List support in PowerShell:
- @[], similar to @()
- ListLiteralExpression '[]', similar to ArrayLiteralExpression
@[] is my first design. However, I ran into a blocking issue regarding the closing bracket character ']'. Quoted from #5762:
@[] has a SubExpression like '@()' and '$()'. However, unlike the closing parenthesis character ')', the closing bracket character ']' doesn't always force the start of a new token. It can be included in a generic token, meaning that ']' can appear in a command name, argument, or function name. This makes it impossible for @[dir] to determine the end of the list expression, because dir] will be treated as a single generic token.
This PR adds the property InListSubExpression to Tokenizer, and makes ']' a force-to-start-new-token character when _tokenizer.InListSubExpression is set. This approach solves the most common UX problem but is by no means perfect. For example, compared to @(funcHas[]inName) or @(dir has[]inpath), @[funcHas[]inName] and @[dir has[]inpath] won't work, because the first ']' will force the command name to end.
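You can observe the tokenizer behavior described in the quote through the public parser API. This sketch assumes PowerShell 5.1 or later and a tokenizer that still treats ']' as described. It calls [System.Management.Automation.Language.Parser]::ParseInput only to list the tokens:

```powershell
# Tokenize 'dir]' the way the parser would see it at the start of a command.
$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput('dir]', [ref] $tokens, [ref] $errors)

# The bracket did not end the token: the first token is the whole text 'dir]'.
$tokens[0].Text          # dir]

# The same rule applies to arguments: the bracket stays part of the word.
Write-Output a]b         # a]b
```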
Without a breaking change, I think the best we can do is probably to make ']' a force-to-start-new-token character when parsing a command invocation pipeline in @[], but not when parsing any nested expression or statement within the @[].
At the same time, I started to think about an alternative: add a ListLiteralExpression like the ArrayLiteralExpression. In that case, a list can only contain Expression elements, so command names, arguments, and function names won't be a problem for the closing bracket. PR #5761 is for that design, where we use '[]' (the same token pair as TypeConstraint and Attribute).
I hope those 2 PRs can draw more discussion on the design.
@daxian-dbw since this code is beyond my understanding: does it attempt to create a strongly typed list, or is it always List<Object>?
@markekraus It always attempts to create List<object>, just as @() always creates an object[].
@lzybkr proposed using new token pairs instead of @[] to represent a ListExpression in https://github.com/PowerShell/PowerShell/pull/5762#issuecomment-354512131:
You could consider 2-character tokens.
For example, F# uses this syntax for an array literal:
[| 1; 2 |]
There are other possibilities that probably aren't breaking changes, e.g. [< 1, 2 >].
The key here is to use a second character that can't be in a command name.
It would be great to have @[] represent ListExpression, but I'm fine with new token pairs. ~I will prototype with [<>]~. [<>] won't work because '>]' is allowed in a generic token. [| .. |] may work. If new token pairs are acceptable, I definitely prefer ListExpression over ListLiteral.
@daxian-dbw: I'm really glad to see you take this on, but before we go any further with the syntax debate:
Is the consensus that we _cannot_ simply switch the array construction operator, ',', to an array-list/generic-list implementation behind the scenes, as @lzybkr hinted at, for reasons of backward compatibility?
The answer may well be that yes, it's too risky to make that change (I personally cannot tell). But if the answer happens to be no after all, there's no need for a syntax debate.
@mklement0 IMHO, there would be 3 problems if we simply changed the comma operator ',' to return a list:
1. The AST type name ArrayLiteralAst would be inconsistent, but changing it would be a huge breaking change. There would be other breaking changes, like the returned value of the StaticType property, but the AST type name would be the most problematic one, I guess.
2. The comma operator only takes Expression elements, not arbitrary statements like @() does. For example, instead of @(dir), you would have to use ,(dir).
3. The comma operator doesn't unwrap the Expression value because it's a literal (ArrayLiteralAst). So ,(dir) would return a one-element list that contains an object array.
I prefer a ListExpressionAst '@[]' over a ListLiteralAst '[]' because of the 3rd one above.
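You can check points 2 and 3 today with a command in place of dir. In this sketch, Write-Output 1 2 3 stands in for any command that emits several objects:

```powershell
# @() runs its contents as a statement and collects the output,
# so the three objects emitted by Write-Output become three elements.
@(Write-Output 1 2 3).Count     # 3

# The unary comma wraps the evaluated value as-is, without unwrapping it,
# so the collected output becomes a single nested element.
$wrapped = , (Write-Output 1 2 3)
$wrapped.Count                  # 1
$wrapped[0].Count               # 3
```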
@daxian-dbw Thanks for the great prototypes!
I'd prefer @[] if it were possible to implement. I'd be very surprised to see something like [| ... |]. If we have no other way, I'd rather see a simple List( ... ) or [List]1,2,3.
- Regarding ] in @[]: could we use @[ 1, 2, 3 ]@, like multiline string literals?
- @@[] doesn't resolve the problem.
- @() is an array and $() a singleton, so %(), &() or *() could be a list. I personally like *().
*() would be somewhat ambiguous. Should 5*(Get-Random) throw a ~RuntimeException for missing op_Multiply on List~ CommandNotFoundException, or should it multiply a random number by 5?
'%(1)' is parsed into a CommandAst today, where '%' is the command name (ForEach-Object) and the argument is (1).
'&(1)' is parsed into a CommandAst today, where '&' is the invocation operator and the command name is (1).
'*()' is also ambiguous, as @markekraus pointed out.
Minor correction: %{} would be ForEach-Object. %() is ambiguous with modulo, e.g. 5%(Get-Random -Minimum 1 -Maximum 5).
Also, @@[] could be problematic for extended splat literals (if they ever make their way out of RFC).
Outside of the literals: I like the idea of lists getting an accelerator, but only if it works similarly to using namespace System.Collections.Generic, making $MyList = [List[MyClass]]::New() easier. I would not like a [List] accelerator without the ability to set the type, unless it could play nice and create List<Object> by default but still allow creating lists of a desired type.
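For comparison, here is what using namespace already provides without any new accelerator. This is a sketch with [int] standing in for MyClass. It assumes the using statement sits at the top of the script, before any other statements, as PowerShell requires:

```powershell
using namespace System.Collections.Generic

# With the namespace imported, [List[int]] resolves without a custom accelerator.
$ints = [List[int]]::new()
$ints.Add(1)
$ints.Add(2)

# There is no default type argument: List<object> still has to be spelled out.
$objects = [List[object]]::new()
$objects.Add('anything')
```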
I have only one question: where can I buy a Unicode keyboard with 32000 buttons to replace my 102-key keyboard? 😄
We could combine the accelerator idea and list literals:
@[int](1,2,3)
@[string](dir C:\)
@[](1,2,3) as a shortcut for @[object](1,2,3)
@daxian-dbw: Thanks for the detailed feedback.
I can't speak to 1. (AST names), but perhaps the answer is to _special-case_ @() for the @(<empty-or-scalar-or-array-literal>) cases, such as @(), @(3), or @(1, 2, 3). Note that @(<array-literal>) already _is_ special-cased; see #4280. Any @() that involves a _command_ and/or _multiple statements_ would keep working as it does now.
The alternative is to simply make @() _always_ return a list. This has the advantage of allowing lists to be defined as a series of individual expression statements (one element each). That obviates the need for , in _multiline_ definitions, where the line breaks take the place of the _statement_-separating ;. The downside is that lists would be created in many situations where an array would do. While @(Get-ChildItem) is more convenient than @((Get-ChildItem)), creating a list in such a case strikes me as less important.
Again, it might be too risky, but it would solve the syntax problem.
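The multiline form already works with today's @(), because newlines separate statements inside it. The sketch below shows such a definition; today it still yields an object[]:

```powershell
# One element per line; no commas are needed because each line is a statement.
$items = @(
    'first'
    'second'
    'third'
)
$items.Count             # 3
$items.GetType().Name    # Object[] (under the proposal, this would be a list instead)
```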
That said, that alone wouldn't address the desire for explicit typing.
Perhaps the special-casing could be tweaked to translate something like @([string[]] (...)) into a List<string> instance.
The need for the inner (...), due to operator precedence, makes this slightly awkward, however. Forgetting the parentheses can easily go unnoticed, because you quietly get [object[]].
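You can reproduce the precedence pitfall directly. The cast binds tighter than the comma operator:

```powershell
# Without parentheses, only 'a' is cast...
$oops = [string[]] 'a', 'b'
$oops.GetType().Name      # Object[]
$oops[0].GetType().Name   # String[]

# ...while the parentheses make the cast apply to the whole array literal.
$ok = [string[]] ('a', 'b')
$ok.GetType().Name        # String[]
```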
On the other hand, explicit typing is a more advanced use case, and optimizing for the typical case is arguably more important.
The alternative is to simply make @() always return a list.
I talked to @jpsnover about this today, and he also brought up changing the semantics of @() to return a list. The downsides are:
1. ArrayExpressionAst being inconsistent with the semantics and the StaticType property.
2. ... but PowerShell can convert List<object> to object[] implicitly, so this might not be an issue.
For (1), could it be OK to have this inconsistency?
The AST type name doesn't matter that much.
There are many examples outside of PowerShell where the name can be misleading; ArrayList is a good one.
Lua is another good example; quoting from here:
Tables in Lua are not a data structure; they are the data structure. All structures that other languages offer---arrays, records, lists, queues, sets---are represented with tables in Lua. More to the point, tables implement all these structures efficiently.
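The implicit List<object>-to-object[] conversion mentioned above is what already happens when a list is passed to an [object[]] parameter or assigned to an [object[]] variable. In this sketch, Get-ArrayTypeName is a hypothetical helper:

```powershell
# Hypothetical helper whose parameter only accepts an array.
function Get-ArrayTypeName {
    param([object[]] $InputArray)
    $InputArray.GetType().Name
}

$list = [System.Collections.Generic.List[object]]::new()
$list.Add(1)
$list.Add('two')

Get-ArrayTypeName -InputArray $list   # Object[]: the list was converted during binding
[object[]] $asArray = $list           # the same conversion happens for typed variables
$asArray.Count                        # 2
```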
If $a = @(1, 2, 3) defines a List, then I'd expect that $a = $a + 4 or $a += 4 doesn't convert the List to an Array. We could use $a.ToArray(). In that case, we should add a magic ToArray() to arrays too, as we added magic Count, Length, Where() and ForEach().
Also, I expect many customers will ask about typed lists like [int]@(1, 2, 3) or @[int](1, 2, 3).
@iSazonov:
If $a = @(1, 2, 3) define List then I'd expect that $a = $a + 4 or $a += 4 don't convert List to Array.
Actually, I would expect that to work with instances of _any type that implements the IList interface_ and therefore has an .Add(Object) method, irrespective of this specific issue; see #5805.
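For context, the sketch below shows what + and += currently do with an IList such as List[int]. With an untyped variable, the result is a new object[]. With a type-constrained variable, the result is a new instance of the constrained type:

```powershell
# Untyped variable: + on an IList produces a new object[], replacing the list.
$list = [System.Collections.Generic.List[int]]::new()
$list.Add(1)
$list.Add(2)
$list.Add(3)
$list += 4
$list.GetType().Name    # Object[]

# Type-constrained variable: the new array is converted back into a *new* List[int].
[System.Collections.Generic.List[int]] $typed = 1, 2, 3
$original = $typed
$typed += 4
$typed.GetType().Name                          # List`1
[object]::ReferenceEquals($original, $typed)   # False
```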
Also I expect many customers will ask about typed lists like [int]@(1, 2, 3) or @[int](1, 2, 3).
While slightly awkward, as discussed, @([int[]] (1,2,3)) has the advantage of not introducing new syntax (only new semantics).
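For comparison with the proposed forms, you can already create a typed list without new syntax by casting to a constructed generic type. A minimal sketch:

```powershell
# Casting an array literal to a constructed generic type works today.
$ints = [System.Collections.Generic.List[int]] @(1, 2, 3)
$ints -is [System.Collections.Generic.List[int]]   # True
$ints.Add(4)        # grows in place; no new array is created
$ints.Count         # 4
```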
Hi.
Not only adding ListLiteralExpression ([]) but also changing the behavior of @() would be a very big breaking change.
I think we should issue an RFC; this needs more open discussion and a documented specification.
I found a related RFC (rejected).
@PowerShell/powershell-committee reviewed this. Based on the feedback, we will accept RFC0014 for a [list] accelerator to be System.Collections.Generic.List of [object]. The concern with adopting @() for lists is the breaking-change impact due to nuances between arrays and ArrayList/List. We can always revisit this in the future based on additional user feedback.
A new type accelerator might not be a good idea. Consider:
using namespace System.Collections.Generic
[list]$x = $null
[list[string]]$y = $null
This is confusing. For example, the following won't work, but some people might expect it to:
param([type]$t)
[list].MakeGenericType($t) # Create generic types like [list[string]]
You would also want to better specify how type lookup works. I think it will just work if you add the accelerator, but it will be confusing as to why, because type accelerators take precedence over everything else. Here is the magic that makes it work: if there are generic arguments, we ignore a type found that has no generic arguments. Instead, we look for another type with `1 (or the number of generic arguments) appended. So we'd ignore the List[object] that would be found and continue looking for List`1.
@lzybkr Given that there's no advantage to using generics with object-element collections (or am I missing something?), [list] could refer to [System.Collections.ArrayList]. Arguably, ArrayList should have been List all along, following the example of other non-generic/generic type pairs.
It is regrettable that v6 has shipped already. Given the numerous breaking changes in v6, switching @() to creating variable-sized collections would have been just one more. It would presumably have had manageable real-world impact while providing great automatic benefits, assuming that += is also changed to update in place.
@mklement0 Strongly typed collections provide roughly the same value in PowerShell as they do in any other .NET language.
I say roughly because PowerShell does many more conversions than other languages. For example, adding to a List[string] will work for almost any object, because PowerShell will call ToString() if there isn't a better conversion available. But a List[ProcessInfo] collection will most likely throw an error if you try to add something other than a ProcessInfo.
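The sketch below illustrates the conversion behavior described above. List[string] accepts an [int] and a [bool] through conversion, while List[int] rejects a string that cannot be converted to a number:

```powershell
$strings = [System.Collections.Generic.List[string]]::new()
$strings.Add(42)      # PowerShell converts the int argument to the string '42'
$strings.Add($true)   # ...and the bool to the string 'True'

$ints = [System.Collections.Generic.List[int]]::new()
$failed = $false
try {
    $ints.Add('not a number')   # no conversion to [int] exists, so the call throws
} catch {
    $failed = $true
}
$failed               # True
```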
And of course consumers of the collection can be certain about the type of every item in the generic collection.
It seems somewhat safe to assume that enhancements to List<T> will happen before ArrayList, so that is one argument against ArrayList. Another is the annoying non-void return from ArrayList.Add.
@lzybkr:
Strongly typed collections do provide roughly the same value in PowerShell as they do in any other .Net language.
Yes, but my point was about using generics specifically with System.Object elements, the root of the object hierarchy. In that case, the advantages of generics are moot, save for the potential future enhancements to the types you mention.
Couldn't we make [List] refer to [System.Collections.ArrayList] while also providing accelerators such as [List[int]] for [System.Collections.Generic.List[int]]?
Another is the annoying non-void return from ArrayList.Add.
That is indeed annoying, but by making += call .Add() behind the scenes (and not emitting the return value), the problem mostly goes away.
Sure, [list] could mean [ArrayList], but I'm not seeing a benefit compared to List[object], only drawbacks.
And I do think folks will still use .Add. IntelliSense will suggest it, and many folks won't use += because:
I also think it would be weird that some references to list are ArrayList and others List<T>; that's another drawback in my book.
The reason I'm suggesting ArrayList is to avoid the awkwardness of having a type literal, [list], that _looks_ non-generic while actually being generic.
That said, perhaps that's not really a problem in practice. I don't think anyone will truly expect something like [list].MakeGenericType($t) to work, given that PowerShell has no literal representation for _uninstantiated_ generic types.
One can specify a generic type without type arguments by using its real name, e.g. [System.Collections.Generic.List`1] or [System.Collections.Generic.Dictionary`2].
@lzybkr: That's good to know, thanks; I had no idea.
In light of that, it's even less likely that someone would expect [list].MakeGenericType($t) to work:
- If they know that [list] is an _instantiated_ generic type, they'll know that an intervening .GetGenericTypeDefinition() call is required.
- If they mistakenly believe that [list] is non-generic, they shouldn't expect .MakeGenericType() to work at all.
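The sketch below shows both routes to a constructed generic type. Here $t plays the role of the param([type]$t) value from the earlier example:

```powershell
# The open generic type, written with its CLR name and arity suffix.
$open = [System.Collections.Generic.List`1]
$open.IsGenericTypeDefinition            # True

# From an instantiated type such as List[object], get the definition first...
$definition = [System.Collections.Generic.List[object]].GetGenericTypeDefinition()
$definition -eq $open                    # True

# ...and only then build a constructed type for an arbitrary element type.
$t = [string]
$definition.MakeGenericType($t) -eq [System.Collections.Generic.List[string]]   # True
```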
Please note that List[T] provides a ForEach method, so our magic ForEach method will not work without some additional work. Scripts that work with arrays may break with List<T>.
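The sketch below shows the shadowing. On an array, .ForEach() is PowerShell's intrinsic method. On List[int], the call instead binds to the .NET method List[T].ForEach(Action[T]), whose void return means nothing is emitted:

```powershell
# On an array, .ForEach() is PowerShell's intrinsic method and emits the results.
$array = @(1, 2, 3)
$fromArray = $array.ForEach({ $_ * 2 })
$fromArray.Count                 # 3 (the values 2, 4 and 6)

# On List[int], the call binds to the .NET method List[T].ForEach(Action[T]) instead.
$list = [System.Collections.Generic.List[int]]::new()
$list.AddRange([int[]] (1, 2, 3))
$fromList = $list.ForEach({ $_ * 2 })
$null -eq $fromList              # True: Action[T] returns void, so the block's output is discarded
```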
Changing the semantics of any of the existing operators would be an unacceptable breaking change as described in bucket 1 in the Breaking Change Contract. New infix operators in expression mode would be fine. Prefix operators might fall into the acceptable breaking change buckets.
@lzybkr In fact, I always thought of 1, 2, 3 as (cons 1 2 3). Changing ',' would be breaking, but we could use a new ','-like operator such as :: to build lists, e.g. 1 :: 2 :: 3. That said, for creating new kinds of list-like objects, we could simply use commands:
$list = New-List 1 2 3 4
$complexList = New-List 1 (New-List 2 3) 4
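No such command exists today. The following is a hypothetical New-List sketch built on List[object], showing how the two example calls above could behave:

```powershell
# Hypothetical New-List helper: collects its arguments into a List[object].
function New-List {
    $list = [System.Collections.Generic.List[object]]::new()
    foreach ($item in $args) { $list.Add($item) }
    # The unary comma stops PowerShell from enumerating the list on output.
    , $list
}

$list = New-List 1 2 3 4
$complexList = New-List 1 (New-List 2 3) 4
$list.Count          # 4
$complexList.Count   # 3 (the middle element is itself a list)
```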
Anyway, having a REAL list (car/cdr) type would probably have significant perf advantages, since the most common scenario for collections seems to be streaming/enumerating the collection. And it would be nice to make $head, $list = $list efficient.
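With today's arrays, that assignment splits off the first element and copies the rest into a new array. Repeatedly peeling off the head of an n-element array therefore costs O(n^2) copying overall, whereas a cons-style list could share its tail. A short sketch:

```powershell
$list = 1, 2, 3, 4
$head, $rest = $list
$head          # 1
$rest.Count    # 3 (the values 2, 3 and 4, in a new array)
```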
For the "core" PowerShell types ("numbers", strings, arrays and hashtables), the + - * / operators are magical. For other types, we look for op_Addition(), op_Subtraction(), etc. This is why [datetime] addition works:
PS[1] (31) > [datetime]::now + 1gb
Thursday, March 29, 2018 4:23:58 PM
The .NET Framework designers chose not to implement these methods on the collection types, favoring Add() instead of an operator. We (or others) could add our own collection type that implements the op_* methods, and then the operators would just work.
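Here is a minimal sketch of that idea using a hypothetical PowerShell class; GrowableList is not an existing type. It wraps a List[object] rather than implementing IEnumerable itself. That choice assumes PowerShell would otherwise treat an enumerable left operand as a collection and fall back to array concatenation:

```powershell
# Hypothetical collection type whose + operator appends in place.
class GrowableList {
    [System.Collections.Generic.List[object]] $Items = [System.Collections.Generic.List[object]]::new()

    # PowerShell's + falls back to op_Addition for types it has no built-in rule for.
    static [GrowableList] op_Addition([GrowableList] $list, [object] $item) {
        $list.Items.Add($item)
        return $list    # return the same instance, so $x += ... never copies
    }
}

$g = [GrowableList]::new()
$original = $g
$g += 1
$g += 'two'
$g.Items.Count                              # 2
[object]::ReferenceEquals($original, $g)    # True
```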
@mklement0 ArrayList is a horrible, horrible type. Its methods uselessly return values, which leads to all sorts of bugs in PowerShell scripts:
PS[1] (32) > $al = [System.Collections.ArrayList]::new()
PS[1] (33) > $al.Add(1)
0
PS[1] (34) > $al.Add(2)
1
PS[1] (35) > foreach ($i in 1..5) { $al.Add($i) } # would reasonably expect this to return nothing but!
2
3
4
5
6
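The usual workarounds are to discard the return value with $null = (or a [void] cast), or to use List[object], whose Add returns void. A sketch of both:

```powershell
$al = [System.Collections.ArrayList]::new()
# Each ArrayList.Add call returns the new element's index, which leaks into the output.
$leaked = foreach ($i in 1..5) { $al.Add($i) }
$leaked.Count            # 5

# Assigning to $null discards the index.
$quiet = foreach ($i in 1..5) { $null = $al.Add($i) }
$null -eq $quiet         # True

# List[object].Add returns void, so nothing leaks in the first place.
$gl = [System.Collections.Generic.List[object]]::new()
$alsoQuiet = foreach ($i in 1..5) { $gl.Add($i) }
$null -eq $alsoQuiet     # True
```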
@powercode
Please note that List[T] provides a ForEach method, so our magic method
Yeah. That's a problem, especially since it looks like it kinda works.
@TravisEz13 @rkeithhill @lzybkr @daxian-dbw @powercode @mklement0 This discussion goes well beyond the scope of an issue (so not just a bug). I propose closing the issue in favour of the existing RFC and possibly resurrecting it from the rejected bucket. Thoughts?
Agreed, there should be a concrete RFC to discuss, though I'd rather see performance improved in many other areas before introducing some new syntax to possibly improve performance of some scripts.
@lzybkr Yeah - most of this is just syntactic sugar for stuff you can already do. Do you have a list of where you think the effort should go?
@BrucePay:
The .NET framework designers chose not to implement these methods, favoring Add() instead of an operator. We (or others) could add our own collection type that implemented the op_* methods and then the operators would just work.
Please see my suggestion here, which proposes combining @PetSerAl's efficiently extensible collection type with op_*() methods to preserve familiar += semantics, for instance.
ArrayList is a horrible, horrible type
Well, that's a fair point. Let's get rid of it, then :)
Overall, I think what's worth considering is to _replace arrays ([object[]]) as the fundamental PowerShell collection type with an efficiently extensible collection type_. Then no one has to think about collection types, their constructors and methods in the vast majority of scenarios, while still getting automatic performance improvements: efficient use of +=, and, PowerShell-internally, no need to convert to a different collection type on output.
While that may not be as breaking as one might think, it certainly is a change so fundamental that it exceeds even the scope of a regular RFC, so what I'm wondering is: where is the place, if any, to discuss such PowerShell "vNext" proposals? A special class of RFCs?
I agree with @mklement0 that more efficient collections need a path to becoming the default, especially when you are calling REST APIs that return a lot of data. Maybe after the Experimental Flag RFC is done, the work can be moved under the flag so that breaking changes can be found and worked around.
At the very least, I think the pipeline should by default emit an ArrayList or List.
Since we now support Experimental Feature flags, someone could do this work without breaking existing users and we can discuss when/if it goes from Experimental to a stable feature.
Low-hanging fruit here, but I'd be happy with an easier way to create list objects. For example, it would be nice to have a type accelerator for List: [list] (defaulting to List<PSObject>) and [list[typename]]. I wanted to capture that here since my issue #9853 requesting this was closed as a duplicate of this issue (not sure it's exactly the same, but close enough, I suppose).
I'm late to this conversation, but so far in PowerShell, square brackets are always used for indices. For that reason, I personally don't like @[] as an enclosure.
Why not add a character to the enclosure prefix to make it a list? For example, this is visually representative of a (bulleted) list:
$x = :@(2,3,4)
I suppose we could nickname it the angry list as well, because of its resemblance to the angry emoji. That makes the syntax easy to remember: if you're angry because arrays are slow to add values to, change your enclosure prefix to the angry emoji and you'll get a nice performance increase. 😝
In terms of that being a breaking change (because :@ could be a command name), isn't that only breaking when both of the following are true?
I'm asking because that seems to dramatically reduce the likelihood of that causing an issue, given the obscurity of the command name and the fact that most people invoke commands in PowerShell without round brackets.
Also, while you can pass parameter values following a colon, such as Get-Process -Id:$PID, that syntax is primarily used to pass $false to switch parameters. If you did do something like Get-Process -Id:@(1,2,3), that would unambiguously evaluate to passing an array with the values 1, 2, and 3 to the -Id parameter of Get-Process, would it not? That's my understanding, in which case I think this isn't an issue either. Note that Get-Process -Id@(1,2,3) does not parse, because PowerShell treats @ as part of an -Id@ parameter name. So the difference between using : to attach a value to a parameter and using :@(...) as a value seems pretty clear.
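You can try the colon form with a function that has a switch and an array parameter. Test-Colon below is hypothetical and for illustration only:

```powershell
# Hypothetical function with a switch parameter and an array parameter.
function Test-Colon {
    param([switch] $Force, [int[]] $Id)
    [pscustomobject]@{ Force = $Force.IsPresent; IdCount = $Id.Count }
}

(Test-Colon -Force:$false).Force      # False: the colon attaches $false to the switch
(Test-Colon -Id:@(1, 2, 3)).IdCount   # 3: the colon attaches the whole array to -Id
```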
On a related note, I submitted this RFC yesterday for new enclosures for easy concurrent collections so that thread-safe collections could be used with some of recent and future multithreaded additions to PowerShell.
This is a most intriguing error:
PS> :@(1,2,3,4)
:@ : Cannot find drive. A drive with the name 'get-' does not exist.
At line:1 char:1
+ :@(1,2,3,4)
+ ~~
+ CategoryInfo : ObjectNotFound: (get-:String) [], DriveNotFoundException
+ FullyQualifiedErrorId : DriveNotFound
An alternative that I feel would reduce ambiguity with passing parameters that way even further might be @:(1, 2, 3), which also currently results in a parse error (actually several):
PS> @:(1,2,3,4)
At line:1 char:1
+ @:(1,2,3,4)
+ ~~
Variable reference is not valid. '$' was not followed by a valid variable name character. Consider using ${} to delimit the name.
At line:1 char:3
+ @:(1,2,3,4)
+ ~
Unexpected token '(' in expression or statement.
At line:1 char:1
+ @:(1,2,3,4)
+ ~~
The splatting operator '@' cannot be used to reference variables in an expression. '@:' can be used only as an argument to a command. To reference variables in an expression use '$:'.
+ CategoryInfo : ParserError: (:) [], ParentContainsErrorRecordException
+ FullyQualifiedErrorId : InvalidVariableReference
I don't get the drive error you get. Are you testing that in a session that defines a CommandNotFoundAction handler?
I suppose @: could work too. That mucks up the reference to the angry emoji, though. 😠 🤣
I really like the notion that you can add a character to an enclosure prefix in your scripts and voilà, they'll use a more efficient data structure. That would be a very low-cost performance enhancement for some scripts if the data structure were implemented properly with operator support for things like +=, etc.
Just to put another alternative on the table:
:(1,2,3,4)
That's shorter, but : could be a command. Still, it would only be a breaking change if someone had that as a command _and_ invoked that command by passing arguments in round brackets.
Nope, fresh PS7-preview1 session. 🤷‍♂️
Yeah, could do, but then you lose the callback to @() a bit, and the meaning is a little less clear, I feel?
As Jason mentioned above, lists are probably an edge case for scripts, so there's no need for syntactic sugar for creating lists. Perhaps we could just enhance the + (+=) operator to support lists and concurrent collections (other types?).
We could start with this and add syntactic sugar later if we find a compromise.
That's definitely a no-brainer; we need + / += support for lists and similar.
The syntactic sugar would really be nice as well, though 😊
Please consider adding support for + / +=.
Even if you ignore the performance benefit, this is more natural to use. For example, I just did something like this and was surprised by the error:
$MyArrayList = [System.Collections.ArrayList]@(0, 1, 3, 4)
$MyArrayList += 5
$MyArrayList.Insert(2, 2) # Exception calling "Insert" with "2" argument(s): "Collection was of a fixed size."
I think we have an existing issue for that specifically: https://github.com/PowerShell/PowerShell/issues/5805
It came up again recently as a duplicate, but my comment there still stands: https://github.com/PowerShell/PowerShell/issues/13152#issuecomment-656790079
@daxian-dbw perhaps we can turn your @[] implementation, with addition operator support, into an experimental feature? As part of this, we can make the breaking change so that ] forces a new token, as it seems like a bucket 3 breaking change, and we can get real-world feedback via the experimental feature.
@SteveL-MSFT a point of clarification on that: would @[] become another subexpression operator in that case, to match @() and $()? Or would it be more akin to (), in that line breaks within it aren't permitted?
@vexx32 good question. I suppose it should probably match @(), so that, hypothetically, people could just search and replace in many cases and get the benefits.