Note that the term _technical debt_ is used loosely here to mean "accumulated broken behavior that can't be fixed without breaking backward-compatibility"; strictly speaking the term has a different, specific meaning.
_Update_: @rjmholt has officially started a discussion about how to implement and manage breaking changes: #13129.
Note: This issue, which arose out of https://github.com/PowerShell/PowerShell/issues/5551#issuecomment-380522712, is just to get the discussion started to see if there's a fundamental willingness to entertain such changes. Eventually, RFC(s) are needed.
PowerShell's steadfast commitment to backward compatibility has served the community very well over the years.
On the flip side, the inevitable by-product was the accumulation of technical debt that requires memorizing exceptions and is a barrier to newcomers.
Certain fundamental problems that exist today can only be solved at the expense of backward compatibility, and there are two - not mutually exclusive - ways to handle that:
Implementation of a "PowerShell vZeroTechnicalDebt" edition that sheds all technical debt, but forfeits backward compatibility (possibly with future versions using semantic versioning to indicate compatibility)
Integration of backward-compatibility-breaking features / fixes into the existing code base, available strictly on an _opt-in_ basis only.
Which approaches, if any, are we willing to consider?
Once we have clarity on that, we can flesh out the processes.
Here are some of the existing fundamental problems that can only be solved at the expense of backward compatibility; I'm sure others will think of more:
The complexity and inconsistency of current error handling and problematic [lack of] integration with "native" (external) programs - #3996
Inconsistent, hard-to-predict preference-variable / common-parameter inheritance - #4568
Performance issues due to [object[]] being the fundamental collection type - https://github.com/PowerShell/PowerShell/issues/5643#issuecomment-378467986
Problematic dynamic features that should work lexically: https://github.com/PowerShell/PowerShell/issues/3879#issuecomment-304940545 re break and continue, https://github.com/PowerShell/PowerShell-RFC/blob/master/1-Draft/RFC0003-Lexical-Strict-Mode.md re Set-StrictMode (though the latter may be fixed without breaking compatibility)
[psobject]-related problems (though perhaps they can be fixed without breaking compatibility): #5551, #5579, #4343, #5763
The unfortunate -LiteralPath / -Path split - brief rationale at https://github.com/PowerShell/PowerShell/issues/6714#issuecomment-383992749 and escaping woes at https://github.com/PowerShell/PowerShell/issues/6714#issuecomment-384353966 (not sure there's a good solution)
Broken quoting for external programs - #3734, #5576 (and others) and the related languishing RFC
-Command and -File CLI argument parsing - https://github.com/PowerShell/PowerShell/issues/4024#issuecomment-311541803, #3223 - and general misalignment with the CLI of POSIX-like shells - #3743 - including user profiles getting loaded by default even in non-interactive (script) invocations - #992
Inconsistent and surprising parsing of compound command-line arguments - #6467 - and non-parameter tokens that _look_ like parameters - #6291, #6292, and #6360
Broken handling of ValueFromRemainingArguments parameters, as argued in https://github.com/PowerShell/PowerShell/issues/2035#issuecomment-323816221, with broken behavior shown in #5955 and #5122 - going back to https://github.com/PowerShell/PowerShell/pull/2038
Written as of:
PowerShell Core v6.0.2
Interesting. I've always wondered whether the 1 in .ps1 would be used to add a sort of semantic versioning to PowerShell scripts, allowing a script to indicate if it was adapted to possible breaking changes.
There are expensive-to-fix problems which I think of as the technical debt of PowerShell not adopting new .NET concepts as they came along:

Language syntax for type parameters for generic methods #5146
Engine support for Extension Methods to automatically surface in the type system (like they do in _compiled_ .NET languages). #2226
Language support for async APIs #6716 (and RFC)
There are a few regrettable features:

Could we do something based on type hints (like @KirkMunro's FormatPx) instead of cmdlets that output formatting objects? #4594 #4237 #3886
Would you normalize the Registry provider to be _content_ based -- instead of feeling like a demo of how a property provider could work? #5444
Would you refactor PSProviders deeply to make them easier to write? It's a shame that it takes two layers of abstraction to arrive at "SHiPS" and finally give people a way to write providers that they're willing to use
How about a bigger question:

Would we be willing to reconsider the "many ways is better" approach and work on a "pit of success" approach? I mean, would we be willing to remove features that are considered "the easy way" but which are fundamentally worse, in favor of having only "the right way" to do things? E.g.:
List[PSObject]
Dictionary[PSObject, PSObject] (or Dictionary[string, PSObject] 😲)

All great suggestions, @Jaykul.
Re making @() notation mean a List[PSObject]: I think we can take this further and use an efficiently extensible collection type as PowerShell's fundamental collection type, so that it is used wherever [object[]] currently is (and is used internally as well as when returning collections, with no type-conversion performance penalty); there are challenges around + use, but @PetSerAl has suggested a custom list implementation to deal with that.
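As a rough illustration of the performance gap at stake (a sketch; exact timings vary by machine and version), compare += on an [object[]] array with appending to a generic list:

Measure-Command {
    $array = @()
    foreach ($i in 1..10000) { $array += $i }   # allocates a new array and copies every element, each time
}

Measure-Command {
    $list = [System.Collections.Generic.List[object]]::new()
    foreach ($i in 1..10000) { $list.Add($i) }  # appends in place
}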
Re making Process {} the default block in functions, as an aside: it's what Filter currently does, but it is limited to the - implied - Process block, and this function variant seemingly never really caught on; unifying the two in the context of Function makes sense to me.
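For reference, a minimal sketch of the equivalence being discussed:

filter Double { $_ * 2 }                   # the body is an implied Process block

function Double2 { process { $_ * 2 } }    # the spelled-out equivalent

1..3 | Double    # 2 4 6
1..3 | Double2   # 2 4 6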
Funny, on the Azure PowerShell community standup call right now, and they're going to move from AzureRm to a new Az module that is cross platform (no separate AzureRm.Core), and with all command prefixes changed from AzureRm to Az. The two will be side by side for a while, but they're moving to the new command set.
Too bad PowerShell hasn't had an opportunity yet to create a new executable that would run side-by-side with Windows PowerShell, but that could break away from some of the legacy crud that drags it down.
Just imagine what would happen should we go for it:
Realistically speaking it would take at least 6 months of focused effort to come up with a 'no regrets' version and another 6 months to iterate on it. Then 3rd party modules need to adapt to it as well in order for this version to be useful, which will take at least another 6 months to get a reasonable coverage (and bear in mind that some modules never will)... Then account for delay and unexpected problems, etc... So, no, I think it is only wishful thinking that one can get rid of all technical debt in one version in one go (and still develop and not just maintain the old version).
As much as I wish that such a version existed, I think it will only be possible to get to it slowly, one breaking change at a time. With v6 a lot of breaking changes were already accepted, but I think if one includes too many breaking changes in one version, it becomes too complex to upgrade existing scripts/modules. It's good to discuss the most valuable breaking changes, but until there is an LTS version of pwsh, I do not think it is time to think about having a 2nd train of pwsh with more substantial changes in parallel to the existing mainstream version.
@bergmeister Agreed. However, even relatively small changes on a core path can seriously hinder adoption. Look at Python 3: it took 10 _years_ to really catch on. With much bigger changes, who knows how long it will take for Perl 6 to be dominant (and it took them 15 years to come up with their right stuff, so 1.5 years for PowerShell++ seems optimistic :-)). On the other hand, PHP seems to break things on a regular basis, possibly due to the way it's used and what it's used for.
Python 3 is certainly the horror show, has it really caught on yet? I'm still running 2.7, and don't plan on upgrading any time soon. But I haven't heard much about Perl 6 recently either...
I think the lessons to learn from Python there are to separate breaking changes to the language from a version change to the engine. Hypothetically, a PS 7 engine could still run earlier script (.PS1) files in a no-breaking-changes mode, while if a script were marked as 7-aware (say, with a .PS7 extension) it could declare that it has been updated _and_ that it requires at least PS 7 to run. Hope that makes sense.
Perhaps the best success story is JavaScript/TypeScript/Babel. A transpiler (with source-map support) seems like the way-to-go for language evolution.
Javascript is a special case. You're pretty much stuck with it, so transpiling is really the only option. Typescript is "just" Javascript with extensions, so it's easy for people to adopt. Any Javascript program is a Typescript program, so you start with what you have and just add annotations from there. Dart, on the other hand, is its own language but transpiles to either Javascript or a native runtime in Chrome (at least that was the plan at one point). Dart doesn't seem to have picked up much adoption outside of Google, likely because it is its own language.
@BurtHarris
Python 3 is certainly the horror show, has it really caught on yet?
I was reading an article last week where the author was claiming critical mass had been achieved for Python 3. That all the core modules were available and now people were migrating in droves. We shall see...
Interesting. I've always wondered whether the 1 in .ps1 would be used to add a sort of semantic versioning to PowerShell scripts, allowing a script to indicate if it was adapted to possible breaking changes.
WRT .ps1, we reserved the right to change the extension in case we got the language completely wrong, resulting in catastrophic changes to the runtime such that scripts for the previous version just wouldn't work. But changing the extension also leads to a huge pile of work because so many things in the ecosystem are tied to an extension. So it's not something to do lightly. And of course, being part of Windows, if we forked the extension, we'd still have to maintain both versions (kind of like Python 2/3).
@bergmeister:
Valid concerns, but in the spirit of:
It's good to discuss the most valuable breaking changes
please share any that you may have in mind.
slowly one breaking change at a time
While a (lexically scoped) opt-in mechanism for incompatible changes is a solution, I'm concerned about two things:
Piecemeal introduction can lead to no one being able to remember which version is required for what feature; Perl (v5-) comes to mind.
The code base becoming bloated and hard to maintain, and the resulting binary being equally bloated, which hurts (at least) startup performance.
I've said it before: to me - and this is just a hunch - the v6 GA was an unfortunate compromise: it made old-timers unhappy with its breaking changes, while still carrying forward enough baggage to hinder adoption in the Unix[-like] world.
That said, given PowerShell's relative youth in the Unix[-like] world, perhaps there is (still) more of a willingness to work out problems by way of incompatible changes. As @BrucePay states, Windows PowerShell will have to be maintained anyway, and it can remain the safe haven for backward compatibility.
I think the scale of the negative consequences of such breaking changes has already been covered, and I agree with most of those points. I am writing this because I am doubtful that, in the greater context, these changes would yield significant benefits in the first place. At least for the way I use PowerShell.
The OP includes the following statement:
On the flip side, the inevitable by-product was the accumulation of technical debt that requires memorizing exceptions and is a barrier to newcomers.
The implied premise of this statement seems to be that making these various breaking changes would alleviate the burden of memorizing exceptions. Indeed that would be a great outcome. However, I am skeptical that that would be the result. PowerShell's behavior is deliberately rich. This makes it both expressive and unpredictable. Expressiveness and predictability seem to work against one another, at least amongst the languages I am familiar with.
While I agree that many of the breaking changes mentioned above would improve predictability somewhat in some cases, many unpredictable and surprising aspects of the language will remain. Despite spending years writing PowerShell, I am still frequently surprised by PowerShell behavior that seems to be by design and probably shouldn't be changed. Some recent examples that come to mind are as follows:
[scriptblock] arguments (#6419)
conversion of [AutomationNull]::Value to $null during parameter binding (#6357 is related)
break and continue on flow of control in the pipeline (#5811)
$null (SO)
$_ (SO 1, SO 2)

I expect that these examples are just a small fraction of the many other carefully-designed but surprising nuances I have not yet discovered. I selected these examples because they seem to be by design and probably shouldn't be changed.
I'm fine with this. The overwhelming majority of surprising PowerShell behavior does not have lasting impact on my success with PowerShell. Surprising behavior is almost always caught immediately by testing during development. I learn from the things that slip through and use that to improve my testing strategy. This is true whether those surprising behaviors are the kind that could be eliminated or the kind that must remain.
Whether the proposed breaking changes above are made to or not, PowerShell will never become so predictable that I can significantly reduce test coverage. In other words, the way I use PowerShell, I don't think there's much of an upside that's even possible by making the breaking changes proposed above.
(BTW, thank you @mklement0 for directly asking this question. This has been in the back of my mind for a while. It's good to see the opportunity for everyone to say their piece.)
Thanks, @alx9r.
I think it's important to distinguish between _intrinsic_ complexity and _extrinsic_ complexity:
_Intrinsic_ complexity stems from the inherent complexity of the concepts being implemented, which in the case of PowerShell has two primary sources: _internal_ complexity from introducing a new paradigm (object-based pipeline) and marrying two distinct worlds (shell syntax and programming-language syntax), and _external_ complexity from interfacing with multiple, disparate outside worlds.
Intrinsic complexity is managed by documenting concepts and features properly and also giving them _names_.
The _member enumeration_ feature similarly lacks an established name and focused documentation - see https://github.com/PowerShell/PowerShell-Docs/issues/2198 - with subtleties around not just .Count, but also a .Length property, and parameter-binding enumeration behavior being subtly different from pipeline/operator behavior.
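To make the .Count / .Length subtlety concrete (a sketch; the synthetic scalar .Count is PSv3+ behavior):

$s = 'abc'
$s.Length      # 3 - the string's own property (number of characters)
$s.Count       # 1 - the synthetic Count that treats any scalar as a 1-item collection
(1, 2).Count   # 2
(1, 2).Length  # 2 - for arrays, Length and Count coincide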
_Extrinsic_ complexity stems from leaky abstractions and inconsistencies. If eliminating such complexity is not an option due to backward-compatibility concerns, it should be _documented as known problems_.
The script-module variable scoping behavior (which you reference in the context of $_) is more than just a quirk: it is the root cause of a major problem, the previously mentioned #4568.
All other issues you mention strike me as falling into the extrinsic category (rather than being the result of careful design), because they all present as inconsistencies that have no obvious (documented) rationale or benefit:
Using $null to pass _no_ arguments rather than a positional $null argument.
Why should [System.Management.Automation.Internal.AutomationNull]::Value be converted to $null during _parameter binding_? See https://github.com/PowerShell/PowerShell/issues/9150#issuecomment-473743805
Why do break and continue act across the call stack, with _quiet termination_ if no enclosing loop is found?

While the wealth of features and the joining of disparate worlds alone makes it hard to remember all requisite _intrinsic_ complexity, minimizing the _extrinsic_ one is still important.
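To make the [AutomationNull] inconsistency above concrete, a minimal illustration:

$a = & {}        # a command with no output; direct assignment preserves [AutomationNull]::Value
@($a).Count      # 0 - AutomationNull enumerates as "nothing"
@($null).Count   # 1 - a true $null is one item

function f($x) { @($x).Count }
f $a             # 1 - parameter binding converted AutomationNull to $null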
Having to test your intended approach first without just knowing and trusting that it will work - or to have things break unexpectedly due to surprising behavior - is a serious productivity (and enjoyment) hindrance (even though PowerShell commendably makes it very easy to interactively test behavior).
It comes down to solid concepts (that don't contravene intuitive expectations), descriptive naming, and good documentation:
If I don't know it / not sure if I remember correctly, I need to know where to look it up, and have faith that _known problems and edge cases_ are also documented (either as part of the regular help topics or via links from there).
So even if we decide that _eliminating_ extrinsic complexity is not an option, we can at least _document it systematically_ - and the discussion here can serve as the starting point for compiling a "pitfall gallery" (which may also include cases of unavoidable _intrinsic_ complexity that may be surprising).
we can at least document it systematically - and the discussion here can serve as the starting point for compiling a "pitfall gallery"
Roman Kuzmin has been collecting such a gallery for a while, here: https://github.com/nightroman/PowerShellTraps
Thanks, @HumanEquivalentUnit - that looks like a great collection.
Along with the issues collected in this thread, it could form the basis for a gallery that is part of the _official_ documentation, with each entry augmented with, as appropriate:
a design rationale that explains why the behavior, though perhaps surprising, is justified after all (a properly framed explanation may make the element of surprise go away)
an acknowledgement of the problem, stating that:
Exactly for this there is a required-version mechanism (#requires).
If something breaks in a new version, the only pain we get is that we have to study and learn the usually small differences.
But we (finally) get a platform, which really gets better with each iteration in the core. And much easier to maintain.
No, it should never even be a question whether to kick out bad designs and decisions in light of new know-how and technology.
Also, to flog the living s**t out of a dead horse, it's still stupendously hard to write a well-behaved function or cmdlet that deals with the magical differences between the host's native filesystem and the filesystemprovider. Such nuanced code is difficult to write and often gotten wrong. Here's a post on stackoverflow I answered about seven or eight years ago that's still valid:
Ref: #8495
There are some strange operator precedence rules that deserve revisions:
PS> 1, 2, 2 + 1, 4, 4 + 1
1
2
2
1
4
4
1
In most languages, one would more commonly expect this to be the output:
1
2
3
4
5
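The surprise comes from the comma (array constructor) operator binding more tightly than +, so the original line parses as array concatenation; parenthesizing shows both readings (a sketch):

PS> (1, 2, 2) + (1, 4, 4) + 1    # what actually happens: 1 2 2 1 4 4 1
PS> 1, 2, (2 + 1), 4, (4 + 1)    # the intended scalar additions: 1 2 3 4 5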
One option that has its own pros and cons is a new feature flag to enable all of the "ideal" behavior so that it's opt-in. If we had some telemetry indicating that most of the usage has moved to the new behavior, we could flip it so it's opt-out. This might all just be a pipe dream as I haven't seen any real world case where such a model worked...
Indeed that would be _nice_, but given the rather... thorough... nature of some of these revisions in this thread, it risks potentially completely _splitting_ compatibility of scripts between "normal" and "experimental", and potentially into several pieces, depending on which flags an individual has enabled.
This means we can't really rely on anything that is an experimental feature in that way, nor write scripts and modules that rely on them, unless we attach a huge warning to the docs pages, or potentially prevent them from being imported unless certain flags are enabled.
This _might_ be avoidable, however... if we can, at will, enable and disable specific experimental features on a per-each-module-scope basis. But, given that that only further complicates matters, I'm not even sure if that's a particularly great solution, either.
@vexx32 just to be clear, I wasn't suggesting an experimental flag which would eventually become non-experimental. I was thinking of something different (perhaps more like an Enable-PSFeature, as it would be officially supported (and thus protected from breaking changes, unlike experimental features)). I was also thinking it would be a single flag where you opt into these new breaking changes as a set rather than individually.
Oh, in that case... Yeah, absolutely. That would let us neatly package everything together under a single umbrella, and actually use it. Much better than what I was thinking! 😄
@jszabo98's addition to the list of issues compiled here, based on #8512:
-and and -or unexpectedly have the _same_ precedence, whereas in most languages (including C#) -and has higher precedence than -or.
Example expression that contravenes expectations:
PS> $true -or $true -and $false
False
Due to left-associativity and -and and -or having the same precedence, the above is evaluated as ($true -or $true) -and $false rather than the expected $true -or ($true -and $false).
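Parenthesizing restores the conventional grouping:

PS> $true -or ($true -and $false)
True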
This might all just be a pipe dream as I haven't seen any real world case where such a model worked...
In fact, we already have a model that works all over the world - LTS. MSFT uses this to develop Windows 10. The model is used to develop Unix distributions.
For us, this means that we could release LTS PowerShell Core versions every two years and include them in Windows 10 LTS versions and Unix LTS version. (One problem is to sync Windows and Unix release dates for LTS versions. I guess MSFT can negotiate with the main Unix companies.)
During these two years, we are collecting minor breaking changes, transferring more significant breaking changes to the next cycle.
For each new LTS version, the ScriptAnalyzer should get a set of new rules to facilitate script migration.
With this model, critical products and scripts will only have to jump from the previous version to the next after thorough testing and migration.
I used this approach when I migrated step by step (version by version) an old system from PHP 4 version to next PHP version until I reached the supported PHP 5.x version. At each step, I had to make relatively small and quick changes, after which the system continued to work for some time until the next step.
Update: It should be in sync with .Net Core LTS versions. I expect that .Net Core 3.1 will be the next LTS, and PowerShell Core 6.x should be on that version.
Honestly? That sounds like a really _great_ idea for PowerShell Core. That would allow us to make breaking changes like this that really could help it move forward, not overburdening it with compatibility concerns, while _also_ not overburdening it with too many breaking changes in the interim between LTS releases.
Ref: https://github.com/PowerShell/PowerShell/pull/8407#issuecomment-453221566
A Get-Variable -ValueOnly -Name myVar | Should -Be $var assertion fails when $var is an array. This is because Get-Variable -ValueOnly does not use the enumerating overload for WriteObject().
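A short repro of the non-enumerating behavior (illustrative):

$var = 1, 2
# The array is written as a single object instead of being enumerated,
# so the pipeline sees one item rather than two:
(Get-Variable -ValueOnly -Name var | Measure-Object).Count   # 1, not 2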
Regarding having multiple versions of powershell which have different feature sets and breaking changes: We already have this with legacy and core. Which seems bad at first glance but may provide a working model for the future. Stay with me...
We have to set almost all processes which use powershell to call the core pwsh.exe before running the actual scripts because Windows scheduled tasks, jenkins, and many environments are not powershell core aware. We do this because we are using features which are only available in a particular PS version.
In cases where we cannot force a call to core we have to have every script check which version of PS it is running under and then call itself using the appropriate PS version on the host system. Whatever trick we use to get the right code to run in the right environment we often are passing all of the environment and parameters to the same script that was already running.
That is the UNIX philosophy. Make a tool do one thing and do it well. Then combine tools to create a whole that is greater than the sum of its parts. This is the world of scripting where we make subshell calls to other utilities all the time and process results. Historically that is scripting. Powershell trends toward a strange convention by trying to keep all the code native but that is only a convention--not a requirement. PS runs with access to the OS and all the utilities available just like unix shell scripts.
This would be much easier if the file extension for core had been updated to .ps6 so the powershell engine could be doing the work for us. Each binary version of powershell can remain independent of the others as long as every call to one passes environments and returns the outputs, errors, warnings, exceptions, to the other--which we have to do manually right now at great effort. It is not necessary to have backward compatible flags in the engine as long as the engine knows which version to run. The context switching involved is already happening so we are already at worst case scenario.
We will be forced to use awful hacks in powershell forever UNLESS we go ahead and break things in a well defined manner. Breaking changes are not going to make our lives any more difficult than they already are. Alternatively, if we use incremental, continuous improvement and have the engine do the heavy lifting of switching context then our code (and lives) will be better.
@jtmoree-kahalamgmt-com Thanks for your feedback! Feel free to open new issues to share your experience and feedback and to ask for helpful new features.
This would be much easier if the file extension for core had been updated to .ps6 so the powershell engine could be doing the work for us.
See https://github.com/PowerShell/PowerShell/issues/2013#issuecomment-247987977
And I created https://github.com/PowerShell/PowerShell/issues/9138 based on your feedback.
Python 3 is certainly the horror show, has it really caught on yet? I'm still running 2.7, and don't plan on upgrading any time soon. But I haven't heard much about Perl 6 recently either...
I think the lessons to learn from Python there are to separate breaking changes to the language from a version change to the engine. Hypothetically, a PS 7 engine could still run earlier script (.PS1) files in a no-breaking-changes mode, while if a script were marked as 7-aware (say, with a .PS7 extension) it could declare that it has been updated _and_ that it requires at least PS 7 to run. Hope that makes sense.
I think calling Python 3 a horror show is overly dramatic. If I look at PyPi top downloaded packages, and grab the individual package metadata about v2 vs v3 downloads, I see v3 has now broken 33% penetration for pretty much all the top packages, and for newer libraries like Ken Reitz' requests library, it's nearly 50/50 split: yesterday there was 685k downloads of v3, vs 875k of v2.
The major thing holding Python 3 back for a long time was scientific community support for pandas and pysci. But, in the latest stats, the scientific community now prefers Python 3 to Python 2: https://pypistats.org/packages/pandas 231k v3 downloads, vs 175k v2 downloads.
As the famous Dr. W. Edwards Deming would say, "In God we trust; all others, bring data!" Maybe you can write a PowerShell script to analyze the PyPi data set and tell me how horrific Python 3 is using data: https://packaging.python.org/guides/analyzing-pypi-package-downloads/
I would be happy with (in this exact order):
Easier to use module system that produces good stack traces.
Replace PowerShell default Error object string formatting that swallows 99% of the usefulness of programming in a stack-based von Neumann architecture.
Fix null redirection so that the three different ways to redirect output streams to the null device are equivalent in performance.
Fix dynamic scoping:
function a($f) { & $f }
function main() {
    $f = {write-host 'success'}
    a {& $f}                   # stack-overflow
    a {& $f}.getnewclosure()   # okay
}
[void] (main)
set-strictmode -version 'latest'
$erroractionpreference = 'stop'
function main() {
    $a = 'foo'
    $f = {
        $g = {$a}.getnewclosure()
        & $g   # "variable '$a' cannot be retrieved because it has not been set."
    }.getnewclosure()
    & $f
}
main
Output accumulates; return does not do the same thing as in other languages; fix it.
PowerShell does not have syntactic sugar for disposing of objects' external resources.
Rewrite PowerShellGet. It's AWFUL. Functions with hundreds of lines of code, swallowed exceptions, no test coverage.
Get rid of how module caching works.
Provide "compiled modules" similar to how Py files compile to pyc files for performance improvement.
Add a $none literal
Replace $null with $none in many places
Allow passing the $none special value to parameters, so that I do not need to write crazy switch statements to call code 5 different ways. COM and C# dynamic do this correctly - PowerShell makes this a disaster. An example would be how I have to dynamically map a deployment configuration for SSRS reports in the ReportingServicesTools module based on Subscription Occurrence type. I would prefer just to pass a bag of properties and have the method internally deal with the complexity, rather than force the complexity externally on end-users. ParameterSetName is only useful as a concept as a template, not good for programming against. It muddles ad-hoc polymorphism with intellisense.

@jzabroski Regarding ErrorRecord formatting, there's a separate issue used to discuss that: https://github.com/PowerShell/PowerShell/issues/3647; since formatting is not considered a breaking change, we can make improvements there.
Some of your other requests are potentially additive and not breaking so those should be separate issues. This specific issue is discussing requests for breaking changes if we had an opportunity and could justify doing a new major version of PS.
@jzabroski Regarding ErrorRecord formatting, there's a separate issue used to discuss that: #3647, since formatting is not considered a breaking change, we can make improvements there.
_Why_ is it not a breaking change? Because developers shouldn't pattern match on strings in error messages in the first place? I guess that's valid. But it will still break downstream consumers.
Some of your other requests are potentially additive and not breaking so those should be separate issues.
Which ones do you think are potentially additive? I can create issues for them so we don't derail this discussion, but I carefully put together this list this morning. I hate PowerShell with a fury and passion of rolling thunder and exploding lightning. PowerShell is a bowling ball with a butcher knife. My top 5 most embarrassing moments as an engineer all happened while using PowerShell. I'll never get that time back.
I've updated my suggestions to make it clear why these are breaking (and awesome) change suggestions.
Output accumulates; return does not do the same thing as other languages; fix it.
return does behave as you expect it to within a class method. The way "output accumulates" in a regular/advanced PS function is pretty typical for shell scripting environments e.g. Korn shell and Bash. That is, copy several commands from the console and put them in a function. When you invoke that function, it outputs from each command just as when you executed those commands from the console.
My primary gripe with PowerShell is that in the regular/advanced function case, return should have never taken arguments. Because something like:
return $foo
is really executed as:
$foo
return
This gives folks a sense that they can control the output by using return and they can't.
return does behave as you expect it to within a class method. The way "output accumulates" in a regular/advanced PS function is pretty typical for shell scripting environments e.g. Korn shell and Bash.
I hear people say this, including @BrucePay (see emphasis mine, below, quoting him), but the bottom line is:
Only in PowerShell do we have this illogical "return does not return" nonsense and output accumulates. This is a glorified memory leak, and you should think of it as such.
To be honest, everywhere in PowerShell In Action I search for the phrase "problems", I can add something to the vZeroTechnicalDebt list. Here is what Bruce writes about return statement:
7.3 RETURNING VALUES FROM FUNCTIONS
Now it's time to talk about returning values from functions. We've been doing this all along, but there's something we need to highlight. Because PowerShell is a shell, it doesn't return results - it writes output or emits objects. As you've seen, the result of any expression or pipeline is to emit the result object to the caller. At the command line, if you type three expressions separated by semicolons, the results of all three statements are output:
[...]
In the traditional approach, you have to initialize a result variable, $result, to hold the array being produced, add each element to the array, and then emit the array:

PS (16) > function tradnum
> {
>   $result = @()
>   $i=1
>   while ($i -le 10)
>   {
>     $result += $i
>     $i++
>   }
>   $result
> }
This code is significantly more complex: you have to manage two variables in the function now instead of one. If you were writing in a language that didn't automatically extend the size of the array, it would be even more complicated, as you'd have to add code to resize the array manually. And even though PowerShell will automatically resize the array, it's not efficient compared to capturing the streamed output. The point is to make you think about how you can use the facilities that PowerShell offers to improve
your code. If you find yourself writing code that explicitly constructs arrays, consider looking at it to see if it can be rewritten to take advantage of streaming instead.
I think the motivation behind this feature is nice, but there are much cleaner, easier-to-read ways to achieve the same goal: implicit promotion from a single object to a collection of objects. Moreover, PowerShell already has a simple way to return output for redirection: Write-Output.
In Bash, return exits a function, period.
Huh? Bash accumulates output just the same:
hillr@Keith-Dell8500:~$ function foo() {
> echo "Hello"
> ls ~
> date
> return
> echo "World"
> }
hillr@Keith-Dell8500:~$ foo
Hello
bar.ps1 date.txt test.ps1
baz.ps1 dotnet powershell-preview_6.2.0-rc.1-1.ubuntu.16.04_amd64.deb
Tue Mar 26 16:06:18 DST 2019
Just like PowerShell, Bash still outputs everything from each command up until the return statement.
Ah, I think you are referring to what is returned in $?. Yeah, that's a diff. I think those have to be int return values in Bash to indicate success / failure. In PS, $? indicates success/failure as a bool based on errors generated by the function. In practice, I don't think $? is used that much. I use exception handling via try/catch/finally instead.
Yes. return has only one purpose in those languages. In PowerShell, it is mixed with output. Mixing return values and output streams is a bad idea. It can also lead to memory leak-like performance woes. What you really want is co-routines or "yield return", or generators, generator-builders and scanners (as in Icon). The reason being is that input consumption is lazy and failure suspends computation while allowing the portion of computation generated so far to be disposed of.
You can think of "output redirection" as really two objects: a Processor and a ProcessorContext.
Edit: I see why you rebutted my original point. You thought I was saying it's bad to be able to yield output. I am instead saying it's bad to return output in the way PowerShell does it.
I use exception handling via try/catch/finally instead.

That's a lot of typing, and doesn't clearly express intent in all cases.
Well, PowerShell is based on .NET which is garbage collected, so it shouldn't suffer memory "leaks". That doesn't mean you can't have a memory hoard though. So you do have to be a bit careful when handling large output/collections that is assigned to a variable or put in a hashtable of some sort.
I think PowerShell's return statement should only have a single purpose and that is to return early - nothing else. When someone does return 42, it has no bearing on $?. It is just another item output (yielded) to the output stream.
Like I mentioned before, $? is managed by PowerShell and has nothing to do with what a function outputs/returns. $? is set based on generated errors (or the lack thereof). This means that when you execute $res = foo, $res will contain every object output by the function foo. If you want to know if the function "failed" you can inspect $? on every single command you execute OR you can wrap a bunch of commands in a try/catch block. I find the latter much less typing (in scripts).
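A small sketch of that distinction:

function foo { 'result'; Write-Error 'something failed' }

$res = foo    # $res still receives 'result'
$?            # False - a (non-terminating) error was generated, independent of the output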
I hear people say this, including @BrucePay (see emphasis mine, below, quoting him), but the bottom line is:
In Korn Shell, return exits a function, period.
In Bash, return exits a function, period.
In PowerShell, return exits a function, too:
PS C:\> function test { "hello"; return "!"; "world"}
PS C:\> test
hello
!
Mixing return values and output streams is a bad idea. It can also lead to memory leak-like performance woes
There is no mixing because there is no difference between the output stream and the return value in PowerShell. You're talking as if the function writes the strings to stdout and separately returns an exclamation mark to the caller, but it doesn't. Pointing to write-output seems like the same mistake; all of these: write-output 'x', 'x', return 'x' send something down the same output path, to the pipeline. There is no separate return value.
What you really want is co-routines or "yield return", or generators, generator-builders and scanners (as in Icon).
That would mean that if you wrote a command on the command line, then put it in a script or function, it would behave differently:
PS C:\> robocopy /whatever
PS C:\> "robocopy /whatever" | set-content test.ps1; .\test.ps1
PS C:\> function test { robocopy /whatever }
I like seeing the same output in all three cases, not have the output swallowed in two cases because I didn't yield return for no reason. The way PowerShell does it is consistent and convenient for a shell and writing shell / batch scripts - taking something you wrote on the command line and putting it somewhere you can call it over and over. Doing that shouldn't completely change how the code behaves.
The reason being is that input consumption is lazy and failure suspends computation while allowing the portion of computation generated so far to be disposed of.
The PowerShell pipeline is controlled by the runtime; if I understand what you're saying, it's already able to do that:
PS D:\> 1..1mb |foreach { write-verbose -Verbose "incoming number: $_"; $_ } |select -first 3
VERBOSE: incoming number: 1
1
VERBOSE: incoming number: 2
2
VERBOSE: incoming number: 3
3
the select will terminate the pipeline as soon as it has three things. The rest of the million numbers weren't fed into foreach. Information moving down the pipeline is a "push" model, but it's controlled by the PowerShell engine.
Although, being a shell, it's not clear what it would mean for "input failure to suspend computation" if the input is coming from a native command or something outside a PS cmdlet.
Provide "compiled modules" similar to how Py files compile to pyc files for performance improvement.
I think the intended solution for more performance is to write modules in C#. PowerShell code does get JIT compiled, in some situations, but PS team members have said before they have no plan to expose this to end users for customisation, e.g. IIRC Dongbo Wang's PS Conf Asia talk on the PS Engine internals Nov 2018. (I guess that could change).
Fix null redirection so that the three different ways to redirect output streams to the null device are equivalent in performance.
Desirable, but it seems unlikely; $null = 4 is a simple variable bind, while 4 | out-null needs the overhead of starting up a pipeline and doing command resolution - what if someone had intentionally made out-null a proxy-wrapper which added logging code? Is that technical debt, that asking for a pipeline and cmdlet to run is slower than an expression?
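For what it's worth, the gap is easy to measure (a sketch; timings vary by version and machine):

Measure-Command { foreach ($i in 1..100000) { $null = $i } }      # simple variable bind
Measure-Command { foreach ($i in 1..100000) { [void]$i } }        # cast, also cheap
Measure-Command { foreach ($i in 1..100000) { $i | Out-Null } }   # spins up a pipeline every iteration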
Fix dynamic scoping:
If you mean "don't use dynamic scoping", it was a deliberate choice, e.g. Bruce Payette here saying "dynamic scoping, this is almost never used; before, like 30 years ago LISP had dynamic scoping, but it's very very rarely used now. The reason that we're using it is: that's what shells use. Shells use the equivalent of dynamic scoping, even Bash, and the Unix shells use the equivalent of dynamic scoping, where the variables from the caller's context - in their case the variables from the caller's context are copied into the child context. [..] With one modification, normally in dynamic scoping you look up the call chain and set the variable where it exists, we don't do that, we set the variable in the current scope all the time, and again that emulates the behaviour of shells".
That might be annoying, but is that "technical debt" for a shell, and would be "zeroTechnicalDebt" if it was changed?
Well, PowerShell is based on .NET which is garbage collected, so it shouldn't suffer memory "leaks". That doesn't mean you can't have a memory hoard though. So you do have to be a bit careful when handling large output/collections that is assigned to a variable or put in a hashtable of some sort.
I had a system administrator, who is not an engineer, write some logic to try to tabulate metadata on all files on a 8 TB file share. That did not end well: It ate up a lot of memory, and also pegged the system at 100% cpu... we could not even log in to the box after the script started, and the only solution _in production_ to save the server was to hard reboot it.
In my experience, PowerShell makes it easy to write such rubbish. You can do with, and critique, my feedback however you wish. It will not change the fact PowerShell makes it easy to write scripts that can effectively crash production machines.
@jzabroski can you share any details of the script?
Powershell people try to encourage pipeline streaming (get-childitem | foreach-object {}) instead of pre-loading a dataset ($files = get-childitem; $files | foreach-object) deliberately to reduce memory use. Did it do that up-front loading?
Was there a use of Group-Object to collect up the similar files? There have been improvements in Group-Object performance recently which would reduce CPU use, if not memory use.
PowerShell is single-core by default, was there jobs/runspaces code in there, was it a single core server? PS 6.1 has Start-ThreadJob which allows multiple jobs with way less overhead and throttling to help control CPU use.
In my experience, PowerShell makes it easy to write such rubbish
All the tutorials and books which tell you to write $files = @(); $files += $f must have cost the world a small fortune in cpu cycles and memory. That's a pattern which is slow, expensive of processor and memory, and distressingly common. I'm curious if that pattern was in the script at all? (And if there's any way we could block it without breaking backwards compatibility, or make += invalid on arrays in future breaking changes?)
But there is a tradeoff here. Get-ChildItem -Recurse is 22 characters and eats 800+MB of RAM on just my C: drive; the C# to do that without using tons of memory is 50+ lines, doesn't get file metadata at all, and will be more like 80MB. That's not PS leaking memory, that's PS using it to gain user convenience.
Is there any language where you don't have to care about the details of how it works, can write useful code, can't write bad code, and don't have to be a skilled programmer? (And which is also a shell?)
Would any of your earlier proposed changes have made it so the script worked?
You can write a Bash script to list files, spawn a process for each one, until your server falls over and you can't even SSH into it. You can write a Python script which os.walk()s the file tree and calls os.stat() for each file and uses less memory and finishes faster - but it takes 10 lines instead of 1 and you still have to care about the OS and about code details (and it still isn't a shell, and won't work with PS Providers).
I would like PS to have more "pits of success" and fewer trip hazards. Fast filename listing being one of them that I see a lot of use for. Radically changing it to be a completely different language, and hobbling the areas it does well, won't do that.
You want Python, Python exists, it's a beautiful language, it's not hobbled by trying to fit in two different worlds and keep scoping and function return semantics of a thirty year old shell from a different OS (why, Microsoft), or by Microsoft's commitment to backwards compatibility (and obsession with .Net), so it gets to dodge all those questions, rather than face them.
I would like PS to have more "pits of success" and fewer trip hazards.
If I could add twenty thumbs up for that alone, I would. This is exactly the pattern and approach PowerShell needs to take.
ForEach-Object is a PowerShell function for taking PowerShell collections and converting them into unreadable stack traces.
Today, I took a script written by Amazon to list EC2 Windows Disks: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2-windows-volumes.html#windows-list-disks
Guess what? I didn't have Initialize-AWSDefaults set up on this PC, and since the script author used ForEach-Object, my stack trace looked like this _garbage_:
PS C:\Windows\system32> D:\Installs\PowerShell\List-EC2Disks.ps1
Could not access the AWS API, therefore, VolumeId is not available.
Verify that you provided your access keys.
ForEach-Object : You cannot call a method on a null-valued expression.
At D:\Installs\PowerShell\List-EC2Disks.ps1:39 char:12
+ Get-disk | ForEach-Object {
+ ~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [ForEach-Object], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull,Microsoft.PowerShell.Commands.ForEachObjectCommand
This happens _every single time people use ForEach-Object expressions in their code_. The stack traces are _always terrible_. Re-writing this is painful. Thankfully, it's just a script I copy-paste into a random file, save, and run. If it's a problem in a large module, I have to get the source code for the module, re-write the script, hope to god in refactoring it I didn't break anything across the module by converting the ForEach-Object expression AND hope I can figure out where the REAL error is just so I can _do my job_. And now I have some custom code I forked from some major repository, just to figure out why I'm getting some error, because PowerShell's Write-Error IS AWFUL compared to C#'s simple idiom of throw new Exception("message", ex).
Re-writing this to use a foreach ($var in $list) loop, accumulating the results in a variable $results, I get the following error:
PS C:\Windows\system32> D:\Installs\PowerShell\List-EC2Disks.ps1
Could not access the AWS API, therefore, VolumeId is not available.
Verify that you provided your access keys.
You cannot call a method on a null-valued expression.
At D:\Installs\PowerShell\List-EC2Disks.ps1:58 char:26
+ ... lDevice = If ($VirtualDeviceMap.ContainsKey($BlockDeviceName)) { $Vir ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Oh, and by the way, every time I have to do this I can't remember PowerShell's stupid rules for adding items to lists, so I have to Google https://www.google.com/search?q=powershell+append+to+list and read a blog post about someone explaining how non-intuitive PowerShell's list operations are.
If you don't know C# or a language that does this well, it's easy to argue that this behavior is OK. However, C# handles this problem intelligently by wrapping the exception in an AggregateException, so that you get both the inner and outer exception. PowerShell? Hahahaha. Yeah, that would be too useful.
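For comparison, one way to approximate C#'s inner-exception idiom in PowerShell today (a sketch; Get-EC2Volume stands in for whichever call actually failed):

try {
    $volumes = Get-EC2Volume    # hypothetical failing AWS call
}
catch {
    # Wrap the original exception so neither the outer context nor the root cause is lost:
    throw [System.Exception]::new('Could not access the AWS API; verify your access keys.', $_.Exception)
}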
@jzabroski can you share any details of the script?
Sadly (_and happily_), I don't remember it. It was a fileshare, so it was almost certainly one or two cores at most. If it was two cores, the remaining CPU was likely taken by Windows Defender.
Edit: Found it in Teams chat.
$big32 = Get-ChildItem D:\Data\Website\Data -recurse | Sort-Object length -descending | select-object -first 32 | measure-object -property length -sum
As an added bonus, I just spoke to the person who ran this command (and felt incredibly bad), and he said the use case "was to grab the biggest 32 files for the initial replication group sizing requirement for DFS".
There is no mixing because there is no difference between the output stream and the return value in PowerShell.
I have yet to see people use this correctly on the first try. Again, this is not what bash does or any other shell. While it was an interesting experiment to unify the two concepts, it was a complete and utter disaster from a readability standpoint. It doesn't help that Write-Output was broken in PowerShell v6 for 16 months.
@jzabroski from your output of ForEach-Object it looks like you are not using PowerShell Core (that is what this repository is dedicated to); could you please try PowerShell Core and see if you can reproduce it. Make sure to use the latest version. Be mindful that Windows-specific functionality (such as Get-Disk) is not part of PowerShell Core; you will need to take this into account while doing your tests.
I have yet to see people use this correctly on the first try. Again, this is not what bash does or any other shell.
$ test () { ls /tmp; ls /etc/pm; return 5; ls /v*; } ; x=$(test)
$ echo "$x"
tmux-1000
sleep.d
$ echo $?
0
That is both output of two separate commands accumulating in a Bash function, and Bash return not functioning as return functions in C#. What, specifically, "isn't what Bash does"?
You are not capturing the return value correctly for the bash example. The return value is lost because $? always reflects the last command; in your case 'echo'.
$ test () { ls /tmp; ls /etc/pm; return 5; ls /v*; } ; x=$(test)
$ echo $?
5
$ echo $x
tmux-1000
sleep.d
POWERSHELL does this differently
> function test() { ls; return 5 ; ls } ; $x=test
> echo $?
True
> echo $x
Directory: c:\foo\bar
Mode LastWriteTime Length Name
---- ------------- ------ ----
More PowerShell oddities:
Chocolatey adds a RefreshEnv.cmd which is a GODSEND. Why doesn't this just ship with PowerShell? Why do I need to restart my Shell or download some stupid script off the Internet to reset my $env?
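For anyone landing here, a rough PATH-only approximation of what RefreshEnv.cmd does (Windows-only sketch; other variables would need the same treatment):

$machinePath = [Environment]::GetEnvironmentVariable('Path', 'Machine')
$userPath    = [Environment]::GetEnvironmentVariable('Path', 'User')
$env:Path    = "$machinePath;$userPath"   # re-read the registry-backed scopes into this session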
@jzabroski This was brought up in https://github.com/PowerShell/PowerShell-RFC/pull/92 You might want to chime in there.
Create a standard object model for PowerShell providers: https://github.com/PowerShell/SHiPS/issues/66#issuecomment-368924485
In the link on SHiPS issue 66, I outline possible candidate cmdlets. It's not clear to me why we can't implement something like Haskell typeclass behavior for these core cmdlets.
These APIs are already in PowerShell.
@iSazonov Can you please edit and quote what you are replying to, and add some context, such as a hyperlink to documentation of said APIs? TYVM
@jzabroski Look at the files in the src/System.Management.Automation/namespaces folder.
POWERSHELL does this differently
It's bash that's doing it differently - return can only set the (integer) process exit code (that's not what "return" means in other programming languages, that's an exit).

PowerShell can do that too, with exit 5. It's just that we don't use exit codes very often in PowerShell (pretty much only for scheduled tasks and interop with other OSes), because it isn't a process-based shell, and functions should not exit.
At the end of the day, it's the same reason why detecting the stdout handle type isn't useful in PowerShell (and why "redirecting" to out-null isn't the same as casting to [void] or assigning to $null).
I recommend people file new issues if they actually have feedback they want acted on -- since the team has _clearly and unequivocally ruled out_ a not-backward-compatible version of PowerShell, I don't know why this thread just keeps going...
since the team has _clearly and unequivocally ruled out_ a not-backward-compatible version of PowerShell
That is news to me. How is this clear? Please provide links to posts or other documentation.
It's a bit frustrating for some of us to hear "we'll never break compatibility" while we wrestle with some breakage every day. The point of this thread is so that the free market of ideas can solve some of these problems when upstream won't.
Someone might find it advantageous enough to fork powershell and create a version with minimal technical debt. That group will benefit from the ideas presented in this thread. (This is how the world works now. May the best powershell win.)
I'll reiterate that the team has already produced a 'not-backward compat' version of powershell by renaming the command from powershell to pwsh. Power(SHELL) is a shell. The job of a shell is to be the glue for humans that ties digital systems together. It's not a compiled binary with minimal external dependencies. Even traditional programming languages plan for and make breaking changes.
POWERSHELL does this differently

It's bash that's doing it differently - return can only set the (integer) process exit code (that's not [...])
I'm curious about other shells. What do they do? korn, csh, etc.
Here is an article discussing the return statement in multiple languages: https://en.wikipedia.org/wiki/Return_statement
It calls out that operating system [shells] allow for multiple things to be returned: return code and output.
My team has a variety of scripts that only run in PowerShell 5 even though we use PowerShell 6 as much as possible. In my experience, the premise that PowerShell is completely backward-compatible is definitely false. There are at least some extreme cases (ex: ErrorActionPreference not behaving intuitively) that should definitely be addressed as a breaking change - cases where making the fix would be less "breaking" than not doing so.
@chriskuech is there an issue detailing your issue with ErrorActionPreference?
@SteveL-MSFT I believe the Mosaic that @KirkMunro linked to directly below @chriskuech's comments is the issue you are looking for. And yes, I squeezed the word Mosaic into a tech conversation.
That said, @iSazonov closed @chriskuech original issue on October 1, 2018: See https://github.com/PowerShell/PowerShell/issues/7774
It seems this stuff keeps coming up, in different forms, and the Committee keeps closing issues around it.
@jzabroski found my main issue targeting the root cause.
I also filed an issue in the past around one of the symptoms: Invoke-WebRequest throwing a terminating error. I have personally witnessed multiple people completely dumbfounded by the whole try/catch boilerplate for handling failed HTTP requests, which occurs because the internal .NET methods throw terminating errors. In each of the three different cases, the engineer responded with an expletive when I explained the underlying behavior and why the issue would allegedly never be fixed.
To summarize, terminating errors terminate the script because PowerShell's creators believe the script cannot logically proceed beyond the error, but I don't think that is literally ever anyone but the scripter's decision to make on a case-by-case basis. Only the scripter can decide if they want the script to Stop, SilentlyContinue, Continue, etc.
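The boilerplate in question looks roughly like this (a sketch of the Windows PowerShell behavior):

try {
    $response = Invoke-WebRequest -Uri 'https://example.com/api/health'
}
catch {
    # Even a routine 404 lands here as a terminating error;
    # the actual status code has to be dug out of the exception:
    $status = $_.Exception.Response.StatusCode
    Write-Warning "Request failed with status: $status"
}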
(Tangential to the issue above)
If PowerShell implements an opt-in "ZeroTechDebt" mode, I definitely think that $ErrorActionPreference = "Stop" should be set by default. Obviously, this setting does not make sense in a REPL and should therefore not be set by default for all scenarios, but literally all of my scripts are prefixed with $ErrorActionPreference = "Stop" to enable "defensive programming" and behave like a "normal" programming language.
@chriskuech If you haven't already, please give this collection of RFCs a look: https://github.com/PowerShell/PowerShell-RFC/pull/187. They speak directly to what you're talking about here without needing a new zero tech debt version of PowerShell.
You can also find the four RFCs in separate issues on the Issues page in that repo if that makes them easier to read/digest. Just look for open issues posted by me and you'll find them
@SteveL-MSFT Here is a similar issue that impedes my productivity. It's not $ErrorActionPreference but $ConfirmPreference:
Below is an ugly script I wrote for setting SQL Server disk volumes to 64kb.
Import-Module Storage;

function Format-Drives
{
    # See https://stackoverflow.com/a/42621174/1040437 (Formatting a disk using PowerShell without prompting for confirmation)
    $currentconfirm = $ConfirmPreference
    $ConfirmPreference = 'none'
    Get-Disk | Where isOffline | Set-Disk -isOffline $false
    # The next line of this script is (almost) copy-pasted verbatim from: https://blogs.technet.microsoft.com/heyscriptingguy/2013/05/29/use-powershell-to-initialize-raw-disks-and-to-partition-and-format-volumes/
    Get-Disk | Where partitionstyle -eq 'raw' | Initialize-Disk -PartitionStyle MBR -Confirm:$false -PassThru | New-Partition -AssignDriveLetter -UseMaximumSize -IsActive | Format-Volume -FileSystem NTFS -AllocationUnitSize 64kb -Confirm:$false
    $ConfirmPreference = $currentconfirm
}

Format-Drives
Couple of side points:

The documentation for Format-Volume is wanting for more. Only two examples? And the official documentation is not in sync with the website: https://github.com/MicrosoftDocs/windows-powershell-docs/issues/1170
This is annoying when you are writing Advanced Functions that load .NET Framework libraries where the API is completely different between .NET Framework and .NET Core, such as how the AccessControl API works.
@jzabroski You can specify the edition, however, to separate that:
#requires -PSEdition Desktop
# versus
#requires -PSEdition Core
Just a quick note that @rjmholt has officially started a discussion about how to implement and manage breaking changes: #13129.
Plus, #6817 and #10967 are more behaviors worth revisiting once breaking changes are allowed.
(They have the same root cause, explained in https://github.com/PowerShell/PowerShell/issues/10967#issuecomment-561843650).
The fact that , is stronger than + is logical, as PowerShell is more about lists than about numbers. IMHO.
@rjmholt has officially started a discussion
I should say it's no more official than any other discussion
I would love to have strict checking on imports and function signatures at editing time.
Are you considering introducing new import semantics such as EcmaScript modules provide?
No more global namespace pollution or slow import mechanics.