PowerShell: Make the stop-parsing symbol (--%) work on Unix

Created on 8 May 2017  ·  16 Comments  ·  Source: PowerShell/PowerShell

Related: #3734

For the stop-parsing symbol, --%, to work sensibly on Unix:

  • [preferably] _either_: /bin/sh, the default shell, must be invoked, passing it a command line as a single string via -c.
    As a - in my view beneficial - side effect, sh would perform _globbing_ on unquoted tokens.

  • _or_: PowerShell must still first interpret the tokens of the pass-through command line to strip syntactical quoting, in addition to expanding environment/shell variable references.
    That may still run afoul of users' expectation that _globbing_ should be performed.

Note: On _Windows_ PowerShell can get away with only expanding %<name>%-style environment-variable references and then invoking the target utility _without shell involvement_, but that's not an option on Unix, where the shell is expected to perform tokenization up front and pass an _array_ of _literal_ arguments instead (with syntactical elements such as quoting characters removed).
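As a rough illustration of the first option, here is a minimal POSIX shell sketch of what delegating to /bin/sh -c would amount to. The variable name rest and its value are assumptions made up for the example; this is not how PowerShell is implemented.

```shell
# Illustrative value: the raw text after --%, exactly as the user typed it.
rest="'hi, mom'"

# Option 1: hand the target plus the raw remainder to sh as ONE string.
# sh tokenizes it and removes the syntactical quotes.
out=$(/bin/sh -c "/bin/echo $rest")
printf '%s\n' "$out"        # prints: hi, mom

# For contrast: passing the raw text through as a literal argument
# keeps the quote characters.
literal=$(/bin/echo "$rest")
printf '%s\n' "$literal"    # prints: 'hi, mom'
```

The design point is that sh, not the target utility, interprets the quoting, so the utility only ever sees literal arguments.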

To illustrate the problem with the current behavior (run the commands on a _Unix_ platform):

> /bin/echo --% 'hi, mom'
'hi, mom'   # The enclosing single quotes were treated as *literals*

Trying to pass an awk command that from sh looks like this: awk -F\" 'BEGIN { print "hi, mom" }'

> awk --% -F\" 'BEGIN { print "hi, mom" }'
/usr/bin/awk: syntax error at source line 1
 context is
     >>> ' <<< 
/usr/bin/awk: bailing out at source line 1

The array of arguments getting passed to awk breaks down as follows ($<n> represents the nth positional argument):

$1=[-F"]
$2=['BEGIN]
$3=[{]
$4=[print]
$5=[hi, mom]
$6=[}']

As you can see, the intended argument boundaries weren't respected, and the ' quoting characters were retained.
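For comparison, a POSIX-like shell would hand awk only two arguments for the same command line. A minimal sketch, where show_args is a hypothetical helper (not part of any tool) that prints each argument it receives in brackets:

```shell
# Hypothetical helper: print every received argument in [brackets], one per line.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# The same tokens as in the awk command above. The shell treats \" and '...'
# as syntax, removes them, and keeps the program text as ONE argument.
show_args -F\" 'BEGIN { print "hi, mom" }'
# prints:
# [-F"]
# [BEGIN { print "hi, mom" }]
```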

In fact, there's currently no way to make the above command work with --%, because no one is interpreting the quoting and removing the syntactical quoting characters.

Currently, only the following truly Byzantine invocation - _not_ involving --% - can make the above awk command work:

/bin/sh -c "awk -F\\\`" 'BEGIN { print \`"hi, mom\`" }'"

Environment data

PowerShell Core v6.0.0-alpha (v6.0.0-alpha.18) on macOS 10.12.4
PowerShell Core v6.0.0-alpha (v6.0.0-alpha.18) on Ubuntu 16.04.1 LTS
Committee-Reviewed Resolution-Won't Fix WG-Interactive-Console

Most helpful comment

@PowerShell/powershell-committee reviewed this. Due to inherent differences between Windows and Unix in argument processing, and because --% was originally designed for Windows, we don't want different behavior for this switch on Windows and on Unix. We are looking at making changes to improve native command arg processing on both Windows and Unix so that --% isn't required (where possible).

All 16 comments

--% is very specifically a Windows thing designed as a (rather poor) workaround for the inconsistencies in command line processing on Windows. I'm not sure it has or should have a role on non-Windows systems. It would be better if PowerShell quoting "just worked" on these systems.

@BrucePay:

That's a commendable goal, but I fear it is unrealistic:

While automatically applying globbing on Unix platforms when invoking external utilities is helpful, I don't think you can ever make PowerShell-parsed command lines "just work" as POSIX-like shell command lines.

Take the following example, which relies on (POSIX-compliant) shell parameter expansion:

$ /bin/echo "${HOME##*/}"  # remove the path prefix from $HOME
jdoe

If you try this from PowerShell:

> /bin/echo "${HOME##*/}" 
     # !! no output - PS interpreted *everything* inside {...} as an *identifier*

On a secondary note:

I presume that the % in --% is a nod to the cmd.exe syntax of environment-variable references (%<name>%), so, I suppose, the POSIX-shell equivalent would be --$, but I'm not sure that platform distinction is worth introducing.

To provide some more problematic examples:


Backtick-based command substitutions:

While $(...) is the preferred, modern syntax for command substitutions, `...`, the legacy syntax using backticks, is still widely used (and _not_ deprecated).

$ /bin/echo "`ls -d /`"
/

For obvious reasons, this command breaks PowerShell's parsing.


Word splitting in POSIX shells means that output from command substitutions is blindly split into tokens by whitespace:

$ printf '%s@' $(printf 'a b\nc')
a@b@c

PowerShell preserves the partitioning into arguments based on line breaks:

> printf '%s@' $(printf 'a b\nc')
a b@c  # !! 'a b' was preserved as a *single* argument
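The difference can be made visible from the POSIX side by counting arguments. A minimal sketch, with count_args as a hypothetical helper; the newline-only IFS variant merely approximates PowerShell's line-based partitioning and is not a description of its implementation:

```shell
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the default IFS splits on spaces, tabs and newlines.
count_args $(printf 'a b\nc')       # prints 3

# Quoted: the substitution stays a single argument.
count_args "$(printf 'a b\nc')"     # prints 1

# Splitting on newlines only, which approximates the line-based
# partitioning shown in the PowerShell output above.
old_ifs=$IFS
IFS='
'
count_args $(printf 'a b\nc')       # prints 2
IFS=$old_ifs
```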

$(( ... )) is POSIX-compliant arithmetic expansion, in which variables needn't - and generally shouldn't - be $-prefixed:

$ v=1; /bin/echo $(( v + 1 ))
2

The missing $ before v breaks the command in PowerShell (or, worse, does something else).


The following frequently used _nonstandard_ (not POSIX-compliant) bash features are less of a concern, because users cannot reasonably assume that they work without calling bash explicitly, given that it is customary to call /bin/sh behind the scenes, which can be presumed to implement POSIX features only:

Thus, with the need to pass a _single_ string to the bash executable, that string can be protected from PowerShell's interpretation with '...' (assuming that the PS quoting problems are solved):

Brace expansion:

> bash -c 'echo A{1..3}'
A1 A2 A3

Process substitution (<(...))

> bash -c 'cat -n <(ls -d /)'
1 /

@mklement0 you are right, those things are unlikely to work soon, but if I understood @BrucePay correctly, he only said that "quoting" (e.g. https://github.com/PowerShell/PowerShell/issues/3734) should just work as expected.

The examples you mention are about emulating POSIX-shell or bash-specific behavior. Although I would really like PowerShell to be able to emulate bash (and dash and cmd.exe), and even thought about writing an RFC myself, I don't believe such functionality will ever be built into PowerShell.

The stop-parsing-symbol (--%) is not any kind of cmd.exe-emulation:
AFAIK it just stops the parser from doing anything and passes all content of the line to the executable.
As this also stops interpreting variables, one wouldn't be able to pass any content dynamically, so they added the Windows-typical environment-variable replacement (%variable%).
This environment-variable replacement is, however, not specific to cmd.exe -- it's also used, for example, in the Win+R dialog and for REG_EXPAND_SZ registry keys. There's an API function that does this replacement.

As mentioned, --% does not emulate cmd.exe: Replacement operations, like %envVar:old=new% don't work, the cmd escape character ^ can't escape anything and no cmd operators like & or such are supported.

The only reason why --% acts similarly to cmd is that cmd does very little parsing and basically just passes the whole command line to the executable. When --% is used, PowerShell also doesn't do any parsing, so it behaves similarly to cmd.

One more reason not to change --% on *nix:
It is AFAIK the only option to safely pass quotes (") to child processes, and it currently works on Windows and Linux. I think we should not break this.

@TSlivede:

I don't believe such functionality will ever be built into powershell.

And I think that's a good thing: PowerShell should _not_ try to become Bash (which has a lot of warts not worth contracting).

My point was: Just like --% provides an escape route for users stuck in the cmd.exe world, there should be one for Bash (POSIX-like shell) users - whether that symbol will be "localized" to --$ or not is a secondary issue.

The stop-parsing-symbol is not any kind of cmd.exe-emulation:

Yes, it is. The very point of --% was to allow people to execute command lines they were used to executing from cmd.exe / batch files - even if that emulation is _incomplete_, as you demonstrate.

Your parameter-expansion example aside (to put it in POSIX terms), the emulation was always limited to a _single_ command, so involving control operators such as & or | was never meant to be supported.

Yes, it is. The very point of --% was to allow people to execute command lines they were used to executing from cmd.exe / batch files

In that case, it was not implemented very well in my opinion... (advanced variable substitution, escape symbol (^), I can't even get output redirection (>) to work, but maybe that's my fault...)

However I still don't think --% should be changed -- please don't break the (until now) only reliable way to pass quotes.

I also don't think we need a special bash emulation (or bash call or cmd emulation) syntax. What's wrong with bash -c '...'?
If the problem is the necessary quoting/escaping, I think this problem can be split into https://github.com/PowerShell/PowerShell/issues/1995 and maybe the need for some easier (maybe single-line?) here-string syntax...

However I still don't think --% should be changed -- please don't break the (until now) only reliable way to pass quotes.

My suggestion was to implement --% _analogously_ on _Unix_ platforms, not to change the way it works on _Windows_.

What's wrong with bash -c '...'?

Nothing per se, but it does add another circle to Quoting Hell - which, I presume, people who need --% in the first place are likely to end up in.

Here-_documents_ _do_ work:

bash -c @'
echo a\'b"`date`"
'@

but, again, not easy on beginners.

As for a _single-line_ solution: I think something like --% (or --$) is probably as simple as it gets.

But for beginners even discovering --% is a challenge... - see #1761.

My suggestion was to implement --% analogously on Unix platforms, not to change the way it works on Windows.

But also on Unix --% is currently the only reliable and portable way to pass " to a child command -- this will hopefully change soon.
If --% is changed, old scripts, for example ones ported from Windows (or scripts that need to run on multiple versions of PowerShell), will break; therefore I'd strongly prefer a different symbol than --%.

Quoting Hell

Yes, that's a problem, which is why I suggested the single-line here-string (for example @@' ... '@@ or similar; it must be well documented, maybe mentioned in the documentation next to --%) -- a three-line here-doc just looks a little bit strange, although I don't know if it's worth adding a new syntax...

But also on Unix --% is currently the only reliable and portable way to pass " to a child command

No, unfortunately, it is _broken_ on Unix, as my first post tried to demonstrate:

> /bin/echo --% 'hi, mom'
'hi, mom'

As you can see, the single quotes were _preserved_, which is not what you get when you execute the same command (sans --%) in a POSIX-like shell:

$ /bin/echo 'hi, mom'
hi, mom

Passing parameters (arguments) works very differently on Unix, where it is the _shell_ that provides the argument parsing, not the _target_ command.

I haven't read through all the comments on the related issues and your RFC in full yet, but it's clear to me that there is no need to _reconstruct_ a _command-line string_ on _Unix_ after PowerShell has done its own parsing - instead, pass the PowerShell-parsed/expanded arguments as-is to the underlying system function (e.g., execv()), as an array of _literal_ tokens.

It is only the anarchy that is _Windows_ command-line handling that requires the - brittle - reconstruction of a _single-string command line_ behind the scenes.

In POSIX-like shells on Unix, it is the _shell_ that provides said preprocessing and then passes an _array_ of _as-is_ arguments - after having performed quote removal to remove the _shell-relevant-only_ _syntactical_ quotes.

I haven't considered all potential backward-compatibility issues, but if there weren't any, that's undoubtedly the way to go.
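To make the array-versus-string distinction concrete, here is a minimal POSIX shell sketch. show_args is a hypothetical helper, and the naive whitespace re-split only stands in for "some parser on the receiving end"; it does not model any particular runtime.

```shell
# Hypothetical helper: print every received argument in [brackets], one per line.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Two literal arguments containing spaces and double quotes.
set -- 'A "B' 'A" B'

# Passed on as an array: the boundaries survive untouched.
show_args "$@"
# prints:
# [A "B]
# [A" B]

# Joined into one string and naively re-split on whitespace:
# the original boundaries are lost.
joined="$*"
show_args $joined
# prints:
# [A]
# ["B]
# [A"]
# [B]
```

Any scheme that goes through a single string has to escape on one side and parse on the other by exactly matching rules; passing the array as-is avoids the round trip.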

I wouldn't consider --% broken: it behaves the same as on Windows, and since normal invocation is broken (as you correctly noticed in https://github.com/PowerShell/PowerShell/issues/3734), some garbage like

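function Run-Native($command) {
    # Quote each argument following the usual Windows convention:
    # double any backslashes that precede a " and escape the " itself,
    # double any trailing backslashes, then wrap the argument in "...".
    $env:commandlineargumentstring=($args | %{'"'+ ($_ -replace '(\\*)"','$1$1\"' -replace '(\\*)$','$1$1') + '"'}) -join ' ';
    # After --%, only the %commandlineargumentstring% reference is expanded.
    & $command --% %commandlineargumentstring%
}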

Run-Native .\echoargs.exe 'A "B' 'A" B'

is the only reliable way that works in all PowerShell versions on Linux and Windows (only v3 and above).

Passing parameters (arguments) works very differently on Unix, where it is the shell that provides the argument passing, not the target command.
I haven't read through all the comments on the related issues and your RFC in full yet, but it's clear to me that there is no need to reconstruct a command-line string on Unix after PowerShell has done its own parsing - instead, pass the PowerShell-parsed/expanded arguments as-is to the underlying system function (e.g., execv()), as an array of literal tokens.
It is only the anarchy that is Windows command-line handling that requires the - brittle - reconstruction of a single-string command line behind the scenes.
In POSIX-like shells on Unix, it is the shell that provides said preprocessing and then passes an array of as-is arguments - after having performed quote removal to remove the shell-relevant-only syntactical quotes.

I completely agree, although I don't see a big problem with the construction of one argument string -- but it's essential that the command line is split into the same strings it was originally constructed from. The current behavior is fatal. OK, skipping the construction of the single string and the subsequent splitting would improve performance, but that's the only reason I see. Skipping this joining and splitting would require different code branches for Linux and Windows and require P/Invoke on Linux AFAIK.

One more reason why --% isn't broken: https://github.com/PowerShell/PowerShell-Docs/blob/staging/reference/6/About/about_Parsing.md states that

PowerShell treats the remaining characters in the line as a literal. The only interpretation it performs is to substitute values for environment variables that use standard Windows notation, such as %USERPROFILE%

It does not state how that literal line is split into multiple tokens on Unix -- and on Windows it doesn't need to split the literal, as the child executable gets it as one string.

Back to the topic -- Although I think --% should keep working, I thought again about something like --$,
that also stops parsing, but then passes the remaining line as argument to bash -c ....

Such a syntax would help bash users when they start using PowerShell, thereby helping to acquire new users, and may therefore actually be worth the new syntax addition.
(Although I absolutely don't like the syntax of --% and think it should never have been added the way it is.)

There is no disagreement except that I said --% is broken on _Unix_ - you may choose to call this _inapplicable_ instead, which is fine (I think that was @BrucePay's point) - it was, after all, devised for _Windows_.

And adapting it to work _analogously_ on _Unix_ was the whole point of creating this issue.

If the answer is: _don't use this on Unix_, that's fine - if so, it deserves _documenting_ and possibly a _warning_ at parsing / runtime.

I don't see a big problem with the construction of one argument string

I _do_ see a big problem with that: _Don't assume how the target program will interpret its arguments_. The fact that different programs do so differently is one of the existing pain points on Windows - see this SO answer of mine.

different code branches for linux and windows and require p/invoke on linux AFAIK.

That's perfectly appropriate, _if_ feasible (I don't know enough to comment).

you may choose to call this inapplicable

If the answer is: don't use this on Unix, that's fine - if so, it deserves documenting and possibly a warning at parsing / runtime.

I completely agree with both.

I do see a big problem with that: Don't assume how the target program will interpret its arguments.

Yes, that's the pain on Windows. On Unix, however, the command line is always split by the .NET Core framework -- if the command line is built to be compatible with the Unix .NET Core version of Process.Start / ProcessStartInfo.Arguments, the arguments will always reach the child executable correctly.

BTW - nice answer on SO
