PowerShell: Bash-style string manipulation

Created on 8 May 2019 · 3 Comments · Source: PowerShell/PowerShell

Bash has a nice in-string DSL for substitutions and manipulations such as:

> str=abcABC123ABCabc

# Strip out shortest match between 'a' and 'C'.
> echo ${str#a*C}
123ABCabc

These are detailed here: https://www.tldp.org/LDP/abs/html/string-manipulation.html
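For reference, here is a minimal Bash sketch of the four prefix/suffix-stripping forms, using the same sample string as above. The result variable names (front_short and so on) are illustrative only and do not come from the linked guide:

```bash
str=abcABC123ABCabc

# '#' and '##' delete the shortest / longest match of the pattern from the front.
front_short=${str#a*C}    # 123ABCabc
front_long=${str##a*C}    # abc

# '%' and '%%' delete the shortest / longest match of the pattern from the back.
back_short=${str%b*c}     # abcABC123ABCa
back_long=${str%%b*c}     # a

printf '%s\n' "$front_short" "$front_long" "$back_short" "$back_long"
```

Note that the patterns are shell wildcard (glob) patterns, not regular expressions, and that matching is case-sensitive.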

Some of these might not apply to PowerShell because of methods already on [string] or because they're not useful enough to justify what's probably a breaking change. But worth considering.

Issue-Enhancement WG-Language

Most helpful comment

Thanks for the clarification, @rjmholt, but note that _parameter_ is simply the name that POSIX-like shells give to _both_ arguments and variables - the former are _positional_ parameters (e.g., $1), the latter are _named_ parameters (e.g., $var), and parameter expansion works equally with both.

In other words: your example _is_ an instance of _parameter expansion_, as are the features in the document you link to.

To help with fleshing out what features PowerShell may benefit from, here's the list (from http://mywiki.wooledge.org/BashGuide/Parameters#Parameter_Expansion):

Syntax | Description
-- | --
${parameter:-word} | Use Default Value. If 'parameter' is unset or null, 'word' (which may be an expansion) is substituted. Otherwise, the value of 'parameter' is substituted.
${parameter:=word} | Assign Default Value. If 'parameter' is unset or null, 'word' (which may be an expansion) is assigned to 'parameter'. The value of 'parameter' is then substituted.
${parameter:+word} | Use Alternate Value. If 'parameter' is null or unset, nothing is substituted, otherwise 'word' (which may be an expansion) is substituted.
${parameter:offset:length} | Substring Expansion. Expands to up to 'length' characters of 'parameter' starting at the character specified by 'offset' (0-indexed). If ':length' is omitted, go all the way to the end. If 'offset' is negative (use parentheses!), count backward from the end of 'parameter' instead of forward from the beginning. If 'parameter' is @ or an indexed array name subscripted by @ or *, the result is 'length' positional parameters or members of the array, respectively, starting from 'offset'.
${#parameter} | The length in characters of the value of 'parameter' is substituted. If 'parameter' is an array name subscripted by @ or *, return the number of elements.
${parameter#pattern} | The 'pattern' is matched against the beginning of 'parameter'. The result is the expanded value of 'parameter' with the shortest match deleted. If 'parameter' is an array name subscripted by @ or *, this will be done on each element. Same for all following items.
${parameter##pattern} | As above, but the longest match is deleted.
${parameter%pattern} | The 'pattern' is matched against the end of 'parameter'. The result is the expanded value of 'parameter' with the shortest match deleted.
${parameter%%pattern} | As above, but the longest match is deleted.
${parameter/pat/string} | Results in the expanded value of 'parameter' with the first (unanchored) match of 'pat' replaced by 'string'. Assume null string when the '/string' part is absent.
${parameter//pat/string} | As above, but every match of 'pat' is replaced.
${parameter/#pat/string} | As above, but matched against the beginning. Useful for adding a common prefix with a null pattern: "${array[@]/#/prefix}".
${parameter/%pat/string} | As above, but matched against the end. Useful for adding a common suffix with a null pattern.
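
To make the table more concrete, here is a small Bash sketch of the default-value, length and replacement forms. The variable names and sample values (name, path and so on) are made up for illustration:

```bash
unset name
greeting="hello ${name:-world}"   # name is unset, so 'world' is substituted; name stays unset
: "${name:=bash}"                 # name is still unset, so 'bash' is assigned to it
alt=${name:+set}                  # name is now non-empty, so 'set' is substituted
len=${#name}                      # length of 'bash'

path=a.b.c
first=${path/./-}       # only the first '.' is replaced
all=${path//./-}        # every '.' is replaced
prefixed=${path/#a/X}   # 'a' is replaced only if it matches at the beginning
suffixed=${path/%c/X}   # 'c' is replaced only if it matches at the end

printf '%s\n' "$greeting" "$name" "$alt" "$len" "$first" "$all" "$prefixed" "$suffixed"
```

The replacement forms are Bash extensions rather than POSIX features, so this sketch assumes it is run with Bash itself.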

A few thoughts:

  • I think we can ignore ${#parameter}, given that $var.Length can be used on string variables (and arrays, though it's better to use .Count for collections).

  • The ${parameter:-word}, ${parameter:=word} and ${parameter:+word} features could conceivably be covered by ternary conditionals and null-conditional operators - see #3239 and #3240

  • ${parameter//pat/string} is covered by -replace, but variant ${parameter/pat/string}, which replaces only the _first_ occurrence, has no counterpart.

    • While -replace offers more flexibility through use of regexes, _literal_ use is currently cumbersome ([regex]::Escape()), so we could think about either a new operator or a syntax for verbatim regexes / wildcard patterns; related discussion: #9308
  • Arguably, -replace also has the various prefix (#) and suffix (%) stripping/substitution features covered.

  • That leaves us with ${parameter:offset:length} for positional _substring_ extraction, which would be nice to have.

    • For _arrays_, the existing [...] syntax is already pretty flexible, though there's room for improvement - see #7940 and #7928
    • With the above indexing improvements, we could consider borrowing the array indexing syntax along the lines of:
$var = 'abcde'
# Wishful thinking
${var[1..]} # -> 'bcde'; equivalent of: ${var:1}
${var[1:2]} # -> 'bc'; equivalent of: ${var:1:2}
${var[-2..]} # -> 'de'; equivalent of: ${var: -2}
${var[1..@-1]} # -> 'bcd'; equivalent of: ${var:1:-1}
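
For comparison, the Bash expansions named in the comments above can be sketched as follows. This only illustrates existing Bash behavior, not the proposed PowerShell syntax, and the result variable names are illustrative:

```bash
var=abcde

from1=${var:1}       # from offset 1 to the end
two=${var:1:2}       # 2 characters starting at offset 1
last2=${var: -2}     # last 2 characters; the space keeps ':-' from being read as "use default"
middle=${var:1:-1}   # from offset 1 up to, but excluding, the last character (Bash 4.2 or later)

printf '%s\n' "$from1" "$two" "$last2" "$middle"
```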

All 3 comments

The perhaps not so obvious name of this feature is _shell parameter expansion_.
While the feature is POSIX-mandated, Bash implements several extensions: http://mywiki.wooledge.org/BashGuide/Parameters#Parameter_Expansion has a concise overview, and the manual page is at https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion

Generally, while parameter expansion is definitely convenient and concise, on the flip side the syntax is cryptic and hard to remember.

At least technically, it is definitely a breaking change, because # is a legal character in PowerShell variable names; while it is perhaps unlikely that such names are used in the wild, the likelihood increases with the several other characters used in the various parameter-expansion features, namely -, =, :, ?, +, !, @, /.

Yes, I deliberately avoided conflating this with parameter expansion, since despite the overlap in bash, there's no good reason for PowerShell to relate them, and PowerShell arguably handles parameters much better.

My intent here is to discuss whether bash's string manipulation functionality is useful enough to consider implementing in PowerShell (be it as an in-string DSL, in string formatting, or as script methods).

I actually opened the issue after @JamesWTruher sent me a very old list of feature ideas for Monad. This and &&/|| are the only things left on it that aren't in PowerShell.
