PowerShell: New Feature: Commands for string methods & operators on the pipeline

Created on 21 Apr 2018 · 16 Comments · Source: PowerShell/PowerShell

Introduction

One thing that has always bugged me in PowerShell is that string operators cannot be used on the pipeline. A simple example:

(2,3,1,4 | Sort-Object) -join "."

Not convenient, especially for interactive console use, where you often need to go back and add the parentheses. The default string operators obviously aren't cmdlets and won't work on the pipeline; in exchange, they are fast.

In one of my modules (PSUtil) I have functions that emulate this functionality:

2,3,1,4 | sort | join "."

However, I think this would make sense as part of the default command set, as virtually every user of PowerShell has to handle strings.

Weighing the scales

Advantages

  • More convenient to use for end user on the pipeline
  • Additional suitable features can be added through parameters (See implementation proposal below)

Disadvantages

  • Clutters the list of commands available by default with more commands.

On Implementation

I am willing to implement all commands involved, their tests and documentation, all by myself, if this feature is deemed worthwhile.

Proposed commands & features

Add-String (Alias: 'wrap')

Does not have an operator equivalent. Command that allows you to easily add to a string on the pipeline.

Example

PS> 1..4 | wrap '"' '"'

"1"
"2"
"3"
"4"

Parameters

  • InputString (string[] | Pipeline), the string(s) being added to
  • Before (string), the string to add before the input
  • Behind (string), the string to add behind the input
  • PadLeft (char), character to pad left with
  • PadRight (char), character to pad right with
  • PadWidth (int), the total width, in characters, the string should be padded to
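
For comparison, the wrapping and padding described above can already be done with existing language features, just not as a single pipeline command. The following is a minimal sketch that assumes nothing beyond the -f operator and the .NET String.PadLeft method; the variable names are only illustrative and not part of the proposal.

```powershell
# Wrap each pipeline item in double quotes using the format operator
# (roughly what "wrap" with -Before and -Behind would cover).
$wrapped = 1..4 | ForEach-Object { '"{0}"' -f $_ }

# Pad each item to a width of 3 with a chosen character
# (roughly what -PadLeft together with -PadWidth would cover).
$padded = 1..4 | ForEach-Object { "$_".PadLeft(3, '0') }

$wrapped   # "1", "2", "3", "4" (one per line)
$padded    # 001, 002, 003, 004 (one per line)
```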

Format-String (Alias: 'format')

Equivalent to -f. Command that allows you to use the format operator on the pipeline.

Example

PS> 1..4 | format "{0:N2} - {1:D3}" -Count 2

1,00 - 002
3,00 - 004

Parameters

  • InputObject (object | Pipeline), objects to be formatted
  • Format (string), the format definition
  • Count (int), the number of items to store up before formatting them in bulk (optional, default 1)
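
To illustrate how the -Count batching could work, here is a rough sketch of a pipeline function. Format-StringSketch is a hypothetical name used only for this illustration, not the proposed implementation, and how leftover items should be handled is an assumption, since the proposal does not say.

```powershell
# Hypothetical stand-in for the proposed Format-String; illustration only.
function Format-StringSketch {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, Position = 0)]
        [string]$Format,

        [Parameter(ValueFromPipeline)]
        [object]$InputObject,

        [int]$Count = 1
    )
    begin {
        $buffer = New-Object System.Collections.Generic.List[object]
    }
    process {
        $buffer.Add($InputObject)
        if ($buffer.Count -ge $Count) {
            # The buffered items become {0}, {1}, ... for the -f operator.
            $Format -f $buffer.ToArray()
            $buffer.Clear()
        }
    }
    # Assumption: items left over when the input is not a multiple of -Count
    # are simply dropped in this sketch.
}

$result = 1..4 | Format-StringSketch '{0:D2} - {1:D3}' -Count 2
$result   # 01 - 002 and 03 - 004
```

The sketch uses the D format specifier because numeric formats such as N2 depend on the current culture, as the "1,00" in the example above suggests.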

Get-Substring

Allows trimming and picking a substring from specified strings.

Example

PS> "abc def ghi" | substring -trim "abi"

c def gh

PS> "abc def ghi" | substring 2 4

c de

Parameters

  • InputString (string[] | Pipeline), the strings to pick from
  • Trim (string), what characters to trim
  • TrimStart (string), what characters to trim at the start of the string
  • TrimEnd (string), what characters to trim at the end of the string
  • Start (int), the start index to pick the substring from
  • Length (int), how long a substring to pick
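
The two examples above map onto the existing .NET string methods. A minimal sketch of the non-pipeline equivalents, with illustrative variable names:

```powershell
$text = 'abc def ghi'

# Trim any of the characters a, b, i from both ends (what -Trim "abi" describes).
$trimmed = $text.Trim([char[]]'abi')

# Pick 4 characters starting at index 2 (what "substring 2 4" describes).
$picked = $text.Substring(2, 4)

$trimmed   # c def gh
$picked    # c de
```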

Join-String (Alias: 'join')

Equivalent to -join. Command that allows you to join items on the pipeline

Example

PS> 1..4 | join ","

1,2,3,4

PS> 1..4 | join "," -Count 2

1,2
3,4

Parameters

  • InputString (string[] | Pipeline), the strings to join
  • Separator (string), what string to join them with. Defaults to ([System.Environment]::NewLine)
  • Count (int), the number of items to join together. (Defaults to all items)
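
For reference, this is roughly what the two examples look like with the -join operator alone. It is only a sketch; the batching loop is one possible way to emulate -Count, not the proposed implementation.

```powershell
# Join everything: today this needs the parentheses the proposal wants to avoid.
$all = (1..4) -join ','

# Join in batches of two, emulating the proposed -Count 2.
$count = 2
$items = 1..4
$batches = for ($i = 0; $i -lt $items.Count; $i += $count) {
    $last = [Math]::Min($i + $count, $items.Count) - 1
    $items[$i..$last] -join ','
}

$all       # 1,2,3,4
$batches   # 1,2 and 3,4
```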

Set-String (Alias: 'replace')

Equivalent to the -replace operator _and_ the .Replace() string method

Example

PS> "abc def ghi" | replace "d\w+" "zzz"

abc zzz ghi

PS> "abc def ghi" | replace "d\w+" { 1..4 | join "." }

abc 1.2.3.4 ghi

PS> "abc (def) ghi" | replace "(def)" "def" -DoNotUseRegex

abc def ghi

Parameters

  • InputString (string[] | Pipeline), the strings to replace within
  • OldValue (string), what sequence to replace
  • NewValue (object), what to replace with. Can be a string or a scriptblock
  • DoNotUseRegex (switch), switches from regex replace to using the string method Replace()
  • Options (RegexOptions), the regex options to use on replace, defaults to IgnoreCase
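
The three examples above correspond to calls that already exist today. A minimal sketch with illustrative variable names; note that [regex]::Replace is case-sensitive by default, unlike -replace, which makes no difference for this input.

```powershell
$text = 'abc def ghi'

# Regex replace with a plain string (the -replace operator).
$simple = $text -replace 'd\w+', 'zzz'

# Regex replace with a script block: [regex]::Replace accepts a MatchEvaluator,
# and PowerShell converts the script block to that delegate.
$computed = [regex]::Replace($text, 'd\w+', { (1..4) -join '.' })

# Literal replace via the string method (what -DoNotUseRegex maps to).
$literal = 'abc (def) ghi'.Replace('(def)', 'def')

$simple     # abc zzz ghi
$computed   # abc 1.2.3.4 ghi
$literal    # abc def ghi
```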

Split-String (Alias: 'split')

Equivalent to the -split operator _and_ the .Split() string method

Example

PS> "abc def ghi" | split " d\w+ "

abc
ghi

PS> "abc def ghi" | split " d\w+ " -DoNotUseRegex

abc

ef
ghi

Parameters

  • InputString (string[] | Pipeline), the strings to split
  • Separator (object), what to split with.
  • DoNotUseRegex (switch), switches from regex split to using the string method Split()
  • Options (RegexOptions), the regex options to use on split, defaults to IgnoreCase
  • Count (int), the maximum number of items to split into (equivalent to -split ".",2)
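
The difference between the two modes can be reproduced with the existing operator and method. In this sketch the separator is cast to [char[]] so that .Split() treats it as a set of characters, which is what produces the empty entry in the second example above; newer .NET versions also offer an overload taking a string separator, so the cast makes the intent explicit.

```powershell
$text = 'abc def ghi'

# Regex split: ' def ' matches as a whole.
$byRegex = $text -split ' d\w+ '

# Method split on the individual characters ' ', 'd', '\', 'w', '+'.
$byChars = $text.Split([char[]]' d\w+ ')

# Limiting the number of results, as the proposed -Count would.
$limited = 'a.b.c' -split '\.', 2

$byRegex   # abc, ghi
$byChars   # abc, (empty string), ef, ghi
$limited   # a, b.c
```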

Concluding

I believe these commands would improve the interactive console experience of most users without significant drawbacks, and I am perfectly willing to do the implementation myself, pending the approval of the court of public opinion :)

Opinions / comments / refinements, anybody?

2018-04-27 - Updated parameter names, replaced Remove-String (trim) with Get-Substring

Area-Cmdlets-Utility Issue-Enhancement

All 16 comments

I already implemented my own join-item, so I definitely saw the use case for it!

We could combine them all into one cmdlet:
Convert-String -Join
Convert-String -Split
Convert-String -Replace

Technically possible, but would make it a lot more of a pain to maintain if convenience of use with aliases mimicking the operators is to be maintained (and convenience is one of the core benefits here).

Furthermore, it would be less useful from a discovery perspective:

Get-Command *-String

This would list all of them per action. With Convert-String this would pretty much require previous knowledge (in which case discovery is really a moot point). Add in that it would violate the definition of the convert verb and I'd argue against fusing them into a single command, under that verb or any other.

Cheers,
Fred

Link to definition, in case somebody wants to look it up:
https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.verbsdata.convert?redirectedfrom=MSDN&view=powershellsdk-1.1.0

We have had several discussions about aliases. Aliases depend greatly on personal preference and create conflicts. Most likely we will delete them all (#5870) and let users add the ones they want. So I wouldn't consider aliases here at all.

The discovery perspective is nothing. A user's discovery process is "How can I do this?", not "What?".
With Convert-String you are free to create any aliases or helper functions in your profile.

Re Convert: we have another definition in PowerShell Core, see Get-Verb convert | ft -Wrap. In any case, the PowerShell Committee should draw the conclusion.

In any case, the PowerShell Committee should draw the conclusion.

Oh absolutely - I prefer having a discussion on it first, in order to inspire that conclusion, but there's no question about who has the final voice on it :)

Re Convert: we have another definition in PowerShell Core, see Get-Verb convert | ft -Wrap.

Thanks for pointing that out - hadn't noticed the description is built in there 👍. Still, this wouldn't be a change of representation of the same data - it would be a change to that data, so I still claim the verb would not fit.

The discovery perspective is nothing. A user's discovery process is "How can I do this?", not "What?".

Having spent copious hours digging for "What?", having discussed it with other users at conferences and user groups, and having spent quite a bit of time teaching, my experience is that both happen. So I really disagree with a blanket dismissal of one side of the process.

We have had several discussions about aliases. Aliases depend greatly on personal preference and create conflicts. Most likely we will delete them all (#5870) and let users add the ones they want. So I wouldn't consider aliases here at all.

Hm, can't say I fully agree with that position, but the solution @jaykul proposed makes sense (shipping them as/within automatically included modules, rather than core). In that context I totally _would_ consider aliases, with the mid-term goal of having them as part of one of those modules (Whether they are created right away or introduced as part of that implementation). Given the design goal of convenience, I believe aliases are certainly a justified part of the discussion/deliberations ( _especially_ in cases where the functionality directly maps a string operator).

@iSazonov Cmdlets are much more discoverable (see get-help *string*) than operators and parameter sets (though perhaps that's just because of the way we present them.) I don't see convert(to) being the right verb since it's usually convertto a format and joining a string isn't a format per se. Finally aliases are an important part of the shell experience so simply getting rid of them seems undesirable.

@FriedrichWeinmann I really like this proposal. I've always thought that we missed the boat by not providing core data manipulation commands. I started similar commands back in v1 but they didn't make it into the product. Over the years, we've periodically revisited them (see Join-Object) but again, they didn't get into the product. It would be nice to actually do it this time :-)

@powercode Would you be willing to contribute your Join-Item (though perhaps renamed Join-Object) ? I've seen a number of requests for this type of functionality on StackOverflow.

Other notes:

Split-String should probably have a -Count parameter like the -Split operator.
What about addressing .Substring() scenarios? Maybe have -start and -count parameters on split-string?

Heya Bruce, thank you for the encouragement :)
Also thank you for that -Count parameter that I missed when assembling the set 👍

Regarding SubString():
I'm not ... quite comfortable hitching that to Split-String, since that verb kind of implies turning one object into multiple objects. I'd be more comfortable with Remove-String, but there -Start is already taken, with a clearly named precedent on the trim side.
How about placing it there and adding a second parameter set for substring, with the following names:
-SubStringStart (int, alias: -ss), -SubStringLength (int, alias: -sl)

The thing about Remove is that it's a perspective i.e. what you want to remove instead of what you want to keep:

Remove-String -First 3  -Last 10

This certainly works for the trim scenario but for the positive perspective, how about Get-Substring? Different noun but still discoverable when looking for strings.

Get-Substring -Start 4 -Length 5

Of course this could also work with -Trim, -TrimStart, -TrimEnd. Or even -Remove taking a string, regex, or scriptblock.

Get-Substring -Remove {$_ -eq ' '}

I like the Get-Substring idea!
Also agree with rolling in the trim commands (basically removing Remove-String)

Don't think the -Remove parameter would make it though, considering what you can do with replace (and how it's already being used for this)
Would probably be also somewhat counter-intuitive for those used to the string method.

@BrucePay

Cmdlets are much more discoverable (see get-help *string*) than operators and parameter sets (though perhaps that's just because of the way we present them.)

We should improve this if we see bad UX. Seems we have some related Issues.

I'd prefer to have one power cmdlet for string manipulations in pipes than many. For Windows PowerShell I see hundreds of modules on my system and thousands of cmdlets. Finding the right cmdlets becomes a serious problem and adding new cmdlet bundles does not make the situation easier.

I'd prefer to have one power cmdlet for string manipulations in pipes than many. For Windows PowerShell I see hundreds of modules on my system and thousands of cmdlets. Finding the right cmdlets becomes a serious problem and adding new cmdlet bundles does not make the situation easier.

I agree with the drawback of command inflation being a consideration.

Still, I disagree with rolling them into a single, big command. Frankly, I had hoped the times of netsh style syntax trees were over. Admittedly, this wouldn't be _that_ bad, but the usability would ... suck. To the point of defeating most of the purpose and value I see in this update.
Given the simplicity of the naming, I doubt discoverability of other commands will be affected though.

We should stay away from Join-Object in this context. These are string manipulations which is a very different use case. For Join-Object we want to combine 2 Objects together based on matching properties from each Object. That logic won't use any string functions like -join. A separate issue and RFC should be used for that discussion.

I would absolutely love to see PowerShell given more string manipulation features.

@iSazonov

I'd prefer to have one power cmdlet for string manipulations in pipes than many

Hiyo!

I might be misinterpreting things, but it seems like the idea of a variety of narrowly scoped, descriptive verb-noun commands stems from the Monad Manifesto? From what I can tell, it's one of the reasons many find the language so approachable. I'll throw my vote in for independent functions/cmdlets any day.

Also - @FriedrichWeinmann awesome idea. Every time I find myself doing this at the CLI or in a script it feels obtuse and painful. Hope to see these or some of these built in : )

Cheers!

Alright, I've got a first version up and running:
Up for review in this PR: https://github.com/PowerShell/PowerShell/pull/6753

Notes on changes from original design:

  • Renamed a few parameters to be more in line with current naming practices
  • Renamed all instances of the -Simple parameter to -DoNotUseRegex to be more self-explanatory about what they do.

Filed a proper RFC now, after being asked to on the PR linked above:
https://github.com/PowerShell/PowerShell-RFC/pull/127
