PS /home/user> '11111x22222y33333'.Split('xy')
11111x22222y33333
PS C:\Users\user> '11111x22222y33333'.Split('xy')
11111
22222
33333
The same issue occurs on both pwsh 6.2.4 and 7.0.0-rc.2.
It works as expected on Windows PowerShell 5.1.
PS /home/user> '11111x22222y33333' -Split '[xy]'
11111
22222
33333
PS /home/user> $PSVersionTable
Name Value
---- -----
PSVersion 7.0.0-rc.2
PSEdition Core
GitCommitId 7.0.0-rc.2
OS Linux 4.15.0-76-generic #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020
Platform Unix
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0
PS /Users/joelfrancis> "".Split
OverloadDefinitions
-------------------
string[] Split(char separator, System.StringSplitOptions options)
string[] Split(char separator, int count, System.StringSplitOptions options)
string[] Split(Params char[] separator)
string[] Split(char[] separator, int count)
string[] Split(char[] separator, System.StringSplitOptions options)
string[] Split(char[] separator, int count, System.StringSplitOptions options)
string[] Split(string separator, System.StringSplitOptions options)
string[] Split(string separator, int count, System.StringSplitOptions options)
string[] Split(string[] separator, System.StringSplitOptions options)
string[] Split(string[] separator, int count, System.StringSplitOptions options)
Looking at the overload definitions above, there is an overload for string[] as the separator, which will be preferred over the char[] overloads. The string overloads treat the input string as a single unit, whereas the char overloads would split on each character as you expect.
You can also force the correct overload with $string.Split([char[]]'xy') or $string.Split('xy'.ToCharArray())
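The difference between the two bindings can be sketched as follows. This is only an illustration, assuming a PowerShell version running on a .NET runtime that offers a string-separator overload; the variable names are made up for the example.

```powershell
$text = '11111x22222y33333'

# Casting to [char[]] matches the char[] overloads exactly, so each
# character in 'xy' is treated as its own separator.
$byChars = $text.Split([char[]] 'xy')

# ToCharArray() produces the same char[] argument without a cast.
$byCharArray = $text.Split('xy'.ToCharArray())

# An uncast string may bind to a string-separator overload where one
# exists; 'xy' is then searched for as a single two-character unit.
$byString = $text.Split('xy')

$byChars.Count      # 3 on both Windows PowerShell 5.1 and PowerShell 7
$byCharArray.Count  # 3 on both
$byString.Count     # 3 on 5.1, 1 on PowerShell 7 (per the output reported above)
```

Only the uncast call changes behaviour between versions, which is why the cast is the safer habit for scripts that must run on both.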
OK, I learned something new. But are you saying this change from 5.1 is expected? Or is this a bug?
On 5.1 the overload def looks like this:
PS C:\Users\user> "".Split
OverloadDefinitions
-------------------
string[] Split(Params char[] separator)
string[] Split(char[] separator, int count)
string[] Split(char[] separator, System.StringSplitOptions options)
string[] Split(char[] separator, int count, System.StringSplitOptions options)
string[] Split(string[] separator, System.StringSplitOptions options)
string[] Split(string[] separator, int count, System.StringSplitOptions options)
That I'm less sure of. There definitely are different overload definitions available in .NET Core now, but I'm not really sure which the method binder would be preferring over another, or for what reason...
Looking at the documentation for the method, I'd guess that this overload is being chosen instead:
https://docs.microsoft.com/en-us/dotnet/api/system.string.split?view=netcore-3.1#System_String_Split_System_String_System_StringSplitOptions_
This is a new overload since the WinPS days, so I guess you could say it's somewhat by design, but ultimately I think the change in behaviour is not entirely intentional from the PS team. It's more that the method binder chooses the best available method that matches the arguments at runtime, and in recent versions of .NET Core, there is a more appropriate method overload available (one using string, rather than char[], so it's a closer match).
There used to be an issue where @lzybkr provided a helpful explanation, but it appears to be gone; the gist of it was:
A new overload added to a .NET method may cause the PowerShell engine to select it in situations where it previously selected a _different_ overload, due to the new overload now being a _better_ fit, which is what happened here:
During overload resolution, the engine _can_ map a [string] input to a [char[]] method parameter, but if a [string] parameter is (now) also present, it is chosen first.
This is an _unavoidable_ consequence of PowerShell being a _late-bound_ language, and it is why you should generally prefer PowerShell-native solutions to .NET method calls (-split vs. .Split(), for instance).
The - cumbersome and possibly non-obvious - alternative is to match the method signature _precisely_, using _casts_, which is the only way to guarantee long-term stability:
# OK even in PS Core, due to exact type match
'11111x22222y33333'.Split([char[]] 'xy')
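One caveat when switching to the PowerShell-native -split operator: it treats its separator as a regular expression by default. The following is a rough sketch of the usual ways to handle that; the sample strings are made up for illustration.

```powershell
# -split interprets its right-hand side as a regular expression by default.
$byClass = '11111x22222y33333' -split '[xy]'   # character class: x or y

# Escape separators that contain regex metacharacters to match them literally.
$byEscaped = '1.2.3' -split [regex]::Escape('.')

# Alternatively, the SimpleMatch option turns off regex interpretation
# (0 means no limit on the number of substrings returned).
$bySimple = '1.2.3' -split '.', 0, 'SimpleMatch'

$byClass.Count    # 3
$byEscaped.Count  # 3
$bySimple.Count   # 3
```

Because -split is implemented by PowerShell itself, its behaviour does not depend on which overloads a given .NET runtime happens to expose.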
@SteveL-MSFT
Please Reopen! This is not resolved yet.
Actual code used for the repro:
foreach ($item in "1x2y3".Split("{'x', 'y'}"))
{ Write-Host($item); };
In Windows PowerShell, the following works as expected:
PS C:\Users\max_t> $PSVersionTable
Name Value
---- -----
PSVersion 5.1.19041.1
PSEdition Desktop
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...}
BuildVersion 10.0.19041.1
CLRVersion 4.0.30319.42000
WSManStackVersion 3.0
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
PS C:\Users\max_t> foreach ($item in "1x2y3".Split("{'x', 'y'}"))
>> { Write-Host($item); };
1
2
3
PS C:\Users\max_t>
In PowerShell 7.0.3, it does not produce the expected result:
PS C:\Users\max_t> $PSVersionTable
Name Value
---- -----
PSVersion 7.0.3
PSEdition Core
GitCommitId 7.0.3
OS Microsoft Windows 10.0.19041
Platform Win32NT
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0
PS C:\Users\max_t>
PS C:\Users\max_t> foreach ($item in "1x2y3".Split("{'x', 'y'}"))
>> { Write-Host($item); };
1x2y3
PS C:\Users\max_t>
In PowerShell Preview 7.1.0-rc.1, it does not produce the expected result either:
PS C:\Users\max_t> $PSVersionTable
Name Value
---- -----
PSVersion 7.1.0-rc.1
PSEdition Core
GitCommitId 7.1.0-rc.1
OS Microsoft Windows 10.0.19041
Platform Win32NT
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0
PS C:\Users\max_t>
PS C:\Users\max_t> foreach ($item in "1x2y3".Split("{'x', 'y'}"))
>> { Write-Host($item); };
1x2y3
PS C:\Users\max_t>
See #13726. This was not a regression in PowerShell; it's the result of an enhancement in .NET.
As is mentioned in that issue, if you need to force a certain overload, you can use a type cast.
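Applied to the repro above, a type cast might look like the sketch below. It assumes the intent was to split on the characters x and y; the $separators variable is introduced here only for illustration.

```powershell
# Build the separator list as an explicit char[] so that a char[]
# overload is selected regardless of what other overloads exist.
$separators = [char[]] ('x', 'y')

foreach ($item in '1x2y3'.Split($separators)) {
    Write-Host $item
}
```

With the argument already typed as char[], the binder has an exact match and no string-to-char[] conversion is needed.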
@vexx32
Thanks! I just wanted to make sure.