PowerShell: Sort is incorrect for strings containing the '-' character

Created on 25 Mar 2017 · 9 Comments · Source: PowerShell/PowerShell

From UserVoice https://windowsserver.uservoice.com/forums/301869-powershell/suggestions/18580849-bug-sort-is-incorrect-for-strings-containing-the

Steps to reproduce

"somefile1","somefile2","s-abc","s-little","s-foo","s-poo","s-wtf" | sort

Expected behavior

s-abc
s-foo
s-little
s-poo
s-wtf
somefile1
somefile2

Actual behavior

s-abc
s-foo
s-little
somefile1
somefile2
s-poo
s-wtf

Environment data

> $PSVersionTable
Name                           Value
----                           -----
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
PSVersion                      6.0.0-alpha
PSEdition                      Core
BuildVersion                   3.0.0.0
SerializationVersion           1.1.0.1
PSRemotingProtocolVersion      2.3
CLRVersion
WSManStackVersion              3.0
GitCommitId                    v6.0.0-alpha.17

Labels: Area-Cmdlets-Utility, Issue-Enhancement

All 9 comments

This looks like a .NET issue (tested on Windows PowerShell and PowerShell Core):

PS C:\WINDOWS\system32> using namespace System.Collections.Generic
PS C:\WINDOWS\system32> $a = New-Object List[string]
PS C:\WINDOWS\system32> "somefile1","somefile2","s-abc","s-little","s-foo","s-poo","s-wtf" | % {$a.Add($_)}
PS C:\WINDOWS\system32> $a
somefile1
somefile2
s-abc
s-little
s-foo
s-poo
s-wtf
PS C:\WINDOWS\system32> $a.Sort()
PS C:\WINDOWS\system32> $a
s-abc
s-foo
s-little
somefile1
somefile2
s-poo
s-wtf

So it is really a Windows issue: Windows sorts file names in the same order when displaying them.

PS C:\WINDOWS\system32> [string]::Compare("som","s-l")
1
PS C:\WINDOWS\system32> [string]::Compare("som","s-m")
1
PS C:\WINDOWS\system32> [string]::Compare("som","s-p")
-1
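
The sign flip between "s-m" and "s-p" is what one would expect if the culture-sensitive comparer gives the hyphen very little weight, so that "som" is effectively compared against "sl", "sm" and "sp". As a rough sketch for contrast, an ordinal comparison looks only at code unit values and places "som" after all three. Only the ordinal results are annotated here, because culture-sensitive results can vary with platform and .NET version.

```powershell
# Ordinal comparison uses raw UTF-16 code unit values.
# '-' (U+002D) is lower than 'o' (U+006F), so "som" sorts after every "s-..." string.
$ordinal = foreach ($other in "s-l", "s-m", "s-p") {
    [Math]::Sign([string]::CompareOrdinal("som", $other))
}
$ordinal   # 1, 1, 1

# Culture-sensitive comparison, as in the transcript above.
# No expected output is given: it depends on the platform and .NET version.
$cultural = foreach ($other in "s-l", "s-m", "s-p") {
    [string]::Compare("som", $other)
}
$cultural
```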

From MSDN String.Compare Method:

Character sets include ignorable characters. The Compare(String, String, Boolean) method does not consider such characters when it performs a culture-sensitive comparison. For example, if the following code is run on the .NET Framework 4 or later, a culture-sensitive, case-insensitive comparison of "animal" with "Ani-mal" (using a soft hyphen, or U+00AD) indicates that the two strings are equivalent.
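
The MSDN example can be tried from PowerShell roughly as follows. This is a minimal sketch; the variable names are made up for illustration. The culture-sensitive call is left without an expected output, since the quoted passage only vouches for .NET Framework 4 or later.

```powershell
# Build "Ani<soft hyphen>mal"; U+00AD is one of the ignorable characters.
$withSoftHyphen = "Ani" + [char]0x00AD + "mal"

# Culture-sensitive, case-insensitive comparison.
# Per the quoted documentation this reports equality (0) on .NET Framework 4 or later.
$cultural = [string]::Compare("animal", $withSoftHyphen, $true)
$cultural

# Ordinal, case-insensitive comparison treats U+00AD as an ordinary code unit,
# so the strings differ (they do not even have the same length).
$ordinalEqual = [string]::Equals("animal", $withSoftHyphen, [System.StringComparison]::OrdinalIgnoreCase)
$ordinalEqual   # False
```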

Unicode Default_Ignorable_Code_Point

Best Practices for Using Strings in the .NET Framework

Based on this we should re-label the problem as internal.

@joeyaiello @stephentoub The Issue is internal. It seems we need PowerShell-Committee review.

Agree that this is not a bug, but a design choice.
However it is confusing behaviour to many who don't expect this, and is not the desirable behaviour in a number of use cases.

Amending this default behaviour would be a breaking change.
However, adding a parameter to allow users to define the sort behaviour, or adding some field to the property parameter's hash table would resolve this limitation without negatively affecting existing behaviour, and would help people realise that the current behaviour is the designed behaviour.

Proposal

The Property parameter accepts a collection of hash tables, where the hash table accepts keys Expression, Ascending and Descending.
Adding another key, SortOrderComparer, which takes a value of type IComparer, would allow custom sort behaviour to be specified for each property. Thus, to get the behaviour most people would expect, they could do something like this:

[string[]]$list = @("somefile1","somefile2","s-abc","s-little","s-foo","s-poo","s-wtf") 
$list | sort -Property @{Expression={$_}; SortOrderComparer=[System.StringComparer]::Ordinal}
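
SortOrderComparer is only a proposed key at this point. Until something like it exists, the same ordering can be obtained by calling .NET directly. The following is a sketch of one possible workaround, not part of the proposal: it uses List[string].Sort with StringComparer.Ordinal and assumes a PowerShell version that supports the ::new() syntax (5.0 or later).

```powershell
# Collect the strings into a generic list so that List<T>.Sort(IComparer<T>) is available.
$list = [System.Collections.Generic.List[string]]::new()
"somefile1","somefile2","s-abc","s-little","s-foo","s-poo","s-wtf" | ForEach-Object { $list.Add($_) }

# Ordinal comparison orders by code unit value, so '-' sorts before any letter.
$list.Sort([System.StringComparer]::Ordinal)
$list
# s-abc
# s-foo
# s-little
# s-poo
# s-wtf
# somefile1
# somefile2
```

Note that ordinal comparison is case-sensitive and places all uppercase ASCII letters before lowercase ones; StringComparer.OrdinalIgnoreCase may be closer to what Sort-Object users expect.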

I agree that we could resolve the issue by adding a new parameter that sets comparer options. It seems we should be more general than StringComparer.

Great idea, @JohnLBevan.

I suggest also supporting Comparison<T> delegates as the SortOrderComparer value (polymorphically), so you can pass script blocks directly (e.g., { param([string]$x, [string]$y) <# return -1, 0, or 1 #> }), and perhaps shortening SortOrderComparer to Comparer.

With a Comparer key present, Expression should be optional and default to the whole input object.
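
To illustrate the script-block idea with what already works today: PowerShell can convert a script block to a Comparison[string] delegate, which List[string].Sort accepts. This is a sketch only; it does not use the proposed Comparer key, and $byOrdinal is a name chosen for this example.

```powershell
$list = [System.Collections.Generic.List[string]]::new()
"somefile1","somefile2","s-abc","s-little","s-foo","s-poo","s-wtf" | ForEach-Object { $list.Add($_) }

# The script block becomes a Comparison<string> delegate.
# It must return a negative number, zero, or a positive number.
$byOrdinal = [System.Comparison[string]] {
    param([string]$x, [string]$y)
    [string]::CompareOrdinal($x, $y)
}

# List<T>.Sort(Comparison<T>) calls the script block for each pair it compares.
$list.Sort($byOrdinal)
$list
```

A script-block comparer is flexible but is invoked once per comparison, so it is likely to be noticeably slower than a built-in comparer on large inputs.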
