Vscode: Align explorer sorting with platform sorting

Created on 31 May 2017 · 29 Comments · Source: microsoft/vscode

It looks like our file sorting in the explorer does not match platform behaviour in some cases.

Windows:

A file foo.ts is sorted before foo_test.ts, but we sort it the other way around.

Linux:

A file foo.ts is sorted before foo_test.ts, but we sort it the other way around.
A lowercase file seems to be sorted before an uppercase file, but we mix them regardless of case (e.g. the folders [out, outb, outd, Outa, Outc] show up as [out, Outa, outb, Outc, outd]).

macOS:

seems to be OK

We use a JavaScript Collator (Intl.Collator) for the comparison here.

Unfortunately, I have not been able to tweak the Collator options to produce the desired result.
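For reference, here is a minimal sketch of the kind of Intl.Collator setup being discussed. The options (`numeric: true`, `sensitivity: "base"`) are an assumption based on later comments in this thread that mention numeric collation and a hard-coded `base` sensitivity. They are not copied from VS Code's source, and `compareNames` is a hypothetical helper.

```typescript
// Assumed options: digit runs compare as numbers, and 'base' sensitivity
// treats "a", "A" and "á" as equal.
const fileNameCollator = new Intl.Collator(undefined, {
  numeric: true,
  sensitivity: "base",
});

function compareNames(a: string, b: string): number {
  return fileNameCollator.compare(a, b);
}

// Digit runs compare numerically, so "file2" sorts before "file10".
const numericOrder = ["file10", "file2"].sort(compareNames);

// With 'base' sensitivity, names that differ only in case compare as equal,
// so case alone never decides their relative order.
const caseOnly = compareNames("outa", "Outa");

// Punctuation follows the collation tables, not code points. In the CLDR root
// collation "_" is typically ordered before ".", which is one way
// "foo_test.ts" can end up ahead of "foo.ts".
const underscoreVsDot = compareNames("foo_test.ts", "foo.ts");

console.log(numericOrder, caseOnly, underscoreVsDot);
```

This shows why tweaking options alone may not help. `sensitivity` and `numeric` change how letters and digits compare. They do not change where the collation tables place punctuation such as `_` and `.`.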

feature-request file-explorer


bpasero

👍22

All 29 comments

I want sort order like GitHub repo. Please!

comerc on 13 Aug 2017

👍12 ❤1

I have some YML files whose names are based on GUIDs. They aren't even close to being alpha sorted.

vscode 1.15.1, macOS Sierra 10.12.6

NickWest-appuri on 29 Aug 2017

👍5

Linux: a lowercase file seems to be sorted before an upper case file

Umm... I think it is the other way around, upper case first. Basically, just sorting using the ASCII value of each character.

dlech on 8 Jan 2018

@dlech On Linux, at least, it depends on the chosen locale. The environment variable LC_COLLATE can be used to influence this behaviour (it affects, for example, the ls command).

If you consider implementing/supporting platform-specific behaviour, you may want to evaluate the locale setting on Linux, specifically LC_COLLATE (if set; otherwise fall back to the configured locale).

user@host -- ~/tmp/casetest $ LC_COLLATE='en_GB.UTF-8' ls -1
2
5
a
A
Aa
aB
AZ
C
user@host -- ~/tmp/casetest $ LC_COLLATE='en_EN.UTF-8' ls -1
2
5
A
AZ
Aa
C
a
aB
user@host -- ~/tmp/casetest $ LC_COLLATE='C' ls -1
2
5
A
AZ
Aa
C
a
aB
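Here is a sketch of how the locale-aware approach suggested above might look. `resolveCollateLocale` and `makeComparator` are hypothetical helpers. They follow the usual POSIX precedence, where LC_ALL overrides LC_COLLATE, which overrides LANG, and they treat "C"/"POSIX" as plain code-unit order.

The `en_EN.UTF-8` run printing the same order as `C` is most likely because `en_EN` is not a defined locale, so `ls` falls back to C.

```typescript
// Hypothetical: pick the collation locale the way POSIX tools do.
// Returns null for "C"/"POSIX" (or nothing set), meaning code-unit order.
function resolveCollateLocale(
  env: Record<string, string | undefined>,
): string | null {
  const raw = env.LC_ALL || env.LC_COLLATE || env.LANG || "";
  // Drop encoding and modifier: "en_GB.UTF-8@euro" -> "en_GB".
  const name = raw.split(/[.@]/)[0];
  if (name === "" || name === "C" || name === "POSIX") {
    return null;
  }
  // POSIX writes "en_GB"; Intl expects a BCP 47 tag such as "en-GB".
  return name.replace(/_/g, "-");
}

// Hypothetical: turn the resolved locale into a comparator.
function makeComparator(
  locale: string | null,
): (a: string, b: string) => number {
  if (locale === null) {
    // C locale: compare UTF-16 code units (same as byte order for ASCII).
    return (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  }
  return new Intl.Collator(locale).compare;
}

const sample = ["aB", "C", "5", "Aa", "a", "AZ", "2", "A"];

const cOrder = [...sample].sort(
  makeComparator(resolveCollateLocale({ LC_COLLATE: "C" })),
);
const gbOrder = [...sample].sort(
  makeComparator(resolveCollateLocale({ LC_COLLATE: "en_GB.UTF-8" })),
);

console.log(cOrder.join(" "));
console.log(gbOrder.join(" "));
```

For this sample, the code-unit order reproduces the `C` listing above. ICU's `en-GB` collation happens to agree with the glibc `en_GB` listing. In general, glibc and ICU collation tables are not guaranteed to match.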

wildcart on 10 May 2018

👍1

It's called shortlex, or lexicographic sort order. The only thing you need to tweak is the string length: if A is shorter than B, then A is smaller than B. This is not going to be covered by any collation. An alternative is to introduce padding (padding the names so that shorter ones compare as less), but I don't think it's reasonable to do that due to the extra garbage generated.

To be more specific:

For two strings a and b of unequal length, take the first Math.min(a.length, b.length) characters of both strings and compare those prefixes using whatever comparison you like. If they are equal (i.e. the comparison returns 0), use the string length to finalize the sort order: if a and b share a common prefix but a is shorter, then a is smaller.

@bpasero something like this:

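function compareFileNames(one: string | null, other: string | null): number {
  const a = one || "";
  const b = other || "";

  // Compare only the prefix both names share in length.
  const minLen = Math.min(a.length, b.length);

  // intlFileNameCollator is VS Code's lazily initialized collator wrapper.
  const result = intlFileNameCollator
    .getValue()
    .collator.compare(a.substring(0, minLen), b.substring(0, minLen));

  if (result === 0) {
    // Equal prefixes: the shorter name sorts first.
    if (a.length < b.length) {
      return -1;
    }
    if (b.length < a.length) {
      return +1;
    }
    return 0;
  }

  return result;
}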

I dropped the collatorIsNumeric stuff because it just adds confusion.
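To try the comparator above outside VS Code, here is a self-contained sketch. It swaps in a plain Intl.Collator with `base` sensitivity (an assumption) for VS Code's `intlFileNameCollator`. `comparePrefixThenLength` is a hypothetical name.

```typescript
// Assumption: a plain 'base'-sensitivity collator stands in for
// VS Code's internal intlFileNameCollator.
const collator = new Intl.Collator(undefined, { sensitivity: "base" });

// Compare the common-length prefix first; if it ties, the shorter name wins.
function comparePrefixThenLength(a: string, b: string): number {
  const minLen = Math.min(a.length, b.length);
  const result = collator.compare(a.substring(0, minLen), b.substring(0, minLen));
  return result !== 0 ? result : a.length - b.length;
}

console.log(comparePrefixThenLength("event", "event_registry")); // negative
console.log(comparePrefixThenLength("Foo", "foo")); // 0 under 'base'
```

Note that the length tie-break only fires when the prefixes compare equal. For aggregate.go versus aggregate_registry.go, the compared prefixes are "aggregate.go" and "aggregate_re". These already differ at "." versus "_", so the collator's punctuation order decides the result, not the length.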

leidegre on 21 Jan 2019

Shortlex orders primarily by length; the code presented implements a different ordering. The example:

aggregate.go
aggregate_registry.go
event.go
event_registry.go
README.md
rehydration.go

in shortlex order would be:

event.go
README.md
aggregate.go
rehydration.go
event_registry.go
aggregate_registry.go
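For contrast, here is a strict shortlex comparator that reproduces the order above. It is sketched with a plain code-unit tie-break for equal lengths (an assumption; none of these six names share a length, so any tie-break gives the same result).

```typescript
// Shortlex: order by length first; compare characters (here by UTF-16
// code unit) only when two names have the same length.
function compareShortlex(a: string, b: string): number {
  if (a.length !== b.length) {
    return a.length - b.length;
  }
  return a < b ? -1 : a > b ? 1 : 0;
}

const files = [
  "aggregate.go",
  "aggregate_registry.go",
  "event.go",
  "event_registry.go",
  "README.md",
  "rehydration.go",
];

const shortlexOrder = [...files].sort(compareShortlex);
console.log(shortlexOrder);
```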

Windows Explorer uses natural sort order, because length alone is not sufficient for a good order. For example:

action_5_example.txt
action_10_ex.txt
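Here is a minimal natural-sort comparator for this example. It assumes text runs compare by UTF-16 code unit and digit runs compare by numeric value. It is a sketch of the general technique, not Windows Explorer's actual algorithm. The standard `Intl.Collator` option `numeric: true` gives a similar digit-aware ordering.

```typescript
// Split into alternating text/digit runs. Because the regex has a capture
// group, even indices are text (possibly "") and odd indices are digits.
function splitRuns(name: string): string[] {
  return name.split(/(\d+)/);
}

function compareText(x: string, y: string): number {
  return x < y ? -1 : x > y ? 1 : 0;
}

// Compare digit runs by value without Number(), so long runs cannot overflow.
function compareDigits(x: string, y: string): number {
  const xs = x.replace(/^0+/, "");
  const ys = y.replace(/^0+/, "");
  if (xs.length !== ys.length) {
    return xs.length - ys.length; // more significant digits = larger value
  }
  if (xs !== ys) {
    return xs < ys ? -1 : 1; // same length: digit order equals numeric order
  }
  return x.length - y.length; // equal value: fewer leading zeros first
}

function compareNatural(a: string, b: string): number {
  const ra = splitRuns(a);
  const rb = splitRuns(b);
  const n = Math.min(ra.length, rb.length);
  for (let i = 0; i < n; i++) {
    const c = i % 2 === 1 ? compareDigits(ra[i], rb[i]) : compareText(ra[i], rb[i]);
    if (c !== 0) {
      return c;
    }
  }
  return ra.length - rb.length;
}

const naturalOrder = ["action_10_ex.txt", "action_5_example.txt"].sort(compareNatural);
console.log(naturalOrder);
```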

egonelbre on 21 Jan 2019

@egonelbre Then I misunderstood the meaning of shortlex; I should have just said lexicographic. But the code does what it is supposed to do: first compare up to X characters, then use the length as a discriminator.

leidegre on 22 Jan 2019

Why complicate this, though? Lexicographic order (not shortlex, as I incorrectly called it at first) is easy to implement and understand. This is not going to get done if we insist on extra work to align with something that is highly specific to Windows Explorer. That should not be the high-water mark here.

leidegre on 22 Jan 2019

It's not really Explorer specific, you can read more about it in https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/. I mentioned it because your other issue showed that as an example.

The reason you don't want lexicographic order is the last example:

action_5_example.txt
action_10_ex.txt

Sorted lexicographically is:

action_10_ex.txt
action_5_example.txt

egonelbre on 22 Jan 2019

@egonelbre as a programmer, I don't care. But as a user of Windows Explorer, I could see why adhering to the natural sort order would seem more natural.

leidegre on 24 Jan 2019

I tried requesting this in https://github.com/microsoft/vscode/issues/75415, but it was closed. There is currently no option to make the explorer use the native sort order.

And no one writes 5, 10; we write 05, 10. This is a fundamental fact.

fenchu on 14 Jun 2019

👍4

VS Code sorts like this:

log-10

I'd rather it sort like this:

log-2

It is very jarring for me when VS Code does not sort lexicographically.

Almost every other tool I use sorts lexicographically. When VS Code tries to be different, it just confuses me for a short moment. Repeatedly. And it adds up.

Like @fenchu said, if I wanted to sort by numeric values, I'd zero-pad those numbers to the desired length.
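Here is a small illustration of the zero-padding point. Once numbers are padded to the same width, plain lexicographic order and numeric-aware order agree. Without padding, they disagree. The sample names are illustrative. The numeric order uses the standard `numeric` option of Intl.Collator.

```typescript
// Plain lexicographic order: compare UTF-16 code units, which is what
// Array.prototype.sort does for strings by default.
const lexicographic = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

// Numeric-aware order via Intl.Collator's standard "numeric" option.
const numeric = new Intl.Collator(undefined, { numeric: true }).compare;

const unpadded = ["log-2", "log-10", "log-1"];
const padded = ["log-02", "log-10", "log-01"];

const unpaddedLex = [...unpadded].sort(lexicographic); // log-1, log-10, log-2
const unpaddedNum = [...unpadded].sort(numeric); // log-1, log-2, log-10
const paddedLex = [...padded].sort(lexicographic);
const paddedNum = [...padded].sort(numeric);

console.log(unpaddedLex, unpaddedNum, paddedLex, paddedNum);
```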

AnyhowStep on 1 Mar 2020

👍3

@bpasero I'm hoping you can advise and save me some time if this doesn't make sense or is unlikely to be accepted as a pull request.

I'm considering creating a pull request that would add a new setting - explorer.sortCaseSensitive with a default value of false.

I considered adding one or more options to the existing explorer.sortOrder setting, but case sensitivity seems to be orthogonal to those options - and (nearly) doubling the number of options to add case sensitive versions doesn't seem like the best idea:

'explorer.sortOrder': {
  'type': 'string',
  'enum': [SortOrder.Default, SortOrder.Mixed, SortOrder.FilesFirst, SortOrder.Type, SortOrder.Modified],
  'default': SortOrder.Default,
  'enumDescriptions': [
    nls.localize('sortOrder.default', 'Files and folders are sorted by their names, in alphabetical order. Folders are displayed before files.'),
    nls.localize('sortOrder.mixed', 'Files and folders are sorted by their names, in alphabetical order. Files are interwoven with folders.'),
    nls.localize('sortOrder.filesFirst', 'Files and folders are sorted by their names, in alphabetical order. Files are displayed before folders.'),
    nls.localize('sortOrder.type', 'Files and folders are sorted by their extensions, in alphabetical order. Folders are displayed before files.'),
    nls.localize('sortOrder.modified', 'Files and folders are sorted by last modified date, in descending order. Folders are displayed before files.')
  ],
  'description': nls.localize('sortOrder', "Controls sorting order of files and folders in the explorer.")
},
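The orthogonality argument can be sketched as composition. Grouping (folders before files, as `SortOrder.Default` describes) and name comparison are independent, so a case-sensitivity choice only swaps the name comparator. The `Entry` shape, `foldersFirst`, and both name comparators are hypothetical, not VS Code's code.

```typescript
// Hypothetical entry shape; the real explorer model is not shown here.
interface Entry {
  name: string;
  isDirectory: boolean;
}

type NameComparator = (a: string, b: string) => number;

// Folders before files, then defer to whichever name comparator is given.
function foldersFirst(compareName: NameComparator): (a: Entry, b: Entry) => number {
  return (a, b) => {
    if (a.isDirectory !== b.isDirectory) {
      return a.isDirectory ? -1 : 1;
    }
    return compareName(a.name, b.name);
  };
}

// Two interchangeable name comparators: case-insensitive vs code-unit order.
const caseInsensitive: NameComparator = new Intl.Collator(undefined, { sensitivity: "base" }).compare;
const codeUnit: NameComparator = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

const entries: Entry[] = [
  { name: "readme.md", isDirectory: false },
  { name: "Src", isDirectory: true },
  { name: "LICENSE", isDirectory: false },
  { name: "docs", isDirectory: true },
];

const insensitiveOrder = [...entries].sort(foldersFirst(caseInsensitive)).map((e) => e.name);
const codeUnitOrder = [...entries].sort(foldersFirst(codeUnit)).map((e) => e.name);
console.log(insensitiveOrder, codeUnitOrder);
```

With this shape, each sortOrder value needs one grouping function. Each case option needs one name comparator, instead of one enum value per combination.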

This change would only partially address this open issue, since it:

only addresses case sensitivity
doesn't default to align with platform case sensitivity

I think it would probably satisfy a lot of people though, and TBH I'm not sure that aligning with platform case sensitivity is the best option. For example, I noticed a number of comments on this and related issues were asking to align the file sort order with github, which always does a case sensitive sort.

Anyway, by providing a setting, people can choose, and by defaulting to the current behavior nobody will be affected by the change unless they want to be.

Also, if the future default behavior is changed, this setting will still be useful for people who want to override that default behavior.

What do you think? Should I go ahead?

If yes, is this the right issue to reference in the PR or should I create a separate issue that just links to this one?

Thanks in advance!

leilapearson on 14 Apr 2020

👍1

@leilapearson thanks for the offer.
Ideally we would just align explorer sorting with platform sorting without any option. @bpasero already provided a code pointer to where he does the comparison.

If that is not possible only then we can look into adding more settings.

An alternative is to open this up to extensions; extensions could then control this and satisfy the 20 different sorting styles that users want.

isidorn on 15 Apr 2020

@isidorn thanks for the reply. That's why I asked before doing anything other than taking a look at the code.

Opening this to extensions is an interesting option. At the same time, I would still think that offering control over whether the sort is case sensitive or not should be a core option and not require an extension.

It isn't easy on some platforms to adjust the sort order - and having to figure out how to get your whole platform to sort case sensitive in order for VS Code to sort case sensitive seems a bit awkward? Especially if you primarily develop on one platform and only spend a bit of time developing on other platforms.

Also, I find that programming on a platform is a different context than using a platform for office work. Having different sort orders apply to the different contexts often makes sense.

For example, I tend to sort things by most recently modified when I'm working on documents and the like - so this is my default in file explorer and google docs. On the other hand, I don't want my code sorted by modification date and I'm happy with how things are sorted in my terminal - but unfortunately not so happy with how they are sorted in VS Code.

I do agree that too many settings can be a bad thing, but I'm curious if you agree or not that a setting to control case sensitivity would make sense regardless?

leilapearson on 15 Apr 2020

P.S. An example of how hard it can be to change the sort (collate) order on a platform is OSX - which doesn't expose any nice way to do that it seems:

https://apple.stackexchange.com/questions/34054/case-insensitive-ls-sorting-in-mac-osx

leilapearson on 15 Apr 2020

Ok, makes sense. I would be open to a lean and nice PR that controls if sorting is case sensitive or not.
Thanks

isidorn on 15 Apr 2020

Thanks @isidorn. Before I go ahead, a bit of extra context and one more question...

There are actually a total of 4 options for defining what "alphabetically" means. Per the ECMA-402 standard:

The sensitivity of collator is interpreted as follows:

base: Only strings that differ in base letters compare as unequal. Examples: a ≠ b, a = á, a = A.

accent: Only strings that differ in base letters or accents and other diacritic marks compare as unequal. Examples: a ≠ b, a ≠ á, a = A.

case: Only strings that differ in base letters or case compare as unequal. Examples: a ≠ b, a = á, a ≠ A.

variant: Strings that differ in base letters, accents and other diacritic marks, or case compare as unequal. Other differences may also be taken into consideration. Examples: a ≠ b, a ≠ á, a ≠ A.

NOTE In some languages, certain letters with diacritic marks are considered base letters. For example, in Swedish, "ö" is a base letter that's different from "o".
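The four sensitivity levels quoted above can be checked directly with Intl.Collator. This sketch reproduces the spec's examples. The `relation` helper is illustrative.

```typescript
type Sensitivity = "base" | "accent" | "case" | "variant";

const levels: Sensitivity[] = ["base", "accent", "case", "variant"];

// Render a comparison the way the ECMA-402 text does: "=" or "≠".
function relation(sensitivity: Sensitivity, x: string, y: string): "=" | "≠" {
  const collator = new Intl.Collator("en", { sensitivity });
  return collator.compare(x, y) === 0 ? "=" : "≠";
}

for (const s of levels) {
  console.log(`${s}: a ${relation(s, "a", "b")} b, a ${relation(s, "a", "á")} á, a ${relation(s, "a", "A")} A`);
}
```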

Instead of just exposing case sensitivity as a true or false option, I'm thinking it would be best to allow any of the 4 options. base would be the default value since that's what's hardcoded into the current code.

Any objection to offering all 4 options?

leilapearson on 19 Apr 2020

@leilapearson this makes sense. However to simplify this a bit I suggest the following:

we introduce an explorer.sortCaseSensitive setting with string values "on" and "off"
if users request the other options, we can add them later (since we use a string setting, this should not be a problem)

The reason why I prefer this solution is simplicity and I think it covers the 99% use case.
Let me know what you think.

isidorn on 20 Apr 2020

Sounds good. Thanks @isidorn.

leilapearson on 20 Apr 2020

Well @isidorn, that was a bit trickier than expected, but I think I have something that is almost ready to submit. I'll take one last look tomorrow to make sure I haven't missed anything.

Unfortunately the solution I was originally picturing didn't work. It turns out that grouping by case is a very different thing than comparing by case! :-)
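Here is a sketch of the difference, using the folder names from the original report.

"Comparing by case" (the collator's standard `caseFirst` option) only breaks ties between names that are otherwise equal. It therefore cannot pull every uppercase name to the front. "Grouping by case" needs an explicit group key. `startsUpper` and `groupUpperFirst` are hypothetical, and grouping on the first character is an assumption, not necessarily the rule used in the eventual PR.

```typescript
const names = ["out", "outb", "outd", "Outa", "Outc"];

// Comparing by case: caseFirst only matters when names tie at the letter level.
const upperFirst = new Intl.Collator("en", { caseFirst: "upper" });
const compared = [...names].sort(upperFirst.compare);

// Grouping by case: does the name start with an uppercase letter?
function startsUpper(s: string): boolean {
  const first = s.charAt(0);
  return first !== first.toLowerCase();
}

// The group decides first; the collator only orders names within a group.
function groupUpperFirst(a: string, b: string): number {
  const ga = startsUpper(a) ? 0 : 1;
  const gb = startsUpper(b) ? 0 : 1;
  return ga !== gb ? ga - gb : upperFirst.compare(a, b);
}

const grouped = [...names].sort(groupUpperFirst);
console.log(compared, grouped);
```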

There were also some special cases that I needed to adjust for - including the one that @leidegre pointed out. I can describe them next to the relevant code when I submit the PR.

Since the solution was different than I was imagining, the new setting is different than we discussed, but I think it's still simple to understand and use. Let me know if you have any concerns or comments.

Here's what it looks like now:

'explorer.sortOption': {
  'type': 'string',
  'enum': [SortOption.Numeric, SortOption.Upper, SortOption.Lower, SortOption.Mixed],
  'default': SortOption.Numeric,
  'enumDescriptions': [
    nls.localize('sortOption.numeric', 'Mixes uppercase and lowercase names together. Numbers are sorted numerically, not alphabetically.'),
    nls.localize('sortOption.upper', 'Groups uppercase names before lowercase names. Numbers are sorted alphabetically.'),
    nls.localize('sortOption.lower', 'Groups lowercase names before uppercase names. Numbers are sorted alphabetically.'),
    nls.localize('sortOption.mixed', 'Mixes uppercase and lowercase names together. Numbers are sorted alphabetically.')
  ],
  'description': nls.localize('SortOption', "Further specifies the file and directory sort order.")
}

I'm very happy to say that with this new setting and some small tweaks in the code I believe that the whole problem might be solved - or at least solved enough to satisfy most people.

Namely:

String comparisons use the platform locale
aggregate.go and aggregate_repo.go sort as expected
filenames that start with a dot (hidden files) sort as expected
between the existing SortOrder and the new SortOption setting, users should be able to emulate the most popular filename grouping options - including the one that github uses.

Whew!

leilapearson on 23 Apr 2020

❤2

Actually I just realized one more option would be good to add - namely a simple unicode sort.

It seems that the terminal on a Mac uses a simple unicode sort. The file explorer on a Mac uses a localized sort though.

I think a lot of people might want to match their sort order to their terminal, and the other options don't give you that.

It would be trivial to add a SortOption.Unicode to the list.
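Here is a sketch of what a `SortOption.Unicode` comparator might do (the option name is the thread's proposal; `compareCodePoints` is hypothetical).

One subtlety: JavaScript's `<` on strings compares UTF-16 code units, not code points. The two orders disagree only when a character above U+FFFF (stored as a surrogate pair) meets one in the range U+E000 to U+FFFF.

```typescript
// Compare strings by Unicode code point. Array.from iterates by code point,
// so surrogate pairs are kept together.
function compareCodePoints(a: string, b: string): number {
  const pa = Array.from(a, (ch) => ch.codePointAt(0) as number);
  const pb = Array.from(b, (ch) => ch.codePointAt(0) as number);
  const n = Math.min(pa.length, pb.length);
  for (let i = 0; i < n; i++) {
    if (pa[i] !== pb[i]) {
      return pa[i] - pb[i];
    }
  }
  // One is a prefix of the other: the shorter one sorts first.
  return pa.length - pb.length;
}

// U+FFFF vs U+1F600 (written as its surrogate pair).
const bmp = "\uFFFF";
const astral = "\uD83D\uDE00";
console.log(compareCodePoints(bmp, astral) < 0); // code point order: U+FFFF first
console.log(bmp < astral); // code-unit order: 0xFFFF > 0xD83D

const unicodeOrder = ["a", "B", "_", "."].sort(compareCodePoints);
console.log(unicodeOrder);
```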

leilapearson on 23 Apr 2020

Sound good. Once you submit a PR feel free to ping me @isidorn on it and we can continue the discussion there. Thanks

isidorn on 23 Apr 2020

Perfect @isidorn. Thanks!

leilapearson on 24 Apr 2020

Any progress?

Sytten on 7 Jul 2020

@Sytten some sort order edge cases were addressed in #97200 and that change is available in vscode 1.46.0. See the PR for a detailed description.

I also have an open PR #97272 - old now and sure to need an update - to add some additional lexicographic options to allow sorting in unicode order, locale order with uppercase first, or locale order with lowercase first.

Which specific functionality were you hoping to see addressed?

leilapearson on 8 Jul 2020

On macOS the files are sorted case-insensitively, and I didn't find a way to sort them case-sensitively (without affecting the files/folders order).

Sytten on 8 Jul 2020

PR #97272 adds the option to group files and folders by case, but that PR was submitted at a time when the reviewers weren't available and is out of date now. I'll take a look at resurrecting it.

leilapearson on 8 Jul 2020

👍4 ❤2

Just wanted to provide a couple of quick updates for anyone watching this issue.

PR #104528 was recently merged. This PR changes how aggregate.go and aggregate_repo.go are sorted. See Issue #99955 if you want more details.
PR #97272 - which adds settings to group names by case and to sort in unicode order - has been updated, but will be left on hold for now.

leilapearson on 25 Aug 2020
