Syncthing: Possibly wrong Turkish character test case in lib/fs/folding_test.go

Created on 24 Oct 2020 · 3 comments · Source: syncthing/syncthing

I am assuming that lib/fs/folding_test.go tests case folding in the function func TestUnicodeLowercase(t *testing.T) (line 49), using the table var caseCases = [][2]string (line 13).

While looking through the most recent commits for fun, I noticed that this file contains these lines (32-36):

// The Turks do their thing with the Is.... Like the Greek example
// we pick just the one canonicalized "i" although you can argue
// with this... From what I understand most operating systems don't
// get this right anyway.
{"İI", "ii"},
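One way to get the single canonical "i" that this comment describes is to round-trip each rune through upper case and then lower case: "ı" uppercases to "I", which lowercases to "i", and "İ" lowercases directly to "i". The sketch below assumes this round-trip approach; canonicalLower is a hypothetical name, and this is not necessarily how Syncthing's UnicodeLowercase is implemented.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// canonicalLower is a hypothetical helper: each rune becomes
// unicode.ToLower(unicode.ToUpper(r)), so the dotted and dotless
// I variants all collapse to the ASCII "i".
func canonicalLower(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	for _, r := range s {
		b.WriteRune(unicode.ToLower(unicode.ToUpper(r)))
	}
	return b.String()
}

func main() {
	for _, s := range []string{"İI", "ıi", "ÇĞIİÖŞÜ"} {
		// Prints "İI" -> "ii", "ıi" -> "ii", "ÇĞIİÖŞÜ" -> "çğiiöşü".
		fmt.Printf("%q -> %q\n", s, canonicalLower(s))
	}
}
```

Under this assumption the last input yields "çğiiöşü", the same result the test failure later in this thread reports.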

In Turkish, we have these mappings: I -> ı and İ -> i. Note the dots. For completeness, all the non-English uppercase-to-lowercase mappings are:

Ç -> ç
Ğ -> ğ
I -> ı
İ -> i
Ö -> ö
Ş -> ş
Ü -> ü

This made me wonder if this would pose a problem in my scenario:

Desktop: Windows 8.1 (Turkish, not even remotely up-to-date), running SyncTrayzor v1.1.24 (Syncthing v1.10.0), using NTFS
Laptop: Arch Linux (English, up-to-date) running Syncthing v1.10.0 and Syncthing-GTK, using NTFS/fuseblk and ext4
Both synchronising a single shared folder, mapped on each machine to a local directory with an English name
I almost always use English on my computers (programs, folders, files etc.), which means I don't experience or know of any language problems in my environment.

Story time. Please skip these italic parts if you don't want to hear why and how I could not test my theory.

_I tried to test my theory (that this would go south, badly) by creating a file named ÇçĞğIıİiÖöŞşÜü.txt on my desktop._

_It synced perfectly. I got the exact same filename on my laptop's NTFS/fuseblk partition. I am perplexed._

_Back to work: I created another folder on my laptop, on a different partition formatted as ext4, and copied my oddly named file into it. Nothing went wrong. By the way, on my laptop I have:_

$ cat /sys/fs/ext4/features/casefold
supported

_I also tried creating two more files on my laptop, in both partitions/folders, named ççğğııiiööşşüü.txt. Synchronising these files was also a breeze: no errors, and correct file names on the desktop. Another test with files named ÇÇĞĞIIİİÖÖŞŞÜÜ.txt also passed; synchronising to my desktop went perfectly._

_I now have 3 files each in two folders at both computers. I can't see any problem._

This test is wrong (and incomplete). So either it does not pass (I can't check that) or there is a problem lurking undetected. I could not produce an error, though.

Nevertheless, I think having extensive test cases is nice. So I propose changing line 36 of the test file to

{"ÇĞIİÖŞÜ", "çğıiöşü"},

and seeing whether this breaks anything.

I could not do this myself because I am not proficient in Go and could not test the change before committing. Still, I hope this gives some pointers if there is an issue here.

All in all, thanks for the good job.

bug needs-triage

All 3 comments

Not at a computer at the moment; I will take a look later. But just so we're on the same page: our case folding is only used for equality comparisons, to decide whether two file names can exist side by side in the same directory on a case-insensitive file system. The handling of the Turkish "I"s may not be consistent between systems, but the safe assumption on our side is that all the case and dot variants are equivalent. That doesn't mean we won't preserve the precise character you used in the name.
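To illustrate folding being used only for equality checks, here is a sketch that, given the names already in a directory, reports whether a new name would clash on a case-insensitive file system, while every stored name keeps its exact original spelling. The helper names foldKey and conflictsWith are hypothetical, and the fold (upper case, then lower case, per rune) is an assumption rather than Syncthing's actual code.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// foldKey is a hypothetical comparison key: ToUpper then ToLower per rune,
// so all Turkish I variants compare equal. It is never written to disk.
func foldKey(name string) string {
	var b strings.Builder
	for _, r := range name {
		b.WriteRune(unicode.ToLower(unicode.ToUpper(r)))
	}
	return b.String()
}

// conflictsWith returns the existing name, in its original spelling, that
// would collide with candidate on a case-insensitive file system, if any.
// An identical name is the same file, not a conflict.
func conflictsWith(existing []string, candidate string) (string, bool) {
	key := foldKey(candidate)
	for _, name := range existing {
		if name != candidate && foldKey(name) == key {
			return name, true
		}
	}
	return "", false
}

func main() {
	dir := []string{"INVOICES.PDF", "MARTI.TXT"}
	for _, c := range []string{"invoices.pdf", "notes.txt"} {
		if other, ok := conflictsWith(dir, c); ok {
			fmt.Printf("%s clashes with %s\n", c, other)
		} else {
			fmt.Printf("%s can coexist\n", c)
		}
	}
}
```

The folded key is only compared, never stored, so the names themselves are preserved exactly as typed.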

So indeed it fails, as the comment in the test predicts:

--- FAIL: TestUnicodeLowercase (0.00s)
    folding_test.go:54: UnicodeLowercase("ÇĞIİÖŞÜ") => "çğiiöşü", expected "çğıiöşü"

As the same comment says: we just use one canonical lowercase form, even if that might not be entirely correct. The comment's remark that many systems get this wrong anyway is itself an argument for using such a canonical lowercase. It means we might get a false-positive case conflict, whereas if we used different lowercase forms as proposed here and the OS doesn't, we would get a false negative, which might endanger data.

Thus I'd say this is working as expected right now.
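The trade-off can be made concrete with the "MARTI.TXT" example from later in this thread. The sketch below (canonicalFold and turkishFold are hypothetical names) compares a language-neutral fold with a Turkish-aware one built on the standard unicode.TurkishCase. The neutral fold reports "MARTI.TXT" and "marti.txt" as a conflict, which at worst is a harmless false positive. The Turkish fold lowercases "MARTI.TXT" to "martı.txt" and so reports no conflict; if the file system then folds I to i, both files map to one name on disk, which is the data-endangering false negative.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// canonicalFold is a hypothetical language-neutral key (upper, then lower, per rune).
func canonicalFold(s string) string {
	return strings.Map(func(r rune) rune {
		return unicode.ToLower(unicode.ToUpper(r))
	}, s)
}

// turkishFold is a hypothetical Turkish-aware key (I -> ı, İ -> i).
func turkishFold(s string) string {
	return strings.ToLowerSpecial(unicode.TurkishCase, s)
}

func main() {
	a, b := "MARTI.TXT", "marti.txt"

	// Both keys are "marti.txt": a conflict is reported (safe side).
	fmt.Println("canonical conflict:", canonicalFold(a) == canonicalFold(b))

	// Keys are "martı.txt" and "marti.txt": no conflict is reported, even
	// though an I -> i file system would treat them as the same file.
	fmt.Println("turkish conflict:", turkishFold(a) == turkishFold(b))
}
```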

 --- FAIL: TestUnicodeLowercase (0.00s)
    folding_test.go:54: UnicodeLowercase("ÇĞIİÖŞÜ") => "çğiiöşü", expected "çğıiöşü"

ouch!

My "intuition" got the better of me. I can see it clearly now, after struggling for about an hour to come to grips with reality. At one point I even thought I had found a bug in Go's unicode package. Go figure.

For future me coming here for a laugh, or anyone thinking something mysterious is going on, THIS was MY problem:
_We are not case-folding letters whose language we know. I may have a file "INVOICES.PDF" next to another named "MARTI.TXT" (martı = seagull in Turkish). I know both of these words by heart and see no problem lowercasing them as "invoices.pdf" and "martı.txt". But how can Syncthing know that? As our friends noted, it is safer to have a false positive here._

Sorry for taking your time & thanks for your patient explanations.

Cheers
