X3 Photo Gallery Support Forums

amwpsaa
Experienced
Topic Author
Posts: 53
Joined: 29 Jul 2021, 08:46

Sorting is inconsistent in some locales when files have mixed character-sets

22 Nov 2021, 02:21

Sorry, I misunderstood earlier because of my translation.
The video has been re-recorded; please check your email.
 
 
amwpsaa
Experienced
Topic Author
Posts: 53
Joined: 29 Jul 2021, 08:46

Re: Sorting is inconsistent in some locales when files have mixed character-sets

09 Dec 2021, 23:30

mjau-mjau wrote:Replied.
When can this problem be fixed? Very important.
 
mjau-mjau
X3 Wizard
Posts: 13993
Joined: 30 Sep 2006, 03:37

Re: Sorting is inconsistent in some locales when files have mixed character-sets

10 Dec 2021, 00:25

amwpsaa wrote:When can this problem be fixed? Very important.
Need to finish some Files app documentation first. Will release a new version by end of next week with a solution.
 
rampageX
Posts: 7
Joined: 25 Nov 2021, 00:32

Re: Sorting is inconsistent in some locales when files have mixed character-sets

22 Dec 2021, 09:28

mjau-mjau wrote:
amwpsaa wrote:When can this problem be fixed? Very important.
Need to finish some Files app documentation first. Will release a new version by end of next week with a solution.
Not pressing, I just want to know the progress...
 
mjau-mjau
X3 Wizard
Posts: 13993
Joined: 30 Sep 2006, 03:37

Re: Sorting is inconsistent in some locales when files have mixed character-sets

22 Dec 2021, 23:14

rampageX wrote:Not pressing, I just want to know the progress...
Sorry, finishing the new Files website and docs currently, should be out in a few days. I'll look into the update (with sorting) from next week, but it might be 2 weeks until I can get anything out now with Christmas ...
 
mjau-mjau
X3 Wizard
Posts: 13993
Joined: 30 Sep 2006, 03:37

Re: Sorting is inconsistent in some locales when files have mixed character-sets

01 Jan 2022, 02:28

rampageX wrote:Not pressing, I just want to know the progress...
For testing purposes, can I ask whether the issue is resolved if you add this at the top of the Files app index.php?
Code
setlocale(LC_CTYPE, "UTF8", "en_US.UTF-8");
It seems to fix another issue for a Chinese user; see the GitHub issue.

Sorry for the delay, as I was busy with the new website www.files.gallery.
 
rampageX
Posts: 7
Joined: 25 Nov 2021, 00:32

Re: Sorting is inconsistent in some locales when files have mixed character-sets

01 Jan 2022, 10:08

mjau-mjau wrote:
rampageX wrote:Not pressing, I just want to know the progress...
For testing purposes, can I ask whether the issue is resolved if you add this at the top of the Files app index.php?
Code
setlocale(LC_CTYPE, "UTF8", "en_US.UTF-8");
It seems to fix another issue for a Chinese user; see the GitHub issue.

Sorry for the delay, as I was busy with the new website www.files.gallery.
I added the code, but the sorting issue is still there.
 
mjau-mjau
X3 Wizard
Posts: 13993
Joined: 30 Sep 2006, 03:37

Re: Sorting is inconsistent in some locales when files have mixed character-sets

27 Jan 2022, 23:58

Hi. Sorry for the delay.

I am working on a sorting fix now, but I need to test differences between sorting methods. Can you (@rampageX or @amwpsaa) provide me with three words in Chinese in their correct sort order? I want to prepare some simple output tests with the following words:

20
100
aaa
AAAA
中文字 1
中文字 2
中文字 3

XXX
xxxx
æææ

Please replace the three "中文字" placeholders (shown in bold in the original post) with Chinese words in correct order. Thanks.
 
mjau-mjau
X3 Wizard
Posts: 13993
Joined: 30 Sep 2006, 03:37

Re: Sorting is inconsistent in some locales when files have mixed character-sets

28 Jan 2022, 05:36

I have done some research, studying the code in Files app and looking at the videos forwarded by @amwpsaa. I have also searched Google for localeCompare(), and there seem to be reports specifically about failure to compare MIXED character sets on Chinese lang/OS.
https://stackoverflow.com/questions/594 ... th-english
https://stackoverflow.com/questions/659 ... g-to-my-lo
https://titanwolf.org/Network/Articles/ ... 4f56920af8

Furthermore, I have tested from here on Mac, Windows and iPhone, and sorting with localeCompare() behaves as expected, no matter if I click to other folders or change sorting back and forth. This is an issue with localeCompare() on some operating systems set to specific languages (in this case, Chinese).
[screenshot]

In conclusion, it's not possible to get localeCompare() to work with MIXED character names in your OS. With default non-specified language, it seems to return 0 when comparing mixed items. You can specify a language to compare with, but then it will not work properly for MIXED character sets (for example English and Chinese). You might get it to work by setting a specific language, but this seems random.

What is happening in Files app?
I had to do some guessing here, because I cannot recreate the issue, but I have looked at videos from @amwpsaa and compared with the code. When a folder first loads and sorting is by NAME, it will be correct, sorted by PHP strnatcasecmp(). In fact, Files app will also sort on top of this with localeCompare() (for consistency), but since that comparison returns 0 on your OS, the order remains the same (correct) as from PHP. Once you change to another sort and then back to NAME, it will partially fail and inherit some of the order from the previous sort method (because localeCompare() fails at least for some comparisons).
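To illustrate the mechanism guessed at above, here is a minimal sketch. It does not reproduce the OS-specific localeCompare() behaviour; it simply assumes a comparator that always returns 0 and uses made-up file names. Because Array.prototype.sort is stable in modern engines, "equal" items keep whatever order they already had.

```javascript
// Hypothetical file names, already in the order delivered by the server.
const fromServer = ["2name", "10name", "aa", "中国"];

// A comparator that always reports "equal", imitating the suspected failure.
const alwaysEqual = () => 0;

// sort() is stable (ES2019+), so ties keep their current order:
// on first load the server order survives untouched.
const firstLoad = [...fromServer].sort(alwaysEqual);

// After switching to another sort, the array is in a different order...
const afterOtherSort = [...fromServer].reverse();

// ...and sorting by "name" again with the failing comparator cannot restore it.
const backToName = [...afterOtherSort].sort(alwaysEqual);

console.log(firstLoad);
console.log(backToName);
```

If only some comparisons return 0, the result would be a mix of correct ordering and leftovers from the previous sort, which matches what the videos appear to show.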

But it worked in earlier version ...?
In previous versions we simply compared by a.toLowerCase() < b.toLowerCase() ? -1 : 1, and this seems to work nicely when comparing names case-insensitively. The only problem with this simple method is that it doesn't sort numbered names correctly. For example "2name" should come before "10name", but it will sort "10name" first because the first character "1" comes before "2".
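A small sketch of the difference, using made-up names. The "natural" comparator here pins the locale to "en" as an assumption so the output is predictable; Files app itself is described later in this thread as leaving the locale undefined.

```javascript
const names = ["10name", "2name", "aa", "BBB"];

// Basic comparison as described above: lowercase, then compare character by character.
const basic = (a, b) => (a.toLowerCase() < b.toLowerCase() ? -1 : 1);

// Natural comparison: numeric: true makes runs of digits compare as numbers.
const natural = new Intl.Collator("en", { numeric: true, sensitivity: "base" }).compare;

const basicSorted = [...names].sort(basic);
const naturalSorted = [...names].sort(natural);

console.log(basicSorted);   // "10name" ends up before "2name"
console.log(naturalSorted); // "2name" ends up before "10name"
```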

But it works fine in datatables.js and tinyfilemanager ... ?
It "works" in these scripts only because they are using the same simple sorting method as noted above. It does not sort numbered names "naturally".

On first load, sorted correctly by PHP:
https://dsh.re/6d24bd

When sorting dynamically from Javascript, it's WRONG:
https://dsh.re/539e4

So what can we do?
In the next release, I will add some new options for the sorting mechanism. For example, you can revert to the basic sorting used earlier, which mostly works although it does not understand the logic for numbered names "2foldername" < "10foldername".

Unless someone can show me otherwise, there does not seem to be a single piece of Javascript code that can sort "naturally" and case-insensitively with mixed character-sets across all OS/devices without character/language-specific settings.
 
mjau-mjau
X3 Wizard
Posts: 13993
Joined: 30 Sep 2006, 03:37

Re: Sorting is inconsistent in some locales when files have mixed character-sets

28 Jan 2022, 10:21

What do you see in these tests? Please check:
https://jsbin.com/vatoyuboka/edit?js,console

It should look like below, although I suspect you will see a different output in Chinese OS. The bottom 3x "ok" arrays all have latin characters correctly sorted, although the first one is set to "zh-CN" so it will prioritize Chinese before latin. Which ones sort the two Chinese words correctly?
Image

The 1,-1 output logs compare() in sorting, and I'm wondering if you get 0 in some cases.
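For anyone who prefers to check locally, a rough sketch of the same idea follows. It is not the code from the jsbin above; the sample names are assumptions. It compares every pair with a default-locale collator and collects the pairs that come back as 0.

```javascript
// Hypothetical sample; swap in your own file names.
const sample = ["2name", "10name", "aa", "BBB", "xx", "非洲", "中国"];

// undefined = let the browser/OS pick the locale, which is where the problem is suspected.
const collator = new Intl.Collator(undefined, { numeric: true, sensitivity: "base" });

// Compare every pair of distinct names and remember the ones reported as equal.
const ties = [];
for (let i = 0; i < sample.length; i++) {
  for (let j = i + 1; j < sample.length; j++) {
    const result = collator.compare(sample[i], sample[j]);
    console.log(sample[i], sample[j], result);
    if (result === 0) ties.push([sample[i], sample[j]]);
  }
}

// The locale the runtime actually resolved, useful to include in a report.
console.log("resolved locale:", collator.resolvedOptions().locale);
console.log("pairs compared as equal:", ties);
```

Any pair listed under "pairs compared as equal" would keep its previous order when sorting, so a non-empty list is worth reporting together with the resolved locale.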
 
rampageX
Posts: 7
Joined: 25 Nov 2021, 00:32

Re: Sorting is inconsistent in some locales when files have mixed character-sets

01 Feb 2022, 23:51

Image
On a Chinese OS, the bottom 3x arrays all sort the two Chinese words correctly. But the actual overall order should be:

2name    10name    aa    BBB     xx       非洲     中国

I don't know how 'zèbre' and 'écureuil' will be sorted, but I am sure all the Latin words will come before the Chinese words.
 
mjau-mjau
X3 Wizard
Posts: 13993
Joined: 30 Sep 2006, 03:37

Re: Sorting is inconsistent in some locales when files have mixed character-sets

02 Feb 2022, 04:35

Thanks for the reply  :clap: Unfortunately, these are disappointing results  :expressionless:
rampageX wrote:On a Chinese OS, the bottom 3x arrays all sort the two Chinese words correctly.
Strange. Array #3 uses the SAME sorting method as we used in the previous Files app version, which Chinese users reported was correct. I think you also reported it was sorting correctly. Is the order of the Chinese words below wrong?
Code
["10name", "2name", "aa", "BBB", "xx", "zèbre", "écureuil", "中国", "非洲"]
This means that it's not able to sort Chinese if there are mixed character sets. I have tested tinyfilemanager.github.io and datatables.js, and they sort Chinese words the same as above (because they use the same sorting function).
Image
rampageX wrote:I don't know how 'zèbre' and 'écureuil' will be sorted
This is solved correctly by localeCompare() used in the current Files app version. As you can see in the last three arrays (both mine and yours), "écureuil" correctly comes after "BBB", before "xx".
Code
"BBB", "écureuil", "xx"
rampageX wrote:
Code
2name    10name    aa    BBB     xx       非洲     中国
..., but I am sure all the Latin words will come before the Chinese words.
I don't see how that is possible with any Javascript code. Anyone please feel free to try from my jsbin:
https://jsbin.com/vatoyuboka/edit?js,console

Conclusion
Unless anyone has any technical suggestions (or examples) for correctly sorting arrays of mixed Chinese and Latin from Javascript, I can only add additional OPTIONS in the next release.

1. Option localeCompare
This will be kept as the default sorting option, because it sorts correctly in all cases except those reported by Chinese language users. Not only does it sort numbers naturally (2<10), but it also sorts unicode correctly (at least for Latin characters, and in most OS/browsers). On questions about the "correct way" to sort alphanumeric strings, localeCompare() is considered the correct answer for modern browsers.
https://stackoverflow.com/a/38641281/3040364

There are a few Javascript sorting libraries available [1,https://github.com/javve/natural-sort,3,4,5], but they are no longer updated because they have been considered replaced by localeCompare() for modern browsers. This has been tested in depth:
https://stackoverflow.com/a/26295229/3040364

Perhaps one of the libs above might work for Chinese/mixed? :thinking: I don't know, but it's backwards to use non-native and slow custom sorting functions when there is already a native option in place.

2. Custom collator() function
I will expose the Intl.Collator() function used in Files app for localeCompare(), so that anyone can edit parameters from config if they find a solution that works for them. In Files app, it looks like this (language undefined = automatic).
Code
new Intl.Collator(undefined, { numeric: true, sensitivity: 'base' });
User @amwpsaa tested that it worked in Chinese when setting the locale to 'de', although we don't know why and it's not logical.
Code
new Intl.Collator('de', { numeric: true, sensitivity: 'base' });
3. New option basic sort
This is the sorting method used in the previous Files app, and was reported working fine with Chinese by both @amwpsaa and @rampageX. Basically it just involves lowercase a < b sorting, which resolves nicely in modern browsers and is fast. What this sort method does NOT do: 1. It does not understand that 2<10 (it just looks at the first character), and 2. It cannot sort mixed unicode (it will sort ["a", "z", "é"] instead of ["a", "é", "z"]). In many cases, however, this will work nicely and fast for users who are not picky about these two flaws.

4. Custom sort function
I was considering allowing a custom function for "alphanumeric" sort, so that users could write their own. However, I don't see much point in this if nobody can write a Javascript sort function that resolves the issue with Chinese noted in this post.

So for the next release coming soon, expect #1-3. Unless anyone has any further suggestions :pray:

Happy Chinese New Year (yesterday)! :flag_cn:🧧🧨:dragon::tangerine:
 
mjau-mjau
X3 Wizard
Posts: 13993
Joined: 30 Sep 2006, 03:37

Re: Sorting is inconsistent in some locales when files have mixed character-sets

02 Feb 2022, 23:35

* continued from my last post

5. PHP sort
I notice that strnatcasecmp() sorting from PHP might offer better sorting in some cases. It's difficult to use PHP for "live" sorting from the interface (without re-loading from the server), but I could perhaps store the PHP sort order on initial folder load and use that as a reference when sorting "live". I don't know how well this will work, but I might include it as an option.
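A rough sketch of how that reference could work on the Javascript side. All names here (serverOrder, rank, byServerOrder) are hypothetical and not taken from Files app: the idea is simply to remember each name's position when the folder loads and to compare by that position later.

```javascript
// Hypothetical: names in the order PHP's strnatcasecmp() produced on first load.
const serverOrder = ["2name", "10name", "aa", "BBB", "中国"];

// Remember each name's position once, when the folder loads.
const rank = new Map(serverOrder.map((name, index) => [name, index]));

// Names missing from the map (for example, uploaded later) sort to the end.
const rankOf = (name) => (rank.has(name) ? rank.get(name) : serverOrder.length);

// Comparator for "sort by name" that replays the stored server order.
const byServerOrder = (a, b) => rankOf(a) - rankOf(b);

// Simulate the list after the user has sorted by something else.
const shuffled = ["中国", "BBB", "2name", "aa", "10name"];
const restored = [...shuffled].sort(byServerOrder);

console.log(restored);
```

The obvious limitation is that new or renamed items have no stored position, so they would need a fallback comparison.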
 
rampageX
Posts: 7
Joined: 25 Nov 2021, 00:32

Re: Sorting is inconsistent in some locales when files have mixed character-sets

03 Feb 2022, 22:28

Sorting pure Chinese words is difficult. The 'perfect' way is:
1. Sort Chinese words by pinyin first: '非洲' = FeiZhou, '中国' = ZhongGuo, F < Z, so 非洲 will come before 中国;
2. If Chinese words have the same pinyin, e.g. '中国' = ZhongGuo is the same as '中过' = ZhongGuo, we must sort these two words by the Unicode table.

But I think we don't need such strict sorting of Chinese words; we just don't want the sorting chaos caused by mixed characters (with Chinese words).

I tried TinyFileManager. The Chinese word sorting was wrong there too, but the Chinese words always stayed behind all the Latin ones, and all the Latin names stayed correctly sorted.
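One possible way to get that behaviour, offered only as an untested-in-Files-app sketch: decide first whether a name starts with a Chinese character, always place those names after the others, and only then compare within each group. The function names and the choice of the "en" and "zh" locales are assumptions. The "zh" collator usually orders by pinyin where the browser ships the data for it, but that depends on the runtime.

```javascript
// True when the name starts with a Han (Chinese) character.
const startsWithHan = (s) => /^\p{Script=Han}/u.test(s);

// Separate collators per group, each with an explicit locale (assumed values).
const latinCollator = new Intl.Collator("en", { numeric: true, sensitivity: "base" });
const hanCollator = new Intl.Collator("zh");

function compareNames(a, b) {
  const aHan = startsWithHan(a);
  const bHan = startsWithHan(b);
  // Different groups: the Chinese name always goes after the other one.
  if (aHan !== bHan) return aHan ? 1 : -1;
  // Same group: use the collator for that group.
  return (aHan ? hanCollator : latinCollator).compare(a, b);
}

const names = ["中国", "xx", "10name", "非洲", "BBB", "2name", "aa"];
const sorted = [...names].sort(compareNames);

console.log(sorted);
```

Because the group check never depends on the OS locale, the mixed-character comparisons that were reported as failing are avoided; only the order inside the Chinese group still depends on what the runtime supports.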