为什么附加公共后缀会颠倒en_US语言环境中的归类顺序？

Question

为什么附加公共后缀会颠倒en_US语言环境中的归类顺序？

以下代码

#!/usr/bin/perl

use strict;
use warnings;

my $s1 = 'aaa2000@yahoo.com';
my $s2 = 'aaa_2000@yahoo.com';
my $s3 = 'aaa2000';
my $s4 = 'aaa_2000';

no locale;

print "\nNO Locale:\n\n";

if ($s1 gt $s2) {print "$s1 is > $s2\n";}
if ($s1 lt $s2) {print "$s1 is < $s2\n";}
if ($s1 eq $s2) {print "$s1 is = $s2\n";}

if ($s3 gt $s4) {print "$s3 is > $s4\n";}
if ($s3 lt $s4) {print "$s3 is < $s4\n";}
if ($s3 eq $s4) {print "$s3 is = $s4\n";}

use locale;

print "\nWith 'use locale;':\n\n";

if ($s1 gt $s2) {print "$s1 is > $s2\n";}
if ($s1 lt $s2) {print "$s1 is < $s2\n";}
if ($s1 eq $s2) {print "$s1 is = $s2\n";}

if ($s3 gt $s4) {print "$s3 is > $s4\n";}
if ($s3 lt $s4) {print "$s3 is < $s4\n";}
if ($s3 eq $s4) {print "$s3 is = $s4\n";}

Run Code Online (Sandbox Code Playgroud)

打印出来

NO Locale:

aaa2000@yahoo.com is < aaa_2000@yahoo.com
aaa2000 is < aaa_2000

With 'use locale;':

aaa2000@yahoo.com is > aaa_2000@yahoo.com
aaa2000 is < aaa_2000

Run Code Online (Sandbox Code Playgroud)

我不能真正遵循:同时,在使用locale,有一个<b AND a@yahoo.com> b@yahoo.com？!!

我错过了一些或多或少的东西,或者这是一个错误？其他人可以确认看到同样的行为吗？

Locale is $ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Run Code Online (Sandbox Code Playgroud)

提前致谢.

Answer 1

Pet*_*aut 4

启用区域设置后，将分多次完成排序。每个角色都有四个权重，在连续的过程中进行比较。和符号与大多数标点符号一样，没有第一、第二或第三权重，因此它们仅在第四遍中发挥作用@。_所以，对于你的例子

aaa2000@yahoo.com > aaa_2000@yahoo.com

Run Code Online (Sandbox Code Playgroud)

在第一遍中，它确实是比较

aaa2000yahoocom = aaa2000yahoocom

Run Code Online (Sandbox Code Playgroud)

然后在第四遍（第二遍和第三遍中没有区分因素）

@. > _@.

Run Code Online (Sandbox Code Playgroud)

因为@恰好大于_该区域设置中的值。（这只是区域设置定义所做的选择，大概基于某些 ISO 标准或其他标准。）

您可以查看它的实现细节。支持区域设置的比较最终在 C 库中实现为strxfrm(A) cmp strxfrm(B). 运行这个程序：

use POSIX;

my $s1 = 'aaa2000@yahoo.com';
my $s2 = 'aaa_2000@yahoo.com';

foreach ($s1, $s2) {
    printf "%s =>\t%v02x\n", $_, POSIX::strxfrm($_);
}

Run Code Online (Sandbox Code Playgroud)

我得到：

aaa2000@yahoo.com =>    0c.0c.0c.04.02.02.02.24.0c.13.1a.1a.0e.1a.18.01.08.08.08.08.08.08.08.08.08.08.08.08.08.08.08.01.02.02.02.02.02.02.02.02.02.02.02.02.02.02.02.01.08.5d.06.44
# explanation:           a  a  a  2  0  0  0  y  a  h  o  o  c  o  m DIV secondary weights ...                       DIV tertiary weights ...                        DIV  @     .
aaa_2000@yahoo.com =>   0c.0c.0c.04.02.02.02.24.0c.13.1a.1a.0e.1a.18.01.08.08.08.08.08.08.08.08.08.08.08.08.08.08.08.01.02.02.02.02.02.02.02.02.02.02.02.02.02.02.02.01.04.36.05.5d.06.44
# explanation:           a  a  a  2  0  0  0  y  a  h  o  o  c  o  m DIV secondary weights ...                       DIV tertiary weights ...                        DIV  _     @     .

Run Code Online (Sandbox Code Playgroud)

这些数字的导出方式是一个实现细节；它们只需要通过字节比较就能产生所需的最终结果。但在所有具有支持区域设置的排序的现代编程环境中，这个概念都是相同的。

归档时间：	15 年，4 月前
查看次数：	259 次
最近记录：	14 年，6 月前