Using "sort" on utf8 strings in Perl

And*_*wby 1 perl utf-8

I'm trying to figure out how to sort an array alphabetically in Perl. This is what I have, and it works fine for English:

    # List of countries (kept like this to keep it clean, as it's re-used in other places)
    my $countries = {
        'AT' => "íAustria",
        'AU' => "Australia",
        'BE' => "Belgium",
        'BG' => "Bulgaria",
        'CA' => "Canada",
        'CY' => "Cyprus",
        'CZ' => "Czech Republic",
        'DK' => "Denmark",
        'EN' => "England",
        'EE' => "Estonia",
        'FI' => "Finland",
        'FR' => "France",
        'DE' => "Germany",
        'GB' => "Great Britain",
        'GR' => "Greece",
        'HU' => "Hungary",
        'IE' => "Ireland",
        'IT' => "Italy",
        'LV' => "Latvia",
        'LT' => "Lithuania",
        'LU' => "Luxembourg",
        'MT' => "Malta",
        'NZ' => "New Zealand",
        'NL' => "Netherlands",
        'PL' => "Poland",
        'PT' => "Portugal",
        'RO' => "Romania",
        'SK' => "Slovakia",
        'SI' => "Slovenia",
        'ES' => "Spain",
        'SE' => "Sweden",
        'CH' => "Switzerland",
        'SC' => "Scotland",
        'UK' => "United Kingdom",
        'US' => "USA",
        'TK' => "Turkey",
        'NO' => "Norway",
        'MX' => "Mexico",
        'IL' => "Israel",
        'IN' => "India",
        'IS' => "Iceland",
        'CN' => "China",
        'JP' => "Japan",
        'VN' => "áVietnamí"
    };
   # Populate the original loop with "name" and "code"
    my @country_loop_orig;
    print $IN->header;
    foreach (keys %{$countries}) {
      push @country_loop_orig, {
        name => $countries->{$_},
        code => $_
      }
    }

   # sort it alphabetically
   my @country_loop = sort { lc($a->{name}) cmp lc($b->{name})  } @country_loop_orig;

This works for the English version:

Australia
Austria
Belgium
Bulgaria
Canada
China
Cyprus
Czech Republic
Denmark
England
Estonia
Finland
France
Germany
Great Britain
Greece
Hungary
Iceland
India
Ireland
Israel
Italy
Japan
Latvia
Lithuania
Luxembourg
Malta
Mexico
Netherlands
New Zealand
Norway
Poland
Portugal
Romania
Scotland
Slovakia
Slovenia
Spain
Sweden
Switzerland
Turkey
United Kingdom
USA
Vietnam

...but when you try it with UTF-8 characters such as í, é, etc., it doesn't work:

Australia
Belgium
Bulgaria
Canada
China
Cyprus
Czech Republic
Denmark
England
Estonia
Finland
France
Germany
Great Britain
Greece
Hungary
Iceland
India
Ireland
Israel
Italy
Japan
Latvia
Lithuania
Luxembourg
Malta
Mexico
Netherlands
New Zealand
Norway
Poland
Portugal
Romania
Scotland
Slovakia
Slovenia
Spain
Sweden
Switzerland
Turkey
United Kingdom
USA
áVietnam
íAustria

How do you achieve this? I found Sort::Naturally::XS, but couldn't get it to work.

zdi*_*dim 7

Unicode::Collate should help with this.

A simple example, which sorts your last list:

use warnings;
use strict;
use feature 'say';

use Unicode::Collate;

use open ":std", ":encoding(UTF-8)";

open my $fh, '<', "country_list.txt" or die "Can't open country_list.txt: $!";
my @list = <$fh>;
chomp @list;

my $uc  = Unicode::Collate->new();
my @sorted = $uc->sort(@list);

say for @sorted;

However, in some languages non-ASCII characters may have a very specific accepted position, and the question doesn't give any details. In that case Unicode::Collate::Locale may help.
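As a rough sketch of the difference a locale can make, the snippet below sorts the same list with the default rules and with a Swedish tailoring. The word list and the choice of the `sv` locale are assumptions made for illustration; they are not from the question. Under the default rules "ö" is ordered together with "o", while in Swedish it is a separate letter that comes after "z".

```perl
use strict;
use warnings;
use utf8;                              # the literals below are UTF-8 in the source file
use feature 'say';
use open ':std', ':encoding(UTF-8)';   # encode output as UTF-8

use Unicode::Collate::Locale;

# Hypothetical sample data, chosen only to show the effect of tailoring.
my @words = ('zebra', 'öl', 'apple', 'orange');

# 'en' has no special tailoring, so this behaves like plain Unicode::Collate.
my $en = Unicode::Collate::Locale->new(locale => 'en');
my @english = $en->sort(@words);

# Swedish tailoring places å, ä, ö after z.
my $sv = Unicode::Collate::Locale->new(locale => 'sv');
my @swedish = $sv->sort(@words);

say "en: @english";
say "sv: @swedish";
```

If you are not sure whether a locale is actually tailored, `$collator->getlocale` reports the locale in effect, and it returns `default` when no tailoring was applied.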

See (study) this perl.com article, this post (T. Christiansen), and this Effective Perler article.


If the data to sort is in a complex data structure, the cmp method is there for individual comparisons:

my @sorted = sort { $uc->cmp($a, $b) } @list;
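Applied to the array of hashes from the question, that could look roughly like the sketch below. The hash is a shortened copy of the question's list, and it assumes the names are decoded character strings (here via `use utf8`, since they are literals in the source); data read from a file or a database would need to be decoded first.

```perl
use strict;
use warnings;
use utf8;                              # the literals below are UTF-8 in the source file
use feature 'say';
use open ':std', ':encoding(UTF-8)';   # encode output as UTF-8

use Unicode::Collate;

# Shortened copy of the question's hash, for illustration only.
my $countries = {
    'AT' => "íAustria",
    'AU' => "Australia",
    'BE' => "Belgium",
    'VN' => "áVietnam",
    'US' => "USA",
};

# Build the same "name"/"code" records as in the question.
my @country_loop_orig =
    map { +{ name => $countries->{$_}, code => $_ } } keys %$countries;

my $uc = Unicode::Collate->new();

# Compare the "name" fields with the collator instead of the built-in cmp.
my @country_loop =
    sort { $uc->cmp($a->{name}, $b->{name}) } @country_loop_orig;

say "$_->{code}  $_->{name}" for @country_loop;
```

The `lc` calls from the question are not needed here: with the default settings the collator compares base letters first, and accents and case only break ties, so "áVietnam" lands among the A's and "íAustria" among the I's.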

  • @AndrewNewby Cool :) Just added a note for it (2 upvotes)