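I have a comma-separated text file. I want to sort it first by column 3, then by column 2, then by column 1.
However, I want column 3 sorted alphabetically, with the longest values first.
For example: AAA, then AA, then A, then BBB, then BB, then B, then CCC, then CC, and so on.
Input (alpha-sort-test2.txt):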
JOHN,1,A
MARY,3,AA
FRED,5,BBB
SAM,7,A
JOHN,3,AAA
JOHN,2,AAA
BETTY,2,AAA
JARROD,7,AAA
JOANNE,2,BB
AMANDA,2,DD
AMY,5,B
PETE,7,CC
MATT,4,B
SARAH,3,CCC
GEORGE,3,CC
AMANDA,3,AAA
The Perl code I have so far is as follows:
$infile = "alpha-sort-test2.txt";
$outfile = "alpha-sort-test-sorted2.txt";
open (INFILE, "<$infile") or die "Could not open file $infile $!";
open (OUTFILE, ">$outfile");
my @array = sort howtosort <INFILE>;
foreach (@array)
{
    chomp;
    print "$_\n";
    print OUTFILE "$_\n";
}
sub howtosort
{
    my @flds_a = split(/,/, $a);
    my @flds_b = split(/,/, $b);
    $flds_a[2] cmp $flds_b[2];
}
close INFILE;
close OUTFILE;
Current output (alpha-sort-test-sorted2.txt):
JOHN,1,A
SAM,7,A
MARY,3,AA
AMANDA,3,AAA
JOHN,3,AAA
JOHN,2,AAA
BETTY,2,AAA
JARROD,7,AAA
AMY,5,B
MATT,4,B
JOANNE,2,BB
FRED,5,BBB
PETE,7,CC
GEORGE,3,CC
SARAH,3,CCC
AMANDA,2,DD
Desired output:
BETTY,2,AAA
JOHN,2,AAA
AMANDA,3,AAA
JOHN,3,AAA
JARROD,7,AAA
MARY,3,AA
JOHN,1,A
SAM,7,A
FRED,5,BBB
JOANNE,2,BB
MATT,4,B
AMY,5,B
SARAH,3,CCC
GEORGE,3,CC
PETE,7,CC
AMANDA,2,DD
Thanks in advance.
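The criterion for the third field is a bit involved.
Lexicographic comparison goes character by character, so abc is less than ax; but a longer string is greater, all else being equal. So ab is less than b, but ab is greater than a.
The requirement for the third field thus mixes these two things, and cmp breaks down in the middle. If we were to use cmp then ab comes before b (correct) but aa comes after a (not wanted). I don't see how to use cmp by itself to satisfy this requirement.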
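To make the cmp behavior concrete, here is a small sketch that runs plain cmp on the strings mentioned above. The test strings are only illustrations, not data from the question.

```perl
use strict;
use warnings;

# Plain cmp on the pairs discussed above; it returns -1, 0 or 1
my @pairs = (['abc', 'ax'], ['ab', 'b'], ['ab', 'a'], ['aa', 'a']);
for my $p (@pairs) {
    my ($x, $y) = @$p;
    printf "%-3s cmp %-2s = %2d\n", $x, $y, $x cmp $y;
}

# Sorting with cmp alone puts the shorter prefix first,
# the opposite of what is wanted for the third field
my @sorted = sort { $a cmp $b } qw(b aa ab a);
print "@sorted\n";    # prints: a aa ab b
```

So cmp gets the order between different letters right, but a and aa come out in the wrong order.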
So here is a very basic implementation of it, for these criteria:
use warnings;
use strict;
use feature 'say';
use Path::Tiny qw(path); # convenience
my $file = shift // die "Usage: $0 file\n";
my @lines = path($file)->lines({ chomp => 1 });
my @sorted =
    map { $_->[0] }
    sort { custom_sort($a, $b) }
    map { [$_, split /,/] }
    @lines;
say for @sorted;
sub custom_sort {
    my ($aa, $bb) = @_;

    # Last field for both terms, their lengths
    my ($af, $bf) = map { $_->[-1] } $aa, $bb;
    my ($len_a, $len_b) = map { length } $af, $bf;

    # Strip and return first characters and compare them lexicographically
    # Keep going until a difference is found or one string is depleted
    # (test lengths, not the characters, so that a "0" doesn't end the loop)
    while (length($af) and length($bf)) {
        my $ca = substr $af, 0, 1, "";
        my $cb = substr $bf, 0, 1, "";
        return $ca cmp $cb if $ca ne $cb;
    }

    # One string is contained in the other: the longer one comes first
    return $len_b <=> $len_a if $len_a != $len_b;

    # Still here, so third field was the same; use other two criteria
    return
        $aa->[2] <=> $bb->[2]
        ||
        $aa->[1] cmp $bb->[1];
}
This prints the desired list.
Some comments
Before invoking sort we first form an arrayref with the whole string and its individual fields, so that the string need not be split again on every single comparison; this is a Schwartzian transform
Criterion for the third field: compare character by character alphabetically until a difference is found; if one string is contained in the other then the longer one wins. So the char-by-char comparison of abc and ab stops after b, where ab runs out, and abc 'wins'
The (optional) fourth argument in substr is the replacement for the substring selected by the second and third arguments, and that substring is returned. So here an empty string replaces the one-character substring that starts at 0; that is, it removes and returns the first character. This is quite like using shift on an array
If the third fields are exactly the same, the second fields are compared numerically, and if those are the same too then the first fields are compared alphabetically
After sorting we retrieve the original strings from the sorted arrayrefs
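As a quick illustration of the four-argument substr mentioned in the comments, here is a minimal sketch next to the analogous shift on an array. The variable names and the sample string are made up for this example.

```perl
use strict;
use warnings;

my $str = 'abc';

# Four-argument substr: replace the one-character substring at offset 0
# with the empty string, and return what was removed
my $first = substr $str, 0, 1, "";
print "removed: $first, left: $str\n";      # removed: a, left: bc

# The array analogue: shift removes and returns the first element
my @chars = split //, 'abc';
my $head  = shift @chars;
print "removed: $head, left: @chars\n";     # removed: a, left: b c
```

Note that the string is modified in place, which is why the sort routine works on copies of the third field and not on the arrayref contents.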