Sorting a comma-separated file by three columns with custom criteria in Perl

Bil*_*y J 2 sorting perl

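I have a comma-separated text file. I want to sort the file by column 3 first, then by column 2, then by column 1.

However, I want column 3 sorted alphabetically with the longest values first.

For example: AAA, then AA, then A, then BBB, then BB, then B, then CCC, then CC, and so on.

Input (alpha-sort-test2.txt):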

JOHN,1,A
MARY,3,AA
FRED,5,BBB
SAM,7,A
JOHN,3,AAA
JOHN,2,AAA
BETTY,2,AAA
JARROD,7,AAA
JOANNE,2,BB
AMANDA,2,DD
AMY,5,B
PETE,7,CC
MATT,4,B
SARAH,3,CCC
GEORGE,3,CC
AMANDA,3,AAA

The Perl code I have so far is as follows:

$infile = "alpha-sort-test2.txt";
$outfile = "alpha-sort-test-sorted2.txt";

open (INFILE, "<$infile") or die "Could not open file $infile $!";
open (OUTFILE, ">$outfile");

my @array = sort howtosort <INFILE>;

foreach (@array)
{
   chomp;
   print "$_\n";
   print OUTFILE "$_\n"; 
}

sub howtosort 
{
   my @flds_a = split(/,/, $a);
   my @flds_b = split(/,/, $b);

   $flds_a[2] cmp $flds_b[2]; 
}

close INFILE;
close OUTFILE; 

Current output (alpha-sort-test-sorted2.txt):

JOHN,1,A
SAM,7,A
MARY,3,AA
AMANDA,3,AAA
JOHN,3,AAA
JOHN,2,AAA
BETTY,2,AAA
JARROD,7,AAA
AMY,5,B
MATT,4,B
JOANNE,2,BB
FRED,5,BBB
PETE,7,CC
GEORGE,3,CC
SARAH,3,CCC
AMANDA,2,DD

Desired output:

BETTY,2,AAA
JOHN,2,AAA
AMANDA,3,AAA
JOHN,3,AAA
JARROD,7,AAA
MARY,3,AA
JOHN,1,A
SAM,7,A
FRED,5,BBB
JOANNE,2,BB
MATT,4,B
AMY,5,B
SARAH,3,CCC
GEORGE,3,CC
PETE,7,CC
AMANDA,2,DD

Thanks in advance.

zdi*_*dim 5

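The criterion for the third field is a little involved.

Lexicographic comparison goes character by character, so abc is less than ax; but a longer string is greater, all else being equal. So ab is less than b, but ab is greater than a.

The requirement for the third field therefore mixes these two things, and breaks cmp right down the middle. If we were to use cmp, then ab would come before b (correct) but aa would come after a (not wanted). I don't see how to use cmp on its own to satisfy that requirement.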
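To make the point about cmp concrete, here is a small sketch that simply evaluates cmp on the pairs mentioned above. The variable names (@pairs, @results) are made up for this illustration and are not part of the answer's code.

```perl
use strict;
use warnings;

# Pairs taken from the discussion above; cmp returns -1, 0, or 1.
my @pairs = ( [ 'abc', 'ax' ], [ 'ab', 'b' ], [ 'ab', 'a' ], [ 'aa', 'a' ] );

my @results = map { $_->[0] cmp $_->[1] } @pairs;

for my $i (0 .. $#pairs) {
    my ($x, $y) = @{ $pairs[$i] };
    print "'$x' cmp '$y' = $results[$i]\n";
}
```

The last pair is the problem case: cmp puts aa after a, while the question wants the longer value first.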

So here is a very basic implementation of it, for these criteria:

use warnings;
use strict;
use feature 'say';
use Path::Tiny qw(path);  # convenience

my $file = shift // die "Usage: $0 file\n";
my @lines = path($file)->lines({ chomp => 1 });

my @sorted =
    map { $_->[0] }
    sort { custom_sort($a, $b) }
    map { [$_, split /,/]  }
    @lines;

say for @sorted;


sub custom_sort {
    my ($aa, $bb) = @_;

    # Last field for both terms, their lengths
    my ($af, $bf) = map { $_->[-1] } $aa, $bb;
    my ($len_a, $len_b) = map { length } $af, $bf;

    # Strip and return first characters and compare them lexicographically
    # Then compare lengths of original strings if needed
    # Keep going until difference is found or one string is depleted
    # length() keeps the loop going for a "0" character, which is false in Perl
    while (
        length(my $ca = substr $af, 0, 1, "")  and
        length(my $cb = substr $bf, 0, 1, "")    )
    {
        if ($ca gt $cb) {
            return 1
        }
        elsif ($ca lt $cb) {
            return -1;
        }
        elsif ($len_a < $len_b) {
            return 1
        }
        elsif ($len_a > $len_b) {
            return -1
        }
    }

    # Still here, so third field was the same; use other two criteria
    return
        $aa->[2] <=> $bb->[2]
            ||
        $aa->[1] cmp $bb->[1];
}

This prints the desired list.

Some comments

  • Before invoking sort we first form an arrayref holding the whole string and its individual fields, so that the string need not be split again on every single comparison; this is a Schwartzian transform

  • Criterion for the third field: compare character by character alphabetically until a difference is found; when the compared characters are equal but the original strings differ in length, the longer one wins (sorts first). So the comparison of abc and ab is decided at the first shared character, and abc 'wins'

  • The (optional) fourth argument to substr is the replacement for the returned substring, which is located by the second and third arguments. So here an empty string replaces the one-character substring that starts at offset 0 -- it removes and returns the first character. This is much like using shift on an array

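  • If the third fields are exactly the same, the second fields are compared numerically, and if those are the same too, the first fields are compared alphabetically

  • After the comparison, we retrieve the original strings from the sorted arrayrefs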
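The four-argument substr mentioned in the comments may be easier to follow in isolation. The following is a minimal sketch, with made-up variable names ($str, @chars), that strips a string one character at a time in the same way the comparison routine does.

```perl
use strict;
use warnings;

my $str = "abc";
my @chars;

# Four-argument substr: replace the 1-character substring at offset 0
# with "", which removes that character from $str and returns it.
while (length $str) {
    my $first = substr $str, 0, 1, "";
    push @chars, $first;
}

print join(",", @chars), "\n";   # prints: a,b,c
print "left: '$str'\n";          # prints: left: ''
```

Note that substr modifies its first argument here, so the routine above works on copies ($af, $bf) rather than on the fields stored in the arrayrefs.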