找到与其他单词共同的大多数字母的单词

Keb*_*man 6 perl letter find

我希望Perl(5.8.8)找出哪个单词与数组中的其他单词具有最多共同字母 - 但只能找到位于同一位置的字母.(最好不使用libs.)

以这个单词列表为例:

  • BAKER
  • SALER
  • BALER
  • CARER
  • RUFFR

她的BALER是与其他人共有最多字母的词.它匹配BAKER中的BAxER,SALER中的xALER,CARER中的xAxER和RUFFR中的xxxxR.

我希望Perl能够在具有相同长度和大小写的任意单词列表中为我找到这个单词.似乎我在这里碰到了墙,所以非常感谢帮助!

我到现在为止所尝试过的

目前没有太多的脚本:

use strict;
use warnings; 
my @wordlist = qw(BAKER SALER MALER BARER RUFFR);
foreach my $word (@wordlist) {
    my @letters = split(//, $word);
    # now trip trough each iteration and work magic...
}
Run Code Online (Sandbox Code Playgroud)

在评论的地方,我尝试了几种代码,包括for循环和++变量.到目前为止,我没有尝试过我需要做的事情.

因此,为了更好地解释:我需要的是逐字逐句地测试,对于每个字母位置,找到与列表中其他字母具有最多共同字母的单词,在该字母的位置.

一种可能的方法是首先检查哪个单词在字母位置0处最常见,然后测试字母位置1,依此类推,直到找到总和中具有最多字母的单词.列表中的其他单词.然后我想打印列表,就像一个矩阵,每个字母位置的分数加上每个单词的总分,与DavidO建议的不同.

你实际上最终得到的是每个单词的矩阵,每个字母位置的分数,以及矩阵中每个单词前面的总分.

该计划的目的

呵呵,我不妨这么说:该程序适用于游戏"辐射3"中的黑客终端.:D我的想法是,这是学习Perl同时享受有趣游戏的好方法.

这是我用于研究的Fallout 3终端黑客教程之一:FALLOUT 3:Hacking FAQ v1.2,我已经制作了一个缩短单词列表的程序,如下所示:

#!/usr/bin/perl
# See if one word has equal letters as the other, and how many of them are equal
use strict;
use warnings; 

my $checkword = "APPRECIATION"; # the word to be checked
my $match = 4; # equal to the match you got from testing your checkword
my @checkletters = split(//, $checkword); #/

my @wordlist = qw(
    PARTNERSHIPS
    REPRIMANDING
    CIVILIZATION
    APPRECIATION
    CONVERSATION
    CIRCUMSTANCE
    PURIFICATION
    SECLUSIONIST
    CONSTRUCTION
    DISAPPEARING
    TRANSMISSION
    APPREHENSIVE
    ENCOUNTERING
);

print "$checkword has $match letters in common with:\n";

foreach my $word (@wordlist) {
    next if $word eq $checkword;
    my @letters = split(//, $word);
    my $length = @letters; # determine length of array (how many letters to check)

    my $eq_letters = 0; # reset to 0 for every new word to be tested
    for (my $i = 0; $i < $length; $i++) {
        if ($letters[$i] eq $checkletters[$i]) {
            $eq_letters++;
        }
    }
    if ($eq_letters == $match) {
        print "$word\n";
    }
}
# Now to make a script on to find the best word to check in the first place...
Run Code Online (Sandbox Code Playgroud)

这个脚本将产生CONSTRUCTIONTRANSMISSION作为其结果,就像在游戏FAQ中一样.原始问题的诀窍(以及我自己无法找到的东西)是如何找到最好的词来尝试,即APPRECIATION.

好的,我现在已根据您的帮助提供了我自己的解决方案,并考虑关闭此线程.很多,非常感谢所有的贡献者.你帮助很大,而且我也学到了很多东西.:d

Dav*_*idO 7

这是一种方式.重新阅读你的规范几次,我认为这是你正在寻找的.

值得一提的是,有可能会有不止一个单词具有相同的最高分.从你的列表中只有一个赢家,但有可能在更长的列表中,会有几个同样获胜的单词.该解决方案涉及到这一点.另外,据我所知,只有当字母匹配出现在每个单词的同一列时,才会计算字母匹配.如果是这样的话,这是一个有效的解决方案:

use 5.012;
use strict;
use warnings;
use List::Util 'max';

my @words = qw/
    BAKER
    SALER
    BALER
    CARER
    RUFFR
/;

my @scores;
foreach my $word ( @words ) {
    my $score;
    foreach my $comp_word ( @words ) {
        next if $comp_word eq $word;
        foreach my $pos ( 0 .. ( length $word ) - 1 ) {
            $score++ if substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1);
        }
    }
    push @scores, $score;
}
my $max = max( @scores );
my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;

say "Words with most matches:";
say for @words[@max_ixs];
Run Code Online (Sandbox Code Playgroud)

此解决方案计算每个字母列的每个字母与其他字词匹配的次数.例如:

Words:     Scores:       Because:
ABC        1, 2, 1 = 4   A matched once,  B matched twice, C matched once.
ABD        1, 2, 1 = 4   A matched once,  B matched twice, D matched once.
CBD        0, 2, 1 = 3   C never matched, B matched twice, D matched once.
BAC        0, 0, 1 = 1   B never matched, A never matched, C matched once.
Run Code Online (Sandbox Code Playgroud)

这将为您提供ABC和ABD的获胜者,每个人都有四个位置匹配的分数.即,第一列,第一行与第一行第二行,第三行和第四行匹配的累积时间,以及后续列的依此类推.它可能能够进一步优化,并且重新措辞更短,但我试图保持逻辑相当容易阅读.请享用!

更新/编辑 我想到了它,并意识到尽管我现有的方法完全按照原始问题的要求进行,但它在O(n ^ 2)时间内完成,这相对较慢.但是如果我们对每个列的字母使用哈希键(每个键一个字母),并计算每个字母在列中出现的次数(作为哈希元素的值),我们可以在O(1)中进行求和. )时间,以及我们在O(n*c)时间内遍历列表(其中c是列数,n是单词数).还有一些设置时间(创建哈希).但我们仍然有很大的进步.这是每种技术的新版本,以及每种技术的基准比较.

use strict;
use warnings;
use List::Util qw/ max sum /;
use Benchmark qw/ cmpthese /;

my @words = qw/
    PARTNERSHIPS
    REPRIMANDING
    CIVILIZATION
    APPRECIATION
    CONVERSATION
    CIRCUMSTANCE
    PURIFICATION
    SECLUSIONIST
    CONSTRUCTION
    DISAPPEARING
    TRANSMISSION
    APPREHENSIVE
    ENCOUNTERING
/;


# Just a test run for each solution.
my( $top, $indexes_ref );

($top, $indexes_ref ) = find_top_matches_force( \@words );
print "Testing force method: $top matches.\n";
print "@words[@$indexes_ref]\n";

( $top, $indexes_ref ) = find_top_matches_hash( \@words );
print "Testing hash  method: $top matches.\n";
print "@words[@$indexes_ref]\n";



my $count = 20000;
cmpthese( $count, {
    'Hash'  => sub{ find_top_matches_hash( \@words ); },
    'Force' => sub{ find_top_matches_force( \@words ); },
} );


sub find_top_matches_hash {
    my $words = shift;
    my @scores;
    my $columns;
    my $max_col = max( map { length $_ } @{$words} ) - 1;
    foreach my $col_idx ( 0 .. $max_col ) {
        $columns->[$col_idx]{ substr $_, $col_idx, 1 }++ 
            for @{$words};
    }
    foreach my $word ( @{$words} ) {
        my $score = sum( 
            map{ 
                $columns->[$_]{ substr $word, $_, 1 } - 1
            } 0 .. $max_col
        );
        push @scores, $score;
    }
    my $max = max( @scores );
    my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
    return(  $max, \@max_ixs );
}


sub find_top_matches_force {
    my $words = shift;
    my @scores;
    foreach my $word ( @{$words} ) {
        my $score;
        foreach my $comp_word ( @{$words} ) {
            next if $comp_word eq $word;
            foreach my $pos ( 0 .. ( length $word ) - 1 ) {
                $score++ if 
                    substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1);
            }
        }
        push @scores, $score;
    }
    my $max = max( @scores );
    my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
    return( $max, \@max_ixs );
}
Run Code Online (Sandbox Code Playgroud)

输出是:

Testing force method: 39 matches.
APPRECIATION
Testing hash  method: 39 matches.
APPRECIATION
        Rate Force  Hash
Force 2358/s    --  -74%
Hash  9132/s  287%    --
Run Code Online (Sandbox Code Playgroud)

在您看到提供的其他选项后,我意识到您的原始规格发生了变化,这在某种程度上是创新的本质,但这个难题仍然存在于我的脑海中.如您所见,我的哈希方法比原始方法快287%.在更短的时间内更有趣!


yst*_*sth 5

作为起点,您可以有效地检查他们共有多少个字母:

$count = ($word1 ^ $word2) =~ y/\0//;
Run Code Online (Sandbox Code Playgroud)

但这只有在你遍历所有可能的单词对时才有用,在这种情况下这是不必要的:

use strict;
use warnings;
my @words = qw/
    BAKER
    SALER
    BALER
    CARER
    RUFFR
/;

# you want a hash to indicate which letters are present how many times in each position:

my %count;
for my $word (@words) {
    my @letters = split //, $word;
    $count{$_}{ $letters[$_] }++ for 0..$#letters;
}

# then for any given word, you get the count for each of its letters minus one (because the word itself is included in the count), and see if it is a maximum (so far) for any position or for the total:

my %max_common_letters_count;
my %max_common_letters_words;
for my $word (@words) {
    my @letters = split //, $word;
    my $total;
    for my $position (0..$#letters, 'total') {
        my $count;
        if ( $position eq 'total' ) {
            $count = $total;
        }
        else {
            $count = $count{$position}{ $letters[$position] } - 1;
            $total += $count;
        }
        if ( ! $max_common_letters_count{$position} || $count >= $max_common_letters_count{$position} ) {
            if ( $max_common_letters_count{$position} && $count == $max_common_letters_count{$position} ) {
                push @{ $max_common_letters_words{$position} }, $word;
            }
            else {
                $max_common_letters_count{$position} = $count;
                $max_common_letters_words{$position} = [ $word ];
            }
        }
    }
}

# then show the maximum words for each position and in total: 

for my $position ( sort { $a <=> $b } grep $_ ne 'total', keys %max_common_letters_count ) {
    printf( "Position %s had a maximum of common letters of %s in words: %s\n",
        $position,
        $max_common_letters_count{$position},
        join(', ', @{ $max_common_letters_words{$position} })
    );
}
printf( "The maximum total common letters was %s in words(s): %s\n",
    $max_common_letters_count{'total'},
    join(', ', @{ $max_common_letters_words{'total'} })
);
Run Code Online (Sandbox Code Playgroud)