我希望Perl(5.8.8)找出哪个单词与数组中的其他单词具有最多共同字母 - 但只能找到位于同一位置的字母.(最好不使用libs.)
以这个单词列表为例:
她的BALER是与其他人共有最多字母的词.它匹配BAKER中的BAxER,SALER中的xALER,CARER中的xAxER和RUFFR中的xxxxR.
我希望Perl能够在具有相同长度和大小写的任意单词列表中为我找到这个单词.似乎我在这里碰到了墙,所以非常感谢帮助!
目前没有太多的脚本:
use strict;
use warnings;
my @wordlist = qw(BAKER SALER MALER BARER RUFFR);
foreach my $word (@wordlist) {
my @letters = split(//, $word);
# now trip trough each iteration and work magic...
}
Run Code Online (Sandbox Code Playgroud)
在评论的地方,我尝试了几种代码,包括for循环和++变量.到目前为止,我没有尝试过我需要做的事情.
因此,为了更好地解释:我需要的是逐字逐句地测试,对于每个字母位置,找到与列表中其他字母具有最多共同字母的单词,在该字母的位置.
一种可能的方法是首先检查哪个单词在字母位置0处最常见,然后测试字母位置1,依此类推,直到找到总和中具有最多字母的单词.列表中的其他单词.然后我想打印列表,就像一个矩阵,每个字母位置的分数加上每个单词的总分,与DavidO建议的不同.
你实际上最终得到的是每个单词的矩阵,每个字母位置的分数,以及矩阵中每个单词前面的总分.
呵呵,我不妨这么说:该程序适用于游戏"辐射3"中的黑客终端.:D我的想法是,这是学习Perl同时享受有趣游戏的好方法.
这是我用于研究的Fallout 3终端黑客教程之一:FALLOUT 3:Hacking FAQ v1.2,我已经制作了一个缩短单词列表的程序,如下所示:
#!/usr/bin/perl
# See if one word has equal letters as the other, and how many of them are equal
use strict;
use warnings;
my $checkword = "APPRECIATION"; # the word to be checked
my $match = 4; # equal to the match you got from testing your checkword
my @checkletters = split(//, $checkword); #/
my @wordlist = qw(
PARTNERSHIPS
REPRIMANDING
CIVILIZATION
APPRECIATION
CONVERSATION
CIRCUMSTANCE
PURIFICATION
SECLUSIONIST
CONSTRUCTION
DISAPPEARING
TRANSMISSION
APPREHENSIVE
ENCOUNTERING
);
print "$checkword has $match letters in common with:\n";
foreach my $word (@wordlist) {
next if $word eq $checkword;
my @letters = split(//, $word);
my $length = @letters; # determine length of array (how many letters to check)
my $eq_letters = 0; # reset to 0 for every new word to be tested
for (my $i = 0; $i < $length; $i++) {
if ($letters[$i] eq $checkletters[$i]) {
$eq_letters++;
}
}
if ($eq_letters == $match) {
print "$word\n";
}
}
# Now to make a script on to find the best word to check in the first place...
Run Code Online (Sandbox Code Playgroud)
这个脚本将产生CONSTRUCTION
并TRANSMISSION
作为其结果,就像在游戏FAQ中一样.原始问题的诀窍(以及我自己无法找到的东西)是如何找到最好的词来尝试,即APPRECIATION
.
好的,我现在已根据您的帮助提供了我自己的解决方案,并考虑关闭此线程.很多,非常感谢所有的贡献者.你帮助很大,而且我也学到了很多东西.:d
这是一种方式.重新阅读你的规范几次,我认为这是你正在寻找的.
值得一提的是,有可能会有不止一个单词具有相同的最高分.从你的列表中只有一个赢家,但有可能在更长的列表中,会有几个同样获胜的单词.该解决方案涉及到这一点.另外,据我所知,只有当字母匹配出现在每个单词的同一列时,才会计算字母匹配.如果是这样的话,这是一个有效的解决方案:
use 5.012;
use strict;
use warnings;
use List::Util 'max';
my @words = qw/
BAKER
SALER
BALER
CARER
RUFFR
/;
my @scores;
foreach my $word ( @words ) {
my $score;
foreach my $comp_word ( @words ) {
next if $comp_word eq $word;
foreach my $pos ( 0 .. ( length $word ) - 1 ) {
$score++ if substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1);
}
}
push @scores, $score;
}
my $max = max( @scores );
my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
say "Words with most matches:";
say for @words[@max_ixs];
Run Code Online (Sandbox Code Playgroud)
此解决方案计算每个字母列的每个字母与其他字词匹配的次数.例如:
Words: Scores: Because:
ABC 1, 2, 1 = 4 A matched once, B matched twice, C matched once.
ABD 1, 2, 1 = 4 A matched once, B matched twice, D matched once.
CBD 0, 2, 1 = 3 C never matched, B matched twice, D matched once.
BAC 0, 0, 1 = 1 B never matched, A never matched, C matched once.
Run Code Online (Sandbox Code Playgroud)
这将为您提供ABC和ABD的获胜者,每个人都有四个位置匹配的分数.即,第一列,第一行与第一行第二行,第三行和第四行匹配的累积时间,以及后续列的依此类推.它可能能够进一步优化,并且重新措辞更短,但我试图保持逻辑相当容易阅读.请享用!
更新/编辑 我想到了它,并意识到尽管我现有的方法完全按照原始问题的要求进行,但它在O(n ^ 2)时间内完成,这相对较慢.但是如果我们对每个列的字母使用哈希键(每个键一个字母),并计算每个字母在列中出现的次数(作为哈希元素的值),我们可以在O(1)中进行求和. )时间,以及我们在O(n*c)时间内遍历列表(其中c是列数,n是单词数).还有一些设置时间(创建哈希).但我们仍然有很大的进步.这是每种技术的新版本,以及每种技术的基准比较.
use strict;
use warnings;
use List::Util qw/ max sum /;
use Benchmark qw/ cmpthese /;
my @words = qw/
PARTNERSHIPS
REPRIMANDING
CIVILIZATION
APPRECIATION
CONVERSATION
CIRCUMSTANCE
PURIFICATION
SECLUSIONIST
CONSTRUCTION
DISAPPEARING
TRANSMISSION
APPREHENSIVE
ENCOUNTERING
/;
# Just a test run for each solution.
my( $top, $indexes_ref );
($top, $indexes_ref ) = find_top_matches_force( \@words );
print "Testing force method: $top matches.\n";
print "@words[@$indexes_ref]\n";
( $top, $indexes_ref ) = find_top_matches_hash( \@words );
print "Testing hash method: $top matches.\n";
print "@words[@$indexes_ref]\n";
my $count = 20000;
cmpthese( $count, {
'Hash' => sub{ find_top_matches_hash( \@words ); },
'Force' => sub{ find_top_matches_force( \@words ); },
} );
sub find_top_matches_hash {
my $words = shift;
my @scores;
my $columns;
my $max_col = max( map { length $_ } @{$words} ) - 1;
foreach my $col_idx ( 0 .. $max_col ) {
$columns->[$col_idx]{ substr $_, $col_idx, 1 }++
for @{$words};
}
foreach my $word ( @{$words} ) {
my $score = sum(
map{
$columns->[$_]{ substr $word, $_, 1 } - 1
} 0 .. $max_col
);
push @scores, $score;
}
my $max = max( @scores );
my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
return( $max, \@max_ixs );
}
sub find_top_matches_force {
my $words = shift;
my @scores;
foreach my $word ( @{$words} ) {
my $score;
foreach my $comp_word ( @{$words} ) {
next if $comp_word eq $word;
foreach my $pos ( 0 .. ( length $word ) - 1 ) {
$score++ if
substr( $word, $pos, 1 ) eq substr( $comp_word, $pos, 1);
}
}
push @scores, $score;
}
my $max = max( @scores );
my ( @max_ixs ) = grep { $scores[$_] == $max } 0 .. $#scores;
return( $max, \@max_ixs );
}
Run Code Online (Sandbox Code Playgroud)
输出是:
Testing force method: 39 matches.
APPRECIATION
Testing hash method: 39 matches.
APPRECIATION
Rate Force Hash
Force 2358/s -- -74%
Hash 9132/s 287% --
Run Code Online (Sandbox Code Playgroud)
在您看到提供的其他选项后,我意识到您的原始规格发生了变化,这在某种程度上是创新的本质,但这个难题仍然存在于我的脑海中.如您所见,我的哈希方法比原始方法快287%.在更短的时间内更有趣!
作为起点,您可以有效地检查他们共有多少个字母:
$count = ($word1 ^ $word2) =~ y/\0//;
Run Code Online (Sandbox Code Playgroud)
但这只有在你遍历所有可能的单词对时才有用,在这种情况下这是不必要的:
use strict;
use warnings;
my @words = qw/
BAKER
SALER
BALER
CARER
RUFFR
/;
# you want a hash to indicate which letters are present how many times in each position:
my %count;
for my $word (@words) {
my @letters = split //, $word;
$count{$_}{ $letters[$_] }++ for 0..$#letters;
}
# then for any given word, you get the count for each of its letters minus one (because the word itself is included in the count), and see if it is a maximum (so far) for any position or for the total:
my %max_common_letters_count;
my %max_common_letters_words;
for my $word (@words) {
my @letters = split //, $word;
my $total;
for my $position (0..$#letters, 'total') {
my $count;
if ( $position eq 'total' ) {
$count = $total;
}
else {
$count = $count{$position}{ $letters[$position] } - 1;
$total += $count;
}
if ( ! $max_common_letters_count{$position} || $count >= $max_common_letters_count{$position} ) {
if ( $max_common_letters_count{$position} && $count == $max_common_letters_count{$position} ) {
push @{ $max_common_letters_words{$position} }, $word;
}
else {
$max_common_letters_count{$position} = $count;
$max_common_letters_words{$position} = [ $word ];
}
}
}
}
# then show the maximum words for each position and in total:
for my $position ( sort { $a <=> $b } grep $_ ne 'total', keys %max_common_letters_count ) {
printf( "Position %s had a maximum of common letters of %s in words: %s\n",
$position,
$max_common_letters_count{$position},
join(', ', @{ $max_common_letters_words{$position} })
);
}
printf( "The maximum total common letters was %s in words(s): %s\n",
$max_common_letters_count{'total'},
join(', ', @{ $max_common_letters_words{'total'} })
);
Run Code Online (Sandbox Code Playgroud)