Perl script issue

Kal*_*son 2 arrays perl hash initialization

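The purpose of this script is to process all the words in a file and output every word that occurs the most. So if there are 3 words that each occur 10 times, the program should output all three.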

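The script now runs, thanks to some tips I got here. However, it does not handle large text files (e.g. the New Testament). I'm not sure whether that is my fault or a limitation of the code. I'm sure the program has several other problems as well, so any help would be greatly appreciated.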

#!/usr/bin/perl -w
require 5.10.0;

print "Your file: " . $ARGV[0] . "\n";
#Make sure there is only one argument
if ($#ARGV == 0){

    #Make sure the argument is actually a file
    if (-f $ARGV[0]){

        %wordHash = ();     #New hash to match words with word counts
        $file=$ARGV[0];     #Stores value of argument
        open(FILE, $file) or die "File not opened correctly.";

        #Process through each line of the file
        while (<FILE>){
            chomp;
            #Delimits on any non-alphanumeric
            @words=split(/[^a-zA-Z0-9]/,$_);
            $wordSize = @words;

            #Put all words to lowercase, removes case sensitivity
            for($x=0; $x<$wordSize; $x++){
                $words[$x]=lc($words[$x]);
            }

            #Puts each occurrence of word into hash
            foreach $word(@words){
                $wordHash{$word}++;
            }
        }
        close FILE;

        #$wordHash{$b} <=> $wordHash{$a};
        $wordList="";
        $max=0;

        while (($key, $value) = each(%wordHash)){
            if($value>$max){
                $max=$value;
            }
        }

        while (($key, $value) = each(%wordHash)){
            if($value==$max && $key ne "s"){
                $wordList.=" " . $key;
            }
        }

        #Print solution
        print "The following words occur the most (" . $max . " times): " . $wordList . "\n";
    }
    else {
        print "Error. Your argument is not a file.\n";
    }
}
else {
    print "Error. Use exactly one argument.\n";
}

TLP*_*TLP 6

Your problem lies in the two lines missing from the top of your script:

use strict;
use warnings;

If they had been there, they would have reported a lot of lines like this:

Argument "make" isn't numeric in array element at ...

These come from this line:

$list[$_] = $wordHash{$_} for keys %wordHash;

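Array indices must be numbers, and since your keys are words, this does not work. What happens here is that each string is coerced into a number, and any string that does not begin with a number becomes 0.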
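To make the coercion concrete, here is a minimal sketch that repeats the same assignment on a small hash. The helper name `index_by_key` and the sample data are assumptions made for illustration; neither comes from the question. The `numeric` warning is silenced only so that the demo output stays clean.

```perl
use strict;
use warnings;

# Hypothetical helper: repeats the problematic assignment on any hash.
sub index_by_key {
    my (%counts) = @_;
    my @list;
    no warnings 'numeric';    # suppress "Argument ... isn't numeric" for this demo
    # Each key is used as an array index, so it is converted to a number first.
    $list[$_] = $counts{$_} for keys %counts;
    return @list;
}

# Sample data (assumed): "make" and "perl" both become index 0, "2nd" becomes index 2.
my @list = index_by_key(make => 3, perl => 5, '2nd' => 1);

printf "elements: %d\n", scalar @list;    # 3 slots: index 0, an undefined index 1, index 2
printf "index 2:  %d\n", $list[2];        # the count stored under "2nd"
```

Both "make" and "perl" land on index 0, so one count silently overwrites the other, and which one survives depends on the hash's iteration order.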

Your code reads the data in just fine, although I would write it differently. It is only after that point that your code becomes unwieldy.

As far as I can tell, you are trying to print the most frequently occurring words, in which case you should consider the following code:

use strict;
use warnings;

my %wordHash;
#Make sure there is only one argument
die "Only one argument allowed." unless @ARGV == 1;
while (<>) {    # Use the diamond operator to implicitly open ARGV files
    chomp;
    my @words = grep length,       # disallow empty strings, but keep the word "0"
        map lc,                    # make everything lower case
            split /[^a-zA-Z0-9]/;  # your original split
    foreach my $word (@words) {
        $wordHash{$word}++;
    }
}

for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) {
    printf "%-6s %s\n", $wordHash{$word}, $word;
}

As you can see, you can sort based on the hash values.
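The question asks for only the words tied for the highest count, not the full sorted list. One possible way to get that is sketched below. The helper name `most_frequent` and the sample counts are assumptions for illustration, and `List::Util` is a core module. Only a single pass for the maximum and a `grep` are needed, so no sort over the whole hash is required.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: returns the highest count followed by every word that reaches it.
sub most_frequent {
    my ($counts) = @_;                 # hash reference: word => count
    return (0) unless %$counts;        # empty input: count 0 and no words
    my $max = max values %$counts;
    my @top = grep { $counts->{$_} == $max } keys %$counts;
    return ($max, sort @top);          # sorted so the output order is stable
}

# Sample data (assumed), with three words tied for first place.
my %wordHash = (the => 10, and => 10, of => 10, lamp => 2);
my ($max, @top) = most_frequent(\%wordHash);

print "The following words occur the most ($max times): @top\n";
# prints: The following words occur the most (10 times): and of the
```

In the full script, `%wordHash` would be the hash filled by the `while (<>)` loop rather than the literal used here.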