Count the lines that contain each word

Question

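I have a file with many lines. For each word that occurs anywhere in the file, I want to know how many lines contain that word. For example, given this file: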

0 hello world the man is world
1 this is the world
2 a different man is the possible one

The result I expect is:

0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2

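Note that the count for "world" is 2, not 3, because the word appears on only 2 lines. So simply converting whitespace to newlines (and counting the result) is not the right solution.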
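
To make the distinction concrete, here is a minimal bash sketch. The function names naive_count and count_lines_per_word are hypothetical, and it assumes words are separated by blanks and that byte-order (LC_ALL=C) sorting is acceptable. The naive version turns every blank into a newline and counts occurrences, so "world" comes out as 3; the second version removes duplicates within each line before counting, so "world" counts as 2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the problem; both read text on stdin
# and print "word:count" lines in byte order (LC_ALL=C).

# Naive: one word per line, then count occurrences.
# This counts how often a word occurs, not how many lines contain it.
naive_count() {
  tr -s '[:blank:]' '\n' | LC_ALL=C sort | uniq -c |
    while read -r n w; do printf '%s:%s\n' "$w" "$n"; done
}

# Per-line dedup: emit each distinct word of a line once, then count.
count_lines_per_word() {
  local -a words
  while read -ra words; do
    (( ${#words[@]} )) || continue            # skip blank lines
    printf '%s\n' "${words[@]}" | LC_ALL=C sort -u
  done | LC_ALL=C sort | uniq -c |
    while read -r n w; do printf '%s:%s\n' "$w" "$n"; done
}
```

Usage: count_lines_per_word < file. The only difference between the two is the per-line `sort -u`, which is exactly the "count lines, not occurrences" requirement.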

Answer 1

ste*_*ver 5

Another Perl variant, using List::Util (its uniq function requires List::Util 1.45 or later):

$ perl -MList::Util=uniq -alne '
  map { $h{$_}++ } uniq @F }{ for $k (sort keys %h) {print "$k: $h{$k}"}
' file
0: 1
1: 1
2: 1
a: 1
different: 1
hello: 1
is: 3
man: 2
one: 1
possible: 1
the: 3
this: 1
world: 2

Answer 2

gle*_*man 5

A straightforward approach in bash (associative arrays require bash 4 or later):

declare -A wordcount
while read -ra words; do 
    # unique words on this line
    declare -A uniq
    for word in "${words[@]}"; do 
        uniq[$word]=1
    done
    # accumulate the words
    for word in "${!uniq[@]}"; do 
        ((wordcount[$word]++))
    done
    unset uniq
done < file

Inspect the resulting data:

$ declare -p wordcount
declare -A wordcount='([possible]="1" [one]="1" [different]="1" [this]="1" [a]="1" [hello]="1" [world]="2" [man]="2" [0]="1" [1]="1" [2]="1" [is]="3" [the]="3" )'

Then format it as required:

$ printf "%s\n" "${!wordcount[@]}" | sort | while read -r key; do echo "$key:${wordcount[$key]}"; done
0:1
1:1
2:1
a:1
different:1
hello:1
is:3
man:2
one:1
possible:1
the:3
this:1
world:2
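
The counting loop and the formatting step above can be folded into one reusable function. This is a minimal sketch, assuming bash 4 or later and input words that are plain tokens; the name count_words_by_line is hypothetical. It resets the per-line set with seen=() instead of unset, and increments with an assignment rather than ((...++)), which avoids a non-zero exit status on a word's first increment if the caller uses set -e.

```shell
#!/usr/bin/env bash
# Hypothetical function: reads text on stdin and prints "word:count" sorted
# in byte order, where count is the number of lines containing the word.
count_words_by_line() {
  local -A counts=() seen=()
  local -a words
  local word key
  while read -ra words; do
    seen=()                                   # reset the per-line set
    for word in "${words[@]}"; do
      seen[$word]=1
    done
    for word in "${!seen[@]}"; do             # each distinct word once per line
      counts[$word]=$(( ${counts[$word]:-0} + 1 ))
    done
  done
  (( ${#counts[@]} )) || return 0             # nothing to print for empty input
  printf '%s\n' "${!counts[@]}" | LC_ALL=C sort |
    while IFS= read -r key; do
      printf '%s:%s\n' "$key" "${counts[$key]}"
    done
}
```

Usage: count_words_by_line < file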

Archived:	6 years, 8 months ago
Views:	1099
Last recorded:	6 years, 8 months ago