
How do I create a frequency list of every word in a file?

I have a file like this:

This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.


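I want to generate a two-column list. The first column shows each word that occurs, and the second column shows how often it occurs, for example: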

this@1
is@1
a@1
file@1
with@1
many@1
words@3
some@2
of@2
the@2
only@1
appear@2
more@1
than@1
one@1
once@1
time@1


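To make this simpler, I will remove all punctuation and convert all text to lowercase before processing the list.
Unless there is a simple solution, "words" and "word" can be counted as two separate words.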
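
The normalization step described above could be sketched roughly as follows. This is a minimal sketch, assuming plain ASCII text; the function name `normalize_words` is hypothetical and does not come from the question. Note that deleting punctuation also turns a word such as "don't" into "dont".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the question): read text on stdin and
# write one lowercase, punctuation-free word per line.
normalize_words() {
    tr '[:upper:]' '[:lower:]' |   # convert everything to lowercase
    tr -d '[:punct:]' |            # delete punctuation characters
    tr -s '[:space:]' '\n' |       # turn each run of whitespace into one newline
    sed '/^$/d'                    # drop any empty lines that remain
}

# Example usage with a line similar to the sample file.
printf 'This is a file, with MANY words.\n' | normalize_words
```

Because the helper reads stdin and writes stdout, the original file is left untouched, unlike an in-place `sed -i`.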

So far, I have this:

sed -i "s/ /\n/g" ./file1.txt # put all words on a new line
while read line
do
     count="$(grep -c $line file1.txt)"
     echo $line"@"$count >> file2.txt # add word and frequency to …
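
One likely problem with the loop above is that `grep -c` counts the lines that contain the pattern anywhere, so a short word such as `a` is also counted inside `many` or `appear`. A common alternative is to sort the words and count adjacent duplicates with `uniq -c`. The sketch below assumes the input already has one normalized word per line and uses the `word@count` format from the question; the function name `count_words` is hypothetical, and the output is in alphabetical order rather than the order shown in the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the question): read one word per line on
# stdin and print "word@count" for each distinct word.
count_words() {
    sort |                          # group identical words on adjacent lines
    uniq -c |                       # prefix each distinct word with its count
    awk '{ print $2 "@" $1 }'       # reorder "count word" into "word@count"
}

# Example usage with a few words from the sample file.
printf '%s\n' some words some of words words | count_words
```

This reads the word list once instead of running `grep` once per word, and it lists each word a single time.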


bash file-io grep sed

Vil*_*age

2014 12-15

33
votes

5
answers

50k
views
