Trying to find the frequency of words in a file using a script

Question

My file is called test, and it contains the following text:

This is a test Test test test There are multiple tests.

I want the output to be:

test@3 tests@1 multiple@1 is@1 are@1 a@1 This@1 There@1 Test@1

I have the following script:

    cat $1 | tr ' ' '\n' > temp # put all words to a new line
    echo -n > file2.txt # clear file2.txt
    for line in $(cat temp)  # trace each line from temp file
    do
        # check if the current line is visited
        grep -q $line file2.txt
        if [ $line==$temp]
        then
            count= expr `$count + 1` #count the number of words
            echo $line"@"$count >> file2.txt # add word and frequency to file
        fi
    done
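
The script above never counts anything. `count= expr ...` runs the backquoted `$count + 1` as a command and then runs expr with count set to an empty string, so count is never incremented. `[ $line==$temp]` has no spaces around == and none before the closing ], so [ reports a missing ] and the condition fails. In addition, temp is the name of a file, not a variable, and the result of grep -q is never checked. The sketch below keeps the loop-based idea but stores the counts in a bash associative array. It assumes bash 4 or later, and the function name word_freq_loop is hypothetical. It prints words in order of first appearance; sorting by frequency is what the answer's pipeline adds.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash 4+ for associative arrays): count word frequencies
# in a loop, like the question's script, without temporary files.
# word_freq_loop is a hypothetical name used only for illustration.
word_freq_loop() {
    local file=$1
    declare -A count=()   # word -> number of occurrences (local to the function)
    local -a order=()     # distinct words in order of first appearance
    local -a words
    local word n

    # read -a splits each line on whitespace into the array "words";
    # the || test also handles a last line without a trailing newline
    while read -r -a words || [[ -n ${words[*]} ]]; do
        for word in "${words[@]}"; do
            if [[ -z ${count[$word]+set} ]]; then
                order+=("$word")          # first time this word is seen
            fi
            n=${count[$word]:-0}
            count[$word]=$(( n + 1 ))     # increment this word's count
        done
    done < "$file"

    # collect word@count pairs and print them separated by spaces
    local -a out=()
    for word in "${order[@]}"; do
        out+=("$word@${count[$word]}")
    done
    echo "${out[*]}"
}
```

For the file test from the question, word_freq_loop test prints the counts in first-appearance order, with test@3 and every other word (including tests., punctuation and all) at 1.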

Answer 1

cho*_*oba 5

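Use sort | uniq -c | sort -n to build a frequency table (adding -r, as below, lists the most frequent words first). A little more processing turns it into the format you want: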

 tr ' ' '\n' < "$1" \
 | sort \
 | uniq -c \
 | sort -rn \
 | awk '{print $2"@"$1}' \
 | tr '\n' ' '
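
The pipeline above keeps punctuation, so the last word is counted as tests. rather than tests as in the requested output. It also ends with a trailing space and no final newline because of tr '\n' ' '. The variant below is a sketch under two assumptions: punctuation should be dropped (note that this also turns words like don't into dont), and words may be separated by any run of whitespace. The function name word_freq is hypothetical. Words with equal counts may come out in a different order depending on the locale.

```shell
#!/usr/bin/env bash
# Sketch: same sort | uniq -c | sort -rn idea as the answer, assuming
# punctuation should be removed and any whitespace separates words.
# word_freq is a hypothetical name used only for illustration.
word_freq() {
    tr -d '[:punct:]' < "$1" |   # drop punctuation such as the final "."
    tr -s '[:space:]' '\n' |     # one word per line, squeezing repeated blanks
    grep -v '^$' |               # skip an empty line left by leading whitespace
    sort |                       # group identical words together
    uniq -c |                    # prefix each distinct word with its count
    sort -rn |                   # highest counts first
    # print word@count pairs separated by single spaces, then one newline
    awk '{ printf "%s%s@%s", (NR > 1 ? " " : ""), $2, $1 } END { print "" }'
}
```

Printing the separators from awk, rather than converting newlines with tr, avoids the trailing space and gives the output a normal final newline.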

Archived:	7 years, 7 months ago
Views:	6669
Last activity:	7 years, 7 months ago