I have a file called test, and it contains the following lines:
This is a test Test test test There are multiple tests.
I want the output to be:
test@3 tests@1 multiple@1 is@1 are@1 a@1 This@1 There@1 Test@1
I have the following script:
cat $1 | tr ' ' '\n' > temp # put all words to a new line
echo -n > file2.txt # clear file2.txt
for line in $(cat temp) # trace each line from temp file
do
# check if the current line is visited
grep -q $line file2.txt
if [ $line==$temp]
then
count= expr `$count + 1` #count the number of words
echo $line"@"$count >> file2.txt # add word and frequency to file
fi
done
Use sort | uniq -c | sort -n to build a frequency table. A little more tweaking is needed to get the desired format.
tr ' ' '\n' < "$1" \
| sort \
| uniq -c \
| sort -rn \
| awk '{print $2"@"$1}' \
| tr '\n' ' '
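If the output has to match the requested line exactly, two details of the pipeline may need attention: a trailing period stays attached to its word (tests. rather than tests), and the order of words with the same count depends on the locale. Here is a minimal sketch of one way to handle both. It assumes punctuation should never count as part of a word and that ties should come out in reverse byte order, which is what the expected output appears to show. The function name word_freq is made up for illustration.

```shell
#!/bin/sh
# word_freq FILE: print "word@count" pairs on a single line, most frequent first.
# word_freq is a hypothetical helper name, not part of the original answer.
word_freq() {
    tr -s '[:space:]' '\n' < "$1" |      # one word per line, squeezing runs of whitespace
        tr -d '[:punct:]' |              # assumption: punctuation is never part of a word
        grep -v '^$' |                   # drop lines left empty by the steps above
        LC_ALL=C sort |                  # group identical words (byte order, case-sensitive)
        uniq -c |                        # prefix each distinct word with its count
        LC_ALL=C sort -k1,1nr -k2,2r |   # count descending, ties in reverse byte order
        awk '{ printf "%s%s@%s", sep, $2, $1; sep = " " } END { print "" }'
}
```

Called as word_freq test on the sample file, this should print test@3 tests@1 multiple@1 is@1 are@1 a@1 This@1 There@1 Test@1 followed by a newline, with no trailing space. Pinning LC_ALL=C for both sort calls keeps the tie order the same across machines; drop it if you want the ordering of your own locale instead.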