Extracting words from a file

And*_*ock 13 unix shell scripting

I'm trying to build a dictionary of words from a collection of files. Is there a simple way to print all the words in a file, one per line?

ram*_*ion 25

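You can use grep:

  • -E '\w+' matches words (runs of letters, digits, and underscores)
  • -o prints only the matching part of each line, one match per output line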
% cat temp
Some examples use "The quick brown fox jumped over the lazy dog,"
rather than "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
for example text.
# if you don't care whether words repeat
% grep -o -E '\w+' temp
Some
examples
use
The
quick
brown
fox
jumped
over
the
lazy
dog
rather
than
Lorem
ipsum
dolor
sit
amet
consectetur
adipiscing
elit
for
example
text
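
Note that \w is an extension supported by GNU grep rather than part of POSIX regular expressions. As a rough alternative (a different technique from the grep approach above, not part of the original answer), tr can split the input on anything that is not a word character. This sketch recreates the sample file under the same name, temp, and assumes plain ASCII text:

```shell
# Recreate the sample file used in the answer.
cat > temp <<'EOF'
Some examples use "The quick brown fox jumped over the lazy dog,"
rather than "Lorem ipsum dolor sit amet, consectetur adipiscing elit"
for example text.
EOF

# -c takes the complement of the set (everything that is NOT a letter,
# digit, or underscore); -s squeezes each run of those characters into
# a single newline, leaving one word per line.
tr -cs '[:alnum:]_' '\n' < temp
```

On this sample the output should match the grep listing above. If the input starts with punctuation or whitespace, tr emits one empty first line, which grep -o would not.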

If you want each word printed only once, regardless of case, you can use sort:

  • -u prints each word only once
  • -f tells sort to ignore case when comparing words
# if you only want each word once
% grep -o -E '\w+' temp | sort -u -f
adipiscing
amet
brown
consectetur
dog
dolor
elit
example
examples
for
fox
ipsum
jumped
lazy
Lorem
over
quick
rather
sit
Some
text
than
The
use
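
Since the question asks about a collection of files, the same pipeline can probably be extended along these lines. This is a minimal sketch, assuming GNU grep; the file names part1.txt, part2.txt, and words.txt are hypothetical and exist only for illustration:

```shell
# Two small hypothetical input files standing in for the collection.
printf 'The quick brown fox\n' > part1.txt
printf 'the lazy dog, the Fox\n' > part2.txt

# -h suppresses the "filename:" prefix grep adds when given several files.
# Lowercasing with tr before sorting means sort -u compares identical
# strings, so the dictionary holds a single lowercase spelling per word.
grep -o -h -E '\w+' part1.txt part2.txt \
  | tr '[:upper:]' '[:lower:]' \
  | sort -u > words.txt

cat words.txt
```

Lowercasing first is a design choice, not a requirement: sort -u -f also removes case-insensitive duplicates, but it keeps only one of the original spellings (for example "The" rather than "the"), which may or may not be what a dictionary needs.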