How do I use grep to search for lines that contain one of two words, but not both?

Luk*_*ali 13 grep text-processing

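I want to search a text file for lines with 'word1' XOR 'word2'. So it should output lines with word1 or with word2, but not lines that contain both words. I wanted to use XOR, but I don't know how to write that on the Linux command line.

I tried: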

grep 'word1\|word2' text.txt
grep word1 word2 text.txt
grep word1 text.txt | grep word2
grep 'word1\^word2' text.txt

and many more, but none of them worked.

Sté*_*las 17

With GNU awk:

$ printf '%s\n' {foo,bar}{bar,foo} neither | gawk 'xor(/foo/,/bar/)'
foofoo
barbar

Or portably:

awk '((/foo/) + (/bar/)) % 2'
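The portable variant works because each regex match evaluates to 1 or 0 in awk, so the sum is 1 only when exactly one of the two words is present, and `% 2` turns a sum of 2 (both words) back into false. A small sketch that prints this sum next to each line may make that easier to see; the function name match_counts and the sample words are only illustrative:

```shell
# match_counts is a hypothetical helper name used only for this illustration.
# It prefixes every input line with the number of patterns (foo, bar) it matches.
match_counts() {
  awk '{ n = (/foo/) + (/bar/); print n, $0 }'
}

printf '%s\n' foobar foofoo barbar barfoo neither | match_counts
# prints:
# 2 foobar
# 1 foofoo
# 1 barbar
# 2 barfoo
# 0 neither
```

Only the lines with a count of 1 are selected by the `% 2` test.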

With grep implementations that support -P (PCRE):

grep -P '^((?=.*foo)(?!.*bar)|(?=.*bar)(?!.*foo))'

With sed:

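sed '
  # Line contains foo: delete it if it also contains bar, otherwise
  # branch to the end of the script, which prints the line.
  /foo/{
    /bar/d
    b
  }
  # Line does not contain foo: keep it only if it contains bar.
  /bar/!d'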

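If you want to consider whole words only (so that, for instance, neither foo nor bar is found in foobar or barbar), you need to decide how those words are delimited. If they are delimited by any character other than letters, digits and underscore, as the -w option of many grep implementations does, then you can change those to: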

gawk 'xor(/\<foo\>/,/\<bar\>/)'
awk '((/(^|[^[:alnum:]_])foo([^[:alnum:]_]|$)/) + \
      (/(^|[^[:alnum:]_])bar([^[:alnum:]_]|$)/)) % 2'
grep -P '^((?=.*\bfoo\b)(?!.*\bbar\b)|(?=.*\bbar\b)(?!.*\bfoo\b))'

For sed, that becomes a bit more complicated, unless you have a sed implementation such as GNU sed that supports \< and \> as word boundaries, like GNU awk does.
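For such a sed, the whole-word variant might look like the sketch below. It assumes a sed that understands \< and \> (GNU sed does; other implementations may not), and the function name xor_words_sed is made up for this illustration:

```shell
# xor_words_sed is a hypothetical helper name.
# Assumes a sed that supports \< and \> as word boundaries (e.g. GNU sed).
xor_words_sed() {
  sed '
    /\<foo\>/{
      /\<bar\>/d
      b
    }
    /\<bar\>/!d'
}

printf '%s\n' 'foo bar' 'foo baz' 'foobar' 'a bar' | xor_words_sed
# prints:
# foo baz
# a bar
```

Here foobar is dropped because it contains neither foo nor bar as a whole word, and 'foo bar' is dropped because it contains both.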

  • Stephane, please write a book on shell scripting! (6 upvotes)

Gil*_*il' 9

grep 'word1\|word2' text.txt searches for lines containing word1 or word2. This includes lines that contain both.

grep word1 text.txt | grep word2 searches for lines containing word1 and word2. The two words can overlap (e.g. foobar contains foo and ob). Another way to search for lines containing both words, but only in a non-overlapping way, is to search for them in either order: grep 'word1.*word2\|word2.*word1' text.txt

grep word1 text.txt | grep -v word2 searches for lines containing word1 but not word2. The -v option tells grep to keep non-matching lines and remove matching lines, instead of the opposite. This gives you half the results you wanted. By adding the symmetric search, you get all the lines containing exactly one of the words.

grep word1 text.txt | grep -v word2
grep word2 text.txt | grep -v word1
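If you use this pair of commands often, they could be wrapped in a small shell function. This is only a sketch: xor_grep is a hypothetical name, and the sample file is created just for the demonstration. Note that the output is grouped (all word1-only lines first, then all word2-only lines) rather than kept in the file's original order.

```shell
# xor_grep is a hypothetical helper name.
# Usage: xor_grep PATTERN1 PATTERN2 FILE
xor_grep() {
  # Lines matching PATTERN1 but not PATTERN2 ...
  grep -e "$1" -- "$3" | grep -v -e "$2"
  # ... followed by lines matching PATTERN2 but not PATTERN1.
  grep -e "$2" -- "$3" | grep -v -e "$1"
}

# Throwaway sample file, used only for this demonstration.
sample=$(mktemp)
printf '%s\n' 'word2 only' 'word1 only' 'word1 and word2' 'neither' > "$sample"
xor_grep word1 word2 "$sample"
# prints:
# word1 only
# word2 only
rm -f "$sample"
```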

Alternatively, you can start from the lines containing either word, and remove the lines containing both words. Given the building blocks above, this is easy if the words don't overlap.

grep 'word1\|word2' text.txt | grep -v 'word1.*word2\|word2.*word1'
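To see why the non-overlap condition matters, here is a small sketch that reuses the overlapping pair from earlier (foo and ob, which overlap in foobar). The function name either_minus_both is invented for this example:

```shell
# either_minus_both is a hypothetical helper name.
# Keeps lines containing foo or ob, then removes lines where both words
# appear one after the other without overlapping.
either_minus_both() {
  grep 'foo\|ob' | grep -v 'foo.*ob\|ob.*foo'
}

printf '%s\n' foobar food | either_minus_both
# prints:
# foobar
# food
```

The line foobar contains both foo and ob, but because the two overlap, neither foo.*ob nor ob.*foo matches, so the line is not removed. The pair of grep ... | grep -v ... pipelines does not have this problem: grep foo | grep -v ob drops foobar.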