Why doesn't `^[ ]{0,}` work with grep on Linux?

Question

Here is my sample text. `grep w`, `grep ^w` and `grep '^[ ]w'` all work fine.

[user@linux ~]$ grep w text.txt
whitespace 0
 whitespace 1
  whitespace 2
[user@linux ~]$

[user@linux ~]$ grep ^w text.txt
whitespace 0
[user@linux ~]$

With 1 space:

[user@linux ~]$ grep '^[ ]w' text.txt
 whitespace 1
[user@linux ~]$

With 2 spaces inside the brackets, but I get the same output:

[user@linux ~]$ grep '^[  ]w' text.txt
 whitespace 1
[user@linux ~]$

According to https://regex101.com/, `^[ ]{0,}` is the correct syntax for finding spaces at the beginning of a line. However, it does not work with GNU grep on Linux. I get the error `Invalid regular expression`:

[user@linux ~]$ grep ^[ ]{0,}w text.txt
grep: Invalid regular expression
[user@linux ~]$

These return nothing at all:

[user@linux ~]$ grep '^[ ]{0}w' text.txt
[user@linux ~]$ grep '^[ ]{1}w' text.txt
[user@linux ~]$ grep '^[ ]{2}w' text.txt
[user@linux ~]$ grep '^[ ]{0,}w' text.txt
[user@linux ~]$

Question: can `^[ ]{0,}` be used with GNU grep? If so, what is wrong with my syntax above?

Answer 1

ter*_*don 5

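There are several issues here. First, the expression `^[ ]w` means: match the start of the line, then exactly one space, then a `w`. So it is actually working as intended. Putting two spaces inside the brackets changes nothing, because `[  ]` is still a character class that matches a single space. If you want it to match one or more spaces, you need to add a quantifier after the `[ ]` character class: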

$ grep '^[  ]\+w' text.txt
 whitespace 1
  whitespace 2

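The `+` means "one or more". The default regular expression flavor used by `grep` is BRE (Basic Regular Expressions), and in that flavor the `+` needs to be escaped, hence the `\+` above [*]. Alternatively, you can use ERE (Extended Regular Expressions) by passing the `-E` flag, or PCRE (Perl Compatible Regular Expressions) by passing the `-P` flag. With these flavors, you don't need to escape the `+` for it to act as a quantifier: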

$ grep -P '^[  ]+w' text.txt
 whitespace 1
  whitespace 2
$ grep -E '^[  ]+w' text.txt
 whitespace 1
  whitespace 2

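The difference between the flavors can be checked with a small script. This is only a sketch: it assumes GNU grep, and it recreates the three sample lines in a temporary file instead of using the real text.txt. `-P` is left out because not every grep build includes PCRE support.

```shell
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
printf 'whitespace 0\n whitespace 1\n  whitespace 2\n' > "$tmp"

# grep -c prints the number of matching lines.
bre_plain=$(grep -c '^ +w' "$tmp")      # BRE: unescaped + is a literal plus sign
bre_escaped=$(grep -c '^ \+w' "$tmp")   # GNU BRE: \+ means "one or more"
ere=$(grep -c -E '^ +w' "$tmp")         # ERE: + is a quantifier without escaping

echo "BRE, unescaped +: $bre_plain matching lines"
echo "BRE, escaped \\+:  $bre_escaped matching lines"
echo "ERE, unescaped +: $ere matching lines"

rm -f "$tmp"
```

With the sample lines above, the unescaped BRE pattern should find nothing, because no line contains a literal `+`, while the other two should each find the two indented lines.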

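The next issue, and the more important one, is that you are not quoting your regular expression. Quotes are needed to ensure that the regular expression is passed to `grep` as is, rather than first being interpreted by the shell. Since you didn't quote it, the shell expands it before passing it to `grep`. You can check this by using `set -x`, which makes the shell print what it is actually executing: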

$ set -x
$ grep ^[ ]{0,}w text.txt
+ grep '^[' ']0w' ']w' text.txt
grep: Invalid regular expression

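First, because there is a space between `^[` and `]`, the shell treats them as two separate arguments: `^[` and `]{0,}w`. On top of that, `{}` is used by the shell for brace expansion. For example: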

$ echo foo{a,b}
fooa foob

But when the second part of the expansion is empty, you get:

$ echo foo{a,}
fooa foo

So `]{0,}w` expands to:

$ echo ]{0,}w
]0w ]w

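As a result, as you can see in the `set -x` output above, `grep` never sees your pattern in one piece. It takes the first argument, `^[`, as the pattern (an unterminated bracket expression, hence the error) and treats the rest as file names. These three arguments are what is actually passed to `grep` ahead of `text.txt`: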

'^[' ']0w' ']w'

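The same effect can be seen without `grep` at all. The sketch below uses a hypothetical helper, `show_args`, which is not part of any tool and simply prints the arguments it receives. The unquoted result depends on the shell: bash performs brace expansion, while a plain POSIX sh may only split on the space.

```shell
# Hypothetical helper: print the argument count, then each argument in brackets.
show_args() {
    echo "count: $#"
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted: the shell passes the pattern through untouched, as a single argument.
show_args '^[ ]{0,}w'

# Unquoted: the shell splits on the space, and bash also brace-expands {0,},
# so the helper receives several arguments instead of one pattern.
show_args ^[ ]{0,}w
```

The quoted call reports a single argument. In bash, the unquoted call should report the same three pieces that `set -x` showed above.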

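But even once you quote the pattern, the braces need to be escaped when using BRE, just like `+` above. Unescaped, `{` and `}` are matched as literal characters, which is why your quoted `{0}`, `{1}`, `{2}` and `{0,}` attempts returned nothing. With the braces escaped: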

$ grep '^[ ]\{2\}w' text.txt
  whitespace 2

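To see that unescaped braces really are literal in BRE, here is a small sketch, assuming GNU grep. It adds one extra line containing actual `{2}` characters to the sample data; that line is made up for the illustration and is not in the original text.txt.

```shell
tmp=$(mktemp)
# The three sample lines plus one line that contains literal braces.
printf 'whitespace 0\n whitespace 1\n  whitespace 2\n {2}w literal braces\n' > "$tmp"

literal=$(grep '^ {2}w' "$tmp")      # BRE: { and } are ordinary characters
interval=$(grep '^ \{2\}w' "$tmp")   # BRE: \{2\} means "exactly two of the previous item"
ere=$(grep -E '^ {2}w' "$tmp")       # ERE: {2} is an interval without escaping

echo "BRE unescaped: $literal"
echo "BRE escaped:   $interval"
echo "ERE:           $ere"

rm -f "$tmp"
```

The unescaped BRE pattern only finds the line that literally contains `{2}`, while the escaped BRE pattern and the ERE pattern both find the line with two leading spaces.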

One last point: `[ ]` is exactly the same as a plain space. There is no point in using a character class for a single character.



Putting all of this together, to match exactly one space at the beginning of the line, use:

$ grep '^ w' text.txt 
 whitespace 1

To match one or more, use:

$ grep '^ \+w' text.txt 
 whitespace 1
  whitespace 2

Or:

$ grep -E '^ +w' text.txt 
 whitespace 1
  whitespace 2

Or:

$ grep -P '^ +w' text.txt 
 whitespace 1
  whitespace 2

To match a specific range of counts (for example, anywhere from 0 to 3 spaces):

$ grep '^ \{0,3\}w' text.txt 
whitespace 0
 whitespace 1
  whitespace 2

Or:

$ grep -P '^ {0,3}w' text.txt 
whitespace 0
 whitespace 1
  whitespace 2

Or:

$ grep -E '^ {0,3}w' text.txt 
whitespace 0
 whitespace 1
  whitespace 2

To match a specific number, put that number inside `{}` as shown above, or just repeat the character N times:

$ grep '^ \{2\}w' text.txt
  whitespace 2
$ grep '^ w' text.txt
 whitespace 1
$ grep '^  w' text.txt
  whitespace 2

And always quote your regular expressions!

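For the `{0,}` quantifier from the question itself, a sketch along the same lines, assuming GNU grep and a temporary copy of the sample lines (plus one made-up line that should not match), might look like this:

```shell
tmp=$(mktemp)
printf 'whitespace 0\n whitespace 1\n  whitespace 2\nno match here\n' > "$tmp"

# Three equivalent ways to say "zero or more leading spaces, then w".
bre_interval=$(grep -c '^ \{0,\}w' "$tmp")   # BRE: interval with escaped braces
ere_interval=$(grep -c -E '^ {0,}w' "$tmp")  # ERE: interval without escaping
star=$(grep -c '^ *w' "$tmp")                # * is shorthand for {0,}

echo "BRE \\{0,\\}: $bre_interval"
echo "ERE {0,}:   $ere_interval"
echo "star:       $star"

rm -f "$tmp"
```

All three counts should agree, since `*` and `{0,}` both mean "zero or more". In every case the pattern is quoted, so the shell never gets a chance to split or brace-expand it.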


[*] Strictly speaking, `+` has no special meaning in POSIX BRE, but GNU grep's implementation of BRE does recognize it as a quantifier when it is escaped.
