Grep for at most 3 spaces using \s

Question

Wol*_*olf 3 grep text-processing regular-expression

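According to the following tutorial

https://linuxize.com/post/regular-expressions-in-grep/

\s matches a space.

and

https://www.guru99.com/linux-regular-expressions.html

Some interval regular expressions are:

Expression  Description
{n}         Matches the preceding character appearing exactly 'n' times
{n,m}       Matches the preceding character appearing at least 'n' times but not more than 'm' times
{n, }       Matches the preceding character only when it appears 'n' times or more

Sample file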

wolf@linux:~$ cat space.txt
0space
1 spaces
2  spaces
3   spaces
4    spaces
wolf@linux:~$

I only want to grep lines that have at least 1 and at most 3 spaces. Unfortunately, it doesn't really work as expected.

wolf@linux:~$ cat space.txt | grep -P '\s{1,3}'
1 spaces
2  spaces
3   spaces
4    spaces
wolf@linux:~$ 

wolf@linux:~$ cat space.txt | grep -P '\s{3}'
3   spaces
4    spaces
wolf@linux:~$ 

wolf@linux:~$ cat space.txt | grep -P '\s{3,3}'
3   spaces
4    spaces
wolf@linux:~$ 

wolf@linux:~$ cat space.txt | grep -P '\s{0,3}'
0space
1 spaces
2  spaces
3   spaces
4    spaces
wolf@linux:~$

Desired output

wolf@linux:~$ cat space.txt | grep -P '\s{0,3}' <- need to fix it here
1 spaces
2  spaces
3   spaces
wolf@linux:~$

Answer 1

αғs*_*нιη 8

You need:

grep -P '\S\s{1,3}\S' infile

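\s matches a whitespace character, not just a space.
\S matches a non-whitespace character.

In your attempts, you did not require the characters immediately before and after the match to be non-whitespace, so a run of 1 to 3 whitespace characters is also found inside any longer run.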
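
To see the difference concretely, here is a small sketch that rebuilds the question's space.txt with printf and runs both patterns. The helper names (sample, unanchored, anchored) are made up for this illustration, and -P assumes a grep built with PCRE support, as GNU grep usually is.

```shell
# Recreate the sample lines from the question; printf keeps the spacing exact.
sample() {
  printf '%s\n' '0space' '1 spaces' '2  spaces' '3   spaces' '4    spaces'
}

# Nothing bounds the match, so a run of 4 whitespace characters
# still contains a run of 1 to 3 and the line is selected.
unanchored() { sample | grep -P '\s{1,3}'; }

# A non-whitespace character on each side forces the whole run
# between them to be 1 to 3 characters long.
anchored() { sample | grep -P '\S\s{1,3}\S'; }

unanchored   # prints the lines with 1, 2, 3 and 4 spaces
echo ---
anchored     # prints only the lines with 1, 2 and 3 spaces
```

The quantifier itself behaves the same in both commands; only the context around it changes which lines can match.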

To match only space characters and avoid PCRE, you can do:

grep '[^ ] \{1,3\}[^ ]' infile

Or, to also handle lines that have 1 to 3 leading or trailing spaces:

grep '\([^ ]\|^\) \{1,3\}\([^ ]\|$\)' infile

Input data (cat -e infile):

0space$
1 spaces$
2  spaces$
3   spaces$
4    spaces$
   3spaces$
    4space$
3spaces   $
4spaces    $

Output:

1 spaces$
2  spaces$
3   spaces$
   3spaces$
3spaces   $

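Your `\S\s{1,3}\S` will not match three spaces at the start or end of a line. (3 upvotes)
Thanks @αғsнιη, I get it. The problem is caused by the non-whitespace character, the digit before the whitespace, isn't it? (2 upvotes)
@Wolf No, the problem is that "`you did not restrict what comes before and after your match (min 1 and max 3 whitespaces here) to be non-whitespace`", so your grep also matches the line `4    spaces`, because it meets the condition "_line with at least 1 and up to 3 whitespaces_". (2 upvotes)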

Answer 2

Sté*_*las 8

If you want to match sequences of 1 to 3 whitespace characters that are not surrounded by whitespace, you can use Perl look-around operators:

grep -P '(?<!\s)\s{1,3}(?!\s)'

That matches on:

         1
1234567890123456789
    a b  c   d    e
     ^ ^^ ^^^

With standard grep, you can achieve much the same effect with:

grep -E '(^|[^[:space:]])[[:space:]]{1,3}([^[:space:]]|$)'

This time, we match a sequence of 1 to 3 whitespace characters together with a non-whitespace character on either side (or the start (^) or end ($) of the subject).

         1
1234567890123456789
   a b  c   d    e
^^^^ ^^^^

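(With -o (a GNU extension), you can see that it does not report "a b", because the "a" was already consumed by the previous match; when looking for further matches, grep resumes at the character right after the end of the last match.)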
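
A rough way to check that remark is to print the individual matches with -o and make the whitespace visible. This is only a sketch: ere_matches and pcre_matches are hypothetical helper names, tr is used purely for display, and both -o and -P assume GNU grep (with PCRE support for the latter).

```shell
line='   a b  c   d    e'

# ERE version: every match also consumes its neighbouring
# non-whitespace characters, so adjacent runs cannot share them.
ere_matches() {
  printf '%s\n' "$line" |
    grep -oE '(^|[^[:space:]])[[:space:]]{1,3}([^[:space:]]|$)' |
    tr ' ' '_'
}

# Look-around version: the assertions consume nothing,
# so every run of 1 to 3 whitespace characters is reported.
pcre_matches() {
  printf '%s\n' "$line" | grep -oP '(?<!\s)\s{1,3}(?!\s)' | tr ' ' '_'
}

ere_matches    # two matches: ___a and b__c
echo ---
pcre_matches   # four matches: ___, _, __ and ___
```

For selecting lines this makes no difference, since one match per line is enough; it only matters when the matches themselves are extracted or counted.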

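Without -E, you get basic regular expressions, which have no alternation operator (though some grep implementations support \| as an extension), but you can generally still do: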

grep -x '\(.*[^[:space:]]\)\{0,1\}[[:space:]]\{1,3\}\([^[:space:]].*\)\{0,1\}'

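This time, the regex matches the whole line (-x), which must consist of 1 to 3 whitespace characters, optionally preceded (\{0,1\} is the BRE equivalent of ERE ?) by a leading part that ends in a non-whitespace character, and optionally followed by a trailing part that starts with a non-whitespace character.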

         1
1234567890123456789
   a b  c   d    e
^^^^^^^^^^^^^^^^^^

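In any case, those will still return lines that contain sequences of 4 or more whitespace characters, as long as they also contain a sequence of 1 to 3 whitespace characters that is not surrounded by whitespace.

If you want to exclude the lines that contain sequences of 4 or more whitespace characters, then that is: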

grep -vE '[[:space:]]{4}'

Or, if you still want at least one whitespace character, in other words lines that contain one or more sequences of whitespace characters, each of them at least 1 but no more than 3 characters long:

grep -vE -e '[[:space:]]{4}' -e '^[^[:space:]]*$'

That is, return all lines except those that contain a sequence of 4 whitespace characters and those that consist only of non-whitespace characters.
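
Run against the sample lines from the question, this should leave exactly the three lines asked for. A quick sketch to try it without creating a file; sample and filtered are hypothetical helper names.

```shell
# The five lines of space.txt from the question.
sample() {
  printf '%s\n' '0space' '1 spaces' '2  spaces' '3   spaces' '4    spaces'
}

# -v drops every line matched by either -e pattern:
#   a run of 4 whitespace characters, or
#   a line made up of non-whitespace characters only.
filtered() {
  sample | grep -vE -e '[[:space:]]{4}' -e '^[^[:space:]]*$'
}

filtered   # prints "1 spaces", "2  spaces" and "3   spaces"
```

Note that the second pattern also removes empty lines, since an empty line trivially consists of non-whitespace characters only.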

Or, again with Perl look-around operators:

grep -P '^(?=.*\s)(?!.*\s{4})'

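That matches at the start of the line, provided it is followed by any number of characters and a whitespace character, and is not followed by any number of characters and a sequence of 4 whitespace characters.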

It would be more legible with sed or awk though, where you can do both the positive and the negative match in the same invocation:

awk '/[[:space:]]/ && ! /[[:space:]]{4}/'

sed '/[[:space:]]/!d; /[[:space:]]\{4\}/d'
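
As an illustration, the sed variant can be tried on the nine test lines used in the first answer, which include leading and trailing runs. This is a sketch only; infile and filter_sed are hypothetical helper names, and the input is generated with printf instead of being read from a file.

```shell
# The nine test lines from the first answer (leading and trailing runs included).
infile() {
  printf '%s\n' '0space' '1 spaces' '2  spaces' '3   spaces' '4    spaces' \
    '   3spaces' '    4space' '3spaces   ' '4spaces    '
}

# First command: delete lines without any whitespace.
# Second command: delete lines containing a run of 4 whitespace characters.
filter_sed() {
  infile | sed '/[[:space:]]/!d; /[[:space:]]\{4\}/d'
}

filter_sed | cat -e   # cat -e marks line ends so trailing spaces are visible
```

Five lines should remain: the three with 1 to 3 inner spaces, plus the one with 3 leading and the one with 3 trailing spaces.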

Answer 3

nez*_*dka 6

You can come at it from the opposite direction: exclude the lines that contain a run of more than 3 whitespace characters.

grep -Ev '\s{4,}'

-v inverts the sense of matching, to select non-matching lines.
Alternatively, you can anchor the run with non-whitespace characters on both sides:

grep -E '\S\s{1,3}\S'
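
The two commands do not select the same lines, which a short sketch on the nine test lines from the first answer makes visible. The helper names infile, exclude_long_runs and require_bounded_run are invented for this example, and \s and \S with -E assume GNU grep.

```shell
# The nine test lines from the first answer.
infile() {
  printf '%s\n' '0space' '1 spaces' '2  spaces' '3   spaces' '4    spaces' \
    '   3spaces' '    4space' '3spaces   ' '4spaces    '
}

# Inverted match: keeps every line without a run of 4 or more,
# including "0space" and the lines with 3 leading or trailing spaces.
exclude_long_runs() { infile | grep -Ev '\s{4,}'; }

# Bounded match: keeps only lines where a run of 1 to 3 sits
# between two non-whitespace characters.
require_bounded_run() { infile | grep -E '\S\s{1,3}\S'; }

exclude_long_runs | cat -e     # 6 lines
echo ---
require_bounded_run | cat -e   # 3 lines
```

Which of the two fits depends on whether lines without any whitespace, or with whitespace only at the edges, should count.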

Thanks @nezabudka, that is correct. However, I just realized that my desired output was wrong. I have updated it. Any idea how to remove the first line using "-v"? (2 upvotes)

Archived:	5 years, 6 months ago
Views:	912
Last activity:	5 years, 6 months ago