Regex: exclude a match without lookahead - is it possible?

Question

Some regex flavors do not support [negative] zero-width assertions (lookahead/lookbehind).

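This makes exclusion very difficult (impossible?). For example, "every line that does not have "foo" on it" would normally be written like this: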

^((?!foo).)*$
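
Python's `re` module does support lookahead, so the question's pattern can be tried there directly. This is a minimal sketch. The list of lines is made up for illustration.

```python
import re

# The question's pattern: at every position, the negative lookahead (?!foo)
# checks that "foo" does not start there before "." consumes one character.
not_foo = re.compile(r"^((?!foo).)*$")

lines = ["abcde", "abdfode", "abdfoode", "ofoo", "ffoo", ""]
kept = [line for line in lines if not_foo.match(line)]
print(kept)  # ['abcde', 'abdfode', '']
```
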

Is it possible to achieve the same thing without lookaround (setting complexity and performance concerns aside for now)?

Answer 1

jfs, score 30

Update: As @Ciantic pointed out in the comments, this fails when there are two `f`s before `oo` (for example `ffoo`).

^(f(o[^o]|[^o])|[^f])*$

Note: it is much easier to negate the match on the client side than to use the regex above.
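
Here is a minimal sketch of that client-side approach in Python. It assumes the lines are already in a list. The code searches for the positive pattern `foo` and inverts the result in the host language, which is the same idea as `grep -v foo`. The helper name `lines_without_foo` is hypothetical.

```python
import re

foo = re.compile(r"foo")

def lines_without_foo(lines):
    """Hypothetical helper: keep the lines in which 'foo' is NOT found.

    The regex only ever matches the positive pattern; the negation
    happens in Python with 'not'.
    """
    return [line for line in lines if not foo.search(line)]

print(lines_without_foo(["foo", "'foo'", "abdfode", "ofoo", "fo"]))
# ['abdfode', 'fo']
```
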

The regex assumes that every line ends with a newline character. If that is not the case, see the C++ and grep regexes.
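
The trailing newline matters. The small Python comparison below runs the answer's pattern on lines with and without an appended `\n`, and also runs the `|$` variant used in the C++ and grep examples on lines without one. It is a sketch, and the test strings are illustrative.

```python
import re

# Pattern from the answer: relies on the "\n" at the end of each line.
with_newline = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")
# Variant from the C++/grep examples: the extra "|$" branches let a trailing
# "f" or "fo" end the line when the newline has already been stripped.
no_newline = re.compile(r"^(f(o([^o]|$)|([^o]|$))|[^f])*$")

results = {}
for line in ["f", "fo", "barf", "foo"]:
    results[line] = (
        bool(with_newline.match(line + "\n")),  # line as read from a file
        bool(with_newline.match(line)),         # newline stripped
        bool(no_newline.match(line)),           # newline stripped, |$ variant
    )
    print(repr(line), results[line])
```

Without the newline, the original pattern rejects `f`, `fo` and `barf`. This is the problem raised in the first comment on this answer.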

The example programs in Perl, Python and C++, and the grep command, all produce the same output.

Perl

#!/usr/bin/perl -wn
print if /^(f(o[^o]|[^o])|[^f])*$/;

Python

#!/usr/bin/env python3
import fileinput
import re
import sys

# Keep only the lines the pattern matches; each line still ends with "\n".
re_not_foo = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")
for line in filter(re_not_foo.match, fileinput.input()):
    sys.stdout.write(line)

C++

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main()
{
  boost::regex re("^(f(o([^o]|$)|([^o]|$))|[^f])*$");
  // NOTE: the "|$" branches are needed because getline() strips the newline character

  std::string line;
  while (std::getline(std::cin, line)) 
    if (boost::regex_match(line, re))
      std::cout << line << std::endl;
}

grep

$ grep "^\(f\(o\([^o]\|$\)\|\([^o]\|$\)\)\|[^f]\)*$" in.txt

Sample file:

foo
'foo'
abdfoode
abdfode
abdfde
abcde
f

fo
foo
fooo
ofooa
ofo
ofoo

Output:

abdfode
abdfde
abcde
f

fo
ofo
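
The Python check below is self-contained. It holds the sample file in a string and runs the answer's pattern over it. Each line keeps its `\n`, as it would when read from a file, and the code collects the lines the pattern keeps.

```python
import re

# Pattern from the answer; lines keep their trailing "\n".
re_not_foo = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")

sample = """foo
'foo'
abdfoode
abdfode
abdfde
abcde
f

fo
foo
fooo
ofooa
ofo
ofoo
"""

kept = [line for line in sample.splitlines(keepends=True)
        if re_not_foo.match(line)]
print("".join(kept), end="")
```
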

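This regex is not correct. It doesn't match `f`, `fo` or `barf`. But this one does: `(f(o([^o]|$)|[^o]|$)|[^f])*$` (2 upvotes)
The answer doesn't seem to work for `somethingffoosomething`, which has two `f`s before `oo`. (2 upvotes)
Nice answer, but because `foo` contains a repeated character, the Q&A doesn't generalize well. It would be better with `abc`. (2 upvotes)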
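
The `ffoo` failure from the comments can be reproduced in Python. The second pattern in the sketch below is not from the answer. It is one possible lookaround-free construction, derived by hand from a three-state automaton (nothing matched / just saw `f` / just saw `fo`). It has been checked here only by brute force on short strings over the alphabet `{f, o, x}`, so treat it as a sketch, not a proven replacement.

```python
import itertools
import re

# Newline-free pattern from the answer (as in the C++/grep examples).
answer = re.compile(r"^(f(o([^o]|$)|([^o]|$))|[^f])*$")

# NOT from the answer: a hand-derived alternative based on a 3-state automaton.
#   return to "nothing matched": [^f]  or  f(f|of)*o?[^fo]
#   optional unfinished tail:    f(f|of)*o?
dfa_based = re.compile(r"^(?:[^f]|f(?:f|of)*o?[^fo])*(?:f(?:f|of)*o?)?$")

print(bool(answer.match("somethingffoosomething")))     # True  (wrong)
print(bool(dfa_based.match("somethingffoosomething")))  # False (correct)

# Brute-force check against a plain substring test on every string
# of length <= 7 over the alphabet {f, o, x}.
for n in range(8):
    for chars in itertools.product("fox", repeat=n):
        s = "".join(chars)
        assert bool(dfa_based.fullmatch(s)) == ("foo" not in s), s
```
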

Answer 2

Anonymous user, score 5

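I came across this question and took the lack of a complete, working regex as a personal challenge. I believe I've managed to create a regex that works for all inputs, as long as you can use atomic groups/possessive quantifiers.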

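Of course, I don't know whether any flavor allows atomic groups but not lookaround. But the question asked whether exclusion can be expressed in a regex without lookaround, and it is technically possible: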

\A(?:$|[^f]++|f++(?:[^o]|$)|(?:f++o)*+(?:[^o]|$))*\Z

Explanation:

\A                         #Start of string
(?:                        #Non-capturing group
    $                      #Consume end-of-line. We're not in foo-mode.
    |[^f]++                #Consume every non-'f'. We're not in foo-mode.
    |f++(?:[^o]|$)         #Enter foo-mode with an 'f'. Consume all 'f's, but only exit foo-mode if 'o' is not the next character. Thus, 'f' is valid but 'fo' is invalid.
    |(?:f++o)*+(?:[^o]|$)  #Enter foo-mode with an 'f'. Consume all 'f's, followed by a single 'o'. Repeat, since '(f+o)*' by itself cannot contain 'foo'. Only exit foo-mode if 'o' is not the next character following (f+o). Thus, 'fo' is valid but 'foo' is invalid.
)*                         #Repeat the non-capturing group
\Z                         #End of string. Note that this regex only works in flavours that can match $\Z

If for some reason you can use atomic grouping but neither possessive quantifiers nor lookaround, you can use:

\A(?:$|(?>[^f]+)|(?>f+)(?:[^o]|$)|(?>(?:(?>f+)o)*)(?:[^o]|$))*\Z

However, as others have pointed out, negating the match by other means is probably more practical.

Archived: 16 years, 9 months ago
Views: 15,265
Last activity: 7 years, 5 months ago