Tom*_*lak 27 regex regex-negation
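In some regex flavors, [negative] zero-width assertions (look-ahead/look-behind) are not supported.
This makes it extremely difficult (impossible?) to state an exclusion. For example, "every line that does not have "foo" on it", like this: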
^((?!foo).)*$
Can the same thing be achieved without using look-around at all (complexity and performance concerns set aside for the moment)?
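For reference, this is how the look-ahead pattern behaves in a flavor that does support it. A minimal sketch using Python's `re` module; the helper name `has_no_foo` and the test strings are illustrative assumptions, not part of the question:

```python
import re

# The pattern from the question: at every position, assert that "foo"
# does not start here, then consume one character.
NO_FOO_LOOKAHEAD = re.compile(r"^((?!foo).)*$")

def has_no_foo(line: str) -> bool:
    """Return True if the look-ahead pattern matches the whole line."""
    return NO_FOO_LOOKAHEAD.match(line) is not None

for line in ["abcde", "ofo", "ffoo", "abdfoode"]:
    print(f"{line!r}: {has_no_foo(line)}")
```

Any look-around-free replacement has to agree with this function on every input line.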
jfs*_*jfs 30
UPDATE: As @Ciantic pointed out in the comments, this fails when two f's come before "oo": the regex below wrongly accepts "ffoo".
^(f(o[^o]|[^o])|[^f])*$
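The failure is easy to reproduce. The sketch below runs the regex above on a few inputs and compares it with a second pattern. That second pattern is NOT from this answer: it is one possible look-around-free alternative, derived by hand from a three-state automaton (no progress, just saw `f`, just saw `fo`), and the names `AUTOMATON_RE`, `answer_accepts` and `automaton_accepts` are assumptions made for illustration.

```python
import re

# The regex from the answer, unchanged.
ANSWER_RE = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")

# Hypothetical alternative, NOT from the answer. Read it as an automaton:
#   [^f]                  stay in the start state
#   f(?:f|of)*o?[^fo]     move through "seen f" / "seen fo", then leave
#   (?:f(?:f|of)*o?)?     the text may also end inside "seen f" / "seen fo"
AUTOMATON_RE = re.compile(r"(?:[^f]|f(?:f|of)*o?[^fo])*(?:f(?:f|of)*o?)?")

def answer_accepts(text: str) -> bool:
    # The answer's regex expects every line to end with a newline.
    return ANSWER_RE.match(text + "\n") is not None

def automaton_accepts(text: str) -> bool:
    return AUTOMATON_RE.fullmatch(text) is not None

for text in ["ffoo", "fofoo", "fofo", "ofo"]:
    print(f"{text!r}: answer={answer_accepts(text)}, "
          f"automaton={automaton_accepts(text)}, "
          f"truth={'foo' not in text}")
```

The `truth` column is simply `"foo" not in text`, which is the client-side check the note below recommends.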
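NOTE: It is much easier to negate a match on the client side than to use the regex above.
The regex assumes that each line ends with a newline character; if it does not, see the C++ and grep regexes.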
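The client-side negation mentioned in the note can be as small as the sketch below: search for the plain pattern `foo` and keep the lines where the search fails. The function name `lines_without_foo` and the sample list are assumptions made for illustration.

```python
import re

FOO = re.compile(r"foo")

def lines_without_foo(lines):
    """Keep only the lines in which the plain pattern "foo" is NOT found."""
    return [line for line in lines if FOO.search(line) is None]

sample = ["foo", "abdfode", "ffoo", "fo", "ofoo", "ofo"]
print(lines_without_foo(sample))
```

With grep, the same idea is `grep -v foo in.txt`.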
The sample programs in Perl, Python, C++, and grep all give the same output.
#!/usr/bin/perl -wn
print if /^(f(o[^o]|[^o])|[^f])*$/;

#!/usr/bin/env python3
import fileinput, re, sys
# itertools.ifilter no longer exists in Python 3; the built-in filter() is lazy.
re_not_foo = re.compile(r"^(f(o[^o]|[^o])|[^f])*$")
for line in filter(re_not_foo.match, fileinput.input()):
    sys.stdout.write(line)

C++
#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main()
{
    boost::regex re("^(f(o([^o]|$)|([^o]|$))|[^f])*$");
    // NOTE: the "|$" alternatives are there because getline() strips the newline char
    std::string line;
    while (std::getline(std::cin, line))
        if (boost::regex_match(line, re))
            std::cout << line << std::endl;
}

$ grep "^\(f\(o\([^o]\|$\)\|\([^o]\|$\)\)\|[^f]\)*$" in.txt

Sample file:
foo
'foo'
abdfoode
abdfode
abdfde
abcde
f
fo
foo
fooo
ofooa
ofo
ofoo

Output:
abdfode
abdfde
abcde
f
fo
ofo
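The sample run can be checked mechanically. A minimal sketch in Python, assuming the newline-free variant of the regex from the C++ program (the lines are held in a list, so they carry no trailing newline); the names `SAMPLE` and `filter_sample` are illustrative:

```python
import re

# Newline-free variant of the regex, as used in the C++ program above.
RE_NOT_FOO = re.compile(r"^(f(o([^o]|$)|([^o]|$))|[^f])*$")

# The sample file, one list entry per line.
SAMPLE = ["foo", "'foo'", "abdfoode", "abdfode", "abdfde", "abcde",
          "f", "fo", "foo", "fooo", "ofooa", "ofo", "ofoo"]

def filter_sample(lines):
    """Return the lines accepted by the regex, in input order."""
    return [line for line in lines if RE_NOT_FOO.match(line)]

for line in filter_sample(SAMPLE):
    print(line)
```

Note that this sample does not contain a line such as "ffoo", so it does not exercise the failure described in the update at the top of the answer.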
小智 5
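Came across this question and took the fact that there wasn't a fully working regex as a personal challenge. I believe I've managed to create a regex that does work for all inputs, provided you can use atomic grouping/possessive quantifiers.
Of course, I'm not sure whether there are any flavours that allow atomic grouping but not lookaround, but the question asked whether it's possible in regex to state an exclusion without lookaround, and it is technically possible: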
\A(?:$|[^f]++|f++(?:[^o]|$)|(?:f++o)*+(?:[^o]|$))*\Z
Explanation:
\A #Start of string
(?: #Non-capturing group
$ #Consume end-of-line. We're not in foo-mode.
|[^f]++ #Consume every non-'f'. We're not in foo-mode.
|f++(?:[^o]|$) #Enter foo-mode with an 'f'. Consume all 'f's, but only exit foo-mode if 'o' is not the next character. Thus, 'f' is valid but 'fo' is invalid.
|(?:f++o)*+(?:[^o]|$) #Enter foo-mode with an 'f'. Consume all 'f's, followed by a single 'o'. Repeat, since '(f+o)*' by itself cannot contain 'foo'. Only exit foo-mode if 'o' is not the next character following (f+o). Thus, 'fo' is valid but 'foo' is invalid.
)* #Repeat the non-capturing group
\Z #End of string. Note that this regex only works in flavours that can match $\Z
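The "foo-mode" bookkeeping in the comments can also be read as a small state machine. The sketch below is not the regex itself and uses no atomic groups; it is a plain Python scanner (all names are illustrative assumptions) that tracks the same three situations: not in foo-mode, just consumed one or more `f`, and just consumed `f+o`.

```python
# Three situations, mirroring the comments in the explanation.
OUTSIDE, SEEN_F, SEEN_FO = range(3)

def scan_has_no_foo(text: str) -> bool:
    """Return True if `text` does not contain the substring "foo"."""
    state = OUTSIDE
    for ch in text:
        if ch == "f":
            state = SEEN_F        # enter (or stay in) foo-mode
        elif ch == "o" and state == SEEN_F:
            state = SEEN_FO       # "f+o" consumed; one more 'o' is fatal
        elif ch == "o" and state == SEEN_FO:
            return False          # "foo" found
        else:
            state = OUTSIDE       # any other character exits foo-mode
    return True

for text in ["f", "fo", "foo", "ffoo", "fofo", "fofoo"]:
    print(f"{text!r}: {scan_has_no_foo(text)}")
```

The scanner never backtracks, which is the property the possessive quantifiers are used to enforce in the regex.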
If, for some reason, you can use atomic grouping but not possessive quantifiers or lookaround, you can use:
\A(?:$|(?>[^f]+)|(?>f+)(?:[^o]|$)|(?>(?:(?>f+)o)*)(?:[^o]|$))*\Z
However, as others have pointed out, it's probably more practical to negate the match by other means.