Mik*_*ton 19 python regex perl regex-lookarounds
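I cannot understand the finer details of negative lookahead regular expressions. After reading "Regex lookahead, lookbehind and atomic groups", I thought I had a good summary of negative lookaheads when I found this description:
(?!REGEX_1)REGEX_2
Matches only if REGEX_1 does not match; after REGEX_1 has been checked, the search for REGEX_2 starts at the same position.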
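To make that description concrete, here is a minimal Python sketch. The pattern `(?!foo)\w+` and the two sample strings are my own illustrative choices, not part of the quoted description; they only show that the lookahead consumes nothing and that the second expression is tried from the same position.

```python
import re

# (?!foo)\w+ : at each position, first check that "foo" does NOT start here,
# then try \w+ from that very same position (the lookahead is zero-width).
pattern = re.compile(r'(?!foo)\w+')

# At position 0 of 'bar foo' the lookahead passes, so \w+ matches from 0.
first = pattern.search('bar foo')

# At position 0 of 'foobar' the lookahead rejects the position, so the
# engine moves on to position 1 and matches from there instead.
second = pattern.search('foobar')

print(first.group(0))                   # bar
print(second.group(0), second.start())  # oobar 1
```

Note that a rejected position does not end the search; the engine simply retries one character further along.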
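Hoping I had understood the algorithm, I made up a two-sentence test insult; I wanted to find the sentence that does not contain a given word. Specifically...
Insult: 'Yomama is ugly. And, she smells like a wet dog.'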
Requirements:
- Test 1: Return a sentence without 'ugly'.
- Test 2: Return a sentence without 'looks'.
- Test 3: Return a sentence without 'smells'.
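I assigned the test word to $arg and used (?:(?![A-Z].*?$arg.*?\.))([A-Z].*?\.) to run the tests.
- (?![A-Z].*?$arg.*?\.) is a negative lookahead that rejects a sentence containing the test word.
- ([A-Z].*?\.) matches at least one sentence.
The critical piece seems to be understanding where the regex engine starts matching after it has processed the negative lookahead.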
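Expected results:
- Test 1 ($arg = "ugly"): "And, she smells like a wet dog."
- Test 2 ($arg = "looks"): "Yomama is ugly."
- Test 3 ($arg = "smells"): "Yomama is ugly."
Actual results:
- Test 1 ($arg = "ugly"): "And, she smells like a wet dog." (success)
- Test 2 ($arg = "looks"): "Yomama is ugly." (success)
- Test 3 ($arg = "smells"): failed, no match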
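At first I thought Test 3 failed because ([A-Z].*?\.) was too greedy and matched both sentences; however, (?:(?![A-Z].*?$arg.*?\.))([A-Z][^\.]*?\.) did not work either. Next I wondered whether there was a problem with Python's negative lookahead implementation, but Perl gave me exactly the same result.
Finally I found the solution: I had to stop the .*? parts of my expression from matching a period by using [^\.]*? instead; so this regex works: (?:(?![A-Z][^\.]*?$arg[^\.]*?\.))([A-Z][^\.]*?\.)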
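For reference, the working regex above can be checked with a short Python 3 sketch. The helper name `sentence_without` and the use of `re.escape` are my own additions (assumptions, not part of the original tests), and the redundant outer `(?:...)` wrapper is dropped because it does not change what matches.

```python
import re

INSULT = 'Yomama is ugly. And, she smells like a wet dog.'

def sentence_without(word, text):
    """Return the first sentence (capital letter through period) that
    does not contain `word`, or None if every sentence contains it."""
    # [^.]*? cannot cross a period, so the lookahead only inspects the
    # sentence that starts at the current position.
    pattern = (r'(?![A-Z][^.]*?' + re.escape(word) + r'[^.]*?\.)'
               r'([A-Z][^.]*?\.)')
    mm = re.search(pattern, text)
    return mm.group(1) if mm else None

for word in ('ugly', 'looks', 'smells'):
    print(word, '->', sentence_without(word, INSULT))
```

With the period excluded inside the lookahead, all three tests return the expected sentence, including Test 3.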
However, I have another question: "Yomama is ugly." does not have "smells" in it. So, if .*? is supposed to be a non-greedy match, why can't I pass Test 3 with (?:(?![A-Z].*?$arg.*?\.))([A-Z].*?\.)?
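In light of @bvr's excellent suggestion to use -Mre=debug, I will think about this some more after work. Seth's description certainly looks accurate at this point. What I have learned so far is that the expression inside a negative lookahead will match whenever it possibly can, even if I put non-greedy .*? operators in the NLA.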
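A small Python 3 sketch of that observation, as I currently understand it: laziness only changes which expansion the engine tries first, not whether a match is eventually found. The variable names below are hypothetical; the pattern is the lookahead body from the question with $arg set to "smells".

```python
import re

INSULT = 'Yomama is ugly. And, she smells like a wet dog.'

# The body of the negative lookahead, with $arg = "smells".
lookahead_body = re.compile(r'[A-Z].*?smells.*?\.')

# Anchored at position 0, the lazy .*? still expands past the first period,
# because that is the only way for the body to match at all.
body_match = lookahead_body.match(INSULT)
print(repr(body_match.group(0)))

# ([A-Z].*?\.) can only start at a capital letter. Check whether the
# lookahead body matches (and therefore blocks) each such position.
blocked = {}
for mm in re.finditer(r'[A-Z]', INSULT):
    blocked[mm.start()] = lookahead_body.match(INSULT, mm.start()) is not None
print(blocked)
```

Both capital-letter positions are blocked, and no other position can start the captured sentence, which is consistent with Test 3 finding no match at all.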
Python implementation:
import re

def test_re(arg, INSULTSTR):
    mm = re.search(r'''
        (?:                    # Non-capturing group
         (?![A-Z].*?%s.*?\.))  # Negative zero-width assertion: a capital
                               # letter, then arg, followed by a period
        ([A-Z].*?\.)           # Capture a capital letter through the next period
        ''' % arg, INSULTSTR, re.VERBOSE)
    if mm is not None:
        print("neg-lookahead(%s) MATCHED: '%s'" % (arg, mm.group(1)))
    else:
        print("Unable to match: neg-lookahead(%s) in '%s'" % (arg, INSULTSTR))

INSULT = 'Yomama is ugly. And, she smells like a wet dog.'
test_re('ugly', INSULT)
test_re('looks', INSULT)
test_re('smells', INSULT)
Perl implementation:
#!/usr/bin/perl
use strict;
use warnings;

sub test_re {
    my ($arg, $INSULTSTR) = @_;
    # Test the match itself; $1 is only meaningful after a successful match
    if ($INSULTSTR =~ /(?:(?![A-Z].*?$arg.*?\.))([A-Z].*?\.)/) {
        print "neg-lookahead($arg) MATCHED: '$1'\n";
    } else {
        print "Unable to match: neg-lookahead($arg) in '$INSULTSTR'\n";
    }
}

my $INSULT = 'Yomama is ugly. And, she smells like a wet dog.';
test_re('ugly', $INSULT);
test_re('looks', $INSULT);
test_re('smells', $INSULT);
Output:
neg-lookahead(ugly) MATCHED: 'And, she smells like a wet dog.'
neg-lookahead(looks) MATCHED: 'Yomama is ugly.'
Unable to match: neg-lookahead(smells) in 'Yomama is ugly. And, she smells like a wet dog.'
#!/usr/bin/perl
use strict;
use warnings;

sub test_re {
    my ($arg, $INSULTSTR) = @_;
    # Start at the beginning of the string or just after a period, and use
    # [^.] so that neither the lookahead nor the capture crosses a period
    if ($INSULTSTR =~ /(?:^|\.\s*)(?:(?![^.]*?$arg[^.]*\.))([^.]*\.)/) {
        print "neg-lookahead($arg) MATCHED: '$1'\n";
    } else {
        print "Unable to match: neg-lookahead($arg) in '$INSULTSTR'\n";
    }
}

my $INSULT = 'Yomama is ugly. And, she smells like an wet dog.';
test_re('Yomama', $INSULT);
test_re('ugly', $INSULT);
test_re('looks', $INSULT);
test_re('And', $INSULT);
test_re('And,', $INSULT);
test_re('smells', $INSULT);
test_re('dog', $INSULT);
Results:
neg-lookahead(Yomama) MATCHED: 'And, she smells like an wet dog.'
neg-lookahead(ugly) MATCHED: 'And, she smells like an wet dog.'
neg-lookahead(looks) MATCHED: 'Yomama is ugly.'
neg-lookahead(And) MATCHED: 'Yomama is ugly.'
neg-lookahead(And,) MATCHED: 'Yomama is ugly.'
neg-lookahead(smells) MATCHED: 'Yomama is ugly.'
neg-lookahead(dog) MATCHED: 'Yomama is ugly.'
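For Python users, the same idea can be sketched as follows. This is my own port under stated assumptions, not the answer's code: the helper name `first_sentence_without` is hypothetical, `re.escape` is added so the word is matched literally, and the test string is the question's original insult ("a wet dog").

```python
import re

INSULT = 'Yomama is ugly. And, she smells like a wet dog.'

def first_sentence_without(word, text):
    """Return the first period-terminated sentence lacking `word`, or None."""
    pattern = (r'(?:^|\.\s*)'                               # sentence boundary
               r'(?![^.]*?' + re.escape(word) + r'[^.]*\.)'  # word not in this sentence
               r'([^.]*\.)')                                 # capture the sentence
    mm = re.search(pattern, text)
    return mm.group(1) if mm else None

for word in ('Yomama', 'ugly', 'looks', 'And', 'smells', 'dog'):
    print(word, '->', first_sentence_without(word, INSULT))
```

Anchoring on a sentence boundary instead of a capital letter means the pattern does not depend on how sentences are capitalized, and [^.] keeps both the lookahead and the capture inside a single sentence.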