Dan*_*ski 45 python regex perl
I switched from Perl to Python about a year ago and haven't looked back. I have found only one idiom that is easier to write in Perl than in Python:
if ($var =~ /foo(.+)/) {
    # do something with $1
} elsif ($var =~ /bar(.+)/) {
    # do something with $1
} elsif ($var =~ /baz(.+)/) {
    # do something with $1
}
The corresponding Python code is not as elegant, because the if statements keep getting nested:
m = re.search(r'foo(.+)', var)
if m:
    # do something with m.group(1)
    pass
else:
    m = re.search(r'bar(.+)', var)
    if m:
        # do something with m.group(1)
        pass
    else:
        m = re.search(r'baz(.+)', var)
        if m:
            # do something with m.group(1)
            pass
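Does anyone have an elegant way to reproduce this pattern in Python? I have seen dispatch tables of anonymous functions used, but those seem a bit unwieldy to me for a small number of regular expressions...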
Tho*_*ers 18
Use named groups and a dispatch table:
r = re.compile(r'(?P<cmd>foo|bar|baz)(?P<data>.+)')

def do_foo(data):
    ...

def do_bar(data):
    ...

def do_baz(data):
    ...

dispatch = {
    'foo': do_foo,
    'bar': do_bar,
    'baz': do_baz,
}

m = r.match(var)
if m:
    dispatch[m.group('cmd')](m.group('data'))
With a little introspection you can generate the regular expression and the dispatch table automatically.
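One way the introspection could look, as a rough sketch only. The Commands class, the do_ prefix convention, and the build_dispatcher and run helpers are hypothetical names made up for this illustration; they are not part of the answer above. The idea is to scan an object for methods named do_<cmd> and to build both the alternation and the dispatch dict from what is found:

```python
import re


class Commands:
    """Hypothetical container: each method named do_<cmd> handles <cmd>."""

    def do_foo(self, data):
        return ('foo', data)

    def do_bar(self, data):
        return ('bar', data)

    def do_baz(self, data):
        return ('baz', data)


def build_dispatcher(handlers, prefix='do_'):
    """Build (compiled regex, dispatch dict) from the do_* methods of handlers."""
    dispatch = {
        name[len(prefix):]: getattr(handlers, name)
        for name in dir(handlers)
        if name.startswith(prefix) and callable(getattr(handlers, name))
    }
    # Try longer command names first so that a command which is a prefix
    # of another one cannot shadow it in the alternation.
    alternatives = '|'.join(
        re.escape(cmd) for cmd in sorted(dispatch, key=len, reverse=True)
    )
    pattern = re.compile('(?P<cmd>' + alternatives + ')(?P<data>.+)')
    return pattern, dispatch


def run(var, pattern, dispatch):
    """Call the handler selected by the matched command, or return None."""
    m = pattern.match(var)
    if m:
        return dispatch[m.group('cmd')](m.group('data'))
    return None


pattern, dispatch = build_dispatcher(Commands())
result = run('bar stuff', pattern, dispatch)
```

Adding a new command then only requires adding a new do_ method; the regular expression and the table follow from it.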
Pat*_*otz 10
Yes, it is a bit annoying. Perhaps this will work for your case.
import re

class ReCheck(object):
    def __init__(self):
        self.result = None

    def check(self, pattern, text):
        # Remember the match object so it can be used after the test.
        self.result = re.search(pattern, text)
        return self.result

var = 'bar stuff'
m = ReCheck()
if m.check(r'foo(.+)', var):
    print(m.result.group(1))
elif m.check(r'bar(.+)', var):
    print(m.result.group(1))
elif m.check(r'baz(.+)', var):
    print(m.result.group(1))
EDIT: Brian correctly pointed out that my first attempt did not work. Unfortunately, this attempt is longer.
Mar*_*rot 10
r"""
This is an extension of the re module. It stores the last successful
match object and lets you access its methods and attributes via
this module.

This module exports the following additional functions:
    expand     Return the string obtained by doing backslash substitution on a
               template string.
    group      Returns one or more subgroups of the match.
    groups     Return a tuple containing all the subgroups of the match.
    groupdict  Return a dictionary containing all the named subgroups of the
               match.
    start      Return the indices of the start of the substring matched by
               group.
    end        Return the indices of the end of the substring matched by group.
    span       Returns a 2-tuple of (start(), end()) of the substring matched
               by group.

This module defines the following additional public attributes:
    pos        The value of pos which was passed to the search() or match()
               method.
    endpos     The value of endpos which was passed to the search() or
               match() method.
    lastindex  The integer index of the last matched capturing group.
    lastgroup  The name of the last matched capturing group.
    re         The regular expression object which was passed to search() or
               match().
    string     The string passed to match() or search().
"""
import re as re_
from re import *
from functools import wraps

__all__ = re_.__all__ + [ "expand", "group", "groups", "groupdict", "start",
        "end", "span", "last_match", "pos", "endpos", "lastindex",
        "lastgroup", "re", "string" ]

last_match = pos = endpos = lastindex = lastgroup = re = string = None

def _set_match(match=None):
    # Record a successful match in the module-level state; a failed
    # match (None) leaves the previous state untouched.
    global last_match, pos, endpos, lastindex, lastgroup, re, string
    if match is not None:
        last_match = match
        pos = match.pos
        endpos = match.endpos
        lastindex = match.lastindex
        lastgroup = match.lastgroup
        re = match.re
        string = match.string
    return match

@wraps(re_.match)
def match(pattern, string, flags=0):
    return _set_match(re_.match(pattern, string, flags))

@wraps(re_.search)
def search(pattern, string, flags=0):
    return _set_match(re_.search(pattern, string, flags))

@wraps(re_.findall)
def findall(pattern, string, flags=0):
    # re.findall returns strings or tuples, not match objects, so the
    # last match object is taken from finditer instead.
    last = None
    for last in re_.finditer(pattern, string, flags):
        pass
    _set_match(last)
    return re_.findall(pattern, string, flags)

@wraps(re_.finditer)
def finditer(pattern, string, flags=0):
    for match in re_.finditer(pattern, string, flags):
        yield _set_match(match)

def expand(template):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.expand(template)

def group(*indices):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.group(*indices)

def groups(default=None):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.groups(default)

def groupdict(default=None):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.groupdict(default)

def start(group=0):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.start(group)

def end(group=0):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.end(group)

def span(group=0):
    if last_match is None:
        raise TypeError("No successful match yet.")
    return last_match.span(group)

del wraps  # Not needed past module compilation
For example, with the module above saved as gre.py and imported as gre:
if gre.match("foo(.+)", var):
    # do something with gre.group(1)
    pass
elif gre.match("bar(.+)", var):
    # do something with gre.group(1)
    pass
elif gre.match("baz(.+)", var):
    # do something with gre.group(1)
    pass
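I would suggest this, since it uses the fewest regular expressions to accomplish your goal. It is still functional code, but no worse than your old Perl.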
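import re

var = "barbazfoo"
m = re.search(r'(foo|bar|baz)(.+)', var)
# group(1) is the keyword that matched; group(2) is the text after it,
# which corresponds to $1 in the Perl version.
if m is None:
    pass  # none of the keywords was found
elif m.group(1) == 'foo':
    print(m.group(2))
    # do something with m.group(2)
elif m.group(1) == "bar":
    print(m.group(2))
    # do something with m.group(2)
elif m.group(1) == "baz":
    print(m.group(2))
    # do something with m.group(2)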
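Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can capture the result of re.search(pattern, text) in a variable match, check that it is not None, and then reuse it within the body of the condition: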
if match := re.search(r'foo(.+)', text):
    # do something with match.group(1)
    pass
elif match := re.search(r'bar(.+)', text):
    # do something with match.group(1)
    pass
elif match := re.search(r'baz(.+)', text):
    # do something with match.group(1)
    pass
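To make the pattern above concrete, here is a small runnable sketch. The classify function and its return values are only an illustration chosen for this example, and it needs Python 3.8 or later:

```python
import re


def classify(text):
    """Return (label, captured text) for the first pattern that matches, else None."""
    # Each branch both assigns the match object and tests it for truthiness.
    if match := re.search(r'foo(.+)', text):
        return ('foo', match.group(1))
    elif match := re.search(r'bar(.+)', text):
        return ('bar', match.group(1))
    elif match := re.search(r'baz(.+)', text):
        return ('baz', match.group(1))
    return None


print(classify('bar stuff'))  # prints ('bar', ' stuff')
```

As in the Perl original, later patterns are only tried when the earlier ones fail, so no unnecessary searches are run.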
Thanks to this other SO question:
import re

class DataHolder:
    def __init__(self, value=None, attr_name='value'):
        self._attr_name = attr_name
        self.set(value)

    def __call__(self, value):
        return self.set(value)

    def set(self, value):
        # Store the value under the configured attribute name and hand it back,
        # so the call can be used directly inside a condition.
        setattr(self, self._attr_name, value)
        return value

    def get(self):
        return getattr(self, self._attr_name)

string = u'test bar 123'
save_match = DataHolder(attr_name='match')
if save_match(re.search(r'foo (\d+)', string)):
    print("Foo")
    print(save_match.match.group(1))
elif save_match(re.search(r'bar (\d+)', string)):
    print("Bar")
    print(save_match.match.group(1))
elif save_match(re.search(r'baz (\d+)', string)):
    print("Baz")
    print(save_match.match.group(1))