你如何将这个正则表达式的习惯用法从Perl翻译成Python?

Dan*_*ski 45 python regex perl

大约一年前我从Perl切换到Python,并没有回头.我发现只有一个习惯用法在Perl中比在Python中更容易做到:

if ($var =~ /foo(.+)/) {
  # do something with $1
} elsif ($var =~ /bar(.+)/) {
  # do something with $1
} elsif ($var =~ /baz(.+)/) {
  # do something with $1
}
Run Code Online (Sandbox Code Playgroud)

相应的Python代码并不那么优雅,因为if语句不断嵌套:

m = re.search(r'foo(.+)', var)
if m:
  # do something with m.group(1)
else:
  m = re.search(r'bar(.+)', var)
  if m:
    # do something with m.group(1)
  else:
    m = re.search(r'baz(.+)', var)
    if m:
      # do something with m.group(2)
Run Code Online (Sandbox Code Playgroud)

有没有人有一种优雅的方式在Python中重现这种模式?我已经看过使用匿名函数调度表,但对于少数正则表达式来说,这些对我来说似乎有点笨拙......

Tho*_*ers 18

使用命名组和调度表:

r = re.compile(r'(?P<cmd>foo|bar|baz)(?P<data>.+)')

def do_foo(data):
    ...

def do_bar(data):
    ...

def do_baz(data):
    ...

dispatch = {
    'foo': do_foo,
    'bar': do_bar,
    'baz': do_baz,
}


m = r.match(var)
if m:
    dispatch[m.group('cmd')](m.group('data'))
Run Code Online (Sandbox Code Playgroud)

通过一些内省,您可以自动生成正则表达式和调度表.

  • 如果三个正则表达式不相同怎么办?喜欢/^foo(.*)/,/(.*)bar$/,/^(.*)baz(.*)$/? (2认同)

Pat*_*otz 10

是的,这有点烦人.也许这适用于您的情况.


import re

class ReCheck(object):
    def __init__(self):
        self.result = None
    def check(self, pattern, text):
        self.result = re.search(pattern, text)
        return self.result

var = 'bar stuff'
m = ReCheck()
if m.check(r'foo(.+)',var):
    print m.result.group(1)
elif m.check(r'bar(.+)',var):
    print m.result.group(1)
elif m.check(r'baz(.+)',var):
    print m.result.group(1)
Run Code Online (Sandbox Code Playgroud)

编辑: Brian正确地指出我的第一次尝试不起作用.不幸的是,这种尝试更长.


Mar*_*rot 10

r"""
This is an extension of the re module. It stores the last successful
match object and lets you access it's methods and attributes via
this module.

This module exports the following additional functions:
    expand  Return the string obtained by doing backslash substitution on a
            template string.
    group   Returns one or more subgroups of the match.
    groups  Return a tuple containing all the subgroups of the match.
    start   Return the indices of the start of the substring matched by
            group.
    end     Return the indices of the end of the substring matched by group.
    span    Returns a 2-tuple of (start(), end()) of the substring matched
            by group.

This module defines the following additional public attributes:
    pos         The value of pos which was passed to the search() or match()
                method.
    endpos      The value of endpos which was passed to the search() or
                match() method.
    lastindex   The integer index of the last matched capturing group.
    lastgroup   The name of the last matched capturing group.
    re          The regular expression object which as passed to search() or
                match().
    string      The string passed to match() or search().
"""

import re as re_

from re import *
from functools import wraps

__all__ = re_.__all__ + [ "expand", "group", "groups", "start", "end", "span",
        "last_match", "pos", "endpos", "lastindex", "lastgroup", "re", "string" ]

last_match = pos = endpos = lastindex = lastgroup = re = string = None

def _set_match(match=None):
    global last_match, pos, endpos, lastindex, lastgroup, re, string
    if match is not None:
        last_match = match
        pos = match.pos
        endpos = match.endpos
        lastindex = match.lastindex
        lastgroup = match.lastgroup
        re = match.re
        string = match.string
    return match

@wraps(re_.match)
def match(pattern, string, flags=0):
    return _set_match(re_.match(pattern, string, flags))


@wraps(re_.search)
def search(pattern, string, flags=0):
    return _set_match(re_.search(pattern, string, flags))

@wraps(re_.findall)
def findall(pattern, string, flags=0):
    matches = re_.findall(pattern, string, flags)
    if matches:
        _set_match(matches[-1])
    return matches

@wraps(re_.finditer)
def finditer(pattern, string, flags=0):
    for match in re_.finditer(pattern, string, flags):
        yield _set_match(match)

def expand(template):
    if last_match is None:
        raise TypeError, "No successful match yet."
    return last_match.expand(template)

def group(*indices):
    if last_match is None:
        raise TypeError, "No successful match yet."
    return last_match.group(*indices)

def groups(default=None):
    if last_match is None:
        raise TypeError, "No successful match yet."
    return last_match.groups(default)

def groupdict(default=None):
    if last_match is None:
        raise TypeError, "No successful match yet."
    return last_match.groupdict(default)

def start(group=0):
    if last_match is None:
        raise TypeError, "No successful match yet."
    return last_match.start(group)

def end(group=0):
    if last_match is None:
        raise TypeError, "No successful match yet."
    return last_match.end(group)

def span(group=0):
    if last_match is None:
        raise TypeError, "No successful match yet."
    return last_match.span(group)

del wraps  # Not needed past module compilation
Run Code Online (Sandbox Code Playgroud)

例如:

if gre.match("foo(.+)", var):
  # do something with gre.group(1)
elif gre.match("bar(.+)", var):
  # do something with gre.group(1)
elif gre.match("baz(.+)", var):
  # do something with gre.group(1)
Run Code Online (Sandbox Code Playgroud)

  • 这种方法的问题在于你只有一个全局"最后一个匹配".从多个线程中使用该模块将会破坏它,就像在信号处理程序中使用'gre'模块或从上面的'if'主体调用的代码一样.使用它时要特别小心,如果你坚持使用它. (11认同)

Jac*_* M. 9

我建议这样做,因为它使用最少的正则表达式来实现你的目标.它仍然是功能代码,但不比你的旧Perl差.

import re
var = "barbazfoo"

m = re.search(r'(foo|bar|baz)(.+)', var)
if m.group(1) == 'foo':
    print m.group(1)
    # do something with m.group(1)
elif m.group(1) == "bar":
    print m.group(1)
    # do something with m.group(1)
elif m.group(1) == "baz":
    print m.group(2)
    # do something with m.group(2)
Run Code Online (Sandbox Code Playgroud)


Xav*_*hot 7

开始Python 3.8,并引入赋值表达式(PEP 572):=运算符),我们现在可以捕获re.search(pattern, text)变量match中的条件值,以便检查它是否不是None,然后在条件体中重新使用它:

if match := re.search(r'foo(.+)', text):
  # do something with match.group(1)
elif match := re.search(r'bar(.+)', text):
  # do something with match.group(1)
elif match := re.search(r'baz(.+)', text)
  # do something with match.group(1)
Run Code Online (Sandbox Code Playgroud)

  • 10 年后,将其替换为公认的答案 - 万岁!这个_确切的例子_(在`elif`s中重复正则表达式搜索)实际上在PEP中被引用:https://www.python.org/dev/peps/pep-0572/#capturing-condition-values (2认同)

Cra*_*een 6

感谢这个其他的SO问题:

import re

class DataHolder:
    def __init__(self, value=None, attr_name='value'):
        self._attr_name = attr_name
        self.set(value)
    def __call__(self, value):
        return self.set(value)
    def set(self, value):
        setattr(self, self._attr_name, value)
        return value
    def get(self):
        return getattr(self, self._attr_name)

string = u'test bar 123'
save_match = DataHolder(attr_name='match')
if save_match(re.search('foo (\d+)', string)):
    print "Foo"
    print save_match.match.group(1)
elif save_match(re.search('bar (\d+)', string)):
    print "Bar"
    print save_match.match.group(1)
elif save_match(re.search('baz (\d+)', string)):
    print "Baz"
    print save_match.match.group(1)
Run Code Online (Sandbox Code Playgroud)