Regex replace (in Python) - a simpler way?

Eva*_*ark 43 python regex

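Any time I want to replace a piece of text that is part of a larger piece of text, I always have to do something like this: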

"(?P<start>some_pattern)(?P<replace>foo)(?P<end>end)"

I then concatenate the start group with the new data for replace, and then append the end group.

Is there a better way to do this?

小智 105

>>> import re
>>> s = "start foo end"
>>> s = re.sub("foo", "replaced", s)
>>> s
'start replaced end'
>>> s = re.sub("(?<= )(.+)(?= )", lambda m: "can use a callable for the %s text too" % m.group(1), s)
>>> s
'start can use a callable for the replaced text too end'
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.
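To illustrate the help text above, here is a minimal sketch that limits the number of replacements with count and passes a callable as repl. The sample string and the shout helper are made up for this example.

```python
import re

text = "foo foo foo"  # hypothetical sample input

# count=1 replaces only the leftmost match
first_only = re.sub("foo", "bar", text, count=1)

def shout(match):
    # hypothetical helper: receives the match object, returns the replacement
    return match.group(0).upper()

# with the default count=0, every non-overlapping match is replaced
all_upper = re.sub("foo", shout, text)

print(first_only)  # bar foo foo
print(all_upper)   # FOO FOO FOO
```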


zen*_*azn 18

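Look in the Python re documentation for lookaheads (?=...) and lookbehinds (?<=...) - I'm pretty sure they're what you want. They match strings, but do not "consume" the bits of the strings they match.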
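A minimal sketch of that idea, assuming the surrounding context is fixed text (the sample string is made up for illustration): the lookbehind and lookahead must match around foo, but only foo itself is replaced.

```python
import re

text = "start foo end"  # sample string, assumed for illustration

# (?<=start ) must precede the match and (?= end) must follow it,
# but neither is part of the matched (and replaced) text.
result = re.sub(r"(?<=start )foo(?= end)", "replaced", text)

print(result)  # start replaced end
```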


Ben*_*ank 11

The short version is that you cannot use variable-width patterns in lookbehinds with Python's re module. There is no way to change this:

>>> import re
>>> re.sub("(?<=foo)bar(?=baz)", "quux", "foobarbaz")
'fooquuxbaz'
>>> re.sub("(?<=fo+)bar(?=baz)", "quux", "foobarbaz")

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    re.sub("(?<=fo+)bar(?=baz)", "quux", string)
  File "C:\Development\Python25\lib\re.py", line 150, in sub
    return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Development\Python25\lib\re.py", line 241, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern

This means you'll need to work around it. The simplest solution is very similar to what you're doing now:

>>> re.sub("(fo+)bar(?=baz)", "\\1quux", "foobarbaz")
'fooquuxbaz'
>>>
>>> # If you need to turn this into a callable function:
>>> def replace(start, target, end, replacement, search):
...     # escape the literal pieces; a callable repl inserts the replacement verbatim
...     pattern = "(" + re.escape(start) + ")" + re.escape(target) + "(?=" + re.escape(end) + ")"
...     return re.sub(pattern, lambda m: m.group(1) + replacement, search)

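This doesn't have the elegance of the lookbehind solution, but it's still a very clear, straightforward one-liner. And if you look at what an expert has to say on the matter (he's talking about JavaScript, which lacks lookbehinds entirely, but many of the principles are the same), you'll see that his simplest solution looks a lot like this one.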
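One pitfall with the capture-group workaround is worth a sketch: if the replacement text starts with a digit, "\\1" followed by that digit is read as a two-digit group number. Using \g<1> avoids the ambiguity. The replace_between helper and its parameter names below are hypothetical, not part of the re module.

```python
import re

def replace_between(prefix_pattern, target, suffix, replacement, text):
    """Hypothetical helper: replace `target` when it follows `prefix_pattern`
    (a regex, which may be variable width) and precedes the literal `suffix`."""
    pattern = "(" + prefix_pattern + ")" + re.escape(target) + "(?=" + re.escape(suffix) + ")"
    # Double any backslashes so the replacement is inserted literally,
    # and use \g<1> so a leading digit is not read as part of the group number.
    literal = replacement.replace("\\", "\\\\")
    return re.sub(pattern, r"\g<1>" + literal, text)

print(replace_between("fo+", "bar", "baz", "quux", "foobarbaz"))  # fooquuxbaz
print(replace_between("fo+", "bar", "baz", "2x", "foobarbaz"))    # foo2xbaz
```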


Adr*_*ico 5

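I believe the best idea is simply to capture whatever you want to replace in a group, and then replace it using the start and end positions of the captured group.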

Regards,

Adrián

# the pattern contains the expression we want to replace as the first group
import re

pat = r"word1\s(.*)\sword2"
test = "word1 will never be a word2"
repl = "replace"

m = re.search(pat, test)

# m.groups() is a tuple, so compare its length rather than the tuple itself
if m and len(m.groups()) > 0:
    line = test[:m.start(1)] + repl + test[m.end(1):]
    print(line)
else:
    print("the pattern didn't capture any text")

This will print:
word1 replace word2


The group to be replaced can be located anywhere in the string.
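The same approach can be wrapped in a small function. This is only a sketch: replace_group and its parameters are hypothetical names, and it handles just the first match of the pattern.

```python
import re

def replace_group(pattern, text, repl, group=1):
    """Hypothetical helper: replace the text captured by `group` in the
    first match of `pattern`, leaving the rest of `text` unchanged."""
    m = re.search(pattern, text)
    if m is None or m.start(group) == -1:
        # no match, or the group did not take part in the match
        return text
    return text[:m.start(group)] + repl + text[m.end(group):]

print(replace_group(r"word1\s(.*)\sword2", "word1 will never be a word2", "replace"))
# word1 replace word2
```

Because the replacement is spliced in with string slicing, it is inserted literally, so backslashes in repl need no escaping.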


  • @jae You still need the colon, otherwise it won't slice the string. (2 upvotes)