Unexpected end of pattern: Python Regex

thi*_*ool 1 python regex pattern-matching

When I use the Python regex below to do what is described further down, I get the error "unexpected end of pattern".

Regex:

modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)
(CODE[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)

The purpose of this regex:

INPUT:

CODE876
CODE223
matchjustCODE657
CODE69743
code876
testing1CODE888
example2CODE098
http://replaced/CODE665

Should match:

CODE876
CODE223
CODE657
CODE697

and replace the occurrences with:

http://productcode/CODE876
http://productcode/CODE223
matchjusthttp://productcode/CODE657
http://productcode/CODE69743

Should not match:

code876
testing1CODE888
testing2CODE776
example3CODE654
example2CODE098
http://replaced/CODE665

Final output:

http://productcode/CODE876
http://productcode/CODE223
matchjusthttp://productcode/CODE657
http://productcode/CODE69743
code876
testing1CODE888
example2CODE098
http://replaced/CODE665

Edit and update 1

modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(CODE[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)

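The error no longer occurs, but the regex does not match any of the required patterns. Is there a problem with the capturing groups, or with the matching itself? When I compile this regex as it stands, I get no match on my input.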

Edit and update 2

f=open("/Users/mymac/Desktop/regex.txt")
s=f.read()

s1 = re.sub(r'((?!http://|testing[0-9]|example[0-9]).*?)(CODE[0-9]{3})(?!</a>)', 
            r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', s)
print s1

INPUT

CODE123 CODE765 testing1CODE123 example1CODE345 http://www.coding.com/CODE333 CODE345

CODE234

CODE333

OUTPUT

<a href="http://productcode/CODE123">CODE123</a> <a href="http://productcode/CODE765">CODE765</a> testing1<a href="http://productcode/CODE123">CODE123</a> example1<a href="http://productcode/CODE345">CODE345</a> http://www.coding.com/<a href="http://productcode/CODE333">CODE333</a> <a href="http://productcode/CODE345">CODE345</a>

<a href="http://productcode/CODE234">CODE234</a>

<a href="http://productcode/CODE333">CODE333</a>

The regex works on raw input, but not on string input read from a text file.

For more results, see inputs 4 and 5 at http://ideone.com/3w1E3

Joh*_*hin 5

Your main problem is that (?-i) is wishful thinking as far as Python 2.7 and 3.2 are concerned. See below for details.

import re
# modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)
# (CODE[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)
# observation 1: as presented, pattern has a line break in the middle, just after (?-i)
# ob 2: rather hard to read, should use re.VERBOSE
# ob 3: not obvious whether it's a compile-time or run-time problem
# ob 4: (?i) should be at the very start of the pattern (see docs)
# ob 5: what on earth is (?-i) ... not in 2.7 docs, not in 3.2 docs
pattern = r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)(CODE[0-9]{3})(?!</a>)'
#### rx = re.compile(pattern)
# above line failed with "sre_constants.error: unexpected end of pattern"
# try without the (?-i)
pattern2 = r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(CODE[0-9]{3})(?!</a>)'
rx = re.compile(pattern2)
# This works, now you need to work on observations 1 to 4,
# and rethink your CODE/code strategy
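
The compile-time check described in the comments above can be reproduced with a minimal sketch like the following, which uses only the standard `re` module. The helper name `compiles` is made up for illustration. `(?i)` is moved to the very start of both patterns (observation 4), because newer Python versions (3.11 and later, as far as I know) reject a global flag that appears after `^`.

```python
import re

# Pattern with the unsupported (?-i) group, and the same pattern without it.
BAD = r'(?i)^((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)(CODE[0-9]{3})(?!</a>)'
GOOD = r'(?i)^((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(CODE[0-9]{3})(?!</a>)'

def compiles(pattern):
    """Return True if `pattern` compiles, False if re rejects it."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

bad_ok = compiles(BAD)    # the bare (?-i) is rejected at compile time
good_ok = compiles(GOOD)  # dropping (?-i) lets the pattern compile

# With a global (?i), the lowercase line matches as well, which is why
# the CODE/code strategy needs rethinking.
lower_matches = re.match(GOOD, 'code876') is not None
```

On a current interpreter the error message for the first pattern may not read "unexpected end of pattern", so the sketch only checks that `re.error` is raised.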

Looks like the suggestions fell on deaf ears ... here is the pattern in re.VERBOSE format:

pattern4 = r'''
    ^
    (?i)
    (
        (?:
            (?!http://)
            (?!testing[0-9])
            (?!example[0-9])
            . #### what is this for?
        )*?
    ) ##### end of capturing group 1
    (CODE[0-9]{3}) #### not in capturing group 1
    (?!</a>)
    '''
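
One possible way to rethink the CODE/code strategy is sketched below. It assumes Python 3.6 or later, where scoped inline flags such as `(?i:...)` exist (they do not in 2.7 or 3.2). The function name `link_codes` is hypothetical, and the replacement string is borrowed from the question's second edit. The exclusions are matched case-insensitively, while `CODE` itself stays case-sensitive.

```python
import re

PATTERN = re.compile(r'''
    ^
    (                       # group 1: text before the product code
        (?:
            (?!(?i:http://|testing[0-9]|example[0-9]))  # case-insensitive exclusions
            .               # one character that starts none of the exclusions
        )*?
    )
    (CODE[0-9]{3})          # group 2: case-sensitive product code
    (?!</a>)                # skip codes that are already linked
    ''', re.VERBOSE | re.MULTILINE)

def link_codes(text):
    """Wrap the first eligible product code on each line in a link."""
    return PATTERN.sub(r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', text)

sample = "CODE876\nmatchjustCODE657\ncode876\ntesting1CODE888\nhttp://replaced/CODE665"
linked = link_codes(sample)
```

Because the pattern is anchored with `^`, at most one code per line is linked. Input with several space-separated tokens on one line, as in the second edit, would need a different anchor.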

  • @thinkcool: This answer correctly answers the question you asked. Doesn't it deserve *at least* one upvote? (2 upvotes)