python regex pattern-matching
When I use the Python regex below to do what is described further down, I get the error "unexpected end of pattern".
Regex:
modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)
(CODE[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)
Purpose of this regex:
INPUT:
CODE876
CODE223
matchjustCODE657
CODE69743
code876
testing1CODE888
example2CODE098
http://replaced/CODE665
It should match:
CODE876
CODE223
CODE657
CODE697
and replace those occurrences with:
http://productcode/CODE876
http://productcode/CODE223
matchjusthttp://productcode/CODE657
http://productcode/CODE69743
It should not match:
code876
testing1CODE888
testing2CODE776
example3CODE654
example2CODE098
http://replaced/CODE665
Final output:
http://productcode/CODE876
http://productcode/CODE223
matchjusthttp://productcode/CODE657
http://productcode/CODE69743
code876
testing1CODE888
example2CODE098
http://replaced/CODE665
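To pin down the rules above, here is a small line-by-line sketch that avoids lookarounds entirely. It assumes the input is handled one line at a time and that only the first CODE### on a line is a candidate (as with the ^-anchored regex), and it uses the <a href=...> replacement from the re.sub call rather than the bare URLs in the listed output. `link_line`, `CODE_RE` and `BLOCKED_RE` are hypothetical names.

```python
import re

# Case-sensitive on purpose, so "code876" is never linked.
CODE_RE = re.compile(r'CODE[0-9]{3}')
# Prefixes (any case) that disqualify a code if they occur earlier on the line.
BLOCKED_RE = re.compile(r'http://|testing[0-9]|example[0-9]', re.IGNORECASE)

def link_line(line):
    """Wrap the first CODE### on the line in a link, unless a blocked prefix precedes it."""
    m = CODE_RE.search(line)
    # Only look for blocked prefixes in the text before the code (endpos=m.start()).
    if m is None or BLOCKED_RE.search(line, 0, m.start()):
        return line
    code = m.group(0)
    anchor = '<a href="http://productcode/%s">%s</a>' % (code, code)
    return line[:m.start()] + anchor + line[m.end():]

sample = """CODE876
CODE223
matchjustCODE657
CODE69743
code876
testing1CODE888
example2CODE098
http://replaced/CODE665"""

print("\n".join(link_line(line) for line in sample.splitlines()))
```

Note that CODE69743 becomes a link to CODE697 followed by a literal "43", which matches the "should match" list above.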
EDIT and UPDATE 1
modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(CODE[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)
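The error no longer occurs, but now the regex doesn't match any of the desired patterns. Is the problem in the match groups or in the match itself? When I compile this regex, I get nothing back for my input.
EDIT and UPDATE 2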
import re

with open("/Users/mymac/Desktop/regex.txt") as f:
    s = f.read()
s1 = re.sub(r'((?!http://|testing[0-9]|example[0-9]).*?)(CODE[0-9]{3})(?!</a>)',
            r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', s)
print(s1)
INPUT
CODE123 CODE765 testing1CODE123 example1CODE345 http://www.coding.com/CODE333 CODE345
CODE234
CODE333
OUTPUT
<a href="http://productcode/CODE123">CODE123</a> <a href="http://productcode/CODE765">CODE765</a> testing1<a href="http://productcode/CODE123">CODE123</a> example1<a href="http://productcode/CODE345">CODE345</a> http://www.coding.com/<a href="http://productcode/CODE333">CODE333</a> <a href="http://productcode/CODE345">CODE345</a>
<a href="http://productcode/CODE234">CODE234</a>
<a href="http://productcode/CODE333">CODE333</a>
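The regex works on the raw input, but not on a string read from a text file.
For more results, see inputs 4 and 5 at http://ideone.com/3w1E3
Your main problem is wishful thinking about (?-i) in Python 2.7 and 3.2. See below for details.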
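Two things are likely behind this, sketched below on a two-line string (the variable names are only for illustration, and the trailing (?!</a>) is left out for brevity). In the Update 2 pattern the negative lookahead sits once at the start of group 1, so it is tested only where a match begins, and the engine simply starts the match one character later, inside "testing1". Putting the lookahead in front of every `.` (a "tempered" dot) re-checks it at each position; anchoring that with `^` then needs re.MULTILINE, because f.read() returns the whole file as one string and `^` otherwise matches only at its very start.

```python
import re

# Illustrative two-line input, as f.read() would return it.
text = "testing1CODE123\nCODE234"

# Update 2 style: the lookahead is tested only where the match begins.
loose = r'((?!http://|testing[0-9]|example[0-9]).*?)(CODE[0-9]{3})'
print(re.findall(loose, text))  # [('esting1', 'CODE123'), ('', 'CODE234')]

# Tempered dot: the lookahead is re-checked before every character consumed.
tempered = r'^((?:(?!http://|testing[0-9]|example[0-9]).)*?)(CODE[0-9]{3})'
print(re.findall(tempered, text))                # [] because ^ only matches at the start of the string
print(re.findall(tempered, text, re.MULTILINE))  # [('', 'CODE234')]
```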
import re
# modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)
# (CODE[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)
# observation 1: as presented, pattern has a line break in the middle, just after (?-i)
# ob 2: rather hard to read, should use re.VERBOSE
# ob 3: not obvious whether it's a compile-time or run-time problem
# ob 4: (?i) should be at the very start of the pattern (see docs)
# ob 5: what on earth is (?-i) ... not in 2.7 docs, not in 3.2 docs
pattern = r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i)(CODE[0-9]{3})(?!</a>)'
#### rx = re.compile(pattern)
# above line failed with "sre_constants.error: unexpected end of pattern"
# try without the (?-i)
pattern2 = r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(CODE[0-9]{3})(?!</a>)'
rx = re.compile(pattern2)
# This works in 2.7 and 3.2 (Python 3.11+ rejects it too, because of ob 4);
# now you need to work on observations 1 to 4,
# and rethink your CODE/code strategy
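For the record, newer Python versions do have a scoped form of what (?i) ... (?-i) was aiming at. Since Python 3.6, `(?i:...)` and `(?-i:...)` switch IGNORECASE on or off only inside the group; a bare `(?-i)` is still rejected. A minimal check, assuming Python 3.6 or later (`rx` and `rx_inv` are illustrative names):

```python
import re

# (?i:...) makes only the enclosed part case-insensitive.
rx = re.compile(r'(?i:testing)CODE')
print(bool(rx.fullmatch('TESTINGCODE')))  # True
print(bool(rx.fullmatch('testingcode')))  # False: "CODE" must stay upper-case

# (?-i:...) switches IGNORECASE off inside an otherwise case-insensitive pattern.
rx_inv = re.compile(r'testing(?-i:CODE)', re.IGNORECASE)
print(bool(rx_inv.fullmatch('TESTINGCODE')), bool(rx_inv.fullmatch('testingcode')))  # True False
```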
Looks like the advice fell on deaf ears... Here is the pattern in re.VERBOSE format:
pattern4 = r'''
^
(?i)
(
(?:
(?!http://)
(?!testing[0-9])
(?!example[0-9])
. #### what is this for?
)*?
) ##### end of capturing group 1
(CODE[0-9]{3}) #### not in capturing group 1
(?!</a>)
'''
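Building on that verbose layout, one way it could look once the flags are passed to re.compile instead of being written inline after `^` (Python 3.11+ rejects a global `(?i)` that is not at the very start, and 3.6 to 3.10 warn about it). This is a sketch assuming Python 3.6+ for the scoped `(?i:...)` group and, as with the original `^`, that only the first code on each line is linked; `pattern5`, `rx5` and `sample` are illustrative names.

```python
import re

pattern5 = r'''
    ^                       # start of a line (needs re.MULTILINE)
    (                       # group 1: text before the code
      (?:
        (?!(?i: http:// | testing[0-9] | example[0-9] ))  # blocked prefixes, any case
        .                   # one character that does not start a blocked prefix
      )*?
    )
    (CODE[0-9]{3})          # group 2: the code, case-sensitive
    (?!</a>)                # not already followed by a closing tag
'''
rx5 = re.compile(pattern5, re.VERBOSE | re.MULTILINE)

sample = "CODE876\nmatchjustCODE657\nCODE69743\ncode876\ntesting1CODE888\nhttp://replaced/CODE665"
result = rx5.sub(r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', sample)
print(result)
```

Because `.` does not match a newline, group 1 cannot run across lines, so each line is judged on its own.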