Python regex findall

Question

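I am trying to extract all occurrences of tagged words from a string using regex in Python 2.7.2. Put simply, I want to extract every piece of text inside the [p][/p] tags. Here is my attempt: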

import re
regex = ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)

Printing person produces ['President [P]', '[/P]', '[P] Bill Gates [/P]']

What is the correct regex to get ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]'] or ['Barack Obama', 'Bill Gates']?

Thanks. :)

Answer 1

unu*_*tbu 70

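import re
# \[ and \] match literal brackets; the group (.+?) lazily captures the text between the tags
regex = ur"\[P\] (.+?) \[/P\]"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)  # with one group, findall returns only the captured text
print(person)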

yields

['Barack Obama', 'Bill Gates']

The regex ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?" is exactly the same as the unicode string u'[[1P].+?[/P]]+?', except harder to read.
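A quick check of that equivalence, as a sketch for Python 3 (an assumption, since the question targets 2.7). Python 3 no longer accepts the `ur` prefix, so a plain non-raw `u` literal is used here; that makes Python itself decode the `\uXXXX` escapes before `re` ever sees the pattern. The names `escaped` and `plain` are illustrative only.

```python
# Python 3 sketch: a non-raw u"" literal lets Python decode the \uXXXX escapes.
escaped = u"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
plain = u"[[1P].+?[/P]]+?"

# \u005B is '[', \u005D is ']', \u002F is '/', so both strings are identical.
print(escaped == plain)  # True
```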

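The first bracketed group, [[1P], tells re to match any one character from the list ['[', '1', 'P'], and the second bracketed group, [/P], likewise matches any one of '/' or 'P' (the ] that follows it is a literal bracket, repeated by +?). That is not what you want at all. So,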

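Remove the outer enclosing square brackets. (Also remove the stray 1 in front of P.)
To protect the literal brackets in [P], escape them with a backslash: \[P\].
To return only the words inside the tags, put grouping parentheses around .+?.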
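The fixes can be checked one at a time. This is a sketch for Python 3 (an assumption, since the question uses 2.7; Python 3 rejects the `ur` prefix, so plain `r` strings are used), and the variable names are illustrative. It first shows the set-versus-literal difference described above, then the two result shapes the question asks for, and finally what a greedy `.+` would do in place of the lazy `.+?`.

```python
import re

line = ("President [P] Barack Obama [/P] met Microsoft founder "
        "[P] Bill Gates [/P], yesterday.")

# Unescaped brackets form a character set: [/P] matches ONE character, '/' or 'P'.
as_set = re.findall(r"[/P]", "[/P]")
# Escaped brackets are literal, so the whole tag matches as one unit.
as_literal = re.findall(r"\[/P\]", "[/P]")

# No group: findall returns each whole match, tags included.
with_tags = re.findall(r"\[P\].+?\[/P\]", line)

# One group: findall returns only what the group captured.
names = re.findall(r"\[P\] (.+?) \[/P\]", line)

# Greedy .+ runs on to the LAST " [/P]", so both names end up in one match.
greedy = re.findall(r"\[P\] (.+) \[/P\]", line)

print(as_set)      # ['/', 'P']
print(as_literal)  # ['[/P]']
print(with_tags)   # ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]']
print(names)       # ['Barack Obama', 'Bill Gates']
print(greedy)      # ['Barack Obama [/P] met Microsoft founder [P] Bill Gates']
```

The lazy `+?` stops at the first closing tag it can reach, which is why each pair of tags yields its own match.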

Answer 2

Fai*_*Dev 14

Try this:

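import re
subject = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
for match in re.finditer(r"\[P[^\]]*\](.*?)\[/P\]", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group(); text between the tags: match.group(1)
    print(match.group(1))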
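In that pattern, `[^\]]*` lets the opening tag contain extra characters before its `]`, so it matches `[P]` as well as something like `[P class=x]`. Because the question also writes the tags in lowercase (`[p][/p]`), one possible extension adds `re.IGNORECASE` and strips the spaces that `group(1)` picks up around each name. This is a sketch under those assumptions; the sample string, the `class=x` text, and the `spans` list are made up for illustration.

```python
import re

# Hypothetical input: one lowercase tag pair and one opening tag with extra text.
subject = "[p] Barack Obama [/p] met [P class=x] Bill Gates [/P]."

# IGNORECASE lets \[P and \[/P\] also match [p and [/p].
pattern = re.compile(r"\[P[^\]]*\](.*?)\[/P\]", re.IGNORECASE)

spans = []
for match in pattern.finditer(subject):
    # group(1) is the text between the tags; strip() drops the padding spaces.
    name = match.group(1).strip()
    spans.append((match.start(), match.end(), name))
    print(match.start(), match.end(), name)
```

Using `finditer` instead of `findall` keeps the match objects, so the start and end offsets stay available alongside the captured text.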

Asked: 13 years, 10 months ago
Viewed: 173742 times
Last activity: 9 years, 1 month ago