I have a list of words,
and I am building a list of regular expression objects from it:
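import re

words = 'This is word of spy++'
wl = ['spy++', 'cry', 'fpp']
# The reported "error: multiple repeat" comes from this line: the ++ in
# 'spy++' is read as regex quantifiers, not as literal characters
regobjs = [re.compile(r"\b%s\b" % word.lower()) for word in wl]
for regobj in regobjs:
    print(re.search(regobj, words).group())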
But I get an error (error: multiple repeat) when creating the regex objects, because of the ++ characters. How can I make the regex handle every word in the word list?
Requirements:
The regex should detect the exact word in the given text.
This should also work when the word contains non-alphanumeric characters such as ++. The code above detects exact words, except for those containing ++.
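In addition to re.escape(), you also need to drop the \b word boundary before/after non-alphanumeric characters, otherwise the match will fail. \b matches only between a word character (letter, digit, or underscore) and a non-word character or the edge of the string, so a \b placed after the final + of spy++ finds no boundary when the word ends the text or is followed by a space.
Something like this (not very elegant, but I hope it gets the point across):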
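import re

words = 'This is word of spy++'
wl = ['spy++', 'cry', 'fpp']
regobjs = []
for word in wl:
    # Escape regex metacharacters so that '+' is matched literally
    eword = re.escape(word.lower())
    # Add \b only next to a word character (alphanumeric or underscore)
    if eword[0].isalnum() or eword[0] == "_":
        eword = r"\b" + eword
    if eword[-1].isalnum() or eword[-1] == "_":
        eword = eword + r"\b"
    regobjs.append(re.compile(eword))

for regobj in regobjs:
    match = regobj.search(words)
    # search() returns None for words that do not occur in the text
    if match:
        print(match.group())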
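A possible alternative to adding \b conditionally is to replace it with lookarounds, which is a different technique from the one in the answer above. The sketch below assumes Python 3; the helper name compile_exact_words is hypothetical and only for illustration. It escapes every word, joins them into one alternation, and requires that no word character touches either side of the match.

```python
import re

def compile_exact_words(word_list):
    """Compile one pattern matching any listed word as a whole token.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Longest first, so a longer word wins over a shorter word that prefixes it
    ordered = sorted(word_list, key=len, reverse=True)
    escaped = [re.escape(w.lower()) for w in ordered]
    # (?<!\w) and (?!\w) assert that no word character is adjacent to the
    # match; unlike \b, this also works when a word starts or ends with '+'
    return re.compile(r"(?<!\w)(?:%s)(?!\w)" % "|".join(escaped))

pattern = compile_exact_words(['spy++', 'cry', 'fpp'])
found = pattern.findall('This is word of spy++'.lower())
print(found)  # ['spy++']
```

Note that this is slightly stricter than the conditional \b version: here spy++ is rejected when a letter or digit follows it directly (as in spy++x), whereas the version above places no constraint after the final +.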